However, there is no synchronization happening. If you insert from
multiple threads, you need to lock the access to the matrix, which most
certainly is less efficient than inserting from a single thread.
You have the same problem when inserting values into a std::vector from
multiple threads.
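
To make the locking point concrete, here is a minimal sketch of guarded insertion into a shared std::vector. The function name fill_with_lock and the round-robin split of indices are assumptions made for illustration; they are not taken from the code under discussion.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: n_threads workers push the values 0..n_values-1
// into one shared vector. Every push_back is serialized by a mutex,
// because push_back may reallocate the storage all threads share.
std::vector<int> fill_with_lock(int n_values, unsigned n_threads) {
    std::vector<int> shared;
    std::mutex guard;
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&shared, &guard, t, n_values, n_threads] {
            // Thread t handles indices t, t + n_threads, t + 2*n_threads, ...
            for (int i = static_cast<int>(t); i < n_values;
                 i += static_cast<int>(n_threads)) {
                std::lock_guard<std::mutex> lock(guard);
                shared.push_back(i);
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared;
}
```

The result holds every value exactly once, but the order depends on thread scheduling, and the threads spend much of their time waiting for the lock.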
The point here is that I know the code is thread safe, in the sense that no more than one thread will ever try to write to position (row i, col j).
Accordingly, performing the assembly in parallel should not require locking the insertion operation. Am I wrong?
Inserting into a single std::vector has similar synchronization
problems, so maybe collecting the entries in a separate vector for each
thread is an option (these vectors then need to be concatenated, ideally
without actually copying the data).
Yes, exactly. If I do that the code works just fine.
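
The per-thread collection idea discussed above could be sketched roughly as follows. The Entry struct, the function name assemble_parallel, and the placeholder "assembly" that writes one diagonal entry per row are all assumptions made for illustration. For simplicity this sketch copies the entries once when concatenating, rather than avoiding the copy.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical entry type: one (row, col, value) contribution.
struct Entry {
    int row;
    int col;
    double value;
};

// Each thread appends only to its own vector, so no lock is needed
// while collecting. The buffers are merged after all threads joined.
std::vector<Entry> assemble_parallel(int n_rows, unsigned n_threads) {
    // Sized up front; the outer vector is never resized while threads run.
    std::vector<std::vector<Entry>> per_thread(n_threads);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&per_thread, t, n_rows, n_threads] {
            std::vector<Entry>& local = per_thread[t];
            // Placeholder assembly: thread t handles rows t, t + n_threads, ...
            for (int i = static_cast<int>(t); i < n_rows;
                 i += static_cast<int>(n_threads)) {
                local.push_back({i, i, 1.0 + i});
            }
        });
    }
    for (auto& w : workers) w.join();

    // Single-threaded concatenation of the per-thread buffers.
    std::size_t total = 0;
    for (const auto& v : per_thread) total += v.size();
    std::vector<Entry> all;
    all.reserve(total);
    for (const auto& v : per_thread) {
        all.insert(all.end(), v.begin(), v.end());
    }
    return all;
}
```

The combined list can then be handed to the matrix from a single thread, so the matrix itself is never touched concurrently.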