However, there is no synchronization happening. If you insert from
multiple threads, you need to lock the access to the matrix, which most
certainly is less efficient than inserting from a single thread.

You have the same problem when inserting values into a std::vector from
multiple threads.
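
To illustrate the std::vector case, here is a minimal sketch of what locking around each insertion could look like. The function name fill_with_lock and its parameters are made up for this example and do not come from any particular library; the matrix case is assumed to behave similarly, since an insertion may reallocate storage shared by all threads.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every thread appends to ONE shared vector.
// push_back may reallocate the shared storage, so each call is
// protected by a mutex even though the threads write different values.
std::vector<int> fill_with_lock(std::size_t per_thread, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<int> shared;
    std::mutex guard;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&shared, &guard, per_thread, t] {
            for (std::size_t k = 0; k < per_thread; ++k) {
                // The lock is held only for the duration of one insertion.
                std::lock_guard<std::mutex> lock(guard);
                shared.push_back(static_cast<int>(t));
            }
        });
    }
    for (auto& w : workers) w.join();
    return shared;
}
```

With the lock in place the insertions are effectively serialized, which is why this is unlikely to beat a single-threaded loop.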

The point here is that I know the code is thread safe: no more than one thread will ever try to write to position (row i, col j).

Accordingly, performing the assembly in parallel should not require locking the insertion operation. Am I wrong?

Inserting into a single std::vector has similar synchronization
problems, so maybe collecting a vector for each thread is an option
(which then need to be concatenated, ideally without actually copying
the data).
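
A rough sketch of the per-thread idea, using only the standard library. The Entry struct, the function name assemble_diagonal, and the round-robin split of rows across threads are assumptions made for illustration; the real assembly loop and matrix type will differ.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <thread>
#include <vector>

// Hypothetical entry type: one (row, col, value) contribution.
struct Entry {
    std::size_t row;
    std::size_t col;
    double value;
};

// Each thread fills its own buffer, so no lock is needed during assembly.
std::vector<Entry> assemble_diagonal(std::size_t n, unsigned num_threads) {
    assert(num_threads > 0);
    // The outer vector is sized up front and never resized while threads run.
    std::vector<std::vector<Entry>> per_thread(num_threads);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&per_thread, t, n, num_threads] {
            // Thread t handles rows t, t + num_threads, t + 2 * num_threads, ...
            for (std::size_t i = t; i < n; i += num_threads)
                per_thread[t].push_back({i, i, 1.0});
        });
    }
    for (auto& w : workers) w.join();

    // Serial merge, done only after all threads have finished.
    std::size_t total = 0;
    for (const auto& buf : per_thread) total += buf.size();
    std::vector<Entry> all;
    all.reserve(total);
    for (auto& buf : per_thread)
        all.insert(all.end(),
                   std::make_move_iterator(buf.begin()),
                   std::make_move_iterator(buf.end()));
    return all;
}
```

Note that this merge still copies each entry once, since moving a plain struct is the same as copying it. Avoiding the copy entirely would mean handing the per-thread buffers directly to whatever consumes them, without building the combined vector.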

Yes, exactly. If I do that the code works just fine.

