The two pieces of code are not equivalent: one applies the transformation in place, whereas the other needs to allocate a temporary. Nonetheless, the root of the "problem" does not lie here, and it is a bit more complicated. Let's consider a simpler expression:
pts = A * pts;
In this expression, the product cannot be carried out in place because of aliasing, so a temporary has to be created. Of course, as you realized, if we evaluate the result one column at a time, then instead of allocating a whole 3xN temporary it is enough to allocate a 3x1 temporary vector. Since its size is known at compile time and it is very small, it can be "allocated" on the stack and even optimized away by the compiler. Unfortunately, there is no way for Eigen to figure this out, especially at compile time. When there is no aliasing at all, the user can tell Eigen so:
pts_bis.noalias() = A * pts;
In your case, we would need some kind of colwise/rowwise noalias to tell Eigen that there is no aliasing across columns or rows. To be honest, I've never thought about that possibility, but given the significant performance hit, and given that this kind of aliasing is probably the most frequent one in matrix products, it might be worth the effort. I'm still not sure what good names for the API would be, though.
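
To make the column-at-a-time evaluation described above concrete, here is a minimal sketch in plain C++, deliberately written without Eigen. All names in it (Mat3, Vec3, Points, transform_in_place, transform_aliased) are hypothetical and are not part of Eigen's API; the 3xN matrix is assumed to be stored as a list of 3x1 columns. It only illustrates why a 3x1 stack temporary is enough, and what goes wrong if the result is written directly into the column being read:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical illustration types, not Eigen types.
using Mat3 = std::array<std::array<double, 3>, 3>;  // A[row][col]
using Vec3 = std::array<double, 3>;                  // one 3x1 column
using Points = std::vector<Vec3>;                    // 3xN, stored column by column

// pts = A * pts, evaluated one column at a time.
// Only a 3x1 temporary is needed; its size is fixed at compile time,
// so it lives on the stack (and may be optimized away entirely).
inline void transform_in_place(const Mat3& A, Points& pts) {
    for (Vec3& col : pts) {
        Vec3 tmp{};  // zero-initialized 3x1 temporary
        for (std::size_t r = 0; r < 3; ++r)
            for (std::size_t k = 0; k < 3; ++k)
                tmp[r] += A[r][k] * col[k];
        col = tmp;  // safe: every input of this column has been read
    }
}

// Same loop without the temporary, shown only to expose the aliasing bug:
// writing col[r] destroys an input that later rows still need.
inline void transform_aliased(const Mat3& A, Points& pts) {
    for (Vec3& col : pts) {
        for (std::size_t r = 0; r < 3; ++r) {
            double acc = 0.0;
            for (std::size_t k = 0; k < 3; ++k)
                acc += A[r][k] * col[k];
            col[r] = acc;  // overwrites an entry read again for the next row
        }
    }
}
```

With a matrix A that swaps the first two coordinates, transform_in_place maps the column (1, 2, 3) to (2, 1, 3), while transform_aliased produces (2, 2, 3), because the second row reads the already overwritten first entry. There is no aliasing across columns, only within a column, which is exactly the information a colwise noalias would convey.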