> If the line length is not on a 4 or 8 byte boundary, would MMX be slower
> due to alignment problems?
I *think* there's code in Allegro that tries to align bitmap lines on a 4
byte boundary, but I could be mistaken here. I'd have to look at the source
to make sure.
I will investigate this too.
> If memory bitmaps are to be made solid (zero extra bytes to pad the pitch
> to a 4 or 8 byte boundary), then SSE1,2,3 code will be far more difficult.
> Memory usage is hardly an issue anymore (even gfx cards have more
> memory on them than the average game uses), so is there any reason to be
> concerned about using a few extra bytes per line to reach 4 or 8 byte
> boundaries?
Not as far as I'm concerned. Presently, Allegro doesn't use SSE3. What
advantages could we get from it now that we have code to detect it?
SSE3 has integer SIMD instructions that can process 128 bits at a time.
For masking 8-bit images, we could do 16 bytes at a time.
The requirement is that the data must be on 128-bit boundaries.
However, using SSE1,2 requires 64-bit boundaries (which is something we
should be doing anyway).
Even if we do not implement the code inside the Allegro lib, the bitmap
structure should be aligned so that muppets like me can use SSE2,3 on
bitmap->line[]
(as I have been doing for over a year now).
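
A minimal sketch of the "16 bytes at a time" masking idea, assuming the SSE2 intrinsics from <emmintrin.h> (the 128-bit byte compare used here is an SSE2 instruction), a mask color of 0, and lines that start on a 16-byte boundary. The function names are hypothetical and are not part of Allegro. A scalar loop handles the tail, unaligned pointers, and builds without SSE2:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: copy every source byte that is not the mask color (0). */
static void masked_copy_scalar(uint8_t *dst, const uint8_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] != 0)
            dst[i] = src[i];
    }
}

/* Hypothetical helper: masked copy of one 8-bit line, 16 pixels per step
 * when SSE2 is available and both pointers are 16-byte aligned. */
static void masked_copy_line(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;
    assert(dst != NULL && src != NULL);
#if defined(__SSE2__)
    if ((((uintptr_t)dst | (uintptr_t)src) & 15) == 0) {
        const __m128i zero = _mm_setzero_si128();
        for (; i + 16 <= n; i += 16) {
            __m128i s = _mm_load_si128((const __m128i *)(src + i));
            __m128i d = _mm_load_si128((const __m128i *)(dst + i));
            /* m is 0xFF in every byte where the source pixel is transparent */
            __m128i m = _mm_cmpeq_epi8(s, zero);
            /* keep dst where transparent, take src everywhere else */
            __m128i r = _mm_or_si128(_mm_and_si128(m, d),
                                     _mm_andnot_si128(m, s));
            _mm_store_si128((__m128i *)(dst + i), r);
        }
    }
#endif
    /* Remaining bytes, or the whole line if the fast path was skipped. */
    masked_copy_scalar(dst + i, src + i, n - i);
}
```

The aligned load and store are what make the line alignment matter: if bitmap->line[y] is not on a 16-byte boundary, this sketch silently falls back to the scalar loop.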
> SSE can offer some really cool instructions that would allow lots of
> pixels to be processed simultaneously... let's not design the bitmap
> structure so that we cannot take advantage of these.
Agreed. Probably something we can do for 4.3?
I'm not too concerned about when the code gets implemented, but I am
EXTREMELY concerned that any changes we make to the bitmap structure should
plan for the future. And the future will be SSE1,2,3, especially for
bitmap operations.
Please don't let us design ourselves into a hole simply for a 2% gain, based
on a one-off optimization that only applies in limited situations.
> If the tests below are somehow only better on P4 but worse on
> Athlon/P3/non-P4, then it should not be implemented, as your average
> user still would not be using a P4.
True, but we could detect at runtime what we're running on and use the
P4-optimized code only if we're on a P4.
Yes, Allegro code can do this, but user code still needs to access
bitmap->line[], so the alignment should happen during create_bitmap().
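
To make the create_bitmap() point concrete, here is a rough sketch of padding the pitch so every line pointer lands on a 16-byte boundary. Everything in it is an assumption for illustration: ALIGNED_BITMAP is a simplified stand-in rather than Allegro's real BITMAP structure, the function names are hypothetical, and it relies on C11 aligned_alloc(), which not every compiler or platform provides:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LINE_ALIGN 16   /* assumed target: 128-bit boundaries */

/* Simplified stand-in for a memory bitmap (not Allegro's BITMAP). */
typedef struct {
    int w, h;
    size_t pitch;           /* bytes per line, including padding */
    unsigned char *data;    /* one aligned block holding all lines */
    unsigned char **line;   /* line[y] points at the start of row y */
} ALIGNED_BITMAP;

/* Round a line length in bytes up to the next multiple of LINE_ALIGN. */
static size_t padded_pitch(size_t bytes)
{
    return (bytes + LINE_ALIGN - 1) & ~(size_t)(LINE_ALIGN - 1);
}

static ALIGNED_BITMAP *create_aligned_bitmap(int w, int h, int bytes_per_pixel)
{
    assert(w > 0 && h > 0 && bytes_per_pixel > 0);

    ALIGNED_BITMAP *bmp = malloc(sizeof *bmp);
    if (!bmp)
        return NULL;

    bmp->w = w;
    bmp->h = h;
    bmp->pitch = padded_pitch((size_t)w * (size_t)bytes_per_pixel);
    /* pitch is a multiple of LINE_ALIGN, so the total size is too,
     * which is what aligned_alloc() expects. */
    bmp->data = aligned_alloc(LINE_ALIGN, bmp->pitch * (size_t)h);
    bmp->line = malloc((size_t)h * sizeof *bmp->line);
    if (!bmp->data || !bmp->line) {
        free(bmp->data);
        free(bmp->line);
        free(bmp);
        return NULL;
    }

    /* Aligned base plus a multiple of the padded pitch: every row is aligned. */
    for (int y = 0; y < h; y++)
        bmp->line[y] = bmp->data + (size_t)y * bmp->pitch;

    return bmp;
}

static void destroy_aligned_bitmap(ALIGNED_BITMAP *bmp)
{
    if (bmp) {
        free(bmp->data);
        free(bmp->line);
        free(bmp);
    }
}
```

For a 13 pixel wide 8-bit bitmap this costs 3 padding bytes per line, which is the "few extra bytes" trade-off discussed above; user code can then treat line[y] as safe for aligned 128-bit loads.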