Re: [AD] exunicod and endianess
> Attached is a patch to src/unicode.c that somewhat fixes the exunicod
> display problems.
Sort of... you have effectively chosen not to support UTF-16BE on big-endian
platforms but rather UTF-16LE. Note that this might be a sensible decision
according to http://zsigri.tripod.com/fontboard/cjk/unicode.html but we need
to clearly acknowledge it.
Moreover, your byte-wise code reads little-endian data whatever the byte order
of the host, so you don't need to guard it with
#ifdef ALLEGRO_BIG_ENDIAN/#endif.
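To make the distinction concrete, here is a small sketch of the three ways of reading a 16-bit code unit. The function names are made up for illustration and are not Allegro routines, and the sketch assumes unsigned short is 16 bits wide. Only the native read depends on the host; the two byte-wise reads give the same answer on any platform, which is why no endianness #ifdef is needed around them.

```c
#include <string.h>

/* Hypothetical helpers for illustration; not part of Allegro. */

/* UTF-16LE: low byte first. Result does not depend on the host. */
static int read_le16(const char *s)
{
   const unsigned char *u = (const unsigned char *)s;
   return (u[0] | (u[1] << 8));
}

/* UTF-16BE: high byte first. Result does not depend on the host. */
static int read_be16(const char *s)
{
   const unsigned char *u = (const unsigned char *)s;
   return ((u[0] << 8) | u[1]);
}

/* Native read, equivalent to the old pointer cast but without the
 * alignment problem. Result depends on the host byte order.
 * Assumes unsigned short is 16 bits wide.
 */
static int read_native16(const char *s)
{
   unsigned short v;
   memcpy(&v, s, sizeof(v));
   return v;
}
```

For the bytes 0x41 0x00, read_le16 gives 0x0041 and read_be16 gives 0x4100 everywhere, while read_native16 gives one or the other depending on the machine.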
> Most of the characters now display correctly, and are not "^" anymore;
> anyway the french text displays as "Bienvenue ^ Allegro", while the 5th
> message (el == greek?) and the 7th (he == ??) display as all "^" except
Greek and Hebrew.
> the "Allegro" text. The last two message strings (japanese and another
> language) show correctly except for one "^" in each of them.
>
> I'm puzzled on which part of Allegro the bug is located...
Well, it's not too far, just in your code:
@@ -251,7 +251,13 @@
  */
 static int unicode_getc(AL_CONST char *s)
 {
+#ifdef ALLEGRO_LITTLE_ENDIAN
    return *((unsigned short *)s);
+#elif defined ALLEGRO_BIG_ENDIAN
+   return (*s | (*(s + 1) << 8));
+#else
+#error Unknown endianess
+#endif
 }
You got caught by the integer promotion rule:
[#1] When a value with integer type is converted to another
integer type, if the value can be represented by the new
type, it is unchanged.
This means that (char)(-1) is promoted to (int)(-1), not (int)(255), assuming
char is a signed type. In other words, signed types are sign-extended by the
promotion, so bit 7 of the char is replicated in bits 8 to 31 of the int. Any
low byte >= 0x80 therefore sets all the upper bits of the result and wipes out
the high byte you OR in, which breaks your code.
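A minimal sketch of the effect, using made-up helper names (not Allegro routines). The first helper forces the bytes to be signed, so it behaves the same whether or not plain char is signed on your compiler; the second reads them as unsigned char. The high byte is assumed to be below 0x80 in this demo, so the shift itself stays well defined.

```c
/* Hypothetical helpers for illustration; not part of Allegro. */

/* Mirrors the buggy expression, with the bytes explicitly signed.
 * The high byte is assumed to be < 0x80 so the shift is well defined.
 */
static int demo_getc_signed(const char *s)
{
   const signed char *p = (const signed char *)s;
   return (p[0] | (p[1] << 8));   /* p[0] is sign-extended before the OR */
}

/* Same read through unsigned char: no sign extension. */
static int demo_getc_unsigned(const char *s)
{
   const unsigned char *u = (const unsigned char *)s;
   return (u[0] | (u[1] << 8));
}
```

With the bytes 0xE0 0x00 (U+00E0, the accented "a" of the French message), the signed version returns -32 instead of 0xE0. With 0xB1 0x03 (U+03B1, Greek alpha) it returns -79 instead of 0x3B1. Plain ASCII has bit 7 clear, so "Allegro" comes out right either way.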
The attached patch works for me on x86 and should probably work for you too.
If so, I would be puzzled as to why fixing only exunicod doesn't work... How
did you transform
char message_it[] = "B\x00" "e\x00" "n\x00" "v\x00" "e\x00" "n\x00" "u\x00"
"t\x00" "i\x00" " \x00" "a\x00" "d\x00" " \x00\x00\x00";
for example?
--
Eric Botcazou
--- /home/eric/cvs/allegro/src/unicode.c Thu May 15 23:08:57 2003
+++ allegro/src/unicode.c Sun May 25 23:15:41 2003
@@ -247,33 +247,35 @@
 /* unicode_getc:
- * Reads a character from a Unicode string.
+ * Reads a character from a UTF-16LE string.
  */
 static int unicode_getc(AL_CONST char *s)
 {
-   return *((unsigned short *)s);
+   unsigned char *u = (unsigned char *)s; /* avoid sign-extending chars */
+   return (u[0] | (u[1] << 8));
 }
 /* unicode_getx:
- * Reads a character from a Unicode string, advancing the pointer position.
+ * Reads a character from a UTF-16LE string, advancing the pointer position.
  */
 static int unicode_getx(char **s)
 {
-   int c = *((unsigned short *)(*s));
+   unsigned char *u = (unsigned char *)(*s); /* avoid sign-extending chars */
    (*s) += sizeof(unsigned short);
-   return c;
+   return (u[0] | (u[1] << 8));
 }
 /* unicode_setc:
- * Sets a character in a Unicode string.
+ * Sets a character in a UTF-16LE string.
  */
 static int unicode_setc(char *s, int c)
 {
-   *((unsigned short *)s) = c;
+   s[0] = c & 0xff;
+   s[1] = (c >> 8) & 0xff;
    return sizeof(unsigned short);
 }