Re: [AD] exunicod and endianess

> Attached is a patch to src/unicode.c that somewhat fixes the exunicod
> display problems.

Sort of... you have effectively chosen to support UTF-16LE rather than
UTF-16BE on big-endian platforms. This may well be a sensible decision
according to http://zsigri.tripod.com/fontboard/cjk/unicode.html but we need
to acknowledge it clearly.

Moreover, your byte-wise code always reads the string as little-endian,
whatever the host byte order, so you don't need to guard it with
#ifdef ALLEGRO_BIG_ENDIAN/#endif.

> Most of the characters now display correctly, and are not "^" anymore;
> anyway the french text displays as "Bienvenue ^ Allegro", while the 5th
> message (el == greek?) and the 7th (he == ??) display as all "^" except

Greek and Hebrew.

> the "Allegro" text. The last two message strings (japanese and another
> language) show correctly except for one "^" in each of them.
>
> I'm puzzled on which part of Allegro the bug is located...

Well, it's not too far, just in your code:

@@ -251,7 +251,13 @@
  */
 static int unicode_getc(AL_CONST char *s)
 {
+#ifdef ALLEGRO_LITTLE_ENDIAN
    return *((unsigned short *)s);
+#elif defined ALLEGRO_BIG_ENDIAN
+   return (*s | (*(s + 1) << 8));
+#else
+#error Unknown endianess
+#endif
 }
 
You got caught by the integer promotion rule:

      [#1] When a value with integer type is converted to  another
       integer  type,  if  the  value can be represented by the new
       type, it is unchanged.

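This means that (char)(-1) is promoted to (int)(-1), not (int)(255), assuming
char is a signed type. In other words, signed types are sign-extended by the
promotion, so bit 7 of the char is replicated into bits 8-31 of the int
(assuming a 32-bit int). In your code, any low byte of 0x80 or above therefore
sets all the upper bits of *s, and ORing in the shifted high byte cannot clear
them, which breaks your code.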


The attached patch works for me on x86 and should probably work for you too. 
If so, I would be puzzled as to why fixing only exunicod doesn't work... How 
did you transform

char message_it[] = "B\x00" "e\x00" "n\x00" "v\x00" "e\x00" "n\x00" "u\x00" 
"t\x00" "i\x00" " \x00" "a\x00" "d\x00" " \x00\x00\x00";

for example?

-- 
Eric Botcazou
--- /home/eric/cvs/allegro/src/unicode.c	Thu May 15 23:08:57 2003
+++ allegro/src/unicode.c	Sun May 25 23:15:41 2003
@@ -247,33 +247,35 @@
 
 
 /* unicode_getc:
- *  Reads a character from a Unicode string.
+ *  Reads a character from a UTF-16LE string.
  */
 static int unicode_getc(AL_CONST char *s)
 {
-   return *((unsigned short *)s);
+   unsigned char *u = (unsigned char *)s;  /* avoid sign-extending chars */
+   return (u[0] | (u[1] << 8));
 }
 
 
 
 /* unicode_getx:
- *  Reads a character from a Unicode string, advancing the pointer position.
+ *  Reads a character from a UTF-16LE string, advancing the pointer position.
  */
 static int unicode_getx(char **s)
 {
-   int c = *((unsigned short *)(*s));
+   unsigned char *u = (unsigned char *)(*s);  /* avoid sign-extending chars */
    (*s) += sizeof(unsigned short);
-   return c;
+   return (u[0] | (u[1] << 8));
 }
 
 
 
 /* unicode_setc:
- *  Sets a character in a Unicode string.
+ *  Sets a character in a UTF-16LE string.
  */
 static int unicode_setc(char *s, int c)
 {
-   *((unsigned short *)s) = c;
+   s[0] = c & 0xff;
+   s[1] = (c >> 8) & 0xff;
    return sizeof(unsigned short);
 }
 

