[EGD-discu] counting diacritics in Bépo methodology



I apologize: I don't speak or write French.

I have been investigating ways to compute optimal keyboard arrangements for African orthographies that use tone marks. In these orthographies, diacritics are much more frequent than in French or Italian. Bépo seems to be the only keyboard layout optimization project I have found that deals with diacritics. I have been reading the Bépo wiki via Google Translate and have a question about how the diacritics were counted and added to the frequency counts. In particular, if one were to count bigrams, would < ué > count the same as < ue >?
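To make the bigram question concrete, here is a minimal Python sketch of the two possibilities. It is only an illustration under my own assumptions, not a description of how Bépo actually counted: the `bigrams` helper and the sample word are hypothetical, and I am using Unicode normalization (NFC for precomposed letters, NFD for base letter plus combining mark) as a stand-in for the two counting schemes.

```python
import unicodedata
from collections import Counter

def bigrams(text, form):
    """Count adjacent character pairs after normalizing to NFC or NFD."""
    s = unicodedata.normalize(form, text)
    return Counter(zip(s, s[1:]))

word = "tu\u00e9"  # "tué", used here only as a sample word

# NFC: é stays a single character, so the pair is (u, é) and (u, e) never occurs.
nfc = bigrams(word, "NFC")

# NFD: é becomes e + U+0301 (combining acute), so the pair (u, e) is counted,
# followed by a separate pair (e, U+0301).
nfd = bigrams(word, "NFD")

print(nfc)
print(nfd)
```

Note that NFD puts the mark after the base letter, whereas on a dead-key layout the diacritic is usually typed before it, so the pairs would need reordering to reflect actual keystrokes.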

I see some frequency counts here: http://bepo.fr/wiki/Fr%C3%A9quence_des_lettres

But it appears that characters with diacritics are counted independently of each other and of their base characters. Is this true?


For instance, < á é í > would all be different from < a e i >. If we had the following text: < aaááàà  eeéé îîii >, would you count that as 6x a, 4x e, 4x ´, 2x `, 2x ˆ, and 4x i? Or would you count that as 2x a, 2x e, 2x á, 2x é, 2x à, 2x î, and 2x i?

The critical difference is that the total for a diacritic summed across all vowels might be higher than the count for any single diacritic-vowel pair. This means that a diacritic key could get a higher ranking than a low-frequency consonant.
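The two counting schemes for the sample text above can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not the Bépo method: the function names are hypothetical, and Unicode NFC/NFD normalization stands in for "count the precomposed letter" versus "count base letter and diacritic separately".

```python
import unicodedata
from collections import Counter

text = "aaááàà  eeéé îîii"

def count_precomposed(s):
    # NFC keeps each accented letter as one character: á, é, à, î count separately.
    return Counter(c for c in unicodedata.normalize("NFC", s) if not c.isspace())

def count_decomposed(s):
    # NFD splits each accented letter into base letter + combining mark,
    # so the marks are summed across all the vowels they appear on.
    return Counter(c for c in unicodedata.normalize("NFD", s) if not c.isspace())

precomposed = count_precomposed(text)
decomposed = count_decomposed(text)

for label, counts in (("precomposed", precomposed), ("decomposed", decomposed)):
    print(label)
    for char, n in counts.most_common():
        # Show the Unicode name so combining marks are readable in the output.
        print(f"  {n}x {unicodedata.name(char)}")
```

In the decomposed count, the combining acute (U+0301) totals 4, which ties it with e and i and puts it above every individual accented letter in the precomposed count.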

Thank you in advance,

- Hugh Paterson III

