Re: [EGD-discu] counting diacritics in Bépo methodology
On Fri, 24 Apr 2015 12:09:43 -0700
Hugh Paterson <hugh_paterson@xxxxxxx> wrote:

> I apologize I don't speak/write French.

I apologize too, my English may not be as good as yours.

> I have been investigating ways to compute optimal keyboard
> arrangements for orthographies in Africa which use tone marks. In
> these orthographies the frequencies of diacritics are much higher
> than in French or Italian. Bépo seems to be the only keyboard layout
> optimization project I have found with diacritics. I have been
> reading the Bépo wiki via google translate and have a question about
> how the diacritics were counted and added to the frequency count.
> That is, particularly, if one were to count bigrams like < ué > would
> that count the same as < ue >?
>
> I see some frequency counts here:
> http://bepo.fr/wiki/Fr%C3%A9quence_des_lettres
>
> But it appears that characters with diacritics are counted
> independent of each other and of their base character. Is this true?

In French, some accented letters are quite frequent while others are
less common. The frequent ones, such as é, à and è, are available
directly like any other letter, so yes, they are indeed counted
independently of each other and of their base character.

> For instance: < á é í > would all be different from < a e i >. If we
> had the following text: < aaááàà eeéé îîii > do you count that as 6x
> a, 4x e, 4x ´,2x `, 2x ˆ, and 4x i? or do you count that as 2x a, 2x
> e, 2x á, 2x é, 2x à, 2x î, and 2x i?

On the other hand, the less frequent ones are counted as diacritic +
base character. So < aaááàà eeéé îîii > is counted as:

  2x a, 2x à, 2x e, 2x é, 2x i, plus 2x (´ + a) and 2x (^ + i)

That is:

  4x a, 2x à, 2x e, 2x é, 4x i, 2x ´, 2x ^

> The critical difference is that the sum of the diacritics across the
> vowels might be higher than any single diacritic-vowel pair. This
> means that a diacritic key could get a higher ranking than a low
> frequency consonant.

That's true, and that is what happens in Bépo with the ^ key.

The fact is, there are only 48 keys available on a standard PC102
keyboard, so we took the 48 most frequent letters and made them
directly available. That includes é è à ù.

We already knew at the time that a ^ key would be added to make
â ê î ô û available on our layout, so we computed the frequency of
that diacritic key by summing the frequencies of those 5 letters. It
turned out that the ^ key ranked 27th, ahead of x, y, z, w, ç and k.

>
> thank you in advance,

Regards.

-- 
Nicolas
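
The counting rule described in this message can be sketched in Python
as follows. This is only an illustration under stated assumptions, not
the script the Bépo project actually used: the function names, the
DIRECT_LETTERS set (taken from the four letters named above) and the
mapping from combining marks to dead-key symbols are all hypothetical.

```python
import unicodedata
from collections import Counter

# Assumption: accented letters that get their own key. The message
# names é è à ù; the real Bépo set may differ.
DIRECT_LETTERS = set(unicodedata.normalize("NFC", "éèàù"))

# Assumption: combining marks mapped to the dead-key symbols used in
# the message (acute, grave, circumflex).
DEAD_KEY_SYMBOL = {"\u0301": "\u00b4", "\u0300": "`", "\u0302": "^"}


def count_keystrokes(text, direct=DIRECT_LETTERS):
    """Count letters; non-direct accented letters count as mark + base."""
    counts = Counter()
    for ch in unicodedata.normalize("NFC", text):
        if ch.isspace():
            continue
        if ch in direct:
            # Frequent accented letter: counted on its own.
            counts[ch] += 1
            continue
        # Everything else is split into base letter and combining marks.
        # Marks missing from DEAD_KEY_SYMBOL are counted under their own
        # combining code point.
        for part in unicodedata.normalize("NFD", ch):
            counts[DEAD_KEY_SYMBOL.get(part, part)] += 1
    return counts


def dead_key_frequency(letter_counts, letters="âêîôû"):
    """Sum the counts of the precomposed letters typed via one dead key."""
    composed = unicodedata.normalize("NFC", letters)
    return sum(letter_counts.get(ch, 0) for ch in composed)


if __name__ == "__main__":
    print(count_keystrokes("aaááàà eeéé îîii"))
```

On the sample text this gives the same tallies as in the message:
4x a, 2x à, 2x e, 2x é, 4x i, 2x ´ and 2x ^. The second function takes
plain per-letter counts (for example Counter(corpus)) and returns the
summed figure one would use to rank a dead key against ordinary
letters. Note that the sketch is case-sensitive, so an uppercase É
would be split into ´ + E unless it is added to the direct set.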