
Summary: ASTERISK-01414: [design] General i18n patch for say.c
Reporter: Olle Johansson (oej)
Labels:
Date Opened: 2004-04-15 14:03:02
Date Closed: 2004-09-25 02:49:38
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/Internationalization
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments:
( 0) i18n.testsuite.conf
( 1) itsounds.tar.gz
( 2) READMEs.tar.gz
( 3) say_da.diffdiff
( 4) say_da.txt
( 5) say_intl.txt
( 6) say.es.txt
( 7) say.patch
Description: There are a lot of patches for saying numbers around here. We need to unify them and settle on a general solution. This bug is for the unification of several other bug reports.

The way we're handling this is
* First a quick fix to say.c to say numbers with various
 language syntaxes supported
* Add sample sound files and documentation
* Start working on a general language-support architecture
 with loadable modules and support for more
 functions, like date, time and various strings

------------------- * -----------------------
If you want to follow the progress, press "Monitor bug" below.
------------------- * -----------------------

****** ADDITIONAL INFORMATION ******

Bug 0001097 Other languages
Bug 0001372 French - disclaimed
Bug 0001349 Danish - disclaimed
Bug 0000300 Portuguese - disclaimed
Bug 0000237 Spanish - disclaimed
Italian - disclaimed

Swedish and Norwegian work like English. Are there any other languages that do not work like English?

------
Spanish requires these additional sound files:
* 21.gsm thru 29.gsm, cien.gsm, mil.gsm, millon.gsm
* millones.gsm, 100.gsm, 200.gsm, 300.gsm, 400.gsm,
* 500.gsm, 600.gsm, 700.gsm, 800.gsm, 900.gsm, y.gsm
------
Portuguese:
* All files that end with "F" are feminine
------
Danish:
In addition to English, the following sounds are required: "millions", "and" and "1-and" through "9-and"



Comments:

By: Olle Johansson (oej) 2004-04-15 14:17:54

There are several patches for say.c out there. We need to make one unified patch that solves immediate problems and sets the stage for future additions to the logic.

I think there are (at least) two things to be fixed:
* Make languages that follow the English logic work without patching say.c
* Set up a general structure for other constructs, finding out which languages say numbers as words in the same way.

Let's say we have a syntax table
1. English-style
2. Danish-style
3. French-style
4. Spanish-style
etc

And then connect languages to it
1. EN, SE, NO
2. DK
3. FR
4. ES

And then create functions in say.c for every style. Maybe a variable in indications.conf for choosing style for a language setting.
...or is hard coding better?
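
A minimal sketch of the table idea above, in C. The enum, the struct and say_style_for() are hypothetical names made up for illustration; nothing like them exists in say.c. The language codes are the ones used in this comment, and falling back to English for unknown languages is an assumption.

```c
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical syntax styles, one per number grammar. */
enum say_style {
    STYLE_ENGLISH,
    STYLE_DANISH,
    STYLE_FRENCH,
    STYLE_SPANISH
};

/* Hypothetical mapping from a language code to its style. */
struct lang_style {
    const char *lang;
    enum say_style style;
};

static const struct lang_style lang_styles[] = {
    { "en", STYLE_ENGLISH },
    { "se", STYLE_ENGLISH },
    { "no", STYLE_ENGLISH },
    { "dk", STYLE_DANISH  },
    { "fr", STYLE_FRENCH  },
    { "es", STYLE_SPANISH },
};

/* Look up the style for a language code (case-insensitive).
   Unknown or NULL codes fall back to the English style (assumption). */
enum say_style say_style_for(const char *lang)
{
    size_t i;

    if (lang == NULL)
        return STYLE_ENGLISH;
    for (i = 0; i < sizeof(lang_styles) / sizeof(lang_styles[0]); i++) {
        if (strcasecmp(lang, lang_styles[i].lang) == 0)
            return lang_styles[i].style;
    }
    return STYLE_ENGLISH;
}
```

With a table like this, adding a language that follows an existing style is a one-line change, while a new style still needs its own function.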

Any other thoughts?

By: Matteo Brancaleoni (mbrancaleoni) 2004-04-15 14:39:47

I have say.c working for Italian (and English, of course). Sorting out
a patch for general use (i.e. a function) is very easy.
/me thinks hardcoding is better, since it is simpler than defining a style...
every language has its own way to say numbers.
English is very simple:
1-20, then compound (more or less), e.g. 48 is forty + eight.
Also, there are no plural/singular problems.

Italian is a bit worse...
1-20 is like English; the others are composite, with the exception of numbers
ending in '1' (21, 31, 41...) and '8' (28, 38...),
and we also have plurals.
100 in English is one-hundred, 200 is two-hundred,
but in Italian
100 is only hundred, 200 is two-hundred,
and we have the same difference with thousands and so on...

Perhaps we could split each language into a <LANG>.c file
that gets included in the make process... so new additions can be
made without affecting say.c.

Voicemail must be corrected the same way
(I have already patched it for Italian...)

By: fossil (fossil) 2004-04-15 15:04:18

Some interface should probably be designed to allow for .so language modules. This will make further development easier, and we won't have to patch say.c all the time. A module loads, registers a language handler (like everything else does) and gets called later to pronounce numbers, dates, etc. You will find that, other than with straightforward simple digits, languages vary greatly in how numbers and dates are pronounced grammatically.
Also, for languages like Russian (and German?) a 'mood' argument needs to be added to the ast_say_XXX funcs (except ast_say_digits). And for many, including Russian, a 'gender' argument is necessary for ast_say_number(), which can simply be ignored by the English parser.
There is probably no one *perfect* solution, but we can accommodate a lot of languages with external language modules.

Any other necessary arguments for other languages?
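
A rough sketch of what such a registration interface might look like. All names here (lang_register, lang_say_number, the handler signature, the fixed-size table) are hypothetical and only illustrate the "module loads, registers a handler, gets called later" flow; this is not Asterisk's module API. The stub handler stands in for a real English implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handler signature: gender and mood are the extra
   arguments suggested above; an English handler can ignore both. */
typedef int (*say_number_fn)(int num, char gender, char mood);

struct lang_handler {
    const char *lang;          /* e.g. "en", "ru" */
    say_number_fn say_number;
};

#define MAX_LANG_HANDLERS 16

static struct lang_handler handlers[MAX_LANG_HANDLERS];
static size_t handler_count;

/* Called by a language module when it loads.
   Returns 0 on success, -1 on bad arguments or a full table. */
int lang_register(const char *lang, say_number_fn fn)
{
    if (lang == NULL || fn == NULL || handler_count >= MAX_LANG_HANDLERS)
        return -1;
    handlers[handler_count].lang = lang;
    handlers[handler_count].say_number = fn;
    handler_count++;
    return 0;
}

/* Dispatch to the handler registered for lang.
   Returns the handler's result, or -1 if none is registered. */
int lang_say_number(const char *lang, int num, char gender, char mood)
{
    size_t i;

    if (lang == NULL)
        return -1;
    for (i = 0; i < handler_count; i++) {
        if (strcmp(handlers[i].lang, lang) == 0)
            return handlers[i].say_number(num, gender, mood);
    }
    return -1;
}

/* Stub English handler: ignores gender and mood, plays nothing. */
int say_number_en_stub(int num, char gender, char mood)
{
    (void)num;
    (void)gender;
    (void)mood;
    return 0;
}
```

A caller that gets -1 back could fall back to a default language; that policy is left open here.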

By: ktsaou (ktsaou) 2004-04-15 18:59:48

I have made a suggestion in bug 1347 that I guess will cover all languages:

http://bugs.digium.com/bug_view_page.php?bug_id=0001347

By: Olle Johansson (oej) 2004-04-16 08:33:09

We need to separate *LANGUAGE* from *COUNTRY*

- Zonedata and indications are *COUNTRY*
- say.c patches are *LANGUAGE*

Maybe we should use a three letter language code or the one used in other protocols, like SE_SV for language.

By: fossil (fossil) 2004-04-16 14:08:45

I do not think using some special language codes is necessary, 2-letter ISO is fine. Everyone should simply separate language from country in their minds.

Coming back to different language requirements: I propose that in addition to the ones we already have we create a new 'say' function to pronounce the standard phrases used in *, like the ones used in voicemail. This function will be implemented by all corresponding language .so modules. It may be a bit tedious to implement for many languages, but it will pretty much guarantee compatibility with all languages:
  say_phrase(int phrase_id, int lang, ...);   (with var-args like printf())
It will accept the phrase id from a pre-defined enumeration and whatever insertion arguments are necessary for a phrase. The function can playback the phrase however it wants to with whatever grammatical structure. I do not see any other good way of handling nearly all languages, unfortunately.
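
To make the proposal concrete, here is a small sketch of a var-args say_phrase() for a single phrase. The enum values, the play() stand-in (which only records file names in a buffer instead of playing audio) and the English-only handling are assumptions for illustration; a real version would dispatch to the language module.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

enum phrase_id { PHRASE_YOU_HAVE_N_MESSAGES };  /* hypothetical phrase ids */
enum lang_id { LANG_EN };                       /* hypothetical language ids */

/* Stand-in for real playback: records the file names that would be played. */
static char play_log[256];

static void play(const char *file)
{
    size_t used = strlen(play_log);

    snprintf(play_log + used, sizeof(play_log) - used, "%s%s",
             used ? " " : "", file);
}

/* Say a predefined phrase; the insertion arguments depend on phrase_id.
   Returns 0 on success, -1 for an unknown phrase or language. */
int say_phrase(int phrase_id, int lang, ...)
{
    va_list ap;
    char num[16];
    int res = 0;

    va_start(ap, lang);
    switch (phrase_id) {
    case PHRASE_YOU_HAVE_N_MESSAGES: {
        int count = va_arg(ap, int);   /* one int argument: message count */

        if (lang == LANG_EN) {
            play("youhave");
            snprintf(num, sizeof(num), "%d", count);
            play(num);
            play(count == 1 ? "message" : "messages");
        } else {
            res = -1;
        }
        break;
    }
    default:
        res = -1;
        break;
    }
    va_end(ap);
    return res;
}
```

Because each language implements the whole phrase, word order and agreement stay inside the language module, which is the point of the proposal.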

By: Olle Johansson (oej) 2004-04-16 15:18:12

I think we should take this in steps. First fix and implement saying numbers for all languages. When that's accomplished, we have a general structure to build upon.

After that, enumeration and phrases. That involves a lot more applications and coders. We need to prove this can be done before we take that struggle.

I like having a modular structure, possible with loadable modules.

I'm also fond of using standards when they exist. Most open source apps use language codes like "en_us" and "se_sv", so let's follow that. I'll check where they come from.

By: Olle Johansson (oej) 2004-04-17 03:17:45

ktsaou's input from ASTERISK-1422:
---------------------------------------------------------------
Different languages may have different numeric sounds for different contexts. For example, the Greek language speaks numbers differently if they are applied to male, female or other subjects. Here is how:

ONE man = ENAS andras
ONE woman = MIA gineka
ONE phone = ENA tilefono

Similarly, the Nth numbers are spoken based on the subject:

FIRST man = PROTOS andras
FIRST woman = PROTI gineka
FIRST phone = PROTO tilefono

Dates are even more complex:

March 7th = Martiou EBDOMH
7th of March = EFTA Martiou

(EBDOMH = 7th female, EFTA = 7 neither male nor female)

But also the month names have variations:

March = MARTIOS
but
1st of march = proti MARTIOU

Today, the existing mechanism cannot support the Greek language. The main problem is the digits directory which is used for many purposes and cannot support different versions of the same numbers depending on the context we want to use them.

Ideally to handle such internationalization needs and also provide the maximum flexibility to support other languages too, I suggest this:

1. Language Definition
There must be a global definition of languages, possibly in indications.conf

This definition should include something like this:

[gr]
date=format,directory
time=format,directory
currency=format,directory
...

Each of these lines should first define the format to be used, similarly to what SayUnixTime accepts today, but then it should accept a number of pre-defined directories (or just one) where all the sound files can be found.

2. All applications willing to read dates, time, currency, etc. should use the indications defined above.

3. All applications (such as voicemail) that use sound files for digits, should have configuration options, per language and context to define the directories to use.

Example for voicemail.conf:

sounddir_message=language,directory
sounddir_messages=language,directory

This will allow it to say (each word is a separate sound file):

one old message = ENA PALIO MINIMA
two old messages = DYO *PALIA* MINIMATA
1st message = PROTO MINIMA

The problems we solve here are:
- When building sentences, the application will use the directory defined for the context (messages in the example) and will try to get all its sound files from there. This is how we are going to allow PALIO/PALIA based on the context (one-message dir, or many-messages dir)

- Since the Nth sounds also depend on the context, the same mechanism should allow the application to use different sound files based on the subject:

FIRST message = PROTO minima -> message dir
FIRST of march = PROTI martiou -> date dir

The defaults for the English language may not require additional voice files from Alice.

By: Olle Johansson (oej) 2004-04-18 05:45:25

I decided to make this a two-step approach. I'm working on a unified patch for the languages we have patches for, starting with French and Danish. Please test and confirm whether this patch works for those languages.

I'll add spanish and portuguese later for testing. If we're lucky, this short-term solution will make it into 1.0 and then we'll start with a .so architecture that will be a better long-term solution.

By: c960657 (c960657) 2004-04-18 09:17:30

Of course a .so file or similar with C code for each language is the most flexible solution.

However, it would be nice if translators didn't need to write C to implement a language.

Also, it would be nice if all applications could use the same i18n API, e.g. the voicemail application mentioned above.

I have made an example of a config file written in a language based on regular expressions (this is just a suggestion for a file format - I have not implemented it in code).

Each line in the config file consists of a regular expression and a list of commands this maps to. The commands can be either Play("filename") for playing a specific file, or Say("something") for calling say_phrase recursively.

Example:
([0-9])([0-9]{2}) => Say($1) Play("hundred") Say($2)
The line reads the numbers 100-999 in English. The first pair of parentheses match the first digit, and the last pair match the last two digits. The value of the matches is stored in the variables $1 and $2. If say_phrase is called with the string "732", $1 is 7 and $2 is 32. This results in Say(7) Play("hundred") Say(32). In order to say 7 and 32 we need some more rules:

2 => Play("two")
7 => Play("seven")
30 => Play("thirty")
([2-9])([1-9]) => Play($1 * 10) Play($2)

Say(7) just translates to Play("seven") i.e. play the sound file seven.gsm. Say(32) matches the last line, so that $1 is 3 and $2 is 2. * is arithmetic multiplication, so this translates to Play(30) Play(2).

The complete rules for saying the numbers 0-999,999,999 in English are these:

0 => Play("zero")
1 => Play("one")
2 => Play("two")
...
19 => Play("nineteen")
20 => Play("twenty")
30 => Play("thirty")
90 => Play("ninety")
([2-9])([1-9]) => Play($1 * 10) Play($2)
([0-9])([0-9]{2}) => Say($1) Play("hundred") Say($2)
([0-9]{1,3})([0-9]{3}) => Say($1) Play("thousand") Say($2)
([0-9]{1,3})([0-9]{6}) => Say($1) Play("million") Say($2)
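
For comparison, the same English rules can be written directly as a small recursive C function. This is only a sketch of the rule list above, not the logic in say.c: the function names are made up, sound names are emitted as plain tokens ("7", "hundred", "30") into a buffer instead of being played, and, unlike the rules as written, a zero remainder is skipped so that 700 does not end in a stray "0".

```c
#include <stdio.h>
#include <string.h>

/* Append one sound name to a space-separated output buffer. */
static void emit(char *out, size_t outlen, const char *name)
{
    size_t used = strlen(out);

    snprintf(out + used, outlen - used, "%s%s", used ? " " : "", name);
}

/* Append a number token such as "30" or "7". */
static void emit_num(char *out, size_t outlen, int n)
{
    char buf[16];

    snprintf(buf, sizeof(buf), "%d", n);
    emit(out, outlen, buf);
}

/* Expand 0..999,999,999 into English sound names.
   out must hold an empty string on the first call. */
void say_en(int n, char *out, size_t outlen)
{
    if (n < 0 || n > 999999999)
        return;
    if (n >= 1000000) {
        say_en(n / 1000000, out, outlen);
        emit(out, outlen, "million");
        if (n % 1000000)
            say_en(n % 1000000, out, outlen);
    } else if (n >= 1000) {
        say_en(n / 1000, out, outlen);
        emit(out, outlen, "thousand");
        if (n % 1000)
            say_en(n % 1000, out, outlen);
    } else if (n >= 100) {
        say_en(n / 100, out, outlen);
        emit(out, outlen, "hundred");
        if (n % 100)
            say_en(n % 100, out, outlen);
    } else if (n <= 20 || n % 10 == 0) {
        emit_num(out, outlen, n);          /* 0-20 and whole tens: one file */
    } else {
        emit_num(out, outlen, n - n % 10); /* e.g. 30 */
        emit_num(out, outlen, n % 10);     /* e.g. 2 */
    }
}
```

The trade-off is the one raised above: the C version is easy to follow for English, but every new language means new C code, whereas the rule file could be written by a translator.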

In Greek the numbers depend on the gender (according to the comment above - I don't know any Greek). We specify the gender in the string like this (the ",gender=xxx" notation can be anything - the strings on the left are just interpreted as regular expressions):
1,gender=male => Play("enas")
1,gender=female => Play("mia")
1,gender=neutrum => Play("ena")

If a gender is not specified, we make a rule that chooses a "default" gender:
1 => Say(1,gender=neutrum)


Now, if the voicemail application wants to say e.g. "you have 2 old messages" in the local language, it calls say_phrase with the string "you have 2 old message". Note that
a) the string is in English, because that is the language used in Asterisk code, and
b) the word "message" is in singular form, even though it should be "messages". All words used in strings passed to say_phrase should be in their "base" form, i.e. singular. The translation into either "message" or "messages" is done by the language-dependent configuration file.

For English, the following rules are necessary:
you have ([0-9]+) (new|old) message => Play("youhave") Say($1) Play($2) Say("message,count=" + $1)
message,count=1 => Play("message")
message,count=[0-9]+ => Play("messages")

The "count" part differs between singular and plural forms. Again, ",count=xx" is just a random string that is defined by convention in the configuration file for a specific language.

If a string matches several rules, only the first is used. Otherwise we'd have to change the last line so that it would not match "message,count=1".
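
The first-match behaviour could be sketched with POSIX regular expressions as below. The struct, the rule table and first_match() are hypothetical; the patterns are anchored with ^ and $ here, which the proposal does not spell out, and the rules map straight to a sound name instead of to a Play/Say command list.

```c
#include <regex.h>
#include <stddef.h>

/* One rule: a POSIX extended regex and the sound it maps to. */
struct rule {
    const char *pattern;
    const char *sound;
};

/* Order matters: the singular rule must come before the general one. */
static const struct rule message_rules[] = {
    { "^message,count=1$",      "message"  },
    { "^message,count=[0-9]+$", "messages" },
};

/* Return the sound of the first rule that matches input,
   or NULL if no rule matches. Malformed patterns are skipped. */
const char *first_match(const struct rule *rules, size_t nrules,
                        const char *input)
{
    size_t i;

    for (i = 0; i < nrules; i++) {
        regex_t re;
        int matched;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        matched = (regexec(&re, input, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return rules[i].sound;
    }
    return NULL;
}
```

Compiling each pattern on every lookup keeps the sketch short; a real implementation would presumably compile the rules once when the configuration file is loaded.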


In Greek, according to the comment above, not only the word for "message" but also the word for "old" depends on the number of messages. So the rules get slightly more complicated:

you have ([0-9]+) (new|old) message => Play("youhave") Say($1 + ",gender=neutrum") Play($2 + ",count=" + $1 + ",gender=neutrum") Say("message,count=" + $1)
message,count=1 => Play("minima")
message,count=[0-9]+ => Play("minimata")
old,count=1,gender=neutrum => Play("palio")
old,count=[0-9]+,gender=neutrum => Play("palia")


Does this sound like a feasible solution? Are there situations/languages that it cannot handle? I assume it can handle most latin languages, but how about e.g. Eastern European and Asian languages?

By: flavour (flavour) 2004-04-18 09:41:00

This looks like a good start Olle - I am keen that we have this basic hackery into 1.0 if at all possible, whilst we await the more elegant solution.
I'm not completely sure on how to test this patch - I have applied it to 0.9.0 & no ill effects so far, at least.
SayNumber doesn't seem affected by a SetLanguage command, though & SayDigits doesn't require this level of sophistication.
Looking at the code the French patch doesn't seem correct.

First function has English comments remaining:
+/* ast_say_number_full_fr: French syntax */
+       /* Use english numbers if a given language is supported. */
+       /* As a special case, Norwegian has the same numerical grammar
+          as English */

Second function appears to force use of English!:
+int ast_say_number_fr(struct ast_channel *chan, int num, char *ints, char *language)
+               /* Use english numbers */
+               language = "en";

By: philipp2 (philipp2) 2004-04-19 13:04:18

Quick note: 2-letter ISO will not be sufficient. It is necessary to distinguish certain dialects like

- German: Germany (DE_DE)
- German: Swiss German (CH_DE)

By the way, having looked through the DK patch I *think* that it indeed applies also to German. Also it appears that www.tric.nl (see Wiki) has a patch + sound files for NL, haven't yet seen that listed here in the bugtracker.

By: Olle Johansson (oej) 2004-04-19 15:09:19

flavour: Thank you for the feedback. Hopefully I fixed the errors you pointed out without breaking anything.
We need to
* Test more
* Add spanish and portuguese
* Confirm philipp2's suspicion on german based on danish
* Test more
* Test more

I don't know about changing to de_de for the first quick fix. Setlanguage needs to be changed and that may break backwards compatibility. Could someone check into setlanguage and language support in other parts of Asterisk?

Please don't forget to test more. /o

By: Olle Johansson (oej) 2004-04-19 15:23:45

Looking at the portuguese patch in ASTERISK-297 it's a lot of changes. One that seems important is the requirement to signal whether or not a number is feminine. Does this apply to other languages?

The code uses a negative number for feminine and treats all positive numbers as masculine. Is this a good way or do we in the future need to add support for saying negative numbers? Should we add options in other ways?

Please advise.

By: flavour (flavour) 2004-04-19 16:11:55

We will definitely need to support negative numbers - e.g. for all the weather freaks out there ;)
This version patches & compiles cleanly on 0.9.0 again :)
NB How about doing the merge with the std & 'full' versions of the functions at this stage? Would make this maintenance much easier, no?
- I'm not sure how much work this would entail elsewhere, though :/
French patch now correctly affects SayNumber :)
1 tweak that I've found so far is that when Saying 1000 it gives 'un' 'mille', whereas it should be just 'mille' for the first thousand (2000 should be 'deux' 'mille'). This is the same as the Italian that Matteo mentioned before
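
A small sketch of that tweak, assuming a helper that only handles the thousands group of 1000..9999 and hands the remainder back to the caller. The function name is made up, the "digits/..." file names follow the playback logs elsewhere in this report, and output goes into a buffer instead of being played.

```c
#include <stdio.h>
#include <string.h>

/* Append one sound name to a space-separated output buffer. */
static void emit(char *out, size_t outlen, const char *name)
{
    size_t used = strlen(out);

    snprintf(out + used, outlen - used, "%s%s", used ? " " : "", name);
}

/* French-style thousands group for 1000..9999: the multiplier is left
   out for the first thousand ("mille", not "un mille").
   Returns the remainder below 1000 that the caller still has to say;
   values outside the range are returned untouched. */
int say_thousands_group_fr(int n, char *out, size_t outlen)
{
    char buf[32];
    int thousands = n / 1000;

    if (n < 1000 || n > 9999)
        return n;
    if (thousands > 1) {
        snprintf(buf, sizeof(buf), "digits/%d", thousands);
        emit(out, outlen, buf);
    }
    emit(out, outlen, "digits/thousand");
    return n % 1000;
}
```

The same shape would cover the Italian case for 100 mentioned earlier, with the hundreds group in place of the thousands group.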

By: Olle Johansson (oej) 2004-04-19 16:16:44

Ok, so the negative portuguese solution is not going in.

The "un" "mille" seems like a simple fix.

By: Paul Cadach (pcadach) 2004-04-19 16:33:17

Somewhere on Mantis I had a discussion about using phone numbers in international notation, with a leading "+" sign marking international phone numbers (to distinguish between national and international numbers), but this "plus" sign could play badly with SayNumber()... Could that be checked too? I.e. allow saying the word "plus" before saying the actual number's value...

By: Olle Johansson (oej) 2004-04-19 16:38:04

Italian coming right up, in just a few days. Any more language syntaxes on the way?

By: Olle Johansson (oej) 2004-04-19 16:57:53

Good point, plus sign sounds good to me. Do we have "plus" in english as a sound file already, maybe?

By: flavour (flavour) 2004-04-19 17:00:28

English sound file is there already, so this can be handled the same way as decimal point: Playback("plus"), SayNumber("44xxxxx")

By: Olle Johansson (oej) 2004-04-19 17:02:25

We discussed arguments on the #dev channel and concluded that we need an optional
gender argument to saynumber
'f' - feminin
'm' - masculin
'n' - neutrale
'r' - reale
'v' - Commodore vic 20 mainframe

Some languages will use this, but not all.

This needs to be fixed in pbx.c - any takers that want to produce a patch?

By: Olle Johansson (oej) 2004-04-19 17:04:47

I thought about adding the "plus" mainly for saydigits - it makes sense there, not really for saynumber. It's late over here, and I didn't notice pcadach wanting this in saynumber, sorry.

By: cyb (cyb) 2004-04-19 23:10:25

None of the French logic seems to be used with the current patch [say_c.txt (20,199 bytes) 04-19-04 15:06] - that's why it says "un mille" rather than just "mille", since it is using the English logic "one thousand".

The problem seems to be with the strcasecmp(): "The function strcasecmp() returns a positive integer if, disregarding case, string s1 is lexically greater than string s2; zero if, other than case the two strings are identical; and a negative integer if, disregarding case, string s1 is lexically less than string s2."

So we need to test for strcasecmp returning zero instead of non-zero. Something like this should do the trick (needs to be changed in both ast_say_number and ast_say_number_full):

int ast_say_number_full(struct ast_channel *chan, int num, char *ints, char *language, int audiofd, int ctrlfd)
{
       if (!strcasecmp(language, "no") || !strcasecmp(language,"se") || !strcasecmp(language,"en") ) {
          return(ast_say_number_full_en(chan, num, ints, language, audiofd, ctrlfd));
       }
       /* French */
       if (!strcasecmp(language, "fr")) {
          return(ast_say_number_full_fr(chan, num, ints, language, audiofd, ctrlfd));
       }
       if (!strcasecmp(language, "da")) {
          return(ast_say_number_full_da(chan, num, ints, language, audiofd, ctrlfd));
       }

       /* Default to english */
       return(ast_say_number_full_en(chan, num, ints, language, audiofd, ctrlfd));
}

By: Olle Johansson (oej) 2004-04-20 02:12:12

New patch. Ignore the portuguese in this patch, it's not done yet.

By: Matteo Brancaleoni (mbrancaleoni) 2004-04-20 17:51:13

Ok, as promised here's the oej patch updated with italian changes.
Also I have included italian digits sounds, along with the makefile
to ease installations (just do make install from the untarred dir).

I'm using this patch in production, so I believe it ok :)

I have already disclaimed last year to Digium.

Also mind that the sounds were paid for by the company I work for, which authorized
me to redistribute these sounds for the asterisk community.

By: flavour (flavour) 2004-04-21 02:31:19

French & Italian working fine with latest patch on 0.9.0.
Looks like we have a good formula for a quick win :)
Although for the longer term, I'll second the need to replace, for example, 'fr' with fr_FR & fr_BE. In this case not for variations in say.c, but to have different language files ('septante' vs 'soixante-dix' & 'nonante' vs 'quatre-vingt-dix')

By: Olle Johansson (oej) 2004-04-21 02:31:50

So far Italian is the only disclaimed patch. If I don't hear anything else from Denmark, France, Spain or Portugal I might have to rip everything besides Italian out. That would be a shame. Please respond.

By: Olle Johansson (oej) 2004-04-21 13:45:03

Input from the mailing list:
I did a quick test with the danish numbers in say.c patch (04-20-04 02:11)
and found this..

*1  -- Executing SayNumber("SIP/1000-497f", "1") in new stack
   -- Playing 'digits/1' (language 'da')

*2  -- Executing SayNumber("SIP/1000-497f", "100") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/hundred' (language 'da')

*3  -- Executing SayNumber("SIP/1000-497f", "101") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/hundred' (language 'da')
   -- Playing 'digits/1' (language 'da')

*4  -- Executing SayNumber("SIP/1000-497f", "1000") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/thousand' (language 'da')
   -- Playing 'digits/and' (language 'da')

   -- Executing SayNumber("SIP/1000-497f", "1001") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/thousand' (language 'da')
   -- Playing 'digits/and' (language 'da')
   -- Playing 'digits/1' (language 'da')

*5  -- Executing SayNumber("SIP/1000-497f", "1000000") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/million' (language 'da')
   -- Playing 'digits/and' (language 'da')

   -- Executing SayNumber("SIP/1000-497f", "1000001") in new stack
   -- Playing 'digits/1' (language 'da')
   -- Playing 'digits/million' (language 'da')
   -- Playing 'digits/and' (language 'da')
   -- Playing 'digits/1' (language 'da')

*1)    pronounced "en", not an issue in itself but see next point.
*2)    pronounced "et" + "hundrede", different digit 1 "et".
*3)    pronounced "et" + "hundrede" + "og" + "en", there is an "og" missing.
*4)    pronounced "et" + "tusinde", no need for the "og"
*5)    pronounced "en" + "million", no need for the "og"

A few pointers to how it is done...
(Last time I translated VoiceMail software was 7 years ago and the biggest
problem was making our vendor understand that we needed two different 1's. )

(1,2,3...99)
en, to, tre...ni|og|halvfems

(100,101...199)
et|hundrede,
et|hundrede|og|en,
...,
et|hundrede|og|ni|og|halvfems

(1000,1001...1099)
et|tusinde,
et|tusinde|og|en,
...,
et|tusinde|og|ni|og|halvfems

(1100,1101...1999)
et|tusinde|et|hundrede,
et|tusinde|et|hundrede|og|en,
...,
et|tusinde|ni|hundrede|ni|og|halvfems

(1000000,1000001...1000099)
en|million,
en|million|og|en,
...,
en|million|og|ni|og|halvfems

(1000100...1999999)
en|million|et|hundrede,
...,
en|million|ni|hundrede|ni|og|halvfems|tusinde|ni|hundrede|ni|og|ni|og|halvfems

(2000000...X)
to|millioner...X

-- Soren

By: cyb (cyb) 2004-04-21 21:48:19

I just faxed the disclaimer. Matt (ZX81) has talked about sending one as well for the French sound files - not sure if we need it for this patch.

I've tested the current patch with setLanguage(fr) and it seems to work fine.

Note that I haven't coded the feminine gender. It would be quite simple (same rules, only "un" changes to "une") but I'll wait until we figure out the best way to pass the gender to say_number.

By: Olle Johansson (oej) 2004-04-22 03:55:24

Ok, so French and Italian are taken care of. Portugal contacted me, so it's on its way. Danish and Spanish are still non-disclaimed and will not be included if nothing happens soon.

By: c960657 (c960657) 2004-04-22 04:16:53

I faxed a disclaimer for the Danish patch yesterday (my real name is Christian Schmidt). I hope to find time to test your combined patch during the weekend.

By: Olle Johansson (oej) 2004-04-22 04:29:59

Christian: Look at the comments above and see if you can help me fix those problems. Welcome aboard!

So Spanish is the only one without a disclaimer now. We're slowly getting ahead.

By: Angel Gomez (angom) 2004-04-24 03:59:02

I have the disclaimer ready for Spanish, will try to fax it tomorrow (Saturday), but it will surely be done Monday at most.

By: Olle Johansson (oej) 2004-04-24 04:02:59

Oops. All code disclaimed or going to be disclaimed Monday. Great!

Guess this means I have to get back to coding fase :-)

Will soon be back with a new patch.

Have a nice weekend!

By: Matteo Brancaleoni (mbrancaleoni) 2004-04-24 09:02:24

small italian sound fixes:
* yesterday.gsm was still in english :)
* oclock isn't used in italian, so is substituted with a 1sec silence file

sounds.tar.gz can be safely deleted, use itsounds.tar.gz

By: ktsaou (ktsaou) 2004-04-24 11:55:24

Although development has been started, I have been thinking about this for some time, and I have another proposal to make.

What if all applications just compose the english string that want it said, and we write a function to do best match searches to find the sound files?

For example, take a look at this:

"You have no messages"

Today, the application knows internally that it has to find 3 sound files:

1. youhave
2. no
3. messages

In Greek however, the sound files have to be played in some other order to have the correct result:

you have 10 messages = EXETE 10 MINIMATA
you have no messages = DEN EXETE MINIMATA

(EXETE = youhave, DEN EXETE = you don't have)

The maximum flexibility would be given if, for example, we had a function that takes as its argument:

"you have no messages"

and looks in a directory for the available sound files. For the English language it should find:

you have.gsm
no.gsm
messages.gsm
messages.gsm

For the Greek language it should find:

you have no.gsm (DEN EXETE)
messages.gsm (MINIMATA)

The administrator could be able to overload the whole sentence by creating a sound file like this:

you have no messages.gsm
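
One way to read this "best match" search is as a greedy longest-match over the words of the sentence. The sketch below assumes exactly that; the function names are made up, the list of available files is passed in as an array of names without the .gsm extension instead of being read from a directory, and greedy matching is only one possible strategy.

```c
#include <stdio.h>
#include <string.h>

#define SOUND_NAME_LEN 64

/* Is name in the list of available sound files? */
static int have_file(const char *const *files, size_t nfiles, const char *name)
{
    size_t i;

    for (i = 0; i < nfiles; i++) {
        if (strcmp(files[i], name) == 0)
            return 1;
    }
    return 0;
}

/* Join count words starting at first with single spaces.
   Returns 0 on success, -1 if the result does not fit in buf. */
static int join_words(const char *const *words, size_t first, size_t count,
                      char *buf, size_t buflen)
{
    size_t i, used = 0;

    buf[0] = '\0';
    for (i = 0; i < count; i++) {
        int n = snprintf(buf + used, buflen - used, "%s%s",
                         i ? " " : "", words[first + i]);

        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }
    return 0;
}

/* At each position, pick the longest run of words that exists as a file.
   Returns the number of files chosen (stored in out), or -1 if some word
   cannot be matched or out is too small. */
int match_phrase(const char *const *words, size_t nwords,
                 const char *const *files, size_t nfiles,
                 char out[][SOUND_NAME_LEN], size_t maxout)
{
    size_t pos = 0, nout = 0;

    while (pos < nwords) {
        size_t len;
        int found = 0;

        for (len = nwords - pos; len >= 1; len--) {
            char cand[SOUND_NAME_LEN];

            if (join_words(words, pos, len, cand, sizeof(cand)) != 0)
                continue;
            if (have_file(files, nfiles, cand)) {
                if (nout >= maxout)
                    return -1;
                strcpy(out[nout++], cand);
                pos += len;
                found = 1;
                break;
            }
        }
        if (!found)
            return -1;
    }
    return (int)nout;
}
```

With the English file set from the example this picks "you have", "no", "messages"; with the Greek set it picks "you have no", "messages"; and a single "you have no messages" file would override both.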

The same could be used for numbers, with some pattern mechanism:

1.gsm (one)
11.gsm (eleven)
1XX.gsm (one hundred)
1XXX.gsm (one thousand)
1XXXXXX.gsm (one million)

To say the number 1.001.111, it should play:

1XXXXXX.gsm (one million)
1XXX.gsm (one thousand)
1XX.gsm (one hundred)
11.gsm (eleven)

To make things even smarter, each of those could have a pre and post sound file, like this:

1.gsm (one)
2.gsm (two)
11.gsm (eleven)
PXXX.gsm (prefix for thousands, for english: not existing)
XXXP.gsm (postfix for thousands, for english: "thousands")
1XXX.gsm (one thousand)
etc.

To say: 1011 it should use:

1XXX.gsm (one thousand)
11.gsm (eleven)

To say: 2011 it should use:

PXXX.gsm (prefix for thousands, for english: not existing)
2.gsm (two)
XXXP.gsm (postfix for thousands, for english: "thousands")
11.gsm (eleven)

and to say: 11011, it should use:

PXXX.gsm (prefix for thousands, for english: not existing)
11.gsm (eleven)
XXXP.gsm (postfix for thousands, for english: "thousands")
11.gsm (eleven)

Are these going to give the maximum flexibility for all languages and simplify development significantly?

Costa

By: Olle Johansson (oej) 2004-04-24 15:28:16

-- Need documentation on additional sound files for italian.
-- Need list of portuguese files
-- Need list of french files

* Please use English names for all sound files. Spanish "cien.gsm" should probably be changed. If the file exists in English, use that name. If that is not possible, use the local name.

Possibly we should add "sounds.txt" files in various digit directories in the CVS tree. Please produce them - in english and local language. See sounds.txt in Asterisk cvs and use that format. That way, we have every file required documented, with or without a set of sound files.

By: Olle Johansson (oej) 2004-04-24 16:24:37

Ok, new, larger patch, uploaded. *** PLEASE TEST ***

* Say.c now includes fr,da,en,pt,it,es support
* saynumber() now accepts an option 'f' or 'm' for gender ***TEST **
 This is required by portuguese
* The change to saynumber() affects pbx.c and include/asterisk/say.h

Everyone, test. Portuguese - test gender option to saynumber.
The negative number patch you had is no longer supported, since we need negative numbers for weather (sad enough, but reality)

By: Olle Johansson (oej) 2004-04-24 16:33:25

Costa:

In the first step, we're only fixing saynumber() for various languages. After that, we'll take a broader perspective and add an architecture that works in the way you suggest. Right now, let's fix saynumber().

It seems that Greek needs the gender option, as Portuguese does, to fix saynumber().
Looking forward to receiving a patch from you! :-)

By: Olle Johansson (oej) 2004-04-24 16:49:26

If you want language support in IAX2, check patch # 0001476

By: Olle Johansson (oej) 2004-04-24 17:14:13

Sorry for all the mails...

Added a new patch that also fixes various apps that use say_number. None of these fixes is very supportive of Portuguese, but that is something we have to fix later. For now, all the applications will use masculine gender for numbers. I don't know if that's wrong or right or totally stupid, but that is how it is now :-)

By: c960657 (c960657) 2004-04-24 20:05:31

say_da.diffdiff is a diff against say_intl.txt. It fixes the issues raised by Soren on the mailing list.

Danish has two genders, commune and neutrum, so I have introduced c and n in addition to m and f. I have chosen commune as the default for Danish (I think the majority of nouns are commune).

After the introduction of gender, German and Danish no longer follow the same rules. German has three genders (male, female and neutrum). Also, 1 in German depends not only on the gender but also on the role in the sentence (like "he"/"him" in English).

BTW say_intl.txt has inconsistent use of spaces vs. tabs for indentation.

By: Paul Cadach (pcadach) 2004-04-24 23:05:09

I have a table-driven "say" function for Russian. I'm trying to adapt it to work for English, French and Danish too...

By: mmenaz (mmenaz) 2004-04-25 05:37:54

In a few days I will have all the sounds spoken in italian (with my voice and, in some other days time, with the voice of my wife). It's not "professional level", but it's the complete set, while mbranca provides only sounds for numbers.
I've also included 21.gsm, 22.gsm and 23.gsm, which should be useful for a 0-24 time format (are you patching this aspect as well?).
I will provide also a OpenOffice.sxw file with the filename of the sound, the italian text and the english original.
I will try to fax the disclaimer this night.

By: Matteo Brancaleoni (mbrancaleoni) 2004-04-25 05:57:21

cool.

so please provide also a working patch to make these other sounds useful.
You'll find that translating only the sounds isn't sufficient to make other apps, like voicemail, work in other languages.

oh, 21,22,23 aren't needed at all... you can already specify time in 24-h format and asterisk will handle that... with already present digits sounds.

By: Olle Johansson (oej) 2004-04-25 06:16:15

mmenaz: Please add a .txt file and add your sound files to another bug report.
OpenOffice is fine, but we only have .txt files for docs within CVS.

At this time we're fixing saynumber. It's a lot of work. After that's done, we need to look into date and times, voicemail and meetme. One thing at a time, friends.

I would really like to have confirmation that the latest patch works. State which language you tested and whether it works or not. We need confirmation for every language now.

By: mmenaz (mmenaz) 2004-04-25 06:25:41

mbrancaleoni, as you know I'm not a coder :) So let's divide the work: you provide the patches to the Asterisk code, I will provide the needed sounds.
Bug 592, AFAIR, was about the 21, 22, 23 sounds being needed. If they're not needed with your patch, I will remove them; let me know.

By: Olle Johansson (oej) 2004-04-25 06:52:11

New patch, integrated Danish changes.

Simplified, so there's only one function per language.

TEST * TEST * TEST * TEST * TEST * TEST * TEST * TEST * TEST * TEST * TEST

By: flavour (flavour) 2004-04-25 07:20:31

Doesn't patch cleanly on Stable:
patching file apps/app_queue.c
Hunk #1 FAILED at 365.
1 out of 1 hunk FAILED -- saving rejects to file apps/app_queue.c.rej
patching file apps/app_zapscan.c
Hunk #1 FAILED at 322.
1 out of 1 hunk FAILED -- saving rejects to file apps/app_zapscan.c.rej

So I'll now test against Head

By: c960657 (c960657) 2004-04-25 07:45:32

say_da.diffdiff is a diff against your latest say_intl.txt.

When calling ast_say_number_full_da recursively, the options argument should not be reused. In Danish, "million" is commune but "thousand" is neutrum, so the option argument should be "c" and "n", respectively.
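To illustrate the point about not reusing the caller's options in the recursive call, here is a minimal sketch. The helper name da_multiplier_gender is hypothetical and is not taken from say_intl.txt; it only encodes the rule stated above (thousand is neutrum, million is commune).

```c
/* Hypothetical helper, not part of the patch: returns the gender option
 * to pass down when saying the multiplier in front of a magnitude word.
 * "thousand" is neutrum ("n"); "million" is commune ("c"). Other
 * magnitudes fall back to commune here, which is an assumption. */
static const char *da_multiplier_gender(long magnitude)
{
    return (magnitude == 1000L) ? "n" : "c";
}
```

The recursive call for the multiplier would then receive da_multiplier_gender(1000) or da_multiplier_gender(1000000) in place of the options string the outer call was given.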

Asterisk crashes in strncasecmp when options is NULL. I am not used to coding in C, so the way I check for NULL may not be the right way to do it.
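For reference, one common way to guard against this is to test the pointer before calling strncasecmp. This is only a sketch: the function name da_gender is hypothetical, and falling back to commune follows the default chosen for Danish earlier in this report.

```c
#include <stddef.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical helper, not part of the patch: returns 'n' when the
 * options string starts with "n" or "N", otherwise 'c' (commune).
 * A NULL options pointer is treated as "no option given". */
static char da_gender(const char *options)
{
    if (options != NULL && strncasecmp(options, "n", 1) == 0)
        return 'n';
    return 'c';
}
```

Because && short-circuits, strncasecmp is never reached when options is NULL.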


Also, I had to manually add ", (char *) NULL" to apps/app_queue.c to make it compile. app_queue.c (CVS version 1.55) has two calls to say_number, but say_intl.txt only fixes one of them.

By: Olle Johansson (oej) 2004-04-25 08:01:58

Thank you for your input. New patch uploaded.

Please always have a '.txt' extension on patches, makes it easier to view them in browsers.

By: Olle Johansson (oej) 2004-04-25 14:28:09

Dutch disclaimed and integrated into code, thanks to tric.nl

Please test and confirm, test, test,test,test

By: Olle Johansson (oej) 2004-04-25 16:14:34

Dutch didn't use Dutch syntax for ast_say_number. Thank you flavour.

By: Olle Johansson (oej) 2004-04-25 16:19:19

We need Portuguese sound files for testing. In the previous patch, 'app' mentioned that they could be made available. If so, please upload them to this bug report for testing.

By: Olle Johansson (oej) 2004-04-26 13:32:23

Reminder to test and add your results to the bug report. Thank you for helping us.

By: c960657 (c960657) 2004-04-26 16:24:35

Danish sounds fine.

By: flavour (flavour) 2004-04-26 16:59:07

What would help my testing is a test suite.
I have attached a quick version (i18n.testsuite.conf) that covers some English, French, Italian & Danish tests.
I #include this into my main extensions.conf in a suitable context.

Could each of the patch authors (or other people from their languages) suggest good numbers to use to prove their syntax is working properly?
We can test without sound files just by looking at the console to see what files are requested...

Actually, I think that the English version is sub-optimal!
SayNumber(183) gives "one hundred eighty three" whereas I think it should say "one hundred & eighty three"
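A rough sketch of the difference, for numbers 0 to 999 only. The function names and the token names ("hundred", "and") are illustrative assumptions; they are not the code or the sound file names used by the patch.

```c
#include <stdio.h>
#include <string.h>

/* Append one token to out, space-separated; drops the token if out is full. */
static void add_token(char *out, size_t len, const char *tok)
{
    size_t used = strlen(out);

    if (len == 0 || used >= len - 1)
        return;
    snprintf(out + used, len - used, "%s%s", used ? " " : "", tok);
}

/* Hypothetical sketch: build the token sequence for 0..999.
 * With use_and set, "and" is inserted after "hundred" when a
 * remainder follows. Out-of-range input yields an empty string. */
static void en_tokens(int num, int use_and, char *out, size_t len)
{
    char buf[16];

    if (len == 0)
        return;
    out[0] = '\0';
    if (num < 0 || num > 999)
        return;
    if (num >= 100) {
        snprintf(buf, sizeof(buf), "%d", num / 100);
        add_token(out, len, buf);
        add_token(out, len, "hundred");
        num %= 100;
        if (num == 0)
            return;
        if (use_and)
            add_token(out, len, "and");
    } else if (num == 0) {
        add_token(out, len, "0");
        return;
    }
    if (num >= 20) {
        /* Tens are single tokens: 20, 30, ... 90 */
        snprintf(buf, sizeof(buf), "%d", num - num % 10);
        add_token(out, len, buf);
        num %= 10;
        if (num == 0)
            return;
    }
    /* 1..19 are single tokens */
    snprintf(buf, sizeof(buf), "%d", num);
    add_token(out, len, buf);
}
```

With this sketch, en_tokens(183, 0, ...) yields "1 hundred 80 3" and en_tokens(183, 1, ...) yields "1 hundred and 80 3".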

Also a 'trivial' fix to the patch is that Dutch shouldn't have /* Spanish syntax */ by it (twice) ;)

edited on: 04-26-04 15:55

By: Olle Johansson (oej) 2004-04-28 08:20:25

New patch that applies to recent CVS head with channel.c using say_number

By: Olle Johansson (oej) 2004-04-28 16:25:06

New patch. Separated Danish and German again. Small code changes.

Flavour: We might want to add your test to extensions.conf.sample if we have sound files that we could place in cvs.

All contributors: We need to make sure we have documentation of all sound files needed for all the languages supported. Please help me with this. You know this better than I do. We're moving closer to a CVS commit, so I need your assistance here.

By: flavour (flavour) 2004-04-28 17:14:09

READMEs.tar.gz
Attached a tarfile listing all soundfiles I think are required - please can each author double-check what I have?
At this stage this is just for SayNumber(), not SayDate() or app_voicemail or anything

By: flavour (flavour) 2004-04-28 17:45:24

Updated i18n.testsuite.conf attached - broken into separate extensions to call to test each language & more examples put in, although there are still more to go in.
My testing so far shows a problem with Portuguese (other than the continued lack of available soundfiles ;) )
SayNumber(183) fails to find '100E', then 'pt-e' (which seems to be a duplication) & then doesn't bother with the '83' at all...

By: philipp2 (philipp2) 2004-04-28 17:53:30

Short note: I hope (this is not a promise, though) to be able to provide German sound files soon.

By: c960657 (c960657) 2004-04-28 17:54:47

All languages appear to be missing 0.gsm.

There seems to be some inconsistency whether the files for "hundred", "thousand" etc. should be named in English, in the local language or in digits (e.g. 1000.gsm).

As mentioned above, I am no expert in German, but if the current implementation uses (almost) the same rules as Danish, the required files should be the same, including the "1-and" sounds. I don't know if "1-and" through "9-and" are required in German - perhaps "1" through "9" followed by "and" is sufficient.

There is a README file for Swedish (se), but not for Norwegian (no), though they both use the same syntax.

In the code, Swedish and Norwegian currently use the same rules as English. However, both these languages use gender, very similar to Danish. Actually, they speak two different languages in Norway (nynorsk and bokmål). I don't know how similar they are.

The patch only mentions the option in relation to gender in Portuguese. Gender is also relevant for Danish, Swedish, Norwegian, German, French and possibly other languages. In addition to Portuguese, at least the Danish code properly supports gender.

The support for some languages may not be perfect, but at least some is better than nothing. So I am not arguing that the code should not go in.

By: mmenaz (mmenaz) 2004-04-28 21:57:30

As promised, I've uploaded the complete Italian sound set (bug # 1514). Now there must be coordination between the Italian code developers and the sound recorders. Since I use my wife's voice, I can catch up with the coders' needs very fast and at "low cost" (only my and my wife's time). BTW, producing those files I've understood why professional recordings are so expensive ;)
When/if mbrancaleoni's company gives the full set of sounds to the community, Italians will have the luxury of 2 voices to choose from! Fabulous :)
I disclaimed to Digium 3 days ago.

By: Mark Spencer (markster) 2004-04-28 22:23:30

Merged in CVS