
Summary: ASTERISK-06050: [patch] New App Saynumber, for generic way of easy internationalization
Reporter: crich (crich)
Date Opened: 2006-01-11 10:48:03.000-0600
Date Closed: 2006-05-08 13:56:04
Priority: Major
Regression?: No
Status: Closed/Complete
Components: Core/Internationalization
Attachments: (0) app_say_number.patch-5.txt, (1) test.say
Description: Attached is a little package that adds an application, Say_Number, to Asterisk. It reads a configuration file containing all the information on how numbers are constructed in different languages.

I didn't like the approach of hardcoding each language in Asterisk, so I wrote this application some months ago.

It also includes a small console program that tests the engine.

I just saw the attempt to add Japanese number-saying to Asterisk, which reminded me to post my work here.

I hope this is useful to someone; perhaps we can remove the old app_saynumber at some point.

The READMEs contain everything needed to configure new languages, and German and English are included as samples.



****** ADDITIONAL INFORMATION ******

Disclaimer on files
Comments:

By: crich (crich) 2006-01-15 14:32:34.000-0600

The above patch is made against revision 8080 of asterisk/trunk.

It should work now as well; I have tested it with English and German.

By: Russell Bryant (russell) 2006-01-16 10:39:20.000-0600

The "post 1.2" tag in the bug summary was something we used to mark issues that would not be addressed until after 1.2 was released. Since 1.2 has been released, it is no longer needed. It shouldn't be added to any new reports, since the tag lets us quickly identify reports that have been around for quite a while. Thanks!

By: crich (crich) 2006-01-16 11:20:13.000-0600

Sorry drumkilla, I didn't know that.

By: crich (crich) 2006-01-22 02:42:45.000-0600

The new patch applies against trunk revision 8431.

By: Luigi Rizzo (rizzo) 2006-01-22 08:12:12.000-0600

I think the approach makes a lot of sense.
However I have some reservations on the implementation, because:
- some languages, as you can see in the existing 'say.c', might
 have male and female versions of numbers to be used in different
 situations. I am not sure your approach can cope with that; basically,
 it seems that your syntax cannot build strings that take some parts
 from the number and other parts from the rule, e.g. to add an 'f'
 or 'm' suffix or whatever;

- there is already code in Asterisk to match numeric patterns, e.g. in
 the dialplan, and to handle variables, build strings from substrings, etc.
 So why define a new syntax rather than reuse the existing one
 (see example below)?
- same for config files: why re-read them every time with a new
 parser (and a different syntax) rather than reusing the existing syntax?

E.g. one could use ',' as a separator, assume that the input number is
a variable called N, and write an Asterisk-style configuration like the
one below, which would then be parsed using the standard navigation
functions to walk configuration files, and functions from pbx.c
to handle variables, patterns, etc.

[en]
rule => _[0-9], ${N}
rule => _[1-9]0, ${N}
rule => _[1-9]00, ${N:0:1}, hundred
rule => _[1-9]000, ${N:0:1}, thousand
rule => _[1-9]XXX, ${N:0:1}, thousand, say(${N:1})
rule => _[2-9][1-9], ${N:0:1}0, ${N:1}
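A standalone sketch of the matching step this relies on, in C. It is not Asterisk's ast_extension_match(); it is a small re-implementation assuming the usual dialplan meanings (a leading '_' marks a pattern, X = 0-9, Z = 1-9, N = 2-9, [..] = a set or range, and the whole number must be consumed; the '.' and '!' wildcards are not handled). The names pattern_match(), find_rule() and struct say_rule are hypothetical. Rules are tried in order and the first match wins.

```c
#include <stddef.h>
#include <string.h>

/* Match one digit c against the pattern element at *pp and advance *pp
 * past that element. Handles X, Z, N, [..] sets/ranges and literals. */
static int match_element(const char **pp, char c)
{
    const char *p = *pp;
    int ok = 0;

    switch (*p) {
    case 'X': ok = (c >= '0' && c <= '9'); p++; break;
    case 'Z': ok = (c >= '1' && c <= '9'); p++; break;
    case 'N': ok = (c >= '2' && c <= '9'); p++; break;
    case '[':
        p++;
        while (*p && *p != ']') {
            if (p[1] == '-' && p[2] && p[2] != ']') {  /* range such as 1-9 */
                if (c >= p[0] && c <= p[2])
                    ok = 1;
                p += 3;
            } else {                                  /* single character */
                if (c == *p)
                    ok = 1;
                p++;
            }
        }
        if (*p == ']')
            p++;
        break;
    default:                                          /* literal digit */
        ok = (c == *p);
        p++;
        break;
    }
    *pp = p;
    return ok;
}

/* Return 1 if number matches pattern. Without a leading '_' the pattern
 * is compared literally, as in the dialplan. */
int pattern_match(const char *pattern, const char *number)
{
    if (pattern[0] != '_')
        return strcmp(pattern, number) == 0;

    const char *p = pattern + 1;
    const char *n = number;
    while (*p && *n) {
        if (!match_element(&p, *n))
            return 0;
        n++;
    }
    return *p == '\0' && *n == '\0';  /* both must be fully consumed */
}

/* Hypothetical rule entry: a pattern and its (unexpanded) action. */
struct say_rule {
    const char *pattern;
    const char *action;
};

/* Return the action of the first matching rule, or NULL if none matches. */
const char *find_rule(const struct say_rule *rules, size_t count,
                      const char *number)
{
    for (size_t i = 0; i < count; i++)
        if (pattern_match(rules[i].pattern, number))
            return rules[i].action;
    return NULL;
}
```

With only the first three rules above in the table, find_rule() returns "${N:0:1}, hundred" for "500" and NULL for "12", because none of those three rules covers the teens.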

By: crich (crich) 2006-01-22 08:19:00.000-0600

I agree. It makes sense to reuse the existing parsing functions and to have a consistent notation across the different config files.

By the way, the approach does cope with number parts plus additional strings.



By: crich (crich) 2006-01-22 17:11:09.000-0600

Why not break with the good old (somewhat pointless) extensions.conf style:

rule => pattern,todo

and use instead:

pattern => todo

which makes a bit more sense, I think. Then we could use the ast_category_browse()/ast_variable_browse() functions to split the "pattern" from the "todo".

So your example would become:

[en]
_X => ${N}
_NX => ${N:0:1}0
_N00 => ${N:0:1}, hundred
_NXX => ${N:0:1}, hundred, and, say(${N:1})
_N000 => ${N:0:1}, thousand
_NXXX => ${N:0:1}, thousand, say(${N:1})
_[2-9]N => ${N:0:1}0, ${N:1}
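Asterisk's own config parser already accepts `name => value` lines and hands them back as name/value pairs (walked with ast_variable_browse()), so a real patch would not need its own splitter. Purely to show what that split yields for a line like `_N00 => ${N:0:1}, hundred`, here is a hypothetical standalone helper, split_rule_line(), that cuts a line at the first "=>" and trims both sides.

```c
#include <ctype.h>
#include <string.h>

/* Trim leading and trailing whitespace in place; returns the new start. */
static char *trim(char *s)
{
    while (isspace((unsigned char)*s))
        s++;
    char *end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        *--end = '\0';
    return s;
}

/* Split a "pattern => action" line in place (hypothetical helper).
 * Returns 0 and sets *pattern and *action on success, -1 if there is no "=>". */
int split_rule_line(char *line, char **pattern, char **action)
{
    char *sep = strstr(line, "=>");
    if (!sep)
        return -1;
    *sep = '\0';                 /* terminate the pattern part */
    *pattern = trim(line);
    *action = trim(sep + 2);     /* skip over "=>" */
    return 0;
}
```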


This would clean up the recursion part a bit, because it won't need to deal with the first pattern part, but that's not very important.

I didn't find any nice *public* function for splitting a variable and extracting its value; please give me a hint. In pbx.c I found only parse_variable_name and substring, which are not public.

The other functions handle only channel variables, but this ${N} variable wouldn't be a channel variable, would it? It could be a temporary channel variable, but even then, how does the splitting work on a variable? Is it pbx_substitute_variables_helper? I mean I need something like:

ast_setup_app_var("N", number);

then call something like:

ast_retrieve_val_from_app_var( "N:0:1", &ret);

That would be nice.


Perhaps we can use ast_expr here, but I don't know how it works.
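The ast_retrieve_val_from_app_var("N:0:1", &ret) idea boils down to parsing a reference of the form NAME:offset:length and applying dialplan-style substring rules to the value. A minimal sketch follows; parse_var_ref() and var_substring() are hypothetical helpers, not Asterisk functions. A negative offset counts from the end, as in ${VAR:-2}; a missing length means "to the end of the string"; Asterisk's negative-length form is not implemented.

```c
#include <stdlib.h>
#include <string.h>

/* Copy a substring of value into out, following the ${VAR:offset:length}
 * convention: a negative offset counts back from the end, and a negative
 * length (used here to mean "no length given") takes the rest of the string. */
void var_substring(const char *value, int offset, int length,
                   char *out, size_t outlen)
{
    int len = (int)strlen(value);

    if (outlen == 0)
        return;
    if (offset < 0)
        offset += len;              /* e.g. -2 selects the last two characters */
    if (offset < 0)
        offset = 0;
    if (offset > len)
        offset = len;
    if (length < 0 || length > len - offset)
        length = len - offset;
    if ((size_t)length >= outlen)
        length = (int)outlen - 1;   /* truncate to fit the output buffer */
    memcpy(out, value + offset, (size_t)length);
    out[length] = '\0';
}

/* Parse "N", "N:1" or "N:0:1" into name, offset and length (-1 = no length).
 * Returns 0 on success, -1 if the name does not fit in name[namelen]. */
int parse_var_ref(const char *ref, char *name, size_t namelen,
                  int *offset, int *length)
{
    const char *colon = strchr(ref, ':');
    size_t n = colon ? (size_t)(colon - ref) : strlen(ref);

    if (n >= namelen)
        return -1;
    memcpy(name, ref, n);
    name[n] = '\0';
    *offset = 0;
    *length = -1;
    if (colon) {
        char *end;
        *offset = (int)strtol(colon + 1, &end, 10);
        if (*end == ':')
            *length = (int)strtol(end + 1, NULL, 10);
    }
    return 0;
}
```

For N = "1234", the reference "N:0:1" yields "1" and "N:1" yields "234".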

By: Luigi Rizzo (rizzo) 2006-01-23 10:42:11.000-0600

Well, it's a patch, so you are free to write a small wrapper around
pbx_substitute_variables_helper_full() to make it public,
and then, in your 'saynumber' code, invoke it with a suitably
set varlist (with N equal to the original number, and possibly
more stuff inherited from the channel if you like).
For patterns, you could just store them in your own
list and use ast_extension_match() for matching.
I am sure that this would reduce the size of your code a lot.

By: crich (crich) 2006-01-23 17:30:00.000-0600

The source shrank quite a lot. The varshead part still needs to be implemented; at the moment N is a real channel variable, but it works so far. A very basic config.sample is attached.

By: Olle Johansson (oej) 2006-01-24 03:21:42.000-0600

Remember that language syntaxes are used in voicemail as well.

By: Olle Johansson (oej) 2006-01-24 03:23:04.000-0600

tdesc = "Generic System() application";

??? ;-)

By: Olle Johansson (oej) 2006-01-24 03:26:59.000-0600

Crich: Loading the config each time the app is called seems expensive. Add a load_config function that can also be called from a reload function (which is missing).

The logging needs to be cleaned up, and the // comments removed.

To be backwards compatible, we can't fail for a non-defined language, we have to fall back to the default (en).

Can we try to implement all the languages already implemented in the current saynumber?

If I remember correctly, there might be a need to add sound file references in the definition.
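The backwards-compatibility point above (never fail on an undefined language, fall back to en) can be sketched as a lookup over the loaded language sections. The struct lang_rules table and find_lang() are hypothetical; the table would presumably be filled once by a load_config()/reload routine rather than on every call.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry for one language section of the config file. */
struct lang_rules {
    const char *lang;        /* section name, e.g. "en", "de" */
    const void *rules;       /* parsed rules for that section (opaque here) */
};

/* Return the rules for lang, falling back to the "en" section instead of
 * failing. Returns NULL only if lang is unknown and "en" is missing too. */
const struct lang_rules *find_lang(const struct lang_rules *tab, size_t count,
                                   const char *lang)
{
    const struct lang_rules *fallback = NULL;

    for (size_t i = 0; i < count; i++) {
        if (lang && strcmp(tab[i].lang, lang) == 0)
            return &tab[i];
        if (strcmp(tab[i].lang, "en") == 0)
            fallback = &tab[i];
    }
    return fallback;
}
```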

By: crich (crich) 2006-01-24 03:35:09.000-0600

tdesc: hehe, that happens with copy/paste.

cfg load optimizations and formatting:

Yep, you're quite right; it's a sort of first shot. I'd like some people to think about it before I begin optimizing further.

Implementing ALL languages will be a hard job. I hope it is possible with this approach; that's why I've posted it here.

I hope the language maintainers will have a look at it and tell us whether their numbering can be constructed with it.

By: Luigi Rizzo (rizzo) 2006-01-24 14:20:17.000-0600

OK, I have taken the suggestion to simplify the
rewritten say.c in ASTERISK-5527 even further. It works really well for English.
I have included in test.say the chunk of C code that is needed
to parse the configuration, and the configuration for English
(two lines are missing, for a leading '-' and a leading '+').

It still needs a bit of polishing, e.g. I have to check what to do when
one of the components is missing (I noticed 'digits/billion' is not in
the distribution!), see how efficient it is to set a chanvar rather
than pass a separate list to pbx_substitute_variables_helper()
(I will probably need the 'full' version), and, above all, get
feedback for other languages.

I agree that this approach might be useful for other "spelling" functions,
such as enumeration, dates, times, etc.
In fact, it would be great to use it, because people could easily
customize their preferred formats in the config file.

By: crich (crich) 2006-01-24 14:31:54.000-0600

Looks good, rizzo. If you use this approach, I definitely want some karma! ;)

I was just fixing the issues mentioned, so I think I can stop here, hm?

By: Luigi Rizzo (rizzo) 2006-01-24 14:36:05.000-0600

Yes, I don't care about points.
But you cannot stop here; we need syntax for German :)

By: crich (crich) 2006-01-24 14:43:02.000-0600

Just a joke ;) We'll need to set a path variable like:

[en]
path => en/digits
_XX => ..

to simplify the configs.

By: Luigi Rizzo (rizzo) 2006-01-24 14:49:06.000-0600

Not totally sure about the need for a path. First, the
prefix (en, it, de...) is already implicit in the play routines,
so it would be redundant. Second, it is not always digits that
you want. Anyway, you only write the patterns once,
and numbers have only a few digits; the same goes for dates.

Of course, some simplification would come if we had
patterns with length specifiers, e.g. c{n,m}, which, as in
conventional regexes, matches between n and m occurrences
of character c (or class c). Coming soon...
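Since the c{n,m} syntax was only a proposal at this point, the following is just a sketch of what such a length specifier could mean: a small backtracking matcher where an element (X, Z, N, or a literal digit) may be followed by {n} or {n,m}. The function name match_repeat() and the exact syntax are assumptions; [..] sets and the leading '_' are left out to keep it short. With it, NX{0,2} matches one- to three-digit numbers whose first digit is 2-9.

```c
#include <stdlib.h>
#include <string.h>

/* Does character c belong to the class named by pattern character k? */
static int in_class(char k, char c)
{
    switch (k) {
    case 'X': return c >= '0' && c <= '9';
    case 'Z': return c >= '1' && c <= '9';
    case 'N': return c >= '2' && c <= '9';
    default:  return c == k;            /* literal character */
    }
}

/* Backtracking matcher: each element may be followed by a hypothetical
 * {n} or {n,m} repeat count (default: exactly once). The whole string
 * s must be consumed. */
int match_repeat(const char *p, const char *s)
{
    if (*p == '\0')
        return *s == '\0';

    char k = *p++;
    int min = 1, max = 1;

    if (*p == '{') {                    /* parse {n} or {n,m} */
        char *end;
        min = (int)strtol(p + 1, &end, 10);
        max = min;
        if (*end == ',')
            max = (int)strtol(end + 1, &end, 10);
        if (*end != '}')
            return 0;                   /* malformed quantifier */
        p = end + 1;
    }

    /* count = characters of class k consumed so far */
    for (int count = 0; count <= max; count++) {
        if (count >= min && match_repeat(p, s))
            return 1;
        if (s[0] == '\0' || !in_class(k, s[0]))
            return 0;
        s++;
    }
    return 0;
}
```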

By: crich (crich) 2006-01-26 14:52:54.000-0600

patch-5 news:

* Added the German language up to 1 million as a sample
* Added support for minus
* Re-added implicit digits, for simple configs

How should we go on now?

By: crich (crich) 2006-01-26 15:16:28.000-0600

I've been thinking about how we can make the other language-specific things, like dates and enumeration, a bit more generic. What do you think about this:



[language]
numberrule => pattern,rule
enumrule => pattern,rule
daterule => format-shortcut,pattern,rule

e.g.

[en]
numberrule => _X,${N}

enumrule => _X,${N}
enumrule => _XX,${N:0:1}0,SayEnum(${N})

daterule => A,_X,SayWeekDay(${N}) ; 1 -> Monday , 2->Tuesday ..
daterule => a,_X,SayDate(A,${N}) ; recursively call the date engine with format shortcut A
daterule => d,_X,SayEnum(${N}) ; call SayEnum - engine
daterule => d,_XX,SayEnum(${N}) ; call SayEnum - engine




Ideas and comments are quite welcome.

By: Luigi Rizzo (rizzo) 2006-01-26 15:23:42.000-0600

For enumeration it is simple: you can use a prefix of some kind
(e.g. I used %e), and no extra code is required:

; enumeration
_%eX => digits/h-${N}
_%e1X => digits/h-${N}
_%e[2-9]0 => digits/h-${N}
_%e[2-9][1-9] => say:${N:0:1}0, digits/h-${N:1}
_%e[1-9]XX => say:${N:0:1}, digits/hundred, say:%e${N:1}
; and so on
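After variable substitution, each right-hand side above is just a comma-separated list in which "say:" items recurse into the engine and everything else names a sound file. Assuming ${N} holds only the digits (e.g. N = 125), the last rule would expand to something like `say:1, digits/hundred, say:%e25`. A minimal sketch of splitting such a list; split_action() and struct action_item are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

enum item_kind { ITEM_FILE, ITEM_SAY };

/* One element of an expanded rule action. */
struct action_item {
    enum item_kind kind;
    const char *text;    /* file name, or the argument after "say:" */
};

/* Split an already-substituted action on ','. Items starting with "say:"
 * are recursive calls; any other non-empty item is a sound file.
 * Modifies action in place; returns the number of items stored (<= max). */
size_t split_action(char *action, struct action_item *items, size_t max)
{
    size_t n = 0;
    char *item = action;

    while (item && n < max) {
        char *comma = strchr(item, ',');
        if (comma)
            *comma = '\0';
        while (isspace((unsigned char)*item))      /* trim leading space */
            item++;
        char *end = item + strlen(item);
        while (end > item && isspace((unsigned char)end[-1]))
            *--end = '\0';                         /* trim trailing space */

        if (strncmp(item, "say:", 4) == 0) {
            items[n].kind = ITEM_SAY;
            items[n].text = item + 4;
            n++;
        } else if (*item) {
            items[n].kind = ITEM_FILE;
            items[n].text = item;
            n++;
        }
        item = comma ? comma + 1 : NULL;
    }
    return n;
}
```

The caller would then play ITEM_FILE entries and feed ITEM_SAY arguments back into the rule lookup.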

By: crich (crich) 2006-01-26 15:30:27.000-0600

We could prefix the date stuff too, then, like:

_%AX => digits/day-${N}
_%BX => digits/mon-${N}
_%BXX => digits/mon-${N}

But then we need another prefix for the enumeration stuff, since e is already used for the numeric day of the month...

By: Luigi Rizzo (rizzo) 2006-01-26 15:30:34.000-0600

For date/time, you need a bit of support to split the time of day
(or whatever is passed as the argument) into the struct tm fields
and save them into variables, so that you can use them later.
I am just uncertain about which variables to use, as there are several:
year, month, day, day_of_week, hour, minute, second, timezone.
That pollutes the channel variables a bit.

But it is certainly a promising road toward getting rid of the existing say.c.
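The "split into struct tm fields" step could look like the sketch below, which breaks a time_t into named string variables (year, month, day, day_of_week, hour, minute, second). It uses standard C gmtime() so the result is reproducible; real code would presumably convert in the caller's timezone, and the timezone field is left out. struct say_var and time_to_vars() are hypothetical, and the variables go into a private array rather than into channel variables.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical name/value pair for a private (non-channel) variable list. */
struct say_var {
    char name[16];
    char value[16];
};

/* Break t into UTC fields and store them as string variables.
 * Returns the number of variables written (at most max). */
size_t time_to_vars(time_t t, struct say_var *vars, size_t max)
{
    struct tm *p = gmtime(&t);
    if (!p)
        return 0;
    struct tm tm = *p;                  /* copy out of gmtime's static buffer */

    const struct { const char *name; int value; } fields[] = {
        { "year",        tm.tm_year + 1900 },
        { "month",       tm.tm_mon + 1 },   /* 1..12 */
        { "day",         tm.tm_mday },
        { "day_of_week", tm.tm_wday },      /* 0 = Sunday */
        { "hour",        tm.tm_hour },
        { "minute",      tm.tm_min },
        { "second",      tm.tm_sec },
    };

    size_t n = 0;
    for (size_t i = 0; i < sizeof fields / sizeof fields[0] && n < max; i++, n++) {
        snprintf(vars[n].name, sizeof vars[n].name, "%s", fields[i].name);
        snprintf(vars[n].value, sizeof vars[n].value, "%d", fields[i].value);
    }
    return n;
}
```

For t = 1000000000 (2001-09-09 01:46:40 UTC, a Sunday), this yields year=2001, month=9, day=9, day_of_week=0, hour=1, minute=46, second=40.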

By: crich (crich) 2006-01-26 15:34:26.000-0600

We mustn't put them into the channel vars. We can create our own varshead and use pbx_substitute_variables_varshead instead. Even in the saynumber case this would be much safer, since N is probably a frequently used variable in extensions.conf.
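A self-contained sketch of the private-variable-list idea: substitution looks names up in a caller-supplied array rather than on the channel, so N cannot clash with a dialplan variable of the same name. This is not pbx_substitute_variables_varshead() itself; struct priv_var and expand() are hypothetical, and only ${NAME} and ${NAME:offset[:length]} are handled (no nesting, functions or expressions; a negative length is treated as "rest of string").

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical private variable, kept out of the channel's variable list. */
struct priv_var {
    const char *name;
    const char *value;
};

static const char *lookup(const struct priv_var *vars, size_t nvars,
                          const char *name, size_t namelen)
{
    for (size_t i = 0; i < nvars; i++)
        if (strlen(vars[i].name) == namelen &&
            strncmp(vars[i].name, name, namelen) == 0)
            return vars[i].value;
    return "";                          /* unknown variables expand to nothing */
}

/* Expand ${NAME} and ${NAME:offset[:length]} in tmpl into out[outlen]. */
void expand(const char *tmpl, const struct priv_var *vars, size_t nvars,
            char *out, size_t outlen)
{
    size_t o = 0;
    const char *rbrace;

    if (outlen == 0)
        return;
    while (*tmpl && o + 1 < outlen) {
        if (tmpl[0] == '$' && tmpl[1] == '{' &&
            (rbrace = strchr(tmpl + 2, '}')) != NULL) {
            const char *name = tmpl + 2;
            const char *colon = memchr(name, ':', (size_t)(rbrace - name));
            size_t namelen = (size_t)((colon ? colon : rbrace) - name);
            const char *val = lookup(vars, nvars, name, namelen);
            long len = (long)strlen(val), off = 0, cnt = len;

            if (colon) {
                char *end;
                off = strtol(colon + 1, &end, 10);
                if (*end == ':')
                    cnt = strtol(end + 1, NULL, 10);
                if (off < 0)
                    off += len;         /* negative offset counts from the end */
                if (off < 0)
                    off = 0;
                if (off > len)
                    off = len;
                if (cnt < 0 || cnt > len - off)
                    cnt = len - off;
            }
            for (long i = 0; i < cnt && o + 1 < outlen; i++)
                out[o++] = val[off + i];
            tmpl = rbrace + 1;
        } else {
            out[o++] = *tmpl++;         /* copy literal text unchanged */
        }
    }
    out[o] = '\0';
}
```

For N = 345, the template "${N:0:1}, hundred" expands to "3, hundred".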

By: Olle Johansson (oej) 2006-03-09 14:42:48.000-0600

This is currently discussed on asterisk-dev - please join the discussion there :-)

By: Serge Vecher (serge-v) 2006-05-08 11:40:52

Looks like this patch needs an update, at the very least for the new loader changes.

By: Luigi Rizzo (rizzo) 2006-05-08 13:56:01

Code with this functionality is now in app_playfile, and we
just need to complete the syntax configuration files for
the various languages.
Thanks to crich for suggesting this nice idea on how to
improve the 'say' implementation.