[Home]

Summary:ASTERISK-05527: [patches for review/feedback] [post 1.2] say.c rewrite.
Reporter:Luigi Rizzo (rizzo)Labels:
Date Opened:2005-11-09 11:56:50.000-0600Date Closed:2011-06-07 14:02:55
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Core/General
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) say.20051111b.tgz
( 1) say-20051109.tgz
( 2) say-20051110.tgz
( 3) say-20051115a.tgz
Description:as announced, i am working on a rewrite of say.c to make it easier to
maintain. I am submitting here a snapshot of the current code for review.
THIS IS ONLY FOR REVIEW AND FEEDBACK, NOT MEANT TO BE COMMITTED NOW

If you want to try it, expand the tgz in the asterisk/ directory
(WATCH OUT IT WILL OVERWRITE YOUR say.c), apply the patch say.Makefile.patch
to the Makefile in the same directory.
You also have to change the first argument to ast_localtime in say.c, i think
it is just a couple of instances (trivial).
The code is now broken in one main part (say.c) plus language-specific files
which reside in the subdirectory say/ and are included by the main file.
The main file say.c is also largely commented explaining how the various
language-specific parts should be written.
If you are a native speaker and have corrections to the individual files,
please send them to me at rizzo at icir dot org so i can incorporate them.

****** ADDITIONAL INFORMATION ******

Right now i have only split the file, cleaned up a bit the interfaces,
and documented the main part. I have tested the english variant and
it seems to work - have not tried other languages.
In the process i have found and removed a lot of minor bugs (such as nl
functions calling english functions in the middle of their body,
and the same from danish to german and possibly a few more) but not
tested the grammar as i don't speak most of the languages.

TODO:
+ add back copyrights to the individual files (they are under the same
 copyright as the main file)
+ cleanup interfaces so they use say_args_t as the first argument in the
 same way as say_number_full_*();
+ reduce the variety of different methods used to stream a file - the
 code currently uses 4, probably just one is enough;
+ implement the date/time functions as instances of say_date_with_format()
 with specific formats
and probably a lot more redundancy removal once i have figured out
more common parts.
Comments:By: Sergey Basmanov (sb) 2005-11-10 05:57:09.000-0600

There is another idea about say.c
Look at http://bugs.digium.com/view.php?id=3832
But unfortunately it was suspended...

By: Luigi Rizzo (rizzo) 2005-11-10 09:03:14.000-0600

I had a quick look at the other patch.
From an inspection of the (huge) diff it looks it was mainly
addressing the dynamic loading of language interpreters, whereas
here I am especially focused on cleaning up the (internal) interfaces
for the various say* functions, and removing huge duplications of code
that do the same thing - even within a single language.

By: Sergey Basmanov (sb) 2005-11-10 09:11:15.000-0600

So, it would be great to have separate modules for each language (to reduce size of say.c) and clean code without any duplicates..
May it's possible to merge both ideas ?

By: Luigi Rizzo (rizzo) 2005-11-10 09:13:40.000-0600

I have uploaded an updated patch which does the following:
+ implement say_{time,date,tatetime} in terms of say_date_with_format();
 the actual formats are specified as strings in the per-language descriptor.
+ use say_args_t to pass arguments to all the say* routines.

TODO:
+ I have still left in the various different methods to stream a file,
 but at this point i am strongly convinced that all routines should only
 use the same method, possibly followed by a "flush" call at the end of
 the top-level ast_say_*() function.
+ many languages do have a say_date_with format routine, but lack format
 strings for date, time and datetime.
+ some languages lack routines to say ordinal numbers or date/times.
+ the handling of times and dates is at times inconsistent.
 E.g. there were bugs in the italian version, which appeared to be
 simply copied from some other language. Portuguese has some strange
 calls too. I have marked with XXX the points which i did not understand.

I would kindly request people to have a close look at the way the routines
in their mothertongue and see if they make sense. Individual files are now
rather short (around 200 lines) so it should not take too much time.

By: Luigi Rizzo (rizzo) 2005-11-10 09:23:32.000-0600

sb: i would suggest one step at a time, as follows:
1) complete the cleanup of the restructured code that i submitted here;
2) test it a bit to make sure it is not a regression wrt existing one
  (english does work for me, and considering how many bugs i found in
  some of the other languages i doubt we can expect a regression; but
  we do need a bit of testing);
3) replace the existing code with this one;
4) use the existing loader mechanism to make the whole of say.so a loadable
  module. If one wants to add a new language s/he has to rebuild only
  say.so and not the whole asterisk, and more granularity is probably
  not required.
5) if someone is really willing to, he can then add dynamic loading of
  individual languages. But i really don't believe it is worth the effort.
In any case, i would hope that 4) and 5) are done after importing the
loader patch i submitted a few months ago (forget the number).

By: Sergey Basmanov (sb) 2005-11-10 09:42:48.000-0600

rizzo: I agree with You. This weekend I'll try to review _ru part of code..
And about modules: why not to have separate module for each language? This way we simplify navigation through source and also making it more clear.. And another reason is that almost nobody need to have all languages loaded.. I think most of people needs maximum two at time :)
Anyway, all of this after 1.2 release.

By: Luigi Rizzo (rizzo) 2005-11-11 10:13:10.000-0600

one more version of the code, which reduces even more the language-specific
parts (especially for say_date_with_format), constifies arguments,
and uses only one consistent method to stream a file.
Also, there is now a printf-like function to build and play a file,
avoiding multi-line sequences (snprintf(...); play(...)) and
buffers all over the place.

At this stage, the code is really awaiting review.
Apart from changing say.o into say.so, the only remaining
simplification would be the use of strftime formats for date, time and
datetime (thus avoiding redundant arguments in the per-language descriptor).

By: Luigi Rizzo (rizzo) 2005-11-15 11:38:31.000-0600

yet another update of say.c which compiles fine with 2005.11.15 sources
by just untarring the tgz and applying the makefile patch.
Todays version include a fix for pronouncing '100' in dutch,
some assorted code cleanup, and the implementation of 'c', 'x' and 'X'
formats for language-specific datetime, date and time formats.

This latter change basically makes the ast_say_{datetime|date|time}() functions
useless because there is a 1-character strftime format for that.

Note, browsing the code, that ast_say_datetime() is never used
(only once in app_datetime.c which is not compiled); ast_say_date() is
only used once in res/res_agi.c; ast_say_time() is used 2 times.

Also note that ast_say_datetime_from_now() is never ever used.
I don't even know (from the source) what it is supposed to do
exactly...

Once again, i need feedback for  the various languages.

By: Igor Goncharovsky (igorg) 2005-11-17 05:48:17.000-0600

I send new say_ru.c to email. SayNumbers work well, but i have not test date and enumeration, i am waiting for sound files.

One note: may be we will continue execute dial plan while any soun files is missed? Now when we can try to use mised sound dialplan execution stoped...

By: Peng Yong (ppyy) 2006-01-05 07:32:48.000-0600

Chinese locale name standardlise:

http://bugs.digium.com/view.php?id=6135

By: benjk (benjk) 2006-01-06 06:47:53.000-0600

please keep in mind that there are languages where counted objects cannot be constructed from "number" + "object" such as is the case in Japanese for example ...

one day:
Chinese originated word for "one" plus Chinese originated word for "day"

BUT ...

two days:
Japanese native word for "two" minus last syllable plus Japanese native word for "day"

three days:

Japanese native word for "three" blended together with Japanese native word for "day"

ten days:
Japanese native word for "ten" plus Japanese native word for "day"

eleven days:
Chinese originated word for "ten" plus Chinese originated word for "one" plus Chinese orginated word for "day"

twenty days:
special word for "twenty" only ever used in this context plus Japanese native word for "day"

twenty-four days:
Chinese originated word for "twenty" plus Japanese native word for "four" minus last consonant plus Chinese originated word for "day"

twenty-seven fays:
Chinese originated word for "twenty" plus Chinese originated word for "seven" plus Chinese originated word for "day"

... which is irregular of sorts because normally twenty-seven would be Chinese originated word for "twenty" plus Japanese native word for "seven"

There are many more oddities in Japanese counting rules which render any system that simply uses number + object to construct the text unusable for Japanese. Other languages outside of the Latin/Greek influenced group of languages have their own often incompatible rules.

I am all for simplification, but in this case we have to take the large variety of foreign languages into account.



By: crich (crich) 2006-01-22 02:46:54.000-0600

Please have a look at ASTERISK-6050 which implements a generic way of saying numbers. we could make such thing for the date stuff also ..

By: Luigi Rizzo (rizzo) 2006-03-28 15:32:31.000-0600

this patch has been superceded by the one in ASTERISK-6050 which is a much
better way of handling the problem.