Summary: ASTERISK-14988: [patch] Not all SIP extensions receive a page
Reporter: David Brillert (aragon)
Labels:
Date Opened: 2009-10-14 13:09:53
Date Closed: 2012-09-05 09:19:16
Priority: Major
Regression? No
Status: Closed/Complete
Components: Applications/app_page
Versions:
Frequency of Occurrence:
Related Issues:
Environment:
Attachments: ( 0) 16073-1.4.24.1.diff
( 1) pagingcontext.txt
( 2) pagingtest.txt
Description: I have 187 SIP extensions configured to receive a page.
The dial plan looks correct, but when I watch the CLI while the page executes, only 140 phones are paged.

****** ADDITIONAL INFORMATION ******

Attaching dial plan configuration.
Attaching CLI during test page.
Asterisk 1.4.24.1

I marked this as major priority because paging is used to broadcast emergency notifications to tenants, and not all devices receive the page.

Two dual core Pentium 2.8 Xeon
4 GB RAM installed.
Comments:

By: David Brillert (aragon) 2009-10-14 14:00:38

This reminded me of an earlier bug report. It is my understanding that the 128-extension paging limitation had been removed from Asterisk:

ASTERISK-13344 Summary 0014217: [patch] app_page causes undefined behavior when paging a page group with more than 128 extensions

It appears to me that some of my context is simply ignored by Asterisk when I try to page all the phones in my context.

By: Elazar Broad (ebroad) 2009-10-14 14:12:22

On line 8 of your SIP debug, the debug line shows the page command ending at extension 451, whereas the command listed in your extensions snippet ends at 504, so we definitely have a problem here.

By: Elazar Broad (ebroad) 2009-10-14 14:35:45

Is 'scrubbed' the same length as the actual context name?

By: David Brillert (aragon) 2009-10-14 14:42:43

scrubbed = 8 characters
original context = 7 characters

By: David Brillert (aragon) 2009-10-14 15:47:28

I did try upgrading to 1.4.26.2 but downgraded back to 1.4.24.1 (the previously installed version) after seeing the CLI spammed with these warnings during each page:

[Oct  2 12:35:08] WARNING[16852] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:08] WARNING[16851] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:08] WARNING[16859] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:08] WARNING[16858] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:08] WARNING[16869] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16868] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16886] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16888] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16893] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16889] file.c: Unexpected control subclass '-1'
[Oct  2 12:35:09] WARNING[16891] file.c: Unexpected control subclass '-1'

By: David Brillert (aragon) 2009-10-14 15:52:30

Hmmm, scratch my last comment:
I just checked /var/log/asterisk/messages and I see the same warnings in 1.4.24.1:

[Oct 14 12:35:05] WARNING[25441] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25436] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25448] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25440] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25442] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25443] file.c: Unexpected control subclass '-1'
[Oct 14 12:35:05] WARNING[25434] file.c: Unexpected control subclass '-1'

By: Elazar Broad (ebroad) 2009-10-19 15:10:32

It looks like you are hitting the extension length cap. If you don't mind, can you try changing VAR_BUF_SIZE in main/pbx.c from 4096 to 8192? If you are not comfortable with this, I can provide a patch.
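
A rough way to see how a fixed-size buffer could drop the tail of a long Page() argument is to model the truncation. This is a simplified sketch, not the actual logic in main/pbx.c: the function name truncate_page_targets and any sample targets passed to it are hypothetical, and the only figures taken from this thread are the 4096 and 8192 buffer sizes.

```python
def truncate_page_targets(targets, buf_size=4096):
    """Model a '&'-joined dial string being copied into a fixed-size buffer.

    Assumption: the buffer holds buf_size - 1 characters plus a terminator,
    and anything beyond that is silently cut off.
    Returns (full_length, targets_that_survive_intact).
    """
    joined = "&".join(targets)
    kept = joined[: max(buf_size - 1, 0)]

    survivors = kept.split("&") if kept else []
    # The last piece may be cut mid-name (or be empty if the cut landed
    # right after a '&'); discard it unless it matches the original target.
    if survivors and survivors[-1] != targets[len(survivors) - 1]:
        survivors.pop()
    return len(joined), survivors
```

Calling it with the real dial strings from a context and comparing the results for buf_size=4096 and buf_size=8192 gives a quick estimate of whether the list fits. The real targets in pagingcontext.txt are not reproduced here, so no numbers from this ticket are implied.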

By: David Brillert (aragon) 2009-10-19 15:56:00

I'd like to test a patch

By: David Brillert (aragon) 2009-10-23 15:24:27

Ebroad:
The patch appears to work.
I checked the CLI and all extensions received the page command.
I think you can commit this.

By: David Brillert (aragon) 2009-10-28 13:20:23

We tested the paging with an emergency lockdown today at a school involving local police etc...
The patch worked well and everything went off without a hitch.
Everybody heard the page.

By: Elazar Broad (ebroad) 2009-10-30 10:55:34

Duly noted, thanks!

By: David Brillert (aragon) 2010-04-20 15:49:51

Any chance this patch will get committed in next Asterisk 1.4 release?

By: Paul Belanger (pabelanger) 2010-04-27 15:02:30

Talking with Tilghman on #asterisk-dev, increasing the buffer may not be the best approach.

<Corydon76-dig> pabelanger: anytime we increase buffers allocated on the stack, we have a grave potential for causing other issues

Have you considered a dialplan change that branches to the number of extensions needed?

By: David Brillert (aragon) 2010-09-27 14:34:51

pabelanger or tilghman:
Can you elaborate on the dial plan change to branch to the number of extensions needed???

By: Russell Bryant (russell) 2010-09-29 17:39:13

Perhaps they meant something like this: (totally untested)

[foo]

exten => page,1,Page(Local/group1@foo&Local/group2@foo&Local/group3@foo)
exten => group1,1,Page(SIP/1&SIP/2&...)
exten => group2,1,Page(SIP/3&SIP/4&...)
exten => group3,1,Page(SIP/5&SIP/6&...)


It would work, I think, but I wonder what kind of performance impact there would be, if any, since it will result in 4 MeetMe() conference bridges being created internally.  I'm surprised that you're able to page that many phones at once without a problem.  It requires making 140 calls at once and dumping them all into a conference bridge.
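
The branched dialplan above can also be generated rather than typed by hand when there are many extensions. A minimal sketch, assuming the same [foo] context and groupN extension names as the untested example; the function name branched_page_dialplan and the choice of group size are hypothetical, and the output is equally untested against a live Asterisk system.

```python
def branched_page_dialplan(targets, group_size, context="foo"):
    """Emit dialplan text that pages `targets` via one Local channel per group."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")

    # Split the flat target list into consecutive groups.
    groups = [targets[i:i + group_size]
              for i in range(0, len(targets), group_size)]

    # Top-level page extension fans out to one Local channel per group.
    fan_out = "&".join(
        "Local/group%d@%s" % (n, context) for n in range(1, len(groups) + 1)
    )
    lines = ["[%s]" % context, "exten => page,1,Page(%s)" % fan_out]

    # Each group extension pages its own slice of devices.
    for n, group in enumerate(groups, start=1):
        lines.append("exten => group%d,1,Page(%s)" % (n, "&".join(group)))
    return "\n".join(lines)
```

Smaller groups keep each Page() argument short, at the cost of one extra conference per group, which is the performance concern raised above.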

By: David Brillert (aragon) 2010-09-30 08:46:59

Thanks Russell:
If you are surprised by the number of working extensions = 140, what would your expectation be for the number of possible working extensions?
Is this a limitation of the CPU, or signaling limitations in Asterisk?
Just curious...

By: Russell Bryant (russell) 2010-09-30 10:17:31

My concern would be with it being a CPU constraint issue.  It's not that I would expect 140 concurrent calls to be a problem in general, but I would expect having that many channels in a conference bridge, and originating that many calls all at the same time, to cause quite a spike.

By: David Brillert (aragon) 2010-10-04 10:54:45

Hi Russell,

Branching the dial plan did not fix the issue.
The CLI shows all extensions are called by Asterisk, but not all phones hear the page.  The CPU load in htop increases to about 4.00, and subsequent pages increase the load to 8.00, etc.
The load is not astronomical, but it seems to be enough to delay signaling to the phones or prevent audio from reaching all the phones.
After additional benchmarking I can reliably page between 75 and 100 phones in my setup, and that's all she wrote.

I have been using the attached patch with no issues since 2009-10-19.
This patch could be committed IMHO.
If not, the ticket can still be closed out and I will continue to use the patch.

By: Ronald Chan (loloski) 2011-02-18 03:01:51.000-0600

any progress here??? thanks!

By: David Brillert (aragon) 2011-02-18 08:08:55.000-0600

loloski: 16073-1.4.24.1.diff works to allow my dial plan to complete execution and IMHO can be committed and the bug report closed out.  I have been using 16073-1.4.24.1.diff since 2009-10-19 without a single issue related to this patch.

Additional testing of paging many devices shows there is a CPU limit on the number of phones that can receive a page, even though the * CLI clearly shows that those extensions were dialed.  In other words, while the CLI displays each phone being called and the page as successful, I cannot actually hear the page coming from the phone.
Also, the CPU stays spiked for a while after the page is finished, so if another page occurs in rapid succession while the CPU is still spiked, the second page can be heard on even fewer phones and the CPU spikes even higher.
On a Xeon 2.8 CPU with 4GB RAM, the most phones I could call simultaneously (in parallel) and actually hear a page reliably was about 40.
If I broke down the dial plan to page each phone in series, then I could reliably hear the page from all phones (over 150).  The downside is that there is a small delay between paging each phone.  The upside is that every phone gets a page and the emergency paging system works as it should (everyone hears the emergency announcement).  Another upside is that the CPU does not spike during this type of page.

By: Matt Jordan (mjordan) 2012-09-05 09:19:08.478-0500

Based on aragon's comments, it feels like the best answer to this problem is that a workaround is available - which is to say, don't page all devices in a single shot.

That being said, in Asterisk 11, app_page has been converted over to use ConfBridge and the bridging layer for mixing internally instead of chan_dahdi.  This should also improve performance of app_page, and let it handle more devices reliably.

As such, I'm going to close this issue out as "workaround available".  If you require higher performance from app_page, you may want to consider Asterisk 11.