[Home]

Summary:ASTERISK-26210: chan_sip: NOTIFYs for BLF start queuing up and fail to be sent out, Asterisk 13
Reporter:Steven Sedory (stevensedory)Labels:
Date Opened:2016-07-18 11:15:54Date Closed:2017-10-10 09:40:36
Priority:MajorRegression?
Status:Closed/CompleteComponents:Channels/chan_sip/Subscriptions
Versions:13.9.1 Frequency of
Occurrence
Constant
Related
Issues:
Environment:ESXi Host VM, CentOS 6.6 (FreePBX's Current Distro: 10.13.66-64bit), Quad Core Intel Xeon E5530 2.4GHz, 4GB RAMAttachments:( 0) afterreboot_000001
( 1) afterreboot_000002
( 2) afterreboot_000003
( 3) afterreboot_000004
( 4) afterreboot_000005
( 5) afterreboot_000006
( 6) afterreboot_000007
( 7) debug_log_123456
Description:The issue we are experiencing seems exactly the same as the one already "fixed" here: ASTERISK-21677

The initial description there is exactly what we're experiencing. The main difference is that we're on CentOS 6.6, and Asterisk 13.
Comments:By: Asterisk Team (asteriskteam) 2016-07-18 11:15:55.328-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

By: Joshua C. Colp (jcolp) 2016-07-18 11:40:02.443-0500

Thank you for taking the time to report this bug and helping to make Asterisk better. Unfortunately, we cannot work on this bug because your description did not include enough information. Please read over the Asterisk Issue Guidelines [1] which discusses the information necessary for your issue to be resolved and the format that information needs to be in. We would be grateful if you would then provide a more complete description of the problem. At a minimum, we need:

1. The specific steps or actions you took that caused you to encounter the problem.
2. The behavior you expected and the location of documentation that led you to that expectation.
3. The behavior you actually encountered.

To demonstrate the issue in detail, please include Asterisk log files generated per the instructions on the wiki [2]. If applicable, please ensure that protocol-level trace debugging is enabled, e.g., 'sip set debug on' if the issue involves chan_sip, and configuration information such as dialplan and channel configuration.

Thanks!

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines

[2] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information



By: Steven Sedory (stevensedory) 2016-07-23 19:40:11.508-0500

Sorry about that. My issue is just so clearly described as the already fixed one "ASTERISK-21677" that I was hoping if someone could check if that bug had reappeared in a recent update or something.

I will work on getting the correct logs. Thanks.

By: Joshua C. Colp (jcolp) 2016-07-25 13:58:42.992-0500

Do you have the log from before messages started being queued up? queueing happens if a NOTIFY is in progress, so it's important to see the information before to trace the flow

By: Steven Sedory (stevensedory) 2016-07-25 14:10:26.659-0500

Setup:

We have a few clients on a FreePBX server running Asterisk 13.9.1. We have a handful of Polycom VVX 300s and 400s setup with extensions on this server, and many of them have BLF keys, which are primarily used to show each user when their office companions are on a phone call or not.

On other servers we have, one of them being on Asterisk 13.7.0, we have the same setup, but without the issue.

The issue:

For an unknown amount of time after the endpoints are powered on (this seems to be somewhere around 1 day, but for the sake of not making a gigantic log file, I'm avoiding debugging for that long for now), BLF's work just fine. Then after a day or so, they stop working, meaning they do not light up when said BFL monitored extension is called or on a call. By rebooting the extensions, the problem goes away again for another day or so.

When this happens, a lot of messages like this show up in the logs:
  Extension Changed 521[ext-local] new state Ringing for Notify User 523 (queued)
  Extension Changed 521[ext-local] new state Ringing for Notify User 524 (queued)
  Extension Changed 521[ext-local] new state Ringing for Notify User 522 (queued)
*Notice the "(queued)" and that this happens when extension 521 is called, and the three phones with a BLF key for 521 do not light up as they should

Reproduction and logging:

The extensions used in the reproduction of the problem are 521,522,523,524,403, and 9877. 521-524 are part of a small office, with 524 being at a remote location. 403 and 9877 are located at our office, and have been setup as test extensions. 403, similar to 521-524, has BLF keys for 521-523.

Once I started logger per wiki [2], I called 521 from extension 9877. You will notice the following in the attached log file (debug_log_123456):
  1. Extensions 522-524 BLF keys do not work, and this is the result in the log:
     [2016-07-25 11:39:24] VERBOSE[2084] chan_sip.c: Extension Changed 521[ext-local] new state Ringing for Notify User 523 (queued)
     [2016-07-25 11:39:24] VERBOSE[2084] chan_sip.c: Extension Changed 521[ext-local] new state Ringing for Notify User 524 (queued)
     [2016-07-25 11:39:24] VERBOSE[2084] chan_sip.c: Extension Changed 521[ext-local] new state Ringing for Notify User 522 (queued)
  2. Extension 403 BFL key DOES work, and note that there is no "(queued)" in the log:
     [2016-07-25 11:39:24] VERBOSE[2084] chan_sip.c: Extension Changed 521[ext-local] new state Ringing for Notify User 403

We have other customers on the same server, so you will see other queued messages similar to the ones above, as they also have the same BLF issue going on.

Conclusion:

I would imagine that after some time, my test extension 403 will start getting queued messages too and therefore the BLF keys will stop working. I just set this up yesterday (Sunday), so I will report when/if that happens.

It sounds very much like the issue explained in the bug I already linked in the comment above is back, though I of course may be wrong. Thank you for your help on this matter.

By: Steven Sedory (stevensedory) 2016-07-25 14:14:41.081-0500

Hi Joshua,

I will call the customer now and see if they will let me restart their phones, and then turn logging back on. I just imagine the file will be very large because it's going to have to run for at least a day.

By: Steven Sedory (stevensedory) 2016-07-25 16:30:46.484-0500

This should give some more insight into the issue, as earlier on in the log, you should see that there weren't certain queue messages, but then later on the reappear.

Had to split due to log file size

By: Joshua C. Colp (jcolp) 2016-07-26 05:18:38.202-0500

The reason this is happening is that we are sending a NOTIFY message to the Polycom but it's not acknowledging receipt of the NOTIFY we send it. We retransmit but eventually give up. In this scenario we don't process any remaining queued items.

By: Steven Sedory (stevensedory) 2016-07-28 15:28:33.539-0500

What can I do to see why Polycom is not getting them?

By: Joshua C. Colp (jcolp) 2016-07-28 15:36:29.200-0500

You could do a packet capture to confirm it's being sent, otherwise you'd need to see if there's logs or something on the Polycom which shows why it would be ignoring the NOTIFY. I don't have any recent experience with Polycom phones.

By: George Joseph (gjoseph) 2016-08-12 10:37:04.045-0500

The patch that fixed the issue in ASTERISK-21677 was definitely applied to 13 so we're probably looking at a different cause.

Were you able to get an actual packet trace against a problem phone?  Also could you do a {{sip show settings}} and attach it?  I'm interested in timer_t1, timer_b and the subscription timers.  If they're set per peer, the the settings for one of the peers in question would help.

thanks

By: Steven Sedory (stevensedory) 2017-10-09 19:13:08.903-0500

So sorry I didn't comment on this sooner. The problem seems to be resolved.

By: Richard Mudgett (rmudgett) 2017-10-10 09:40:36.887-0500

Closing per reporter's comment