[Home]

Summary:ASTERISK-28960: bridge: System gets into state where bridge is terminated after joining
Reporter:Robert Sutton (rsutton@noojee.com.au)Labels:fax webrtc
Date Opened:2020-06-24 00:36:55Date Closed:2020-07-13 12:00:02
Priority:MinorRegression?
Status:Closed/CompleteComponents:Bridges/bridge_simple
Versions:16.10.0 Frequency of
Occurrence
Occasional
Related
Issues:
Environment:ubunut 18.04, docker asterisk 16.10Attachments:
Description:system gets into state where bridges collapse within a few seconds

calls kept flowing into the system, but bridges would collapse without error within a few seconds of starting.

Seems to be happening about once a week.

This particular system has only ever run 16.10

I have a core dump from the running system and full logs including a period with debug set to 4

below is a typical call flow while the system was in this state.

{noformat}
Jun 24 10:53:29      -- Executing [555555555@routesv2-inbound:1] Set("PJSIP/trunk-00000fa4", "AGIEXITONHANGUP=yes") in new stack
Jun 24 10:53:29      -- Executing [555555555@routesv2-inbound:2] Set("PJSIP/trunk-00000fa4", "recovery-did=555555555") in new stack
Jun 24 10:53:29      -- Executing [555555555@routesv2-inbound:3] AGI("PJSIP/trunk-00000fa4", "agi://127.0.0.1/route?targ=555555555&ctx=Pre-Inbound") in new stack
Jun 24 10:53:34      -- AGI Script Executing Application: (mixmonitor) Options: (/var/spool/asterisk/monitor/1592960009.5408-4444444444-3333333333-I-1.wav,,curl 'http://127.0.0.1:8080/rest/recordingMigration/migrate?guid=1592960014936-35584&apiKey=xxxxxxxxxxxx')
Jun 24 10:53:34    == Begin MixMonitor Recording PJSIP/trunk-00000fa4
Jun 24 10:53:34      -- Started music on hold, class 'queue17', on channel 'PJSIP/trunk-00000fa4'
Jun 24 10:53:35      -- PJSIP/417-00000faa is ringing
Jun 24 10:53:35      -- PJSIP/417-00000faa is ringing
Jun 24 10:53:40      -- PJSIP/417-00000faa answered
Jun 24 10:53:40      -- Executing [activity-agi@activity:1] Set("PJSIP/417-00000faa", "AGIEXITONHANGUP=yes") in new stack
Jun 24 10:53:40      -- Executing [activity-agi@activity:2] AGI("PJSIP/417-00000faa", "agi://127.0.0.1/activityAgi") in new stack
Jun 24 10:53:40    == Begin MixMonitor Recording PJSIP/trunk-00000fa4
Jun 24 10:53:40      -- AGI Script Executing Application: (bridge) Options: (PJSIP/trunk-00000fa4,x)
Jun 24 10:53:40      -- Stopped music on hold on PJSIP/trunk-00000fa4
Jun 24 10:53:40    == Spawn extension (routesv2-inbound, 555555555, 3) exited non-zero on 'Surrogate/PJSIP/trunk-00000fa4'
Jun 24 10:53:40      -- Channel PJSIP/trunk-00000fa4 joined 'simple_bridge' basic-bridge <67256011-e289-430d-bc1c-08b96c97811e>
Jun 24 10:53:40      -- Channel PJSIP/417-00000faa joined 'simple_bridge' basic-bridge <67256011-e289-430d-bc1c-08b96c97811e>
Jun 24 10:53:40    == Begin MixMonitor Recording PJSIP/trunk-00000fa4
Jun 24 10:53:40  OK  == End MixMonitor Recording PJSIP/trunk-00000fa4
Jun 24 10:53:40  OK  == End MixMonitor Recording PJSIP/trunk-00000fa4
Jun 24 10:53:41      -- Channel PJSIP/trunk-00000fa4 left 'simple_bridge' basic-bridge <67256011-e289-430d-bc1c-08b96c97811e>
Jun 24 10:53:41      -- Channel PJSIP/417-00000faa left 'simple_bridge' basic-bridge <67256011-e289-430d-bc1c-08b96c97811e>
Jun 24 10:53:41      -- Manager 'njcontact' from 127.0.0.1, hanging up channel: PJSIP/417-00000faa
Jun 24 10:53:41    == Spawn extension (activity, activity-agi, 2) exited non-zero on 'PJSIP/417-00000faa'
Jun 24 10:53:42  OK  == End MixMonitor Recording PJSIP/trunk-00000fa4

{noformat}

for future reference:
{noformat}
the issue started somewhere between 10:38 and 10:46

debug  set to level 4 at 10:56:08

core dump triggered at 10:55:32

system restart at 10:57:30 approx
{noformat}
Comments:By: Asterisk Team (asteriskteam) 2020-06-24 00:36:56.603-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Kevin Harwell (kharwell) 2020-06-24 16:23:48.770-0500

Would it be possible to also enable SIP tracing in the Asterisk log:
{noformat}
$CLI> pjsip set logger on
{noformat}
Or get a pcap during an occurrence?

Also could you try bumping the debug up to level 5 as well?

Thanks!

By: Robert Sutton (rsutton@noojee.com.au) 2020-06-24 19:49:00.787-0500

I have a pcap from yesterdays event.

here is data extracted from it around a single call.

The call recording from the point the bridge was initiated only contains the agent's voice

{noformat}
cat call-agent.txt                                                                                                                                                                                                              
|Time     | agentip                       |                                                                                                                                                                                                          
|         |                   | 192.168.160.59    |                                                                                                                                                                                                          
|7043.977842|         100 Trying|                   |SIP Status 100 Trying                                                                                                                                                                                  
|         |(64091)  ------------------>  (5060)   |                                                                                                                                                                                                          
|7044.164403|         180 Ringing                   |SIP Status 180 Ringing                                                                                                                                                                                  
|         |(64091)  ------------------>  (5060)   |                                                                                                                                                                                                          
|7045.293780|         200 OK SDP (g711U te          |SIP Status 200 OK                                                                                                                                                                                      
|         |(64091)  ------------------>  (5060)   |                                                                                                                                                                                                          
|7045.294888|         ACK       |                   |SIP ACK From: "E CPAP DIRECT GOLD COAST RECEPTION" <sip:3333333333@192.168.160.59> To:<sip:112@agentip> CSeq:25407                                                                              
|         |(64091)  <------------------  (5060)   |                                                                                                                                                                                                          
|7045.294888|         ACK       |                   |SIP ACK From: "E CPAP DIRECT GOLD COAST RECEPTION" <sip:3333333333@192.168.160.59> To:<sip:112@agentip> CSeq:25407                                                                              
|         |(64091)  <------------------  (5060)   |                                                                                                                                                                                                          
|7056.095022|         BYE       |                   |SIP Request BYE CSeq:25408                                                                                                                                                                              
|         |(64091)  ------------------>  (5060)   |                                                                                                                                                                                                          
|7056.095022|         BYE       |                   |SIP Request BYE CSeq:25408                                                                                                                                                                              
|         |(64091)  ------------------>  (5060)   |                                                                                                                                                                                                          
|7056.095412|         200 OK    |                   |SIP Status 200 OK                                                                                                                                                                                      
|         |(64091)  <------------------  (5060)   |                                                                                                                                                                                                          
|7056.095412|         200 OK    |                   |SIP Status 200 OK                                                                                                                                                                                      
|         |(64091)  <------------------  (5060)   |  

cat call-caller.txt                                                                                                                                                                                                              
|Time     | 103.47.234.6                          |
|         |                   | 192.168.160.59    |                  
|6943.106587|         INVITE SDP (g711A g7          |SIP INVITE From: <sip:22222222222@providerip> To:<sip:11111111111@externalip:5060;line=fsadhxs> Call-ID:CLA_XRQCIWh2blh6VFVeWQtQW3dcRgN2fGouAy9UBgdYA0oONRdaBGh3d2pFf09dR1kFUEV3 CSeq:21926098
|         |(5060)   ------------------>  (5060)   |
|6943.106587|         INVITE SDP (g711A g7          |SIP INVITE From: <sip:22222222222@providerip> To:<sip:11111111111@externalip:5060;line=fsadhxs> Call-ID:CLA_XRQCIWh2blh6VFVeWQtQW3dcRgN2fGouAy9UBgdYA0oONRdaBGh3d2pFf09dR1kFUEV3 CSeq:21926098
|         |(5060)   ------------------>  (5060)   |
|6943.108645|         100 Trying|                   |SIP Status 100 Trying
|         |(5060)   <------------------  (5060)   |
|6943.108645|         100 Trying|                   |SIP Status 100 Trying
|         |(5060)   <------------------  (5060)   |
|6943.113836|         183 Session Progress          |SIP Status 183 Session Progress
|         |(5060)   <------------------  (5060)   |
|6943.113836|         183 Session Progress          |SIP Status 183 Session Progress
|         |(5060)   <------------------  (5060)   |
|6943.113964|         200 OK SDP (g711U g7          |SIP Status 200 OK
|         |(5060)   <------------------  (5060)   |
|6943.113964|         200 OK SDP (g711U g7          |SIP Status 200 OK
|         |(5060)   <------------------  (5060)   |
|6943.140271|         ACK       |                   |SIP Request INVITE ACK 200 CSeq:21926098
|         |(5060)   ------------------>  (5060)   |
|6943.140271|         ACK       |                   |SIP Request INVITE ACK 200 CSeq:21926098
|         |(5060)   ------------------>  (5060)   |
|7056.096775|         BYE       |                   |SIP Request BYE CSeq:26142
|         |(5060)   <------------------  (5060)   |
|7056.096775|         BYE       |                   |SIP Request BYE CSeq:26142
|         |(5060)   <------------------  (5060)   |
|7056.116854|         200 OK    |                   |SIP Status 200 OK
|         |(5060)   ------------------>  (5060)   |
|7056.116854|         200 OK    |                   |SIP Status 200 OK
|         |(5060)   ------------------>  (5060)   |

egrep -i  '3333333333|1592959781|00000EF7|6f508be4eab4|00000f25' asterisk-logs.txt
Jun 24 10:49:41      -- AGI Script Executing Application: (mixmonitor) Options: (/var/spool/asterisk/monitor/1592959775.5179-anonymous-03//127.0.0.1:8080/rest/recordingMigration/migrate?guid=1592959781143-20668&apiKey=xxxxxxxxxxxxxxx')
Jun 24 10:49:41      -- Executing [11111111111@routesv2-inbound:1] Set("PJSIP/trunk-00000ef7", "AGIEXITONHANGUP=yes") in new stack
Jun 24 10:49:41      -- Executing [11111111111@routesv2-inbound:2] Set("PJSIP/trunk-00000ef7", "recovery-did=11111111111") in new stack
Jun 24 10:49:41      -- Executing [11111111111@routesv2-inbound:3] AGI("PJSIP/trunk-00000ef7", "agi://127.0.0.1/route?targ=11111111111&ctx=
Jun 24 10:49:47      -- AGI Script Executing Application: (mixmonitor) Options: (/var/spool/asterisk/monitor/1592959781.5182-3333333333-0://127.0.0.1:8080/rest/recordingMigration/migrate?guid=1592959787095-23261&apiKey=xxxxxxxxxxxxxxx')
Jun 24 10:49:47    == Begin MixMonitor Recording PJSIP/trunk-00000ef7
Jun 24 10:49:47      -- Started music on hold, class 'queue', on channel 'PJSIP/trunk-00000ef7'
Jun 24 10:49:58    == Executing [curl 'http://127.0.0.1:8080/rest/recordingMigration/migrate?guid=1592959781143-20668&apiKey=8d26f182-978
Jun 24 10:51:22      -- PJSIP/112-00000f25 is ringing
Jun 24 10:51:22      -- PJSIP/112-00000f25 is ringing
Jun 24 10:51:23      -- PJSIP/112-00000f25 answered
Jun 24 10:51:23      -- Executing [activity-agi@noojee-activity:1] Set("PJSIP/112-00000f25", "AGIEXITONHANGUP=yes") in new stack
Jun 24 10:51:23      -- Executing [activity-agi@noojee-activity:2] AGI("PJSIP/112-00000f25", "agi://127.0.0.1/activityAgi") in new stack
Jun 24 10:51:23    == Begin MixMonitor Recording PJSIP/trunk-00000ef7
Jun 24 10:51:23      -- AGI Script Executing Application: (bridge) Options: (PJSIP/trunk-00000ef7,x)
Jun 24 10:51:23      -- Stopped music on hold on PJSIP/trunk-00000ef7
Jun 24 10:51:23      -- Channel PJSIP/trunk-00000ef7 joined 'simple_bridge' basic-bridge <a96a4bc4-e774-414a-8688-6f508be4eab4>
Jun 24 10:51:23    == Spawn extension (routesv2-inbound, 11111111111, 3) exited non-zero on 'Surrogate/PJSIP/trunk-00000ef7'
Jun 24 10:51:23      -- Channel PJSIP/112-00000f25 joined 'simple_bridge' basic-bridge <a96a4bc4-e774-414a-8688-6f508be4eab4>
Jun 24 10:51:23    == Begin MixMonitor Recording PJSIP/trunk-00000ef7
Jun 24 10:51:24  OK  == End MixMonitor Recording PJSIP/trunk-00000ef7
Jun 24 10:51:24  OK  == End MixMonitor Recording PJSIP/trunk-00000ef7
Jun 24 10:51:34      -- Channel PJSIP/112-00000f25 left 'simple_bridge' basic-bridge <a96a4bc4-e774-414a-8688-6f508be4eab4>
Jun 24 10:51:34      -- Channel PJSIP/trunk-00000ef7 left 'simple_bridge' basic-bridge <a96a4bc4-e774-414a-8688-6f508be4eab4>
Jun 24 10:51:34    == Spawn extension (noojee-activity, activity-agi, 2) exited non-zero on 'PJSIP/112-00000f25'
Jun 24 10:51:34  [Jun 24 10:51:34] NOTICE[418]: manager.c:4491 action_hangup: Request to hangup non-existent channel: PJSIP/112-00000F25

{noformat}

By: Kevin Harwell (kharwell) 2020-06-25 13:45:25.696-0500

Looks like from the log the mixmonitor stopped recording for one leg of the call just after the bridge was joined. Not sure why it stopped, but probably why you only hear the agent's voice.

I'm having trouble though following the call trace, and want to make sure I'm reading it right. For the "call-agent" is that Asterisk ----> agent endpoint? And for "call-caller" is that the caller endpoint ----> Asterisk? For the caller everything is doubled but has the same timestamp, so I'm assuming this is an artifact of extracting the data?

Is there any way we could get a copy of the original pcap? If it has private data in it you don't want to attach here (made public) you can send it as an attachment to asteriskteam@digium.com with ASTERISK-28960 as the subject instead.

By: Kevin Harwell (kharwell) 2020-06-25 13:46:24.699-0500

Also can you attach or post the relevant pjsip endpoint configuration, and executing dialplan?

By: Robert Sutton (rsutton@noojee.com.au) 2020-06-25 18:52:19.154-0500

I've provided pcap, cli output, and live core dump via "asteriskteam@digium.com with ASTERISK-28960 as the subject"

You are correct about        
"call-agent" is Asterisk ----> agent endpoint    
 and
     "call-caller" is caller endpoint ----> Asterisk

I don't know why there is doubling in the log, this is how wireshark presented the data. could this be a symptom of the underlying cause?


below is the dialplan involved, as you can see all the action takes place in AGI
In the AGI the caller is played hold music and possibly other messages while waiting, we are not using asterisk's native queues
One or more agents are originated via dialplan to AGI.
When an agent answers, a bridge is initiated via AGI

{noformat}
[routesv2-inbound]
exten => _[s0-9].,1,set(AGIEXITONHANGUP=yes)
exten => _[s0-9].,n,set(recovery-did=${EXTEN})
exten => _[s0-9].,n,AGI(agi://${TOMCAT_HOST}/route?targ=${EXTEN}&ctx=Pre-Inbound)
exten => _[s0-9].,n,goto(routesv2-checks,${EXTEN},1)
exten => _[s0-9].,n,Hangup

[activity]
exten => activity-agi,1,set(AGIEXITONHANGUP=yes)
exten => activity-agi,n,AGI(agi://${TOMCAT_HOST}/activityAgi)
exten => activity-agi,n,wait(0.5)
exten => activity-agi,n,goto(activity-agi|1)

{noformat}

agent endpoint configuration (realtime)
{noformat}
ParameterName                      : ParameterValue
=========================================================
100rel                             : yes
accept_multiple_sdp_answers        : false
accountcode                        :
acl                                : deny/permit
aggregate_mwi                      : true
allow                              : (ulaw|g729|ilbc)
allow_overlap                      : true
allow_subscribe                    : true
allow_transfer                     : true
aors                               : 112
asymmetric_rtp_codec               : false
auth                               : 112
bind_rtp_to_media_address          : false
bundle                             : false
call_group                         :
callerid                           : Email Desk
callerid_privacy                   : allowed_not_screened
callerid_tag                       :
connected_line_method              : invite
contact_acl                        :
context                            : routesv2-handset
cos_audio                          : 0
cos_video                          : 0
device_state_busy_at               : 2
direct_media                       : false
direct_media_glare_mitigation      : none
direct_media_method                : invite
disable_direct_media_on_nat        : false
dtls_auto_generate_cert            : No
dtls_ca_file                       :
dtls_ca_path                       :
dtls_cert_file                     :
dtls_cipher                        :
dtls_fingerprint                   : SHA-256
dtls_private_key                   :
dtls_rekey                         : 0
dtls_setup                         : active
dtls_verify                        : No
dtmf_mode                          : rfc4733
fax_detect                         : false
fax_detect_timeout                 : 0
follow_early_media_fork            : true
force_avp                          : false
force_rport                        : true
from_domain                        :
from_user                          :
g726_non_standard                  : false
ice_support                        : true
identify_by                        : username,ip
ignore_183_without_sdp             : false
inband_progress                    : false
incoming_mwi_mailbox               :
language                           :
mailboxes                          : 112
max_audio_streams                  : 1
max_video_streams                  : 1
media_address                      :
media_encryption                   : no
media_encryption_optimistic        : false
media_use_received_transport       : false
message_context                    :
moh_passthrough                    : false
moh_suggest                        : default2
mwi_from_user                      :
mwi_subscribe_replaces_unsolicited : no
named_call_group                   :
named_pickup_group                 :
notify_early_inuse_ringing         : false
one_touch_recording                : false
outbound_auth                      : 112
outbound_proxy                     :
pickup_group                       : 1
preferred_codec_only               : false
record_off_feature                 : automixmon
record_on_feature                  : automixmon
refer_blind_progress               : true
rewrite_contact                    : true
rpid_immediate                     : false
rtcp_mux                           : false
rtp_engine                         : asterisk
rtp_ipv6                           : false
rtp_keepalive                      : 0
rtp_symmetric                      : true
rtp_timeout                        : 0
rtp_timeout_hold                   : 0
sdp_owner                          : -
sdp_session                        : Asterisk
send_connected_line                : yes
send_diversion                     : true
send_pai                           : false
send_rpid                          : false
set_var                            :
srtp_tag_32                        : false
sub_min_expiry                     : 0
subscribe_context                  :
suppress_q850_reason_headers       : false
t38_udptl                          : false
t38_udptl_ec                       : none
t38_udptl_ipv6                     : false
t38_udptl_maxdatagram              : 0
t38_udptl_nat                      : false
timers                             : yes
timers_min_se                      : 90
timers_sess_expires                : 1800
tone_zone                          :
tos_audio                          : 0
tos_video                          : 0
transport                          : transport-udp-nat
trust_connected_line               : yes
trust_id_inbound                   : true
trust_id_outbound                  : true
use_avpf                           : false
use_ptime                          : false
user_eq_phone                      : false
voicemail_extension                : *3
webrtc                             : no

{noformat}

By: Robert Sutton (rsutton@noojee.com.au) 2020-06-25 18:56:18.859-0500

I was reading the issue title and saw that it is ambiguous.

The system remains stable for long periods of time, but when it gets into this state bridges fail.

By: Joshua C. Colp (jcolp) 2020-06-29 03:57:50.727-0500

You're using the Bridge dialplan application which does more than just bridge, it also steals a channel away from another thread which does involve setting a temporary hangup flag. When this occurs do bridges that do not occur as a result of the Bridge dialplan application work?

By: Robert Sutton (rsutton@noojee.com.au) 2020-06-29 04:27:28.526-0500

On this particular instance I didn't check other bridges. But on another occasion when we had similar symptoms, we shut down the AGI and tried to used asterisk queues, but the calls were still unable to bridge.

By: Joshua C. Colp (jcolp) 2020-06-29 05:02:07.815-0500

I would suggest adding some additional logging to the dialplan[1] to state who hung up in the situation. This will narrow down the potential area. As it is I'm not seeing anything standing out, except a lot of BYE requests coming in with a non-normal reason (Recovery on timer expiry) suddenly. You can see it by filtering using "sip.reason_cause_q850 == 102". There's a few at the start, but about 7000 packets in there is a lot. If this is what your trunk is - then I'd expect that channel to be hung up and things to terminate. That may be what is happening. In fact, in some cases Asterisk has already bridged things enough that the cause code is getting passed through to the remote side (not in all cases mind you, but some).

[1] https://wiki.asterisk.org/wiki/pages/viewpage.action?pageId=20185363

By: Asterisk Team (asteriskteam) 2020-07-13 12:00:01.273-0500

Suspended due to lack of activity. This issue will be automatically re-opened if the reporter posts a comment. If you are not the reporter and would like this re-opened please create a new issue instead. If the new issue is related to this one a link will be created during the triage process. Further information on issue tracker usage can be found in the Asterisk Issue Guidlines [1].

[1] https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines