[Home]

Summary:ASTERISK-29030: res_rtp_asterisk: Additional RTP-frame (with wrong SSRC) gets inserted when switching from progress to established
Reporter:Matthias Hensler (cubbi)Labels:patch
Date Opened:2020-08-14 11:28:55Date Closed:2021-04-30 09:00:34
Priority:MinorRegression?Yes
Status:Closed/CompleteComponents:Channels/chan_sip/General Resources/res_rtp_asterisk
Versions:16.10.0 16.11.0 16.12.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:CentOS 7Attachments:( 0) 0001-res_rtp_asterisk-More-robust-timestamp-checking.patch
( 1) 60350068.pcap
( 2) 60350082.pcap
( 3) 60408587.pcap
( 4) 64008592.pcap
( 5) debug_all2.txt
Description:Hi there,

we have a strange problem in which a broken RTP-frame (old sequence-number but
new ssrc) is sent in the moment a call gets established after its progress
phase.

In that scenario we are receiving calls from our upstream provider (via
TELES-SBC) and forwarding them to a customer. Both calllegs are provided via
chan_sip and use G711a as voicecodec. A ringtone is provided via
progressinband currently (I want to switch that off, as this is at least
part of the problem).

Before I add the details I have to emphasize, that only calls to one specific
customer show the problem, and no other calls which are distributed in exact the
same way. So far, after checking numerous traces, I have no clue what so ever
about which difference in the setup for this one customer is, that leads to
this issue.

Via upstream this bug is causing the Teles to drop all audioframes after
receiving that broken RTP frame.

It might (and I cannot really confirm this at this moment) that the bug was
introduced with Asterisk 16.10.0, as the first problem report was shortly after we
updated from 16.9.0. So far we already updated to 16.11.0 and 16.12.0 which
all show the same issue (however the problem is so obscure that it took us
really long to figure the rootcause for the customers problem).

As such I can say with confidence that the problem is present in versions
16.10.0, 16.11.0 and 16.12.0, while 16.9.0 could be ok. To make matters
complicated in pointing to a specific version is, that the customer also did
several updates of his software in the meantime, so it might be difficult to
rollback to the specific version in which everything was ok.


I am not sure on how to proceed here, to find the point in where Asterisk gets
mixed up in the RTP flow, so I am hoping you have either a clue where the
issue might come from or on how to debug (and of course fix it).

This Asterisk installation processes several 10000 calls a day, with >200 calls
in parallel, but as said only one client triggers that issue, while all other
clients are fine.



OK, up to the setup and configuration (I "XXX"ed some values):

Our "sip.conf" looks like this:
{noformat}
[general]
context=inbound_mapping
allowoverlap=no
realm=XXX
bindport=5060
language=de
canreinvite=no
alwaysauthreject=yes
allowguest=no
bindaddr=::
tcpenable=yes
tcpbindaddr=::
tlsenable=yes
tlsbindaddr=::
tlscertfile=/etc/asterisk/certificate.pem
tlscipher=HIGH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK
tlsclientmethod=tlsv1
transport=udp,tls
srvlookup=no
allowsubscribe=no
disallow=all
allow=g722
allow=alaw
t38pt_udptl=yes,redundancy,maxdatagram=400
progressinband=yes
prematuremedia=no
sendrpid=yes
trustrpid=no
send_diversion=no
allowtransfer=no
tos_sip=cs3
tos_audio=ef

[our_inbound]
type=peer
host=XXX.XXX.XXX.XXX
qualify=yes
context=inbound_provider
disallow=all
allow=g722
allow=alaw

[iads](!)
type=friend
context=iads_unregistered
dtmfmode=auto
host=dynamic
amaflags=documentation
canreinvite=no
ignoresdpversion=yes

[the_customer](iads)
type=friend
accountcode=0049XXXX
setvar=NOTRUF_UUI=002DXXXXXXX
secret=XXXXX
context=iads_registered
acl=public
{noformat}

The interesting part of our "extensions.conf" looks like this:

{noformat}
[inbound_provider]
exten => _+X.,1,Goto(00${EXTEN:1},1)
exten => _X.,1,Noop(Inboundcall XXX)
exten => _X.,n,GotoIf($[${DIALPLAN_EXISTS(routing_intern,${EXTEN},1)} = 1]?DoRouting:)
exten => _X.,n,Noop(Zielnummer ${EXTEN} existiert nicht)
exten => _X.,n,Hangup(1)
exten => _X.,n(DoRouting),Goto(routing_intern,${EXTEN},1)

[routing_intern]
include => inbound_mapping

[inbound_mapping]
exten => _0049XXX!,1,Set(Peerstring="SIP/the_customer")
exten => _0049XXX!,n(DoMapping),Gosub(gosub-do_mapping,${EXTEN},1(${Peerstring},3,0049,,0049XXX,XXX,XXX,XXX,XX))

[gosub-do_mapping]
exten => _00Z.,1,Noop(Mapping)
exten => _00Z.,n,Set(DIAL_ARGS=)
exten => _00Z.,n,Set(CallTimeout=180)
exten => _00Z.,n,Dial(${ARG1},${CallTimeout},${DIAL_ARGS})
{noformat}

So in short: an inbound call from "our_inbound" is signaled to
"Dial(SIP/our_customer,180,)". All stuff inbetween is just for checking
channellimits, billing, redirections, etc. and should not be important for
this issue.


In the following I will present the SIP-traces and RTP-packets of a broken
call. In that I switched the ip-address of "our_inbound" to 1.1.1.1, the
ip-address of the Asteriskserver to 2.2.2.2 and the ip-address of the client
"the_customer" to 3.3.3.3. (For our reference the pcap-trace for LAG 1 which
is 1.1.1.1 <-> 2.2.2.2 has the id 60155926 and the trace for LAG 2 which is
2.2.2.2 <-> 3.3.3.3 has the id 60155922). In case of "our_inbound" the
RTP-media is terminated at a different ip for which I will use 1.1.1.2.

{noformat}
-----------------------------------------------------------------------------
At 0.000000 the call is initiated by "our_inbound":
-----------------------------------------------------------------------------
INVITE sip:+49XXX@customer.XXX.net;transport=udp;user=phone SIP/2.0
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCE106BF410B7
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 INVITE
Contact: <sip:sgc_c@1.1.1.1:5060;transport=udp>
Allow-Events: refer
Allow: INVITE, ACK, BYE, CANCEL, OPTIONS, PRACK, INFO, REFER, NOTIFY, SUBSCRIBE, UPDATE
Content-Type: application/sdp
Max-Forwards: 70
P-Asserted-Identity: <sip:+49XXX@mgc-cluster.XXX.net;user=phone>
P-Preferred-Identity: <sip:+49XXX@mgc-cluster.XXX.net>
Privacy: none
Supported: timer, replaces, histinfo, 100rel
User-Agent: TELES-SBC
Content-Length:   229

v=0
o=- 2285767812167987084 1 IN IP4 1.1.1.1
s=TELES-SBC
c=IN IP4 1.1.1.2
t=0 0
m=audio 11940 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=silenceSupp:off - - - -

-----------------------------------------------------------------------------
At 0.001351 Asterisk is replying with Trying:
-----------------------------------------------------------------------------
SIP/2.0 100 Trying
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCE106BF410B7;received=1.1.1.1
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 INVITE
Server: Asterisk PBX 16.12.0
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Session-Expires: 1800;refresher=uas
Contact: <sip:+49XXX@2.2.2.2:5060>
Content-Length: 0

-----------------------------------------------------------------------------
In parallel Asterisk now signales "our_customer":
-----------------------------------------------------------------------------
INVITE sip:+49XXX@3.3.3.3 SIP/2.0
Via: SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK7d70de6a
Max-Forwards: 70
From: <sip:+49XXX@2.2.2.2>;tag=as437a6f2f
To: <sip:+49XXX@3.3.3.3>
Contact: <sip:+49XXX@2.2.2.2:5060>
Call-ID: 46f3325b1867f4985b09200e05ff2519@2.2.2.2:5060
CSeq: 102 INVITE
User-Agent: Asterisk PBX 16.12.0
Date: Fri, 14 Aug 2020 07:41:31 GMT
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Remote-Party-ID: "+49XXX" <sip:+49XXX@2.2.2.2>;party=calling;privacy=off;screen=no
Content-Type: application/sdp
Content-Length: 278

v=0
o=root 1849926336 1849926336 IN IP4 2.2.2.2
s=Asterisk PBX 16.12.0
c=IN IP4 2.2.2.2
t=0 0
m=audio 22504 RTP/AVP 8 9 101
a=rtpmap:8 PCMA/8000
a=rtpmap:9 G722/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv

-----------------------------------------------------------------------------
First the customer replies with Trying:
-----------------------------------------------------------------------------
SIP/2.0 100 Trying
Via: SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK7d70de6a
To: <sip:+49XXX@3.3.3.3>
From: <sip:+49XXX@2.2.2.2>;tag=as437a6f2f
Call-ID: 46f3325b1867f4985b09200e05ff2519@2.2.2.2:5060
CSeq: 102 INVITE
Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, OPTIONS, INFO, SUBSCRIBE, UPDATE
User-Agent: Swyx LinkMgr/12.11 (SW.Rel12.10_20200423.1)
Content-Length: 0

-----------------------------------------------------------------------------
And then (after around 0.4s) with Ringing:
-----------------------------------------------------------------------------
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK7d70de6a
Contact: <sip:+49XXX@3.3.3.3:65002;transport=udp>
To: "XXX, XXX"<sip:+49XXX@3.3.3.3>;tag=a236c740
From: <sip:+49XXX@2.2.2.2>;tag=as437a6f2f
Call-ID: 46f3325b1867f4985b09200e05ff2519@2.2.2.2:5060
CSeq: 102 INVITE
Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, OPTIONS, INFO, SUBSCRIBE, UPDATE
User-Agent: Swyx LinkMgr/12.11 (SW.Rel12.10_20200423.1)
Content-Length: 0

-----------------------------------------------------------------------------
With that Asterisk starts (at 0.426196) ringing to "our_inbound":
-----------------------------------------------------------------------------
SIP/2.0 180 Ringing
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCE106BF410B7;received=1.1.1.1
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>;tag=as32a03bca
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 INVITE
Server: Asterisk PBX 16.12.0
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Session-Expires: 1800;refresher=uas
Contact: <sip:+49XXX@2.2.2.2:5060>
Content-Length: 0

-----------------------------------------------------------------------------
And then (because of "inbandprogress=yes") Asterisk sends progress and starts
sending RTP:
-----------------------------------------------------------------------------
SIP/2.0 183 Session Progress
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCE106BF410B7;received=1.1.1.1
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>;tag=as32a03bca
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 INVITE
Server: Asterisk PBX 16.12.0
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Session-Expires: 1800;refresher=uas
Contact: <sip:+49XXX@1.1.1.1:5060>
Content-Type: application/sdp
Require: timer
Content-Length: 252

v=0
o=root 829526440 829526440 IN IP4 1.1.1.1
s=Asterisk PBX 16.12.0
c=IN IP4 1.1.1.1
t=0 0
m=audio 46586 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv

-----------------------------------------------------------------------------
For the next 3.2s Asterisk is sending RTP frames to 1.1.1.2 containing a
ringtone. In total 161 packets are sent with the SSRC 783be076.
The flow is 2.2.2.2:46586 -> 1.1.1.2:11940
-----------------------------------------------------------------------------
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27297, Time=160, Mark
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27298, Time=320
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27299, Time=480
[...]
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27455, Time=25440
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27456, Time=25600
PT=ITU-T G.711 PCMA, SSRC=0x783BE076, Seq=27457, Time=25760

-----------------------------------------------------------------------------
Now "our_customer" answers the call:
-----------------------------------------------------------------------------
SIP/2.0 200 OK
Via: SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK7d70de6a
Contact: <sip:+49XXX@2.2.2.2:65002;transport=udp>
To: "XXX, XXX"<sip:+49XXX@3.3.3.3>;tag=a236c740
From: <sip:+49XXX@2.2.2.2>;tag=as437a6f2f
Call-ID: 46f3325b1867f4985b09200e05ff2519@2.2.2.2:5060
CSeq: 102 INVITE
Allow: INVITE, ACK, CANCEL, BYE, NOTIFY, REFER, OPTIONS, INFO, SUBSCRIBE, UPDATE
Content-Type: application/sdp
User-Agent: Swyx LinkMgr/12.11 (SW.Rel12.10_20200423.1)
Content-Length: 208

v=0
o=- 3330958455 1 IN IP4 3.3.3.3
s=Swyx LinkMgr
c=IN IP4 3.3.3.3
t=0 0
m=audio 55542 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-15
a=ptime:20

-----------------------------------------------------------------------------
Asterisk acks:
-----------------------------------------------------------------------------
ACK sip:+49XXX@3.3.3.3:65002;transport=udp SIP/2.0
Via: SIP/2.0/UDP 2.2.2.2:5060;branch=z9hG4bK0ea35c4b
Max-Forwards: 70
From: <sip:+49XXX@2.2.2.2>;tag=as437a6f2f
To: <sip:+49XXX@3.3.3.3>;tag=a236c740
Contact: <sip:+49XXX@2.2.2.2:5060>
Call-ID: 46f3325b1867f4985b09200e05ff2519@2.2.2.2:5060
CSeq: 102 ACK
User-Agent: Asterisk PBX 16.12.0
Content-Length: 0
{noformat}

-----------------------------------------------------------------------------
RTP between Asterisk and "our_customer" is exchanged.
Asterisk sends 311 packets with SSRC 7f5a528c from 2.2.2.2:22504 to
3.3.3.3:55542, while "our_customer" sends 317 packets with SSRC 6b48138f from
3.3.3.3:55542 to 2.2.2.2:22504:
-----------------------------------------------------------------------------
{noformat}
3.687479 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14272, Time=0, Mark
3.687489 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14273, Time=160
3.707447 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14274, Time=320
3.726456 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14275, Time=480
3.746436 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14276, Time=640
3.764300 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=140, Time=3320682905, Mark
3.766431 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14277, Time=800
3.774281 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=141, Time=3320683065
3.786686 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14278, Time=960
3.794455 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=142, Time=3320683225
[...]
9.920692 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14587, Time=50400
9.934279 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=449, Time=3320732345
9.41389 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14588, Time=50560
9.964234 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=450, Time=3320732505
{noformat}

-----------------------------------------------------------------------------
Meanwhile Asterisk establishes (at 3.660704) the call with "our_inbound"
-----------------------------------------------------------------------------
{noformat}
SIP/2.0 200 OK
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCE106BF410B7;received=1.1.1.1
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>;tag=as32a03bca
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 INVITE
Server: Asterisk PBX 16.12.0
Allow: INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, SUBSCRIBE, NOTIFY, INFO, PUBLISH, MESSAGE
Supported: replaces, timer
Session-Expires: 1800;refresher=uas
Contact: <sip:+49XXX@2.2.2.2:5060>
Content-Type: application/sdp
Require: timer
Content-Length: 252

v=0
o=root 829526440 829526440 IN IP4 2.2.2.2
s=Asterisk PBX 16.12.0
c=IN IP4 2.2.2.2
t=0 0
m=audio 46586 RTP/AVP 8 101
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
a=ptime:20
a=maxptime:150
a=sendrecv


-----------------------------------------------------------------------------
"our_inbound" acks
-----------------------------------------------------------------------------
ACK sip:+49XXX@2.2.2.2:5060 SIP/2.0
Via: SIP/2.0/UDP 1.1.1.1:5060;branch=z9hG4bK00E0F515158AAD9FCF7F6278F296
From: "+49XXX" <sip:+49XXX@mgc-cluster.XXX.net;user=phone>;tag=0UUHS0000030000E1D0101Cu045FV8G07E1469
To: "+49XXX" <sip:+49XXX@customer.XXX.net;user=phone>;tag=as32a03bca
Call-ID: 8415e00015f5-5f36402b-2d18768f-23c282b0-3916f46-01
CSeq: 163 ACK
Contact: <sip:+49XXX@1.1.1.1:5060;transport=udp>
Max-Forwards: 70
Content-Length:     0
{noformat}

-----------------------------------------------------------------------------
And now the fun part with the real issues begins.
"our_inbound" sends 313 RTP-packets with SSRC 7f5a528c from 1.1.1.2:11940 to
2.2.2.2:46586. Asterisk switches its SSRC to 6b48138f sending 316 packets from
2.2.2.2:46586 to 1.1.1.2:11940.

However, just after starting Asterisk includes another RTP-frame with the new
SSRC 6b48138f but with the old(!) sequence number. At this moment RTP breaks
on the Teles at "our_inbound".
-----------------------------------------------------------------------------
{noformat}
3.695997 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14273, Time=160, Mark
3.696054 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=27458, Time=0             (<---- !)
3.715943 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14274, Time=320
3.734968 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14275, Time=480
3.754930 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14276, Time=640
3.762733 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=140, Time=3320682905
3.774945 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14277, Time=800
3.782714 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=141, Time=3320683065
3.795174 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14278, Time=960
3.802902 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=142, Time=3320683225
[...]
9.942691 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=449, Time=3320732345
9.949898 PT=ITU-T G.711 PCMA, SSRC=0x6B48138F, Seq=14588, Time=50560
9.962661 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=450, Time=3320732505
9.983132 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=451, Time=3320732655
10.007195 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=452, Time=3320732825
10.022785 PT=ITU-T G.711 PCMA, SSRC=0x7F5A528C, Seq=453, Time=3320732985
{noformat}

-----------------------------------------------------------------------------
I am speculating, but something causes Asterisk to send a spurious RTP-frame
which seems to belong to the inband-progress signal (as it uses the old
sequencenumbers), but the RTP-stack does not longer know the old stream and
thus sending it with the new SSRC-identifier and a timestamp of 0.

Still, that issue seems to only affect calls which a routed to "our_customer".
As to why I am at total loss, as I checked a lot of calls to other clients,
which all look the same (Asterisk creating an inband ringtone and switching to
a new SSRC when the call established). All calls to the other customers make a
perfect switch on the leg to "our_inbound", only on calls to "our_customer"
show that spurious RTP-frame.


As always I am happy to provide any additional infos or details needed.
Comments:By: Asterisk Team (asteriskteam) 2020-08-14 11:28:56.407-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution. Please note that log messages and other files should not be sent to the Sangoma Asterisk Team unless explicitly asked for. All files should be placed on this issue in a sanitized fashion as needed.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Matthias Hensler (cubbi) 2020-08-17 09:22:28.986-0500

Thanks for properly formatting my traces.

I just added some additional information which might help to figure out the problem. The Swyx-agent at "our_customer" seems to send two RTP packets at the same moment the call is established, although the RTP-timestamp suggest that the packets should be actually 20ms apart. To properly visualize this I added the pcap-timestamps to the relevant RTP traces above.

This seems to be a strange behaviour on Swyx's-side, as no other client seems to do this. Maybe that behavior is the origin of the problem on Asterisk's-side. Indeed it seems that Asterisk does not create the spurious RTP-packet by itself, but just inserts the wrong SSRC in it. As to why I unfortunately still have no clue, it seems to me that all calls coming from Swyx looking the same, but only in some calls Asterisk uses the old/wrong SSRC in its relayed RTP.

By: Joshua C. Colp (jcolp) 2020-08-17 12:22:15.293-0500

Thanks for the report and debug. However we also need protocol specific debug captured at the time of the issue. Please include the following:

* Asterisk log files generated using the instructions on the Asterisk wiki [1], with the appropriate protocol debug options enabled, e.g. 'pjsip set logger on' if the issue involves the chan_pjsip channel driver.
* Configuration information for the relevant channel driver, e.g. pjsip.conf.
* A wireshark compatible packet capture, captured at the same time as the Asterisk log output.

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

In this case the debug level output and a packet capture would be good.

By: Matthias Hensler (cubbi) 2020-08-18 04:59:24.204-0500

Attached is a current calltrace with the documented problem.

As these are pcap-files on further filtering on numbers and ip-addresses are made.

60350082.pcap is an inboundcall "our_provider" to asterisk
60350068.pcap is the outboundleg asterisk to "our_customer"/Swyx.

debug_all2.txt should contain all relevant Asterisk-Debugoutput with "core debug 5" and "sip debug on". Please note that I had to filter that file to remove other, non relevant calls. I tried to be careful to not remove any output for this specific call, but should anything missing please just note, as I have the unfiltered output still available.

Also please note: the RTP-frames in the pcap-files are stripped of the payload, as the actual audio should not be relevant and cannot be submitted due to privacy issues.

By: Matthias Hensler (cubbi) 2020-08-18 05:10:05.036-0500

For reference for this new call. In 60350068.pcap you will see two RTP-packets from Swyx to Asterisk: at 4.557922 with SSRC 0x46FE16E5, sequence 6753 and timestamp 0 (plus mark) and at just a moment later at 4.557931 a second packet with the same SSRC, sequence 6754 and a timestamp of 160.

Why Swyx does this is unknown to me. However Asterisk seems to garble up these packets when relaying to the inbound leg:

In 60350082.pcap you see these two RTP-packets at 4.567921 with SSRC 0x46FE16E5, sequence 6764 and timestamp 160 (plus mark), but also a packet at 4.567951 with the same SSRC, but sequence 14366 (which corresponds to the previous RTP-stream containing the inband-ringtone) and a timestamp of 0.

By: Matthias Hensler (cubbi) 2020-08-18 05:33:30.932-0500

Some speculation: as it seems that the problem surfaced after updating from 16.9.0 to 16.10.0 I browsed a diff between both versions. There is this one commit f2ba1919e6 from ASTERISK-28773 with overrides the SSRC of an RTP-bridge, when a Mark is received.

I could be wrong, as I am not that deep into the internal RTP flow of Asterisk, but it seems strange to me, that only the SSRC is changed, without looking at the sequence numbers and timestamps. Could it be, that because of an unfortunate timing issue, that Asterisk wants to issue a last ringtone-frame, but because of commit f2ba1919e6 it is gets its SSRC overwritten by the mark-frame received from the newly established call? If in doubt I could recompile our version with that patch reversed. But maybe this is already a clue for you and you have a proper solution/patch we could try.

By: Kevin Harwell (kharwell) 2020-08-18 12:39:09.731-0500

If you could reverse the patch for ASTERISK-28773 and see if that fixes your issue that would be quite helpful.

By: Matthias Hensler (cubbi) 2020-08-19 03:56:41.130-0500

Well, reversing the patch for ASTERISK-28773 did something unexpected. The "additional" RTP-frame is still there, and its sequence number is still matching the previous stream of the inband-ringing. However, the SSRC does not longer match the new RTP-stream of the established call (so the patch for ASTERISK-28773 is at least related), but is now a complete different SSRC which does not match any other SSRC of the call.

Could it be, that Asterisk tries to mux one RTP-frame from the established call into the stream with the inband-ringing (therefore using the sequence numbers from that stream), but in parallel creating the new stream? If you carefully check how the sequencenumbers on the traces above a created, you see that all frames starting with sequence 6754 and timestamp of 160 in 60350068.pcap are copied 1:1 in 60350082.pcap (only that 6754 gets the mark in that trace), while the frame with 6743 and timestamp of 0 in 60350068.pcap is copied without mark and the SSRC from the inband-ringing in 60350082.pcap.

In our new traces (which I will upload now) that first frame is copied with a completely different SSRC and is completely unrelated to any of the other frames. I am thinking that I saw this issue earlier, but did not investigate, as it did not cause any trouble. But my suspicion is, that this strange behaviour of Swyx is causing some strange timing issues on the asterisk, which just surfaced with patch ASTERISK-28773, as now a previous unrelated RTP-frame suddenly gets a SSRC from an related RTP-stream and causing hickups because of the timejump this frame creates.

By: Matthias Hensler (cubbi) 2020-08-19 04:20:34.051-0500

Attached are two new traces for a call with the patch from ASTERISK-28773 reversed. I did not create a debug-output from Asterisk itself this time, as it is a little bit tedious. If you also need the debug-output let me know and I will try to capture that as well.

Let me summarize what I can see in this traces:
64008592.pcap is once again an inbound-call from "our_provider" to the asterisk. 60408587.pcap is the callleg forwarded from the Asterisk to "our_customer" (Swyx).

If you first look at 60408587.pcap:
* at 4.266133 the Swyx establishes the call with its SIP-OK
* at 4.304191 a first RTP-frame (SSRC 0x180F1B76, Sequence 6281, Timestamp 0 and Mark) is sent by the Swyx.
* at 4.304202 a second RTP-frame (Sequence 6282, Timestamp 160 and no Mark) is sent in parallel by the Swyx.
* from 4.324172 on the RTP-stream from the Swyx seems normal.

However, if you look at 604008592.pcap:
* at 4.272798 the Asterisk establishes the call to "our_provider" with the SIP-OK
* at 4.309923 the first RTP-frame from the established call is send (SSRC 0x180F1B76, Sequence 6282, Timestamp 160 and Mark set)
* at 4.309965 an unrelated RTP-frame is send in parallel (SSRC 0x8E7BA5B, Sequence 7131, Timestamp 0 and no mark)
* from 4.329884 on the RTP-stream looks normal and is a 1:1 copy of the stream from 60408587.pcap.

Obviously Asterisk seems to correctly relay the RTP from Sequence 6282 and onwards. In direction of "our_provider" this frame has the Mark-bit set, while the Swyx-frame has no Mark set.

However, the first RTP-frame (Sequence 6281) received from the Swyx is sent as the second RTP-frame to "our_provider". There it gets its Mark-bit stripped and the sequence-number if changed so it matches the earlier inband-ringing-stream. However the SSRC is now completely unrelated. With the patch from ASTERISK-28773 the SSRC is also copied from the inband-ringing-stream, resulting in a timejump because of the timestamp of 0.

By: Matthias Hensler (cubbi) 2021-04-20 06:19:16.558-0500

Since this bug is getting a little bit old now, I want to give some updates.

I noticed some rework in the RTP-engine related to ASTERISK-29300. We tried the changed with Asterisk 16.17.0, and while there seems to be some improvement, the original problem is still present. The behaviour changed a bit, as in the "wrong" RTP-frame in the outgoing-stream now has a wrong sequencenumber unrelated to the incoming-stream. It seems that Asterisk is generating a wrong frame with an own sequence, instead of reusing the sequencenumbers from the previous incoming stream. However the fundamentally issues stays the same: because the Swyx-client sends two frames at once, Asterisk issues both frames into its outgoing stream, but one of the frame with a completely wrong sequence number.

It would be nice if that issue could be fixed, please tell if any further input or information is needed. Thanks a lot.

Edit: one additional information: Swyx uses a session-timer of 90 seconds be default, causing a reinvite all 45 seconds (each reinvite with a new RTP-stream and each RTP-stream with two leading frames at the beginning). Only sometimes the outgoing stream from asterisk uses a wrong sequence number, while (most?) times the sequence number is in sync. So that might be a real timing/corner issue.

By: Sean Bright (seanbright) 2021-04-21 15:54:00.313-0500

[~cubbi], could you please try the attached patch and let us know your results?

Edit: The patch is going to need some work before merging, but the general problem is that packets with timestamp 0 are being handled differently. RFC3550 says that the initial timestamp SHOULD be chosen randomly, but it looks like Swyx opts to always use 0 for whatever reason. This is probably why you are seeing it consistently with this particular hardware combination.

By: Matthias Hensler (cubbi) 2021-04-26 09:26:23.644-0500

Hi [~seanbright]
many thanks for your effords. I can confirm, that the patch seems to fix the issue for good. All relayed RTP-streams have valid and increasing sequence numbers now. That jitter caused by Swyx seems to be copied as is now with the correct sequence number.
So if a properly reworked patch could be merged in one of the next releases that would be great.

I will also notify Swyx about that issue.

By: Sean Bright (seanbright) 2021-04-26 09:31:56.213-0500

FWIW - It's not technically a bug with Swyx, but if they _did_ choose their initial RTP timestamps randomly, you would see this problem much less often.

By: Sean Bright (seanbright) 2021-04-27 15:19:59.819-0500

Hi [~cubbi] - I've [attached the patch|^0001-res_rtp_asterisk-More-robust-timestamp-checking.patch] that is currently up for review. Could you please test one more time to confirm that this still resolves the issue you are having?

By: Matthias Hensler (cubbi) 2021-04-29 09:54:38.038-0500

Hi [~seanbright], many thanks again. I can confirm that the current patch still resolves the issue. So it would be great to find it merged in some future release.

Most likely Swyx will also change the behaviour about the initial RTP-timestamp in some of their next releases, so hopefully this corner case should no longer happen either way then.

By: Joshua C. Colp (jcolp) 2021-04-29 09:57:16.883-0500

Sean has put the change up for review and inclusion. It is unlikely to make the next release, but would be in the one after.

By: Friendly Automation (friendly-automation) 2021-04-30 09:00:36.040-0500

Change 15842 merged by Friendly Automation:
res_rtp_asterisk: More robust timestamp checking

[https://gerrit.asterisk.org/c/asterisk/+/15842|https://gerrit.asterisk.org/c/asterisk/+/15842]

By: Friendly Automation (friendly-automation) 2021-04-30 09:03:26.491-0500

Change 15807 merged by Joshua Colp:
res_rtp_asterisk: More robust timestamp checking

[https://gerrit.asterisk.org/c/asterisk/+/15807|https://gerrit.asterisk.org/c/asterisk/+/15807]

By: Friendly Automation (friendly-automation) 2021-04-30 09:03:42.062-0500

Change 15843 merged by Joshua Colp:
res_rtp_asterisk: More robust timestamp checking

[https://gerrit.asterisk.org/c/asterisk/+/15843|https://gerrit.asterisk.org/c/asterisk/+/15843]