[Home]

Summary:ASTERISK-28864: RTP Timestamp not increasing after several transfers and codec changes
Reporter:nappsoft (nappsoft)Labels:patch
Date Opened:2020-04-30 01:27:03Date Closed:2020-04-30 13:00:09
Priority:MajorRegression?No
Status:Closed/CompleteComponents:Resources/res_rtp_asterisk
Versions:16.9.0 Frequency of
Occurrence
Constant
Related
Issues:
Environment:Attachments:( 0) ast_translate.patch
Description:After facing the following issue in production I was able to reproduce it in the lab:

Our real world procedure:

- Incoming call (G711a) going to a queue
- pickup the channel (so not answering an INVITE, but doing a PickupChan) with a different codec (Opus in our case, didn't test with others)
- placing the call into a queue again with an attended ast-transfer
- the call is answered (G711a) by an agent (not with a pickup this time), using a PJSIP based softphone
- the transfer is executed. Now both channels use G711a

So now audio will no longer need to be transcoded. (The problem does not seem to happen when we use G711a for all clients all the time).

What now happens is that we could observe one-way-audio or no audio at all. However this was not caused by a network issue but by wrong rtp timestamps: rtp timestamps in the outgoing stream suddenly were not increased anymore.

I've added some logging and could observe the following behavior: the delivery time used in calc_txstamp was going back in time in res_rtp_asterisk.c. (80000ms each time).

It seems like the problems were caused by the fact that the timestamps on rx side were lower than before after the transfer. I was able to get rid of the issue by patching translate.c

Please note that the issue could also be influenced by the fact that we use a slightly patched version of the opus codec. However the fact that asterisk seems to struggle when f->delivery is (for whatever reason) lower than path->nextin is a problem worth looking at I guess.
Comments:By: Asterisk Team (asteriskteam) 2020-04-30 01:27:04.254-0500

Thanks for creating a report! The issue has entered the triage process. That means the issue will wait in this status until a Bug Marshal has an opportunity to review the issue. Once the issue has been reviewed you will receive comments regarding the next steps towards resolution.

A good first step is for you to review the [Asterisk Issue Guidelines|https://wiki.asterisk.org/wiki/display/AST/Asterisk+Issue+Guidelines] if you haven't already. The guidelines detail what is expected from an Asterisk issue report.

Then, if you are submitting a patch, please review the [Patch Contribution Process|https://wiki.asterisk.org/wiki/display/AST/Patch+Contribution+Process].

Please note that once your issue enters an open state it has been accepted. As Asterisk is an open source project there is no guarantee or timeframe on when your issue will be looked into. If you need expedient resolution you will need to find and pay a suitable developer. Asking for an update on your issue will not yield any progress on it and will not result in a response. All updates are posted to the issue when they occur.

By: Kevin Harwell (kharwell) 2020-04-30 09:46:14.809-0500

Please attach the relevant configuration (queues, pjsip endpoint, etc...) if possible. As well please attach a full Asterisk debug log with SIP tracing enabled [1] and/or pcap captured at the time of an incident.

[1] https://wiki.asterisk.org/wiki/display/AST/Collecting+Debug+Information

Thanks!

By: Kevin Harwell (kharwell) 2020-04-30 09:49:05.920-0500

{quote}
we use a slightly patched version of the opus codec.
{quote}
By this am I correct in assuming you are using an open source version of codec opus in Asterisk vs. the one provided by Sangoma?

By: Kevin Harwell (kharwell) 2020-04-30 09:50:11.156-0500

Are jitterbuffers used in anyway in the scenario setup?

By: nappsoft (nappsoft) 2020-04-30 10:13:14.811-0500

No, there are no jitterbuffers in use (except for if asterisk uses them automaticelly without any configuration)

Yes, we use an opensource version (digium didn't have opus support when we first came across this github repo). However we call it opusnb and limit it to narrowband. It's only in use for mobile clients because of bandwidth limitations.

By: nappsoft (nappsoft) 2020-04-30 10:46:44.067-0500

There was nothing usefull in the logs by the way, I neede to create log output myself first before I got something usefull:

what happened in my case was:

calc_txstamp called from rtp_raw_write always returned 0, because of that rtp->lastts always kept the same value (as rtp->lastts = rtp->lastts + ms*rate).

By: Joshua C. Colp (jcolp) 2020-04-30 11:02:56.903-0500

What was the output of "core show channel" on the involved channels at the same time too?

By: nappsoft (nappsoft) 2020-04-30 11:14:31.918-0500

the output looked good. however I will need to reproduce this again and maybe add some output for the following operation:

               path->nextin = ast_tvadd(path->nextin, ast_samp2tv(
                        f->samples, ast_format_get_sample_rate(f->subclass.format)));

While the patch is fixing the issue completely it is certainly not the right way to solve this... will let you know in a few minutes.

By: nappsoft (nappsoft) 2020-04-30 11:26:09.819-0500

first channel:

          Name: PJSIP/Hansli-00000006
          Type: PJSIP
      UniqueID: LumiMagic-1588263466.62
      LinkedID: LumiMagic-1588263453.30
     Caller ID: ***********
Caller ID Name: (N/A)
Connected Line ID: anonymous
Connected Line ID Name: Anonymous
Eff. Connected Line ID: anonymous
Eff. Connected Line ID Name: Anonymous
   DNID Digits: (N/A)
      Language: de
         State: Up (6)
 NativeFormats: (alaw)
   WriteFormat: slin
    ReadFormat: slin
WriteTranscode: Yes (slin@8000)->(alaw@8000)
 ReadTranscode: Yes (alaw@8000)->(slin@8000)
Time to Hangup: 0
  Elapsed Time: 0h0m26s
     Bridge ID: 13ef0866-941f-4a2b-b5f1-c583473574d1
--   PBX   --
       Context: default
     Extension:
      Priority: 1
    Call Group: 0
  Pickup Group: 0
   Application: AppDial
          Data: (Outgoing Line)
Call Identifer: [C-00000003]
     Variables:


-- Streams --
Name: audio
   Type: audio
   State: sendrecv
   Group: -1
   Formats: (alaw)
   Metadata:

second channel:

-- General --
          Name: PJSIP/A1020srvID3-00000003
          Type: PJSIP
      UniqueID: LumiMagic-1588263453.30
      LinkedID: LumiMagic-1588263453.30
     Caller ID: anonymous
Caller ID Name: Anonymous
Connected Line ID: **************
Connected Line ID Name: (N/A)
Eff. Connected Line ID: ************
Eff. Connected Line ID Name: (N/A)
   DNID Digits: 0445405046
      Language: de
         State: Up (6)
 NativeFormats: (alaw)
   WriteFormat: slin
    ReadFormat: slin
WriteTranscode: Yes (slin@8000)->(alaw@8000)
 ReadTranscode: Yes (alaw@8000)->(slin@8000)
Time to Hangup: 0
  Elapsed Time: 0h1m19s
     Bridge ID: 280b9a27-2de0-4eb2-bd62-88431d0519c0
--   PBX   --
       Context: default
     Extension: 0445405046
      Priority: 33
    Call Group: 0
  Pickup Group: 0
   Application: Queue
          Data: group16,irtwx,,,30,
Call Identifer: [C-00000003]
     Variables:

-- Streams --
Name: audio
   Type: audio
   State: sendrecv
   Group: -1
   Formats: (alaw)
   Metadata:

These two chanels are bridged over two local channels where we have the translation path:

 NativeFormats: (opusnb)
   WriteFormat: slin
    ReadFormat: slin
WriteTranscode: Yes (slin@8000)->(opusnb@8000)
 ReadTranscode: Yes (opusnb@8000)->(slin@8000)

-- Streams --
Name: audio
   Type: audio
   State: sendrecv
   Group: -1
   Formats: (opusnb)
   Metadata:


=> so it seems to have to do with the fact that after the transfer the local channel still uses the opusnb and alaw<->opusnb<->alaw makes something strange with the timestamps. (The funny thing is: apart from that everything, includung music on hold all transcoding paths etc. is working without any issue)

Output for f->samples and ast_format_get_sample_rate(f->subclass.format) is repeatedly the following as soon as the alaw<->alaw channels are bridged:

[Apr 30 18:20:16] DEBUG[28084][C-00000003]: translate.c:615 ast_translate: 960 8000
[Apr 30 18:20:16] DEBUG[27932][C-00000003]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:20:16] DEBUG[27932][C-00000003]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:20:16] DEBUG[28084][C-00000003]: translate.c:615 ast_translate: 160 8000


By: nappsoft (nappsoft) 2020-04-30 11:41:10.556-0500

Just to mention: with a normal call (no local channels involved) the output looks sane, there are no 960 sample jumps:

[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2900][C-00000007]: translate.c:615 ast_translate: 160 8000
[Apr 30 18:27:56] DEBUG[2942][C-00000007]: translate.c:615 ast_translate: 160 8000

and there we have:

WriteTranscode: Yes (slin@8000)->(opusnb@8000)
 ReadTranscode: Yes (opusnb@8000)->(slin@8000)

and:

WriteTranscode: Yes (slin@8000)->(alaw@8000)
 ReadTranscode: Yes (alaw@8000)->(slin@8000)


By: Kevin Harwell (kharwell) 2020-04-30 11:46:47.416-0500

Any way to run a test using another codec besides opus? Not sure about the open source one, but Sangoma's does fiddle a bit with the timing (even subtracting) under certain circumstances. Other codecs could too possibly. Although a check, and catch in the core could be a good thing no matter what.

Also a SIP trace, or pcap would be good though just to see the negotiations and possibly help in setting up a potential test if/when someone does get around to looking more into this.

By: nappsoft (nappsoft) 2020-04-30 11:51:59.098-0500

Anyway: why does asterisk not get rid of the transcoding path in this scenario? After the transfer is completed there should be no need to transcode in the local channel anymore...

By: Joshua C. Colp (jcolp) 2020-04-30 11:55:10.944-0500

I don't believe Local channel is that intelligent, and until 16 there wasn't really a notification mechanism to know that things had actually changed on the respective sides.

By: nappsoft (nappsoft) 2020-04-30 12:17:50.920-0500

Well I guess the problem comes from this:

translate.c:667 ast_translate: Sample size different 160 vs 960

what would make sense in the opus case (8000hz vs 48000hz = times 6).

So forget this for the moment, I will check if I can tweak something in the codec.

By: Kevin Harwell (kharwell) 2020-04-30 13:00:09.749-0500

Okay suspending for now, but just comment again if you need it to reopen, or if you feel like it warrants it then create a new issue.