ASTERISK-22532: Fix chan_pjsip two party alice initiated test failures

Summary: ASTERISK-22532: Fix chan_pjsip two party alice initiated test failures

Reporter: Matt Jordan (mjordan) Labels:

Date Opened: 2013-09-13 13:47:41 Date Closed: 2013-12-03 09:36:58.000-0600

Priority: Major Regression?

Status: Closed/Complete Components: Channels/chan_pjsip Tests/testsuite

Versions: 12.0.0-alpha1 Frequency of Occurrence:

Related Issues: is related to ASTERISK-22615 sip_attended_transfer: crash on disposed of object in native RTP bridge

Environment: Attachments:

Description: Well, this is a bit odd:

* [https://bamboo.asterisk.org/bamboo/browse/AST-ATTSCD-C632TE-81/test/case/1415281]
* [https://bamboo.asterisk.org/bamboo/browse/AST-ATTSCD-C632TE-81/test/case/1415282]

The fact that the hangup cause mappings themselves are wrong indicates that this is more than just a TCP binding issue (or something along those lines).

Comments: By: Mark Michelson (mmichelson) 2013-09-18 11:01:01.995-0500

Looking at the logs from build 86, the two-party tests are failing due to a TCP binding issue. The "bob" instance of Asterisk is unable to bind to TCP port 5062.

Prior to the two-party tests, the following three SIP tests run:

channels/SIP/sip_attended_transfer
channels/pjsip/basic_calls/outgoing/nominal/echo
channels/pjsip/basic_calls/outgoing/off-nominal/bob_does_not_answer

sip_attended_transfer does not use TCP at all.
echo binds an instance of Asterisk to port 5062.
bob_does_not_answer attempts to bind a SIPp instance to TCP port 5062. I cannot tell from the test logs whether this binding succeeds. However, I can tell that the attempted calls from Asterisk to TCP port 5062 result in ECONNREFUSED, leading to the test failures seen in that test.
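For anyone reproducing this by hand, ECONNREFUSED on a TCP connect generally means that nothing was accepting connections on that address and port at that moment. The following is a minimal sketch of such a probe. The function name `tcp_connect_result` and the loopback default are illustrative assumptions, not part of the testsuite.

```python
import socket


def tcp_connect_result(port, host="127.0.0.1", timeout=2.0):
    """Attempt a TCP connect and report a coarse outcome as a string.

    Hypothetical helper for manual debugging; not part of the testsuite.
    """
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # ECONNREFUSED: no listener accepted the connection on this port.
        return "refused"
    except socket.timeout:
        return "timeout"
    conn.close()
    return "connected"
```

Running this against TCP port 5062 while bob_does_not_answer is active would show whether a listener (SIPp or otherwise) is actually present, without saying which process owns it.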

The best I can deduce at the moment is that either the echo test or the bob_does_not_answer test is leaving TCP port 5062 occupied. As a result, later tests, such as the two-party alice-initiated tests, are essentially non-starters since they cannot bind to the required port.
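One way to narrow down which test leaves the port occupied is to check between test runs whether the port can still be bound. The sketch below assumes IPv4 on the loopback address. The helper name `tcp_port_is_free` is hypothetical and this is not something the testsuite provides.

```python
import errno
import socket


def tcp_port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper for manual debugging; not part of the testsuite.
    """
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # No SO_REUSEADDR here: the point is to detect an existing owner.
        probe.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()
    return True
```

Calling `tcp_port_is_free(5062)` after each of the three preceding tests would indicate after which one the port stops being available. Note that the probe only reports the state at the instant it runs, so a port released shortly afterwards would still be reported as busy.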

If I could see the SIPp output in more detail from the bob_does_not_answer test, I could narrow down the offender in this case. My guess is that, since SIPp is exiting with error code 97 (meaning a failure on an internal operation), the echo test is the culprit here. Fixing the echo test may result in other tests passing properly.

I'll have a look at other test runs to see if the results are consistent with build 86.

By: Mark Michelson (mmichelson) 2013-09-18 12:35:11.762-0500

I looked at build 85. The results are similar, yet different.

Again, the two party tests are failing due to "bob" being unable to bind to TCP port 5062. However, the prior conditions are slightly different. The following SIP tests fail prior to the two-party tests:

channels/SIP/sip_attended_transfer
channels/pjsip/basic_calls/outgoing/nominal/echo
channels/pjsip/basic_calls/outgoing/off-nominal/bob_does_not_exist

What's interesting here is that the echo test still fails, but this time the bob_does_not_answer test succeeds. This would imply that the echo test failure is not what is causing the port to remain occupied. It's worth noting, though, that in this particular test run, it's plain as day that the bob_does_not_exist test fails because SIPp is unable to bind to a specific port (the error output does not say which port the trouble occurred on).

This, unfortunately, adds a bit of randomness to the mix.

In build 86, if bob_does_not_answer could not bind to TCP port 5062, then we should have seen a SIPp error stating so in the logs, as evidenced by the logs for bob_does_not_exist in build 85. However, no such log message is there, so it may be reasonable to assume that the test failed for a different reason entirely. Unfortunately, it is not clear from the log output what that reason would be.

In build 85, the bob_does_not_exist test clearly fails due to the inability to bind to a port. However, the bob_does_not_answer test succeeded, and did so *after* the bob_does_not_exist test ran.

The only way to resolve this is to dig a bit deeper. The only constant so far is that the two-party tests are always failing for the same reason. The lead-up to those test failures varies, though.

By: Mark Michelson (mmichelson) 2013-09-18 14:36:17.501-0500

Well, regardless of why the tests may be failing during full testsuite runs on the bamboo build agents, I can confirm there are test issues when the tests are run individually. Tentatively, I can attribute the problem to a premature hangup of some sort. After Alice detects talking, she attempts to play back the tt-weasels file so that Bob can hear it. The problem is that the playback fails because the channel hangs up during the playback.

By: Mark Michelson (mmichelson) 2013-09-19 10:50:58.869-0500

I am taking a break from this issue for a bit. Just in case someone else ends up looking at this instead, here is what I have found.

I used the alice_hangs_up test as the baseline test. I modified the test-config.yaml file to originate only the UDP IPv4 call from Alice to Bob. I also updated the AMI expectations to expect only 1 instance of each event instead of 4. I also modified Alice and Bob's extensions.conf files to play demo-congrats instead of tt-weasels in their BackgroundDetect calls. This probably isn't actually necessary or beneficial, but for the sake of disclosure, you should know I did it.

When running the test against Asterisk trunk, sometimes the test would succeed. Most of the time, it would fail. The failure always came on Bob's end because the BackgroundDetect application was reporting that it failed to detect audio. Debugging ast_read() and ast_rtp_read(), I found that Bob was never reading in audio frames when the test would fail.

I changed the UUT instance of Asterisk not to use a native RTP local bridge and instead use a core bridge. When I did this, the test would pass every time. This is the point I am at while debugging the test. The problem likely stems from the UUT box doing something incorrectly when performing a native RTP local bridge.