[Home]

Summary:ASTERISK-16411: [patch] subchannel remains half-open after call transfer
Reporter:Jonathan Hunter (jmhunter)Labels:
Date Opened:2010-07-23 18:02:58Date Closed:2010-08-07 17:36:09
Priority:MinorRegression?No
Status:Closed/CompleteComponents:Channels/chan_skinny
Versions:Frequency of
Occurrence
Related
Issues:
Environment:Attachments:( 0) backtrace_first_ring_bug.txt
( 1) backtrace.txt
( 2) backtrace2.txt
( 3) backtrace2-svn-with-moved-call.txt
( 4) backtrace-threads_first_ring_bug.txt
( 5) chan_skinny-transfer-fixup-v1.txt
( 6) chan_skinny-transfer-fixup-v2.txt
( 7) chan_skinny-transfer-test-patch.txt
( 8) chan_skinny-transfer-v3.txt
( 9) chan_skinny-transfer-v4.txt
(10) chan_skinny-transfer-v5.txt
(11) debug_asterisk_first_ring_bug
(12) skinny-debug.txt
(13) skinny-debug-v5-test1.txt
(14) skinnytransfver.v6.diff
(15) skinnytransfver.v8.diff
Description:Call transfers don't seem to work properly using chan_skinny.

To reproduce:

[ jmhtest1 is a Skinny phone (Cisco 7905G)   ]
[ 5022 is a DAHDI-connected analogue handset ]

1. jmhtest1 picks up handset and calls 6001 (music on hold)
2. jmhtest1 presses 'Transfer'
3. jmhtest1 dials 5022
4. 5022 picks up
5. jmhtest1 presses 'Transfer'
6. jmhtest1 replaces handset in cradle

At this point, jmhtest1 still shows "Connected" on its display, and now won't give out a dialtone when handset is picked up. Pressing the 'EndCall' softkey leads to messages such as:
[Jul 23 23:26:21] WARNING[19173]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '1' on 'jmhtest1'
Received Softkey Event: End Call(1/1)
[Jul 23 23:26:22] WARNING[19173]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '1' on 'jmhtest1'
Received Softkey Event: End Call(1/1)

Hanging up 5022 (the recipient of the call transfer) does not make any difference, jmhtest1 still does not work and I have to reboot the phone.

Trying a blind transfer (missing out step 5 above) seems to be a bit hit and miss - this either works fine, or places the call on hold. It's done both for me at various times, I think it's to do with whether Asterisk has crashed recently (see ASTERISK-16400).

In addition, at step 3 above, I can hear the audio from the original call to 6001, as well as the audio from 5022 (where I am transferring to). That surely shouldn't happen - jmhtest1 should talk to 5022 and only 5022 at that point, right?
Comments:By: dea (dea) 2010-07-26 19:52:13

Lightly tested patch that addresses the lack of a second subchannel at the start of a transfer and telling the phone that the second subchannel has gone away.

By: Jonathan Hunter (jmhunter) 2010-07-27 16:53:43

Thanks! The patch applies OK, with the addition of a ",NULL)" to the skinny_new call.

I have just tried again, using another DAHDI handset instead of music on hold, but I can't see a measurable difference :-(

In fact, I have since then managed to trigger a double free error, and a coredump. The double free happened when the transfer 'target' (5022, DAHDI/7-1) hung up before the original call recipient (DAHDI/1-1):

   -- Stopped music on hold on DAHDI/1-1
   -- Native bridging DAHDI/7-1 and DAHDI/1-1
[Jul 27 22:39:37] WARNING[31737]: chan_dahdi.c:4630 dahdi_enable_ec: Unable to enable echo cancellation on channel 7 (No such device)
[Jul 27 22:39:37] WARNING[31737]: chan_dahdi.c:4630 dahdi_enable_ec: Unable to enable echo cancellation on channel 1 (No such device)
[Jul 27 22:39:37] ERROR[31737]: astobj2.c:258 internal_ao2_ref: refcount -1 on object 0xd60d890
gmpbx*CLI> *** glibc detected *** /usr/sbin/asterisk: double free or corruption (!prev): 0x0d60d258 ***
======= Backtrace: =========
/lib/libc.so.6[0x2f4a96]
/lib/libc.so.6(cfree+0x90)[0x2f7fb0]
/usr/sbin/asterisk[0x8085e40]
/usr/sbin/asterisk(__ao2_ref+0x35)[0x8085d2c]
/usr/sbin/asterisk(ast_channel_release+0x2e)[0x80ab839]
/usr/lib/asterisk/modules/cdr_custom.so[0x525252]
/usr/sbin/asterisk[0x80a25ea]
/usr/sbin/asterisk(ast_cdr_detach+0xcd)[0x80a2f06]
/usr/sbin/asterisk(ast_bridge_call+0x1f88)[0x80f40b0]
/usr/lib/asterisk/modules/app_dial.so[0xb4d38a]
/usr/lib/asterisk/modules/app_dial.so[0xb4dd34]
/usr/sbin/asterisk(pbx_exec+0x1ea)[0x8130372]
/usr/sbin/asterisk[0x8139632]
/usr/sbin/asterisk(ast_spawn_extension+0x53)[0x813ae84]
/usr/sbin/asterisk[0x813b80c]
/usr/sbin/asterisk(ast_pbx_run_args+0x37)[0x813d3f5]
/usr/sbin/asterisk(ast_pbx_run+0x19)[0x813d421]
/usr/lib/asterisk/modules/chan_skinny.so[0x66a1a2]
/usr/lib/asterisk/modules/chan_skinny.so[0x66a6d8]
/usr/sbin/asterisk[0x818d56e]
/lib/libpthread.so.0[0x41945b]
/lib/libc.so.6(clone+0x5e)[0x35c23e]

The core dump happened when trying to replicate this.

Possibly related info: The core dump was generated after rebooting the skinny handset (I couldn't initiate another call, because the subchannel had been left hanging around).. Perhaps Asterisk had got itself confused?

By: dea (dea) 2010-07-27 18:13:13

The subchannel is actuall gone, but the phone does not know it,
hence all of the 'cannot find subchannel with reference' messages.

V2 of this patch re-adds notifications to skinny_hangup for the
three subchannel states- active, inactive and only

I will admit I do not fully understand the ao2 code, but the
backtrace appears indicate chan_dahdi in the crash and not
chan_skinny



By: Jonathan Hunter (jmhunter) 2010-07-27 18:32:22

Thanks for the updated patch!

I'm sure we're getting closer - but still no observed difference on this one :(

Is there any more detailed debugging I can enable for skinny, that would be of use?

Call-id: 11
   -- Starting simple switch on '5909@jmhtest1'
   -- Executing [6001@sip:1] Answer("Skinny/5909@jmhtest1-11", "") in new stack
   -- Executing [6001@sip:2] NoCDR("Skinny/5909@jmhtest1-11", "") in new stack
   -- Executing [6001@sip:3] MusicOnHold("Skinny/5909@jmhtest1-11", "mugss") in new stack
   -- Started music on hold, class 'mugss', on Skinny/5909@jmhtest1-11
   -- Starting simple switch on '5909@jmhtest1'
   -- Executing [5022@sip:1] Gosub("Skinny/5909@jmhtest1-12", "macro-stdexten,s,1(DAHDI/7)") in new stack
   -- Executing [s@macro-stdexten:3] Dial("Skinny/5909@jmhtest1-12", "DAHDI/7,30") in new stack
   -- Called 7
   -- DAHDI/7-1 is ringing
   -- DAHDI/7-1 is ringing
[Jul 28 00:30:04] WARNING[13835]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 7 (No such device)
   -- DAHDI/7-1 answered Skinny/5909@jmhtest1-12
   -- Hanging up on 'DAHDI/7-1'
   -- Hungup 'DAHDI/7-1'
 == Spawn extension (macro-stdexten, s, 3) exited non-zero on 'Skinny/5909@jmhtest1-12'
   -- Starting simple switch on 'DAHDI/7-1'
[Jul 28 00:30:08] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
Transfer: No subchannel to transfer
[Jul 28 00:30:12] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:14] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:14] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:15] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:15] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:15] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
[Jul 28 00:30:15] WARNING[13818]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '12' on 'jmhtest1'
   -- Hanging up on 'DAHDI/7-1'
   -- Hungup 'DAHDI/7-1'

By: dea (dea) 2010-07-27 20:24:19

Can you set the verbose log level to 5?
This may be similar to the park issue (minus the crash), in
that there is not a remote peer to transfer.  Can you try
another device as the transferee?

By: Jonathan Hunter (jmhunter) 2010-07-28 02:19:30

Sorry, I should have mentioned, I did try it with another DAHDI device - had the same thought. It still behaved the same.

I've tried it again just now to capture the more detailed debug logs, and discovered different behaviour. This time, the handset was still confused, but no error logs showed up on the Asterisk console. It triggered another segfault in Asterisk (backtrace attached)

hostname*CLI> core set verbose 5
Verbosity was 3 and is now 5
   -- Starting simple switch on '5909@jmhtest1'
   -- Executing [5021@sip:1] Gosub("Skinny/5909@jmhtest1-3", "macro-stdexten,s,1(DAHDI/1)") in new stack
   -- Executing [s@macro-stdexten:3] Dial("Skinny/5909@jmhtest1-3", "DAHDI/1,30") in new stack
   -- Called 1
   -- DAHDI/1-1 is ringing
   -- DAHDI/1-1 is ringing
   -- DAHDI/1-1 is ringing
   -- DAHDI/1-1 is ringing
[Jul 28 08:07:06] WARNING[15504]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 1 (No such device)
   -- DAHDI/1-1 answered Skinny/5909@jmhtest1-3
   -- Starting simple switch on '5909@jmhtest1'
   -- Started music on hold, class 'default', on DAHDI/1-1
   -- Executing [5022@sip:1] Gosub("Skinny/5909@jmhtest1-4", "macro-stdexten,s,1(DAHDI/7)") in new stack
   -- Executing [s@macro-stdexten:3] Dial("Skinny/5909@jmhtest1-4", "DAHDI/7,30") in new stack
   -- Called 7
   -- DAHDI/7-1 is ringing
   -- DAHDI/7-1 is ringing
   -- DAHDI/7-1 is ringing
[Jul 28 08:07:12] WARNING[15505]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 7 (No such device)
   -- DAHDI/7-1 answered Skinny/5909@jmhtest1-4
[Jul 28 08:07:15] NOTICE[15504]: chan_skinny.c:4196 skinny_fixup: skinny_fixup(DAHDI/7-1, DAHDI/7-1<MASQ>)
      > Killing inactive sub 3
 == Spawn extension (macro-stdexten, s, 3) exited non-zero on 'Skinny/5909@jmhtest1-4'
      > Killing only sub 4
   -- Native bridging DAHDI/7-1 and DAHDI/1-1
[Jul 28 08:07:15] WARNING[15504]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 7 (No such device)
[Jul 28 08:07:15] WARNING[15504]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 1 (No such device)
   -- Stopped music on hold on DAHDI/1-1
   -- Native bridging DAHDI/7-1 and DAHDI/1-1

## skinny call has ended. Pressing 'End Call' softkey doesn't seem to show
## anything on Asterisk now, but the phone is still confused and thinks the
## call is still active

[Jul 28 08:08:01] WARNING[15504]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 7 (No such device)
[Jul 28 08:08:01] WARNING[15504]: chan_dahdi.c:4685 dahdi_enable_ec: Unable to enable echo cancellation on channel 1 (No such device)
[Jul 28 08:08:01] ERROR[15504]: astobj2.c:116 INTERNAL_OBJ: bad magic number 0x0 for 0xbd36450
hostname*CLI>
Disconnected from Asterisk server
Executing last minute cleanups
Asterisk ending (0).
[root@hostname ~]# /usr/sbin/safe_asterisk: line 111: 15450 Segmentation fault      (core dumped) nice -n $PRIORITY ${ASTSBINDIR}/asterisk ${CLIARGS} ${ASTARGS} >&/dev/${TTY} </dev/${TTY}
Asterisk ended with exit status 139
Asterisk exited on signal 11.
Automatically restarting Asterisk.

By: Jonathan Hunter (jmhunter) 2010-07-28 02:27:26

I also decided to test this with a 7920 handset, in case there was something odd about the 7905 I've been using for testing. I discovered something odd, that also happened with the 7905 (but I forgot to mention above). Asterisk thought the handset was already offhook - on the 7905 this meant I received no dialtone; on the 7920 I merely got the message below. I solved it on the 7905, to produce the debug logs above, by restarting Asterisk.


## New skinny handset registers with Asterisk; this handset has been powered
## down since before Asterisk starts up, so there are definitely no current
## calls on it)

   -- Starting Skinny session from 10.xxx.x.xx
   -- Skinny mwi_event_cb found 0 new messages
   -- Device 'SEP00xxxxxxxxxx' successfully registered
Device capability set to '0xc (ulaw|alaw)'
Adding button: 9, 1

## Dial 6001 (music on hold) from 7920 skinny handset

[Jul 28 08:21:19] WARNING[15640]: chan_skinny.c:1673 find_subchannel_by_instance_reference: Could not find subchannel with reference '0' on 'jmhwireless1'
   -- Starting simple switch on '5908@jmhwireless1'
   -- Got offhook message when device (5908@jmhwireless1) already offhook
   -- Executing [6001@sip:1] Answer("Skinny/5908@jmhwireless1-2", "") in new stack
   -- Executing [6001@sip:2] NoCDR("Skinny/5908@jmhwireless1-2", "") in new stack
   -- Executing [6001@sip:3] MusicOnHold("Skinny/5908@jmhwireless1-2", "mugss") in new stack
   -- Started music on hold, class 'mugss', on Skinny/5908@jmhwireless1-2
   -- Stopped music on hold on Skinny/5908@jmhwireless1-2
 == Spawn extension (sip, 6001, 3) exited non-zero on 'Skinny/5908@jmhwireless1-2'
[Jul 28 08:21:49] WARNING[15640]: chan_skinny.c:1694 find_subchannel_by_reference: Could not find any lines that contained a subchannel with reference '2' on device 'jmhwireless1'

By: dea (dea) 2010-07-28 14:42:53

I had to update my test environment to Trunk, as I had been working with 1.6.2 SVN code.

The issues are basically that chan_skinny did not keep up with the rest of the
code.  There are rtp changes that are causing your segfaults (10 lines of code to fix)

The stuck display is a result of the phone not knowing which call is which.
I have a fix.  It is ugly, but I do not see a better way.

Oh, and the 7920/7921 phones are just plain wierd and make terrible test cases.  They work differently than the wired phones, so they should be a last resort for testing.

I would like to post a test patch with all of the fixes rolled up.  If you can confirm it works for you, I will split them up.  To apply you will need to use 'svn revert channels/chan_skinny.c'



By: Jonathan Hunter (jmhunter) 2010-07-28 19:59:55

Thanks Dan - again, very much appreciated!

I have applied the patch:

# svn revert channels/chan_skinny.c
# wget 'https://issues.asterisk.org/file_download.php?file_id=26822&type=bug' -O - | patch -p0
# make clean install

but the 7905 phone still doesn't notice that the subchannel has gone away:
[Jul 29 01:59:03] WARNING[25308]: chan_skinny.c:1675 find_subchannel_by_instance_reference: Could not find subchannel with reference '1' on 'jmhtest1'
[Jul 29 01:59:03] WARNING[25308]: chan_skinny.c:1675 find_subchannel_by_instance_reference: Could not find subchannel with reference '1' on 'jmhtest1'

I wonder if there's anything else we can do to figure out what's going on...?

Thank you!

By: dea (dea) 2010-07-28 21:32:26

Is the crash gone?  How about the transfers, any
improvement?  

What version of software is running on the phone?  We've
seen some phone firmware expect different packet structures.
I doubt that is the case here, since the 7905 is not exactly
a new model, but I guess it is possible.

If the cli debug does not shed some light on it, then we may
need to start some packet captures.

By: Sergio (salecha) 2010-07-29 06:24:22

Cisco IP phones in 7910, being hung, when I press the button to transfer restarts asterisk (v1.6.2.9). But in Asterisk 1.8 Beta 3 does not happen.

By: dea (dea) 2010-07-29 13:50:27

I love it when it is something simple/stoopid.  I hate it when I spend
three days looking for it.

For testing, in skinny_hangup() move the call to:
     AST_LIST_REMOVE(&l->sub, sub, list);
below the: if (d->registered) block

We are removing the entry before we actualy do anything with the
request and then 10ish lines later we check to see if the subchannel
list is empty.

By: Damien Wedhorn (wedhorn) 2010-07-29 15:21:06

Moving the AST_LIST_REMOVE below the d->registered test may help this issue, but it will create another one. A sub attached to a line on a device that is not registered still needs to be removed from the l->sub list.

The basic structure is to detach the sub, cleanup the indications, then delete.

By: dea (dea) 2010-07-29 15:30:28

Sorry if I was not clear, the issue is that the AST_LIST_REMOVE occurs
before the AST_LIST_EMPTY.  I was suggesting that the remove be placed after
WHOLE if (d->registered) {} block.

I also just found a think-o in handle_transfer_button.  The first call to
transmit_callstate should go to newsub->callid and not sub->callid.

The last issue effects new firmware and that is mis-use of transmit_clear_display.  Older phones and firmware do not care, but newer
versions do.  It seems that almost every instance of transmit_clear_display
should be transmit_clearpromptmessage.

I do not have a patch yet as I am also testing a greatly simplified skinny_hangup() that may or may not be related to the fact that I can now
use attended and blind transfers without segfaults and without confused phones.

By: Damien Wedhorn (wedhorn) 2010-07-29 15:39:04

The sub needs to be removed before the test otherwise you'll have count the number of subs and compare to 1 (rather than just testing if there are any). The indications etc will depend on how many subs exist _after_ the current sub is removed.

It looks like you're right about the handle_transfer_button issue (surprised that wasn't picked up in testing). It wouldn't have impacted me because I only had old phones at the time.

I think the cleardisplay stuff is based on packet traces (79xx phones) but it may have changed in more recent firmware versions.

By: Jonathan Hunter (jmhunter) 2010-07-29 15:54:38

Thanks Dan, definitely getting there!!

So, the behaviour has changed with this patch (I have moved the AST_LIST_REMOVE line to below the whole if (d->registered) block, i.e. just above ast_mutex_lock(&sub->lock))

When I hang up (step 6 above), this now triggers a segfault straight away. Have I moved the line to the wrong place?

I've tried the code with the patch from last night, and also from svn ("svn revert channels/chan_skinny.c") - neither worked properly... :-(

I have attached the backtrace from the segfault - let me know if I can do anything more to check things?

Many thanks again!

By: dea (dea) 2010-07-29 15:57:19

You are indeed correct with the current code in skinny_hangup.
The attached v4 patch has two trivial changes that allow IP communicator
to transfer calls without corrupt displays.

Old and new phones seem OK with a clear_prompt replacing clear_display
while new phones throw errors about invalid message type 9A whenever
a clear_display is sent.  In any case it is not germane to this issue.

I also found that I could eliminate the active/inactive concepts from
skinny_hangup, but that is also not relevant to this issue.

Note to testers, revert all previous changes and apply only V4.  V3 also includes the fixes for the get_codec crash in 16046 and was uploaded by mistake.



By: Jonathan Hunter (jmhunter) 2010-07-29 16:19:59

Thanks Dan.. No v4 patch attached yet (or am I just not seeing it??)

By: Damien Wedhorn (wedhorn) 2010-07-29 16:24:09

jmhunter, you can use the V3 patch. The additional code will be committed and shouldn't cause any extra issues (and if it does, it would be good to know about them now).

By: Jonathan Hunter (jmhunter) 2010-07-29 16:33:09

Have tested with v3 and the phone thinks subchannel is still present. Then, when hanging up the DAHDI device, Asterisk crashes:

Program terminated with signal 11, Segmentation fault.
#0  0x002fc55d in strncasecmp () from /lib/libc.so.6
#0  0x002fc55d in strncasecmp () from /lib/libc.so.6
No symbol table info available.
#1  0x080ab215 in ast_channel_cmp_cb (obj=0xbc87520, arg=0xb7d8dcd4, flags=0) at channel.c:1608
       chan = (struct ast_channel *) 0xbc87520
       cmp_args = (struct ast_channel *) 0xb7d8dcd4
       name_len = 8
       ret = 1
       __PRETTY_FUNCTION__ = "ast_channel_cmp_cb"

By: Jonathan Hunter (jmhunter) 2010-07-29 16:57:33

Dan - you have made me very happy :-)

Applied v4 patch, and my first transfer was entirely successful! No Asterisk crash, and the phone knew the subchannel had gone!

That's excellent - thank you very much.

Is the patch OK for regular use? I'll be setting up the production server with these phones next week, so will apply this patch, and the ASTERISK-16400 patch, and report back after a few days of actual usage..

Thank you again!

By: Jonathan Hunter (jmhunter) 2010-07-29 17:05:00

OK, nearly there. In the exact scenario outlined above, this doesn't actually work.

If I miss out step 5 (i.e. hang up the skinny handset, rather than pressing the Transfer softkey) all is well.

If I press the Transfer softkey a second time, the same problem occurs (phone doesn't know subchannel has gone). No immediate crash when I tested it just now, but that may be luck rather than anything else..

By: dea (dea) 2010-07-29 17:06:13

All other changes reverted?  Backtrace2 is odd, as it appears
to have a valid line and subchannel.

I think we need a matrix of tests-

1.  Attended by transfer-dial-remote_answer-transfer
2.  Blind by transfer-dial-transfer
3.  Blind by transfer-dial-endcall
4.  Blind by transfer-dial-hangup (handset/speaker/headset)

Do you have a more complete backtrace from the crash?

By: Damien Wedhorn (wedhorn) 2010-07-29 17:06:41

jmlhunter, can you confirm that V3 and V4 give the same results (the additional stuff in V3 is another bug, and if it's causing issues would be of concern).

By: dea (dea) 2010-07-29 17:17:28

There was an instant segfault without the extras, but it looks like
the rtp_engine.c has been fixed, so tesing just V4 is possible after
a svn update main/rtp_engine.c

By: Jonathan Hunter (jmhunter) 2010-07-29 17:42:45

I'd also add:
5. Attended by transfer-dial-remote_answer-hangup

OK, I've now updated via "svn update" and am trying with patch v4. I ran "core restart now" in between each test, just in case.

Here are the results of my simple testing. Bear in mind I'm not familiar with all these behaviours, so I may be doing something wrong - hopefully not! In each example I am dialling my music-on-hold extension and transferring it to my test DAHDI phone.

Test 1: No crash; phone doesn't realise subchannel has gone; call transfers OK
Test 2: No crash; call didn't actually transfer (DAHDI phone stops ringing sometimes, and the softkeys on the phone change - I'm not certain what they should be in this situation)
Test 3: No crash; phone doesn't realise subchannel has gone; call didn't actually transfer (DAHDI phone kept ringing, but no audio when I picked it up)
Test 4: No crash; phone doesn't realise subchannel has gone; call didn't actually transfer (DAHDI phone kept ringing, but no audio when I picked it up)
Test 5: No crash; works OK

By: dea (dea) 2010-07-29 18:05:24

Progress. V5 introduces a couple additional transmit_callstate calls, and
a debugging/trace feature that may be too ugly to live.  It assigns a
call 'slot' number to be used by the phone display (01, 02, 03, etc)

If the additional transmit_callstate calls do not clear the display,
we can use the call slot number to figure out which call did not get
cleared from the display (the original, or the call setting up the transfer)

By: Jonathan Hunter (jmhunter) 2010-07-29 19:26:26

OK, now testing with v5.

Test 1: All worked OK
Test 2: Doesn't work. First time round, Asterisk segfaulted (ran out of space in /tmp, so no coredump, sorry!). Second time round, transfer didn't work properly (very odd behavior) but phone was not confused regarding subchannels, once I had hung up
Test 3, 4: Shows that call is transferring; DAHDI device rings, but no audio when picked up. Phone still confused re subchannels
Test 5: OK

I wonder if I am doing something different when testing the blind transfers.

I'm using the following steps:

a. Pick up skinny phone handset
b. Dial 6001 (my music on hold extension)
c. I can now hear music
d. Press 'Transfer' softkey
e. Dial 5022 (DAHDI extension)

From this point, the steps diverge:

Test 1: Pick up DAHDI handset, then press 'Transfer' sofkey, then replace skinny phone handset
Test 2: Press 'End call' softkey, then replace skinny phone handset
Test 3: Press 'Transfer' softkey, then replace skinny phone handset
Test 4: Replace skinny phone handset
Test 5: Pick up DAHDI handset, then replace skinny phone handset

By: Jonathan Hunter (jmhunter) 2010-07-29 19:30:45

Update: Test 1 actually doesn't work OK. The call transfer works, but the phone still indicates an outbound call "To 6001" is active. Pressing the End Call or Transfer sofkeys shows this:

[Jul 30 01:27:20] WARNING[18033]: chan_skinny.c:1675 find_subchannel_by_instance_reference: Could not find subchannel with reference '1' on 'jmhtest1'
Transfer: No subchannel to transfer

I'll attach a skinny debug log for test #1.

By: dea (dea) 2010-07-30 00:40:55

All my tests have been with a phone on each end, not MoH.
It really should work the same, but after the park issue,
I would not be surprised if it does not.

The verbose level was too low.  I expected to see two log
messages along the lines of 'Killing (active/inactive/only) sub'.
They are only displayed at verbose level 4 or higher.  Based on
the log and your last post, it is clear it was the inactive
sub that was orphaned, which has the least amount of state information
conveyed back to the phone, so I cannot say I am surprised.

I'm afraid the road forward is going to slow down and require a stack
of additional packet captures from a phone connected to call manager.

By: Damien Wedhorn (wedhorn) 2010-07-30 01:57:30

It might be easier if you
- configure --enable-dev-mode
- make menuselect (and under compiler flags select skinny devmode)
- make install

you'll now have the option:

skinny set debug packet

which should show the flow of packets (with only minimal details - but inline with the other debug messages)

Dan, a bit of playing around and the orphaned channel appears to run through "Killing active sub"



By: Damien Wedhorn (wedhorn) 2010-07-30 02:15:26

Dan, going back to V4, you've (and I had it wrong when commented) got the SKINNY_ONHOOK and sub->callid the wrong way around. sub->callid should come before the SKINNY state.

By: Damien Wedhorn (wedhorn) 2010-07-30 02:30:29

v6 uploaded. Fixes issues for me for both attended and unattended transfers.

By: Damien Wedhorn (wedhorn) 2010-07-30 16:46:25

v8 uploaded. Couple of extra tweaks
- phone can call itself and hangup (see second problem in 12324)
- indications of call 1 are not hosed when transfer button pressed

By: Jonathan Hunter (jmhunter) 2010-07-30 21:01:52

Just a quick note to say thanks for the latest patches - I am travelling at the moment and will be setting up the production server after the weekend. I will test asap using the new server, and report back!

By: Jonathan Hunter (jmhunter) 2010-08-03 04:47:40

OK - quick status report. Am still setting up new production infrastructure, so haven't been able to fully test to the same level as in my lab - but v8 patch seems to solve at least some of the problems I've been having.

Have just connected a 7906 to test with, and am hitting different segfault issues (also says "Unsupported device type '369 (7906)' found.") so may have to just convert these phones to SIP for now..

By: Damien Wedhorn (wedhorn) 2010-08-03 04:52:02

The 7906 isn't supported, but a quick google seems to indicate it's a fairly old single line phone with soft keys. Should be fairly easy to get working quickly. If you want, open another bug and we'll see if we can add support for it straight away.

By: Sergio (salecha) 2010-08-05 10:06:56

Index: channels/chan_skinny.c
channels/chan_skinny.c (revision 253158)

I found a solution to avoid the re-pressing the asterisk key cisco shuttle in 7910 (being hanged). I made two patches. Well, what I did was the following:

1 - Original
case STIMULUS_TRANSFER:
if (skinnydebug)
ast_verb(1, "Received Stimulus: Transfer(%d/%d)\n", instance, callreference);
if (l->transfer)
handle_transfer_button(sub);
else
transmit_displaynotify(d, "Transfer disabled", 10);
break;

PATCH

case STIMULUS_TRANSFER:
if (skinnydebug)
ast_verb(1, "Received Stimulus: Transfer(%d/%d)\n", instance, callreference);
if (l->transfer){
if (!sub) {
   ast_verbose("Transfer: No subchannel to transfer\n");
}else {
   handle_transfer_button(sub);
}
}else{
transmit_displaynotify(d, "Transfer disabled", 10);
}
break;


2 - Original

case SOFTKEY_TRNSFER:
if (skinnydebug)
ast_verb(1, "Received Softkey Event: Transfer(%d/%d)\n", instance, callreference);
if (l->transfer)
handle_transfer_button(sub);
else
transmit_displaynotify(d, "Transfer disabled", 10);
break;

PATCH

case SOFTKEY_TRNSFER:
if (skinnydebug)
ast_verb(1, "Received Softkey Event: Transfer(%d/%d)\n", instance, callreference);
if (l->transfer){
if (!sub) {
   ast_verbose("Transfer: No subchannel to transfer\n");
}else {
   handle_transfer_button(sub);
}
}else{
transmit_displaynotify(d, "Transfer disabled", 10);
}
break;

By: Damien Wedhorn (wedhorn) 2010-08-05 16:51:50

salecha, can you try v8 of the patch and see if it fixes the transfer issues in trunk.

By: Sergio (salecha) 2010-08-06 06:05:49

With the patch skinnytransfver.v8.diff transfer was solved.

1 - The extension 2850 call the Cisco phone 7910 (ext. 2410).
2 - Communication is set correctly.
3 - Press the transfer key on the Cisco 7910 to make the transfer and extension 2850 is on hold with music.
4 - The Cisco 7910 I have a dial tone and dial extension 2803.
5 - The 2803 extension accepts the call and establishing communication with the extension 2410.
6 - Press the transfer in the Cisco 7910 and the transfer is successful.

*** But now the only problem is in step 1, to answer the call during the first ring. restarts asterisk.



By: Sergio (salecha) 2010-08-06 08:36:19

Information: the problem in the first ring at random, sometimes occurs and sometimes not. but usually happens to answer the call in the middle of the first ring.

By: Sergio (salecha) 2010-08-07 08:16:25

hello. I commented that the problems yet to take care of transfer and on the first ring I tested with the following patches and the two work:

first solution---------------------------------------------------------------------------

patch for transfer
skinnytransfver.v8.diff
wget 'https://issues.asterisk.org/file_download.php?file_id=26865&type=bug' -O - | patch -p0
patch por first ring
cleanup.stateconnected2.diff
wget 'https://issues.asterisk.org/file_download.php?file_id=26930&type=bug' -O - | patch -p0

second solution---------------------------------------------------------------------------

patch for transfer
skinnytransfver.v8.diff
wget 'https://issues.asterisk.org/file_download.php?file_id=26865&type=bug' -O - | patch -p0

patch por first ring
skinny.answercrash.diff
wget 'https://issues.asterisk.org/file_download.php?file_id=26929&type=bug' -O - | patch -p0

---------------------------------------------------------------------------

By: Digium Subversion (svnbot) 2010-08-07 17:36:07

Repository: asterisk
Revision: 281257

U   trunk/channels/chan_skinny.c

------------------------------------------------------------------------
r281257 | wedhorn | 2010-08-07 17:36:06 -0500 (Sat, 07 Aug 2010) | 13 lines

Fix up handling and indications during transfer.

Cleaned up handling of onhook indications and added indications if more than one sub on device. Also fixes issue in 12324 so that the phone can call itself without locking up.

(closes issue ASTERISK-16411)
Reported by: jmhunter
Patches:
     chan_skinny-transfer-v4.txt uploaded by DEA (license 3)
     skinnytransfver.v8.diff uploaded by wedhorn (license 30)
Tested by: jmhunter, salecha, wedhorn

Review: NA

------------------------------------------------------------------------

http://svn.digium.com/view/asterisk?view=rev&revision=281257