[Home]

Summary:ASTERISK-18626: The patch found in r333265 on res_jabber.c breaks authentication to a jabber server.
Reporter:Robert Dailey (rdailey)Labels:
Date Opened:2011-09-26 09:36:55Date Closed:2011-10-20 15:14:45
Priority:MajorRegression?Yes
Status:Closed/CompleteComponents:Resources/res_jabber
Versions:1.8.7.0 Frequency of
Occurrence
Related
Issues:
is caused byASTERISK-18078 [patch] Segfault when publishing device states via XMPP and not connected
Environment:Attachments:( 0) jabber_fixes_component_v2.patch
( 1) jabber_fixes_component.patch
( 2) jabber_patch_ammends.patch
( 3) jabber-ammends-non-sasl-connection.patch
Description:The patch found in r333265 on res_jabber.c breaks authentication to a jabber server.  Asterisk is configured as a jabber component.  Working normally in 1.8.6.0.  Upgraded to 1.8.7.0, and the following messages appear, indicating a problem:

WARNING[7052]: res_jabber.c:1473 aji_send_raw: JABBER: Unable to send message to asterisk, we are not connected
WARNING[7052]: res_jabber.c:1737 aji_act_hook: Jabber didn't seem to handshake, failed to authenticate.

The connection to the jabber server remains in state "Connecting" forever, while giving the above warnings every 55 seconds.  Reverting the patch from r333265 solves the problem, allowing normal connection.
Comments:By: Leif Madsen (lmadsen) 2011-09-26 09:52:07.139-0500

http://svnview.digium.com/svn/asterisk?view=revision&revision=333265

By: Michael L. Young (elguero) 2011-09-26 10:18:20.590-0500

Can you provide the XMPP server you are using and the version of the server?

By: Robert Dailey (rdailey) 2011-09-26 10:55:24.232-0500

It has been tested against openfire 3.6.4 and ejabberd (version unknown at the time of this posting).

By: Jonathan Rose (jrose) 2011-09-26 12:22:43.341-0500

Hmmmm, this one is quite flabbergasting.  I'm poking at it, but could you please inject these lines:

ast_verb(3, "client state = %d\n", client->state);
ast_verb(3, "client timeout = %d\n", client->timeout);

above :
if (aji_is_secure(client)) {
at line 1455.

and then tell me the verbose output of what that says while trying to connect?

I'm still trying to come up with a way to clean up these extra conditions while still mitigating the segfault mentioned in ASTERISK-18078.

By: Robert Dailey (rdailey) 2011-09-26 14:57:27.323-0500

[Sep 26 14:56:42] WARNING[2941]: res_jabber.c:1476 aji_send_raw: JABBER: Unable to send message to asterisk, we are not connected[Sep 26 14:56:43] WARNING[2941]: res_jabber.c:1740 aji_act_hook: Jabber didn't seem to handshake, failed to authenticate.
   -- client state = 3
   -- client timeout = 0
   -- client state = 3
   -- client timeout = 0
   -- client state = 3
   -- client timeout = 0
   -- client state = 3
   -- client timeout = 0
   -- client state = 1
   -- client timeout = 50
[Sep 26 14:57:37] WARNING[2941]: res_jabber.c:1476 aji_send_raw: JABBER: Unable to send message to asterisk, we are not connected[Sep 26 14:57:38] WARNING[2941]: res_jabber.c:1740 aji_act_hook: Jabber didn't seem to handshake, failed to authenticate.


By: Michael L. Young (elguero) 2011-09-26 15:24:25.586-0500

Jonathan,

I feel bad since this was my patch if I broke something so I hope I am not overstepping by working on this as well to help find a solution.

I think I have the solution if you want to take a look at the attached patch.

I confirmed the error and confirmed that this patch seems to fix the problem.

It would appear when using the component setting for res_jabber.c, the client state in the current code is not set to AJI_CONNECTING until after the function ast_aji_send is called.  Therefore, ast_aji_send sees the status is not AJI_CONNECTING and therefore doesn't send the handshake.

Under the client setup (which is what I tested with), this is not an issue because the client->state is set to AJI_CONNECTING before that function is called.

So, the solution is to set client->state to AJI_CONNECTING before calling ast_aji_send.  I did a little clean up and added a comment to make the code a little clearer.

By: Michael L. Young (elguero) 2011-09-26 17:21:22.834-0500

Here is patch version 2 based on the testing that was done by the reporter.

from irc:

{noformat}
<p3nguin> It's connected!
<p3nguin> But I'm still getting the message that it is not connected.
<p3nguin> [Sep 26 16:12:02] WARNING[21003]: res_jabber.c:1473 aji_send_raw: JABBER: Unable to send message to asterisk, we are not connected[Sep 26 16:12:02] WARNING[21003]: res_jabber.c:2765 aji_recv_loop: JABBER:  Network Timeout
<p3nguin> and then there's the unrelated Invalid XML error, too.
<JonathanRose> hmmmm
<JonathanRose> Is the connection working otherwise?
<JonathanRose> The fact that it is say it isn't connected could easily just mean it isn't connected at that moment.
<p3nguin> It looks like it.
<JonathanRose> Is the message repeating over and over again like before?
<p3nguin> JabberSend() is working; I receive the message.
<p3nguin> Yes, 55 seconds.
<JonathanRose> Well that's a positive result.
<JonathanRose> hmmmm
<JonathanRose> That's a little distressing.
<p3nguin> jabber show connections shows Connected every time I check it.
<JonathanRose> alright
<JonathanRose> I'll continue looking at this tomorrow.  I think we are on the right track though.
<p3nguin> Wait.
<p3nguin> It shows Disconnected.
<p3nguin> Then goes to connected again.
<JonathanRose> Hmmmm
<p3nguin> I ran jabber show connections right when the warning popped up, and it was disconnected.
{noformat}

This is what I wrote on irc but I think everyone had left for the day:
{noformat}
<elguero> JonathanRose: Looks like the problem is that client->state is not being updated properly?... look at around line 2827 in aji_recv_loop... if client->timeout  is 0 and the client state is AJI_CONNECTED, if keepalive is set, it will try to send a keep alive... but aji_send_raw will return -1 because timeout is 0 and the state is not AJI_CONNECTING
<elguero> probably adding client->state = AJI_CONNECTING above the call to aji_send_raw would take care of that
{noformat}

Also one more note, when connecting as component the publishing of devices states is turned off.

By: Jonathan Rose (jrose) 2011-09-27 09:49:13.768-0500

Reproduced the bugs we were having yesterday this morning on my test box and this patch seems to fix them.

I'm just a little confused on this section though...
@@ -2818,9 +2826,11 @@
} else if (res == IKS_NET_TLSFAIL) {
ast_log(LOG_ERROR, "JABBER:  Failure in TLS.\n");
} else if (client->timeout == 0 && client->state == AJI_CONNECTED) {
+ client->state = AJI_CONNECTING;
res = client->keepalive ? aji_send_raw(client, " ") : IKS_OK;
if (res == IKS_OK) {
client->timeout = 50;
+ client->state = AJI_CONNECTED;
} else {
ast_log(LOG_WARNING, "JABBER:  Network Timeout\n");
}


I don't really love the status change we are using here, because we aren't connecting, we are just sending some kind of keep-alive.  Perhaps what we should be doing instead is manually changing the timeout to a non-zero value before calling aji_send_raw (so that we don't get dumped for being connected with a timeout of 0), and if we don't go back and change client->timeout to 50, we set it back to 0 when we are done.

So... instead, it'd be like:

@@ -2818,9 +2826,11 @@
} else if (res == IKS_NET_TLSFAIL) {
ast_log(LOG_ERROR, "JABBER:  Failure in TLS.\n");
} else if (client->timeout == 0 && client->state == AJI_CONNECTED) {
+ client->timeout = -1
res = client->keepalive ? aji_send_raw(client, " ") : IKS_OK;
if (res == IKS_OK) {
client->timeout = 50;
} else {
ast_log(LOG_WARNING, "JABBER:  Network Timeout\n");
+ client->timeout = 0;
}


I've already confirmed that this works.  I'd just like your take on whether or not you think it's more or less appropriate.

Also, I think I'll wait to commit this until Mr. Dailey has had an opportunity to test the new patch as well.



EDIT:  Ok, wow, that came out really ugly.  Here, I'll pastebin it.
http://pastebin.com/UstpLgRQ

By: Michael L. Young (elguero) 2011-09-27 14:16:21.706-0500

Well, I agree with not changing the client state since it seems to indicate a disconnect occurred when all we are doing is sending a keep alive.

Here is the only thing that I would be concerned with.  The idea for checking the client->timeout before calling iks_send_raw() was as a precaution to make sure we didn't just lose the connection and still think we were connected to the server.  Trying to send something to the XMPP server when there was no connection is what would cause a segfault.  If we change the timeout to -1 in order to bypass that condition, could we be possibly setting ourselves up for a segfault?

That is my only concern with going this route.

By: Jonathan Rose (jrose) 2011-09-27 14:49:04.104-0500

Changes a little of Michael's patch to work in a non-state-change way, explicitly allowing exemptions to be made for the check on aji_send_raw when sending a keep_alive message.

By: Jonathan Rose (jrose) 2011-09-27 14:50:15.635-0500

<elguero> let me know what you think...
<JonathanRose> elguero:  I think that if that were the case, we'd be exposing the exact same problem by changing the state to connecting.
<JonathanRose> The appropriate course of action I believe if that could be a potential problem... would be to add a helper function to aji_send_raw so that we can include a bypass to that check in this particular instance.
<elguero> JonathanRose: you know.. you are right
<JonathanRose> Yay!  I'm almost caught up with the broken clock today then.
<elguero> Good, its about time!  ... iif I remember correctly, the segfault problem was mainly with iksemel not handling things properly and causing the segfault... so we are trying to work around that behaviour... so, I am all for your change to the patch
<elguero> really shouldn't be a problem
<JonathanRose> Like I was saying, instead of doing this elaborate thing with the change of states...
<JonathanRose> and the timer change
<JonathanRose> we create a helper function for aji_send_raw that takes an additional argument which allows it to skip that section.
<JonathanRose> I'll write up a change to the patch real fast.
<elguero> that sounds good to me...
<JonathanRose> http://pastebin.com/21CriB9E
<JonathanRose> elguero:  Like that.  I think.
<elguero> JonathanRose: that works for me... I like it...
<JonathanRose> Now if the stars align and I can get the p3nguin to test it, we'll be all set.
<elguero> exactly what I was thinking... hope we can get him to test this soon
<JonathanRose> I'll go ahead and post this patch to the issue.

Also a copy of our discourse that led to the above attachment.

By: Robert Dailey (rdailey) 2011-09-27 23:25:44.765-0500

I'm having some issues with the revised (latest) patch.  I'll try to get you on IRC later to work this out so we can continue testing.

By: Jonathan Rose (jrose) 2011-09-28 08:26:55.924-0500

Make sure you are using a fresh copy of 1.8.7.0 either from tar or from the svn/asterisk/tags folder.  Go to the 1.8.7.0 directory (where you'd configure/make from) and follow these steps.

$ wget https://issues.asterisk.org/jira/secure/attachment/41249/jabber_patch_ammends.patch

*should see a lot of connection/download/progress stuff here, you probably already know what to expect.

$ patch -p0 < jabber_patch_ammends.patch

*then you should see:
patching file res/res_jabber.c

I've already tested this on a fresh checkout of 1.8.7.0 from the svn repository, so I'm going to have to chalk this one up to patching mistakes for now unless the above also doesn't work.

By: Robert Dailey (rdailey) 2011-09-28 11:56:31.290-0500

It has something to do with copying the patch out of the pastebin you left for me on IRC yesterday -- I didn't see the attachment here.  Using the downloaded patch file fixed that little problem...  I always apply the patch with -p0 unless the paths listed in the patch need adjustment.  Even then I'll sometimes edit the paths so I can patch with -p0.

With the patch, I'm not seeing the warnings anymore.  I'm also not seeing any Disconnected state with jabber show connections when I check it periodically.  This may be the winning entry!  I'll continue running this change today to see if anything else arises.  Thank you all for your efforts in this and other issues.

By: Jonathan Rose (jrose) 2011-09-28 15:40:45.091-0500

No problem.

I went ahead and threw the patch up on review board.  https://reviewboard.asterisk.org/r/1464/

I think we'll probably have this closed out pretty soon if you don't uncover some latent disaster.

By: Barry L. Kline (blkline) 2011-09-28 16:50:54.389-0500

I've tried the patch shown in Jonathan Rose's post at 3:40.   I get the exact same result as I did before:

WARNING[22482]: res_jabber.c:1481 aji_send_raw_full: JABBER: Unable to send message to vrec, we are not connected[Sep 28 17:44:36]

I have a fresh installation of 1.8.7, patched as above and installed on a CentOS 5.6 32-bit system.   1.8.6 works as expected.

Version: Openfire 3.7.0

By: Michael L. Young (elguero) 2011-09-28 22:12:22.920-0500

Barry,

Can you post any debug information?  In jabber.conf, you can turn on the debugging.  Also, how are you connecting to Openfire, as client or component?  Using TLS, SASL?  Perhaps your jabber.conf settings would be helpful.

That info will help a lot.

I have tested this patch on Centos 5.7 x86 using trunk.  I have tested connecting as client and as component.  The tests have been with the latest ejabberd xmpp server, all with success.  So any extra info would be very helpful.

By: Barry L. Kline (blkline) 2011-09-29 18:04:47.229-0500

Hi Michael.

With debug=true in jabber.conf I got this message on the console:

[Sep 29 18:57:13]
JABBER: vrec INCOMING: <?xml version='1.0' encoding='UTF-8'?><stream:stream xmlns:stream="http://etherx.jabber.org/streams" xmlns="jabber:client" from="jabber" id="9eb4688c" xml:lang="en" version="1.0"><stream:features><mechanisms xmlns="urn:ietf:params:xml:ns:xmpp-sasl"><mechanism>DIGEST-MD5</mechanism><mechanism>PLAIN</mechanism><mechanism>CRAM-MD5</mechanism></mechanisms><compression xmlns="http://jabber.org/features/compress"><method>zlib</method></compression><auth xmlns="http://jabber.org/features/iq-auth"/></stream:features>
[Sep 29 18:57:13] WARNING[23231]: res_jabber.c:1481 aji_send_raw_full: JABBER: Unable to send message to vrec, we are not connected[Sep 29 18:57:36]   == Registered application 'VoiceMail'

jabber.conf:

[general]
debug=yes                                                                       ;;Turn on debugging by default.
autoprune=no                                                            ;;Auto remove users from buddy list.
autoregister=yes                                                        ;;Auto register users from buddy list.

[vrec]                                                          ;;label
type=client
serverhost=jabber.mydomain
username=asteriskpbx2@jabber/jabber
secret=PASSWORD
usetls=no
usesasl=no                                                                      ;;Use sasl or not
statusmessage="I am your PBX"
;
timeout=100


If there's anything else I can add please let me know.

By: Michael L. Young (elguero) 2011-09-29 23:01:23.172-0500

Okay.  That did help.  Thank you for the extra information.

Here is my assessment as to what is happening in this other situation.  Lets see if anyone agrees with it.

My config and testing was done with SASL and TLS set to yes.  I thought I had tested with them off at one point and then turned it back on but I guess not.

Starting to understand the sequence of events here better in the code.  Looks like at the very beginning, the client is initialized.  The very first message sent to the XMPP server is the header message.  That header message is sent out through iksemel library using iks_send_raw(), not through aji_send_raw().  Asterisk receives a reply back from the xmpp server with the methods it can use for connecting.  Asterisk then attempts to connect but the check for the client state now prevents the "unencrypted" message from being sent because we are "disconnected" to the server since we haven't finished the authorization routine resulting in the message that Barry is getting.

Looking at aji_reconnect(), sets the client->state to AJI_DISCONNECTED.  Then it calls aji_initialize, which sets up the client connection.  I am thinking that if we then have a connection to the server, before returning IKS_OK, we could set the client->state to AJI_CONNECTING in order to send authorization info.  We know we have a path to the server, we should then finish the connecting routine and probably changing the client state would be safe.

Jonathan will probably have to trace all of this but that is my understanding of where we are failing in this scenario.

FYI:  This method of connecting is obsolete http://xmpp.org/extensions/xep-0078.html.

One final point, I couldn't get this scenario to connect with ejabberd as is.  It responds that jabber:iq:auth (non-sasl) is unavailable.  I found this link that says if a client advertises that it is version 1.0 compatible (which Asterisk is doing as well as iksemel library does with the initial header), the client has to support SASL and TLS.  Therefore, ejabberd won't let the client (asterisk) connect with non-sasl.  It would appear that perhaps OpenFire does allow it?

By: Michael L. Young (elguero) 2011-09-29 23:07:21.575-0500

The last patch uploaded is a fail.

It messes up the roster and device events from working properly.



By: Jonathan Rose (jrose) 2011-09-30 10:25:38.248-0500

Well, since we stay in disconnected mode before registering, we can't send the message since we explicitly aren't sending the message unless we are connected (with time left before timeout) or connecting.  You would normally think this sort of thing would put you into the connecting state, but it looks as though we just don't bother with state control when we aren't using one of the secure modes... which seems rather backwards to me.

On the other hand... I reverted to 333264 and tested this, and while I didn't get the error messages, I didn't exactly get connected either.  Jabber messages wouldn't arrive for me and I didn't show up as logged in on my other clients.

So what I'd like from Mr. Kline for now would be for him to do similar steps when working from r333264 so that I can see if he is getting connected in that version anyway.

Steps to take...

svn up -r 333264 (if checked out from svn.  If not, svn co <asterisk1.8address> -r 333264)
build asterisk as you normally would.
set debug to yes in jabber.conf

1. asterisk -gcvvv
2. wait until the CLI> prompt appears
3. Some JABBER debug messages should fire as a dialog between asterisk and the server gets responded to.  Start the log at the first of these messages.
4. jabber show connections
5. end the log at this display of the connection state.

I'm also looking into seeing if there isn't a way to make the r333265 patch less intrusive by using something other than states to skip the iks_send_raw.

By: Michael L. Young (elguero) 2011-09-30 10:47:22.563-0500

Yep, it doesn't seem to me that the states are being used exactly the way one would think.

I was able to connect and use JabberSend with that last patch (using sasl).  But, it looks like some other functions are checking for AJI_DISCONNECTED during the whole initialization routine and that is why they did not fire off some of the other functions when creating the client connection and connecting.

Since non-sasl authentication is obsolete, we might need to eventually take out that support... not sure how much of an impact that would have but something we should consider down the road.

By: Barry L. Kline (blkline) 2011-09-30 18:48:34.237-0500

I attempted to take the steps that you mentioned but I'm getting a segfault upon running Asterisk.  I did delete all modules from the older build (heeding the warning provided by "make install" but the segfaults remain.

Barry



By: Jonathan Rose (jrose) 2011-10-03 08:33:32.737-0500

Barry, could you please get a back trace for this segfault?

The steps, in case you aren't familiar:
1. Recompile asterisk with the menuselect option 'DONT_OPTIMIZE' in the Compiler Flags section.

2. $gdb asterisk
*requires gnu debugger

3. set args -gc

4. run

5. once asterisk is fully started, do whatever you have to do to cause the segfault.

6. When gdb gives you a prompt, enter: bt full

Then just copy the output of bt full for our use here.

By: Jonathan Rose (jrose) 2011-10-03 16:09:53.978-0500

Would anyone object if I reverted 333265 until we can work out a more comprehensive patch for this?  I've been looking through res_jabber today and I'm seeing a lot of potential for nasty problems to occur and I'm thinking that we really shouldn't be making our decisions for whether to send messages or not around the connection state.  The problem is that the iks_parser struct is being mangled when our connection fails and we have no real means of indicating this in our client structure right now.  I think this can probably be fixed pretty easily though.

By: Barry L. Kline (blkline) 2011-10-03 16:31:46.329-0500

I'd be happy just to be able to upgrade to 1.8.7 without trouble with jabber!  I'm certainly willing to do whatever testing that you want, but I can only work with the affected server in the evenings when the business is closed.  So this makes it quite slow-going for me.

By: Michael L. Young (elguero) 2011-10-03 17:18:21.184-0500

We might as well fix it all the way instead of quick fixes.

That is fine with me, to revert... only becomes a problem if the connection to the XMPP server is lost, which is very seldom in my case.

Don't want to hold up the next release especially since this regression has the possibility of affecting anyone connecting as a component to their XMPP server.

By: Barry L. Kline (blkline) 2011-10-20 12:34:11.108-0500

I just downloaded the patches for 1.8.7.1 and didn't see where the changes to this module from 1.8.6 were reverted.  I'm just wondering what the status is.  Will 1.8.8 have res_jabber back to the original version?  I'm eager as I'm stopped at 1.8.6 until there is a resolution to this problem.

As a workaround, will the res_jabber.c file from 1.8.6 drop into the 1.8.7 tree or are there other changes that preclude me from doing so?

Thanks!

Barry


By: Jonathan Rose (jrose) 2011-10-20 15:14:45.044-0500

The patch causing the regression was reverted.  ASTERISK-18078 is still causing significant problems though and needs to be addressed.