Opened 4 years ago
Last modified 7 weeks ago
#175 reopened defect
libpurple API called from different threads
| Reported by: | cavedon | Owned by: | vadim |
|---|---|---|---|
| Priority: | critical | Milestone: | QuteCom 3.0 |
| Component: | misc | Version: | 2.2 |
| Keywords: | Cc: |
Description
libpurple is not thread safe add all API must be called from the *same* thread:
http://developer.pidgin.im/wiki/WhatIsLibpurple#Isitthreaded
In the current implementation, libpurple is initialized in
PurpleIMFactory::PurpleIMInit(), but all other operations are handled by a glib event loop PurpleMainEventLoop?() running in a seperate thread created by PurpleIMFactory::PurpleIMFactory
For this reason, libpurple might crash in very weird ways (as I actually found trying to use an external libpurple compiled against nss).
I think the easiest way should be handling initialization and every call to libpurple de-initialization of liburple inside that glib loop (which is created before initializing libpurple).
The alternative, more efficient, would be to get rid of the glib event loop and sue WengoPhone::runEvents() even loop also for libpurple. I do not know how hard it is tough, as it should also handle event on sockets.
Attachments (4)
Change History (20)
comment:1 Changed 4 years ago by cavedon
comment:2 Changed 3 years ago by chris-mac
- Milestone changed from QuteCom 2.3 to QuteCom 2.2-RC4
- Version changed from 2.2-RC3 to 2.2
comment:3 Changed 3 years ago by laurent
- Resolution set to fixed
- Status changed from new to closed
comment:4 Changed 3 years ago by cavedon
- field_os set to all
- Resolution fixed deleted
- Status changed from closed to reopened
Thanks for fixing this issue. We are getting closer to having qutecom reliably working with external libpruple!
However I noticed that there is race condition: PurpleIMFactory needs to wait for libpurple to be initialized before continuing, because it will make calls into libpurple API.
I am attaching a patch to fix this. I am not an expert on glib, so feel free to rewrite it if there is a better way to do that.
However, I have realized that there are many other calls into libpurple API that are not done in the proper thread, but come from the main qutecom thread, and this violates libpurple specification. However, it would not make sense to wrap every possible calls with some code to schedule it on the glib thread with the glib main loop. So, in the end, probably the only viable via to *ensure* thread-safety in libpurple would be to merge the two loops (i.e. make libpurple use the main application loop).
Changed 3 years ago by cavedon
Changed 3 years ago by cavedon
comment:5 Changed 3 years ago by cavedon
Please consider "purple-wait-init.patch" and ignore "purple-wait-init.2.patch", sorry for the double upload.
comment:6 follow-up: ↓ 7 Changed 3 years ago by nikita
Hello, to fix http://trac.qutecom.org/ticket/173 we have merged the two eventloop under linux when it's possible (Qt configured to use glib eventloop).
I will try your patch as soon as possible.
comment:7 in reply to: ↑ 6 ; follow-up: ↓ 10 Changed 3 years ago by cavedon
Replying to nikita:
Hello, to fix http://trac.qutecom.org/ticket/173 we have merged the two eventloop under linux when it's possible (Qt configured to use glib eventloop).
Mhm, there are at least 3 event loops in qutecom
- Qt event loop (for the GUI)
- Model even loop (based on boost thread and implemented in owutil/thread), run by the WengoPhone? class
- libpurple glib-based event loop
In order to fix this issue we need to merge the libpurple loop with the *model* loop.
Actually merging the GUI loop with the libpurple loop, wouldn't it worsen the user experience, as the GUI might get blocked by libpurple operations? (apart from breaking the Model-View-Controller paradigm...)
I will try your patch as soon as possible.
Thanks.
PS: of course if the model & libpurple threads get merged, than my patch, as well as changeset [519] should no longer be needed.
comment:8 follow-up: ↓ 9 Changed 3 years ago by cavedon
Mhm, now it crashing right at startup with:
** (<unknown>:5513): CRITICAL **: purple_eventloop_get_ui_ops: assertion `eventloop_ui_ops != NULL' failed Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffd5ffb910 (LWP 5554)] 0x00007ffff4dcae43 in purple_timeout_add (interval=0x0, function=0xbce5b2 <PurpleIMConnect::connectCbk(void*)>, data=0x7fffd8009330) at /usr/src/debian/work/build-area/pidgin-2.6.6/./libpurple/eventloop.c:36 36 /usr/src/debian/work/build-area/pidgin-2.6.6/./libpurple/eventloop.c: No such file or directory. in /usr/src/debian/work/build-area/pidgin-2.6.6/./libpurple/eventloop.c Current language: auto The current source language is "auto; currently c". gdb$ bt #0 0x00007ffff4dcae43 in purple_timeout_add (interval=0x0, function=0xbce5b2 <PurpleIMConnect::connectCbk(void*)>, data=0x7fffd8009330) at /usr/src/debian/work/build-area/pidgin-2.6.6/./libpurple/eventloop.c:36 #1 0x0000000000bce818 in PurpleIMConnect::connect (this=0x7fffd8005760) at /home/cavedon/qutecom/qutecom-2.2/libs/imwrapper/src/purple/PurpleIMConnect.cpp:243 #2 0x0000000000dd22ee in Connect::timeoutEventHandler (this=0x7fffd8001d40, sender=...) at /home/cavedon/qutecom/qutecom-2.2/wengophone/src/model/connect/Connect.cpp:114 #3 0x0000000000dd6f92 in boost::_mfi::mf1<void, Connect, Timer&>::operator() (this=0x7fffd80088f0, p=0x7fffd8001d40, a1=...) at /usr/include/boost/bind/mem_fn_template.hpp:162 [...] #30 0x0000000000e520bf in PrivateThread::run (this=0x7fffd8003a10) at /home/cavedon/qutecom/qutecom-2.2/libs/owutil/thread/src/Timer.cpp:108 #31 0x00007ffff4d43ae1 in Thread::runThread (this=0x7fffd8003a10) at /home/cavedon/qutecom/qutecom-2.2/libs/owutil/thread/src/Thread.cpp:70 [...] gdb$ info threads * 19 Thread 0x7fffd5ffb910 (LWP 5554) 0x00007ffff4dcae43 in purple_timeout_add (interval=0x0, function=0xbce5b2 <PurpleIMConnect::connectCbk(void*)>, data=0x7fffd8009330) at /usr/src/debian/work/build-area/pidgin-2.6.6/./libpurple/eventloop.c:36 18 Thread 0x7fffd67fc910 (LWP 5553) 0x00007ffff7bd2977 in close () from /lib/libpthread.so.0 17 Thread 0x7fffd6ffd910 (LWP 5552) 0x00007fffef164f82 in select () from /lib/libc.so.6 16 Thread 0x7fffd77fe910 (LWP 5551) 0x00007fffef164f82 in select () from /lib/libc.so.6 15 Thread 0x7fffd7fff910 (LWP 5550) 0x00007fffef13cc61 in nanosleep () from /lib/libc.so.6 14 Thread 0x7fffdd6da910 (LWP 5549) 0x00007ffff7bd290b in read () from /lib/libpthread.so.0 4 Thread 0x7fffe068c910 (LWP 5539) 0x00007fffef164f82 in select () from /lib/libc.so.6 3 Thread 0x7fffe680f910 (LWP 5538) 0x00007fffef164f82 in select () from /lib/libc.so.6 2 Thread 0x7fffe7010910 (LWP 5537) pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:261 1 Thread 0x7ffff7e14770 (LWP 5534) 0x00007fffef160743 in *__GI___poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=0xffffffff) at ../sysdeps/unix/sysv/linux/poll.c:87
This without my patch and with external libpurple 2.6.6.
comment:9 in reply to: ↑ 8 Changed 3 years ago by cavedon
Replying to cavedon:
This without my patch and with external libpurple 2.6.6.
purple-wait-init.patch fixes this issue. It was due to a race condition in libpurple initialization.
comment:10 in reply to: ↑ 7 ; follow-up: ↓ 11 Changed 3 years ago by nikita
The patchset I was pointing you is to solve http://trac.qutecom.org/ticket/173, it wasn't related to the current report. I pointed it because it's in some way related to the current report. I don't really know why we have that bug, but it's seems to work if Qt is using libpurple eventloop, anyway it's just a creepy workaround until we decide how to resolve imwrapper problem.
About imwrapper thread-safety, I'm afraid of that I'm not a glib expert too, I agree that we need to rewrite the imwrapper <-> model calls or to merge libpurple eventloop with the model loop. But it's certainly a lot of changes, we are thinking about the best way to do it, merging both loops with boost (how difficult it will be considering the actual imwrapper and libpurple glib requirements ?), keeping two separated threads and using Qt signals/slots for communication.
All suggestions or comments are welcome.
About your patch, he will be useful since mine his just concerning Qt with glib under linux.
comment:11 in reply to: ↑ 10 Changed 3 years ago by cavedon
Replying to nikita:
The patchset I was pointing you is to solve http://trac.qutecom.org/ticket/173, it wasn't related to the current report. I pointed it because it's in some way related to the current report.
Ah, ok, sorry for the misunderstanding.
About imwrapper thread-safety, I'm afraid of that I'm not a glib expert too, I agree that we need to rewrite the imwrapper <-> model calls or to merge libpurple eventloop with the model loop. But it's certainly a lot of changes, we are thinking about the best way to do it, merging both loops with boost (how difficult it will be considering the actual imwrapper and libpurple glib requirements ?), keeping two separated threads and using Qt signals/slots for communication.
All suggestions or comments are welcome.
From what I understood, libpurple does not strictly need a glib event loop. You just need to provide some callback that libpurple will use to interface with your even loop. http://developer.pidgin.im/doxygen/dev/html/struct__PurpleEventLoopUiOps.html
The problem of interfacing libpurple with qutecom model loop is that libpurple needs notification about events on file descripts, while the custom boost-based loop does not support it (AFAICT).
IMHO, it does not make sense to extend the current model loop, and reinvent another even-loop. I think the best way would be to change the model loop so that it runs on a glib loop, at a very low level. (i.e. instead of adding events on a custom queue, add them to the glib loop event queue). This change should not be too invasive, and the integration with libpurple would come for free.
I will try to write a patch this weekend, and see how hard it gets.
In any case, I would keep this completely separated from Qt!
Thanks for the feedback.
Changed 3 years ago by cavedon
Changed 3 years ago by cavedon
comment:12 Changed 3 years ago by cavedon
The patch itself for making the Thread class run on a glib event loop is pretty simple (thread-glib.patch) and seems to work.
The patch for making libpurple use the model thread brings some issues I have not solved yet. I am attching the patch so far (libpurple-model-thread.patch) for reference, even though there is some memory corruption happening with libpurple and qutecom segfaults.
Some comments:
- PurpleIMFactory is called from the thread that will run the event loop, so no need to schedule an event for calling purple_core_{init,quit}()
- We need to create the glib main loop before creating the PurpleIMFactory
- We do not want to run on the default glib loop (as Qt ill probably use it), so we create our own GMainContext and call glib API specifying our own context
There is still an issue in the way timeouts are implemented in qutecom. Instead of the callback being called in the same thread after the given timeout, a Timer object is created, which spawns a new thread, which sleeps for the given time and then invokes the callback. The problem is that the callback is invoked in the Timer thread, which is *bad* for libpurple.
Because of that I was getting crash, and had to reschedule an event in the model thread from the callback (see model/connect/Connect.cpp). However I do not know how to easily find all callbacks which need this treatment.
Another possibility would be to change Timer implementation and map qutecom timeouts to glib timeouts in the model thread. The patch is not complicated. I like this idea also because we get rid of that bunch of threads being created and killed. However, I am not sure this will not bring other issues, as the behavior of the callback being called from another thread is documenebted in the Timer API...
Do you have any idea?
Thanks!
comment:13 Changed 2 years ago by dneary
- Milestone changed from QuteCom 2.2-RC4 to QuteCom 3.0
comment:14 Changed 2 years ago by dneary
Has this patch (or a similar one) been committed yet? Can we close this issue?
comment:15 Changed 2 years ago by cavedon
Unfortunately the problem is not fixed.
Libpurple is not being used according to the specification. According to the docs, all API functions must be called from the same thread. However this is not true on QuteCom.
Apparently when a private copy of libpurple is used, this is not causing big issues. But when system libpurple is used, it makes qutecom crash qt logout in some case. I posted two patches to fix the isssue, but for some reason I was not able to understand, the second patch causes memeory corruption. A summary of the status is at comment:12

I should also mention that I tracked down the crash I am having to the actual fact that calls were coming from different threads. I can give you details about the crash, if you want, but it is not the point.