Thread programming is fun: starting tons of threads and syncing them on a task, or asynchronously processing a bunch of data. But that fun has a cost: access to shared data has to be locked or carefully ordered. Otherwise you're going to face some seriously strange situations, like time-warping variables at best and big crashes at worst.
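To make the "locked" part concrete, here is a minimal sketch in C with pthreads. All the names (`shared_counter`, `counter_lock`, `run_counter`) are made up for the illustration, and the 64-thread cap is an arbitrary assumption. Several threads bump the same counter, and the mutex is what keeps the final value from being garbage.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* arbitrary cap, chosen for this sketch only */

/* Hypothetical shared state: one counter, one mutex guarding it. */
static long shared_counter;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each worker increments the counter `*iterations` times under the lock. */
static void *worker(void *p)
{
    const long *iterations = p;
    for (long i = 0; i < *iterations; i++) {
        int rc = pthread_mutex_lock(&counter_lock);
        assert(rc == 0);
        (void)rc;
        shared_counter++;               /* the protected read-modify-write */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts `nthreads` workers and returns the final counter, or -1 on error. */
long run_counter(int nthreads, long iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    shared_counter = 0;
    for (; started < nthreads; started++)
        if (pthread_create(&tids[started], NULL, worker, &iterations) != 0)
            break;
    /* Join whatever was started, even if a later create failed. */
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? shared_counter : -1;
}
```

With the lock in place, `run_counter(4, 100000)` gives 400000. Remove the lock/unlock pair and the increments can overwrite each other, so the result will usually come out lower and change from run to run.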
On top of that, the main tools we use to protect our data's reads and writes across threads can themselves lead to strange situations where only part of the code is still running and the rest is stuck in a "deadlock" state.
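The classic way to get there is two threads taking the same two locks in opposite order. Here is a rough sketch of that situation in C. The names (`lock_a`, `lock_b`, `count_would_block`) are hypothetical, and it uses `pthread_mutex_trylock` instead of a real blocking lock so that the demo reports the problem rather than hanging forever. The two barriers are only there to force the unlucky timing every time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical locks and sync points for this demo. */
static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;
static pthread_barrier_t both_hold_first;    /* each side holds its first lock */
static pthread_barrier_t both_tried_second;  /* each side has tried the second */

struct order {
    pthread_mutex_t *first, *second;
    int would_block;  /* 1 if a plain pthread_mutex_lock would have waited */
};

static void *take_both(void *p)
{
    struct order *o = p;

    pthread_mutex_lock(o->first);
    pthread_barrier_wait(&both_hold_first);

    /* With pthread_mutex_lock here, both sides would wait forever. */
    int rc = pthread_mutex_trylock(o->second);
    assert(rc == 0 || rc == EBUSY);
    o->would_block = (rc == EBUSY);
    if (rc == 0)
        pthread_mutex_unlock(o->second);

    pthread_barrier_wait(&both_tried_second);
    pthread_mutex_unlock(o->first);
    return NULL;
}

/* Returns how many of the two sides would have blocked, or -1 on error. */
int count_would_block(void)
{
    struct order ab = { &lock_a, &lock_b, 0 };  /* A then B */
    struct order ba = { &lock_b, &lock_a, 0 };  /* B then A: inverted order */
    pthread_t helper;
    int result = -1;

    if (pthread_barrier_init(&both_hold_first, NULL, 2) != 0)
        return -1;
    if (pthread_barrier_init(&both_tried_second, NULL, 2) != 0) {
        pthread_barrier_destroy(&both_hold_first);
        return -1;
    }
    if (pthread_create(&helper, NULL, take_both, &ab) == 0) {
        take_both(&ba);  /* the calling thread plays the second role */
        pthread_join(helper, NULL);
        result = ab.would_block + ba.would_block;
    }
    pthread_barrier_destroy(&both_tried_second);
    pthread_barrier_destroy(&both_hold_first);
    return result;
}
```

Here `count_would_block()` returns 2: each side holds one lock and finds the other one busy. The usual cure is to make every thread take the locks in the same order (always A before B).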
On small examples, detecting those deadlocks seems easy, but once scaled up to a bigger project/API, it's a living hell.
Tools like valgrind may detect the problems, but at the cost of a big performance loss, which can mean you hit the machine's limits before you ever hit the problem.
I was searching for a good resource to help me in my debugging quest when I stumbled upon this article by Aurelian Melinte, which I think is very good (both the author and the article): https://linuxgazette.net/150/melinte.html. It provides some code samples to supersede the pthread functions with versions that perform some checks during the calls.
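To give a feel for the "checks during the calls" idea, here is a small sketch of my own. It is not the code from the article, and the names `checked_mutex_init`, `checked_lock_at` and `checked_lock` are hypothetical. It only relies on two standard POSIX features: `PTHREAD_MUTEX_ERRORCHECK` mutexes, which refuse a relock by the owning thread with `EDEADLK`, and `pthread_mutex_timedlock`, which gives up with `ETIMEDOUT` instead of waiting forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: initialise `m` as an error-checking mutex. */
int checked_mutex_init(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical wrapper: lock with a timeout and report suspicious results. */
int checked_lock_at(pthread_mutex_t *m, int timeout_sec,
                    const char *file, int line)
{
    struct timespec deadline;
    int rc;

    assert(m != NULL);
    /* pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    rc = pthread_mutex_timedlock(m, &deadline);
    if (rc == EDEADLK)
        fprintf(stderr, "%s:%d: this thread already owns the mutex\n",
                file, line);
    else if (rc == ETIMEDOUT)
        fprintf(stderr, "%s:%d: no lock after %d s, possible deadlock\n",
                file, line, timeout_sec);
    else if (rc != 0)
        fprintf(stderr, "%s:%d: lock failed: %s\n", file, line, strerror(rc));
    return rc;
}

/* Records the call site so the message points at the offending line. */
#define checked_lock(m, timeout_sec) \
    checked_lock_at((m), (timeout_sec), __FILE__, __LINE__)
```

Calling `checked_lock(&m, 1)` twice in a row from the same thread returns 0 the first time and `EDEADLK` the second, with the file and line printed on stderr. A timeout is only a hint: a thread that legitimately holds the lock for a long time triggers the same message, so the delay has to be picked with the application in mind.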
There is also the gdb deadlock detector script by DamZiobro: https://github.com/DamZiobro/gdb-automatic-deadlock-detector
And finally valgrind's helgrind tool: http://valgrind.org/docs/manual/hg-manual.html
Now back to code and monitoring/debugging!