Florian Weimer writes:
Christian Grothoff:
> I'm seeing some _very_ odd behavior with processes hanging on exit (?)
> with GNU libc 2.28-6 on Debian (amd64 threadripper). This seems to
> happen at random (for random tests, with very low frequency!) in the
> GNUnet (Git master) testsuite when a child process is about to exit.
It looks like you call exit from a signal handler, see
src/util/scheduler.c:
/**
* Signal handler called for signals that should cause us to shutdown.
*/
static void
sighandler_shutdown ()
{
static char c;
int old_errno = errno; /* backup errno */
if (getpid () != my_pid)
exit (1); /* we have fork'ed since the signal handler was created,
* ignore the signal, see https://gnunet.org/vfork discussion */
GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle
(shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE),
&c, sizeof (c));
errno = old_errno;
}
In general, this results in undefined behavior because exit (unlike
_exit) is not an async-signal-safe function.
I suspect you either call the exit function while a fork is in progress,
or since you register this signal handler multiple times for different
signals:
sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT,
&sighandler_shutdown);
sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM,
&sighandler_shutdown);
one call to exit might interrupt another call to exit if both signals
are delivered to the process.
The deadlock you see was introduced in commit
27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"),
first released in glibc 2.28. The fork deadlock will be gone (in the
single-threaded case) if Debian updates to the current
release/2.28/master branch because we backported commit
60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock
for async-signal-safe fork [BZ #24161]") there.
But this will not help you. Even without the deadlock, I expect you
still experience some random corruption during exit, but it's going to
be difficult to spot.
Thanks,
Florian