From 4611c473f1415ceee8f4da94d1ef0c878bca4e4e Mon Sep 17 00:00:00 2001 From: Christian Grothoff Date: Sat, 16 Feb 2019 21:19:23 +0100 Subject: [PATCH] Florian Weimer writes: Christian Grothoff: > I'm seeing some _very_ odd behavior with processes hanging on exit (?) > with GNU libc 2.28-6 on Debian (amd64 threadripper). This seems to > happen at random (for random tests, with very low frequency!) in the > GNUnet (Git master) testsuite when a child process is about to exit. It looks like you call exit from a signal handler, see src/util/scheduler.c: /** * Signal handler called for signals that should cause us to shutdown. */ static void sighandler_shutdown () { static char c; int old_errno = errno; /* backup errno */ if (getpid () != my_pid) exit (1); /* we have fork'ed since the signal handler was created, * ignore the signal, see https://gnunet.org/vfork discussion */ GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE), &c, sizeof (c)); errno = old_errno; } In general, this results in undefined behavior because exit (unlike _exit) is not an async-signal-safe function. I suspect you either call the exit function while a fork is in progress, or since you register this signal handler multiple times for different signals: sh->shc_int = GNUNET_SIGNAL_handler_install (SIGINT, &sighandler_shutdown); sh->shc_term = GNUNET_SIGNAL_handler_install (SIGTERM, &sighandler_shutdown); one call to exit might interrupt another call to exit if both signals are delivered to the process. The deadlock you see was introduced in commit 27761a1042daf01987e7d79636d0c41511c6df3c ("Refactor atfork handlers"), first released in glibc 2.28. The fork deadlock will be gone (in the single-threaded case) if Debian updates to the current release/2.28/master branch because we backported commit 60f80624257ef84eacfd9b400bda1b5a5e8e7816 ("nptl: Avoid fork handler lock for async-signal-safe fork [BZ #24161]") there. But this will not help you. Even without the deadlock, I expect you still experience some random corruption during exit, but it's going to be difficult to spot. Thanks, Florian --- src/util/scheduler.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/util/scheduler.c b/src/util/scheduler.c index dd0d5d5cf..3bd7ccec7 100644 --- a/src/util/scheduler.c +++ b/src/util/scheduler.c @@ -658,8 +658,8 @@ sighandler_shutdown () int old_errno = errno; /* backup errno */ if (getpid () != my_pid) - exit (1); /* we have fork'ed since the signal handler was created, - * ignore the signal, see https://gnunet.org/vfork discussion */ + _exit (1); /* we have fork'ed since the signal handler was created, + * ignore the signal, see https://gnunet.org/vfork discussion */ GNUNET_DISK_file_write (GNUNET_DISK_pipe_handle (shutdown_pipe_handle, GNUNET_DISK_PIPE_END_WRITE), &c, sizeof (c)); -- 2.25.1