The pam_exec module uses vfork()+execve() and waitpid() to spawn a child process and wait for its completion. This creates a race condition in a multithreaded process using PAM. Another thread could reap the process forked by pam_exec, in which case waitpid() would either fail because there is no longer a valid pid to wait for, or wait for the wrong process if the pid has been reused in the meantime. The correct solution would be to use pdfork() and wait with kevent() on the (EVFILT_PROCDESC, process descriptor) event.
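To make the pdfork()/kevent() suggestion concrete, here is a minimal sketch of what such a spawn-and-wait could look like. It is not taken from pam_exec: the helper name run_with_procdesc() and its error handling are made up for illustration, and the code assumes that the data field of the EVFILT_PROCDESC event carries a wait(2)-style status, as described in kqueue(2). pdfork() is FreeBSD-specific, so on other systems the sketch compiles to a stub that fails with ENOSYS.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>
#if defined(__FreeBSD__)
#include <sys/event.h>
#include <sys/procdesc.h>
#endif

/*
 * Hypothetical helper (not pam_exec code): run argv[0] and return its
 * wait(2)-style status, or -1 with errno set.
 */
int
run_with_procdesc(char *const argv[], char *const envp[])
{
#if defined(__FreeBSD__)
    struct kevent change, event;
    int kq, pd, n, status, serrno;
    pid_t pid;

    /* The child is tracked through the descriptor pd, not its pid. */
    pid = pdfork(&pd, PD_CLOEXEC);
    if (pid == -1)
        return (-1);
    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);             /* exec failed */
    }

    if ((kq = kqueue()) == -1) {
        serrno = errno;
        close(pd);              /* without PD_DAEMON this kills the child */
        errno = serrno;
        return (-1);
    }

    /* Ask for one event when the process behind pd exits. */
    EV_SET(&change, pd, EVFILT_PROCDESC, EV_ADD, NOTE_EXIT, 0, NULL);
    do {
        n = kevent(kq, &change, 1, &event, 1, NULL);
    } while (n == -1 && errno == EINTR);

    if (n == 1 && (event.flags & EV_ERROR) == 0) {
        status = (int)event.data;   /* assumed wait-style status */
        serrno = 0;
    } else {
        status = -1;
        serrno = (n == 1) ? (int)event.data : errno;
    }

    close(kq);
    close(pd);                  /* drop our reference to the child */
    if (status == -1)
        errno = serrno;
    return (status);
#else
    /* pdfork() and EVFILT_PROCDESC are not available here. */
    (void)argv;
    (void)envp;
    errno = ENOSYS;
    return (-1);
#endif
}
```

The caller would decode the result with WIFEXITED()/WEXITSTATUS() exactly as after waitpid(). The point of the design is that the wait never names a pid, so a pid that has been reaped or recycled elsewhere in the process cannot be confused with the child spawned here.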
Is this actually causing problems? If other threads follow POSIX's non-normative recommendation, in the Application Usage section of the wait page, that wait(), waitpid() with pid = -1, and waitid() with P_ALL not be used, then there will not be an issue.
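For reference, the failure mode when that recommendation is not followed can be reproduced without any threads. The sketch below is only an illustration: the function name is invented, and a plain wait() call in the same thread stands in for what another thread in the application might do between the fork and the targeted waitpid().

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Fork a child that exits immediately, let a wildcard wait() reap it
 * (standing in for another thread), then issue the targeted waitpid()
 * that pam_exec-style code would use. Returns the errno from that
 * waitpid(), 0 if it unexpectedly succeeded, or -1 if fork() failed.
 */
int
waitpid_errno_after_wildcard_reap(void)
{
    pid_t pid;
    int status;

    pid = fork();
    if (pid == -1)
        return (-1);
    if (pid == 0)
        _exit(7);               /* child: nothing to do */

    /* "Other thread": reaps whichever child terminates first. */
    (void)wait(NULL);

    /* "pam_exec": the child it spawned has already been reaped. */
    if (waitpid(pid, &status, 0) == -1)
        return (errno);
    return (0);
}
```

In a process with no other children the targeted waitpid() fails with ECHILD. In a real multithreaded program the outcome depends on timing, and if the pid has been recycled for a new child the call can instead block on an unrelated process.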
I found one case where the same issue in the Linux PAM pam_exec module broke vsftpd, and vsftpd had to work around the problem because, AFAIK, Linux lacks the API required to avoid it completely. I know that pam_exec is a hack and should only be used for testing or after very careful analysis. On the other hand, the documentation doesn't warn users about the problem, and it's a nasty layering violation that blows up in the system administrator's face. I don't want to be the poor bastard who has to debug this under time pressure. The PAM policy isn't supposed to inject race conditions into otherwise "working" applications. Pointing to a "non-normative recommendation" won't help users bitten by this problem. My problem with this is that it's an accident waiting to happen, and FreeBSD has the APIs to avoid this whole bug class. To make it worse, the ones who will run into the problem (system admins) are often not in a position to debug and patch applications complex enough to use pthreads and PAM.
Hi, We've just had a problem that matches this issue. A PAM module was using libcurl, which was configured to use async DNS via pthreads. Inside the PAM module the DNS lookup never completes and authentication fails. When the same code runs from userland, the threads work just fine. Was a resolution found for this issue? Cheers, Joe