Bug 266873

Summary: www/apache24: creating forked processes at 100% CPU spinning in sched_yield()
Product: Ports & Packages Reporter: edenist-fbz
Component: Individual Port(s)Assignee: freebsd-apache (Nobody) <apache>
Status: Closed Overcome By Events    
Severity: Affects Only Me CC: cyberleo, dewayne, fax, freebsd8593, joneum, julien, oljas
Priority: --- Flags: bugzilla: maintainer-feedback? (apache)
Version: Latest   
Hardware: amd64   
OS: Any   

Description edenist-fbz 2022-10-07 00:26:51 UTC
Apache version: apache24-2.4.54 - build with poudriere
FreeBSD: 12.3-STABLE FreeBSD 12.3-STABLE stable/12-3a9e60169ee GENERIC amd64

Since updating OS and apache a couple of weeks ago, I have noticed that my system is getting filled up with httpd processes consuming 100% CPU. 

Here is an example with one rogue process, but after a day of running the system will be eventually full of httpd processes spinning at 100%. 

root   9947 100.0  0.0  207000  41380  -  RJ   10:59   18:42.98 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10243   1.4  0.1  310180 138488  -  SJ   11:17    0:01.44 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10059   0.4  0.1  242068  71156  -  SJ   11:06    0:03.63 /usr/local/sbin/httpd -DNOHTTPACCEPT
root   9738   0.0  0.0  207000  41388  -  SsJ  10:37    0:00.35 /usr/local/sbin/httpd -DNOHTTPACCEPT
www    9853   0.0  0.1  244096  70872  -  SJ   10:47    0:06.49 /usr/local/sbin/httpd -DNOHTTPACCEPT
www    9887   0.0  0.1  242500  69808  -  IJ   10:51    0:08.48 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10058   0.0  0.1  307876 136996  -  IJ   11:06    0:04.76 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10136   0.0  0.1  242592  71452  -  IJ   11:12    0:02.50 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10144   0.0  0.1  244644  71328  -  IJ   11:14    0:01.31 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10166   0.0  0.1  244648  73028  -  SJ   11:15    0:00.99 /usr/local/sbin/httpd -DNOHTTPACCEPT
www   10239   0.0  0.1  209800  61568  -  IJ   11:16    0:00.20 /usr/local/sbin/httpd -DNOHTTPACCEPT


Usual behavior with prefork will be an initial process running as root, which then forks additional processes and sets them to run as www user. What appears to be happening here is that when the fork occurs, something gets stuck before it is able to switch to www. 

In the case of the rogue process above, output of truss -p 9947 is simply a continuous loop of:
sched_yield()					 = 0 (0x0)


For some extra context, I am running this inside a jail, hosting nextcloud with mod_php80, mysql, redis. None of my other web apps are having this issue but it could be something which only appears under load and nextcloud is certainly the heaviest application here. 
Also, I tried switching over to php-fpm with mpm_event and the same issue was encountered there, so it certainly feels like something core to apache's forking code at play here.
Comment 1 edenist-fbz 2022-10-09 01:08:20 UTC
After further observation, I think this might not be apache specific. 

I have some cronjobs which execute /use/local/bin/php directly which are also exhibiting this issue. Which puts the issue possibly on php or something within the core system. Though given none of my other jails [with apache] are having issues, I'm going to lean towards it being php.
Comment 2 cyberleo 2022-10-10 18:48:32 UTC
This occurs in php-fpm as well. Tried with both php74 and php80, with my own world build and the official release tarball.

----8<----
(lldb) process attach --pid 49045
This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
Process 49045 stopped
* thread #1, name = 'php-fpm', stop reason = signal SIGSTOP
    frame #0: 0x0000000800aa524a libc.so.7`__sys_sched_yield at sched_yield.S:4
Executable module set to "/local/sbin/php-fpm".
Architecture set to: x86_64-unknown-freebsd13.1.
(lldb) bt
* thread #1, name = 'php-fpm', stop reason = signal SIGSTOP
  * frame #0: 0x0000000800aa524a libc.so.7`__sys_sched_yield at sched_yield.S:4
    frame #1: 0x00000008030e0351 libomp.so`::__kmp_wait_4_ptr(spinner=0x000000080313d6dc, checker=1, pred=(libomp.so`__kmp_bakery_check(void*, unsigned int) at kmp_lock.cpp:611), obj=0x000000080313d6c0)(void *, kmp_uint32), void *) at kmp_dispatch.cpp:2688:5
    frame #2: 0x00000008030f8bb8 libomp.so`::__kmp_acquire_ticket_lock(kmp_ticket_lock_t *, kmp_int32) [inlined] __kmp_acquire_ticket_lock_timed_template(lck=<unavailable>) at kmp_lock.cpp:633:3
    frame #3: 0x00000008030f8b99 libomp.so`::__kmp_acquire_ticket_lock(lck=<unavailable>, gtid=<unavailable>) at kmp_lock.cpp:638:16
    frame #4: 0x00000008030fe924 libomp.so`::__kmp_get_global_thread_id_reg() [inlined] __kmp_acquire_bootstrap_lock(lck=<unavailable>) at kmp_lock.h:524:10
    frame #5: 0x00000008030fe913 libomp.so`::__kmp_get_global_thread_id_reg() at kmp_runtime.cpp:249:5
    frame #6: 0x00000008030c2d2e libomp.so`::kmpc_malloc(size_t) [inlined] __kmp_entry_thread() at kmp.h:3304:14
    frame #7: 0x00000008030c2d29 libomp.so`::kmpc_malloc(size=119093) at kmp_alloc.cpp:1098:14
    frame #8: 0x0000000803136677 libomp.so`::__kmp_is_address_mapped(addr=0x0000000802cdd158) at z_Linux_util.cpp:2003:34
    frame #9: 0x0000000803107f25 libomp.so`__kmp_register_library_startup() at kmp_runtime.cpp:6563:15
    frame #10: 0x00000008030fe984 libomp.so`__kmp_do_serial_initialize() at kmp_runtime.cpp:6753:3
    frame #11: 0x0000000803108244 libomp.so`::__kmp_serial_initialize() at kmp_runtime.cpp:7026:3
    frame #12: 0x00000008007a4743 libthr.so.3`thr_fork_impl(a=<unavailable>) at thr_fork.c:269:5
    frame #13: 0x00000008007a4418 libthr.so.3`__thr_fork at thr_fork.c:314:10
    frame #14: 0x00000000006d2651 php-fpm`match + 56945
    frame #15: 0x00000000006dc528 php-fpm`_pcre2_xclass_8 + 1752
    frame #16: 0x00000000006e4cb8 php-fpm`preg_replace_common + 2200
    frame #17: 0x00000000006d69cf php-fpm`set_start_bits + 3391
    frame #18: 0x00000000006d1f48 php-fpm`match + 55144
    frame #19: 0x00000000006d8f87 php-fpm`php_pcre2_substitute + 1367
    frame #20: 0x00000000004304a0 php-fpm`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
----8<----

Even the config test that the rc script runs (php-fpm -t) can fall victim to it:

----8<----
(lldb) process attach --pid 18483
This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.
Process 18483 stopped
* thread #1, name = 'php-fpm', stop reason = signal SIGSTOP
    frame #0: 0x0000000800aa524a libc.so.7`__sys_sched_yield at sched_yield.S:4
Executable module set to "/local/sbin/php-fpm".
Architecture set to: x86_64-unknown-freebsd13.1.
(lldb) bt
* thread #1, name = 'php-fpm', stop reason = signal SIGSTOP
  * frame #0: 0x0000000800aa524a libc.so.7`__sys_sched_yield at sched_yield.S:4
    frame #1: 0x00000008030e0351 libomp.so`::__kmp_wait_4_ptr(spinner=0x000000080313d6dc, checker=1, pred=(libomp.so`__kmp_bakery_check(void*, unsigned int) at kmp_lock.cpp:611), obj=0x000000080313d6c0)(void *, kmp_uint32), void *) at kmp_dispatch.cpp:2688:5
    frame #2: 0x00000008030f8bb8 libomp.so`::__kmp_acquire_ticket_lock(kmp_ticket_lock_t *, kmp_int32) [inlined] __kmp_acquire_ticket_lock_timed_template(lck=<unavailable>) at kmp_lock.cpp:633:3
    frame #3: 0x00000008030f8b99 libomp.so`::__kmp_acquire_ticket_lock(lck=<unavailable>, gtid=<unavailable>) at kmp_lock.cpp:638:16
    frame #4: 0x00000008030fe924 libomp.so`::__kmp_get_global_thread_id_reg() [inlined] __kmp_acquire_bootstrap_lock(lck=<unavailable>) at kmp_lock.h:524:10
    frame #5: 0x00000008030fe913 libomp.so`::__kmp_get_global_thread_id_reg() at kmp_runtime.cpp:249:5
    frame #6: 0x00000008030c2d2e libomp.so`::kmpc_malloc(size_t) [inlined] __kmp_entry_thread() at kmp.h:3304:14
    frame #7: 0x00000008030c2d29 libomp.so`::kmpc_malloc(size=121888) at kmp_alloc.cpp:1098:14
    frame #8: 0x0000000803136677 libomp.so`::__kmp_is_address_mapped(addr=0x0000000803146158) at z_Linux_util.cpp:2003:34
    frame #9: 0x0000000803107f25 libomp.so`__kmp_register_library_startup() at kmp_runtime.cpp:6563:15
    frame #10: 0x00000008030fe984 libomp.so`__kmp_do_serial_initialize() at kmp_runtime.cpp:6753:3
    frame #11: 0x00000008030fe938 libomp.so`::__kmp_get_global_thread_id_reg() at kmp_runtime.cpp:251:7
    frame #12: 0x00000008030e9ee2 libomp.so`::__kmp_api_omp_init_lock(user_lock=0x0000000803679490) at kmp_ftn_entry.h:1083:14
    frame #13: 0x00000008035a7616 libMagickCore-6.so.7`MagickCoreGenesis + 54
    frame #14: 0x000000080334e906 imagick.so`zm_startup_imagick + 182
    frame #15: 0x0000000000643ab8 php-fpm`scan + 22232
    frame #16: 0x0000000000643e9c php-fpm`scan + 23228
    frame #17: 0x0000000000650986 php-fpm`scan + 75174
    frame #18: 0x0000000000643d76 php-fpm`scan + 22934
    frame #19: 0x00000000005d7b13 php-fpm`php_module_startup + 2067
    frame #20: 0x00000000006da2d3 php-fpm`php_pcre2_substitute + 6307
    frame #21: 0x00000000006d8eb0 php-fpm`php_pcre2_substitute + 1152
    frame #22: 0x00000000004304a0 php-fpm`_start(ap=<unavailable>, cleanup=<unavailable>) at crt1_c.c:75:7
----8<----
Comment 3 cyberleo 2022-10-10 18:51:37 UTC
(In reply to cyberleo from comment #2)

The backtrace seems to implicate libomp, and the issue occurs during initialization of one of the PHP extensions that makes use of threading.
Comment 4 Oljas Kuzembaev 2022-10-30 11:49:50 UTC
(In reply to cyberleo from comment #3)
Had same problem with nextcloud. It does not matter mod_php or php-fpm, apache or nginx.
Recent update of php to 8.0.25 fixed that for me.
Comment 5 edenist-fbz 2022-11-05 22:36:38 UTC
(In reply to Oljas Kuzembaev from comment #4)

I am also having this issue with my nextcloud jail. 
Upgrading to php 8.0.25 did not resolve this for me however, nor does updating nextcloud to v25. 

Oljas Kuzembaev, did you happen to upgrade/change anything else which may have resolved this issue for you?
Comment 6 Oljas Kuzembaev 2022-11-14 11:43:27 UTC
(In reply to edenist-fbz from comment #5)
Actually, yes. Simple upgrade changed nothing for me. Sorry for my false positive post about php upgrade. It just delayed 100cpu for 3 days.
Anyhow, at the end I have changed whole environment. I am building pkgs in poudriere, and a had to many options for all my ports.
I have deleted all options and defined only those, relevant for me. This time in make.conf:


ALLOW_UNSUPPORTED_SYSTEM=yes
DEFAULT_VERSIONS+= samba=4.13 php=8.0
# OPTIONS ADD                   REASON
OPTIONS_SET+=AUTHNZ_LDAP        # www/apache
OPTIONS_SET+=BIND918            # net/samba
OPTIONS_SET+=CUPS               # net/samba
OPTIONS_SET+=IMAGICK            # www/Nextcloud
OPTIONS_SET+=IPV6
OPTIONS_SET+=GSSAPI_BASE        # dns/bind-tools
OPTIONS_SET+=LDAP               # www/Nextcloud
OPTIONS_SET+=PCNTL              # www/Nextcloud
OPTIONS_SET+=REDIS              # www/Nextcloud
OPTIONS_SET+=SMB                # www/Nextcloud
# OPTIONS REMOVE
OPTIONS_UNSET+=X11
OPTIONS_UNSET+=GSSAPI_NONE      # dns/bind-tools
# EVENT
WITH_MPM=EVENT                  # www/Apache


No other options set.
With this options nextcloud runs smooth for a week on 2 different hosts. And I hope it will stay that way. I run on 13.1-RELEASE-p2 in 13.1 jail.
Comment 7 Eero Hänninen 2022-11-25 00:10:15 UTC
This issue seems to be related with libomp and Imagemagcik. I have faced same problem with php-fpm. Php-fpm master process takes 100% cpu and truss show just endless
sched_yield()					 = 0 (0x0)

So I have removed OpenMP support from Imagemagcik and let poudriere recompile php-gd and php pecl-imagick extension. After that problem is gone. As far as I tested, this libomp problem presents in 13.1-RELEASE and 13-STABLE, but not in 13.0-RELEASE or even 12.2-RELEASE.

This issue can be reproduce with Imagemagcik 6.x simply, take jpg format picture about 600kb or larger size and execute:
convert -bench 40 image.jpg -sharpen 0x1 null:

And watch this process with truss. If Imagemagcik is compiled with OpenMP support then sooner or later you will see:
sched_yield()					 = 0 (0x0)

and I dont know is this process come to end or not.

without OpenMP support process completes fast and there is no sched_yield() loop.

Another note, seems that pecl-imagcik Readme file has extra section for Imagemagcik and OpenMP implementations:
https://github.com/Imagick/imagick/blob/master/README.md

Hope this helps.
Comment 8 Jochen Neumeister freebsd_committer freebsd_triage 2023-09-02 11:10:37 UTC
i have not read any other entries here now in the last 9 months. i also do not think that the error is Apache24 directly.
I would close the PR here first. If anyone else has the problem, feel free to reopen it. Then we have to clarify who is the right contact for this.