Bug 201543

Summary: www/chromium: chrome processes stuck at 100% cpu
Product: Ports & Packages Reporter: pete
Component: Individual Port(s)Assignee: freebsd-chromium (Nobody) <chromium>
Status: Closed Overcome By Events    
Severity: Affects Some People CC: cperciva, cpm, freebsd, gja822, miguelmclara, pete, shawn.webb, tony, w.schwarzenfeld
Priority: ---    
Version: Latest   
Hardware: Any   
OS: Any   

Description pete 2015-07-13 21:20:45 UTC
As reported in this thread:

https://lists.freebsd.org/pipermail/freebsd-chromium/2015-July/002057.html

It seems that after svn commit: r272566 chrome will get pegged at %100.  I have reproduced this locally on the following system:

> uname -ar
FreeBSD pop.rubicorp.com 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64


It was suggested that a debug kernel would be needed to fully triage this, in lieu of that I have the following output from dtrace that may be of some help (pid 57340 is the pid of a chrome process consuming %100 of CPU load):

> sudo ./procsystime -a -p 57340
Tracing... Hit Ctrl-C to end...
dtrace: 81122 dynamic variable drops with non-empty dirty list
dtrace: 91681 dynamic variable drops with non-empty dirty list
dtrace: 82802 dynamic variable drops with non-empty dirty list
dtrace: 90719 dynamic variable drops with non-empty dirty list
dtrace: 89757 dynamic variable drops with non-empty dirty list
dtrace: 87624 dynamic variable drops with non-empty dirty list
dtrace: 86992 dynamic variable drops with non-empty dirty list
dtrace: 76832 dynamic variable drops with non-empty dirty list
dtrace: 78768 dynamic variable drops with non-empty dirty list
^C
dtrace: 84817 dynamic variable drops with non-empty dirty list

Elapsed Times for PID 57340,

         SYSCALL          TIME (ns)
         geteuid               4658
    gettimeofday               7242
          pwrite              12306
           close              18184
          getpid              19218
           lseek              20134
          sendto              21719
           fstat              21751
           fcntl              47850
            stat              50083
            open              55702
          access              66379
            read             621131
           write            1122555
         recvmsg            3495219
           fsync           28230935
        __sysctl         2155674710
            poll         4290330304
          kevent         4616842597
        _umtx_op         6004928910
          TOTAL:        17101591587

CPU Times for PID 57340,

         SYSCALL          TIME (ns)
         geteuid                473
    gettimeofday               1263
          getpid               1369
           lseek               4377
           fstat               7779
        _umtx_op               9759
          pwrite              11183
           close              15022
          sendto              16664
           fcntl              31658
            stat              42920
            open              49495
          access              61612
           fsync              76978
            read             273086
          kevent             533407
           write             597449
         recvmsg            1209489
            poll            2156248
        __sysctl         1851122250
          TOTAL:         1856222481

Syscall Counts for PID 57340,

         SYSCALL              COUNT
          pwrite                  1
          sendto                  3
          access                  4
           close                  4
         geteuid                  4
    gettimeofday                  4
        _umtx_op                  5
           fsync                  5
            open                  6
            stat                  6
           fcntl                 15
           fstat                 16
           lseek                 21
          getpid                 22
            read                266
          kevent                351
           write                398
            poll                605
         recvmsg               1815
        __sysctl             905216
          TOTAL:             908767


I will attempt further testing info when I have cycles to build a debug kernel, unless someone else beats me to it :)
Comment 1 pete 2015-07-13 21:33:55 UTC
(In reply to pete from comment #0)

as per Colin Percival (who thoughtfully corrected my via twitter):

This bug is triggered *before* r272566 - not after.  the commit in question fixed the bug which was triggering this failure.
Comment 2 Colin Percival freebsd_committer freebsd_triage 2015-07-18 21:50:08 UTC
I think the best solution may be if someone can figure out where in chromium the kern.proc.pid.# sysctl is being read from.  Unfortunately I can't get a debugging-enabled chromium to build since I keep running out of disk space...
Comment 3 Dušan Vejnovič 2015-07-20 16:42:28 UTC
> uname -ar
FreeBSD xxxx.dussan.org 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

Maybe this is the reason why chrome stuck at 100% cpu. After starting chromium browser, it stuck at 100% cpu and in task manager I can found chromium's zombie process:

/bin/sh /usr/local/bin/xdg-settings check default-web-browser chromium-browser.desktop
Comment 4 pete 2015-07-20 17:26:09 UTC
(In reply to Colin Percival from comment #2)
Colin, I will build a GENERIC kernel today hopefully.  Can you point me to a good set of docs I can use, or instructions, on determining how to pull that sysctl data from?
Comment 5 pete 2015-07-20 20:59:21 UTC
(In reply to pete from comment #4)
I have built a kernel with DDB support enabled.  Here is the output from ddb invoked once chromium hit's the %100 CPU utilization bug:

db> show proc 1061
Process 1061 (chrome) at 0xfffff8000303a000:
 state: NORMAL
 uid: 1001  gids: 1001, 0
 parent: pid 1059 at 0xfffff8000303a4c0
 ABI: FreeBSD ELF64
 arguments: chrome:
 threads: 21
100630                   S       uwait    0xfffff8000e33eb80 chrome
100620                   S       uwait    0xfffff800a856b500 chrome
100576                   S       uwait    0xfffff801209f7480 chrome
100575                   S       uwait    0xfffff800afb82600 chrome
100573                   S       uwait    0xfffff8012065d480 chrome
100572                   S       uwait    0xfffff801209f6380 chrome
100571                   S       uwait    0xfffff800affae580 chrome
100564                   S       uwait    0xfffff800a83ffb80 chrome
100563                   S       uwait    0xfffff80120afb380 chrome
100552                   S       uwait    0xfffff800afbf6200 chrome
100528                   S       uwait    0xfffff80002fea400 chrome
100527                   S       uwait    0xfffff80120a57a80 chrome
100524                   S       uwait    0xfffff801209dab80 chrome
100521                   S       uwait    0xfffff80003111e80 chrome
100520                   S       uwait    0xfffff80120524c00 chrome
100515                   Run     CPU 1                       chrome
100514                   S       uwait    0xfffff80003d29400 chrome
100513                   S       uwait    0xfffff800aff1f700 chrome
100512                   S       uwait    0xfffff80003112a00 chrome
100511                   S       piperd   0xfffff8005036a2e8 chrome
100308                   S       uwait    0xfffff800a82c8100 chrome

db> tr 1061         
Tracing pid 1061 tid 100630 td 0xfffff8000e856490
sched_switch() at sched_switch+0x2b3/frame 0xfffffe0000274850
mi_switch() at mi_switch+0xe1/frame 0xfffffe0000274890
sleepq_catch_signals() at sleepq_catch_signals+0xab/frame 0xfffffe0000274910
sleepq_wait_sig() at sleepq_wait_sig+0xf/frame 0xfffffe0000274940
_sleep() at _sleep+0x27d/frame 0xfffffe00002749c0
umtxq_sleep() at umtxq_sleep+0x125/frame 0xfffffe0000274a20
do_wait() at do_wait+0x329/frame 0xfffffe0000274aa0
__umtx_op_wait_uint_private() at __umtx_op_wait_uint_private+0x83/frame 0xfffffe0000274ae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0000274bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000274bf0
--- syscall (454, FreeBSD ELF64, sys__umtx_op), rip = 0x8098038cc, rsp = 0x7ffffbbdde88, rbp = 0x7ffffbbddeb0 ---

db> show thread 1061
KDB: reentering
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe00003320b0
kdb_backtrace() at kdb_backtrace+0x39/frame 0xfffffe0000332160
kdb_reenter() at kdb_reenter+0x33/frame 0xfffffe0000332170
trap() at trap+0x54/frame 0xfffffe0000332380
calltrap() at calltrap+0x8/frame 0xfffffe0000332380
--- trap 0xc, rip = 0xffffffff8034cc52, rsp = 0xfffffe0000332440, rbp = 0xfffffe0000332470 ---
db_show_thread() at db_show_thread+0x22/frame 0xfffffe0000332470
db_command() at db_command+0x26d/frame 0xfffffe0000332540
db_command_loop() at db_command_loop+0x64/frame 0xfffffe0000332550
db_trap() at db_trap+0xe0/frame 0xfffffe00003325e0
kdb_trap() at kdb_trap+0x129/frame 0xfffffe0000332630
trap() at trap+0x463/frame 0xfffffe0000332840
calltrap() at calltrap+0x8/frame 0xfffffe0000332840
--- trap 0x3, rip = 0xffffffff80970991, rsp = 0xfffffe0000332900, rbp = 0xfffffe0000332930 ---
kdb_sysctl_enter() at kdb_sysctl_enter+0x91/frame 0xfffffe0000332930
sysctl_root() at sysctl_root+0x24e/frame 0xfffffe0000332980
userland_sysctl() at userland_sysctl+0x1d8/frame 0xfffffe0000332a30
sys___sysctl() at sys___sysctl+0x74/frame 0xfffffe0000332ae0
amd64_syscall() at amd64_syscall+0x351/frame 0xfffffe0000332bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe0000332bf0
--- syscall (202, FreeBSD ELF64, sys___sysctl), rip = 0x80096529a, rsp = 0x7fffffffd9f8, rbp = 0x7fffffffda30 ---



I will leave this system in ddb and can run additional commands as needed.
Comment 6 Colin Percival freebsd_committer freebsd_triage 2015-07-20 21:08:12 UTC
We don't need to debug the kernel here -- we know what the kernel bug is.  What we need to do is figure out why Chromium is hitting this bug; if we can prevent that, we can fix this by adding a patch to the ports tree and avoid needing to have everybody building new kernels.
Comment 7 pete 2015-07-20 21:17:52 UTC
(In reply to Colin Percival from comment #6)
ah i see - i misread your previous comment as debug kernel -d'oh!  i can build a debug version of the chromium port easilly enough.  i'll update this PR when i have it built and have useful data.
Comment 8 Dušan Vejnovič 2015-07-24 15:08:53 UTC
I upgrade chrome to the latest version. Now it processes stuck at 200% cpu!!!
Comment 9 Shawn Webb 2015-07-24 15:15:17 UTC
Just a note that www/chromium is working fine for me on HardenedBSD 11-CURRENT/amd64 with PaX PAGEEXEC/MPROTECT disabled. www/chromium is even compiled as a Position-Independent Executable (PIE) on HardenedBSD. I wonder if you have a bad plugin/extension installed.
Comment 10 Dušan Vejnovič 2015-07-24 15:19:07 UTC
(In reply to Shawn Webb from comment #9)

Only have the NetBeans Connector which is disabled.
Comment 11 pete 2015-07-24 17:34:21 UTC
I have a debug build on my end but have not had a chance to dive deep into debugging things yet.  If anyone on this ticket has a set of GDB commands for me to run that may save me some time.  I can also make the debug build available for local download (saving time on building it).
Comment 12 Tony Narlock 2015-07-24 18:08:21 UTC
(In reply to pete from comment #11)

(this should probably be in 196770, but I don't think Pete is CC'd on it.)

Hi Pete, I am in another ticket (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=196770) and I saw you were producing working chromium builds with DEBUG symbols,

I'm wondering, what environment / build settings (freebsd version, /etc/make.conf, etc) do you use to get builds with debug symbols working correctly?
Comment 13 pete 2015-07-24 18:23:23 UTC
(In reply to Tony Narlock from comment #12)

Hey there - this is the method I'm using to generate a build of chromium with usable debug symbols.  n.b. - enabling DEBUG in make config does not work, I have not been able to fix that yet.  The following method should work though:

> uname -ar
FreeBSD pop.rubicorp.com 10.1-RELEASE-p10 FreeBSD 10.1-RELEASE-p10 #0: Wed May 13 06:54:13 UTC 2015     root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

> cd /usr/ports/www/chromium
> sudo make WITH_DEBUG=yes STRIP=

I let this chug along for a day or so on my 8 core box, then I can run chromium via gdb like so:
> work/stage/usr/local/share/chromium/chrome-wrapper --gdb

I do not do the "make install" step as I still need to have a non-debug version on my system.  I'll go ahead and put this info in 196770 as well.
Comment 14 Dušan Vejnovič 2015-07-31 14:12:29 UTC
Today I upgrade chrome to the latest version. Now its processes still stuck at 100% cpu!!!


If I run chrome from the console I got this error:

[67308:379107328:0731/160517:ERROR:process_posix.cc(374)] Not implemented reached in bool base::Process::SetProcessBackgrounded(bool)
[67308:379107328:0731/160517:ERROR:process_posix.cc(374)] Not implemented reached in bool base::Process::SetProcessBackgrounded(bool)



Bye, Dušan.
Comment 15 gja822 2015-08-08 17:29:18 UTC
If I understand you rightly, it seems that I have similar problem. I can compile and start www/chromium and browse some pages (but without Adobe Flash support smhow, having linux flash plugin installed).
But when I close chrome window, I find chrome process still hanging eating up to ~90% of one of my CPU cores. More than that, I can kill it only with "-KILL" option.
Is that the same bug?

uname:
FreeBSD 9.3-RELEASE-p22 #8 r286420M
Comment 16 pete 2015-08-10 17:58:05 UTC
I am running 10.2-RC2 via bhyve and I am currently unable to trigger this bug:

[pwright@bsd0 ~]$ uname -ar
FreeBSD bsd0.trp-srd.com 10.2-RC2 FreeBSD 10.2-RC2 #0 r286137: Fri Jul 31 17:54:39 UTC 2015     root@releng1.nyi.freebsd.org:/usr/obj/usr/src/sys/GENERIC  amd64

[pwright@bsd0 ~]$ pkg info|grep chromium
chromium-44.0.2403.107         Google web browser based on WebKit


I suspect the fix that we are trying to get into 10.1 is working as expected in 10.2.
Comment 17 Colin Percival freebsd_committer freebsd_triage 2015-08-10 20:32:47 UTC
Yes, the bug fix is in 10.2.
Comment 18 Tony Narlock 2015-09-09 18:28:19 UTC
(In reply to pete from comment #0)
(In reply to Colin Percival from comment #17)

Are either of you (or anyone else) still getting this issue on chromium-44.0.2403.155?
Comment 19 pete 2015-09-09 18:29:55 UTC
I am no longer able to reproduce this bug as I have moved to 10.2-RELEASE.  I am also running 44.0.2403.155.  I think we can resolve this issue IMHO.
Comment 20 Dušan Vejnovič 2015-09-10 03:58:53 UTC
(In reply to Tony Narlock from comment #18)

No, I also move to 10.2-RELEASE. You can close this issue.
Comment 21 Colin Percival freebsd_committer freebsd_triage 2015-09-10 04:14:33 UTC
It would be nice to track down why Chromium is making this bogus sysctl call so that it can be fixed... but I too have moved on to 10.2-RELEASE so it's not a problem for me any more.
Comment 22 gja822 2015-09-10 10:59:14 UTC
Had to upgrade to 10.2, since too many things are not working properly in still release of 9.3.
Comment 23 Tony Narlock 2015-09-15 04:37:03 UTC
(In reply to Colin Percival from comment #21)

Are we still the sysctl calls still happening? Have we checked recently? I'm not getting the 100% issue anymore.
Comment 24 miguelmclara 2015-09-15 22:49:33 UTC
I still see this issue even after upgrading to "45.0.2454.85" in 11-CURRENT, seems to always be trigged by opening new tabs (not a new blank tab but links)

I can't build with DEBUG either so I don't its not easy to provide more details.
Comment 25 Walter Schwarzenfeld freebsd_triage 2018-01-12 22:25:45 UTC
10.1. is EOL. Chromium hast version 61.0.3163.100 and r457431. I think this is overcome by events.