Bug 254639

Summary: Fork syscall performance
Product: Base System Reporter: Kirill <kirill>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Some People CC: markj
Priority: ---    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
dtrace stack none

Description Kirill 2021-03-29 13:12:44 UTC
Hello.

Can someone please explain where is the error? I am running Python code that simulates forking a large process and gives poor performance results. Average 4 seconds.

import time
import os
some_str = ' ' * (100 * 1024 * 1024 * 1024)
while True:
    p = time.time()
    pid = os.fork()
    tt = time.time() - p
    if pid != 0:
        print("%d "%pid + str(tt) )
        time.sleep(1)
    if pid == 0:
        break
Comment 1 Mark Johnston freebsd_committer freebsd_triage 2021-03-29 15:26:17 UTC
How much RAM do you have?  Hopefully more than 100GB.

I suspect that we're spending a lot of time in pmap_copy(), which copies page tables into the child process.  It only copies referenced PTEs, but in this case we'll copy all of the page tables backing some_str.

You can try something like this to confirm:

# dtrace -q -n 'fbt::pmap_copy:entry /pid == $target/{self->ts = timestamp} fbt::pmap_copy:return /self->ts/{@ = sum(timestamp - self->ts); self->ts = 0;} tick-1s {printa("spent %@dns copying\n", @); clear(@);}' -c "python test.py"
Comment 2 Kirill 2021-03-29 15:36:42 UTC
I have 192GB. Trace:

dtrace -q -n 'fbt::pmap_copy:entry /pid == $target/{self->ts = timestamp} fbt::pmap_copy:return /self->ts/{@ = sum(timestamp - self->ts); self->ts = 0;} tick-1s {printa("spent %@dns
 copying\n", @); clear(@);}' -c "python test_fork.py"
spent 238724ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
25576 8.959986448287964
spent 525185651ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 279323ns copying
25593 4.342242956161499
spent 434345424ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 352273ns copying
spent 423458987ns copying
spent 0ns copying
25597 7.248447418212891
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
25602 4.694051027297974
spent 432109608ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
spent 0ns copying
25619 4.810415983200073
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2021-03-30 22:21:45 UTC
If I am reading the output correctly, we are spending hundreds of ms copying page tables, but this is not where the majority of time is spent.

It would also be useful to see output from

# dtrace -n 'profile-997hz /pid == $target/{@[stack()] = count()}' -c "python test.py"
Comment 4 Kirill 2021-03-31 06:04:42 UTC
Created attachment 223724 [details]
dtrace stack

Yep. Somewhere hidden more than 4 seconds. Trace in the attachment.