Summary: | [patch] devel/glib20: loops over all possible file descriptors | ||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | Ports & Packages | Reporter: | Eugene Grosbein <eugen> | ||||||||||||||||
Component: | Individual Port(s) | Assignee: | freebsd-desktop (Team) <desktop> | ||||||||||||||||
Status: | Closed FIXED | ||||||||||||||||||
Severity: | Affects Some People | CC: | adridg, ajacoutot, desktop, emaste, eugen, jhibbits, lantw44, lwhsu, peter.niitenberg, rozhuk.im, swills, tcberner, wulf | ||||||||||||||||
Priority: | --- | Keywords: | patch-ready | ||||||||||||||||
Version: | Latest | Flags: | rozhuk.im:
maintainer-feedback?
rozhuk.im: merge-quarterly? |
||||||||||||||||
Hardware: | Any | ||||||||||||||||||
OS: | Any | ||||||||||||||||||
See Also: |
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240548 https://gitlab.gnome.org/GNOME/glib/issues/1638 https://gitlab.gnome.org/GNOME/glib/merge_requests/574 |
||||||||||||||||||
Bug Depends on: | |||||||||||||||||||
Bug Blocks: | 240549 | ||||||||||||||||||
Attachments: |
|
Description
Eugene Grosbein
2019-03-26 22:39:18 UTC
Forgot to show versions: glib-2.56.3_3,1 mc-4.8.22_1 FreeBSD 11.2-STABLE/amd64 r344922 There exist kind of work around: "limits -n 1000 mc" decreases number of syscalls and delay in 10000 times. Seems like the best solution to this is to add fdwalk() to libc, inspired by kinfo_getfile from libutil. (In reply to Steve Wills from comment #3) It would be nice long-term solution for head but for stable branches we need some change in the port. Created attachment 205914 [details]
patch against glib20 in ports.
Hi,
while trying to understand why the performance of xfce4 (especially xfce4-appfinder) was poor on my system I discovered the same bug as the reporter of thuis bug. I created the attached patch to test whether I could get any significant performance gains by using the sysctl api to enumerate file descriptors.
Before patch:
- start time of xfce4-appfinder >3s.
After patch:
- start time of xfce4-appfinder <1s (instant).
Created attachment 205931 [details]
proposed fix
(In reply to Peter Niitenberg from comment #6) (In reply to Peter Niitenberg from comment #6) Peter, thank you very much for the patch. I somewhat modified it and tested with great success. It helps for the MC case too. For a test, I used mentioned system based on Intel Xeon Gold 6130 CPU @ 2.10GHz running FreeBSD 11.2-STABLE with memory: real memory = 377945587712 (360437 MB) avail memory = 367256449024 (350243 MB) sysctl defaults are: kern.maxfiles: 11520879 kern.maxfilesperproc: 10368783 For unpatched glib20 port, the command "mcview /usr/ports/security/ipsec-tools/files/natt.diff" takes as much time to start as: real 0m3,341s user 0m1,349s sys 0m2,002s Very noticable delay over 3 seconds. With patch applied: real 0m0,216s user 0m0,023s sys 0m0,035s That's 5714% system time ratio and 1546% real time ratio. I've attached edited version of same code that adds missing check for result of additional malloc() call. Dear maintainer, please consider adding this to the port. (In reply to Eugene Grosbein from comment #8) Thank you for reviewing the patch Eugene! Missed the malloc check (it was supposed to go into the following if statement), shouldn't create patches after 00:00 anyhow :). Comment on attachment 205931 [details] proposed fix >--- glib/gspawn.c.orig 2018-09-21 12:29:23.000000000 +0300 >+++ glib/gspawn.c 2019-07-20 15:37:26.923958000 +0300 >@@ -1077,6 +1084,45 @@ set_cloexec (void *data, gint fd) > } > > #ifndef HAVE_FDWALK >+ >+#ifdef __FreeBSD__ >+static int >+fdwalk_s(int (*cb)(void *data, int fd), void *data, gint *resp) >+{ >+ char *bp, *ep; >+ int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_FILEDESC, 0}; >+ size_t len = 0; >+ struct kinfo_file kif; >+ >+ mib[3] = (int)getpid(); >+ >+ if (sysctl(mib, nitems(mib), NULL, &len, NULL, 0) != 0) >+ return FALSE; >+ >+ if ((bp = (char*) malloc(len)) == NULL) >+ return FALSE; I am not sure if it is OK to use malloc here. The current fallback fdwalk function says 'Avoid use of opendir/closedir since these are not async-signal-safe'. Since malloc isn't async-signal-safe, using it may break the expectation. (In reply to Ting-Wei Lan from comment #10) Fall back linux code in fdwalk() does use opendir("/proc/self/fd"), readdir() and closedir(). And I could not find the phrase 'Avoid use of opendir/closedir since these are not async-signal-safe' anywhere in the glib sources. (In reply to Eugene Grosbein from comment #11) Oh, it was changed during the development of GLib 2.58. https://gitlab.gnome.org/GNOME/glib/merge_requests/490 It said that using malloc could cause deadlock. Created attachment 205937 [details]
proposed fix
Protect fdwalk_s() with mutex.
(In reply to Ting-Wei Lan from comment #12) Nice catch. I've updated my patch to wrap new code calling sysctl and malloc() with mutex lock if code runs in multi-threaded mode. (In reply to Eugene Grosbein from comment #14) Adding a lock doesn't make an async-signal-unsafe function safe to use. I also ran into this on my POWER9 system. A year ago I thought it was just the bad AST-based VGA framebuffer. But with a Radeon in it I saw the same problem, and spent a bit of time debugging, only to arrive at this exact same result. By using kinfo_getfile() (libutil) in fdwalk() launching mate-calculator from mate-panel went from ~12 seconds down to ~0.5 seconds. kern.maxfilesperproc being ~7.5M. Another glib20 performance issue: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=240517 Same patch required for: x11-toolkits/vte3: src/vtespawn.cc (In reply to Ting-Wei Lan from comment #12) IMHO this is wrong or incomplete description. 1. malloc() can not cause deadlock, it is heavy used by many multithread apps. 2. Docs says that opendir() is thread and mt safe: http://man7.org/linux/man-pages/man3/opendir.3.html But readdir() is not: MT-Unsafe: http://man7.org/linux/man-pages/man3/readdir.3.html 3. Reading dir using more than one syscall is bad idea because between read it can change some content is it will not be handled/processed. As I remember this was a reason to drop opendir() in mine FAM backend and some strange crashes: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214338 So I do not understand why some one can need LOCK() in fdwalk(). Created attachment 207442 [details]
glib patch
Remove lock, use memcpy() instead of x=*y, add extra buf space, remove duplicate includes.
Created attachment 207479 [details]
reworked
(In reply to Ting-Wei Lan from comment #12) > 1. malloc() can not cause deadlock, it is heavy used by many multithread apps. Thread-safe does not imply async-signal-safe. The former can be achieved by using a mutex on shared resources, but the latter cannot. 'man fork' says: If the process has more than one thread, locks and other resources held by the other threads are not released and therefore only async-signal-safe functions (see sigaction(2)) are guaranteed to work in the child process until a call to execve(2) or a similar function. Since fdwalk is called from the child process and it is possible for the parent process to have multiple threads, it can only call async-signal-safe functions. malloc is thread-safe but not async-signal-safe because the implementation is likely to use a lock to protect global data. Assume that there is a thread calling malloc while you are calling fork. If the malloc lock is held while forking, the child process will have the malloc lock held as well. Since the child process only has one thread, which is the thread calling fork, the malloc lock can never be released because the thread holding the lock does not exist in the child process. Therefore, if you call malloc in the child process before exec, it is possible to cause deadlock. > 2. Docs says that opendir() is thread and mt safe: > http://man7.org/linux/man-pages/man3/opendir.3.html > But readdir() is not: MT-Unsafe: > http://man7.org/linux/man-pages/man3/readdir.3.html The current master branch uses neither opendir nor readdir. > 3. Reading dir using more than one syscall is bad idea because between read > it can change some content is it will not be handled/processed. > As I remember this was a reason to drop opendir() in mine FAM backend and > some strange crashes: > https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=214338 > So I do not understand why some one can need LOCK() in fdwalk(). fdwalk should not use LOCK. Where did you find it? I use mine version of patch last 2 weeks on few workstations - see no crashes/issues. fdwalk() in current, in future, with glib 2.62+ this patch can be deleted, if some one backport fdwalk() to stable. https://reviews.freebsd.org/D21206 What should happen to merge it? Created attachment 209087 [details]
more faster
I rework my patch to use struct xfile instead of struct kinfo_file.
struct kinfo_file - causes to read many info, in case files on non local fs this produces traffic and lags.
For example I open in thunar /usr/ports/devel mounted on NFS, and same folder mounted via sshfs, and got ~9000 opened files descriptors.
Then while I try to open any file thunar make call fdwalk() and it freeze on sysctl(CTL_KERN, KERN_PROC, KERN_PROC_FILEDESC) 5-10 seconds, and this happen 2 times, so total freeze time starting from 10 second per one opened file.
struct xfile have less info/size and does not cause any side effects.
ping Created attachment 218218 [details]
patch
rebased to 2.66.0
There's three patches here that seem to do the same thing, in three wildly different ways. One has a bunch of additional synchronisation that the others do not. There's a related phab review (through PR240548: it is https://reviews.freebsd.org/D21206 ) that solves the problem in libc. I'm inclined to +1 the https://bugs.freebsd.org/bugzilla/attachment.cgi?id=218218 patch version (from 2020-09-23) because it matches the patch in the other PR as well. A commit references this bug: Author: tcberner Date: Wed Dec 23 17:30:19 UTC 2020 New revision: 559006 URL: https://svnweb.freebsd.org/changeset/ports/559006 Log: devel/glib20: loops over all possible file descriptors - Stop glib from looping over all possible file descriptors. - This should greatly increase performance PR: 236815 Submitted by: rozhuk.im@gmail.com (committed version) Reported by: Eugene Grosbein <eugen@freebsd.org> MFH: 2020Q4 Changes: head/devel/glib20/Makefile head/devel/glib20/files/patch-glib_gspawn.c A commit references this bug: Author: tcberner Date: Wed Dec 23 17:47:12 UTC 2020 New revision: 559008 URL: https://svnweb.freebsd.org/changeset/ports/559008 Log: MFH: r559006 devel/glib20: loops over all possible file descriptors - Stop glib from looping over all possible file descriptors. - This should greatly increase performance PR: 236815 Submitted by: rozhuk.im@gmail.com (committed version) Reported by: Eugene Grosbein <eugen@freebsd.org> Changes: _U branches/2020Q4/ branches/2020Q4/devel/glib20/Makefile branches/2020Q4/devel/glib20/files/patch-glib_gspawn.c Committed, thanks! |