As far as I could understand from the Linux source and documentation, the Linux getpriority(2) syscall returns values in the range [1..40], and its glibc counterpart converts the return value using the formula 20 - X, so that a caller receives values in the range [-20..19].

FreeBSD's Linux emulation does not emulate the getpriority(2) syscall; the native FreeBSD syscall is invoked directly. The FreeBSD syscall returns values in the range [-20..20], so after the glibc conversion the user receives an incorrect priority level.

This problem affects Linux applications that change their priority levels using relative increments instead of setting an absolute value. Especially affected are applications that use the nice(3) function, which is implemented using getpriority(2)+setpriority(2).

Fix:

The following patch should fix the problem. It probably contains some unneeded extra lines due to my inexperience with system call handling. It will also return values in the range [0..40] instead of the real Linux range [1..40], but that should not cause any problems in practice; at least it has not for me.

How-To-Repeat:

1. Enable Linux emulation.
2. Compile the following program using /compat/linux/usr/bin/gcc.
3. Execute the compiled program.
4. See that after setpriority(,,1) getpriority returns 19.

/** start of test.c **/
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
	int p;

	p = getpriority(PRIO_PROCESS, 0);
	printf("get prio = %d\n", p);
	setpriority(PRIO_PROCESS, 0, 1);
	p = getpriority(PRIO_PROCESS, 0);
	printf("get prio = %d\n", p);
	return 0;
}
/*** end of test.c ***/
Responsible Changed
From-To: freebsd-bugs->freebsd-emulation
Over to emulation mailing list for review.
Committed, thanks! I wonder if the setpriority(2) needs the same cure. Please clarify and let me know. I'll keep the PR open till your reply. -Maxim
on 08.06.2005 23:49 Maxim Sobolev said the following:
> Committed, thanks!
>
> I wonder if the setpriority(2) needs the same cure. Please clarify and
> let me know. I'll keep the PR open till your reply.
>

Maxim,

setpriority(2) is not affected. The reason for this asymmetry is Linux's convention for system calls: they return both the result and errno in the same register, with positive values reserved for the results of successful calls and negative values reserved for -errno of failed calls. Thus they cannot return negative priority values from getpriority(2) and have to shift them into a positive range. There is, of course, no problem with passing negative values from userland to the kernel.

Thank you for the commit!
--
Andriy Gapon
> on 08.06.2005 23:49 Maxim Sobolev said the following:
> > Committed, thanks!
> >
> > I wonder if the setpriority(2) needs the same cure. Please clarify and
> > let me know. I'll keep the PR open till your reply.

I wonder why committers commit patches without fully understanding them.

> setpriority(2) is not affected, the reason for this asymmetry is in
> Linux's convention for system calls - they return both result and errno
> in the same register, positive values are reserved for results of
> successful calls and negative are reserved for -errno for failed calls.
> Thus they can not return negative priority values in getpriority(2) and
> have to shift it to positive range. There is no problem, of course, with
> passing negative values from userland to kernel.

Returning -1 for an error is the usual convention for syscalls and is specified by POSIX for getpriority(). The problem is that FreeBSD's getpriority() is highly non-POSIX-conformant (it has an off-by-20 error, wrong semantics for NZERO, and an off-by-1 error), so it can't be mapped to an emulator's getpriority() using the identity map except in rare cases where the emulator's getpriority() is bug-for-bug compatible. But Linux's getpriority() seems to be highly non-POSIX-conformant in a different, less fundamentally broken way.

POSIX specifies that the non-error range of values returned by getpriority() is [0, 2*{NZERO}-1]; -1 is the error indicator. Applications must subtract NZERO to get the actual priority value.

High non-POSIX-conformance:
FreeBSD:
  NZERO is 0, so this range is null, and the actual range is [PRIO_MIN, PRIO_MAX] = [-20, 20]; priority -1 collides with the error indicator (the complications to handle this are documented in getpriority(3)). NZERO is not mentioned in getpriority(3).
Linux:
  NZERO is 20, so the POSIX range is [0, 39], which is usable, but the actual range is apparently [1, 40]; the error indicator works normally.
  Applications must apparently negate the priority and add 20 to get the actual priority (20 - pri) instead of (pri - NZERO).

I think the reason that setpriority(2) is not affected is actually that Linux applications know to use (20 - pri) to recover the actual priority.

Fixing getpriority() in FreeBSD and all emulators should involve much the same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly as possible (something like:

    pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
            (PRIO_MAX - PRIO_MIN)

but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is required); also, for Linux there must be a negation.

Bruce
on 09.06.2005 16:17 Bruce Evans said the following:
>> on 08.06.2005 23:49 Maxim Sobolev said the following:
>> > Committed, thanks!
>> >
>> > I wonder if the setpriority(2) needs the same cure. Please clarify and
>> > let me know. I'll keep the PR open till your reply.
>
> I wonder why committers commit patches without fully understanding them.

I wonder if you fully understood the patch, the issue and getpriority/setpriority.

>> setpriority(2) is not affected, the reason for this asymmetry is in
>> Linux's convention for system calls - they return both result and errno
>> in the same register, positive values are reserved for results of
>> successful calls and negative are reserved for -errno for failed calls.
>> Thus they can not return negative priority values in getpriority(2) and
>> have to shift it to positive range. There is no problem, of course, with
>> passing negative values from userland to kernel.
>
> Returning -1 for an error is the usual convention for syscalls and is
> specified by POSIX for getpriority(). The problem is that FreeBSD's
> getpriority() is highly non-POSIX-conformant (it has an off-by-20 error,
> wrong semantics for NZERO, and an off-by-1 error), so it can't be mapped
> to an emulator's getpriority() using the identity map except in rare cases
> where the emulator's getpriority() is bug-for-bug compatible. But Linux's
> getpriority() seems to be highly non-POSIX-conformant in a different, less
> fundamentally broken way.
>
> POSIX specifies that the non-error range of values returned by
> getpriority() is [0, 2*{NZERO}-1]; -1 is the error indicator.
> Applications must subtract NZERO to get the actual priority value.
Bruce,

I think you have misread the POSIX specification, and you are confusing two things: (1) priority inside the black box that schedules processes versus the values that should be passed to setpriority() and returned from getpriority(); (2) the syscall's internal implementation versus the user-visible libc function.

Regarding #1, here's a direct quote:
http://www.opengroup.org/onlinepubs/009695399/functions/getpriority.html

"Upon successful completion, getpriority() shall return an integer in the range -{NZERO} to {NZERO}-1. Otherwise, -1 shall be returned and errno set to indicate the error."

Also:
"The getpriority() and setpriority() functions work with an offset nice value (nice value -{NZERO}). The nice value is in the range [0,2*{NZERO} -1], while the return value for getpriority() and the third parameter for setpriority() are in the range [-{NZERO},{NZERO} -1]."

So this is a difference between priority as it is seen in userland (above the libc layer) and priority inside the POSIX black box of the OS (the one in the [0,2*{NZERO} -1] range).

My understanding is that FreeBSD and Linux are very close to POSIXly correct implementations with NZERO=20. In fact, Linux's implementation is completely compliant, and FreeBSD allows +20, which is beyond the POSIX range. Also, the -1 return value from getpriority() is a problematic point of the POSIX specification, not of the implementations.

Regarding #2, both FreeBSD and Linux, in their own ways, correctly return errno/priority level from kernel land to userland. The FreeBSD syscall returns the priority already in the [-{NZERO},{NZERO} -1] range; the Linux syscall returns the priority in the [1,2*{NZERO}] range with the ordering reversed, and then the (g)libc getpriority stub performs the 20-X conversion to return a correct value to the application.
> High non-POSIX-conformance:
> FreeBSD:
>   NZERO is 0, so this range is null, and the actual range is [PRIO_MIN,
>   PRIO_MAX] = [-20, 20]; priority -1 collides with the error indicator
>   (the complications to handle this are documented in getpriority(3)).
>   NZERO is not mentioned in getpriority(3).

There is no getpriority(3), only getpriority(2), and it quite rightly doesn't talk about NZERO, as that is of no interest to applications; and it quite rightly talks about the troubles with -1, as that is a problem of the interface defined by POSIX.

> Linux:
>   NZERO is 20, so the POSIX range is [0, 39] which is usable, but the
>   actual range is apparently [1, 40]; the error indicator works normally.
>   Applications must apparently negate the priority and add 20 to get
>   the actual priority (20 - pri) instead of (pri - NZERO).
>
> I think the reason that setpriority(2) is not affected is actually that
> Linux applications know to use (20 - pri) to recover the actual priority.
>
> Fixing getpriority() in FreeBSD and all emulators should involve much the
> same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
> getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
> as possible (something like:
>
>     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
>             (PRIO_MAX - PRIO_MIN)
>
> but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
> above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is
> required); also, for Linux there must be a negation.

I think you have greatly overcomplicated things because of your original misunderstanding. Just compile a small program using getpriority/setpriority for FreeBSD, Linux and any other Unix available to you, run it, and you will see how simple things are in reality and that NZERO is not visible to userland. Read the man pages too.

Yes, and try Linux emulation with and without my patch to understand what the problem with the emulation really is.
--
Andriy Gapon
On Thu, 9 Jun 2005, Andriy Gapon wrote:

> on 09.06.2005 16:17 Bruce Evans said the following:
>>> on 08.06.2005 23:49 Maxim Sobolev said the following:
>>>> Committed, thanks!
>>>>
>>>> I wonder if the setpriority(2) needs the same cure. Please clarify and
>>>> let me know. I'll keep the PR open till your reply.
>>
>> I wonder why committers commit patches without fully understanding them.
>
> I wonder if you fully understood the patch, the issue and the
> getpriority/setpriority.

I thought I did, but I read POSIX partly backwards.

>> POSIX specifies that the non-error range of values returned by
>> getpriority() is [0, 2*{NZERO}-1]; -1 is the error indicator.
>> Applications must subtract NZERO to get the actual priority value.
>
> I think you have misread the POSIX specification, and you are confusing two
> things: (1) priority inside the black box that schedules processes versus
> the values that should be passed to setpriority() and returned from
> getpriority(); (2) the syscall's internal implementation versus the
> user-visible libc function.

Priority in the black box is td->td_priority. p->p_nice is supposed to be the user-visible priority offset by NZERO in FreeBSD, and it is, but things are made confusing by "fixing" the historical value of NZERO so that NZERO is 0. Biases of 0 are subtle, and POSIX has made the NZERO = 0 bias wrong by over-specifying the behaviour as the historical behaviour.

> Regarding #1, here's a direct quote:
> http://www.opengroup.org/onlinepubs/009695399/functions/getpriority.html
>
> "Upon successful completion, getpriority() shall return an integer in
> the range -{NZERO} to {NZERO}-1. Otherwise, -1 shall be returned and
> errno set to indicate the error."
> Also:
> "The getpriority() and setpriority() functions work with an offset nice
> value (nice value -{NZERO}). The nice value is in the range [0,2*{NZERO}
> -1], while the return value for getpriority() and the third parameter
> for setpriority() are in the range [-{NZERO},{NZERO} -1]."
This is the part that I misread. I only saw the "Also" part, and I read it backwards as specifying Linux-like behaviour to avoid the in-band error indicator.

> So this is a difference between priority as it is seen in userland
> (above the libc layer) and priority inside the POSIX black box of the OS
> (the one in the [0,2*{NZERO} -1] range).

It is a bug in POSIX for POSIX to specify the black box. The FreeBSD black box doesn't actually use this range, and applications and users hardly notice, since they mostly see the adjusted priorities (with default priority 0 instead of NZERO).

> My understanding is that FreeBSD and Linux are very close to POSIXly
> correct implementations with NZERO=20. In fact, Linux's implementation is
> completely compliant and FreeBSD allows +20 which is beyond the POSIX range.
> Also, -1 return value from getpriority() is a problematic point of POSIX
> specification not implementations.

To conform, FreeBSD would need to expand or shrink the priority range by 1 to cover or drop +20, change NZERO from 0 to 20 or 21, and move the priorities in the grey box up by NZERO.

> Regarding #2, both FreeBSD and Linux in their own ways correctly
> return errno/priority level from kernel land to userland. FreeBSD
> syscall returns priority already in [-{NZERO},{NZERO} -1] range; Linux

Except NZERO is 0 in FreeBSD.

> syscall returns priority in [1,2*{NZERO}] range and with reversed
> comparison, and then (g)libc stub of getpriority performs 20-X
> conversion to return a correct value to application.

>> I think the reason that setpriority(2) is not affected is actually that
>> Linux applications know to use (20 - pri) to recover the actual priority.

It is actually the library stub that does this. So getpriority(2) doesn't give POSIX getpriority() in Linux, but getpriority(3) does.
>> Fixing getpriority() in FreeBSD and all emulators should involve much the
>> same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
>> getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
>> as possible (something like:
>>
>>     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
>>             (PRIO_MAX - PRIO_MIN)
>>
>> but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
>> above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is
>> required); also, for Linux there must be a negation.
>
> I think you have greatly overcomplicated things because of your original
> misunderstanding. Just compile a small program using
> getpriority/setpriority for FreeBSD, Linux and any other Unix available
> to you, run it and you will see how simple things are in reality and
> that NZERO is not visible to userland. Read the man pages too.
> Yes, and try Linux emulation with and without my patch to understand
> what the problem with emulation really is.

This part of my previous mail is almost correct. There is an internal range [PRIO_MIN, PRIO_MAX] which should be mapped to the [-{NZERO}, {NZERO} -1] range (not the [0, 2*{NZERO} - 1] range as I said previously). setpriority() should invert this mapping. Matching the range of the emulated system is actually more important for setpriority(), since applications probably treat values returned by getpriority() as cookies and don't notice if they are out of bounds, but the kernel does range checking on the values passed by setpriority(). In addition, for Linux getpriority() the values must be mapped by pri |-> 20 - pri so that the library stub can restore the previous values. The magic 20 is spelled 20 in the Linux kernel (2.6.10 at least) and PZERO in glibc (2.3.2 at least).
This secondary mapping makes the scaling in the first mapping more important, since if FreeBSD had +21 in its priority range, then 20 - pri would give a value of -1 and the library stub would consider this to be an error.

Summary: I don't like the committed version, since it has many subtle magic numbers in its 20 - X formula:
20: part of the Linux adjustment. 20 = 1 + Linux's maximum priority.
-1: another part of the Linux adjustment.
1: factor of 20/20 for the scaling step, where the first 20 is what should be Linux's NZERO and the second 20 is what should be FreeBSD's NZERO (= (PRIO_MAX - PRIO_MIN) / 2). Note that these 20's are subtly different from the 20 in Linux's adjustment.
0: bias for the scaling step (= FreeBSD NZERO).

Bruce
on 11.06.2005 21:37 Bruce Evans said the following:
> Summary: I don't like the committed version since it has many subtle
> magic numbers in its 20 - X formula:
> 20: part of Linux adjustment. 20 = 1 + Linux's maximum priority.
> -1: another part of Linux adjustment
> 1: factor of 20/20 for the scaling step, where the first 20 is what should
>    be Linux's NZERO and the second 20 is what should be FreeBSD's NZERO
>    (= (PRIO_MAX - PRIO_MIN) / 2). Note that these 20's are subtly
>    different from the 20 in Linux's adjustment.
> 0: bias for the scaling step (= FreeBSD NZERO).
>

Bruce,

I agree with your reasoning and description of the situation. Yes, the "20-X" formula would break if there were any significant changes in the Linux or FreeBSD kernels with respect to process priorities. Unfortunately I cannot promise to do any work to make this conversion more proper, so I suggest that we keep 20-X plus, maybe, an /*XXX*/ comment until somebody makes it perfect. Having no conversion would be (was!) worse, I think.

Returning to a more general level, I also agree with you that POSIX should specify only interfaces, and it is very strange that they talk about some internal states; they made things more confusing while perhaps trying to explain them better.

I found the following note in the AIX man page:
http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/libs/basetrf1/basetrf1tfrm.htm

Process priorities in AT&T System V are defined in the range of 0 to 39, rather than -20 to 20 as in BSD, and the nice library routine is supported by both. Accordingly, two versions of the nice are supported by AIX Version 3. The default version behaves like the AT&T System V version, with the Increment parameter treated as the modifier of a value in the range of 0 to 39 (0 corresponds to -20, 39 corresponds to 19, and priority 20 is not reachable with this interface).
If I read this correctly, the POSIX authors tried to cater to both worlds, so they designed (or merely described) something that has an interface close to BSD internals while talking about mapping it to SysV internals. Having the historical nice(2) system call has probably also added its share of complexity.

As to the [-20,+20] range, I see in the HP-UX, Solaris and AIX man pages that they also have this range. It seems that BSD won over POSIX (and SysV) in this case.
--
Andriy Gapon
on 13.06.2005 15:34 Andriy Gapon said the following:
> As to the [-20,+20] range, I see in HP-UX, Solaris and AIX man pages
> that they also have this range. Seems that BSD won over POSIX (and SysV)
> in this case.

Another idea: all the above OSes and FreeBSD are POSIX-compliant with NZERO=21 and a further restriction (which is allowed by POSIX) prohibiting the -21 priority. How does this sound? :-)
--
Andriy Gapon
Responsible Changed
From-To: freebsd-emulation->emulation
Make the assignment match the others (although this one is probably correct and the others are wrong).
Maxim,

could you please check the conversation that occurred regarding this PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/81951
and close the PR? I think agreement was reached that the fix works correctly and that nobody is going to do any work any time soon on making it adaptive and perfect.
--
Andriy Gapon
State Changed
From-To: open->closed
Fix committed, thank you!