Bug 81951 - [patch] linux emulation: getpriority() returns incorrect value
Summary: [patch] linux emulation: getpriority() returns incorrect value
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: 5.4-RELEASE
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-emulation (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-06 14:00 UTC by Andriy Gapon
Modified: 2005-08-01 18:23 UTC (History)
0 users

See Also:


Attachments
file.diff (5.26 KB, patch)
2005-06-06 14:00 UTC, Andriy Gapon

Description Andriy Gapon 2005-06-06 14:00:03 UTC
As far as I could understand from the Linux source and documentation, the Linux
getpriority(2) syscall returns values in the range [1..40] and its glibc
counterpart converts the return value using the formula 20 - X, so that a
caller receives return values in the range [-20..19].
FreeBSD Linux emulation does not emulate the getpriority(2) syscall; the FreeBSD
syscall is invoked directly. The FreeBSD syscall returns values in the range
[-20..20], so after the glibc conversion the user receives an incorrect
priority level.
This problem impacts Linux applications that change their priority levels
using relative increments instead of setting an absolute value, especially
applications that use the nice(3) function, which is implemented
using getpriority(2)+setpriority(2).
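
The arithmetic behind the wrong result can be sketched in a few lines of C. The helper names below are hypothetical and only model the 20 - X conversion described above; they are not the actual glibc or kernel code.

```c
/* Hypothetical model of the value flow; all names are illustrative only. */

/* What the Linux kernel is described as returning: nice [-20..19] -> [1..40]. */
static int linux_kernel_encode(int nice)
{
        return 20 - nice;
}

/* What the glibc wrapper is described as doing with the raw syscall result. */
static int glibc_decode(int raw)
{
        return 20 - raw;
}

/* Without emulation, the FreeBSD syscall result reaches glibc unencoded. */
static int seen_without_fix(int nice)
{
        return glibc_decode(nice);
}

/* With the raw value encoded the Linux way, glibc recovers the nice value. */
static int seen_with_fix(int nice)
{
        return glibc_decode(linux_kernel_encode(nice));
}
```

Under this model, seen_without_fix(1) yields 19, which matches the symptom in How-To-Repeat, while seen_with_fix(1) yields 1.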

Fix: the following patch should fix the problem.
It probably contains some unneeded extra lines due to my inexperience
with system call handling.
It will also return values in the range [0..40] instead of the real Linux
[1..40], but that should not cause any problems in practice; at least it has
not for me.
How-To-Repeat: 
1. Enable Linux emulation
2. Compile the following program using /compat/linux/usr/bin/gcc
3. Execute the compiled program
4. See that after setpriority(PRIO_PROCESS, 0, 1), getpriority() returns 19 instead of 1

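/** start of test.c **/
#include <stdio.h>              /* printf() */
#include <sys/resource.h>       /* getpriority(), setpriority() */

int main(void)
{
        int p;

        /* Priority before any change. */
        p = getpriority(PRIO_PROCESS, 0);
        printf("get prio = %d\n", p);

        /* Set an absolute priority of 1 for the current process. */
        setpriority(PRIO_PROCESS, 0, 1);

        /* Expected 1; the broken emulation reports 19. */
        p = getpriority(PRIO_PROCESS, 0);
        printf("get prio = %d\n", p);

        return 0;
}
/*** end of test.c ***/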
Comment 1 Tilman Keskinoz freebsd_committer freebsd_triage 2005-06-06 16:56:58 UTC
Responsible Changed
From-To: freebsd-bugs->freebsd-emulation

Over to emulation mailinglist for review
Comment 2 Maxim Sobolev 2005-06-08 21:49:33 UTC
Committed, thanks!

I wonder if the setpriority(2) needs the same cure. Please clarify and 
let me know. I'll keep the PR open till your reply.

-Maxim
Comment 3 Andriy Gapon 2005-06-09 05:22:32 UTC
on 08.06.2005 23:49 Maxim Sobolev said the following:
> Committed, thanks!
> 
> I wonder if the setpriority(2) needs the same cure. Please clarify and
> let me know. I'll keep the PR open till your reply.
> 

Maxim,

setpriority(2) is not affected; the reason for this asymmetry is in
Linux's convention for system calls - they return both result and errno
in the same register, positive values are reserved for results of
successful calls and negative are reserved for -errno for failed calls.
Thus they can not return negative priority values in getpriority(2) and
have to shift it to positive range. There is no problem, of course, with
passing negative values from userland to kernel.
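
A rough C model of that convention follows, as a sketch only. The name decode_linux_ret and the 4095 cutoff for errno values are assumptions made for illustration; this is not code from either kernel or from glibc.

```c
#include <errno.h>

/* Assumed cutoff: raw results in [-4095, -1] are treated as -errno. */
#define MODEL_MAX_ERRNO 4095

/*
 * Hypothetical decoder for a raw Linux-style syscall result.
 * Returns the result on success, or -1 with errno set on failure.
 */
static long decode_linux_ret(long raw)
{
        if (raw < 0 && raw >= -MODEL_MAX_ERRNO) {
                errno = (int)-raw;
                return -1;
        }
        return raw;
}
```

With this decoding, a raw priority of -5 would be indistinguishable from a failure with errno 5, which is why the value has to be shifted into the positive range before it leaves the kernel.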

Thank you for the commit!

-- 
Andriy Gapon
Comment 4 Bruce Evans 2005-06-09 14:17:44 UTC
> on 08.06.2005 23:49 Maxim Sobolev said the following:
> > Committed, thanks!
> >
> > I wonder if the setpriority(2) needs the same cure. Please clarify and
> > let me know. I'll keep the PR open till your reply.

I wonder why committers commit patches without fully understanding them.

> setpriority(2) is not affected; the reason for this asymmetry is in
> Linux's convention for system calls - they return both result and errno
> in the same register, positive values are reserved for results of
> successful calls and negative are reserved for -errno for failed calls.
> Thus they can not return negative priority values in getpriority(2) and
> have to shift it to positive range. There is no problem, of course, with
> passing negative values from userland to kernel.

Returning -1 for an error is the usual convention for syscalls and is
specified by POSIX for getpriority().  The problem is that FreeBSD's
getpriority() is highly non-POSIX-conformant (it has an off-by 20 error and
wrong semantics for NZERO, and an off-by 1 error), so it can't be mapped
to an emulator's getpriority() using the identity map except in rare cases
where the emulator's getpriority() is bug for bug compatible.  But Linux's
getpriority() seems to be highly non-POSIX-conformant in a different, less
fundamentally broken way.

POSIX specifies that the non-error range of values returned by getpriority()
is [0, 2*{NZERO}-1]; -1 is the error indicator.  Applications must subtract
NZERO to get the actual priority value.

High non-POSIX-conformance:
FreeBSD:
NZERO is 0, so this range is null, and the actual range is [PRIO_MIN,
PRIO_MAX] = [-20, 20]; priority -1 collides with the error indicator
(the complications to handle this are documented in getpriority(3)).
NZERO is not mentioned in getpriority(3).

Linux:
NZERO is 20, so the POSIX range is [0, 39] which is usable, but the
actual range is apparently [1, 40]; the error indicator works normally.
Applications must apparently negate the priority and add 20 to get
the actual priority (20 - pri) instead of (pri - NZERO).

I think the reason that setpriority(2) is not affected is actually that
Linux applications know to use (20 - pri) to recover the actual priority.

Fixing getpriority() in FreeBSD and all emulators should involve much the
same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
as possible, something like:

     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
            (PRIO_MAX - PRIO_MIN)

but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
above maps the default priority 0 to (20 * 39 / 40) = 19 (rounded down),
but 20 is required; also for Linux there must be a negation.
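
For reference, the mapping above can be evaluated directly in C. This is only a sketch: map_linear and the MODEL_* constants are hypothetical names, the FreeBSD range [-20, 20] is taken from the text, and integer (truncating) division is assumed.

```c
/* FreeBSD's nice range as given in the text. */
#define MODEL_PRIO_MIN (-20)
#define MODEL_PRIO_MAX 20

/* The "something like" mapping, with integer (truncating) division. */
static int map_linear(int pri, int nzero)
{
        return (pri - MODEL_PRIO_MIN) * (2 * nzero - 1) /
            (MODEL_PRIO_MAX - MODEL_PRIO_MIN);
}
```

With nzero = 20 the endpoints land on 0 and 39, but the default priority 0 lands on 19 rather than 20, which is the off-by-one noted above.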

Bruce
Comment 5 Andriy Gapon 2005-06-09 15:36:05 UTC
on 09.06.2005 16:17 Bruce Evans said the following:
>> on 08.06.2005 23:49 Maxim Sobolev said the following:
>> > Committed, thanks!
>> >
>> > I wonder if the setpriority(2) needs the same cure. Please clarify and
>> > let me know. I'll keep the PR open till your reply.
> 
> 
> I wonder why committers commit patches without fully understanding them.

I wonder if you fully understood the patch, the issue and
getpriority/setpriority.
> 
>> setpriority(2) is not affected; the reason for this asymmetry is in
>> Linux's convention for system calls - they return both result and errno
>> in the same register, positive values are reserved for results of
>> successful calls and negative are reserved for -errno for failed calls.
>> Thus they can not return negative priority values in getpriority(2) and
>> have to shift it to positive range. There is no problem, of course, with
>> passing negative values from userland to kernel.
> 
> 
> Returning -1 for an error is the usual convention for syscalls and is
> specified by POSIX for getpriority().  The problem is that FreeBSD's
> getpriority() is highly non-POSIX-conformant (it has an off-by 20 error and
> wrong semantics for NZERO, and an off-by 1 error), so it can't be mapped
> to an emulator's getpriority() using the identity map except in rare cases
> where the emulator's getpriority() is bug for bug compatible.  But Linux's
> getpriority() seems to be highly non-POSIX-conformant in a different, less
> fundamentally broken way.
> 
> POSIX specifies that the non-error range of values returned by
> getpriority()
> is [0, 2*{NZERO}-1]; -1 is the error indicator.  Applications must subtract
> NZERO to get the actual priority value.

Bruce,

I think you have misread the POSIX specification and you are confusing two
things: (1) priority inside the black box that schedules
processes versus the values that should be passed to setpriority() and
returned from getpriority(); (2) the syscall's internal implementation versus
the user-visible libc function.

Regarding #1, here's a direct quote:
http://www.opengroup.org/onlinepubs/009695399/functions/getpriority.html

"Upon successful completion, getpriority() shall return an integer in
the range -{NZERO} to {NZERO}-1. Otherwise, -1 shall be returned and
errno set to indicate the error."
Also:
"The getpriority() and setpriority() functions work with an offset nice
value (nice value -{NZERO}). The nice value is in the range [0,2*{NZERO}
-1], while the return value for getpriority() and the third parameter
for setpriority() are in the range [-{NZERO},{NZERO} -1]."

So this is a difference between priority as it is seen in user-land
(above libc layer) and priority inside the POSIX blackbox of OS (the one
in [0,2*{NZERO} -1] range).
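
The relationship between the two ranges in the quote can be written out as a small C sketch. NZERO = 20 is an assumption here (the value discussed for Linux), and the function names are hypothetical.

```c
/* Assumed value, matching the NZERO discussed for Linux in this thread. */
#define MODEL_NZERO 20

/* Nice value in [0, 2*NZERO-1] -> value returned by getpriority(). */
static int nice_to_user(int nice_internal)
{
        return nice_internal - MODEL_NZERO;
}

/* Third argument of setpriority() -> nice value in [0, 2*NZERO-1]. */
static int user_to_nice(int user)
{
        return user + MODEL_NZERO;
}
```

Note that nice_to_user(19) is -1, a legitimate return value that coincides with the error indicator.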

My understanding is that FreeBSD and Linux are very close to POSIXly
correct implementations with NZERO=20. In fact, Linux's implementation is
completely compliant and FreeBSD allows +20 which is beyond the POSIX range.
Also, the -1 return value from getpriority() is a problematic point of the
POSIX specification, not of the implementations.

Regarding #2, both FreeBSD and Linux in their unique ways correctly
return errno/priority level from kernel-land to user-land. FreeBSD
syscall returns priority already in [-{NZERO},{NZERO} -1] range; Linux
syscall returns priority in [1,2*{NZERO}] range and with reversed
comparison, and then the (g)libc stub of getpriority() performs the 20-X
conversion to return a correct value to the application.

> High non-POSIX-conformance:
> FreeBSD:
> NZERO is 0, so this range is null, and the actual range is [PRIO_MIN,
> PRIO_MAX] = [-20, 20]; priority -1 collides with the error indicator
> (the complications to handle this are documented in getpriority(3)).
> NZERO is not mentioned in getpriority(3).

There is no getpriority(3), only getpriority(2) and it quite rightly
doesn't talk about NZERO as it is of no interest to applications and it
quite rightly talks about troubles with -1 as it is a problem of
interface defined by POSIX.

> Linux:
> NZERO is 20, so the POSIX range is [0, 39] which is usable, but the
> actual range is apparently [1, 40]; the error indicator works normally.
> Applications must apparently negate the priority and add 20 to get
> the actual priority (20 - pri) instead of (pri - NZERO).
> 
> I think the reason that setpriority(2) is not affected is actually that
> Linux applications know to use (20 - pri) to recover the actual priority.
> 
> Fixing getpriority() in FreeBSD and all emulators should involve much the
> same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
> getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
> as possible (something like:
> 
>     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
>         (PRIO_MAX - PRIO_MIN)
> 
> but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
> above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is
> required; also for Linux there must be a negation.


I think you have greatly overcomplicated things because of your original
misunderstanding. Just compile a small program using
getpriority/setpriority for FreeBSD, Linux and any other Unix available
to you, run it and you will see how simple things are in reality and
that NZERO is not visible to userland. Read the man pages too.
Yes, and try Linux emulation with and without my patch to understand
what the problem with emulation really is.


-- 
Andriy Gapon
Comment 6 Bruce Evans 2005-06-11 19:37:24 UTC
On Thu, 9 Jun 2005, Andriy Gapon wrote:

> on 09.06.2005 16:17 Bruce Evans said the following:
>>> on 08.06.2005 23:49 Maxim Sobolev said the following:
>>>> Committed, thanks!
>>>>
>>>> I wonder if the setpriority(2) needs the same cure. Please clarify and
>>>> let me know. I'll keep the PR open till your reply.
>>
>> I wonder why committers commit patches without fully understanding them.
>
> I wonder if you fully understood the patch, the issue and the
> getpriority/setpriority.

I thought I did, but I read POSIX partly backwards.

>> POSIX specifies that the non-error range of values returned by
>> getpriority()
>> is [0, 2*{NZERO}-1]; -1 is the error indicator.  Applications must subtract
>> NZERO to get the actual priority value.

> I think you have misread POSIX specification and you are confusing two
> things: (1) priority - priority inside the blackbox that schedules
> processes versus values that should be passed to setpriority() and
> returned from getpriority(); (2) syscall internal implementation  versus
> user-visible libc function.

Priority in the black box is td->td_priority.  p->p_nice is supposed to
be the user-visible priority offset by NZERO in FreeBSD, and it is, but
things are made confusing by "fixing" the historical value of NZERO so
that NZERO is 0.  Biases of 0 are subtle, and POSIX has made the NZERO = 0
bias wrong by over-specifying the behaviour as the historical behaviour.

> Regarding #1, here's a direct quote:
> http://www.opengroup.org/onlinepubs/009695399/functions/getpriority.html
>
> "Upon successful completion, getpriority() shall return an integer in
> the range -{NZERO} to {NZERO}-1. Otherwise, -1 shall be returned and
> errno set to indicate the error."
> Also:
> "The getpriority() and setpriority() functions work with an offset nice
> value (nice value -{NZERO}). The nice value is in the range [0,2*{NZERO}
> -1], while the return value for getpriority() and the third parameter
> for setpriority() are in the range [-{NZERO},{NZERO} -1]."

This is the part that I misread.  I only saw the "Also" part and I read
it backwards as specifying Linux-like behaviour to avoid the in-band
error indicator.

> So this is a difference between priority as it is seen in user-land
> (above libc layer) and priority inside the POSIX blackbox of OS (the one
> in [0,2*{NZERO} -1] range).

It is a bug in POSIX for POSIX to specify the black box.  The FreeBSD
black box doesn't actually use this range, and applications and users
hardly notice since they mostly see the adjusted priorities (with
default priority 0 instead of NZERO).

> My understanding is that FreeBSD and Linux are very close to POSIXly
> correct implementations with NZERO=20. In fact, Linux's implementation is
> completely compliant and FreeBSD allows +20 which is beyond the POSIX range.
> Also, -1 return value from getpriority() is a problematic point of POSIX
> specification not implementations.

To conform, FreeBSD would need to expand or shrink the priority range by
1 to cover or drop +20, and change NZERO from 0 to 20 or 21, and move the
priorities in the grey box up by NZERO.

> Regarding #2, both FreeBSD and Linux in their unique ways correctly
> return errno/priority level from kernel-land to user-land. FreeBSD
> syscall returns priority already in [-{NZERO},{NZERO} -1] range; Linux

Except NZERO is 0 in FreeBSD.

> syscall returns priority in [1,2*{NZERO}] range and with reversed
> comparison, and then (g)libc stub of getpriority performs 20-X
> conversion to return a correct value to application.

>> I think the reason that setpriority(2) is not affected is actually that
>> Linux applications know to use (20 - pri) to recover the actual priority.

It is actually the library stub that does this.  So getpriority(2) doesn't
give POSIX getpriority() in Linux, but getpriority(3) does.

>> Fixing getpriority() in FreeBSD and all emulators should involve much the
>> same code: map the range of internal priorities [PRIO_MIN, PRIO_MAX] to
>> getpriority()'s range [0, 2*{SUBSYSTEM_SPECIFIC_NZERO}-1] as linearly
>> as possible (something like:
>>
>>     pri |-> (pri - PRIO_MIN) * (2 * SUBSYSTEM_SPECIFIC_NZERO - 1) /
>>         (PRIO_MAX - PRIO_MIN)
>>
>> but more complicated, since if SUBSYSTEM_SPECIFIC_NZERO == 20 the
>> above maps the default priority 0 to (20 * 39 / 40) = 19, but 20 is
>> required; also for Linux there must be a negation.
>
> I think you have greatly overcomplicated things because of your original
> misunderstanding. Just compile a small program using
> getpriority/setpriority for FreeBSD, Linux and any other Unix available
> to you, run it and you will see how simple things are in reality and
> that NZERO is not visible to userland. Read the man pages too.
> Yes, and try Linux emulation with and without my patch to understand
> what the problem with emulation really is.

This part of my previous mail is almost correct.  There is an internal
range [PRIO_MIN, PRIO_MAX] which should be mapped to the [-{NZERO},
{NZERO} -1] range (not the [0, 2*{NZERO} - 1] range like I said
previously).  setpriority() should invert this mapping.  Matching the
range of the emulated system is actually more important for setpriority(),
since applications probably treat values returned by getpriority() as
cookies and don't notice if they are out of bounds, but the kernel
does range checking on the values passed by setpriority().  In addition,
for Linux getpriority() the values must be mapped by pri |-> 20 - pri
so that the library stub can restore the previous values.  The magic
20 is spelled 20 in the Linux kernel (2.6.10 at least) and as PZERO
in glibc (2.3.2 at least).  This secondary mapping makes scaling in the
first mapping more important, since if FreeBSD had +21 in its priority
range, then 20 - pri would give a value of -1 and the library stub would
consider this to be an error.
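
The boundary case can be checked with a small C model. It is a sketch under the assumptions stated in this thread (the kernel side applies 20 - pri, and the library stub treats a negative raw value as an error); the function names are hypothetical.

```c
/* Models the 20 - pri step applied on the kernel side of the emulation. */
static int encode_for_linux(int pri)
{
        return 20 - pri;
}

/*
 * Models the library stub: a negative raw value is treated as an error
 * (reported as -1); anything else is converted back with 20 - raw.
 */
static int stub_decode(int raw, int *is_error)
{
        *is_error = raw < 0;
        return *is_error ? -1 : 20 - raw;
}
```

In this model a priority of +20 encodes to 0 and survives the round trip, while a hypothetical +21 encodes to -1 and is reported as an error.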

Summary: I don't like the committed version since it has many subtle
magic numbers in its 20 - X formula:
20: part of Linux adjustment.  20 = 1 + Linux's maximum priority.
-1: another part of Linux adjustment
1: factor of 20/20 for the scaling step, where the first 20 is what should
     be Linux's NZERO and the second 20 is what should be FreeBSD's NZERO
     (= (PRIO_MAX - PRIO_MIN) / 2).  Note that these 20's are subtly
     different from the 20 in Linux's adjustment.
0: bias for the scaling step (= FreeBSD NZERO).

Bruce
Comment 7 Andriy Gapon 2005-06-13 13:34:20 UTC
on 11.06.2005 21:37 Bruce Evans said the following:
> Summary: I don't like the committed version since it has many subtle
> magic numbers in its 20 - X formula:
> 20: part of Linux adjustment.  20 = 1 + Linux's maximum priority.
> -1: another part of Linux adjustment
> 1: factor of 20/20 for the scaling step, where the first 20 is what should
>     be Linux's NZERO and the second 20 is what should be FreeBSD's NZERO
>     (= (PRIO_MAX - PRIO_MIN) / 2).  Note that these 20's are subtly
>     different from the 20 in Linux's adjustment.
> 0: bias for the scaling step (= FreeBSD NZERO).
> 

Bruce,

I agree with your reasoning and description of the situation. Yes,
"20-X" formula would be broken if there are any significant changes in
Linux or FreeBSD kernels with respect to process priorities.
Unfortunately I can not promise to do any work to make this conversion
more proper, so I suggest that we keep 20-X plus, maybe, /*XXX*/ comment
until somebody makes it perfect. Having no conversion would be (was!)
worse, I think.

Returning to a more general level, I also agree with you that POSIX
should specify only interfaces and it is very strange that they talk
about some internal states; they made things more confusing while
perhaps trying to explain them better. I found the following note in
AIX man page:

http://publib16.boulder.ibm.com/doc_link/en_US/a_doc_lib/libs/basetrf1/basetrf1tfrm.htm
Process priorities in AT&T System V are defined in the range of 0 to 39,
rather than -20 to 20 as in BSD, and the nice library routine is
supported by both. Accordingly, two versions of the nice are supported
by AIX Version 3. The default version behaves like the AT&T System V
version, with the Increment parameter treated as the modifier of a value
in the range of 0 to 39 (0 corresponds to -20, 39 corresponds to 9, and
priority 20 is not reachable with this interface).

If I read this correctly, POSIX authors tried to cater to both worlds,
so they designed (or merely described) something that has an interface
close to BSD internals while talking about mapping it to SysV internals.
Having historical nice(2) system call has probably also added its share
of complexity.

As to the [-20,+20] range, I see in HP-UX, Solaris and AIX man pages
that they also have this range. Seems that BSD won over POSIX (and SysV)
in this case.

-- 
Andriy Gapon
Comment 8 Andriy Gapon 2005-06-13 15:09:36 UTC
on 13.06.2005 15:34 Andriy Gapon said the following:
> As to the [-20,+20] range, I see in HP-UX, Solaris and AIX man pages
> that they also have this range. Seems that BSD won over POSIX (and SysV)
> in this case.

Another idea - all the above OSes and FreeBSD are POSIX compliant with
NZERO=21 and further restriction (which is allowed by POSIX) of
prohibiting -21 priority. How does this sound ? :-)

-- 
Andriy Gapon
Comment 9 Mark Linimon freebsd_committer freebsd_triage 2005-06-13 18:35:22 UTC
Responsible Changed
From-To: freebsd-emulation->emulation

Make the assignment match the others (although this one is probably 
correct and the others are wrong.)
Comment 10 Andriy Gapon 2005-06-22 11:32:55 UTC
Maxim,

could you please check the conversation that occurred regarding this PR:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/81951
and close the PR ?
I think agreement was reached that the fix works correctly and that nobody
is going to do any work any time soon on making it adaptive and perfect.


-- 
Andriy Gapon
Comment 11 Maxim Sobolev freebsd_committer freebsd_triage 2005-08-01 18:22:53 UTC
State Changed
From-To: open->closed

Fix committed, thank you!