Bug 73328 - [patch] top(1) shows NICE as -111 on processes started by idprio
Summary: [patch] top(1) shows NICE as -111 on processes started by idprio
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: 5.3-STABLE
Hardware: Any Any
: Normal Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-10-31 00:10 UTC by J.Porter Clark
Modified: 2018-06-23 03:11 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description J.Porter Clark 2004-10-31 00:10:28 UTC
      The "top" program shows a NICE value of -111 for programs started by
idprio 31.  On 4.X, the "correct" value of 52 is shown.

Fix: 

My guess is that it's probably in /usr/src/usr.bin/top/machine.c about line 747 or so, but I haven't had time to dig into it.
How-To-Repeat:       As root, run "idprio 31 sleep 500 &".
Then run "top -Uroot".  The sleep process is listed as having a NICE value
of -111.  Other values besides 31 produce the same result.
Comment 1 Bruce Evans 2004-10-31 12:15:43 UTC
On Sat, 30 Oct 2004, J. Porter Clark wrote:

> >Description:
>       The "top" program shows a NICE value of -111 for programs started by
> idprio 31.  On 4.X, the "correct" value of 52 is shown.
> >How-To-Repeat:
>       As root, run "idprio 31 sleep 500 &".
> Then run "top -Uroot".  The sleep process is listed as having a NICE value
> of -111.  Other values besides 31 produce the same result.
> >Fix:
>       My guess is that it's probably in /usr/src/usr.bin/top/machine.c about line 747 or so, but I haven't had time to dig into it.

I use this fix.  It may be out of date, and the comments about the "base"
priority are too verbose and not quite right.

%%%
Index: machine.c
===================================================================
RCS file: /home/ncvs/src/usr.bin/top/machine.c,v
retrieving revision 1.51
diff -u -2 -r1.51 machine.c
--- machine.c	6 Jun 2004 19:59:06 -0000	1.51
+++ machine.c	7 Jun 2004 04:37:20 -0000
@@ -573,18 +573,67 @@
 	    smpmode ? smp_Proc_format : up_Proc_format,
 	    pp->ki_pid,
-	    namelength, namelength,
-	    (*get_userid)(pp->ki_ruid),
-	    pp->ki_pri.pri_level - PZERO,
-
-	    /*
-	     * normal time      -> nice value -20 - +20
-	     * real time 0 - 31 -> nice value -52 - -21
-	     * idle time 0 - 31 -> nice value +21 - +52
+	    namelength, namelength, (*get_userid)(pp->ki_ruid),
+  	    pp->ki_pri.pri_level - PZERO,
+	    /*-
+	     * Mapping from various disorganized priority schemes to ordered
+	     * pseudo-nice values:
+	     *
+	     * interrupt thread        base pri  0 - 63   -> nice -180 - -117
+	     * top half kernel thread  base pri  64 - 127 -> nice -116 -  -53
+	     * realtime user threads   rtprio    0 - 31   -> nice  -52 -  -21
+	     * normal user threads     nice     -20 - +20 -> nice  -20 -  +20
+	     * idle user threads       idprio    0 - 31   -> nice  +21 -  +52
+	     *
+	     * The number of interest is really the "base" priority of the
+	     * process, not the niceness of the process directly.  The base
+	     * priority should be what is is td->td_base_pri in the kernel,
+	     * which is ki_pri.pri_native here.  In practice, that can't
+	     * be used directly and the workarounds are complicated because
+	     * of the following bugs:
+	     *     o td->td_base_pri is changed by priority propagation and
+	     *       not even restored.  Thus it cannot be used to determine
+	     *       the priority class.  The other priorities in k_pri can
+	     *       be used for this, but they are set inconsistently too so
+	     *       there is no one place that determines the correct base
+	     *       priority.
+	     *     o td->td_base_pri is not set to a useful value for normal
+	     *       user threads.  It is initialized to 0 and only changed
+	     *       by priority propagation.  Workaround: use the actual
+	     *       nice value for the "base priority" of normal user
+	     *       threads.
+	     *     o kg->kg_user_pri (pri_user here) is not set to a useful
+	     *       value for kernel threads.  It is initialized to PUSER
+	     *       and never changed.  Something like it should be used
+	     *       for all classes of threads to hold the previous priority
+	     *       during priority propagation.  Then there might not need
+	     *       to be a special variable for the user -> kernel
+	     *       transitions (which are a type of priority propagation).
+	     *       I think a stack of such variables is needed in general
+	     *       though -- kg->kg_user_pri is special because it is at
+	     *       the top.
+	     *
+	     * We scale the base priority so that it agrees with the
+	     * historical nice value for normal user threads, although this
+	     * gives negative numbers for higher priority threads.
+	     *
+	     * PRI_BASE() strips the fifo scheduling bit from the priority
+	     * class.  This is not relevant for the conversion to niceness,
+	     * but it should be shown somewhere other as a raw number in
+	     * an abnormal ps format.  We don't use PRI_IS_REALTIME()
+	     * because there is no corresponding classification macro for
+	     * non-realtime priority classes and the details are too
+	     * messy to be hidden in macros.
+	     *
+	     * KNF indent -ci4 is intentionally violated here.
 	     */
-	    (pp->ki_pri.pri_class ==  PRI_TIMESHARE ?
-	    	pp->ki_nice - NZERO :
-	    	(PRI_IS_REALTIME(pp->ki_pri.pri_class) ?
-		    (PRIO_MIN - 1 - (PRI_MAX_REALTIME - pp->ki_pri.pri_level)) :
-		    (PRIO_MAX + 1 + pp->ki_pri.pri_level - PRI_MIN_IDLE))),
+	    PRI_BASE(pp->ki_pri.pri_class) == PRI_ITHD ?
+		PRIO_MIN + (pp->ki_pri.pri_native - PRI_MIN_TIMESHARE) :
+	    PRI_BASE(pp->ki_pri.pri_class) == PRI_REALTIME ?
+		PRIO_MIN + (pp->ki_pri.pri_user - PRI_MIN_TIMESHARE) :
+	    PRI_BASE(pp->ki_pri.pri_class) == PRI_TIMESHARE ?
+		pp->ki_nice - NZERO :
+	    PRI_BASE(pp->ki_pri.pri_class) == PRI_IDLE ?
+		PRIO_MAX + 1 + (pp->ki_pri.pri_user - PRI_MIN_IDLE) :
+	    666,
 	    format_k2(PROCSIZE(pp)),
 	    format_k2(pagetok(pp->ki_rssize)),
%%%

This area is broken in ps too.  The most obvious ones are:

- My ntpd process (which has realtime priority 0 and is correctly
  displayed by top as having "nice" -52) is displayed by `ps -o rtprio'
  as having priority "real:12".  The bogus 12 is just ntpd's current
  priority less PZERO.  This bug is the same as one of the ones fixed
  above.  It is that pri_level gives the current priority so it gives
  a wrong value to subtract from when the process is running at an
  elevated priority in kernel mode.  top and ps seem to get this wrong
  in RELENG_4 too.

- ps.1 says that `-o rtprio' causes a display of "101" for non-rtprio
  processes, but the actual display is "normal" for normal ones and
  a "%u.%u" format for unknown ones.  The man page became inconsistent
  with the code about "101" back in 1998 in rev.1.26 of ps/print.c.
  I think unknown cases occur for at least POSIX scheduling classes.

Bruce
Comment 2 J. Porter Clark 2004-10-31 19:48:45 UTC
On Sun, Oct 31, 2004 at 11:15:43PM +1100, Bruce Evans wrote:
> 
> I use this fix.  It may be out of date, and the comments about the "base"
> priority are too verbose and not quite right.

Seems to work, offset by a couple hundred lines.  Thanks!

> This area is broken in ps too.  The most obvious ones are:

Yes, I had noticed that ps's output is somewhat goofy in these
cases.

-- 
J. Porter Clark           j.porter.clark@nasa.gov
NASA/MSFC Flight & Ground Computers Branch (EI31)
Comment 3 Mark Linimon freebsd_committer freebsd_triage 2005-04-12 01:27:28 UTC
Responsible Changed
From-To: freebsd-i386->freebsd-bugs

This does not sound i386-specific.
Comment 4 Pav Lucistnik freebsd_committer freebsd_triage 2007-05-12 10:35:28 UTC
State Changed
From-To: open->feedback

Can you retry with top(1) from -CURRENT? It shows 
PRI 171, NICE i31 (idprio 31) 
for my processes, which looks correct. 


Comment 5 Pav Lucistnik freebsd_committer freebsd_triage 2007-05-12 10:35:28 UTC
Responsible Changed
From-To: freebsd-bugs->pav

Take replies
Comment 6 Bruce Evans 2007-05-12 15:01:03 UTC
On Sat, 12 May 2007, Pav Lucistnik wrote:

> State-Changed-Why:
> Can you retry with top(1) from -CURRENT? It shows
> PRI 171, NICE i31 (idprio 31)
> for my processes, which looks correct.

I fixed some of the bugs near here -current, but not all of them.  The
following may still be needed for a complete fix:

% Index: machine.c
% ===================================================================
% RCS file: /home/ncvs/src/usr.bin/top/machine.c,v
% retrieving revision 1.78
% diff -u -2 -r1.78 machine.c
% --- machine.c	7 Nov 2006 10:03:10 -0000	1.78
% +++ machine.c	18 Nov 2006 14:36:55 -0000
% @@ -814,5 +814,26 @@
%  		return ("-");
%  	case PRI_REALTIME:
% -		rtpri = pp->ki_pri.pri_level - PRI_MIN_REALTIME;
% +		/*
% +		 * XXX: the kernel doesn't tell us the original rtprio and
% +		 * doesn't really know what it was, so to recover it we
% +		 * must be more chummy with the implementation than the
% +		 * implementation is with itself.  pri_user gives a
% +		 * constant "base" priority, but is only initialized
% +		 * properly for user threads.  pri_native gives what the
% +		 * kernel calls the "base" priority, but it isn't constant
% +		 * since it is changed by priority propagation.  pri_native
% +		 * also isn't properly intialized for all threads, but it
% +		 * is properly initialized for kernel realtime and idletime
% +		 * threads.  Thus we use pri_user for the base priority of
% +		 * user threads (it is always correct) and pri_native for
% +		 * the base priority of kernel realtime and idletime threads
% +		 * (there is nothing better, and it is usually correct).
% +		 *
% +		 * The field width and thus the buffer are too small for
% +		 * values like "kr31F", but such values shouldn't occur,
% +		 * and if they do then the tailing "F" is not displayed.
% +		 */
% +		rtpri = ((pp->ki_flag & P_KTHREAD) ? pp->ki_pri.pri_native :
% +		    pp->ki_pri.pri_user) - PRI_MIN_REALTIME;
%  		snprintf(nicebuf, sizeof(nicebuf), "%sr%d%s",
%  		    kthread, rtpri, fifo);
% @@ -824,5 +845,7 @@
%  		break;
%  	case PRI_IDLE:
% -		rtpri = pp->ki_pri.pri_level - PRI_MIN_IDLE;
% +		/* XXX: as above. */
% +		rtpri = ((pp->ki_flag & P_KTHREAD) ? pp->ki_pri.pri_native :
% +		    pp->ki_pri.pri_user) - PRI_MIN_IDLE;
%  		snprintf(nicebuf, sizeof(nicebuf), "%si%d%s",
%  		    kthread, rtpri, fifo);

This hasn't been committed since it is extremely ugly and I've been hoping
for many years that Someone would fix the kernel so that it is not needed.

ps(1) has related bugs: partial output of top(1) with the above fix:

%   PID USERNAME  THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
%    10 root        1 171 ki31     0K     8K RUN    324.7H 99.02% idle
%    11 root        1 -32    -     0K     8K WAIT     4:29  0.00% swi4: clock cy
%    26 root        1 171 ki31     0K     8K pgzero   0:10  0.00% pagezero
%   537 root        1  44   r0   836K   480K select   0:09  0.00% ntpd
%    27 root        1 171 ki31     0K     8K pollid   0:00  0.00% idlepoll

Output of ps jx -o rtprio for the corresponding processes:

% USER   PID  PPID  PGID   SID JOBC STAT  TT       TIME COMMAND           RTPRIO
% root    10     0     0     0    0 RL    ??  19486:11.06 [idle]           idle:25
% root    11     0     0     0    0 WL    ??    4:28.78 [swi4: clock cy  intr:52
% root    26     0     0     0    0 DL    ??    0:09.57 [pagezero]       idle:25
% root    27     0     0     0    0 DL    ??    0:00.30 [idlepoll]       idle:25
% root   537     1   537   537    0 Ss    ??    0:09.36 /usr/sbin/ntpd - real:12

This shows ps mangling idle priority 31 to 25 and realtime priority 0 to 12.

This shows ps getting the interrupt priority more or less correct.
Both top and ps print pri_level, which is problematic as partly explained
above since it is not always the same as the "base" priority; top
obfuscates all priorities by subtracting PZERO = 84, so priority 52
becomes -32.  It isn't clear how the interrupt priority should be
displayed in either top; top simply doesn't translate pri_level to an
interrupt pseudo-priority class, so its PRI column gives the class as
a priority modulo the race and the PZERO obfuscation, while ps prints
the priority in the class column without translation.

This doesn't show ps not understanding the priority class code for FIFO
scheduling.

Bruce
Comment 7 Pav Lucistnik freebsd_committer freebsd_triage 2007-05-12 15:18:09 UTC
State Changed
From-To: feedback->open

Looks like there are still some loose ends 


Comment 8 Pav Lucistnik freebsd_committer freebsd_triage 2007-05-12 15:18:09 UTC
Responsible Changed
From-To: pav->freebsd-bugs

Looks like there are still some loose ends
Comment 9 Edwin Groothuis freebsd_committer freebsd_triage 2008-09-25 23:01:13 UTC
Responsible Changed
From-To: freebsd-bugs->edwin

Grab it in the upgrade to 3.8b1
Comment 10 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 08:00:53 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 11 Eitan Adler freebsd_committer freebsd_triage 2018-06-23 03:11:21 UTC
This is no longer an issue on r335277 (not the revision that fixed it; just where tested).