vmstat(8) -w should produce an error message and exit when fed a negative numerical value or a non numerical value at all, in which case atoi simply returns 0. This is the way iostat(8) handles this situation. If we do not check for a negative value, then the negative value we are fed becomes an extremely large unsigned int and the thread will sleep(3) for a long time indeed. Fix: apply my patch, all we need is a simple check if the value is less than 1. This way an error message also occurs if we could not parse a number, since the return value in that case is 0. version of the file i've used: FBSDID("$FreeBSD: src/usr.bin/vmstat/vmstat.c,v 1.98.2.4 2009/11/06 20:33:53 jhb Exp $"); Patch attached with submission follows: How-To-Repeat: > vmstat -w -1 > vmstat -w -32409 > vmstat -w abc
On Sun, 17 Jan 2010, Efstratios Karatzas <gpf.kira@gmail.com> wrote: >> Description: > vmstat(8) -w should produce an error message and exit when fed a negative numerical value or a non numerical value at all, in which case atoi simply returns 0. This is the way iostat(8) handles this situation. > > If we do not check for a negative value, then the negative value we are fed becomes an extremely large unsigned int and the thread will sleep(3) for a long time indeed. Please use line lengths of considerably less than 168 characters. I'm editing this with a slightly wrong $TERMCAP and the line wrap from the long lines is even more horrible than usual. There is another bad atoi() for the wait interval, in the BACKWARD_COMPATIBILITY ifdef, which is a non-optional option. There are other bad atoi()s in the file. One has a bounds check and neither has a type error to break negative values. >> Fix: > apply my patch, all we need is a simple check if the value is less than 1. This way an error message also occurs if we could not parse a number, since the return value in that case is 0. -w 0 used to sort of work -- it was equivalent to not specifying an interval. Maybe some scripts or fingers depend on this. Probably lots still depends on the undocumented BACKWARD_COMPATIBILITY behaviour (vmstat 1 == vmstat -w 1), so this non-optional option can never be removed. The other bad atoi()s don't use this trick, so they give a garbage result for parse errors. Bruce
On Mon, Jan 18, 2010 at 5:55 AM, Bruce Evans <brde@optusnet.com.au> wrote: > On Sun, 17 Jan 2010, Efstratios Karatzas <gpf.kira@gmail.com> wrote: > >>> Description: >> >> vmstat(8) -w should produce an error message and exit when fed a negativ= e >> numerical value or a non numerical value at all, in which case atoi simp= ly >> returns 0. This is the way iostat(8) handles this situation. >> >> If we do not check for a negative value, then the negative value we are >> fed becomes an extremely large unsigned int and the thread will sleep(3)= for >> a long time indeed. > > Please use line lengths of considerably less than 168 characters. =C2=A0I= 'm > editing this with a slightly wrong $TERMCAP and the line wrap from the > long lines is even more horrible than usual. Sorry about that, won't happen again. > > There is another bad atoi() for the wait interval, in the > BACKWARD_COMPATIBILITY ifdef, which is a non-optional option. > > There are other bad atoi()s in the file. =C2=A0One has a bounds check and= neither > has a type error to break negative values. > One pr, one bug fix. Let's see if we can get anywhere with this first. >>> Fix: >> >> apply my patch, all we need is a simple check if the value is less than = 1. >> This way an error message also occurs if we could not parse a number, si= nce >> the return value in that case is 0. > > -w 0 used to sort of work -- it was equivalent to not specifying an > interval. =C2=A0Maybe some scripts or fingers depend on this. =C2=A0Proba= bly lots > still depends on the undocumented BACKWARD_COMPATIBILITY behaviour > (vmstat 1 =3D=3D vmstat -w 1), so this non-optional option can never be > removed. > > The other bad atoi()s don't use this trick, so they give a garbage result > for parse errors. > > Bruce > I still can't understand why would anyone use "vmstat 0" or "vmstat -w 0". It's the same output if you just use "vmstat". I am aware of the BACKWARD_COMPATIBILITY behavior but negative values don't work with it, for example "vmstat -3" produces an illegal option erro= r. Although "vmstat abc" will work just fine but it is not much of a problem. We could perform a more "sophisticated" error check. Elaboration: A call to atoi(3) is actually a call to strtol(3). strtol(3) will return 0 and set errno to EINVAL if no conversion could be performed. eg "vmstat -w abc" So we could check the value of errno when atoi() returns 0. But the man page states that this feature is not portable across all platfo= rms! We could also just accept the "vmstat -w string" behavior as well as the "vmstat -w 0" and try to fix only the negative values which are clearly bug= s. To sum up, I'm willing to rewrite my patch again and again and supply more patches for other bugs as long as they don't remain forever in the PR database uncommented and *if* they are ok, un-commited. I read the quarterly status report yesterday and the relative wiki pages and I 'm starting to get a feeling of what is going on with the process of bugbustin= g and I also feel I must say thank you to everyone who's contributing to this part of the project. Rock on! So, any more input anyone? Thank you for your time, --=20 Efstratios "GPF" Karatzas
Just thought of a simpler solution // if it is really 0 if (atoi(optarg) == 0 && strncmp(optarg, "0", 1) == 0 ) { // do stuff } Prob is that the "vmstat -w 0abc" command would be equivalent to "vmstat -w 0" but I think that's acceptable. Get back at me. -- Efstratios "GPF" Karatzas
On Mon, 18 Jan 2010, Efstratios Karatzas wrote: > Just thought of a simpler solution > > // if it is really 0 > if (atoi(optarg) == 0 && strncmp(optarg, "0", 1) == 0 ) { > // do stuff > } Ugh, use strtol() (correctly) instead of that. > Prob is that the > "vmstat -w 0abc" > command would be equivalent to > "vmstat -w 0" > but I think that's acceptable. Normal error checking for strtol() would check for there being garbage after the number. I don't insist on using strtol() here, but it should be considered whenever fixing an atoi() bug. Bruce
On Mon, 18 Jan 2010, Efstratios Karatzas wrote: > I still can't understand why would anyone use "vmstat 0" or "vmstat -w 0". > It's the same output if you just use "vmstat". > I am aware of the BACKWARD_COMPATIBILITY behavior but negative values > don't work with it, for example "vmstat -3" produces an illegal option error. That accidentally avoids the int -> unsigned conversion error. However, there is another conversion here, not fixed by your version: strtol() returns long, so on 64-bit arches, since long is actually long, it is longer than int, so when atoi blindly converts the value to int it gets truncated to a garbage value if the correct value is > INT_MAX. E.g.: 2**31 becomes -2**31 (accidentally handled as error, with confusing error message about the arg being < 1) 2**32 becomes 0 (accidentally...) 2**32+1 becomes 1 (error not detected) 2**63 first becomes LONG_MAX (2**63-1) with error ERANGE in strtol() becomes 2**31-1 (error not detected) > 2**63: same as 2**63. -2**31 stays as -2**31 -2**32 becomes 0 -2**32+1 becomes 1 >= 2**-63 first becomes LONG_MIN, then INT_MIN (error detected). The large values are not useful so they shouldn't be used, but the same is true for negative values. > Although "vmstat abc" will work just fine but it is not much of a problem. > > We could perform a more "sophisticated" error check. > Elaboration: > > A call to atoi(3) is actually a call to strtol(3). > strtol(3) will return 0 and set errno to EINVAL if no conversion could be > performed. > eg "vmstat -w abc" > So we could check the value of errno when atoi() returns 0. After setting errno to != EINVAL before the call. > But the man page states that this feature is not portable across all platforms! This is not even portable to POSIX systems. On POSIX systems, the POSIX is also undefined on overflow. Other errors are only implementation-defined. This depends on the behaviour being the same as casting strtol() to int, as documented, and on strtol() having a new misfeature in POSIX (that errno is clobbered to EINVAL for some but not all types of parsing errors). This behaviour should never be depended on. Just use strtol() with normal error checking (check that the end pointer has advanced and doesn't point to garbage). errno doesn't need to be checked for this. However, errno needs to be set to != ERANGE before the call and checked == ERANGE after the call to check for range errors. Then the result must be checked to fit in the int or unsigned variable. FreeBSD has 2 nonstandard badly designed interfaces that are intended to be easier to use than strto*() and atoi(): strtonum() and expand_number(). Their bad design starts with then not supporting intmax_t, uintmax_t or any floating point type. strtonum() doesn't even support unsigned longs on 64-bit arches, and it uses the long long abomination. But it is an adequate replacement for strtol(), and thus OK for replacing atoi() in most command option parsing. Another issue is whether numeric command options should be limited to decimal values representable as a long or even as an int. I think they shouldn't have any arbitrary limits, but POSIX may over-specify them as being decimal and/or of limited size. Another design bug in strtonum() is that it only supports decimal numbers. This makes it more compatible with atoi() but less useful than it should be. Bruce
On Tue, Jan 19, 2010 at 4:04 PM, Bruce Evans <brde@optusnet.com.au> wrote: > This behaviour should never be depended on. Â Just use strtol() with > normal error checking (check that the end pointer has advanced and > doesn't point to garbage). Â errno doesn't need to be checked for this. > However, errno needs to be set to != ERANGE before the call and checked > == ERANGE after the call to check for range errors. Â Then the result > must be checked to fit in the int or unsigned variable. > > FreeBSD has 2 nonstandard badly designed interfaces that are intended > to be easier to use than strto*() and atoi(): strtonum() and > expand_number(). Â Their bad design starts with then not supporting > intmax_t, uintmax_t or any floating point type. Â strtonum() doesn't > even support unsigned longs on 64-bit arches, and it uses the long > long abomination. Â But it is an adequate replacement for strtol(), > and thus OK for replacing atoi() in most command option parsing. > > Another issue is whether numeric command options should be limited to > decimal values representable as a long or even as an int. Â I think they > shouldn't have any arbitrary limits, but POSIX may over-specify them as > being decimal and/or of limited size. Â Another design bug in strtonum() > is that it only supports decimal numbers. Â This makes it more compatible > with atoi() but less useful than it should be. Thanks for the input Bruce. I studied the manpages of the aforementioned alternatives to atoi and strto*. I also examined code from OpenBSD and NetBSD to see how they deal with this kind of problem. My conclusion is that strtonum() is the best choise when dealing with simple argument parsing, strtol() is just difficult to use correctly and the error checking is kinda ugly. So I applied a "strtonum fix" to all of the atoi()s in vmstat.c to save some time. Checked the program with various input and it seems ok. I also took style(9) into account. The only thing I don't like about this fix is the self-imposed limit on the values we accept. min limit for everything is 0 -w max is an hour -c max is 10 000 -n max is 1000 Although the openbsd devs don't seem to mind it, for example the max limit for vmstat -w of openbsd is 16,6 mins. If you think that's not acceptable, I'm willing to write a strtol fix. But it definitely looks better now, Cheers -- Efstratios "GPF" Karatzas
On Wed, 20 Jan 2010, Efstratios Karatzas wrote: > Thanks for the input Bruce. > > I studied the manpages of the aforementioned alternatives to atoi and strto*. > I also examined code from OpenBSD and NetBSD to see how they deal with > this kind of problem. My conclusion is that strtonum() is the best choise > when dealing with simple argument parsing, ... > correctly and the error checking is kinda ugly. > The only thing I don't like about this fix is the self-imposed limit on the > values we accept. > min limit for everything is 0 > -w max is an hour > -c max is 10 000 > -n max is 1000 I don't like arbitrary limits. I would use INT_MAX, etc. % --- vmstat.c 2010-01-20 17:12:09.000000000 +0200 % +++ vmstat.orig.c 2010-01-20 16:58:56.000000000 +0200 The patch is reversed. This can be confusing. % @@ -196,9 +195,7 @@ % aflag++; % break; % case 'c': % - reps = (int)strtonum(optarg, 0, 10000, &errstr); I don't like casting the return value, just to silence broken compilers/ CFLAGS. strtonum() is supposed to be easy to use, but it is not so easy to use if you have to put unnecessary type info in or near it. If I wanted to do that then I would want an interface with the type in the name, like the one that this is replacing (atoi() for ints, and strtol() for longs, but unfortunately no strtoi() for ints). % - if (errstr != NULL) % - errx(1, "-c %s: %s", optarg, errstr); The previous line has lots of trailing white space which looks like an extra blank line after line wrap. % @@ -226,9 +223,10 @@ % break; % case 'n': % nflag = 1; % - maxshowdevs = (int)strtonum(optarg, 0, 1000, &errstr); % - if (errstr != NULL) % - errx(1, "-n %s: %s", optarg, errstr); % + maxshowdevs = atoi(optarg); % + if (maxshowdevs < 0) % + errx(1, "number of devices %d is < 0", % + maxshowdevs); Your error messages are a bit too cryptic. The old one here is better. Old: "vmstat: number of devices -1 is < 0" New: "vmstat: -n -1: too small" Not too bad, but it is better to print the limit that was not met. Only strtonum() is well placed to do this, but it only prints a generic message. % break; % case 'p': % if (devstat_buildmatch(optarg, &matches, &num_matches) != 0) % @@ -302,14 +298,9 @@ % #define BACKWARD_COMPATIBILITY % #ifdef BACKWARD_COMPATIBILITY % if (*argv) { % - interval = (unsigned int)strtonum(*argv, 0, 3600, &errstr); % - if (errstr != NULL) % - errx(1, "wait time %s: %s", *argv, errstr); % - if (*++argv) { % - reps = (int)strtonum(*argv, 0, 10000, &errstr); % - if (errstr != NULL) % - errx(1, "iterations %s: %s", *argv, errstr); % - } % + interval = atoi(*argv); % + if (*++argv) % + reps = atoi(*argv); % } % #endif % strtonum() is still painful to use after adding error handling, sigh. The above messages are less cryptic than the correspnding ones for -w and -n. I prefer them for -w and -n too. Concerning strtonum()'s bad API, I prefer something like: uintmax_t strtoix(const char restrict *nptr, intmax_t min, uintmax_t max, const char restrict *errfmt, ...); so that the above can become: interval = strtoix(*argv, 0, 3600, "wait time %s: %m", *argv); if (*++argv) reps = strtoix(*argv, 0, 10000, "iterations %s: %m", *argv); , plus many variants to regain control over standard features of the strto*() family: (char **endptr, int base, long double fmax, int typectl, long double fmin, long double *fresultptr, int errctl) and add extensions (int multiplier_ctl, int unit_ctl, ...). A version that can handle either integers or floating point in input, and output both (e.g., decimal integer <1 followed by 100 zeros> -> UINTMAX_MAX/ERANGE together with 1e100/no_fp_err) necessarily takes a lot of parameters, but this is still easier to use than building a general number parser out of strtouimax() and strtold(). Concerning implementation errors in strtonum(): it does undocumented clobbering of errno: it sets errno to 0 even when there is no error. This particular clobbering is explicitly forbidden for any Standard C Library function. Bruce
On Thu, Jan 21, 2010 at 10:07 AM, Bruce Evans <brde@optusnet.com.au> wrote: > On Wed, 20 Jan 2010, Efstratios Karatzas wrote: > >> The only thing I don't like about this fix is the self-imposed limit on >> the >> values we accept. >> min limit for everything is 0 >> -w max is an hour >> -c max is 10 000 >> -n max is 1000 > > I don't like arbitrary limits.  I would use INT_MAX, etc. Although huge values don't make any sense, we should not put values in code like that. I should have used a #define MAX_TIME 3600 or something like that. Now it's INT_MAX and UINT_MAX respectively. > > % --- vmstat.c  2010-01-20 17:12:09.000000000 +0200 > % +++ vmstat.orig.c   2010-01-20 16:58:56.000000000 +0200 > > The patch is reversed.  This can be confusing. Sorry about that! > > % @@ -196,9 +195,7 @@ > %            aflag++; > %            break; > %        case 'c': > % -           reps = (int)strtonum(optarg, 0, 10000, &errstr); > > I don't like casting the return value, just to silence broken compilers/ > CFLAGS.  strtonum() is supposed to be easy to use, but it is not so > easy to use if you have to put unnecessary type info in or near it. > If I wanted to do that then I would want an interface with the type > in the name, like the one that this is replacing (atoi() for ints, and > strtol() for longs, but unfortunately no strtoi() for ints). For the record, I don't think that casting here makes this code any less neat or readable, although now that you point it out it does provide no extra helpful information to the person reading the code, maybe less is better. I did remove them in this version. Out of curiosity, why is there no strtoi()? A lot of code still uses atoi() and this way we would lose the long to int conversion problem that you mentioned in 64bit archs. > > % -           if (errstr != NULL) > % -               errx(1, "-c %s: %s", optarg, errstr); > > The previous line has lots of trailing white space which looks like an > extra blank line after line wrap. No more white space trails this time. > > % @@ -226,9 +223,10 @@ > %            break; > %        case 'n': > %            nflag = 1; > % -           maxshowdevs = (int)strtonum(optarg, 0, 1000, > &errstr); > % -           if (errstr != NULL) > % -               errx(1, "-n %s: %s", optarg, errstr); % + >          maxshowdevs = atoi(optarg); > % +           if (maxshowdevs < 0) > % +               errx(1, "number of devices %d is < 0", > % +                  maxshowdevs); > > Your error messages are a bit too cryptic.  The old one here is better. > > Old: "vmstat: number of devices -1 is < 0" > New: "vmstat: -n -1: too small" > > Not too bad, but it is better to print the limit that was not met.  Only > strtonum() is well placed to do this, but it only prints a generic message. > > The above messages are less cryptic than the correspnding ones for -w > and -n.  I prefer them for -w and -n too. I do think that the original messages for -w and -c are cryptic. But after going through a lot of *BSD code, I thought that for some strange reason this was the norm (meaning cryptic error messages and code comments). I guess it's just because a lot of code was written a long time ago. Anyhoo, the current error messages should be "ok" but keep reading, there's more on that. > %  #define    BACKWARD_COMPATIBILITY > %  #ifdef    BACKWARD_COMPATIBILITY > %    if (*argv) { > % -       interval = (unsigned int)strtonum(*argv, 0, 3600, &errstr); > % -       if (errstr != NULL) > % -           errx(1, "wait time %s: %s", *argv, errstr); > % -       if (*++argv) { > % -           reps = (int)strtonum(*argv, 0, 10000, &errstr); > % -           if (errstr != NULL) > % -               errx(1, "iterations %s: %s", *argv, errstr); > % -       } > % +       interval = atoi(*argv); > % +       if (*++argv) > % +           reps = atoi(*argv); > %    } > %  #endif > % > > strtonum() is still painful to use after adding error handling, sigh. > Well not *that* painful for simple line argument parsing, i believe it will do for this case and I certainly prefer it to the strtol() solution. What I don't like are the generic error messages of strtonum(). For example: > Old: "vmstat: number of devices -1 is < 0" > New: "vmstat: -n -1: too small" Instead of printing the limits that are acceptable here through errx(), we could try and change the error message that strtonum() returns and solve this ugliness at its root. That would be the right thing to do imho, but I believe I need a commiter that agrees with me before I go around posting PRs just to change error strings in openbsd code because *I* don't like them. eg "vmstat: -n -1: less than 0" (0 being the min value we want in this scenario) would be a good error message. The same goes for the error message for a longer than our max supplied value. What do you think? > > Concerning strtonum()'s bad API, I prefer something like: > > uintmax_t strtoix(const char restrict *nptr, intmax_t min, uintmax_t max, >          const char restrict *errfmt, ...); > > so that the above can become: > >         interval = strtoix(*argv, 0, 3600, "wait time %s: %m", > *argv); >         if (*++argv) >             reps = strtoix(*argv, 0, 10000, "iterations %s: %m", >               *argv); > > , plus many variants to regain control over standard features of the > strto*() family: (char **endptr, int base, long double fmax, int > typectl, long double fmin, long double *fresultptr, int errctl) and > add extensions (int multiplier_ctl, int unit_ctl, ...).  A version > that can handle either integers or floating point in input, and output > both (e.g., decimal integer <1 followed by 100 zeros> -> UINTMAX_MAX/ERANGE > together with 1e100/no_fp_err) necessarily takes a lot of parameters, > but this is still easier to use than building a general number parser > out of strtouimax() and strtold(). > A few days ago I was reading netbsd's mailing list and the topic at the time was if they were going to port the strtonum() or not. A popular opinion was that we don't need yet another function to parse numbers, so I don't know how other fbsd devs feel about it. But all of our existing options have their shortcomings and no single function family is panacea, that is true. > Concerning implementation errors in strtonum(): it does undocumented > clobbering of errno: it sets errno to 0 even when there is no error. > This particular clobbering is explicitly forbidden for any Standard C > Library function. > I wasn't aware of this, I have yet to read to source code of strtonum() to be honest. I was going to post a pr so that we could add the extra "caveats" section of the openbsd man page for atoi(3) to the freebsd man page and also include strtonum() in the "see also" section of atoi. I could file another pr so that the freebsd man page for strtonum() will contain this information about errno though. Thank you for your time, -- Efstratios "GPF" Karatzas
On Thu, Jan 21, 2010 at 8:41 PM, Efstratios Karatzas <gpf.kira@gmail.com> wrote: > Instead of printing the limits that are acceptable here through errx(), > we could try and change the error message that strtonum() returns and > solve this ugliness at its root. That would be the right thing to do imho, > but I believe I need a commiter that agrees with me before I go around > posting PRs just to change error strings in openbsd code because *I* > don't like them. > > eg > "vmstat: -n -1: less than 0" (0 being the min value we want in this scenario) > would be a good error message. > > The same goes for the error message for a longer than our max supplied value. > > What do you think? > Nevermind about that, I just read the code of strtonum() and understood why my idea cannot be implemented without changing the function's interface. -- Efstratios "GPF" Karatzas
For bugs matching the following criteria: Status: In Progress Changed: (is less than) 2014-06-01 Reset to default assignee and clear in-progress tags. Mail being skipped
Keyword: patch or patch-ready – in lieu of summary line prefix: [patch] * bulk change for the keyword * summary lines may be edited manually (not in bulk). Keyword descriptions and search interface: <https://bugs.freebsd.org/bugzilla/describekeywords.cgi>