Bug 180328

Summary: awk(1) fails to treat var as integer
Product: Base System
Component: bin
Version: Unspecified
Hardware: Any
OS: Any
Status: Open
Severity: Affects Only Me
Priority: Normal
Reporter: Steffen "Daode" Nurpmeso <sdaoden>
Assignee: freebsd-bugs (Nobody) <bugs>
CC: jwb, nosuw, syzosab, zaqi

Description Steffen "Daode" Nurpmeso 2013-07-05 18:20:01 UTC
Note first that this problem also occurs on Mac OS X Snow Leopard and NetBSD current.  I have not yet tested GNU awk.
I use awk(1) to generate test data from Unicode text files.
I think it is best if I show it:

## Input producers

io_unicode_data() {
   < unicode/UnicodeData.txt ${TAWK} '
      BEGIN {FS = ";" ; OFS = ";"}
      # There are no comments in this, but..
      /^[[:space:]]*[^#]+$/ {
         i = $2
         # Ranges must become unrolled, otherwise step on
         if (i !~ /, First>/) {
            $2 = ""

         r1 = sprintf("%d", "0x" $1)
         r2 = sprintf("%d", "0x" $1)
         $2 = ""
         # This gets around a bug in at least "awk version 20070501" as found
         # on Slow Leopard: there the range F0000-FFFFD, and only that one,
         # will *not* be evaluated unless we do this (once property test came)
         # XXX presumably the type system is a bit weird; check other AWKs!
         sprintf("%X %X", r1, r2)

This is it: UnicodeData.txt contains multiple ranges, but only this one will be "omitted" without the sprintf(); otherwise the while() will simply not execute.

         while (r1 <= r2) {
            $1 = sprintf("%X", r1)
            printf "%s\n", $0

How-To-Repeat: git clone my S-CText and run `make ucd' with and without the line `sprintf("%X %X", r1, r2)', then compare the resulting `test/sa/t_props.dat' files.
Comment 1 Mark Linimon 2013-07-05 23:02:01 UTC
----- Forwarded message from Steffen Daode Nurpmeso <sdaoden@gmail.com> -----

Date: Fri, 05 Jul 2013 23:52:45 +0200
From: Steffen Daode Nurpmeso <sdaoden@gmail.com>
To: freebsd-bugs@FreeBSD.org
Subject: Re: bin/180328: awk(1) fails to treat var as integer
User-Agent: s-nail s-nail-14.3.2-20-g1f64075

uwe@netbsd prodded me to dig a bit deeper, so here is the thing
narrowed down a bit.  Sorry.

 | Please, can you minimize the test case?  As far as I understand it
 | should be reducible to the script and to a single line of input that
 | triggers the problem.


  cat > test.sh <<\!
  printf '1 '; printf "F0000\n" |
  awk '{r2 = r1 = sprintf("%d", "0x" $1); while (r1 <= r2) {print r1; ++r1}}'
  printf '2 '; printf "F0000\n" |
  awk '{r1 = sprintf("%d", "0x" $1); r2 = r1; while (r1 <= r2) {print r1; ++r1}}'
  printf '3 '; printf "F0000\n" |
  awk '{r1 = sprintf("%d", "0x" $1); while (r1 <= 983040) {print r1; ++r1}}'
  printf '4 '; printf "F0000\n" |
  awk '{r1 = sprintf("%d", "0x" $1); r2 = sprintf("%d", "0x" $1); while (r1 <= r2) {print r1; ++r1}}'
  printf '5 '; printf "F0000\n" |
  awk '{r1 = sprintf("%d", "0x" $1); r2 = sprintf("%d", "0x" $1); while (r1 <= r2) {print r1; ++r1}}'
  printf '6 '; printf "F0000 F0001\n" |
  awk '{r1 = sprintf("%d", "0x" $1); r2 = sprintf("%d", "0x" $1); while (r1 < r2) {print r1; ++r1}}'
  !
  sh ./test.sh

results in

  1 983040
  2 983040
  3 983040
  4 983040
  5 983040

So -- indeed.  Sorry.

 | -uwe



  $ make ucd; ll test/sa/t_props.dat; make ucd-clean;\
  sed -e 40d -i '' tools/t-base.t; make ucd; ll test/sa/t_props.dat

becomes (when i strip all the other messages)

  ucd: ok
  4956 -rw-rw-r--  1 steffen  staff  5071362  5 Jul 23:40 test/sa/t_props.dat
  ucd-clean: ok
  ucd: ok
  4188 -rw-rw-r--  1 steffen  staff  4284954  5 Jul 23:40 test/sa/t_props.dat

----- End forwarded message -----
Comment 2 Steffen "Daode" Nurpmeso 2013-07-10 10:01:34 UTC
Hello, i'm forwarding one more.  (This time to bug-followup@ --
hello, Mark Linimon!)

-------- Original Message --------
Date: Wed, 10 Jul 2013 10:53:13 +0200
From: Steffen "Daode" Nurpmeso <sdaoden@gmail.com>
To: gnats-bugs@NetBSD.org
Subject: Re: bin/48017: awk(1) fails to treat var as integer (may be related
 to #47840)

David Holland <dholland-bugs@netbsd.org> wrote:
 | sprintf with %d doesn't produce a number value; it produces a
 | string value, which you have to coerce to a number by adding zero to
 | it to get it to behave like a number.

(Adding +0 was my final solution too, because GNU awk(1), unlike
all the other awk(1)s i tested, was not fixed by the (presumably
more expensive, too) sprintf("%X") call.)
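
For reference, a minimal sketch of the +0 coercion applied to a range loop like the one in the report. The function name `unroll` and the sample values are made up for illustration. Decimal input is used on purpose, since whether a "0x" string converts as hexadecimal appears to vary between awk implementations.

```shell
# Hypothetical helper (not from the report): print every integer from $1 to $2.
unroll() {
  printf '%s %s\n' "$1" "$2" |
  awk '{
     # sprintf() returns a string; adding 0 turns it back into a number,
     # so the loop condition below is a numeric comparison.
     r1 = sprintf("%d", $1) + 0
     r2 = sprintf("%d", $2) + 0
     while (r1 <= r2) { print r1; ++r1 }
  }'
}

# The bounds differ in digit count, which is where string comparison goes wrong.
unroll 98 101
```

With the coercion in place this prints 98 through 101, one per line. Without the `+ 0`, the strings "98" and "101" are compared character by character, "9" sorts after "1", and the loop body never runs.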

So there is a problem with the implicit type conversion, since

  echo f001 f00d |\
  awk '{ a=sprintf("%d", "0x" $1); b=sprintf("%d", "0x" $2); while (a < b) { print a; a++; }}'

works just fine?!?  I think the relevant parts from POSIX are

  the value of an expression shall be implicitly converted to the
  type needed for the context in which it is used.
  A numeric value that is exactly equal to the value of an integer
  (see Concepts Derived from the ISO C Standard) shall be converted
  to a string by the equivalent of a call to the sprintf function
  (see String Functions) with the string "%d" as the fmt argument
  and the numeric value being converted as the first and only expr
  argument.
  This volume of POSIX.1-2008 specifies no explicit conversions
  between numbers and strings. An application can force an
  expression to be treated as a number by adding zero to it, or can
  force it to be treated as a string by concatenating the null
  string ( "" ) to it.
  A string value shall be considered a numeric string if it comes
  from one of the following:
    1. Field variables
    8. Variable assignment from another numeric string variable
  and an implementation-dependent condition corresponding to either
  case (a) or (b) below is met.
    b. After all the following conversions have been applied, the
    resulting string would lexically be recognized as a NUMBER
    token as described by the lexical conventions in Grammar :
  Whether or not a string is a numeric string shall be relevant only
  in contexts where that term is used in this section.

And because the `Table: Expressions in Decreasing Precedence in awk'
contains the line

  expr < expr   Less than   Numeric   None

i believe it's a bug.  (That hopefully gets fixed by someone who
already has some experience with the awk codebase.)
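
To make the distinction in the quoted text concrete, here is a small sketch that compares the same two values once as field variables (numeric strings per the list above) and once as sprintf() results (plain strings). The function name `field_vs_sprintf` and the sample values 9 and 10 are assumptions chosen for illustration, not taken from the report.

```shell
# Hypothetical helper: compare $1 and $2 as fields and as sprintf() output.
field_vs_sprintf() {
  printf '%s %s\n' "$1" "$2" |
  awk '{
     a = sprintf("%d", $1)   # string value
     b = sprintf("%d", $2)   # string value
     print "fields:", ($1 < $2)   # numeric strings, compared as numbers
     print "sprintf:", (a < b)    # two strings, compared as strings
  }'
}

field_vs_sprintf 9 10
```

For 9 and 10 the field comparison yields 1 while the sprintf() comparison yields 0, because "9" sorts after "1". With operands of equal digit count, such as f001 and f00d above, both comparisons agree, which would explain why that example appears to work.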

 | David A. Holland
 | dholland@netbsd.org

Comment 3 Eitan Adler freebsd_committer freebsd_triage 2017-12-31 07:58:44 UTC
For bugs matching the following criteria:

Status: In Progress Changed: (is less than) 2014-06-01

Reset to default assignee and clear in-progress tags.

Mail being skipped
Comment 4 Jason W. Bacon freebsd_committer 2020-09-05 18:35:36 UTC
I ran into something similar, also solved using

if ( var1 + 0 < var2 )

For data read using getline, the behavior differs from mawk and gawk from ports.  I'm not sure if this should be regarded as a bug, but it should at least be documented.

Here's a minimal test case:

    printf("%s\n", x < y);  # Always 1
    printf("%s\n", x < y);  # Always 0

    getline x < "xy.txt"
    getline y < "xy.txt"
    printf("%s %s\n", x, y);   # Prove we're using values from getline
    printf("%s\n", x < y);     # awk 1, mawk and gawk 0


Comment 5 CedricMiller 2020-10-11 14:29:41 UTC
Comment 6 JaniePelayo 2020-10-19 13:20:37 UTC
Comment 7 BereniceRobertson 2020-10-20 06:03:52 UTC