Bug 257288

Summary: awk - loop over array index goes in reverse
Product: Base System Reporter: parv <parv.0zero9+freebsd>
Component: bin Assignee: Warner Losh <imp>
Status: Closed Not A Bug    
Severity: Affects Many People CC: imp, parv.0zero9+freebsd
Priority: ---    
Version: 13.0-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
awk on FreeBSD 12, 13 printed lines in reverse order while iterating over array index none

Description parv 2021-07-19 21:25:43 UTC
Created attachment 226558
awk on FreeBSD 12, 13 printed lines in reverse order while iterating over array index

While trying to keep only the latest line for each distinct value of the second field (the input lines have 3 fields), I was surprised to find that the for-loop goes over the array in reverse order ...

  cat <<_LINE | /usr/bin/awk '{ line[$2] = $0 } END { for ( i in line ) { print line[i] } }'
  10:09:58   18.1T   pool
  10:09:59   18.1T   pool
  10:43:45   18.2T   pool
  10:43:46   18.2T   pool
  _LINE
  10:43:46   18.2T   pool
  10:09:59   18.1T   pool


... I found this behaviour of /usr/bin/awk both on FreeBSD 12.2-RELEASE-p7 & 13-STABLE c 20210628.

OTOH, there was no such unexpected surprise from lang/gawk 5.1.0, which printed ...

  10:09:59   18.1T   pool
  10:43:46   18.2T   pool


Attached is the test shell script.
Comment 1 Warner Losh freebsd_committer freebsd_triage 2021-07-19 23:09:25 UTC
I'll take a closer look, but on my first, quick reading of the standard, I see
> in the case of:
> for (variable in array)
> which shall iterate, assigning each index of array to variable in an unspecified order.
Comment 2 Warner Losh freebsd_committer freebsd_triage 2021-07-20 01:23:05 UTC
From the gawk manual on the for (i in array) construct:

"The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within awk and in standard awk cannot be controlled or changed. This can lead to problems if new elements are added to array by statements in the loop body; it is not predictable whether the for loop will reach them. Similarly, changing var inside the loop may produce strange results. It is best to avoid such things."

So I think this is a "not a bug" situation. I'll let the originator offer a dissenting view before closing, however.
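
A portable way to avoid depending on the traversal order, given here only as a minimal sketch and not as something proposed in this report: record each key in a separate integer-indexed array the first time it appears, then loop over that array with a numeric counter in END. The names `last_per_key`, `order` and `n` are hypothetical.

```shell
#!/bin/sh
# Keep the last line seen for each distinct second field and print the
# survivors in order of each key's first appearance.
# `last_per_key`, `order` and `n` are illustrative names, not from the report.
last_per_key() {
  awk '
    !($2 in line) { order[++n] = $2 }   # remember a key the first time it is seen
    { line[$2] = $0 }                   # later lines overwrite earlier ones
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  '
}

last_per_key <<_LINE
10:09:58   18.1T   pool
10:09:59   18.1T   pool
10:43:45   18.2T   pool
10:43:46   18.2T   pool
_LINE
```

Because the END loop counts from 1 to n instead of using for (i in line), the output order no longer depends on how a given awk stores its arrays.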
Comment 3 parv 2021-07-20 03:49:28 UTC
Right. I had suspected that could be one possibility[0]. I had failed to find any blurb related to the order of array elements in the awk(1) manual page.

I had not read the gawk(1) manual page, obviously, as gawk seemed to have produced the desired result at the time. Further testing showed gawk also not producing the ordered output.


I will mark this PR to be closed: not a bug.


0- Another rare possibility, by chance, could have been locale interaction; but setting LANG & LC_ALL to "C", & unsetting the rest, still produced the same result.
Comment 4 parv 2021-07-20 04:07:55 UTC
The text quoted by Warner L about the order of elements is not in the gawk(1) manual page, but is present in the "info" page ...

  https://www.gnu.org/software/gawk/manual/gawk.html

... the gawk(1) manual page does mention that the order can be controlled by setting 'PROCINFO["sorted_in"]'.

I, at this point, would rather pass the output to sort(1); or, not use (any) awk but use perl or python.
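
The sort(1) route could look roughly like the sketch below. It assumes the first field is a zero-padded HH:MM:SS timestamp within a single day, so that byte order matches chronological order; the function name `latest_sorted` is hypothetical.

```shell
#!/bin/sh
# Keep the last line per distinct second field, then impose an explicit
# order with sort(1) instead of relying on awk's array traversal order.
# `latest_sorted` is an illustrative name, not from the report.
latest_sorted() {
  awk '{ line[$2] = $0 } END { for (i in line) print line[i] }' |
    LC_ALL=C sort -k1,1    # order by the timestamp in field 1, bytewise
}

latest_sorted <<_LINE
10:09:58   18.1T   pool
10:09:59   18.1T   pool
10:43:45   18.2T   pool
10:43:46   18.2T   pool
_LINE
```

Setting LC_ALL=C for sort keeps the comparison independent of the caller's locale.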