Bug 257288 - awk - loop over array index goes in reverse
Summary: awk - loop over array index goes in reverse
Status: Closed Not A Bug
Alias: None
Product: Base System
Classification: Unclassified
Component: bin
Version: 13.0-STABLE
Hardware: Any Any
Importance: --- Affects Many People
Assignee: Warner Losh
Reported: 2021-07-19 21:25 UTC by parv
Modified: 2021-07-20 04:07 UTC

Attachments
awk on FreeBSD 12, 13 printed lines in reverse order while iterating over array index (671 bytes, application/x-shellscript)
2021-07-19 21:25 UTC, parv

Description parv 2021-07-19 21:25:43 UTC
Created attachment 226558
awk on FreeBSD 12, 13 printed lines in reverse order while iterating over array index

While trying to keep only the latest line for each distinct second field (the lines have 3 fields), I was surprised to find that the for-loop goes over the array in reverse order ...

  cat <<_LINE | /usr/bin/awk '{ line[$2] = $0 } END { for ( i in line ) { print line[i] } }'
  10:09:58   18.1T   pool
  10:09:59   18.1T   pool
  10:43:45   18.2T   pool
  10:43:46   18.2T   pool
  _LINE
  10:43:46   18.2T   pool
  10:09:59   18.1T   pool


... I found this behaviour of /usr/bin/awk on both FreeBSD 12.2-RELEASE-p7 and 13-STABLE c. 20210628.

OTOH, there was no such surprise from lang/gawk 5.1.0, which printed ...

  10:09:59   18.1T   pool
  10:43:46   18.2T   pool


Attached is the test shell script.
Comment 1 Warner Losh freebsd_committer 2021-07-19 23:09:25 UTC
I'll take a closer look, but on my first, quick reading of the standard, I see:
> in the case of:
> for (variable in array)
> which shall iterate, assigning each index of array to variable in an unspecified order.
Comment 2 Warner Losh freebsd_committer 2021-07-20 01:23:05 UTC
From the gawk manual on the for (i in array) construct:

"The order in which elements of the array are accessed by this statement is determined by the internal arrangement of the array elements within awk and in standard awk cannot be controlled or changed. This can lead to problems if new elements are added to array by statements in the loop body; it is not predictable whether the for loop will reach them. Similarly, changing var inside the loop may produce strange results. It is best to avoid such things."

So I think this is a "not a bug" situation. I'll let the originator offer a dissenting view before closing, however.
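
Since the order of "for (variable in array)" is unspecified, a script that needs a particular order has to record it itself. A minimal sketch, assuming the wanted order is that in which each distinct second field first appears; the function name latest_by_field2 and the helper names "order" and "n" are made up for illustration and are not part of the original report:

```shell
# Keep the latest line per distinct second field, printed in first-seen key order.
# "order" and "n" are hypothetical helper names, not from the report.
latest_by_field2() {
    awk '
        !($2 in line) { order[++n] = $2 }   # remember each key the first time it appears
        { line[$2] = $0 }                   # later lines overwrite earlier ones
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    '
}

printf '%s\n' \
    '10:09:58   18.1T   pool' \
    '10:09:59   18.1T   pool' \
    '10:43:45   18.2T   pool' \
    '10:43:46   18.2T   pool' |
    latest_by_field2
```

The END block walks a numeric counter instead of using "for (i in line)", so the output no longer depends on how a given awk arranges its array internally; with the sample input it prints the 18.1T line before the 18.2T line.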
Comment 3 parv 2021-07-20 03:49:28 UTC
Right. I had suspected that could be one possibility[0]. I had failed to find any blurb about the order of array elements in the awk(1) manual page.

I had not read the gawk(1) manual page, obviously, as gawk seemed to have produced the desired result at the time. Further testing showed that gawk does not produce ordered output either.


I will mark this PR to be closed: not a bug.


[0] The other, rare possibility could have been a locale interaction; but setting LANG and LC_ALL to "C", and unsetting the rest, still produced the same result.
Comment 4 parv 2021-07-20 04:07:55 UTC
The text quoted by Warner L about the order of elements is not in the gawk(1) manual page, but is present in the "info" page ...

  https://www.gnu.org/software/gawk/manual/gawk.html

... the gawk(1) manual page does mention that the order can be controlled by setting 'PROCINFO["sorted_in"]'.
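
For reference, a sketch of what that could look like. This is gawk-only (it will not work with /usr/bin/awk), and it assumes ascending string order of the index is what is wanted; the function name latest_sorted_gawk is made up for illustration:

```shell
# gawk-only sketch: traverse the array in ascending string order of its indices.
# latest_sorted_gawk is a hypothetical name, not from the report.
latest_sorted_gawk() {
    gawk '
        { line[$2] = $0 }
        END {
            PROCINFO["sorted_in"] = "@ind_str_asc"   # gawk extension, ignored by other awks
            for (i in line) print line[i]
        }
    '
}

# gawk is a port (lang/gawk) on FreeBSD, so check for it before use.
if command -v gawk >/dev/null 2>&1; then
    printf '%s\n' \
        '10:09:58   18.1T   pool' \
        '10:09:59   18.1T   pool' \
        '10:43:45   18.2T   pool' \
        '10:43:46   18.2T   pool' |
        latest_sorted_gawk
fi
```

Here the order follows the second field ("18.1T" before "18.2T") as strings, not the order in which the lines were read.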

At this point, I would rather pass the output to sort(1), or not use any awk at all and use perl or python instead.
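
A minimal sketch of the sort(1) route, assuming that sorting on the first field (the timestamp) in byte order is good enough; the function name latest_sorted_portable is made up for illustration:

```shell
# Let awk pick the latest line per second field in whatever order it likes,
# then impose the order afterwards with sort(1).
# latest_sorted_portable is a hypothetical name, not from the report.
latest_sorted_portable() {
    awk '{ line[$2] = $0 } END { for (i in line) print line[i] }' |
        LC_ALL=C sort -k1,1    # byte-wise sort on the first field only
}

printf '%s\n' \
    '10:09:58   18.1T   pool' \
    '10:09:59   18.1T   pool' \
    '10:43:45   18.2T   pool' \
    '10:43:46   18.2T   pool' |
    latest_sorted_portable
```

This works the same with /usr/bin/awk and gawk, since the awk traversal order no longer matters.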