To reproduce:

printf '%s\n' a b c d e f | sed '
/a/,/b/c\
x
$!N
'

Expected output:

x
c
d
e
f

Actual output:
I see no way to justify your "expected" output from the specification. (I also can't justify the actual output, but it deviates less from the spec than your "expected" output.) In particular, the line "b" is read using N and deleted without ever being seen by the /a/,/b/c command, and therefore the replacement "x" should never be emitted. My reading of the spec is that the "b", "d", "f" lines should be output, but I see no reading of the spec that allows the output of "c" and "e".
(In reply to Andrew "RhodiumToad" Gierth from comment #1) The expected output matches GNU sed. I'm not sure why you mention deletion, I'm not using any delete command. The first command changes the text in the range from a to b, and replaces it with x. The second command appends the next line, and since we're not doing anything fancy with \n in the pattern space, it should be a no-op. Therefore, the output should be identical to that of the same script without $!N.
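For reference, the baseline this comment compares against (the same script with the $!N line removed) can be run as below. This is only a sketch of the comparison point; it uses the reporter's script unchanged apart from dropping $!N, and the variable name out is just for illustration.

```shell
# Baseline: the script from the report with the $!N line removed.
# Lines "a" and "b" form the /a/,/b/ range and are replaced by a single "x";
# the remaining lines pass through untouched.
out=$(printf '%s\n' a b c d e f | sed '
/a/,/b/c\
x
')
printf '%s\n' "$out"
```

The disagreement in this thread is about whether adding $!N after the c command is allowed to change that result.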
(In reply to Mohamed Akram from comment #2) All of the pattern space (embedded newlines and all) is deleted at the end of each cycle (after being output if appropriate). The fact that GNU sed violates the spec is not our concern. Your description of the "c" command is not what the spec says. The spec says that with 2 addresses, "c" deletes the pattern space if the line is in the addressed range, and emits the replacement text if and only if the last line of the range is addressed. Since the "b" line is consumed by N and is never in the pattern space at the start of the cycle, the /a/,/b/ range never sees it, so the range extends to the last line of the file; but since you also read the last line of the file by doing N on the second-last line, the last line is also never processed by the "c" command so the replacement is never output. (You seem to be assuming that the use of N does not affect addresses; that's not what the spec says.)
(In reply to Andrew "RhodiumToad" Gierth from comment #3)
Per the spec:
> The sed utility shall then apply in sequence all commands whose addresses select that pattern space, until a command starts the next cycle or quits.
For the c command:
> Delete the pattern space. With a 0 or 1 address or at the end of a 2-address range, place text on the output and start the next cycle.
So, once the c command is executed, the next cycle is started and N is not executed.
(In reply to Mohamed Akram from comment #4)
This is the sequence of events according to the spec, as I read it:
1. Read line "a" into the pattern space.
2. Execute the first command:
   /a/ matches, so we begin an addressed range
   "c" deletes the pattern space (but does not emit anything and does not start the next cycle)
3. Execute the second command:
   $ does not match
   ! inverts the match
   N is executed, which appends "\nb" to the (deleted) pattern space
4. By my reading of the spec, the "\nb" should be output at this point. For whatever reason, BSD sed does not do that.
5. The pattern space is deleted (as this is the end of the cycle).
6. Read line "c" into the pattern space.
7. Execute the first command:
   /b/ does not match, so we are still in an addressed range
   "c" deletes the pattern space (but does not emit anything and does not start the next cycle)
8. Execute the second command:
   $ does not match
   ! inverts the match
   N is executed, which appends "\nd" to the (deleted) pattern space
9. As 4.
10. The pattern space is deleted (as this is the end of the cycle).
11. Read line "e" into the pattern space.
12. Execute the first command:
   /b/ does not match, so we are still in an addressed range
   "c" deletes the pattern space (but does not emit anything and does not start the next cycle)
13. Execute the second command:
   $ does not match
   ! inverts the match
   N is executed, which appends "\nf" to the (deleted) pattern space
14. As 4.
15. The pattern space is deleted (as this is the end of the cycle).
16. There are no more lines so the process ends.
Note that neither the last line of input, nor any line containing /b/, was ever processed by the "c" command, so it never has a chance to emit the replacement text.
You seem to be hung up on /a/,/b/ representing some block of input lines. This is NOT WHAT IT MEANS; it means "start a range when you see a _pattern space_ matching /a/, and end it when you see a pattern space matching /b/".
By using N to process some input lines, you prevent them from being seen in the pattern space at the start of the script, which affects how the first command determines its range. Alternatively, you (or GNU sed) may be assuming that "c" starts a new cycle (rather than executing the rest of the script) for every row of a 2-address range, not just the last one. This isn't what the spec actually says (as you quoted yourself), though it might be considered to be more useful or consistent. (I looked for applicable defect reports against the spec, didn't find any.)
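The point about N hiding lines from a range address can be checked separately from the "c" dispute. The sketch below deliberately swaps "c" for "p" (with -n), since the behaviour of "p" is not in question here; the variable names plain and with_n are only for illustration.

```shell
# Without N, /a/,/b/ sees every input line, so the range closes at "b".
plain=$(printf '%s\n' a b c d e f | sed -n '/a/,/b/p')

# With $!N as the second command, lines b, d and f are pulled in by N and
# are never alone in the pattern space when /a/,/b/ is evaluated at the
# start of a cycle. /b/ therefore never ends the range, and "p" fires on
# a, c and e (before N appends the following line).
with_n=$(printf '%s\n' a b c d e f | sed -n '
/a/,/b/p
$!N
')

printf 'without N:\n%s\n' "$plain"
printf 'with N:\n%s\n' "$with_n"
```

The first run prints a and b; the second prints a, c and e, which shows the range staying open to the end of the input.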
(In reply to Andrew "RhodiumToad" Gierth from comment #5)
Thank you for this.
> Alternatively, you (or GNU sed) may be assuming that "c" starts a new cycle (rather
> than executing the rest of the script) for every row of a 2-address range, not just
> the last one. This isn't what the spec actually says (as you quoted yourself),
> though it might be considered to be more useful or consistent. (I looked for
> applicable defect reports against the spec, didn't find any.)
I now wonder if other implementations output bdf as expected. Might be worth opening a ticket against the spec to clarify this ambiguity.
(In reply to Mohamed Akram from comment #6) For what it's worth, the failure to output the "b","d","f" lines is because our sed has a "pattern deleted" flag which is set by "c" (and not reset by "N"), which suppresses the output of the pattern space at the end of the cycle. (I haven't looked at GNU sed's logic.)
Tried this with:
NetBSD 9.3: same output as GNU
OpenBSD 7.3: same output as FreeBSD
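For anyone wanting to repeat this comparison on another system, a rough sketch along these lines runs the reproducer from the original report and labels the result. The function name classify_sed and the label strings are made up for illustration; the only reference point taken from this thread is the reporter's expected output (x c d e f).

```shell
# classify_sed is a hypothetical helper, not part of any sed distribution.
# It runs the reproducer from the report against the given sed binary
# (default: the first "sed" on PATH) and labels what comes back.
classify_sed() {
    sed_bin=${1:-sed}
    out=$(printf '%s\n' a b c d e f | "$sed_bin" '
/a/,/b/c\
x
$!N
')
    case $out in
        "x
c
d
e
f") echo "matches the expected output in the report" ;;
        "") echo "no output" ;;
        *)  echo "something else: $(printf '%s' "$out" | tr '\n' ' ')" ;;
    esac
}

label=$(classify_sed)
echo "$label"
```

Passing a path (for example classify_sed /usr/bin/sed) tests a specific binary instead of the one found on PATH.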
(In reply to Mohamed Akram from comment #8)
NetBSD reference: https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=45981
That claims that the v7 manual said:
  Delete the pattern space. With 0 or 1 address or at the end of a 2-address range, place text on the output. Start the next cycle.
... which disagrees with what the spec now says. (As far as I can tell from a quick look, given that it's not the most readable code in the world, the v7 sed does in fact behave as documented in this case.)
So, I think this can definitely be argued to be a defect in the spec; you should certainly take it up with them if you care about it.
Should we assign this to standards@?
(In reply to Mina Galić from comment #10) Can't think of anything better to do with it.
(In reply to Andrew "RhodiumToad" Gierth from comment #9) Thank you very much for this. I've opened an issue with the Austin Group: https://austingroupbugs.net/view.php?id=1767
Created attachment 244157
sed: fix 'c' command
Patch to apply Austin Group resolution to code and manpage.
(In reply to Andrew "RhodiumToad" Gierth from comment #13) Thanks for the patch. Could someone merge it? The new version of the standard is out with the adjusted wording.