Summary: | diff: -rq still does full file comparison | ||||||
---|---|---|---|---|---|---|---|
Product: | Base System | Reporter: | ml | ||||
Component: | bin | Assignee: | Mark Johnston <markj> | ||||
Status: | Closed FIXED | ||||||
Severity: | Affects Some People | CC: | fnoyanisi, markj | ||||
Priority: | --- | Keywords: | performance, regression | ||||
Version: | 12.1-RELEASE | Flags: | koobs:
mfc-stable12+
|
||||
Hardware: | Any | ||||||
OS: | Any | ||||||
URL: | https://reviews.freebsd.org/D23152 | ||||||
See Also: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=252515 | ||||||
Attachments: |
|
Description
ml
2019-12-23 10:55:20 UTC
It appears BSD diff keeps working on the the files even after it knows whether the files are identical or not. In diffreg(), which is called by diff() we have [1] switch (files_differ(f1, f2, flags)) { case 0: goto closem; case 1: break; default: /* error */ status |= 2; goto closem; } files_differ() returns 0 when the files are identical and 1 when they are different. For the cases when files_differ() returns 1, instead of just a single break statement, we can add a check for the command line arguments and if '-q' supplied, i.e. D_BRIEF, then we should return from diffred() (after doing some housekeeping if necessary - or add another 'goto' label at the end of the function). Instead, BSD diff carries on and hashes each line in both files, sorts them and do more checks, hence taking more time. If this sounds reasonable, I can work on a patch. [1] http://src.illumos.org/source/xref/freebsd-head/usr.bin/diff/diffreg.c#341 I think adding one _if_ check to the case where the files differ would be sufficient; switch (files_differ(f1, f2, flags)) { case 0: goto closem; case 1: /* if brief output is needed, just return to the caller */ if (diff_format == D_BRIEF) { rval = D_DIFFER; goto closem; } break; default: /* error */ status |= 2; goto closem; } with this change, when -q or --brief is used, diff(1) will just print a message without doing any further work (In reply to fehmi noyan isi from comment #2) This looks reasonable to me. Can you verify that the tests in /usr/tests/usr.bin/diff pass with your change applied? Did you verify that diff exits with the correct status in all cases? Created attachment 210686 [details] patch for the diff(1) -q In addition to the code I gave in my previous comment, the "status" variable needs to be updated in order to set the proper exit status. The attached patch seems to work and the patched diff(1) utility passes all the tests. root@patch:/usr/src/usr.bin/diff # kyua test -k /usr/tests/Kyuafile usr.bin/diff usr.bin/diff/diff_test:Bflag -> passed [0.071s] usr.bin/diff/diff_test:b230049 -> passed [0.020s] usr.bin/diff/diff_test:brief_format -> passed [0.075s] usr.bin/diff/diff_test:group_format -> passed [0.020s] usr.bin/diff/diff_test:header -> passed [0.026s] usr.bin/diff/diff_test:header_ns -> passed [0.023s] usr.bin/diff/diff_test:ifdef -> passed [0.022s] usr.bin/diff/diff_test:side_by_side -> expected_failure: --side-by-side not currently implemented (bug # 219933): atf-check failed; see the output of the test for details [0.033s] usr.bin/diff/diff_test:simple -> passed [0.086s] usr.bin/diff/diff_test:unified -> passed [0.038s] usr.bin/diff/netbsd_diff_test:mallocv -> passed [0.024s] usr.bin/diff/netbsd_diff_test:nomallocv -> passed [0.023s] usr.bin/diff/netbsd_diff_test:same -> passed [0.024s] Results file id is usr_tests.20200113-172221-636074 Results saved to /root/.kyua/store/results.usr_tests.20200113-172221-636074.db 13/13 passed (0 failed) root@patch:/usr/src/usr.bin/diff # The patch file was generated with the command below root@patch:~ # diff -u diffreg.c.orig diffreg.c > diffreg.c.patch root@patch:~ # cat diffreg.c.patch --- diffreg.c.orig 2020-01-14 03:16:23.048362000 +1300 +++ diffreg.c 2020-01-14 05:49:28.212503000 +1300 @@ -342,6 +342,12 @@ case 0: goto closem; case 1: + /* if brief output is needed, just return to the caller */ + if (diff_format == D_BRIEF) { + status |= 1; + rval = D_DIFFER; + goto closem; + } break; default: /* error */ root@patch:~ # uname -a FreeBSD patch 13.0-CURRENT FreeBSD 13.0-CURRENT #0 r356528: Thu Jan 9 04:56:46 UTC 2020 root@releng1.nyi.freebsd.org:/usr/obj/usr/src/amd64.amd64/sys/GENERIC amd64 It would be nice to do some performance benchmarks given unexpectedly long running time was the driver for the bug report. (In reply to fehmi noyan isi from comment #4) Thanks! I will work on getting this committed. A commit references this bug: Author: markj Date: Mon Jan 13 18:29:48 UTC 2020 New revision: 356695 URL: https://svnweb.freebsd.org/changeset/base/356695 Log: Optimize diff -q. Once we know whether the files differ, we don't need to do any further work. PR: 242828 Submitted by: fehmi noyan isi <fnoyanisi@yahoo.com> (original version) Reviewed by: bapt, kevans MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D23152 Changes: head/usr.bin/diff/diffreg.c A commit references this bug: Author: markj Date: Mon Jan 20 01:38:03 UTC 2020 New revision: 356903 URL: https://svnweb.freebsd.org/changeset/base/356903 Log: MFC r356695, r356731: Optimize diff -q. PR: 242828 Changes: _U stable/12/ stable/12/usr.bin/diff/diffreg.c |