Bug 255551

Summary: FreeBSD sort much slower than GNU or NetBSD
Product: Base System Reporter: Jason W. Bacon <jwb>
Component: binAssignee: Mark Johnston <markj>
Status: Closed FIXED    
Severity: Affects Only Me CC: chris, cyril, emaste, markj
Priority: ---    
Version: 12.2-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
Column 2 from the original BED (TSV) file none

Description Jason W. Bacon freebsd_committer 2021-05-02 17:30:39 UTC
This isn't really a problem, since FreeBSD's sort command is more than fast enough for my purposes, but it seems worth reporting in case it exposes an undiscovered problem that might be easy to fix.

All tests were run on the same machine, a ThinkCenter i5 with 8G RAM.  NetBSD running under VirtualBox.

NetBSD netbsd9.acadix  bacon ~/Prog/Src/peak-classifier 435: (pkgsrc): time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
1.409u 2.548s 0:03.95 99.7%     0+0k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1005: time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
24.574u 0.795s 0:25.37 99.9%	55+172k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1006: time gsort --parallel=1 -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
3.005u 0.149s 0:03.16 99.3%	152+177k 6+0io 3pf+0w

I generated a simpler file using just column 2 from the bed file and ran a simple sort -n on it (no -k flags).  Results were similar, so the performance difference is not limited to multicolumn sorting.

Results are the same on 12.2 and 13.0.
Comment 1 Mark Johnston freebsd_committer 2021-05-03 17:39:12 UTC
Can you share the data that you're using for the comparison, or something with equivalent properties?
Comment 2 Jason W. Bacon freebsd_committer 2021-05-03 23:09:21 UTC
Created attachment 224652 [details]
Column 2 from the original BED (TSV) file

time sort -n test.txt > /dev/null
3.004u 0.063s 0:03.06 100.0%	55+172k 0+0io 0pf+0w

time gsort --parallel 1 -n test.txt > /dev/null
0.628u 0.047s 0:00.67 98.5%	156+182k 0+0io 0pf+0w
Comment 3 cyril 2021-05-11 17:00:19 UTC
You may be interested in the patch here which shows an optimization that speeds up sort by a significant amount, though not equivalently to the speed of gsort: https://reviews.freebsd.org/D30170. 

The flags --qsort, --mergesort may also be of interest, since they are faster than heapsort, the default algorithm used by sort.
Comment 4 Jason W. Bacon freebsd_committer 2021-05-12 13:12:21 UTC
(In reply to cyril from comment #3)

The patch about cuts the time in half for my data.

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1018: time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
12.361u 0.794s 0:13.16 99.9%	55+171k 0+0io 0pf+0w
Comment 5 Mark Johnston freebsd_committer 2021-05-12 13:25:44 UTC
(In reply to Jason W. Bacon from comment #4)
It'd be interesting to know whether --mergesort gives a further improvement.
Comment 6 Jason W. Bacon freebsd_committer 2021-05-12 20:28:06 UTC
(In reply to Mark Johnston from comment #5)

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1004: !972
time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
12.467u 0.906s 0:13.38 99.8%	55+171k 634+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1005: time sort --qsort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
11.188u 0.811s 0:12.00 99.9%	55+171k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1006: time sort --mergesort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
8.271u 0.857s 0:09.13 99.8%	55+171k 0+0io 0pf+0w
Comment 7 commit-hook freebsd_committer 2021-05-13 13:35:55 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=71ec05a21257e159f40d54e26ad0011bb19b5134

commit 71ec05a21257e159f40d54e26ad0011bb19b5134
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2021-05-13 12:55:06 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-05-13 13:33:19 +0000

    sort: Cache value of MB_CUR_MAX

    Every usage of MB_CUR_MAX results in a call to __mb_cur_max.  This is
    inefficient and redundant.  Caching the value of MB_CUR_MAX in a global
    variable removes these calls and speeds up the runtime of sort.  For
    numeric sorting, runtime is almost halved in some tests.

    PR:             255551
    PR:             255840
    Reviewed by:    markj
    MFC after:      1 week
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30170

 usr.bin/sort/bwstring.c  | 54 ++++++++++++++++++++++++------------------------
 usr.bin/sort/bwstring.h  |  9 ++++----
 usr.bin/sort/radixsort.c |  4 ++--
 usr.bin/sort/sort.c      |  6 +++++-
 usr.bin/sort/sort.h      |  6 ++++++
 5 files changed, 45 insertions(+), 34 deletions(-)
Comment 8 Mark Johnston freebsd_committer 2021-05-17 14:44:19 UTC
We are looking at changing the default to mergesort.  One other thing that will help a lot is to set LC_ALL=C in the sort(1) invocation.  In this case sort(1) will use radix sort and will assume that each character occupies a single byte, which makes the comparison function much faster.
Comment 9 Jason W. Bacon freebsd_committer 2021-05-17 16:51:06 UTC
sh -c  ./sort-test.sh 
+ time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed
       13.28 real        12.32 user         0.95 sys
+ time gsort '--parallel=1' -n -k 1 -k 2 -k 3 pc-gff-stripped.bed
        3.05 real         2.97 user         0.08 sys
+ export 'LC_ALL=C'
+ time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed
        7.64 real         7.30 user         0.33 sys
+ time sort --mergesort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed
        4.02 real         3.61 user         0.41 sys
+ time gsort '--parallel=1' -n -k 1 -k 2 -k 3 pc-gff-stripped.bed
        2.40 real         2.29 user         0.11 sys
Comment 10 commit-hook freebsd_committer 2021-05-20 14:11:02 UTC
A commit in branch stable/13 references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=df40dcbf7c794f5448c13e23670466658a620933

commit df40dcbf7c794f5448c13e23670466658a620933
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2021-05-13 12:55:06 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-05-20 13:15:43 +0000

    sort: Cache value of MB_CUR_MAX

    Every usage of MB_CUR_MAX results in a call to __mb_cur_max.  This is
    inefficient and redundant.  Caching the value of MB_CUR_MAX in a global
    variable removes these calls and speeds up the runtime of sort.  For
    numeric sorting, runtime is almost halved in some tests.

    PR:             255551
    PR:             255840
    Reviewed by:    markj
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30170

    (cherry picked from commit 71ec05a21257e159f40d54e26ad0011bb19b5134)

 usr.bin/sort/bwstring.c  | 54 ++++++++++++++++++++++++------------------------
 usr.bin/sort/bwstring.h  |  9 ++++----
 usr.bin/sort/radixsort.c |  4 ++--
 usr.bin/sort/sort.c      |  6 +++++-
 usr.bin/sort/sort.h      |  6 ++++++
 5 files changed, 45 insertions(+), 34 deletions(-)
Comment 11 commit-hook freebsd_committer 2021-06-17 17:54:13 UTC
A commit in branch main references this bug:

URL: https://cgit.FreeBSD.org/src/commit/?id=68d3790ba0bce162f9fcaed09cfecd9adeab3943

commit 68d3790ba0bce162f9fcaed09cfecd9adeab3943
Author:     Cyril Zhang <cyril@freebsdfoundation.org>
AuthorDate: 2021-06-17 17:40:16 +0000
Commit:     Mark Johnston <markj@FreeBSD.org>
CommitDate: 2021-06-17 17:53:03 +0000

    sort: Change default algorithm to mergesort

    This results in a significant improvement in the runtime of sort(1) when
    radix sort cannot be used.  This comes at the expense of increased
    memory usage, but this is small relative to sort's overall memory usage.

    PR:             255551
    Reviewed by:    markj
    Sponsored by:   The FreeBSD Foundation
    Differential Revision:  https://reviews.freebsd.org/D30319

 usr.bin/sort/file.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
Comment 12 Mark Johnston freebsd_committer 2021-09-14 12:59:11 UTC
I'm going to close this one now that PR 255840 is fixed as well.  On main we're still not as fast as GNU sort, but the difference is much smaller:

markj@nuc> time sort -n test.txt > /dev/null

real    0m0.925s
user    0m0.875s
sys     0m0.050s

markj@nuc> time LC_ALL=C sort -n test.txt > /dev/null

real    0m0.782s
user    0m0.738s
sys     0m0.033s

markj@nuc> time gsort --parallel 1 -n test.txt > /dev/null

real    0m0.423s
user    0m0.380s
sys     0m0.041s

Per
> This isn't really a problem, since FreeBSD's sort command is more than fast enough for my purposes, but it seems worth reporting in case it exposes an undiscovered problem that might be easy to fix.

I think this PR has achieved its goals. :)
Please re-open if you disagree or have suggestions for further improvements.
Comment 13 Jason W. Bacon freebsd_committer 2021-09-17 17:46:13 UTC
That's a huge improvement, thanks!  Even less motivation to bother specifying gsort now.