Bug 255551 - FreeBSD sort much slower than GNU or NetBSD
Summary: FreeBSD sort much slower than GNU or NetBSD
Status: Open
Alias: None
Product: Base System
Classification: Unclassified
Component: bin (show other bugs)
Version: 12.2-STABLE
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-05-02 17:30 UTC by Jason W. Bacon
Modified: 2021-05-12 20:28 UTC (History)
4 users (show)

See Also:


Attachments
Column 2 from the original BED (TSV) file (941.07 KB, application/x-xz)
2021-05-03 23:09 UTC, Jason W. Bacon
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jason W. Bacon freebsd_committer 2021-05-02 17:30:39 UTC
This isn't really a problem, since FreeBSD's sort command is more than fast enough for my purposes, but it seems worth reporting in case it exposes an undiscovered problem that might be easy to fix.

All tests were run on the same machine, a ThinkCenter i5 with 8G RAM.  NetBSD running under VirtualBox.

NetBSD netbsd9.acadix  bacon ~/Prog/Src/peak-classifier 435: (pkgsrc): time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
1.409u 2.548s 0:03.95 99.7%     0+0k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1005: time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
24.574u 0.795s 0:25.37 99.9%	55+172k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1006: time gsort --parallel=1 -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
3.005u 0.149s 0:03.16 99.3%	152+177k 6+0io 3pf+0w

I generated a simpler file using just column 2 from the bed file and ran a simple sort -n on it (no -k flags).  Results were similar, so the performance difference is not limited to multicolumn sorting.

Results are the same on 12.2 and 13.0.
Comment 1 Mark Johnston freebsd_committer 2021-05-03 17:39:12 UTC
Can you share the data that you're using for the comparison, or something with equivalent properties?
Comment 2 Jason W. Bacon freebsd_committer 2021-05-03 23:09:21 UTC
Created attachment 224652 [details]
Column 2 from the original BED (TSV) file

time sort -n test.txt > /dev/null
3.004u 0.063s 0:03.06 100.0%	55+172k 0+0io 0pf+0w

time gsort --parallel 1 -n test.txt > /dev/null
0.628u 0.047s 0:00.67 98.5%	156+182k 0+0io 0pf+0w
Comment 3 cyril 2021-05-11 17:00:19 UTC
You may be interested in the patch here which shows an optimization that speeds up sort by a significant amount, though not equivalently to the speed of gsort: https://reviews.freebsd.org/D30170. 

The flags --qsort, --mergesort may also be of interest, since they are faster than heapsort, the default algorithm used by sort.
Comment 4 Jason W. Bacon freebsd_committer 2021-05-12 13:12:21 UTC
(In reply to cyril from comment #3)

The patch about cuts the time in half for my data.

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1018: time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
12.361u 0.794s 0:13.16 99.9%	55+171k 0+0io 0pf+0w
Comment 5 Mark Johnston freebsd_committer 2021-05-12 13:25:44 UTC
(In reply to Jason W. Bacon from comment #4)
It'd be interesting to know whether --mergesort gives a further improvement.
Comment 6 Jason W. Bacon freebsd_committer 2021-05-12 20:28:06 UTC
(In reply to Mark Johnston from comment #5)

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1004: !972
time sort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
12.467u 0.906s 0:13.38 99.8%	55+171k 634+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1005: time sort --qsort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
11.188u 0.811s 0:12.00 99.9%	55+171k 0+0io 0pf+0w

FreeBSD coral.acadix  bacon ~/Prog/Src/peak-classifier 1006: time sort --mergesort -n -k 1 -k 2 -k 3 pc-gff-stripped.bed > /dev/null
8.271u 0.857s 0:09.13 99.8%	55+171k 0+0io 0pf+0w