Bug 248123 - [NEW PORT] textproc/edcount: Estimate distinct count of values on the command line
Status: Closed Feedback Timeout
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s)
Version: Latest
Hardware: Any Any
Importance: --- Affects Only Me
Assignee: Daniel Engberg
URL:
Keywords: feature
Depends on:
Blocks:
 
Reported: 2020-07-20 11:00 UTC by Marcel Bischoff
Modified: 2022-10-17 23:39 UTC

Attachments
Patch (2.99 KB, patch)
2020-07-20 11:00 UTC, Marcel Bischoff

Description Marcel Bischoff 2020-07-20 11:00:44 UTC
Created attachment 216601
Patch

Estimate distinct count of values from standard input. Provides a very fast way to perform unique count estimates on the command line.

The edcount program implements HyperLogLog, with some minor modifications, as detailed by Flajolet et al. in the paper "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm".

Additionally, the memory footprint of the program is constant at a few megabytes, regardless of the number of records counted, and accuracy does not degrade as the count grows.
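
For anyone unfamiliar with the technique, here is a minimal Python sketch of HyperLogLog-style estimation. It is not edcount's code and omits whatever modifications edcount makes: the function name hll_estimate, the SHA-1 based 64-bit hash, and the precision p=14 are all assumptions chosen for illustration.

```python
import hashlib
import math


def hll_estimate(values, p=14):
    """Estimate the number of distinct strings in `values`.

    Illustrative only: `p` (index bits) and the SHA-1 hash are assumptions,
    not parameters taken from edcount.
    """
    m = 1 << p                      # number of registers, fixed up front
    registers = [0] * m
    tail_bits = 64 - p

    for value in values:
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> tail_bits                       # first p bits pick a register
        tail = h & ((1 << tail_bits) - 1)            # remaining bits
        # Rank = position of the leftmost 1-bit in the tail (1-based);
        # an all-zero tail gets rank tail_bits + 1.
        rank = tail_bits - tail.bit_length() + 1
        if rank > registers[index]:
            registers[index] = rank

    # Bias-correction constant from the paper, valid for m >= 128.
    alpha = 0.7213 / (1 + 1.079 / m)
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)

    # Small-range correction: fall back to linear counting
    # while some registers are still empty.
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros > 0:
        return m * math.log(m / zeros)
    return raw


if __name__ == "__main__":
    sample = [f"user-{i}" for i in range(100000)]
    print(round(hll_estimate(sample + sample)))  # duplicates do not change the estimate
```

With p=14 the sketch keeps 16384 registers no matter how many records it reads, which is where the constant memory footprint comes from; the paper gives a typical relative error of about 1.04/sqrt(m), roughly 0.8% at this size.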

NOTE: this is my first attempt at a new port from scratch, please be kind.
Comment 1 Kubilay Kocak freebsd_committer freebsd_triage 2020-07-20 12:07:53 UTC
Nice work and congratulations on your first port Marcel!

At a cursory glance it appears good, but the best way is to confirm that one's changes pass QA using our automated tools (portlint and poudriere at least), which pick up many issues.

For details and instructions, see: 

https://www.freebsd.org/doc/en/books/porters-handbook/testing.html

If you need help and pointers getting these up and running, jump on #freebsd-ports on freenode IRC, where there are plenty of people to support you.

We also have a whole bunch of cheatsheets and checklists for Issue Management (and Ports Issues in particular), which are available here:

https://wiki.freebsd.org/Bugzilla/

If you need clarity on any of that, ask in #freebsd-bugs or #freebsd-ports again for pointers/clarifications (and we'll update the docs to improve them).
Comment 2 Marcel Bischoff 2020-07-20 13:34:40 UTC
Oh, I should have mentioned that I have already set up Poudriere to test the build as documented. This really streamlines the testing and gives helpful pointers on what to fix. :)
Comment 3 Kubilay Kocak freebsd_committer freebsd_triage 2020-07-20 14:18:59 UTC
Ahh awesome :)

For future reports, you can mention that explicitly in your description:

portlint: OK (looks fine.)
testport: OK (poudriere: <versions>, <archs>, <OPTIONS> tested)
Comment 4 Marcel Bischoff 2020-07-20 15:50:59 UTC
Okay, so just add those lines to the comment?
Comment 5 Daniel Engberg freebsd_committer freebsd_triage 2022-08-08 22:53:48 UTC
Hi,

Still interested in getting this?
Upstream seems to have moved to https://github.com/joshwalters/edcount/releases/tag/v1.2.0

Do we still need to depend on GCC?

Best regards,
Daniel
Comment 6 Daniel Engberg freebsd_committer freebsd_triage 2022-10-17 23:39:58 UTC
Closing due to submitter timeout; please open a new ticket if this is still of interest.