| Summary: | ministat does not calculate proper median value (patch included) | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Base System | Reporter: | Marcus Reid <marcus> | ||||
| Component: | bin | Assignee: | Marcelo Araujo <araujo> | ||||
| Status: | Closed FIXED | ||||||
| Severity: | Affects Many People | CC: | araujo, emaste | ||||
| Priority: | --- | ||||||
| Version: | CURRENT | ||||||
| Hardware: | Any | ||||||
| OS: | Any | ||||||
| Attachments: |
|
||||||
|
Description
Marcus Reid
2015-07-15 07:24:48 UTC
I will take it.

Hi, ministat(1) is actually doing the right thing. I ran a couple of tests in other programming languages and they return the same value as ministat does. As a simple example, here are two different ways in Python:
>>> import statistics
>>> items = [1,2,3,4,5,6,7]
>>> statistics.median(items)
4
I also wrote my own:
>>> def middle(L):
...     L = sorted(L)
...     n = len(L)
...     m = n - 1
...     return (L[n//2] + L[m//2]) / 2.0
...
>>> print(middle(items))
4.0
Ministat result:
    N           Min           Max        Median           Avg        Stddev
x   7             1             7             4             4     2.1602469
+   7             1             7             4             4     2.1602469
So I can't see where the problem is.
Best,

You're not testing for the problem case there. Here, let me demonstrate:
>>> import statistics
>>> items = [1,2,3,4]
>>> statistics.median(items)
2.5
[mreid@sol /usr/home/mreid]$ ministat
1
2
3
4
x <stdin>
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|x x x x|
| |________________________________________________________________________________________________A_____________________________________M__________________________________________________________| |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
    N           Min           Max        Median           Avg        Stddev
x   4             1             4             3           2.5     1.2909944
As you can see, ministat reports a median of 3, whereas the real median is 2.5, as seen in the Python example.
It's the case where there is no middle number (an even number of items in the dataset) that is broken.
Thanks!
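The difference between the two results can be reproduced with a short Python sketch. The function names `upper_middle` and `midpoint_median` are hypothetical. `upper_middle` assumes that the reported value comes from taking the element at index n/2 of the sorted data. That assumption matches the output above, but it is not taken from ministat's source.

```python
import statistics


def upper_middle(values):
    """Assumed model of the reported behaviour: element at index n // 2."""
    ordered = sorted(values)
    return ordered[len(ordered) // 2]


def midpoint_median(values):
    """Median that averages the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 0:
        # No single middle element: take the midpoint of the two central values.
        return (ordered[mid - 1] + ordered[mid]) / 2
    return ordered[mid]


even = [1, 2, 3, 4]
odd = [1, 2, 3, 4, 5, 6, 7]

print(upper_middle(even))       # 3, the value ministat printed
print(midpoint_median(even))    # 2.5, same as statistics.median(even)
print(midpoint_median(odd))     # 4, odd-sized sets are unaffected
```

For an odd number of items both functions agree, which is why the earlier seven-item test did not expose the problem.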
Thanks for the feedback. Yes, you are right about this case!

A commit references this bug:

Author: araujo
Date: Tue Nov 24 02:30:59 UTC 2015
New revision: 291231
URL: https://svnweb.freebsd.org/changeset/base/291231

Log:
  Compute the median of the data set as the midpoint between the two middle
  values when the data set has an even number of elements.

  PR: 201582
  Submitted by: Marcus Reid <marcus@blazingdot.com>
  Reviewed by: imp
  Approved by: bapt (mentor)

Changes:
  head/usr.bin/ministat/ministat.c

Committed, thanks! |