Created attachment 158793
Make ministat return proper value for median

From http://www.mathgoodies.com/lessons/vol8/median.html :

"The median of a set of data is the middlemost number in the set. The median is also the number that is halfway into the set. To find the median, the data should be arranged in order from least to greatest. If there is an even number of items in the data set, then the median is found by taking the mean (average) of the two middlemost numbers."

Ministat currently returns the second of the two middle numbers if there is an even number of entries. This patch takes the middle two and returns their average.

*** usr.bin/ministat/ministat.c.orig	2015-07-14 23:49:11.246171000 -0700
--- usr.bin/ministat/ministat.c	2015-07-15 00:16:20.895494000 -0700
***************
*** 193,199 ****
  Median(struct dataset *ds)
  {
  
! 	return (ds->points[ds->n / 2]);
  }
  
  static double
--- 193,200 ----
  Median(struct dataset *ds)
  {
  
! 	if(!(ds->n % 2)) return ((ds->points[ds->n / 2]) + (ds->points[(ds->n / 2)-1]))/2;
! 	else return (ds->points[ds->n / 2]);
  }
  
  static double
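For anyone who wants to try the patched logic outside of ministat, here is a minimal standalone sketch. The names struct sorted_data and median_sorted are hypothetical and are not part of ministat.c; the sketch only assumes what the diff implies, namely that points is an array of doubles already sorted in ascending order and that n is the number of entries.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for ministat's struct dataset. Only the two
 * fields used by the patch (points and n) are modeled here.
 */
struct sorted_data {
	const double *points;	/* values, assumed sorted ascending */
	size_t n;		/* number of values, must be > 0 */
};

static double
median_sorted(const struct sorted_data *ds)
{
	size_t mid;

	assert(ds->n > 0);
	mid = ds->n / 2;
	if (ds->n % 2 == 0) {
		/* Even count: average the two middle values. */
		return ((ds->points[mid - 1] + ds->points[mid]) / 2.0);
	}
	/* Odd count: there is a single middle value. */
	return (ds->points[mid]);
}
```

With the sorted input 1 2 3 4 this returns 2.5, whereas indexing points[n / 2] alone would return 3.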
I will take it.
Hi,

ministat(1) is actually doing the right thing. I ran a couple of tests in other languages and they return the same value that ministat returns.

As a simple example, here are two different ways in Python:

>>> import statistics
>>> items = [1,2,3,4,5,6,7]
>>> statistics.median(items)
4

I also wrote my own:

>>> def middle(L):
...     L = sorted(L)
...     n = len(L)
...     m = n - 1
...     return (L[n // 2] + L[m // 2]) / 2.0
...
>>> print(middle(items))
4.0

Ministat result:

    N           Min           Max        Median           Avg        Stddev
x   7             1             7             4             4     2.1602469
+   7             1             7             4             4     2.1602469

So I can't see where the problem is.

Best,
You're not testing for the problem case there. Here, let me demonstrate:

>>> import statistics
>>> items = [1,2,3,4]
>>> statistics.median(items)
2.5

[mreid@sol /usr/home/mreid]$ ministat
1
2
3
4
x <stdin>
[ASCII plot omitted: four data points (x), with the average (A) and median (M) marked]
    N           Min           Max        Median           Avg        Stddev
x   4             1             4             3           2.5     1.2909944

As you can see, ministat has a median of 3 whereas the real median is 2.5, as seen in the Python example. It's the case where there is no middle number (an even number of items in the dataset) that is broken.

Thanks!
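The reason the seven-item test could not expose the problem comes down to index arithmetic: for an odd count, (n - 1) / 2 and n / 2 are the same index, so picking the upper one is harmless. They only differ when the count is even. The sketch below illustrates this; struct middle_pair and middle_indices are hypothetical names made up for this example and do not exist in ministat.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of zero-based indices into a sorted array. */
struct middle_pair {
	size_t lo;	/* lower middle index, (n - 1) / 2 */
	size_t hi;	/* upper middle index, n / 2 */
};

static struct middle_pair
middle_indices(size_t n)
{
	struct middle_pair p;

	assert(n > 0);
	p.lo = (n - 1) / 2;	/* integer division rounds down */
	p.hi = n / 2;
	return (p);
}
```

For n = 7 both indices are 3, so there is one middle element. For n = 4 they are 1 and 2, and returning only the element at index 2 yields 3 for the input 1 2 3 4 instead of the midpoint 2.5.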
Thanks for the feedback. Yes, you are right about this case!
A commit references this bug:

Author: araujo
Date: Tue Nov 24 02:30:59 UTC 2015
New revision: 291231
URL: https://svnweb.freebsd.org/changeset/base/291231

Log:
  Compute the median of the data set as the midpoint between the two middle
  values when the data set has an even number of elements.

  PR:		201582
  Submitted by:	Marcus Reid <marcus@blazingdot.com>
  Reviewed by:	imp
  Approved by:	bapt (mentor)

Changes:
  head/usr.bin/ministat/ministat.c
Committed, thanks!