I am enclosing a uni-diff file against strptime.c that adds support for the %n, %t, %U, and %W specifiers. The diff file also contains two simple test cases that I wrote to test various aspects of the new (and old) strptime code. This directory contains a modified copy of strptime.c (based on version 1.35 of the FreeBSD strptime.c). I wrote these changes based on my own knowledge and research and I hereby donate them to the FreeBSD community. I claim no copyright or other privileges to these changes. If you find them useful, that's great. If you decide you don't need or want them, well, that's your decision and I'll understand. Write me if you have any questions. I have added support for the following POSIX-2001 features: 1. The skip whitespace directives "%n" and "%t". The POSIX-2001 standard says that the %n and %t directives skip any white space. 2. The week-of-the-year directives "%U" and "%W". The existing FreeBSD strptime code recognizes both directives and validates that the week number lies in the permitted range, but then simply discards the value. The POSIX-2001 standard is vague on how these directives should be handled. Give the limited knowledge that this implementation of strptime has about the nature of the string it is parsing (in particular, it has no knowledge of whether or not any given field in struct tm has been defined), I thought it wise to depend only upon the absolute minimum amount of information. Just as the "%p" directive requires that the string contain an preceding hour value, my implementation of "%U" and "%W" requires that the string contains a preceding year value. It then calculates the date of the first Sunday (for %U) or first Monday (for %W), and store the appropriate values in struct tm. If the strptime.c program was smarter about which fields have previously appeared, one could imagine more complicated definitions for the handling of these directives. I did not feel that these directives were worthy of adding a great deal more complexity to strptime.c. 3. The algorithm to compute the first weekday of a year comes from Wikipedia. The URL is in the source code. I simplified the formula because I only need to calculate the weekday that January 1 falls on. 4. I have enclosed two self-tests. The "t_first_wday.c" test performs tests on a range of years that includes both leap years and non-leap years. Essentially, it confirms that the formula to compute the first weekday of a year is correct. The "t_strptime.c" test performs basic tests on strptime itself. It ensures that the conversion specifiers are implemented. It performs a test of selected capabilities of strptime; it is not meant to be an exhaustive test. FreeBSD Documentation Update The existing man page for strptime contains the following text: The %U and %W format specifiers accept any value within the range 00 to 53 without validating against other values supplied (like month or day of the year, for example). Please replace that text with the following text: The %U and %W format specifiers accept a two-digit decimal value in the range 0 to 53. A leading zero is permitted but not required. The tm_year member of 'struct tm' must be set, either by using one of the specifiers that sets the year, or by initializing the tm_year member before calling strptime. Week 1 refers to the first Sunday (for %U) or first Monday (for %W) of the year. Week 0 is equivalent to specifying week 1, and specifying week 53 may result in rolling over to a date in the following year, in which case the tm_year member is incremented. The tm_yday, tm_mon, tm_mday, and tm_wday members are set to the values for the Sunday (for %U) or Monday for %W) of the specified week of the given year. Fix: See patch file. Patch attached with submission follows: How-To-Repeat: Run the enclosed test case on an unmodified copy of strptime. The %U and %W specifiers are recognized but do not assign any of the elements of the tm structure.
Created attachment 144894 [details] Updated libc patch Thank you for contributing! The file has, of course, changed a lot. Still the support for %W and %U is very useful. We already have an isleap() macro in a header nearby that can be reused, for the rest I only did some style(9) fixes. I am not very good with the documentation so my change there is pretty minimal. I haven't tested it yet but I will. Thanks!
(In reply to Pedro F. Giffuni from comment #1) ... > I haven't tested it yet but I will. > Running test from: http://www.scs.stanford.edu/histar/src/pkg/uclibc/test/time/tst-strptime.c Results before the patch: ... strptime ("2001 20 Mon", "%Y %U %a", ...) should be: wday = 1, yday = 140, mon = 4, mday = 21 is: wday = 1, yday = 0, mon = 0, mday = 0 yearday for `2001 20 Mon' incorrect: 0 instead of 140 month for `2001 20 Mon' incorrect: 0 instead of 4 monthday for `2001 20 Mon' incorrect: 0 instead of 21 strptime ("2001 21 Mon", "%Y %W %a", ...) should be: wday = 1, yday = 140, mon = 4, mday = 21 is: wday = 1, yday = 0, mon = 0, mday = 0 yearday for `2001 21 Mon' incorrect: 0 instead of 140 month for `2001 21 Mon' incorrect: 0 instead of 4 monthday for `2001 21 Mon' incorrect: 0 instead of 21 Results after the patch: ... strptime ("2001 20 Mon", "%Y %U %a", ...) should be: wday = 1, yday = 140, mon = 4, mday = 21 is: wday = 1, yday = 139, mon = 4, mday = 20 yearday for `2001 20 Mon' incorrect: 139 instead of 140 monthday for `2001 20 Mon' incorrect: 20 instead of 21 strptime ("2001 21 Mon", "%Y %W %a", ...) should be: wday = 1, yday = 140, mon = 4, mday = 21 is: wday = 1, yday = 140, mon = 4, mday = 21 ___ So there may be a technical mismatch (GNU libc manages to reinterpret standards in amusing ways) for the %U case but the result is clearly a gain.
Created attachment 145108 [details] Updated patch with fixes for the %U case I am include some fixes from ache@. With this we pass %W and %U tests. We still have long standing issues with wday and yday on other cases from the tests though.
Created attachment 145481 [details] Updated patch Small update to use tz more constants instead of magic numbers.
Created attachment 145482 [details] Updated patch Ooops.. drop unrelated change.
I haven't had much time to work on this lately so return it to the pool.
Created attachment 147573 [details] Patch completed David Carlier (david.carlier at hardenedbsd) completed the original patch to handle tm_wday/tm_yday This should obsolete the previous patches except for the testcases.
The patch passes my tests.
A commit references this bug: Author: pfg Date: Thu Sep 25 18:52:18 UTC 2014 New revision: 272122 URL: http://svnweb.freebsd.org/changeset/base/272122 Log: Add strptime(3) support for %U and %W Add support for the missing POSIX-2001 %U and %W features: the existing FreeBSD strptime code recognizes both directives and validates that the week number lies in the permitted range, but then simply discards the value. Initial support for the feature was written by Paul Green with important fixes by Andrey Chernov. Additional support for handling tm_wday/tm_yday was written by David Carlier. PR: 137307 MFC after: 1 month Changes: head/lib/libc/stdtime/strptime.c
A commit references this bug: Author: pfg Date: Thu Sep 25 23:04:38 UTC 2014 New revision: 272146 URL: http://svnweb.freebsd.org/changeset/base/272146 Log: Revert r272122 The patch still needs to be more robust and it broke the build on MIPS so revert it for now while all the issues are fixed. Reported by: ache, davide PR: 137307 Changes: head/lib/libc/stdtime/strptime.c
Created attachment 147680 [details] WIP patch I updated the patch with some fixes. ____ While here, bring some notes from ache@ : The only place looking suspicious is that: case 'U': @@ -313,6 +372,8 @@ * information present in the tm structure at this * point to calculate a real value, so just check the * range for now. + * We expect that the year has already been + * parsed. */ if (!isdigit_l((unsigned char)*buf, locale)) return (NULL); @@ -327,6 +388,45 @@ We cant assume that year is parsed, and tm_year used below will contain garbage if not. I suggest to move whole %U/W additional parameters calculations code starting from + /* Week numbers are l-origin. So that we can always + * return the date of a Sunday (or Monday), treat week + * 0 as week 1. + */ to the end of the parsing loop, remembering temporary 'i' variable under more global scope 'week_number'. It will allow to handle both normal %Y %U and even %U %Y combinations.
+ if (flags & (FLAG_YEAR | FLAG_MONTH)) { + if (!(flags & FLAG_YDAY) && (flags & FLAG_MDAY)) + tm->tm_yday = start_of_month[isleap(tm->tm_year + + TM_YEAR_BASE)][tm->tm_mon] + (tm->tm_mday - 1); + if (!(flags & FLAG_WDAY)) { + i = 0; + wday_offset = first_wday_of(tm->tm_year); + while (i++ <= tm->tm_yday) + if (wday_offset++ >= 6) + wday_offset = 0; + + tm->tm_wday = wday_offset; + } + } I just notice a lot of use without being initialize places. This code should be rewritten: if (!(flags & FLAG_YDAY) && (flags & (FLAG_YEAR | FLAG_MONTH | FLAG_MDAY)) == (FLAG_YEAR | FLAG_MONTH | FLAG_MDAY)) { tm->tm_yday = start_of_month[isleap(tm->tm_year + TM_YEAR_BASE)][tm->tm_mon] + (tm->tm_mday - 1); flags |= FLAG_YDAY; } if (!(flags & FLAG_WDAY) && (flags & (FLAG_YEAR | FLAG_YDAY)) == (FLAG_YEAR | FLAG_YDAY)) { i = 0; wday_offset = first_wday_of(tm->tm_year); while (i++ <= tm->tm_yday) { if (wday_offset++ >= 6) wday_offset = 0; } tm->tm_wday = wday_offset; flags |= FLAG_WDAY; }
Created attachment 147740 [details] Much cleaner version (proposed by ache@) Commit Candidate
A commit references this bug: Author: pfg Date: Sun Sep 28 21:20:21 UTC 2014 New revision: 272273 URL: https://svnweb.freebsd.org/changeset/base/272273 Log: Add strptime(3) support for %U and %W (take 2) Add support for the missing POSIX-2001 %U and %W features: the existing FreeBSD strptime code recognizes both directives and validates that the week number lies in the permitted range, but then simply discards the value. Initial support for the feature was written by Paul Green. David Carlier added the initial handling of tm_wday/tm_yday. Major credit goes to Andrey Chernov for detecting much of the brokenness, and rewriting/cleaning most of the code, making it much more robust. Tested independently with the strptime test from the GNU C library. PR: 137307 MFC after: 1 month Relnotes: yes Changes: head/lib/libc/stdtime/strptime.c
comment from antoine@ : Hi, It seems this change breaks some ports, so please no MFC until this is fixed: http://gohan2.ysv.freebsd.org/data/head-amd64-default-baseline/p369565_s272290/logs/errors/latrine-1.0.0_1.log http://gohan2.ysv.freebsd.org/data/head-amd64-default-baseline/p369565_s272290/logs/errors/mongrel2-1.7.5_2.log http://gohan2.ysv.freebsd.org/data/head-amd64-default-baseline/p369565_s272290/logs/errors/deforaos-mailer-0.1.6_1.log Cheers, Antoine (portmgr hat on)
A commit references this bug: Author: pfg Date: Wed Oct 1 22:18:07 UTC 2014 New revision: 272387 URL: https://svnweb.freebsd.org/changeset/base/272387 Log: strptime: fix bug introduced in r272273. Reported by: portmgr (antoine) Fix by: Andrey Chernov, David Carlier PR: 137307 (follow up) Changes: head/lib/libc/stdtime/strptime.c
A commit references this bug: Author: pfg Date: Wed Oct 8 16:29:47 UTC 2014 New revision: 272758 URL: https://svnweb.freebsd.org/changeset/base/272758 Log: MFC r272273, r272387, r272443, r272533 : Add strptime(3) support for %U and %W Add support for the missing POSIX-2001 %U and %W features: the existing FreeBSD strptime code recognizes both directives and validates that the week number lies in the permitted range, but then simply discards the value. Initial support for the feature was written by Paul Green. David Carlier added the initial handling of tm_wday/tm_yday. Major credit goes to Andrey Chernov for detecting much of the brokenness and rewriting/cleaning most of the code, making it much more robust. Tested independently with the strptime test from the GNU C library. PR: 137307 Relnotes: yes MFC r272441 : strptime: %s format fix. Almost never needed in real life because %s is tends to be only one format spec. 1) Return code of gmtime_r() is checked. 2) All flags are set. Submitted by: ache Changes: _U stable/10/ stable/10/lib/libc/stdtime/strptime.3 stable/10/lib/libc/stdtime/strptime.c
A commit references this bug: Author: pfg Date: Wed Oct 8 16:32:02 UTC 2014 New revision: 272759 URL: https://svnweb.freebsd.org/changeset/base/272759 Log: MFC r272273, r272387, r272443, r272533 : Add strptime(3) support for %U and %W Add support for the missing POSIX-2001 %U and %W features: the existing FreeBSD strptime code recognizes both directives and validates that the week number lies in the permitted range, but then simply discards the value. Initial support for the feature was written by Paul Green. David Carlier added the initial handling of tm_wday/tm_yday. Major credit goes to Andrey Chernov for detecting much of the brokenness and rewriting/cleaning most of the code, making it much more robust. Tested independently with the strptime test from the GNU C library. PR: 137307 Relnotes: yes MFC r272441 : strptime: %s format fix. Almost never needed in real life because %s is tends to be only one format spec. 1) Return code of gmtime_r() is checked. 2) All flags are set. Submitted by: ache Changes: _U stable/9/lib/libc/ _U stable/9/lib/libc/stdtime/ stable/9/lib/libc/stdtime/strptime.3 stable/9/lib/libc/stdtime/strptime.c
Committed and MFC'd up to stable/10 and stable/9.