Bug 137307 - [libc] [patch] Enhance strptime(3) to support %U and %W
Summary: [libc] [patch] Enhance strptime(3) to support %U and %W
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: Unspecified
Hardware: Any Any
: Normal Affects Some People
Assignee: Pedro F. Giffuni
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2009-07-31 15:20 UTC by Paul Green
Modified: 2014-10-08 16:34 UTC (History)
1 user (show)

See Also:


Attachments
file.diff (15.50 KB, patch)
2009-07-31 15:20 UTC, Paul Green
no flags Details | Diff
t_strptime.diff.txt (2.92 KB, patch)
2009-08-25 17:33 UTC, Paul Green
no flags Details | Diff
Updated libc patch (2.92 KB, patch)
2014-07-22 21:39 UTC, Pedro F. Giffuni
no flags Details | Diff
Updated patch with fixes for the %U case (4.08 KB, patch)
2014-07-29 00:14 UTC, Pedro F. Giffuni
no flags Details | Diff
Updated patch (5.07 KB, patch)
2014-08-07 19:53 UTC, Pedro F. Giffuni
no flags Details | Diff
Updated patch (4.47 KB, patch)
2014-08-07 19:56 UTC, Pedro F. Giffuni
no flags Details | Diff
Patch completed (5.22 KB, patch)
2014-09-22 20:30 UTC, Pedro F. Giffuni
no flags Details | Diff
WIP patch (5.24 KB, patch)
2014-09-25 23:16 UTC, Pedro F. Giffuni
no flags Details | Diff
Much cleaner version (proposed by ache@) (5.61 KB, patch)
2014-09-28 03:50 UTC, Pedro F. Giffuni
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Green 2009-07-31 15:20:01 UTC
I am enclosing a uni-diff file against strptime.c that adds support for the %n, %t, %U, and %W specifiers.  The diff file also contains two simple test cases that I wrote to test various aspects of the new (and old) strptime code.

This directory contains a modified copy of strptime.c (based on
version 1.35 of the FreeBSD strptime.c).  I wrote these changes
based on my own knowledge and research and I hereby donate them
to the FreeBSD community.  I claim no copyright or other
privileges to these changes.  If you find them useful, that's
great.  If you decide you don't need or want them, well, that's
your decision and I'll understand.

Write me if you have any questions.


I have added support for the following POSIX-2001 features:

1. The skip whitespace directives "%n" and "%t".  The POSIX-2001
   standard says that the %n and %t directives skip any white
   space.

2. The week-of-the-year directives "%U" and "%W".  The existing
   FreeBSD strptime code recognizes both directives and
   validates that the week number lies in the permitted range,
   but then simply discards the value.

   The POSIX-2001 standard is vague on how these directives
   should be handled.  Give the limited knowledge that this
   implementation of strptime has about the nature of the string
   it is parsing (in particular, it has no knowledge of whether
   or not any given field in struct tm has been defined), I
   thought it wise to depend only upon the absolute minimum
   amount of information.  Just as the "%p" directive requires
   that the string contain an preceding hour value, my
   implementation of "%U" and "%W" requires that the string
   contains a preceding year value.  It then calculates the date
   of the first Sunday (for %U) or first Monday (for %W), and
   store the appropriate values in struct tm.

   If the strptime.c program was smarter about which fields have
   previously appeared, one could imagine more complicated
   definitions for the handling of these directives.  I did not
   feel that these directives were worthy of adding a great deal
   more complexity to strptime.c.

3. The algorithm to compute the first weekday of a year comes
   from Wikipedia.  The URL is in the source code.  I simplified
   the formula because I only need to calculate the weekday that
   January 1 falls on.

4. I have enclosed two self-tests.

   The "t_first_wday.c" test performs tests on a range of years
   that includes both leap years and non-leap years.
   Essentially, it confirms that the formula to compute the
   first weekday of a year is correct.

   The "t_strptime.c" test performs basic tests on strptime
   itself.  It ensures that the conversion specifiers are
   implemented.  It performs a test of selected capabilities of
   strptime; it is not meant to be an exhaustive test.


FreeBSD Documentation Update

The existing man page for strptime contains the following text:

     The %U and %W format specifiers accept any value within the
     range 00 to 53 without validating against other values
     supplied (like month or day of the year, for example).

Please replace that text with the following text:

     The %U and %W format specifiers accept a two-digit decimal
     value in the range 0 to 53.  A leading zero is permitted
     but not required.  The tm_year member of 'struct tm' must
     be set, either by using one of the specifiers that sets the
     year, or by initializing the tm_year member before calling
     strptime.  Week 1 refers to the first Sunday (for %U) or
     first Monday (for %W) of the year.  Week 0 is equivalent to
     specifying week 1, and specifying week 53 may result in
     rolling over to a date in the following year, in which case
     the tm_year member is incremented.  The tm_yday, tm_mon,
     tm_mday, and tm_wday members are set to the values for the
     Sunday (for %U) or Monday for %W) of the specified week of
     the given year.

Fix: See patch file.

Patch attached with submission follows:
How-To-Repeat: Run the enclosed test case on an unmodified copy of strptime. The %U and %W specifiers are recognized but do not assign any of the elements of the tm structure.
Comment 1 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-07-22 21:39:36 UTC
Created attachment 144894 [details]
Updated libc patch

Thank you for contributing!

The file has, of course, changed a lot. Still the support for %W and %U is very useful.

We already have an isleap() macro in a header nearby that can be reused, for the rest I only did some style(9) fixes. I am not very good with the documentation so my change there is pretty minimal.

I haven't tested it yet but I will.

Thanks!
Comment 2 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-07-22 21:53:44 UTC
(In reply to Pedro F. Giffuni from comment #1)
...
> I haven't tested it yet but I will.
> 

Running test from:
http://www.scs.stanford.edu/histar/src/pkg/uclibc/test/time/tst-strptime.c

Results before the patch:
...
strptime ("2001 20 Mon", "%Y %U %a", ...)
        should be: wday = 1, yday = 140, mon =  4, mday = 21
               is: wday = 1, yday =   0, mon =  0, mday =  0
yearday for `2001 20 Mon' incorrect: 0 instead of 140
month for `2001 20 Mon' incorrect: 0 instead of 4
monthday for `2001 20 Mon' incorrect: 0 instead of 21
strptime ("2001 21 Mon", "%Y %W %a", ...)
        should be: wday = 1, yday = 140, mon =  4, mday = 21
               is: wday = 1, yday =   0, mon =  0, mday =  0
yearday for `2001 21 Mon' incorrect: 0 instead of 140
month for `2001 21 Mon' incorrect: 0 instead of 4
monthday for `2001 21 Mon' incorrect: 0 instead of 21


Results after the patch:
...
strptime ("2001 20 Mon", "%Y %U %a", ...)
        should be: wday = 1, yday = 140, mon =  4, mday = 21
               is: wday = 1, yday = 139, mon =  4, mday = 20
yearday for `2001 20 Mon' incorrect: 139 instead of 140
monthday for `2001 20 Mon' incorrect: 20 instead of 21
strptime ("2001 21 Mon", "%Y %W %a", ...)
        should be: wday = 1, yday = 140, mon =  4, mday = 21
               is: wday = 1, yday = 140, mon =  4, mday = 21
___

So there may be a technical mismatch (GNU libc manages to reinterpret standards in amusing ways) for the %U case but the result is clearly a gain.
Comment 3 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-07-29 00:14:20 UTC
Created attachment 145108 [details]
Updated patch with fixes for the %U case

I am include some fixes from ache@. With this we pass %W and %U tests.

We still have long standing issues with wday and yday on other cases from the tests though.
Comment 4 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-08-07 19:53:30 UTC
Created attachment 145481 [details]
Updated patch

Small update to use tz more constants instead of magic numbers.
Comment 5 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-08-07 19:56:00 UTC
Created attachment 145482 [details]
Updated patch

Ooops.. drop unrelated change.
Comment 6 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-09-21 15:32:44 UTC
I haven't had much time to work on this lately so return it to the pool.
Comment 7 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-09-22 20:30:39 UTC
Created attachment 147573 [details]
Patch completed

David Carlier (david.carlier at hardenedbsd) completed the original patch to handle tm_wday/tm_yday

This should obsolete the previous patches except for the testcases.
Comment 8 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-09-22 20:36:08 UTC
The patch passes my tests.
Comment 9 commit-hook freebsd_committer freebsd_triage 2014-09-25 18:53:14 UTC
A commit references this bug:

Author: pfg
Date: Thu Sep 25 18:52:18 UTC 2014
New revision: 272122
URL: http://svnweb.freebsd.org/changeset/base/272122

Log:
  Add strptime(3) support for %U and %W

  Add support for the missing POSIX-2001 %U and %W features: the
  existing  FreeBSD strptime code recognizes both directives and
  validates that the week number lies in the permitted range,
  but then simply discards the value.

  Initial support for the feature was written by Paul Green with
  important fixes by Andrey Chernov. Additional support for
  handling tm_wday/tm_yday was written by David Carlier.

  PR:		137307
  MFC after:	1 month

Changes:
  head/lib/libc/stdtime/strptime.c
Comment 10 commit-hook freebsd_committer freebsd_triage 2014-09-25 23:05:36 UTC
A commit references this bug:

Author: pfg
Date: Thu Sep 25 23:04:38 UTC 2014
New revision: 272146
URL: http://svnweb.freebsd.org/changeset/base/272146

Log:
  Revert r272122

  The patch still needs to be more robust and it broke the
  build on MIPS so revert it for now while all the issues
  are fixed.

  Reported by:	ache, davide
  PR:		137307

Changes:
  head/lib/libc/stdtime/strptime.c
Comment 11 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-09-25 23:16:29 UTC
Created attachment 147680 [details]
WIP patch

I updated the patch with some fixes.

____

While here, bring some notes from ache@ :

The only place looking suspicious is that:

 		case 'U':
@@ -313,6 +372,8 @@
 			 * information present in the tm structure at this
 			 * point to calculate a real value, so just check the
 			 * range for now.
+			 * We expect that the year has already been
+			 * parsed.
 			 */
 			if (!isdigit_l((unsigned char)*buf, locale))
 				return (NULL);
@@ -327,6 +388,45 @@

We cant assume that year is parsed, and tm_year used below will contain garbage 
if not. I suggest to move whole %U/W additional parameters calculations code starting 
from
+			/* Week numbers are l-origin. So that we can always
+			 * return the date of a Sunday (or Monday), treat week
+			 * 0 as week 1.
+			 */
to the end of the parsing loop, remembering temporary 'i' variable under more global scope 'week_number'.
It will allow to handle both normal %Y %U and even %U %Y combinations.
Comment 12 Andrey A. Chernov freebsd_committer freebsd_triage 2014-09-26 05:23:21 UTC
+	if (flags & (FLAG_YEAR | FLAG_MONTH)) {
+		if (!(flags & FLAG_YDAY) && (flags & FLAG_MDAY))
+			tm->tm_yday = start_of_month[isleap(tm->tm_year
+				+ TM_YEAR_BASE)][tm->tm_mon] + (tm->tm_mday - 1);
+		if (!(flags & FLAG_WDAY)) {
+			i = 0;
+			wday_offset = first_wday_of(tm->tm_year);
+			while (i++ <= tm->tm_yday)
+				if (wday_offset++ >= 6)
+					wday_offset = 0;
+
+			tm->tm_wday = wday_offset;
+		}
+	}


I just notice a lot of use without being initialize places. This code should be rewritten:

	if (!(flags & FLAG_YDAY) && 
	    (flags & (FLAG_YEAR | FLAG_MONTH | FLAG_MDAY)) == 
	    (FLAG_YEAR | FLAG_MONTH | FLAG_MDAY)) {
		tm->tm_yday = start_of_month[isleap(tm->tm_year
			+ TM_YEAR_BASE)][tm->tm_mon] + (tm->tm_mday - 1);
		flags |= FLAG_YDAY;
	}

	if (!(flags & FLAG_WDAY) && 
	    (flags & (FLAG_YEAR | FLAG_YDAY)) == (FLAG_YEAR | FLAG_YDAY)) {
		i = 0;
		wday_offset = first_wday_of(tm->tm_year);
		while (i++ <= tm->tm_yday) {
			if (wday_offset++ >= 6)
				wday_offset = 0;
		}
		tm->tm_wday = wday_offset;
		flags |= FLAG_WDAY;
	}
Comment 13 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-09-28 03:50:17 UTC
Created attachment 147740 [details]
Much cleaner version (proposed by ache@)

Commit Candidate
Comment 14 commit-hook freebsd_committer freebsd_triage 2014-09-28 21:21:16 UTC
A commit references this bug:

Author: pfg
Date: Sun Sep 28 21:20:21 UTC 2014
New revision: 272273
URL: https://svnweb.freebsd.org/changeset/base/272273

Log:
  Add strptime(3) support for %U and %W (take 2)

  Add support for the missing POSIX-2001 %U and %W features: the
  existing FreeBSD strptime code recognizes both directives and
  validates that the week number lies in the permitted range,
  but then simply discards the value.

  Initial support for the feature was written by Paul Green.
  David Carlier added the initial handling of tm_wday/tm_yday.
  Major credit goes to Andrey Chernov for detecting much of the
  brokenness, and rewriting/cleaning most of the code, making it
  much more robust.

  Tested independently with the strptime test from the GNU C
  library.

  PR:		137307
  MFC after:	1 month
  Relnotes:	yes

Changes:
  head/lib/libc/stdtime/strptime.c
Comment 16 commit-hook freebsd_committer freebsd_triage 2014-10-01 22:18:50 UTC
A commit references this bug:

Author: pfg
Date: Wed Oct  1 22:18:07 UTC 2014
New revision: 272387
URL: https://svnweb.freebsd.org/changeset/base/272387

Log:
  strptime: fix bug introduced in r272273.

  Reported by:	portmgr (antoine)
  Fix by:		Andrey Chernov, David Carlier
  PR:		137307 (follow up)

Changes:
  head/lib/libc/stdtime/strptime.c
Comment 17 commit-hook freebsd_committer freebsd_triage 2014-10-08 16:30:21 UTC
A commit references this bug:

Author: pfg
Date: Wed Oct  8 16:29:47 UTC 2014
New revision: 272758
URL: https://svnweb.freebsd.org/changeset/base/272758

Log:
  MFC	r272273, r272387, r272443, r272533 :

  Add strptime(3) support for %U and %W

  Add support for the missing POSIX-2001 %U and %W features: the
  existing FreeBSD strptime code recognizes both directives and
  validates that the week number lies in the permitted range,
  but then simply discards the value.

  Initial support for the feature was written by Paul Green.
  David Carlier added the initial handling of tm_wday/tm_yday.
  Major credit goes to Andrey Chernov for detecting much of the
  brokenness and rewriting/cleaning most of the code, making it
  much more robust.

  Tested independently with the strptime test from the GNU C
  library.

  PR:		137307
  Relnotes:	yes

  MFC r272441 :

  strptime: %s format fix.

  Almost never needed in real life because %s is tends to be
  only one format spec.
  1) Return code of gmtime_r() is checked.
  2) All flags are set.

  Submitted by:	ache

Changes:
_U  stable/10/
  stable/10/lib/libc/stdtime/strptime.3
  stable/10/lib/libc/stdtime/strptime.c
Comment 18 commit-hook freebsd_committer freebsd_triage 2014-10-08 16:32:23 UTC
A commit references this bug:

Author: pfg
Date: Wed Oct  8 16:32:02 UTC 2014
New revision: 272759
URL: https://svnweb.freebsd.org/changeset/base/272759

Log:
  MFC	r272273, r272387, r272443, r272533 :

  Add strptime(3) support for %U and %W

  Add support for the missing POSIX-2001 %U and %W features: the
  existing FreeBSD strptime code recognizes both directives and
  validates that the week number lies in the permitted range,
  but then simply discards the value.

  Initial support for the feature was written by Paul Green.
  David Carlier added the initial handling of tm_wday/tm_yday.
  Major credit goes to Andrey Chernov for detecting much of the
  brokenness and rewriting/cleaning most of the code, making it
  much more robust.

  Tested independently with the strptime test from the GNU C
  library.

  PR:		137307
  Relnotes:	yes

  MFC r272441 :

  strptime: %s format fix.

  Almost never needed in real life because %s is tends to be
  only one format spec.
  1) Return code of gmtime_r() is checked.
  2) All flags are set.

  Submitted by:	ache

Changes:
_U  stable/9/lib/libc/
_U  stable/9/lib/libc/stdtime/
  stable/9/lib/libc/stdtime/strptime.3
  stable/9/lib/libc/stdtime/strptime.c
Comment 19 Pedro F. Giffuni freebsd_committer freebsd_triage 2014-10-08 16:34:53 UTC
Committed and MFC'd up to stable/10 and stable/9.