Bug 52972

Summary: /bin/sh arithmetic not POSIX compliant
Product: Base System Reporter: Jens Schweikhardt <schweikh>
Component: standardsAssignee: freebsd-standards (Nobody) <standards>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: Unspecified   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
ash.patch.2
none
ash.patch none

Description Jens Schweikhardt 2003-06-05 19:10:09 UTC
	/bin/sh implements only a subset of the operators in $(( ... )) arithmetic.
	It also does not understand variable names in arithmetic expressions.

	This missing feature makes it impossible to run the OpenGroup's POSIX
	validation test suite because the configuration process for the test suite
	expects a POSIX system shell and makes heavy use of $((var += number)).
	[I can't just edit the scripts in question to use zsh or ksh93 because
	the configuration process involves executables calling system(3), make(1) etc
	which use /bin/sh hardcoded. Replacing /bin/sh is not an option.]

How-To-Repeat: $ /bin/sh
$ a=1
$ echo $((a + 1))                    # should echo 2
arith: syntax error: "a + 1"

$ echo $((a += 1))                   # should echo 2 and increment a
arith: syntax error: "a += 1"

IEEE Std 1003.2-2001 requires the other 'op=' assignment operators as well.

The zsh and ksh93 get this right:
$ a=1
$ echo $((a + 1))
2
$ echo $((a += 1))
2
$ echo $a
2
Comment 1 Jens Schweikhardt 2003-06-22 21:43:27 UTC
Wartan,

[please always cc to <bug-followup@FreeBSD.org> with the subject left as
is (like in this mail), so GNATS can add your comments to the audit
trail. Thanks!]

On Mon, Jun 23, 2003 at 12:21:58AM +0400, Wartan Hachaturow wrote:
# Hello.
# 
# I think I may try to fix this bug, but for now I've failed to
# find a precise line in SUS that requires this ( $((a+1)) vs $(($a+1)) )
# kind of arithmetic evaluation.
# I understand that in order to have "+=" operators, for example,
# we've got to have some way to use a variable itself, not substituted
# by its value, but still..
# Perhaps someone may point me to the exact part of SUS that describes 
# this particular thing?

It's in "Token Recognition"

 [...]

  If the current character is an unquoted '$' or '`' , the shell shall
  identify the start of any candidates for parameter expansion (
  Parameter Expansion ), command substitution ( Command Substitution ),
  or arithmetic expansion ( Arithmetic Expansion ) from their
  introductory unquoted character sequences: '$' or "${" , "$(" or '`' ,
  and "$((" , respectively. The shell shall read sufficient input to
  determine the end of the unit to be expanded (as explained in the
  cited sections). While processing the characters, if instances of
  expansions or quoting are found nested within the substitution, the
  shell shall recursively process them in the manner specified for the
  construct that is found. The characters found from the beginning of
  the substitution to its end, allowing for any recursion necessary to
  recognize embedded constructs, shall be included unmodified in the
  result token, including any embedded or enclosing substitution
  operators or quotes. The token shall not be delimited by the end of
  the substitution.


The recursive processing requires that $(($a+1)) needs to undergo
parameter expansion within $(()).

Regards,

	Jens
-- 
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
Comment 2 Wartan Hachaturow 2003-06-22 22:41:42 UTC
On Sun, Jun 22, 2003 at 10:43:27PM +0200, Jens Schweikhardt wrote:
> [please always cc to <bug-followup@FreeBSD.org> with the subject left as
> is (like in this mail), so GNATS can add your comments to the audit
> trail. Thanks!]

Sure, I just thought my initial letter wasn't valuable enough to
be included in the trail :)

> The recursive processing requires that $(($a+1)) needs to undergo
> parameter expansion within $(()).

Right, but this construction works in /bin/sh:

wart@mojo:~$ /bin/sh
$ a=1
$ echo $(($a+1))
2

You've said the problem was with the variable without leading $, like
this:
$ a=1
$ echo $((a+1)) 
arith: syntax error: "a+1"

And, as far as my English allows me to judge, the quoted part of SUS
says that "a" in this construct should be left in output as is (since it
doesn't have leading $, ${, $(, etc.), shouldn't it?

-- 
Regards, Wartan.
"Computers are not intelligent. They only think they are."
Comment 3 Dag-Erling Smørgrav 2003-06-23 07:32:55 UTC
Wartan Hachaturow <wart@tepkom.ru> writes:
> The following reply was made to PR standards/52972; it has been noted by =
GNATS.
> On Sun, Jun 22, 2003 at 10:43:27PM +0200, Jens Schweikhardt wrote:
> > The recursive processing requires that $(($a+1)) needs to undergo
> > parameter expansion within $(()).
>=20=20
>  Right, but this construction works in /bin/sh:
>=20=20
>  wart@mojo:~$ /bin/sh
>  $ a=3D1
>  $ echo $(($a+1))
>  2

Yes.  It expands to $((1+1)) which evaluates to 2.

>  You've said the problem was with the variable without leading $, like
>  this:
>  $ a=3D1
>  $ echo $((a+1))=20
>  arith: syntax error: "a+1"

This *should* work, but doesn't.

>  And, as far as my English allows me to judge, the quoted part of SUS
>  says that "a" in this construct should be left in output as is (since it
>  doesn't have leading $, ${, $(, etc.), shouldn't it?

Yes, it should be left as-is so the part of the code that evaluates
arithmetic expressions knows what variable is involved.  For instance,
"$(($a+=3D1)) would expand to "$((1+=3D1))" before evaluation, which makes
no sense, while "$((a+=3D1))" clearly says to increase a with 1.

DES
--=20
Dag-Erling Sm=F8rgrav - des@des.no
Comment 4 Wartan Hachaturow 2003-06-23 11:06:19 UTC
On Mon, Jun 23, 2003 at 08:32:55AM +0200, Dag-Erling Sm?rgrav wrote:

> Yes, it should be left as-is so the part of the code that evaluates
> arithmetic expressions knows what variable is involved.  For instance,
> "$(($a+=1)) would expand to "$((1+=1))" before evaluation, which makes
> no sense, while "$((a+=1))" clearly says to increase a with 1.

Yes, the latter was the thing that I couldn't understand -- how may one
implement +=, if variables are not required in arithmetic evaluation.
But now, I've noticed the line I've missed while reading SUS:

"Arithmetic Precision and Operations
Integer variables and constants, including the values of operands and
option-arguments, used by the standard utilities listed in this volume
of IEEE Std 1003.1-2001 shall be implemented" 

Question is closed :)

-- 
Regards, Wartan.
Comment 5 Wartan Hachaturow 2003-06-23 12:40:24 UTC
On Mon, Jun 23, 2003 at 08:32:55AM +0200, Dag-Erling Sm?rgrav wrote:

> This *should* work, but doesn't.

I've filed just the same bug against Debian package of ash. This is the
answer I got from Herbert Xu, also an upstream of what's called "dash",
Debian fork of ash shell (it has just the same behaviour as FreeBSD's 
ash):

----
> Package: ash
> Version: 0.4.17
> Severity: normal
> Tags: upstream

That's me.

> According to SUSv3, shell should implement integer variables in 
> arithmetic evaluation:

You're mistaken.  SUSv3 only requires constant arithmetic.  Variables
must be expanded before reaching arithmetic expansion.

Read C.2.6.4 for a detailed explanation.

> P.S. I'll try to make a patch fixing this bug in near future.

Please use bash/ksh instead.
----

.. and he has closed the bug. Now I am really lost -- where is the
truth?

-- 
Regards, Wartan.
Comment 6 Dag-Erling Smørgrav 2003-06-23 19:24:51 UTC
Wartan Hachaturow <wart@tepkom.ru> writes:
> I've filed just the same bug against Debian package of ash. This is the
> answer I got from Herbert Xu, also an upstream of what's called "dash",
> Debian fork of ash shell (it has just the same behaviour as FreeBSD's=20
> ash):
>=20
> > > According to SUSv3, shell should implement integer variables in=20
> > > arithmetic evaluation:
> >
> > You're mistaken.  SUSv3 only requires constant arithmetic.  Variables
> > must be expanded before reaching arithmetic expansion.

He's wrong.  The full text of section 2.6.4 is:

| Arithmetic expansion provides a mechanism for evaluating an arithmetic
| expression and substituting its value. The format for arithmetic
| expansion shall be as follows:
|=20
| $((expression))
|=20
| The expression shall be treated as if it were in double-quotes, except
| that a double-quote inside the expression is not treated
| specially. The shell shall expand all tokens in the expression for
| parameter expansion, command substitution, and quote removal.
|=20
| Next, the shell shall treat this as an arithmetic expression and
| substitute the value of the expression. The arithmetic expression
| shall be processed according to the rules given in Arithmetic
| Precision and Operations , with the following exceptions:
|=20
|  - Only signed long integer arithmetic is required.
|=20
|  - Only the decimal-constant, octal-constant, and hexadecimal-constant
|    constants specified in the ISO C standard, Section 6.4.4.1 are
|    required to be recognized as constants.
|=20
|  - The sizeof() operator and the prefix and postfix "++" and "--"
|    operators are not required.
|=20
|  - Selection, iteration, and jump statements are not supported.
|=20
| As an extension, the shell may recognize arithmetic expressions beyond
| those listed. The shell may use a signed integer type with a rank
| larger than the rank of signed long. The shell may use a real-floating
| type instead of signed long as long as it does not affect the results
| in cases where there is no overflow. If the expression is invalid, the
| expansion fails and the shell shall write a message to standard error
| indicating the failure.

Not only is there nothing there about only supporting constant
artithmetic, but the assignment operators are *not* listed as
exceptions to the "rules given in Arithmetic Precision and
Operations", which refers to section 1.7.2.1, which you will find at
the following URL:

http://www.opengroup.org/onlinepubs/007904975/utilities/xcu_chap01.html#tag=
_01_07_02_01

Therefore, assignment operators *are* required to work.

DES
--=20
Dag-Erling Sm=F8rgrav - des@des.no
Comment 7 Wartan Hachaturow 2003-06-24 10:46:24 UTC
On Mon, Jun 23, 2003 at 11:30:14AM -0700, Dag-Erling Sm?rgrav wrote:

>  He's wrong.  The full text of section 2.6.4 is:

Now, he has clarified himself, and pointed me to rationale.
This is the relevant part of rationale for 2.6.4:

"The syntax and semantics for arithmetic were changed for the ISO/IEC
9945-2:1993 standard. The language is essentially a pure arithmetic
evaluator of constants and operators (excluding assignment) and
represents a simple subset of the previous arithmetic language (which
was derived from the KornShell "(())" construct)."
..
"The portion of the ISO C standard arithmetic operations selected
corresponds to the operations historically supported in the KornShell."

In other words, rationale says that only constants and operators are the
language of arithmetic evaluation, and assignment shouldn't be
supported.

Jens, perhaps, you may clarify further, what particular test you've run,
post snippets of the code in question? If the tests are valid and
verified OpenGroup ones, we may have to ask them for clarification,
or issue a defect report..

-- 
Regards, Wartan.
Comment 8 Jens Schweikhardt 2003-06-24 18:16:00 UTC
On Tue, Jun 24, 2003 at 01:46:24PM +0400, Wartan Hachaturow wrote:
# On Mon, Jun 23, 2003 at 11:30:14AM -0700, Dag-Erling Sm?rgrav wrote:
# 
# >  He's wrong.  The full text of section 2.6.4 is:
# 
# Now, he has clarified himself, and pointed me to rationale.
# This is the relevant part of rationale for 2.6.4:
# 
# "The syntax and semantics for arithmetic were changed for the ISO/IEC
# 9945-2:1993 standard. The language is essentially a pure arithmetic
# evaluator of constants and operators (excluding assignment) and
# represents a simple subset of the previous arithmetic language (which
# was derived from the KornShell "(())" construct)."
# ..
# "The portion of the ISO C standard arithmetic operations selected
# corresponds to the operations historically supported in the KornShell."
# 
# In other words, rationale says that only constants and operators are the
# language of arithmetic evaluation, and assignment shouldn't be
# supported.

The rationale is not a normative part of POSIX 2001. If it disagrees
with anything in POSIX 2001, then POSIX 2001 is what counts.

# Jens, perhaps, you may clarify further, what particular test you've run,
# post snippets of the code in question? If the tests are valid and
# verified OpenGroup ones, we may have to ask them for clarification,
# or issue a defect report..

I'm using the VSC-lite test suite with the TET3.6 Test Environment as
made available for Austin Group members by the OpenGroup. The TET
contains scripts to generate the automated test suite. It is the
generation step that already fails (so I can't even *run* the actual
test suite). The script in question from tet3.6-lite.src.tar.gz is
src/posix_sh/api/tcm.sh:

schweikh@hal9000:/tmp $ fgrep '$((' src/posix_sh/api/tcm.sh
                        : $((tet_l1_iccount += 1))
                while test $((tet_l1_tpnum += 1)) -le $tet_tpcount
                        : $((tet_l1_tpcount += 1))
        while test $((tet_l3_sig += 1)) -lt $TET_NSIG
                        : $((tet_l31_testnum += $#))
                : $((tet_testnum += tet_l33_tpnum - 1))
                        if test $tet_l36_icstart -lt $((tet_l34_last_icend + 1))
                                tet_l36_icstart=$((tet_l34_last_icend + 1))
                                                : $((tet_l36_icstart += 1))
                while test $((tet_l36_icstart += 1)) -le $tet_l36_icend
                        while test $((tet_l36_icend -= 1)) -gt $tet_l36_icstart
        tet_l37_icnum=$((tet_l37_icstart - 1))
        while test $((tet_l37_icnum += 1)) -le $tet_l37_icend


Regards,

	Jens
-- 
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
Comment 9 Wartan Hachaturow 2003-07-03 15:27:53 UTC
Hello.

I've hacked out a simple patch for 5.1-RELEASE's /bin/sh, implementing 
variables support in arithmetic evaluation, along with various assignment
operators.

Since the patch is pretty big (11k), I won't include it right in the letter,
it may be grabbed from
http://velvet.myxomop.com/~wart/ash.patch


Below are some comments to the patch:
First of all, arithmetics variables are plain shell variables, without any 
integer/string variable difference. If the variable doesn't exist, it is 
initialized to zero, as SUS requires. When integer value is needed, the
result
of lookupvar() is strtol'ed. Austin group's proposed change to the
standart 
says that the behaviour in case of unconvertable string is unspecified.
I try to avoid the case of "1ab" being silently converted to integer 1,
and check that. There's one miss, though -- just like bash, we skip 
initial whitespaces in the variable (inside strtol):

$ a="   1"
$ echo $((a+1))
2

I don't consider this being clever, but I didn't find the way to avoid
it
(after all, results are unspecified :).
Integer to string coversion is performed via snprintf.
SUS now requires us to have at least signed long arithmetics. I've tried
to make the code type-independent, and introduced an arith_t typedef
(is it ok with style(9)?) with the macros for strtoarith_t and
atoarith_t.
11 assignment operators are implemented, following SUS requirement.
I haven't found a test suite for shell arithmetics, but simple things 
like

$ i=1; j=2; k=3
$ echo $((i+=j+=k))
6
$ echo $i, $j, $k
6, 5, 3

work so far. If anyone would submit a test suite, I'll be glad to test
it.
Only decimal-base arithmetics is implemented. Thus, x=010 is being
treated
as decimal 10. I've browsed SUS, trying to find the requirement for
other
bases, but haven't found it. Should we support it?

Comments, fixes, bugs, blames, etc. are welcome.

-- 
Regards, Wartan.
"Computers are not intelligent. They only think they are."
Comment 10 Jens Schweikhardt 2003-07-03 20:40:06 UTC
Wartan,

On Thu, Jul 03, 2003 at 06:27:53PM +0400, Wartan Hachaturow wrote:
# Hello.
# 
# I've hacked out a simple patch for 5.1-RELEASE's /bin/sh, implementing 
# variables support in arithmetic evaluation, along with various assignment
# operators.
# 
# Since the patch is pretty big (11k), I won't include it right in the letter,
# it may be grabbed from
# http://velvet.myxomop.com/~wart/ash.patch

Thanks for taking up the grunt work and implementing this. This is more
than I hoped for!

I've looked at your patch (not yet applied and tested) and have a few
remarks:

+ It appears this is a patch against RELENG_4; is this true? If yes, a
  patch against HEAD is needed.

+ There are many lines with whitespace at end-of-line which you should
  remove. Can you instruct your editor to make these visible?

+ Reversing comparisons against constant values like in
  if (NULL == lookupvar($1))
  look like a style(9) violation.

PS: 11k is not really a big patch. Nobody would mind if you send your
    improved patch to the GNATS db. URLs with patches tend to become
    stale very soon.

Regards,

	Jens
-- 
Jens Schweikhardt http://www.schweikhardt.net/
SIGSIG -- signature too long (core dumped)
Comment 11 Wartan Hachaturow 2003-07-03 21:35:55 UTC
On Thu, Jul 03, 2003 at 09:40:06PM +0200, Jens Schweikhardt wrote:

> Thanks for taking up the grunt work and implementing this. This is more
> than I hoped for!

Someone should have made it :)

> + It appears this is a patch against RELENG_4; is this true? If yes, a
>   patch against HEAD is needed.

Well, it was against RELENG_5_1. But the attached one is against HEAD
(though it have some remniscents like rcs ids from 5_1).
Looks like nothing has changed in ash since 5_1, initial patch applied
flawlessly.

> + There are many lines with whitespace at end-of-line which you should
>   remove. Can you instruct your editor to make these visible?

Ok, looks like I've removed them now. ":set list" was of a great help :)

> + Reversing comparisons against constant values like in
>   if (NULL == lookupvar($1))
>   look like a style(9) violation.

I heard somewhere that it's treated as a good habit, and was in process
of getting myself used to that kind of comparison (like you may have
noticed, I mixed both ways :). But if it's against style(9), I'll
drop it. (My internal nature was against it, anyway :).

> PS: 11k is not really a big patch. Nobody would mind if you send your
>     improved patch to the GNATS db. URLs with patches tend to become
>     stale very soon.

Ok, improved one is attached.

-- 
Regards, Wartan.
"Computers are not intelligent. They only think they are."
Comment 12 Wartan Hachaturow 2003-08-22 23:27:21 UTC
This version of patch (hopefully :) fixes the issues tjr pointed to,
and adds hexal and octal constants which are required by SUSv3.
I've even successfully built 5.1's world with it.

-- 
Regards, Wartan.
"Computers are not intelligent. They only think they are."
Comment 13 Jens Schweikhardt freebsd_committer freebsd_triage 2003-08-30 13:32:04 UTC
State Changed
From-To: open->patched

Improved patch applied to -current. MFC in 6 weeks or whenever 
the RELENG_4 code freeze is lifted. Thanks a lot for the work, Wartan!
Comment 14 Jens Schweikhardt freebsd_committer freebsd_triage 2005-07-18 13:31:36 UTC
State Changed
From-To: patched->closed

I have no intention to backport this to RELENG_4.