Bug 217611

Summary:

ARM VFP/NEON regs not preserved in signal delivery

Product:

Base System

Reporter:

Andrew "RhodiumToad" Gierth <andrew>

Component:

arm

Assignee:

Michal Meloun <mmel>

Status:

Closed FIXED

Severity:

Affects Many People

CC:

emaste, marklmi26-fbsd, mmel, otacilio.neto, pstef, sylvain

Priority:

---

Version:

11.0-STABLE

Hardware:

arm

OS:

Any

Attachments:

Description	Flags
FP registers vs. signals	none
patch I'm testing (breaks ABI)	none

Description Andrew "RhodiumToad" Gierth 2017-03-07 13:21:24 UTC

Created attachment 180600
FP registers vs. signals

Signal delivery to a process does not save or restore the VFP/NEON registers.

I found this on stable/11, but it also exists on CURRENT. I tested on an RPI2 but the problem clearly exists on other ARM platforms.

The bug is obvious by inspection, but I attach an example test program. (If you try this on a much faster platform than cortex-a7 @900MHz, make sure you increase the loop count to ensure runtime exceeds 1 second.)

cc fptst.c -o fptst
./fptst
entering...
h0: actual=10000000.000000 expected=10000000.000000
h1: actual=10000001.000000 expected=10000001.000000
h2: actual=10000002.000000 expected=10000002.000000
h3: actual=10000003.000000 expected=10000003.000000
h4: actual=10000004.000000 expected=10000004.000000
h5: actual=10000005.000000 expected=10000005.000000
h6: actual=10000006.000000 expected=10000006.000000
h7: actual=10000007.000000 expected=10000007.000000

./fptst breakme
entering...
h0: actual=-18199474333245.414062 expected=10000000.000000
h1: actual=-18199474333245.414062 expected=10000001.000000
h2: actual=-18199476796269.414062 expected=10000002.000000
h3: actual=-18199476796271.312500 expected=10000003.000000
h4: actual=-18199479259296.312500 expected=10000004.000000
h5: actual=-18199479259295.312500 expected=10000005.000000
h6: actual=-18199481722320.312500 expected=10000006.000000
h7: actual=-18199481722319.312500 expected=10000007.000000

(actual results will vary)
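The attached fptst.c is not reproduced in this report. As a rough illustration of the kind of check involved, here is a minimal sketch. It is not the attachment: the function names, the 1 ms timer interval, and the summation loop are all assumptions made for illustration. A periodic SIGALRM handler deliberately does floating-point work while the interrupted code accumulates a sum in FP registers; if signal delivery preserves the FP state, the sum matches its closed form.

```c
#define _XOPEN_SOURCE 700   /* for sigaction() and setitimer() under strict -std modes */
#include <signal.h>
#include <string.h>
#include <sys/time.h>

/* Written by the handler so the compiler cannot discard its FP work. */
static volatile double handler_sink;

/* Hypothetical handler: uses FP registers on purpose. */
static void fp_clobbering_handler(int signo)
{
    (void)signo;
    double x = 1.0;
    for (int i = 1; i <= 64; i++)
        x = x * 1.000001 + (double)i;
    handler_sink = x;
}

/*
 * Sum 0..iters-1 in double precision while SIGALRM fires every millisecond.
 * Returns 0 if the sum matches the closed form, 1 if it does not, and -1 if
 * the handler or timer could not be installed. Keep iters <= 100000000 so
 * that every partial sum stays below 2^53 and is exactly representable.
 */
int run_fp_signal_check(long iters)
{
    struct sigaction sa, old_sa;
    struct itimerval tv, off;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = fp_clobbering_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    memset(&tv, 0, sizeof(tv));
    tv.it_interval.tv_usec = 1000;
    tv.it_value.tv_usec = 1000;
    if (setitimer(ITIMER_REAL, &tv, NULL) != 0) {
        sigaction(SIGALRM, &old_sa, NULL);
        return -1;
    }

    double sum = 0.0;
    for (long i = 0; i < iters; i++)
        sum += (double)i;

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);   /* disarm the timer first */
    sigaction(SIGALRM, &old_sa, NULL);    /* then restore the old handler */

    double expected = (double)((long long)iters * (iters - 1) / 2);
    return sum == expected ? 0 : 1;
}
```

As with the real test program, the loop has to run long enough for at least one signal to arrive, so the iteration count needs to scale with CPU speed.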

Comment 1 Mark Millard 2017-03-08 03:25:42 UTC

(In reply to andrew from comment #0)

FYI:

# diff clang_armv7a_macros.txt clang_cortex_a7_macros.txt | more
7a8
> #define __ARM_ARCH_EXT_IDIV__ 1
13a15,16
> #define __ARM_FEATURE_FMA 1
> #define __ARM_FEATURE_IDIV 1
18c21
< #define __ARM_FP 0xC
---
> #define __ARM_FP 0xE
22c25
< #define __ARM_NEON_FP 0x4
---
> #define __ARM_NEON_FP 0x6
28c31
< #define __ARM_VFPV3__ 1
---
> #define __ARM_VFPV4__ 1

So both have NEON and VFP and FP, but not
exactly matching variations.
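
For readers unfamiliar with these macros: per the ACLE, __ARM_FP (and, in the same format, __ARM_NEON_FP) is a bitmap of the floating-point precisions supported in hardware. The sketch below decodes the two __ARM_FP values from the diff; the helper names are hypothetical and the bit assignments come from the ACLE, not from this report.

```c
#include <stdbool.h>
#include <stdio.h>

/* ACLE bit assignments for __ARM_FP (assumed here, not taken from the diff). */
#define FP_HALF   0x2u   /* bit 1: 16-bit half precision   */
#define FP_SINGLE 0x4u   /* bit 2: 32-bit single precision */
#define FP_DOUBLE 0x8u   /* bit 3: 64-bit double precision */

/* True if the given precision bit is set in an __ARM_FP-style value. */
bool arm_fp_has(unsigned arm_fp, unsigned precision_bit)
{
    return (arm_fp & precision_bit) != 0;
}

/* Hypothetical helper: print which precisions a value encodes. */
void describe_arm_fp(unsigned arm_fp)
{
    printf("__ARM_FP=0x%X: half=%d single=%d double=%d\n",
           arm_fp,
           arm_fp_has(arm_fp, FP_HALF),
           arm_fp_has(arm_fp, FP_SINGLE),
           arm_fp_has(arm_fp, FP_DOUBLE));
}
```

Read this way, 0xC (armv7a) means single and double precision, and 0xE (cortex-a7) adds half precision on top of that.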

# diff clang_default_macros.txt clang_armv7a_macros.txt | more
6,7c6,7
< #define __ARM_ARCH 6
< #define __ARM_ARCH_6KZ__ 1
---
> #define __ARM_ARCH 7
> #define __ARM_ARCH_7A__ 1
9c9,10
< #define __ARM_ARCH_ISA_THUMB 1
---
> #define __ARM_ARCH_ISA_THUMB 2
> #define __ARM_ARCH_PROFILE 'A'
13c14
< #define __ARM_FEATURE_LDREX 0x4
---
> #define __ARM_FEATURE_LDREX 0xF
19a21,23
> #define __ARM_NEON 1
> #define __ARM_NEON_FP 0x4
> #define __ARM_NEON__ 1
24c28
< #define __ARM_VFPV2__ 1
---
> #define __ARM_VFPV3__ 1

So armv6 and -march=armv7a have the same
FP status. But armv6 has yet a different
VFP version and no NEON at all.

Note that -march=armv7r (not listed) does
not have FP, VFP, or NEON: less than armv6
in these areas.

The same goes for -march=armv7m .


Technically FreeBSD could declare that no
option that would have the compiler use
NEON is supported. Otherwise FreeBSD needs
to be able to be built with NEON support,
possibly auto-adjusting to the processor
if enabled.

Similarly for building ports or other code.

Similar points go for VFP and for FP.

It might get more detailed for versions/variations
in NEON if more or less needs to be saved/restored.
Similarly for VFP and for FP.

Again: what the compilers might put to use matters.

Comment 2 Andrew "RhodiumToad" Gierth 2017-03-08 09:57:13 UTC

Uh, I think you're missing the point. Signal delivery isn't preserving ANY floating-point/vector registers on ANY vfp version; the test program fails when compiled with no cpu or arch options at all. 

Most programs don't do floating-point in signal handlers so this isn't immediately obvious. The use of -mcpu=cortex-a7 or similar option just broadens the problem to include every program that might have vector operations in a signal handler (which includes all programs with signal handlers if linked with libthr) as well as in the main program.

Support for preserving the full vfp/neon register set over context switches is already present, it's simply that whoever added that didn't follow through on the requirement to preserve them for signals too. Worse, the mcontext_t structure -- which is user-visible, and used by libthr -- isn't even big enough to hold the registers, so fixing the problem requires breaking the ABI.

Comment 3 Mark Millard 2017-03-08 16:33:12 UTC

(In reply to andrew from comment #2)

I do not see how what I wrote contradicts your original
or new material.

I'd guess that FreeBSD will at some point choose to support
VFP use and NEON use in some form for processors that have
them.

But for now the supported context requires that libthr (or its
signal handling) be built without use of such. Similarly for
any other signal-handling code.

The notes about the required ABI change were a good thing to
be explicit about as well. stable/11 may never get such since
after release ABI changes are avoided. Fixing 11 by changing
its ABI likely would require a general debate before potential
approval.

Thus stable/11 may literally never officially allow builds
where libthr uses NEON in its signal handlers, for example.
Nor openssl using NEON for another example.

head (12) likely will eventually support such, or that would
be my guess.

Comment 4 Andrew "RhodiumToad" Gierth 2017-03-08 17:28:55 UTC

(In reply to Mark Millard from comment #3)

You're still missing the point. This isn't about NEON except tangentially, and it isn't even about compiling openssl or libthr with cortex-a7 or other CPU types any more. It's about the entire floating-point support being incomplete.

The ABI *REQUIRES* use of the floating-point hardware (that's the "hf" at the end of "gnueabihf" - stands for "hard float", i.e. FP args go in FP registers). This isn't something that's just an option that programs can do without.

(It might be possible to do a non-ABI-breaking fix for 11 by preserving d0-d15; that would work for programs that don't use -mcpu or other options that enable vfp3 or neon usage.)

Comment 5 Mark Millard 2017-03-08 18:48:32 UTC

(In reply to andrew from comment #4)

I'm still unclear where I've contradicted you up to this point.
I think it's more that I'm making a somewhat different set of
points.

NEON and its registers are not limited to floating point
contexts.

https://www.arm.com/products/processors/technologies/neon.php

lists:

It has 32 registers, 64-bits wide (dual view as 16 registers, 128-bits wide).
. . .
Data types can be: signed/unsigned 8-bit, 16-bit, 32-bit, 64-bit, single precision floating point

And http://lists.llvm.org/pipermail/llvm-dev/2016-March/097613.html

(2016-Mar-25) indicates that llvm/clang was intended to eventually have:

> Examples:
> 
> Works today:
> -mfpu=soft -> Int (ALU), FP (LIB), no VFP/NEON instructions
> -mfpu=softfp -> Int (ALU), FP (LIB), VFP/NEON instructions allowed
> -mfpu=vfp -> Int (ALU), FP (VFP)
> -mfpu=neon -> Int (NEON), FP (NEON)
> 
> Change proposed:
> -mfpmath=neon -mfpu=vfp -> Int (ALU), FP (NEON)
> -mfpmath=vfp -mfpu=neon -> Int (NEON), FP (VFP)

Thus even code not doing any floating point activity could be
using NEON registers and have problems with signals not preserving
those register values --even if no running code was using floating
point in any process (unlikely combination).

I think that this means that NEON deserves its own mention, not
just tied to VFP or other floating point contexts. This is in
addition to the floating point related points, not instead of them.

Side note:

For floating point use NEON is not IEEE-754 compliant but VFP is.

Comment 6 Andrew "RhodiumToad" Gierth 2017-03-08 21:30:35 UTC

Whatever. I have code under test now that seems to be working (with the world built for cortex-a7); though I need to dig around to find if I missed anything. I'll post the patch later.

Comment 7 Andrew "RhodiumToad" Gierth 2017-03-09 10:37:38 UTC

Created attachment 180669
patch I'm testing (breaks ABI)

Changing the size of mcontext_t changes the size of ucontext_t used by libthr, so everything linked with libthr (which notably includes sshd) will crash in the event of mismatch.

Both openssl speed and git clone now succeed when everything is built with CPUTYPE=cortex-a7, and the test program succeeds regardless of compilation flags; but I'm no expert on ARM (been playing with it only a few weeks) so it's quite likely I've overlooked something.

Comment 8 Mark Millard 2017-03-09 18:43:41 UTC

(In reply to andrew from comment #7)

I expect that this sort of code should have a
phabricator review (or equivalent) with FreeBSD
and arm experts invited.

One thing that I notice is that the Procedure
Call Standard indicates that at calls to public
interfaces (such as a signal handler) the
fpscr length and stride fields need to have been
cleared.

An implication is that when the signal handler
returns the fpscr value needs to be restored,
including the length and stride fields.
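
A minimal sketch of that requirement, assuming the FPSCR layout from the ARM Architecture Reference Manual (Len in bits [18:16], Stride in bits [21:20]); the function names are hypothetical and this is not code from the patch or from Linux:

```c
#include <stdint.h>

/* Assumed FPSCR field positions (ARM ARM), not taken from this report. */
#define FPSCR_LEN_MASK    (UINT32_C(0x7) << 16)   /* Len,    bits [18:16] */
#define FPSCR_STRIDE_MASK (UINT32_C(0x3) << 20)   /* Stride, bits [21:20] */

/* FPSCR value the handler should start with: vector length/stride cleared. */
uint32_t fpscr_for_handler_entry(uint32_t interrupted_fpscr)
{
    return interrupted_fpscr & ~(FPSCR_LEN_MASK | FPSCR_STRIDE_MASK);
}

/* On return from the handler, the interrupted code gets its value back unchanged. */
uint32_t fpscr_for_handler_return(uint32_t saved_fpscr)
{
    return saved_fpscr;
}
```

The point is that the saved copy must be the full register, including Len and Stride, even though the handler itself never sees those fields set.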

As a cross check I looked at linux code and it
does these things.

It also showed that there is code
present for fpexc, fpinst and fpinst2.
I'll avoid quoting code for fear of
pollution.

The VFP itself is temporarily disabled "to avoid
corrupting new thread state" before some activity
and re-enabled after that activity for the
restore-user-hw-state code. fpexc has 3 or so
bits manipulated.

There is some (cpu specific) fpexc code on the
preserve-user-clear-hw-state side of things as
well. There is also fpinst and fpinst2 code.

Comment 9 Andrew "RhodiumToad" Gierth 2017-03-09 19:47:14 UTC

(In reply to Mark Millard from comment #8)

I'm not touching fpexc and fpinst* because those are privileged, and therefore shouldn't be saved and restored via a user-controlled context. If anything from them does actually need to be preserved -- which I honestly don't know: fpexc seems to be used only to disable or enable the VFP, which requires no special attention here, and fpinst* seem to be related to exception handling, which would explain why they need to be preserved over kernel preemptions but wouldn't require them to be handled here -- then it would have to be done in some other way than storing them in user space.

fpscr is saved and restored by the patch (though I admit to not having actually tested that part yet).

(Incidentally, arm64 appears to have a different bug: it's not saving the fpcontext in get_mcontext/set_mcontext, but it is saving it in the signal code. This also isn't kosher, because it means that user-space getcontext(3)/swapcontext(3) etc. do not deliver all of their specified behavior, which may affect libthr)

Comment 10 Mark Millard 2017-03-09 20:38:36 UTC

(In reply to Mark Millard from comment #8)

This note may well be outside what FreeBSD chooses to target
or be ready for since there are no existing examples as far
as I can tell. Still I note it explicitly.

http://infocenter.arm.com/help/topic/com.arm.doc.dht0002a/DHT0002A_introducing_neon.pdf

says:

There is no architectural requirement for a processor to implement both VFP and NEON, but the common features in the programmers models for these extensions mean an operating system that supports VFP requires little or no modifications to also support NEON.

There is also:

http://infocenter.arm.com/help/topic/com.arm.doc.ihi0053d/IHI0053D_acle_2_1.pdf

__ARM_NEON is defined to a value indicating the Advanced SIMD (NEON) architecture supported. The only current value is 1.

In principle, for AArch32, the NEON architecture can exist in an integer-only version. To test for the presence of NEON floating-point vector instructions, test __ARM_NEON_FP. When NEON does occur in an integer-only version, the VFP scalar instruction set is also not present.

My guess is that FreeBSD would assume that NEON being present means that VFP is
as well, expecting that Integer NEON by itself is unlikely to occur.

Comment 11 Mark Millard 2017-03-09 20:56:07 UTC

(In reply to andrew from comment #9)

You may well be right. I do not claim to understand
all the justifications/requirements driving everything
Linux does. I just note that it does it in case it
prompts something about what does need to be done.

fpexc usage for signals is involved in:

A) disabling VFP and later enabling it
B) invalidating fpinst2
C) clearing the fp exception flag

for restore-user-hw-state (so on return from the
signal handler).

I would have guessed that the exception flag is
to be made to appear to be per-thread (or
per-process) specific and that a signal handler
that causes the fp exception flag to be set
should not have the flag value be visible as a
side effect on other threads or processes.

But I could be guessing wrong about what the 
fp exception flag indicates or its intended
scope.

Comment 12 Mark Millard 2017-03-09 21:12:37 UTC

(In reply to Mark Millard from comment #11)

Looks like fpexc is processor family specific. For arm11 there are
exception handling requirements. . .

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/Ccdhcfga.html

says for arm11:

[31]
EX
Exception flag.
When EX is set, the VFP11 coprocessor is in the exceptional state. EX must be cleared by the exception handling routine.

[28]
FP2V
FPINST2 instruction valid flag. Set when FPINST2 contains a valid instruction. FP2V must be cleared by the exception handling routine.

I guess this is why linux has such code.

Comment 13 Mark Millard 2017-03-09 21:19:33 UTC

(In reply to Mark Millard from comment #12)

[I should have looked this up as part of making
the prior note:]

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0290g/Ccdhcfga.html

lists arm11 as an armv6.

So for FreeBSD to cover armv6's VFP it looks like the
exception handling would be required to clear those
fpexc fields (EX and FP2V).
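
As a sketch only, using the bit positions quoted in comment 12 (EX at bit 31, FP2V at bit 28): the function name is hypothetical, and a real kernel would read and write FPEXC with privileged instructions rather than operate on a plain integer.

```c
#include <stdint.h>

#define FPEXC_EX   (UINT32_C(1) << 31)   /* exception flag (bit 31)         */
#define FPEXC_FP2V (UINT32_C(1) << 28)   /* FPINST2 valid flag (bit 28)     */

/* Value to write back to FPEXC once the exception has been handled. */
uint32_t fpexc_after_exception_handling(uint32_t fpexc)
{
    return fpexc & ~(FPEXC_EX | FPEXC_FP2V);
}
```

All other bits, such as the enable bit, pass through untouched.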

Comment 14 Mark Millard 2017-03-09 21:26:52 UTC

(In reply to Mark Millard from comment #13)

Another piece of that document's text says:

You must save and restore the FPEXC register whenever changing the context. If the EX flag, FPEXC[31], is set, then the VFP11 coprocessor is in the exceptional state, and you must also save and restore the FPINST and FPINST2 registers. You can write the context switch code to determine from the EX flag which registers to save and restore or to simply save all three.



As I understand it signal delivery and return would be considered a context switch for the above. (But I'm no expert. Linux did bother to.)
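
The first of the two options in the quoted text (decide from the EX flag) can be sketched as follows. The struct and function names are hypothetical; only the EX bit position and the rule itself come from the quoted document.

```c
#include <stdbool.h>
#include <stdint.h>

#define FPEXC_EX (UINT32_C(1) << 31)   /* exception flag, FPEXC[31] */

/* Which VFP11 control registers a context switch has to save and restore. */
struct vfp11_save_set {
    bool fpexc;
    bool fpinst;
    bool fpinst2;
};

/* FPEXC is always saved; FPINST and FPINST2 only in the exceptional state. */
struct vfp11_save_set vfp11_registers_to_save(uint32_t fpexc)
{
    bool exceptional = (fpexc & FPEXC_EX) != 0;
    struct vfp11_save_set set = { true, exceptional, exceptional };
    return set;
}
```

The second option, saving all three unconditionally, trades a little extra work for not having to test the flag.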

Comment 15 Andrew "RhodiumToad" Gierth 2017-03-09 22:22:49 UTC

(In reply to Mark Millard from comment #11)

The existing exception handling so far handles exactly two cases:

1. The VFP is disabled because the current thread might not own the hardware, in which case the exception handler clears the exception, restores the registers from the pcb to the hardware if necessary, and turns the VFP on;

2. The VFP is enabled, in which case the exception handler clears the exception and sends SIGFPE to the process

In either event, the exception must be cleared before we return to user mode, and it's at that point that signal delivery happens. Hence, no need to preserve the exception state.

The code in vfp.c that preserves fpexc/fpinst/fpinst2 has to do so because of the possibility that the thread gets preempted inside the kernel while handling an exception (i.e. before reaching the critical_enter() in vfp_bounce). This doesn't apply to signal delivery into user space.

So unless I'm badly misunderstanding the existing code, what I've done should be sufficient. It might not be optimal - it would perhaps make sense to have (as other targets do) a flag to indicate that the VFP state is present, and use that to decide whether to restore the state to the registers rather than just to the pcb; my version currently always returns from signals with the VFP off and the state in the pcb rather than the hardware.

Comment 16 Mark Millard 2017-03-09 22:35:06 UTC

(In reply to Mark Millard from comment #14)

http://liris.cnrs.fr/~mmrissa/lib/exe/fetch.php?media=armv7-a-r-manual.pdf

indicates that much of the FPEXC content and related things are
"VFP subarchitecture defined" for armv7-a and armv7-r: they are
not part of the armv7 architecture itself.

So it appears to me that FreeBSD must decide what
VFP subarchitectures it is going to support automatically
--and what versions relative to the "Common VFP
Subarchitecture". Maybe only "Common VFP Subarchitecture"
versions are to be supported.

[Note that Cortex-A8 has only one bit used but arm11
(an armv6) has many used.]

The document says as well:

Appendix F Common VFP Subarchitecture Specification
Defines version 2 of the Common VFP Subarchitecture.
Note
This specification is not part of the ARM architecture specification. This sub-architectural information is included here as supplementary information, for the convenience of developers and users who might require this information.
. . .
Appendix F Common VFP Subarchitecture Specification includes examples of how a Floating-point subarchitecture might define additional registers, in the SUBARCHITECTURE DEFINED register space using addresses in the 0b1001 to 0b1111 range. Appendix F is not part of the ARMv7 architecture. It is included as an example of how a Floating-point subarchitecture might be defined.
. . .

[presuming a Common VFP Subarchitecture context:]
. . .
Floating-point support code
A complete Floating-point implementation might require a software component, called the support code. For example, if an implementation includes VFPv3U or VFPv4U, support code must handle the trapped floating-point exceptions. The interface to the support code is called the VFP subarchitecture. ARM has defined a subarchitecture that is suitable for use with implementations of the ARM Floating-point Extension, see Appendix F Common VFP Subarchitecture Specification.

Context switch software can be written to always save and restore the subarchitecture registers. In this case appropriate context switch software must be chosen based on the registers implemented, using the detection mechanism described in Detecting which VFP Common subarchitecture registers are implemented on
page AppxF-2445.

Comment 17 Otacílio de Araújo Ramos Neto 2017-03-11 21:12:40 UTC

I confirm that this occurs on beaglebone black.

% ./fptst breakme
entering...
h0: actual=-807739214198814.750000 expected=50000000.000000
h1: actual=-807739214198813.750000 expected=50000001.000000
h2: actual=-807739230607542.875000 expected=50000002.000000
h3: actual=-807739230607540.000000 expected=50000003.000000
h4: actual=-807739247016269.125000 expected=50000004.000000
h5: actual=-807739247016266.250000 expected=50000005.000000
h6: actual=-807739263424995.375000 expected=50000006.000000
h7: actual=-807739263424993.375000 expected=50000007.000000

Comment 18 commit-hook freebsd_committer

2017-03-26 08:38:05 UTC

A commit references this bug:

Author: mmel
Date: Sun Mar 26 08:36:56 UTC 2017
New revision: 315974
URL: https://svnweb.freebsd.org/changeset/base/315974

Log:
  Preserve VFP state across signal delivery.

  We don't have enough space to store full VFP context within mcontext
  structure. Due to this:
   - follow i386/amd64 way and store VFP state outside of the mcontext_t
     but point to it. Use the size of VFP state structure as an 'magic'
     indicator of the saved VFP state presence.
   - teach set_mcontext() about this external storage.
   - for signal delivery, store VFP state to expanded 'struct sigframe'.

  Submitted by:	Andrew Gierth (initial version)
  PR:		217611
  MFC after:	2 weeks

Changes:
  head/lib/libthread_db/arch/arm/libpthread_md.c
  head/sys/arm/arm/machdep.c
  head/sys/arm/include/frame.h
  head/sys/arm/include/ucontext.h
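
The approach described in the log (state stored outside the context, a pointer to it, and the structure size doubling as a presence marker) can be sketched as below. This is an illustration under assumptions, not the committed code: the struct layouts, field names, and helper functions are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical VFP save area: d0-d31 plus FPSCR. */
struct vfp_state_sketch {
    uint64_t d[32];
    uint32_t fpscr;
};

/* Hypothetical stand-in for a machine context that only points at VFP state. */
struct mcontext_sketch {
    uint32_t gregs[17];   /* general registers (placeholder size) */
    size_t   vfp_size;    /* the "magic": must equal sizeof the save area */
    void    *vfp_ptr;     /* external storage, e.g. inside the signal frame */
};

/* VFP state counts as present only if the pointer is set and the size matches. */
bool vfp_state_present(const struct mcontext_sketch *mc)
{
    return mc->vfp_ptr != NULL &&
           mc->vfp_size == sizeof(struct vfp_state_sketch);
}

/* Record external VFP storage in the context. */
void attach_vfp_state(struct mcontext_sketch *mc, struct vfp_state_sketch *vfp)
{
    mc->vfp_ptr = vfp;
    mc->vfp_size = sizeof(*vfp);
}
```

A context produced by older code leaves both fields zero, so it is simply treated as carrying no VFP state, which is what keeps the context structure itself the same size.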

Comment 19 Mark Millard 2017-03-26 09:23:11 UTC

(In reply to commit-hook from comment #18)

Does the struct fpreg problem that was mentioned
on the list need a separate bugzilla report?



As a reminder, here is what I ran into when I looked
around based on the reference. (The original reference
just named the struct.)


On 2017-Mar-12, at 12:17 AM, Michal Meloun <melounmichal at gmail.com> wrote:

The struct fpreg is also wrong and I'm not sure if
or how we can fix this in a compatible way.



And I later wrote:

Looking up some details shows sys/arm/include/reg.h with:

struct fpreg {
      unsigned int fpr_fpsr;
      fp_reg_t fpr[8];
};

[Covers only 8 floating point registers?]

and shows sys/arm/include/fp.h with:

typedef struct fp_extended_precision {
      u_int32_t fp_exponent;
      u_int32_t fp_mantissa_hi;
      u_int32_t fp_mantissa_lo;
} fp_extended_precision_t;

typedef struct fp_extended_precision fp_reg_t;
. . .
/*
* Type for a saved FP context, if we want to translate the context to a
* user-readable form
*/

typedef struct {
      u_int32_t fpsr;
      fp_extended_precision_t regs[8];
} fp_state_t;


So each of:

struct fpreg
fp_state_t

has room for 8 instances of 96 bits (beyond fpsr), not sufficient
for 32 double precision (i.e., 64-bit) registers.
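
The arithmetic can be checked with a small sketch that mirrors the quoted layouts (renamed with a _sketch suffix and using uint32_t, so these are stand-ins rather than the real headers):

```c
#include <stddef.h>
#include <stdint.h>

/* Mirror of the quoted fp_extended_precision layout: 3 x 32 bits = 96 bits. */
typedef struct {
    uint32_t fp_exponent;
    uint32_t fp_mantissa_hi;
    uint32_t fp_mantissa_lo;
} fp_reg_sketch_t;

/* Mirror of the quoted struct fpreg. */
struct fpreg_sketch {
    unsigned int    fpr_fpsr;
    fp_reg_sketch_t fpr[8];
};

enum { VFP_D_REGS = 32, VFP_D_REG_BYTES = 8 };

/* Bytes available for register contents in the fpreg-style layout. */
size_t fpreg_payload_bytes(void)
{
    return sizeof(((struct fpreg_sketch *)0)->fpr);
}

/* Bytes needed for a full bank of 32 double-precision registers. */
size_t vfp_bank_bytes(void)
{
    return (size_t)VFP_D_REGS * VFP_D_REG_BYTES;
}
```

That is 96 bytes of register space against the 256 bytes a full d0-d31 bank needs.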

The arm code also has:

# grep -r "\<fpreg\>" /usr/src/sys/arm/ | more
/usr/src/sys/arm/arm/machdep.c:fill_fpregs(struct thread *td, struct fpreg *regs)
/usr/src/sys/arm/arm/machdep.c:set_fpregs(struct thread *td, struct fpreg *regs)
/usr/src/sys/arm/include/reg.h:struct fpreg {
/usr/src/sys/arm/include/reg.h:int     fill_fpregs(struct thread *, struct fpreg *);
/usr/src/sys/arm/include/reg.h:int     set_fpregs(struct thread *, struct fpreg *);

And the system has:

/usr/src/sys/sys/procfs.h:typedef struct fpreg fpregset_t;
/usr/src/sys/sys/procfs.h:    size_t    pr_fpregsetsz;  /* sizeof(fpregset_t) (1) */
/usr/src/sys/sys/procfs.h:typedef fpregset_t prfpregset_t;
/usr/src/sys/sys/ptrace.h:struct fpreg;
/usr/src/sys/sys/ptrace.h:int   proc_read_fpregs(struct thread *_td, struct fpreg *_fpreg);
/usr/src/sys/sys/ptrace.h:int   proc_write_fpregs(struct thread *_td, struct fpreg *_fpreg);

and:

/usr/src/sys/kern/sys_process.c: * Ptrace doesn't support fpregs at all, and there are no security holes
/usr/src/sys/kern/sys_process.c: * or translations for fpregs, so we can just copy them.
/usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c:        PROC_ACTION(fill_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c:        PROC_ACTION(set_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:                struct fpreg fpreg;
/usr/src/sys/kern/sys_process.c:                error = COPYIN(uap->addr, &r.fpreg, sizeof r.fpreg);
/usr/src/sys/kern/sys_process.c:                error = COPYOUT(&r.fpreg, uap->addr, sizeof r.fpreg);
/usr/src/sys/kern/sys_process.c:                error = PROC_WRITE(fpregs, td2, addr);
/usr/src/sys/kern/sys_process.c:                error = PROC_READ(fpregs, td2, addr);

and there is use of some of the above in:

/usr/src/sys/kern/sys_process.c: * proc_read_fpregs, proc_write_fpregs
/usr/src/sys/kern/sys_process.c:proc_read_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c:        PROC_ACTION(fill_fpregs(td, fpregs));
/usr/src/sys/kern/sys_process.c:proc_write_fpregs(struct thread *td, struct fpreg *fpregs)
/usr/src/sys/kern/sys_process.c:        PROC_ACTION(set_fpregs(td, fpregs));

It appears that fp_state_t is unused in /usr/src/sys/ .

Comment 20 Michal Meloun freebsd_committer

2017-03-26 10:02:15 UTC

(In reply to Mark Millard from comment #19)
No, it doesn't need a separate bug report. I only want to solve this puzzle step-by-step.

At this time, I have prepared a fix for the pthread library (libthread) -> https://github.com/strejda/tegra/commit/fd7dce067adaef48fbd4ea37b7507f14c112fd4a but it needs more testing.

The struct fpreg related problem is slightly more complicated and (probably) cannot be solved without ABI breakage. Fortunately, the only real out-of-kernel consumer of struct fpreg is the old KSE library (libthread_db), but I have not analyzed the impact of an ABI change on it (and I'm not sure if it's used at all).

Michal

Comment 21 Andrew "RhodiumToad" Gierth 2017-03-26 12:11:40 UTC

(In reply to Michal Meloun from comment #20)

The pthread library fix is obviously inadequate since it is hardcoding the structure size in userspace.

Comment 22 commit-hook freebsd_committer

2017-10-16 12:54:11 UTC

A commit references this bug:

Author: mmel
Date: Mon Oct 16 12:53:55 UTC 2017
New revision: 324660
URL: https://svnweb.freebsd.org/changeset/base/324660

Log:
  Save VFP state in getcontext(3) on ARM.
  This is a last followup of r315974, which fixes userland part
  of VFP save/restore problems described in PR 217611.

  PR:		217611
  MFC after:	2 weeks

Changes:
  head/lib/libc/arm/gen/Makefile.inc
  head/lib/libc/arm/gen/getcontextx.c
  head/sys/arm/arm/machdep.c
  head/sys/arm/arm/sys_machdep.c
  head/sys/arm/include/machdep.h
  head/sys/arm/include/sysarch.h