| Summary: | Conflicting system headers illustrated by build of graphics/cqcam | ||
|---|---|---|---|
| Product: | Base System | Reporter: | Bernard van Gastel <bvgastel> |
| Component: | kern | Assignee: | freebsd-bugs (Nobody) <bugs> |
| Status: | Closed FIXED | ||
| Severity: | Affects Only Me | ||
| Priority: | Normal | ||
| Version: | Unspecified | ||
| Hardware: | Any | ||
| OS: | Any | ||
Description
Bernard van Gastel
2003-01-14 18:20:07 UTC
Responsible Changed From-To: gnats-admin->freebsd-ports
Reassign misfiled Ports PR.

State Changed From-To: open->suspended
Known problem, awaiting fix

Responsible Changed From-To: freebsd-ports-bugs->linimon
I guess no one is going to fix this unless I do it ...

This is really a kernel problem. I am going to go ahead and commit a workaround for this and the one or two other ports with this problem -- but the workaround is basically unacceptable.

The underlying problem is that machine/cpufunc.h for i386 has had a definition for a machine function 'ffs' for, oh, say, about 9 years now. However, man ffs will show you that there is an ffs(3) function as well. Even after reading the source it's not clear to me if these are supposed to have the same purpose -- someone with a more intimate knowledge of i386 arch is going to have to rule for certain.

Back in 2002 a commit was done to create 'strings.h' to provide better adherence to POSIX. When this was done, a prototype for ffs() was introduced for ffs(3).

These prototypes fight with each other. From user code, there appears to be no way (to me) to allow access to both. However, this port, among others, wishes to use the strings.h definitions _and_ the inb() and outb() functions which only cpufunc.h provides.

The only way to (correctly) fix this has to do with changes to the include files, and that's outside the charter of the ports folks.

In the meantime, I'm going to hold my nose and commit an include file to the port that is merely the inb/outb functions. This is clearly a hack that should go away once a "correct" solution is found.

mcl

State Changed From-To: suspended->open
This is really a kernel problem that can only be worked around in the ports collection with terrible hackery. I'm going to go ahead and do that to unbreak graphics/cqcam and one or two other ports that suffer from this, but this is by no means an acceptable long-term solution for many reasons. See extensive comments in the audit trail.
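
For illustration only, a port-local header of the kind described above (one that carries just the port I/O helpers and so never pulls in the cpufunc.h ffs() definition) might look roughly like the following sketch. This is not the file that was committed to the port. The names port_inb and port_outb are hypothetical, the inline assembly assumes a GCC-compatible compiler targeting i386 or amd64, and the calling process must still obtain I/O privilege from the OS before using them, which the sketch does not arrange.

```c
/*
 * Hypothetical port-local replacement for the inb()/outb() parts of
 * <machine/cpufunc.h>.  Assumes GCC-style extended asm on x86.
 */
#if !defined(__i386__) && !defined(__x86_64__)
#error "this sketch only makes sense on i386/amd64"
#endif

/* Read one byte from an x86 I/O port. */
static inline unsigned char
port_inb(unsigned short port)
{
	unsigned char data;

	/* "a" = AL register, "Nd" = 8-bit immediate or DX register. */
	__asm__ __volatile__("inb %1, %0" : "=a" (data) : "Nd" (port));
	return (data);
}

/* Write one byte to an x86 I/O port. */
static inline void
port_outb(unsigned short port, unsigned char data)
{
	__asm__ __volatile__("outb %0, %1" : : "a" (data), "Nd" (port));
}
```

Because the helpers use their own names and the header defines nothing called ffs, user code could include it alongside <strings.h> without the prototype clash described in this PR.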

Responsible Changed From-To: linimon->freebsd-bugs

On Tue, 23 Dec 2003, Mark Linimon wrote:

> This is really a kernel problem. I am going to go ahead and commit a
> workaround for this and the one or two other ports with this problem --
> but the workaround is basically unacceptable.

Er, this is really a port[s] problem. <machine/cpufunc.h> is not intended to be included by applications. There was never any conflict with <string.h> in the kernel because the kernel never included <string.h>, and the kernel now avoids bogus conflicts, if any, with gcc's builtin ffs() using -fno-builtin.

> The underlying problem is that machine/cpufunc.h for i386 has had
> a definition for a machine function 'ffs' for, oh, say, about 9 years
> now. However, man ffs will show you that there is an ffs(3) function
> as well. Even after reading the source it's not clear to me if these
> are supposed to have the same purpose -- someone with a more intimate
> knowledge of i386 arch is going to have to rule for certain.

They are the same. Last time I checked (less than a year ago), the gcc builtin was still slower than the kernel inline except possibly when the latter can use non-base-arch instructions like cmov. amd64's always have cmov and always use the builtin.

... I checked again. With the following slightly too simple test:

%%%
#include <sys/types.h>
#include <machine/cpufunc.h>

int z[4096];

main()
{
	volatile int v;
	int i, j;

	for (i = 0; i < 4096; i++)
		z[i] = 1 << rand();	/* Yes, this is sloppy. */
	for (j = 0; j < 100000; j++)
		for (i = 0; i < 4096; i++)
#ifdef NOBUILTIN
			v = ffs(z[i]);
#else
			v = __builtin_ffs(z[i]);
#endif
}
%%%

Times on an Athlon XP1600 overclocked by 146/133:

cc -O -mcpu=pentiumpro -o foo foo.c (default from bsd.cpu.mk)
    3.49 real 3.47 user 0.00 sys
cc -O -mcpu=pentiumpro -DNOBUILTIN -o foo foo.c (default + kernel ffs())
    3.21 real 3.21 user 0.00 sys
cc -O -march=pentiumpro -o foo foo.c (gives cmov and works on Athlon XP too):
    3.21 real 3.21 user 0.00 sys

Here using cmov[e] gives the same amount of optimization as the kernel ffs() gets by using a simple conditional branch instead of a slow instruction sequence starting with "set"[e]. Mispredicted branches are expensive on some arches, but apparently they aren't on Athlons.

The rand() in the test was intended to cause mispredicted branches as well as lengthy searches, but it doesn't actually. The branch is never taken since z[i] is never 0. On changing the initialization of z[i] so that the branch is taken every second time:

	if (i & 1)
		z[i] = 1 << rand();

the kernel version becomes much faster:

    2.01 real 2.00 user 0.00 sys

and the other times don't change significantly. This is presumably because the Athlon predicts taking the branch every second time perfectly. The bit-search instruction is very expensive (and always takes the same time??) and by branching over it every second time the cost per iteration is almost halved. A better benchmark might randomize the branches, but this might be even further from real applications since an arg of 0 may be very unlikely (or very likely).

Times on a Celeron 366:

gcc builtin without cmov (very slow!):
    15.78 real 15.68 user 0.00
gcc builtin with cmov:
    5.64 real 5.61 user 0.00
kernel ffs():
    5.85 real 5.81 user 0.00
kernel ffs() with alternating 0's (again, others not affected by alternating):
    5.62 real 5.58 user 0.00

Times on an amd64 (sledge = Opteron 244 1804 MHz)

gcc builtin with cmov:
    2.73 real 2.72 user 0.00 sys
old kernel ffs():
    3.42 real 3.39 user 0.01 sys
kernel ffs() with alternating 0's (again, builtin affected by alternating):
    1.82 real 1.82 user 0.00 sys

So using cmov is actually significantly better than a simple branch on amd64's, but only if the arg isn't often 0.

> In the meantime, I'm going to hold my nose and commit an include
> file to the port that is merely the inb/outb functions. This is
> clearly a hack that should go away once a "correct" solution is found.

This is approximately correct, not a hack. The system could provide a header that implements inb() and outb() functions for userland (*), but <machine/cpufunc.h> is not this header. It's just a bit much for multiple applications to have to duplicate these interfaces.

(*) They shouldn't exist in the kernel. Bus-space should be used.

Bruce

>Er, this is really a port[s] problem [...]
>
>The system could provide
>a header that implements inb() and outb() functions for userland (*),
>but <machine/cpufunc.h> is not this header.

Other than duplicating the inb/outb code into places in the ports collection, there is no way that I can see for the ports collection to fix this problem; it involves some kind of change to the system headers.

So, I'm saying that I agree with point (2) but that IMHO (2) is necessarily in conflict with (1).

If you have some other suggestion about getting inb/outb functionality into the ports, please make it. (Fair warning: "rewrite or delete the apps" is not what I'm looking for :-) ... unless you're also willing to replace the junky old parallel-port peripherals that these ports talk to).

State Changed From-To: open->closed
A workaround for this problem was committed around 2 months ago.
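
As a small illustration of the "They are the same" answer in the audit trail, the sketch below compares ffs(3) from <strings.h> and the compiler's __builtin_ffs() against a plain bit-scanning loop. It is not code from this PR: ref_ffs and ffs_variants_agree are hypothetical helper names, and the sketch assumes a GCC-compatible compiler (for __builtin_ffs) and a libc that declares ffs() in <strings.h>. It checks only the return-value contract (1-based index of the lowest set bit, 0 for an argument of 0) and says nothing about the speed differences measured above.

```c
#include <strings.h>	/* ffs(3) */

/*
 * Hypothetical reference implementation: 1-based position of the
 * least significant set bit, or 0 when no bit is set.
 */
static int
ref_ffs(int mask)
{
	unsigned int bits = (unsigned int)mask;
	int pos;

	if (bits == 0)
		return (0);
	for (pos = 1; (bits & 1u) == 0; pos++)
		bits >>= 1;
	return (pos);
}

/* Returns 1 when ffs(3) and the gcc builtin both match ref_ffs(). */
static int
ffs_variants_agree(int mask)
{
	int want = ref_ffs(mask);

	return (ffs(mask) == want && __builtin_ffs(mask) == want);
}
```

Calling ffs_variants_agree() over the values a program actually passes would be one way to confirm that switching between the library function and the builtin does not change results.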