I've seen sqlite3 crap out due to "disk IO error". It looks like the
F_SETFL path is returning EDEADLK when it shouldn't be - only the
"wait" version of this should be.
The kernel code looks to be:
lf_setlock() -> lf_add_outgoing() -> lf_add_edge() -> graph_add_edge()
.. and lf_setlock() will return an error from lf_add_outgoing()
without checking if it's (a) EDEADLK, and (b) whether we're going to
sleep or not.
So, sqlite3 trips up on this. I'm sure other things do. What should
the correct thing be? It looks like EWOULDBLOCK is the correct value
to return for F_SETFL failing, not EDEADLK.
What do those-who-know-POSIX-standards-better-than-I think?
Here's what I'm trying with:
adrian@test3:~/work/freebsd % svn diff stable/10/src/sys/kern/
--- stable/10/src/sys/kern/kern_lockf.c (revision 267627)
+++ stable/10/src/sys/kern/kern_lockf.c (working copy)
@@ -1425,6 +1425,14 @@
if (lockf_debug & 1)
lf_print("lf_setlock: deadlock", lock);
+ * If the lock isn't waiting, return EAGAIN
+ * rather than EDEADLK.
+ if (((lock->lf_flags & F_WAIT) == 0) &&
+ (error == EDEADLK))
+ error = EAGAIN;
.. this seems to fix the problem.
.. and I mean F_SETLK.
You are right that fcntl(F_SETLK) should not fail with [EDEADLK]; this error is only defined for fcntl(F_SETLKW). A non-blocking locking attempt cannot deadlock.
The proposed patch looks wrong, though. The above if ((lock->lf_flags & F_WAIT) == 0 && lock->lf_async_task == NULL) should already catch this case. Perhaps lf_async_task is set incorrectly?
I'm still verifying it. I've added some printf()s to see what triggers and where.
Is it possible that the initial check will succeed but something will sneak in between that check and actually trying to add the lock? The actual sx lock isn't held for the duration of all of that.
I admit I'm not familiar at all in this code.