Bug 191593

Summary: [fcntl] F_SETLK returns EDEADLK when it shouldn't - only F_SETLKW and waiting should return EDEADLK
Product: Base System Reporter: Adrian Chadd <adrian>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: New ---    
Severity: Affects Only Me CC: jilles
Priority: ---    
Version: CURRENT   
Hardware: Any   
OS: Any   

Description Adrian Chadd freebsd_committer freebsd_triage 2014-07-04 00:50:59 UTC
Hi!

I've seen sqlite3 crap out due to "disk IO error". It looks like the
F_SETFL path is returning EDEADLK when it shouldn't be - only the
"wait" version of this should be.

The kernel code looks to be:

lf_setlock() -> lf_add_outgoing() -> lf_add_edge() -> graph_add_edge()
-> EDEADLK

.. and lf_setlock() will return an error from lf_add_outgoing()
without checking if it's (a) EDEADLK, and (b) whether we're going to
sleep or not.

So, sqlite3 trips up on this. I'm sure other things do. What should
the correct thing be? It looks like EWOULDBLOCK is the correct value
to return for F_SETFL failing, not EDEADLK.

What do those-who-know-POSIX-standards-better-than-I think?

Thanks!
Comment 1 Adrian Chadd freebsd_committer freebsd_triage 2014-07-04 01:46:32 UTC
Here's what I'm trying with:

adrian@test3:~/work/freebsd % svn diff stable/10/src/sys/kern/
Index: stable/10/src/sys/kern/kern_lockf.c
===================================================================
--- stable/10/src/sys/kern/kern_lockf.c (revision 267627)
+++ stable/10/src/sys/kern/kern_lockf.c (working copy)
@@ -1425,6 +1425,14 @@
                        if (lockf_debug & 1)
                                lf_print("lf_setlock: deadlock", lock);
 #endif
+
+                       /*
+                        * If the lock isn't waiting, return EAGAIN
+                        * rather than EDEADLK.
+                        */
+                       if (((lock->lf_flags & F_WAIT) == 0) &&
+                           (error == EDEADLK))
+                               error = EAGAIN;
                        lf_free_lock(lock);
                        goto out;
                }

.. this seems to fix the problem.
Comment 2 Adrian Chadd freebsd_committer freebsd_triage 2014-07-04 02:18:02 UTC
.. and I mean F_SETLK.
Comment 3 Jilles Tjoelker freebsd_committer freebsd_triage 2014-07-04 22:41:20 UTC
You are right that fcntl(F_SETLK) should not fail with [EDEADLK]; this error is only defined for fcntl(F_SETLKW). A non-blocking locking attempt cannot deadlock.

The proposed patch looks wrong, though. The above if ((lock->lf_flags & F_WAIT) == 0 && lock->lf_async_task == NULL) should already catch this case. Perhaps lf_async_task is set incorrectly?
Comment 4 Adrian Chadd freebsd_committer freebsd_triage 2014-07-05 02:27:46 UTC
I'm still verifying it. I've added some printf()s to see what triggers and where.

Is it possible that the initial check will succeed but something will sneak in between that check and actually trying to add the lock? The actual sx lock isn't held for the duration of all of that.

I admit I'm not familiar at all in this code.