Bug 191593 - [fcntl] F_SETLK returns EDEADLK when it shouldn't - only F_SETLKW and waiting should return EDEADLK
Summary: [fcntl] F_SETLK returns EDEADLK when it shouldn't - only F_SETLKW and waiting...
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: kern (show other bugs)
Version: CURRENT
Hardware: Any Any
: --- Affects Only Me
Assignee: freebsd-bugs mailing list
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-04 00:50 UTC by Adrian Chadd
Modified: 2014-07-05 02:27 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Adrian Chadd freebsd_committer 2014-07-04 00:50:59 UTC
Hi!

I've seen sqlite3 crap out due to "disk IO error". It looks like the
F_SETFL path is returning EDEADLK when it shouldn't be - only the
"wait" version of this should be.

The kernel code looks to be:

lf_setlock() -> lf_add_outgoing() -> lf_add_edge() -> graph_add_edge()
-> EDEADLK

.. and lf_setlock() will return an error from lf_add_outgoing()
without checking if it's (a) EDEADLK, and (b) whether we're going to
sleep or not.

So, sqlite3 trips up on this. I'm sure other things do. What should
the correct thing be? It looks like EWOULDBLOCK is the correct value
to return for F_SETFL failing, not EDEADLK.

What do those-who-know-POSIX-standards-better-than-I think?

Thanks!
Comment 1 Adrian Chadd freebsd_committer 2014-07-04 01:46:32 UTC
Here's what I'm trying with:

adrian@test3:~/work/freebsd % svn diff stable/10/src/sys/kern/
Index: stable/10/src/sys/kern/kern_lockf.c
===================================================================
--- stable/10/src/sys/kern/kern_lockf.c (revision 267627)
+++ stable/10/src/sys/kern/kern_lockf.c (working copy)
@@ -1425,6 +1425,14 @@
                        if (lockf_debug & 1)
                                lf_print("lf_setlock: deadlock", lock);
 #endif
+
+                       /*
+                        * If the lock isn't waiting, return EAGAIN
+                        * rather than EDEADLK.
+                        */
+                       if (((lock->lf_flags & F_WAIT) == 0) &&
+                           (error == EDEADLK))
+                               error = EAGAIN;
                        lf_free_lock(lock);
                        goto out;
                }

.. this seems to fix the problem.
Comment 2 Adrian Chadd freebsd_committer 2014-07-04 02:18:02 UTC
.. and I mean F_SETLK.
Comment 3 Jilles Tjoelker freebsd_committer 2014-07-04 22:41:20 UTC
You are right that fcntl(F_SETLK) should not fail with [EDEADLK]; this error is only defined for fcntl(F_SETLKW). A non-blocking locking attempt cannot deadlock.

The proposed patch looks wrong, though. The above if ((lock->lf_flags & F_WAIT) == 0 && lock->lf_async_task == NULL) should already catch this case. Perhaps lf_async_task is set incorrectly?
Comment 4 Adrian Chadd freebsd_committer 2014-07-05 02:27:46 UTC
I'm still verifying it. I've added some printf()s to see what triggers and where.

Is it possible that the initial check will succeed but something will sneak in between that check and actually trying to add the lock? The actual sx lock isn't held for the duration of all of that.

I admit I'm not familiar at all in this code.