Bug 213611 - ports-mgmt/pkg: 1.9.1 pkgdb locking doesn't work on NFS
Summary: ports-mgmt/pkg: 1.9.1 pkgdb locking doesn't work on NFS
Status: Closed FIXED
Alias: None
Product: Ports & Packages
Classification: Unclassified
Component: Individual Port(s) (show other bugs)
Version: Latest
Hardware: Any Any
: --- Affects Some People
Assignee: freebsd-pkg (Nobody)
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-19 13:16 UTC by Markus Wennrich
Modified: 2018-11-22 22:06 UTC (History)
11 users (show)

See Also:
bugzilla: maintainer-feedback? (pkg)


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Wennrich 2016-10-19 13:16:51 UTC
pkg info fails with following error, if /var/db/pkg is on NFS:

pkg: sqlite error while executing iterator in file pkgdb_iterator.c:1033: database is locked

ktrace:

 61912 pkg      CALL  setgroups(0x1,0x802780abc)
 61912 pkg      RET   setgroups 0
 61912 pkg      CALL  setegid(0xfffe)
 61912 pkg      RET   setegid 0
 61912 pkg      CALL  setgid(0xfffe)
 61912 pkg      RET   setgid 0
 61912 pkg      CALL  seteuid(0xfffe)
 61912 pkg      RET   seteuid 0
 61912 pkg      CALL  setuid(0xfffe)
 61912 pkg      RET   setuid 0
 61912 pkg      CALL  mkdir(0x80301c2e8,0x1ff<S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IWGRP|S_IXGRP|S_IROTH|S_IWOTH|S_IXOTH>)
 61912 pkg      NAMI  "/var/db/pkg/local.sqlite.lock"
 61912 pkg      RET   mkdir -1 errno 13 Permission denied

==> privileges get dropped and only afterwards mkdir /var/db/pkg/local.sqlite.lock is tried, which has to fail as user "nobody".



Verified with "chmod 777 /var/db/pkg/"

   drwxr-xr-x   2 nobody  nobody    4096 Oct 19 12:50 local.sqlite.lock



From src/info.c:

322
323         drop_privileges();
324         if (pkgdb_obtain_lock(db, PKGDB_LOCK_READONLY) != EPKG_OK) {
325                 pkgdb_close(db);

Probably wrong order of drop_privileges and pkgdb_obtain_lock?


This happens only on NFS because of this in src/pkgdb.c:

1026  /*
1027   * Fall back on unix-dotfile locking strategy if on a network filesystem
1028   */

(...)
1034   sqlite3_vfs_register(sqlite3_vfs_find("unix-dotfile"), 1);


(Our quick workaround was to comment out the whole "fall back on unix-dotfile on NFS" part. Probably not the best solution ...)
Comment 1 Baptiste Daroussin freebsd_committer freebsd_triage 2016-10-19 16:03:07 UTC
Your workaround is a good one if you have setup correctly nfs locking. The reason for the fallback is because most users do not.

I will fix asap.
I bet pkg audit has the same issue
Comment 2 Baptiste Daroussin freebsd_committer freebsd_triage 2016-12-08 09:30:50 UTC
Should be fixed, can you confirm? (in 1.9.4)
Comment 3 dornergd 2016-12-14 08:40:10 UTC
I have tested pkg 1.9.4 and still get the same error in combination with NFS:
"pkg: sqlite error while executing iterator in file pkgdb_iterator.c:1033: database is locked.

To get more information I activated the debug output:
# pkg -dddd info 
DBG(1)[22647]> pkg initialized
DBG(4)[22647]> Pkgdb: running 'PRAGMA user_version;'
DBG(4)[22647]> Pkgdb: executing 'PRAGMA foreign_keys = ON;'
DBG(4)[22647]> Pkgdb: preparing statement 'INSERT OR IGNORE INTO mtree(content) VALUES(?1)'
...
DBG(4)[22647]> Pkgdb: preparing statement 'INSERT OR IGNORE INTO requires(require) VALUES(?1)'
DBG(4)[22647]> Pkgdb: running 'SELECT id, origin, name, name as uniqueid, version, comment, desc, message, arch, maintainer, www, prefix, flatsize, licenselogic, automatic, locked, time, manifestdigest, vital FROM packages AS p ORDER BY p.name;'
pkg: sqlite error while executing iterator in file pkgdb_iterator.c:1033: database is locked

But when I start "pkg shell" and run the "SELECT"-statement there, I get a list
of all my installed packages,

Additional I found out that this problem exists since version 1.9.0.
Comment 4 Rodney W. Grimes freebsd_committer freebsd_triage 2017-02-21 16:43:32 UTC
I am still seeing this bug with pkg 1.9.4 on aarch64/rpi3.
I well attempt to verify that it still exists on amd64/i386.
Comment 5 Wolfgang Meyer 2017-03-02 12:06:29 UTC
The problem is still present with 1.10.0
Comment 6 rz-rpi03 2017-03-03 13:30:30 UTC
I am investigating the same problem.
NFS lockd and statd are up an running.

As far as I can tell sqlite uses the directory local.sqlite.lock in /var/db/pkg
for dotlocking /var/db/pkg/local.sqlite. But it never deletes it _there_.

Doing "pkg-static info" on an unlocked database shows the expected output
but leaves local.sqlite.lock in /var/db/pkg.

Next time you call pkg or pkg-static you will get the "database is locked"
error unless you are fast enough to remove /var/db/pkg/local.sqlite while
pkg is running.

The following part of ktrace/kdump shows the locking is done with a relative
path. So it happens in /var/db/pkg.
But the unlocking is done with a absolute path. An this path does not match
the relative one used for locking.

 95575 pkg-static CALL  stat(0xbfbfe150,0xbfbfe0f0)
 95575 pkg-static NAMI  "/var/db/pkg"
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913835, mode=040755, nlink=2, uid=0, gid=0, rdev=0, atime=1488515387.614408874, mtime=1488543581.397913666, ctime=1488543581.397913666, birthtime=-1, size=4096, blksize=4096, blocks=8, flags=0x0 }
 95575 pkg-static RET   stat 0
 95575 pkg-static CALL  eaccess(0xbfbfe150,0x4<R_OK>)
 95575 pkg-static NAMI  "/var/db/pkg"
 95575 pkg-static RET   eaccess 0
 95575 pkg-static CALL  stat(0xbfbfe150,0xbfbfe0f0)
 95575 pkg-static NAMI  "/var/db/pkg/local.sqlite"
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913839, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1488508880.607008273, mtime=1488508814.956928131, ctime=1488508814.956928131, birthtime=-1, size=64334848, blksize=4096, blocks=125672, flags=0x0 }
 95575 pkg-static RET   stat 0
 95575 pkg-static CALL  eaccess(0xbfbfe150,0x4<R_OK>)
 95575 pkg-static NAMI  "/var/db/pkg/local.sqlite"
 95575 pkg-static RET   eaccess 0
 95575 pkg-static CALL  openat(AT_FDCWD,0x2880f0e0,0x120000<O_RDONLY|O_DIRECTORY|O_CLOEXEC>)
 95575 pkg-static NAMI  "/var/db/pkg"
 95575 pkg-static RET   openat 4
 95575 pkg-static CALL  faccessat(0x4,0x83ecfb0,0x4<R_OK>,0x100)
 95575 pkg-static NAMI  "local.sqlite"
 95575 pkg-static RET   faccessat 0
 95575 pkg-static CALL  statfs(0x2880f0e0,0xbfbfe358)
 95575 pkg-static NAMI  "/var/db/pkg"
 95575 pkg-static RET   statfs 0
 95575 pkg-static CALL  lstat(0x2881f728,0xbfbfe390)
 95575 pkg-static NAMI  "local.sqlite"
 95575 pkg-static RET   lstat -1 errno 2 No such file or directory
 95575 pkg-static CALL  __getcwd(0x28848008,0x1ff)
 95575 pkg-static NAMI  "/root"
 95575 pkg-static RET   __getcwd 0
 95575 pkg-static CALL  getpid
 95575 pkg-static RET   getpid 95575/0x17557
 95575 pkg-static CALL  getpid
 95575 pkg-static RET   getpid 95575/0x17557
 95575 pkg-static CALL  fstatat(0x4,0x2884d206,0xbfbfdfe0,0)
 95575 pkg-static NAMI  "local.sqlite"
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913839, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1488508880.607008273, mtime=1488508814.956928131, ctime=1488508814.956928131, birthtime=-1, size=64334848, blksize=4096, blocks=125672, flags=0x0 }
 95575 pkg-static RET   fstatat 0
 95575 pkg-static CALL  openat(0x4,0x2884d206,0x100202<O_RDWR|O_CREAT|O_CLOEXEC>,0644<S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH>)
 95575 pkg-static NAMI  "local.sqlite"
 95575 pkg-static RET   openat 5
 95575 pkg-static CALL  fstat(0x5,0xbfbfde88)
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913839, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1488508880.607008273, mtime=1488508814.956928131, ctime=1488508814.956928131, birthtime=-1, size=64334848, blksize=4096, blocks=125672, flags=0x0 }
 95575 pkg-static RET   fstat 0
 95575 pkg-static CALL  pread(0x5,0xbfbfe460,0x64,0,0)
 95575 pkg-static GIO   fd 5 read 100 bytes
       0x0000 5351 4c69 7465 2066 6f72 6d61 7420 3300 0400 0101 0040 2020 0007 6598 0000 f56b 0000 766c 0000 1565 0000 1e32  |SQLite format 3......@  ..e....k..vl...e...2|
[bytes are skipped]
 95575 pkg-static RET   pread 100/0x64
[lock the database]
 95575 pkg-static CALL  mkdirat(0x4,0x2881f76e,0777<S_IRUSR|S_IWUSR|S_IXUSR|S_IRGRP|S_IWGRP|S_IXGRP|S_IROTH|S_IWOTH|S_IXOTH>)
 95575 pkg-static NAMI  "local.sqlite.lock"
 95575 pkg-static RET   mkdirat 0
 95575 pkg-static CALL  fstatat(0x4,0x2884d21a,0xbfbfe058,0)
 95575 pkg-static NAMI  "local.sqlite-journal"
 95575 pkg-static RET   fstatat -1 errno 2 No such file or directory
 95575 pkg-static CALL  fstat(0x5,0xbfbfe058)
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913839, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1488543655.118915624, mtime=1488508814.956928131, ctime=1488508814.956928131, birthtime=-1, size=64334848, blksize=4096, blocks=125672, flags=0x0 }
 95575 pkg-static RET   fstat 0
 95575 pkg-static CALL  fstatat(0x4,0x2884d235,0xbfbfe058,0)
 95575 pkg-static NAMI  "local.sqlite-wal"
 95575 pkg-static RET   fstatat -1 errno 2 No such file or directory
 95575 pkg-static CALL  fstat(0x5,0xbfbfe058)
 95575 pkg-static STRU  struct stat {dev=973143811, ino=913839, mode=0100644, nlink=1, uid=0, gid=0, rdev=0, atime=1488543655.118915624, mtime=1488508814.956928131, ctime=1488508814.956928131, birthtime=-1, size=64334848, blksize=4096, blocks=125672, flags=0x0 }
 95575 pkg-static RET   fstat 0
 95575 pkg-static CALL  pread(0x5,0x2889ce28,0x400,0,0)
 95575 pkg-static GIO   fd 5 read 1024 bytes
       0x0000 5351 4c69 7465 2066 6f72 6d61 7420 3300 0400 0101 0040 2020 0007 6598 0000 f56b 0000 766c 0000 1565 0000 1e32  |SQLite format 3......@  ..e....k..vl...e...2|
[bytes are skipped]
 95575 pkg-static RET   pread 1024/0x400
[unlock the database]
 95575 pkg-static CALL  rmdir(0x2881f768)
 95575 pkg-static NAMI  "/root/local.sqlite.lock"
 95575 pkg-static RET   rmdir -1 errno 2 No such file or directory
 95575 pkg-static CALL  utimes(0x2881f768,0)
 95575 pkg-static NAMI  "/root/local.sqlite.lock"
 95575 pkg-static RET   utimes -1 errno 2 No such file or directory
 95575 pkg-static CALL  fstatat(0x4,0x2884d21a,0xbfbfdf08,0)
 95575 pkg-static NAMI  "local.sqlite-journal"
 95575 pkg-static RET   fstatat -1 errno 2 No such file or directory
 95575 pkg-static CALL  fstat(0x5,0xbfbfdf08)


Ralf
Comment 7 Wolfgang Meyer 2017-03-30 09:09:32 UTC
I am still seeing this bug with pkg 1.10.1.
Comment 8 Markus Wennrich 2017-05-03 11:28:44 UTC
Just saw in the commit https://github.com/freebsd/pkg/commit/6ed10aa80256b340b4e5c422438c435e7aa70c45 that you have to set "NFS_WITH_PROPER_LOCKING" for the fix to work.


    root@server:/ # pkg info
    pkg: sqlite error while executing PRAGMA user_version; in file pkgdb.c:2442: database is locked

    root@server:/ # echo "NFS_WITH_PROPER_LOCKING = true;" >> /usr/local/etc/pkg.conf

    root@server:/ # pkg info
    autoconf-2.69_1                Automatically configure source code on many Un*x platforms
    (...)

==> I can confirm, that with "NFS_WITH_PROPER_LOCKING" set, pkg locking on NFS works as expected. 

But honestly, I don't see a reason for keeping the unix-dotfile code, since it doesn't work with drop_privileges. As far as I can see, everbody who runs into the unix-dotfile part has to set NFS_WITH_PROPER_LOCKING anyway. No?
Comment 9 Walter Schwarzenfeld freebsd_triage 2018-02-07 15:09:14 UTC
Feedback from reporter please.
Comment 10 Markus Wennrich 2018-02-09 07:03:39 UTC
"fixed" with echo "NFS_WITH_PROPER_LOCKING = true;" >> /usr/local/etc/pkg.conf
Comment 11 Markus Wennrich 2018-02-09 07:03:57 UTC
"fixed" with echo "NFS_WITH_PROPER_LOCKING = true;" >> /usr/local/etc/pkg.conf
Comment 12 Michael Dexter 2018-02-12 19:53:02 UTC
I am seeing what might be a related issue with pkg 1.10.3 with an NFS booted system.

Scenario:

Populate the client's /var/cache/pkg with a known-good set of packages, including pkg itself.

pkg-static add -f /var/cache/pkg/pkg-1.10.3_1.txz

pkg-static: sqlite error while executing PRAGMA user_version; in file pkgdb.c:2446: disk I/O error

I have NFS_WITH_PROPER_LOCKING = "true"; in /usr/local/etc/pkg.conf

I have lockd running on the NFS server.

Adding a tmpfs mount to /var/db/pkg allows it to work exactly as expected.

Any suggestions?
Comment 13 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2018-11-22 22:06:45 UTC
(this is high on Google search results page, so updating with the lates state of the problem)

NFS_WITH_PROPER_LOCKING was removed: https://github.com/freebsd/pkg/commit/655ddffae6ae4adc2820736a33736a99dea8dfc3

I was able to fix NFS locking problem by enabling lockd on *both* server and client:

sysrc rpc_lockd_enable=YES
service lockd start