Bug 219124 - /var/db/services.db is extremely large for what it does
Status: New
Alias: None
Product: Base System
Classification: Unclassified
Component: misc
Version: CURRENT
Hardware: Any Any
Importance: --- Affects Many People
Assignee: freebsd-bugs mailing list
URL: https://reviews.freebsd.org/D9655
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-07 16:54 UTC by Sean Bruno
Modified: 2017-05-15 18:05 UTC

See Also:


Description Sean Bruno freebsd_committer 2017-05-07 16:54:30 UTC
% ls -l /var/db/services.db 
-rw-r--r--  1 root  wheel  2097920 Jan 29 04:05 /var/db/services.db


I've noted this in various places before, but there is no way that this file should be 2MB.

I've added a review to exclude installation of /var/db/services.db for use with small installs, but I think the "answer" is to investigate what exactly is causing this file to explode in size.
Comment 1 Conrad Meyer freebsd_committer 2017-05-07 17:09:12 UTC
87% of the file is zero bytes.

>>> f = open("/var/db/services.db", "rb").read()  # read raw bytes
>>> zb = 0
>>> for b in f:
...   if b == 0: zb += 1  # iterating over bytes yields integers
...
>>> print(zb)
1836320

$ ls -l /var/db/services.db
-rw-r--r--  1 root  wheel  2097920 Jan 16 15:16 /var/db/services.db

1836320 / 2097920 = 0.87530506
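
The same measurement can be repeated on any file without loading it into memory all at once. This is only a sketch: the function name count_zero_bytes and the chunk size are illustrative choices, not part of any existing tool.

```python
def count_zero_bytes(path, chunk_size=1 << 16):
    """Return (zero_bytes, total_bytes) for the file at `path`."""
    zeros = 0
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # bytes.count() counts occurrences of the NUL byte in this chunk.
            zeros += chunk.count(b"\x00")
            total += len(chunk)
    return zeros, total

# Example use (path taken from this report):
#   zeros, total = count_zero_bytes("/var/db/services.db")
#   print(zeros, total, zeros / total)
```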
Comment 2 Conrad Meyer freebsd_committer 2017-05-07 17:17:30 UTC
As for whether a Berkeley DB file can be VACUUMed: maybe?

http://stackoverflow.com/questions/8722687/berkeley-db-file-compression says yes for 5.x.

But I don't think 5.x is what we ship in base (libc).  There is a port: db5.
Comment 3 Sean Bruno freebsd_committer 2017-05-15 17:40:46 UTC
For entertainment, I ran services_mkdb against a services file with only 1 entry in it.  The file was still 2MB!
Comment 4 Sean Bruno freebsd_committer 2017-05-15 18:05:30 UTC
I'm guessing that this is an initialization problem.  The hash is being set up to handle way more elements than are really needed here.

% wc -l /etc/services 
    2495 /etc/services


HASHINFO hinfo = {
        .bsize = 256,
        .ffactor = 4,
        .nelem = 32768,
        .cachesize = 1024,
        .hash = NULL,
        .lorder = 0
};



If I change the HASHINFO to be slightly less over-engineered (and less future-proof), I can get the *empty* services file down to 260k, but that's not really a huge improvement for a basically empty file.  Should it be that big?  I didn't really think I was going to have to go and learn Berkeley DB this week.  :-)


Index: services_mkdb.c
===================================================================
--- services_mkdb.c     (revision 318297)
+++ services_mkdb.c     (working copy)
@@ -68,10 +68,10 @@
 static void    usage(void);
 
 HASHINFO hinfo = {
-       .bsize = 256,
-       .ffactor = 4,
-       .nelem = 32768,
-       .cachesize = 1024,
+       .bsize = 48,
+       .ffactor = 1,
+       .nelem = 4096,
+       .cachesize = 256,
        .hash = NULL,
        .lorder = 0
 };


-rw-r--r--  1 sbruno  sbruno  262720 May 15 12:04 services.db
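
Both file sizes seen in this thread look consistent with the hash backend preallocating its bucket pages up front. The sketch below is a back-of-the-envelope estimate, not the libc implementation: it assumes the initial bucket count is nelem / ffactor rounded up to a power of two, and that bsize is also rounded up to a power of two. The names next_pow2 and estimate_initial_size are hypothetical helpers, and header and bitmap pages are ignored.

```python
def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def estimate_initial_size(bsize, ffactor, nelem):
    """Rough size in bytes of the preallocated bucket pages.

    Assumption (not taken from the source): buckets = nelem / ffactor,
    rounded up to a power of two, and each bucket occupies one page of
    bsize bytes, also rounded up to a power of two.
    """
    buckets = next_pow2(max(-(-nelem // ffactor), 2))  # ceiling division
    page = next_pow2(bsize)
    return buckets * page

# Original settings: 8192 buckets * 256 bytes
print(estimate_initial_size(256, 4, 32768))  # 2097152
# Patched settings: 4096 buckets * 64 bytes (48 rounded up)
print(estimate_initial_size(48, 1, 4096))    # 262144
```

Under these assumptions the estimates fall 768 and 576 bytes short of the observed 2097920 and 262720 bytes, which would leave only a small remainder for header pages. If the assumption holds, the file size is set by nelem, ffactor and bsize rather than by the number of entries, which would also explain why a one-entry services file still produces a 2MB database.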