Bug 193705

Summary:   www/squid: transparent pf didn't work, reflect that AUFS is actually threaded now, move out diskd and rock (both unstable)
Product:   Ports & Packages
Reporter:  Volodymyr Kostyrko <arcade>
Component: Individual Port(s)
Assignee:  Guido Falsi <madpilot>
Status:    Closed FIXED
Severity:  Affects Some People
CC:        freebsd, kib, madpilot, marino, takefu, timp87
Priority:  ---
Version:   Latest
Hardware:  Any
OS:        Any

Attachments:
  squid.diff   (flags: none)
  squid2.diff  (flags: none)

Description Volodymyr Kostyrko 2014-09-17 11:11:21 UTC
Created attachment 147394 [details]
squid.diff

1. For transparent pf proxying, the `with-nat-devpf` configure option should also be enabled. One thing I missed in the patch is that the documentation is wrong: http_port should be marked as `transparent`, not `intercept`.

2. AUFS is asynchronous in name only; it actually relies on threading. http://etutorials.org/Server+Administration/Squid.+The+definitive+guide/Chapter+8.+Advanced+Disk+Cache+Topics/8.4+The+aufs+Storage+Scheme/ I know this sounds weird, but you can check it: enabling aufs pulls in pthreads (see the quick check at the end of this description).

3. I previously tested DiskD and Rock on 3.3 and haven't noticed any big changes since. It's probably not quite right to change the default by disabling Rock, but I can't get it working right now.

BONUS: I can't figure out why, but squid doesn't stop on `service squid stop`; only the child exits, while the master stays and spawns another child.
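
As a quick check for point 2, look at the library dependencies of the built binary (a sketch; the path assumes the port's default PREFIX of /usr/local):

# ldd /usr/local/sbin/squid | grep thr
        libthr.so.3 => /lib/libthr.so.3 (0x...)

With AUFS enabled at build time the FreeBSD threading library shows up in the dependencies; without it, it doesn't.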
Comment 1 John Marino freebsd_committer 2014-09-17 16:17:19 UTC
timp87, takefu: Do you concur with this patch?
Comment 2 Volodymyr Kostyrko 2014-09-17 16:34:23 UTC
Actually I'm thinking about disabling FS_DISKD by default too. I enabled it briefly on one of my hosts and within the next hour I started getting this:

2014/09/17 15:42:50 kid1| storeDiskdSend: msgsnd: (35) Resource temporarily unavailable
2014/09/17 15:42:50 kid1| storeDiskdSend: msgsnd: (35) Resource temporarily unavailable
2014/09/17 15:42:50 kid1| storeDiskdSend CLOSE: (35) Resource temporarily unavailable
2014/09/17 15:42:50 kid1| assertion failed: ufs/UFSStoreState.cc:76: "!(closing ||opening)"
2014/09/17 15:42:53| assertion failed: Kid.cc:43: "cpid > 0"

After this squid restarts and spawns another diskd process, with the previous diskd hanging in memory.
Comment 3 timp87 2014-09-17 17:08:23 UTC
arcade, did you add any parameters to /boot/loader.conf when you used diskd?
Comment 4 timp87 2014-09-17 17:35:25 UTC
(In reply to timp87 from comment #3)
> arcade, did you add any parameters to /boot/loader.conf when you used diskd?

See http://wiki.squid-cache.org/Features/DiskDaemon. The kernel options described there can be set via the corresponding sysctls in /boot/loader.conf.
Comment 5 timp87 2014-09-17 17:41:43 UTC
(In reply to arcade from comment #0)
> Created attachment 147394 [details]
> squid.diff
> 
> 1. For transparent pf proxying, the `with-nat-devpf` configure option should
> also be enabled. One thing I missed in the patch is that the documentation
> is wrong: http_port should be marked as `transparent`, not `intercept`.

I've no opinion here, I don't use pf at all, sorry.

> 
> 2. AUFS is asynchronous in name only; it actually relies on threading.
> http://etutorials.org/Server+Administration/Squid.+The+definitive+guide/
> Chapter+8.+Advanced+Disk+Cache+Topics/8.4+The+aufs+Storage+Scheme/ I know
> this sounds weird, but you can check it: enabling aufs pulls in pthreads.

Agree, you're right.

> 3. I previously tested DiskD and Rock on 3.3 and haven't noticed any big
> changes since.

I think it depends on how many users you have and what kind of load they generate.

> It's probably not quite right to change the default by
> disabling Rock, but I can't get it working right now.

I didn't try rock. I think most people use traditional ufs. IMO it's not a problem to disable it by default.

> BONUS: I can't figure out why, but squid doesn't stop on `service squid stop`; only the child exits, while the master stays and spawns another child.

Did you use diskd then?
Comment 6 Volodymyr Kostyrko 2014-09-17 20:38:37 UTC
(In reply to timp87 from comment #5)
> > 1. For transparent pf proxying, the `with-nat-devpf` configure option should
> > also be enabled. One thing I missed in the patch is that the documentation
> > is wrong: http_port should be marked as `transparent`, not `intercept`.
> 
> I've no opinion here, I don't use pf at all, sorry.

For transparent proxying squid requires extra data about where the connection was originally headed before it was redirected at the packet filter level. For PF this means squid needs access to /dev/pf (personally I do this by giving the network group rw rights on /dev/pf and adding the squid user to that group). This doesn't work if squid has no code to open /dev/pf (hence `with-nat-devpf`).
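
That setup looks roughly like this (a sketch of my arrangement; the network group already exists in the base system):

# /etc/devfs.conf -- make /dev/pf group-accessible at boot
own     pf      root:network
perm    pf      0660

# one-time: add the squid user to the group
pw groupmod network -m squid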

As for transparent/intercept: historically squid started with `transparent`, and a few versions ago (between 3.1 and 3.3) it began to refuse that option while working correctly with `intercept`. Right now in 3.4 I see that incoming requests are processed as if targeted at localhost (no NAT info) when the incoming http_port is labeled `intercept`. Squid also refuses to accept direct connections on an `intercept` port.
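
For completeness, the combination that works for me looks roughly like this (example port and interface macro; adjust to your network):

# squid.conf -- the port receiving pf-redirected traffic
http_port 3128 transparent

# pf.conf -- send clients' web traffic to squid on this box
rdr on $int_if inet proto tcp from any to any port 80 -> 127.0.0.1 port 3128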

> > 3. I previously tested DiskD and Rock on 3.3 and haven't noticed any big
> > changes since.
> I think it depends on how many users you have and what kind of load they
> generate.

Me plus 2 phones over wi-fi. I'd say meager load.

I checked the link you provided. Indeed, I hadn't changed my IPC configuration. I altered it this way:

kern.ipc.msgmnb=8192 # was 2k
kern.ipc.msgssz=64   # was 8
kern.ipc.msgtql=2048 # was 40
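
The new values can be verified after a reboot:

# sysctl kern.ipc.msgmnb kern.ipc.msgssz kern.ipc.msgtql
kern.ipc.msgmnb: 8192
kern.ipc.msgssz: 64
kern.ipc.msgtql: 2048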

I'll report after some testing...

> > It's probably not quite right to change the default by
> > disabling Rock, but I can't get it working right now.
> 
> I didn't try rock. I think most people use traditional ufs. IMO it's not a
> problem to disable it by default.

It just fails for me from the start.

cache_dir ufs /var/squid/cache 65536 16 256 min-size=65536
cache_dir rock /var/squid/cacher 8192 max-size=65535

# squid -z
...
2014/09/17 23:09:00 kid2| Set Current Directory to /var/squid/
2014/09/17 23:09:00 kid2| Creating missing swap directories
2014/09/17 23:09:00 kid2| Skipping existing Rock db: /var/squid/cacher/rock
...

# service squid start

After that, the process that works with the rock storage constantly dies:

2014/09/17 23:16:24 kid2| Squid plugin modules loaded: 0
2014/09/17 23:16:24 kid2| Loading cache_dir #1 from /var/squid/cacher/rock
2014/09/17 23:16:24 kid2| Store rebuilding is 0.10% complete
...
FATAL: kid2 registration timed out
Squid Cache (Version 3.4.7): Terminated abnormally.
CPU Usage: 4.559 seconds = 0.323 user + 4.236 sys
Maximum Resident Size: 167072 KB
Page faults with physical i/o: 0

and then:

FATAL: kid1 registration timed out
Squid Cache (Version 3.4.7): Terminated abnormally.
CPU Usage: 0.147 seconds = 0.057 user + 0.090 sys
Maximum Resident Size: 169920 KB
Page faults with physical i/o: 0

So they keep dying one after another and that's all...

> > BONUS: I can't figure out why, but squid doesn't stop on `service squid stop`;
> > only the child exits, while the master stays and spawns another child.
> Did you use diskd then?

No. I used log_file_daemon, but turning it off doesn't help. It looks like this:

2014/09/17 23:24:36 kid1| storeLateRelease: released 0 objects
2014/09/17 23:24:45| Set Current Directory to /var/squid/
FATAL: Received Segment Violation...dying.
2014/09/17 23:24:45 kid1| Closing HTTP port [::]:8080
2014/09/17 23:24:45 kid1| Closing HTTP port [::]:3128
2014/09/17 23:24:45 kid1| storeDirWriteCleanLogs: Starting...
2014/09/17 23:24:45 kid1|   Finished.  Wrote 4197 entries.
2014/09/17 23:24:45 kid1|   Took 0.00 seconds (3211170.62 entries/sec).
CPU Usage: 0.125 seconds = 0.050 user + 0.075 sys
Maximum Resident Size: 167344 KB
Page faults with physical i/o: 0

Sep 17 23:24:35 limbo squid[33811]: Squid Parent: (squid-1) process 33814 started
Sep 17 23:24:46 limbo squid[33811]: Squid Parent: (squid-1) process 33814 exited due to signal 6 with status 0
Sep 17 23:24:46 limbo kernel: pid 33814 (squid), uid 100: exited on signal 6 (core dumped)
Sep 17 23:24:49 limbo squid[33811]: Squid Parent: (squid-1) process 33836 started

So kid1 is simply restarted.

A backtrace on the core shows this:

#8  <signal handler called>
No symbol table info available.
#9  0x00000008037fbbda in kevent () from /lib/libc.so.7
No symbol table info available.
#10 0x000000000067076d in Comm::DoSelect ()
No symbol table info available.
#11 0x000000000062157e in CommSelectEngine::checkEvents ()
No symbol table info available.
#12 0x000000000050e6f7 in EventLoop::checkEngine ()
No symbol table info available.
#13 0x000000000050ea6c in EventLoop::runOnce ()
No symbol table info available.
#14 0x000000000050e998 in EventLoop::run ()
No symbol table info available.
#15 0x000000000056d3af in SquidMain ()
No symbol table info available.
#16 0x000000000056b92c in main ()
No symbol table info available.
Comment 7 timp87 2014-09-18 07:11:47 UTC
(In reply to arcade from comment #6)
> (In reply to timp87 from comment #5)
> > > 1. For transparent pf proxying, the `with-nat-devpf` configure option should
> > > also be enabled. One thing I missed in the patch is that the documentation
> > > is wrong: http_port should be marked as `transparent`, not `intercept`.
> > 
> > I've no opinion here, I don't use pf at all, sorry.
> 
> For transparent proxying squid requires extra data about where the
> connection was originally headed before it was redirected at the packet
> filter level. For PF this means squid needs access to /dev/pf (personally I
> do this by giving the network group rw rights on /dev/pf and adding the
> squid user to that group). This doesn't work if squid has no code to open
> /dev/pf (hence `with-nat-devpf`).
> 
> As for transparent/intercept: historically squid started with `transparent`,
> and a few versions ago (between 3.1 and 3.3) it began to refuse that option
> while working correctly with `intercept`. Right now in 3.4 I see that
> incoming requests are processed as if targeted at localhost (no NAT info)
> when the incoming http_port is labeled `intercept`. Squid also refuses to
> accept direct connections on an `intercept` port.

I believe you have more experience with pf and squid.

> 
> > > 3. I previously tested DiskD and Rock on 3.3 and haven't noticed any big
> > > changes since.
> > I think it depends on how many users you have and what kind of load they
> > generate.
> 
> Me plus 2 phones over wi-fi. I'd say meager load.
> 
> I checked the link you provided. Indeed, I hadn't changed my IPC configuration.
> I altered it this way:
> 
> kern.ipc.msgmnb=8192 # was 2k
> kern.ipc.msgssz=64   # was 8
> kern.ipc.msgtql=2048 # was 40
> 
> I'll report after some testing...


 
> > > It's probably not quite right to change the default by
> > > disabling Rock, but I can't get it working right now.
> > 
> > I didn't try rock. I think most people use traditional ufs. IMO it's not a
> > problem to disable it by default.
> 
> It just fails for me from the start.
> 
> cache_dir ufs /var/squid/cache 65536 16 256 min-size=65536
> cache_dir rock /var/squid/cacher 8192 max-size=65535
> 
> # squid -z
> ...
> 2014/09/17 23:09:00 kid2| Set Current Directory to /var/squid/
> 2014/09/17 23:09:00 kid2| Creating missing swap directories
> 2014/09/17 23:09:00 kid2| Skipping existing Rock db: /var/squid/cacher/rock
> ...
> 
> # service squid start
> 
> After that, the process that works with the rock storage constantly dies:
> 
> 2014/09/17 23:16:24 kid2| Squid plugin modules loaded: 0
> 2014/09/17 23:16:24 kid2| Loading cache_dir #1 from /var/squid/cacher/rock
> 2014/09/17 23:16:24 kid2| Store rebuilding is 0.10% complete
> ...
> FATAL: kid2 registration timed out
> Squid Cache (Version 3.4.7): Terminated abnormally.
> CPU Usage: 4.559 seconds = 0.323 user + 4.236 sys
> Maximum Resident Size: 167072 KB
> Page faults with physical i/o: 0
> 
> and then:
> 
> FATAL: kid1 registration timed out
> Squid Cache (Version 3.4.7): Terminated abnormally.
> CPU Usage: 0.147 seconds = 0.057 user + 0.090 sys
> Maximum Resident Size: 169920 KB
> Page faults with physical i/o: 0
> 
> So they keep dying one after another and that's all...

Same behaviour if there is only one storage scheme (rock) in the config? I mean no ufs, only rock.


> > > BONUS: I can't figure out why, but squid doesn't stop on `service squid stop`;
> > > only the child exits, while the master stays and spawns another child.
> > Did you use diskd then?
> 
> No. I used log_file_daemon, but turning it off doesn't help. It looks like this:
> 
> 2014/09/17 23:24:36 kid1| storeLateRelease: released 0 objects
> 2014/09/17 23:24:45| Set Current Directory to /var/squid/
> FATAL: Received Segment Violation...dying.
> 2014/09/17 23:24:45 kid1| Closing HTTP port [::]:8080
> 2014/09/17 23:24:45 kid1| Closing HTTP port [::]:3128
> 2014/09/17 23:24:45 kid1| storeDirWriteCleanLogs: Starting...
> 2014/09/17 23:24:45 kid1|   Finished.  Wrote 4197 entries.
> 2014/09/17 23:24:45 kid1|   Took 0.00 seconds (3211170.62 entries/sec).
> CPU Usage: 0.125 seconds = 0.050 user + 0.075 sys
> Maximum Resident Size: 167344 KB
> Page faults with physical i/o: 0
> 
> Sep 17 23:24:35 limbo squid[33811]: Squid Parent: (squid-1) process 33814
> started
> Sep 17 23:24:46 limbo squid[33811]: Squid Parent: (squid-1) process 33814
> exited due to signal 6 with status 0
> Sep 17 23:24:46 limbo kernel: pid 33814 (squid), uid 100: exited on signal 6
> (core dumped)
> Sep 17 23:24:49 limbo squid[33811]: Squid Parent: (squid-1) process 33836
> started
> 
> So kid1 is simply restarted.
> 
> A backtrace on the core shows this:
> 
> #8  <signal handler called>
> No symbol table info available.
> #9  0x00000008037fbbda in kevent () from /lib/libc.so.7
> No symbol table info available.
> #10 0x000000000067076d in Comm::DoSelect ()
> No symbol table info available.
> #11 0x000000000062157e in CommSelectEngine::checkEvents ()
> No symbol table info available.
> #12 0x000000000050e6f7 in EventLoop::checkEngine ()
> No symbol table info available.
> #13 0x000000000050ea6c in EventLoop::runOnce ()
> No symbol table info available.
> #14 0x000000000050e998 in EventLoop::run ()
> No symbol table info available.
> #15 0x000000000056d3af in SquidMain ()
> No symbol table info available.
> #16 0x000000000056b92c in main ()
> No symbol table info available.

I don't understand. Diskd is a storage scheme, but log_file_daemon is just a daemon that logs messages from squid (access.log). Anyway, I'm not strong in C++ and don't know how to deal with a backtrace, I'm sorry.
Comment 8 John Marino freebsd_committer 2014-09-18 07:16:10 UTC
I think all the backtrace shows is that the libraries have had their symbols stripped, which is the normal default behavior for libraries installed by ports.
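
If a more detailed trace is needed, rebuilding the port with debug symbols and re-reading the core usually provides one (a sketch, assuming the standard ports tree and PREFIX):

# cd /usr/ports/www/squid && make WITH_DEBUG=yes reinstall clean
# gdb /usr/local/sbin/squid /path/to/squid.core
(gdb) bt full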
Comment 9 timp87 2014-09-18 07:47:41 UTC
In general I don't mind the proposed fix.
Comment 10 timp87 2014-09-18 07:48:23 UTC
(In reply to timp87 from comment #9)
> In general I don't mind the proposed fix.

diff, not fix =)
Comment 11 Volodymyr Kostyrko 2014-09-18 12:22:40 UTC
(In reply to timp87 from comment #7)
> > > > 3. I previously tested DiskD and Rock on 3.3 and haven't noticed any big
> > > > changes since.
> > > I think it depends on how many users you have and what kind of load they
> > > generate.
> > 
> > Me plus 2 phones over wi-fi. I'd say meager load.
> > 
> > I checked the link you provided. Indeed, I hadn't changed my IPC configuration.
> > I altered it this way:
> > 
> > kern.ipc.msgmnb=8192 # was 2k
> > kern.ipc.msgssz=64   # was 8
> > kern.ipc.msgtql=2048 # was 40
> > 
> > I'll report after some testing...

Looks like it survived one night without any problems. So the actual source of the problems with diskd was just my misconfiguration.

> > So they keep dying one after another and that's all...
> 
> Same behaviour if there is only one storage scheme (rock) in the config? I
> mean no ufs, only rock.

Yes.

Spotted another variant:

FATAL: Rock cache_dir at /var/squid/cacher/rock failed to open db file: (40) Message too long
Squid Cache (Version 3.4.7): Terminated abnormally.
CPU Usage: 0.086 seconds = 0.047 user + 0.039 sys
Maximum Resident Size: 2622128 KB
Page faults with physical i/o: 0


> > > > BONUS: I can't figure out why, but squid doesn't stop on `service squid stop`;
> > > > only the child exits, while the master stays and spawns another child.
> > > Did you use diskd then?
> > 
> > No. I used log_file_daemon, but turning it off doesn't help. It looks like this:
>
> I don't understand. Diskd is a storage scheme, but log_file_daemon is just a
> daemon that logs messages from squid (access.log). Anyway, I'm not strong in
> C++ and don't know how to deal with a backtrace, I'm sorry.

It can fail without log_file_daemon too. I'm putting a better backtrace here in case someone would like to investigate:

#8  <signal handler called>
No symbol table info available.
#9  0x00000008037fcbda in kevent () from /lib/libc.so.7
No symbol table info available.
#10 0x00000000006707bd in Comm::DoSelect (msec=<value optimized out>) at ModKqueue.cc:264
        poll_time = {tv_sec = 0, tv_nsec = 623000000}
        num = <value optimized out>
#11 0x00000000006215be in CommSelectEngine::checkEvents (this=<value optimized out>, timeout=623) at comm.cc:2058
        last_timeout = 1411042331
#12 0x000000000050e727 in EventLoop::checkEngine (this=0x7fffffffe938, engine=0x7fffffffe918, primary=<value optimized out>) at EventLoop.cc:55
        requested_delay = 4
#13 0x000000000050ea9c in EventLoop::runOnce (this=0x7fffffffe938) at EventLoop.cc:129
        waitingEngine = <value optimized out>
#14 0x000000000050e9c8 in EventLoop::run (this=0x7fffffffe938) at EventLoop.cc:99
No locals.
#15 0x000000000056d3df in SquidMain (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1528
        mainLoop = {errcount = 0, last_loop = false, engines = {capacity = 16, count = 4, items = 0x8141a9980}, timeService = 0x7fffffffe910, primaryEngine = 0x7fffffffe918, loop_delay = 623, error = false, runOnceResult = false}
        signalEngine = {<AsyncEngine> = {_vptr$AsyncEngine = 0x6b2d30}, loop = @0x7fffffffe938}
        store_engine = {<AsyncEngine> = {_vptr$AsyncEngine = 0x6b4320}, <No data fields>}
        comm_engine = {<AsyncEngine> = {_vptr$AsyncEngine = 0x96c800}, <No data fields>}
        time_engine = {_vptr$TimeEngine = 0x6c1290}
#16 0x000000000056b95c in main (argc=<value optimized out>, argv=<value optimized out>) at main.cc:1260
No locals.
Comment 12 timp87 2014-09-18 12:28:31 UTC
Hi Dennis! I'm sorry to bother you! If you have time or are interested in this PR, you could help us. If not, just remove yourself from the CC list.
Comment 13 Guido Falsi freebsd_committer 2014-09-18 18:44:35 UTC
*** Bug 193752 has been marked as a duplicate of this bug. ***
Comment 14 Guido Falsi freebsd_committer 2014-09-18 18:47:32 UTC
(In reply to Guido Falsi from comment #13)
> *** Bug 193752 has been marked as a duplicate of this bug. ***

In relation to this, the TP_PF_CONFIGURE_WITH patch (only that one line) could be committed, if it fixes the pf problem.

Any objection if I do that?
Comment 15 John Marino freebsd_committer 2014-09-18 19:04:16 UTC
Guido, I don't mind if you want to reassign this PR to yourself and do what you think is best.  Due to $REALJOB, I haven't been following it very closely TBH.
Comment 16 Guido Falsi freebsd_committer 2014-09-18 20:23:44 UTC
I'm studying the other changes (apart from the pf one).

I have a few questions I need answered to choose how to deal with these matters:

Is diskd stable?

(It used to be a mess in the 2.7 and 3.2 days, but maybe it has been fixed.)

If it is stable and usable it can be "promoted" to a default as proposed.

Rock instead appears to be unstable, so I'd rather keep it marked as such and off by default.

Do you agree?
Comment 17 Volodymyr Kostyrko 2014-09-18 20:40:57 UTC
(In reply to Guido Falsi from comment #16)
> I'm studying the other changes (apart from the pf one).
> 
> I have a few questions I need answered to choose how to deal with these
> matters:
> 
> Is diskd stable?

I tested it only under a small load and yes, after tuning kern.ipc in /boot/loader.conf diskd works like a charm. The absence of a quick note in the installation message/squid.conf.documented probably makes everyone think that diskd should work as is. So I think it may have been mistakenly taken as unstable in previous versions too.
Comment 18 Guido Falsi freebsd_committer 2014-09-18 20:48:27 UTC
(In reply to arcade from comment #17)
> (In reply to Guido Falsi from comment #16)
> > Is diskd stable?
> 
> I tested it only under a small load and yes, after tuning kern.ipc in
> /boot/loader.conf diskd works like a charm. The absence of a quick note in
> the installation message/squid.conf.documented probably makes everyone think
> that diskd should work as is. So I think it may have been mistakenly taken
> as unstable in previous versions too.

I understand.

I think we should then add a quick note in pkg-message about the need to tune kern.ipc when using diskd.

Could you cook up something, since you appear to have an idea of what should be in there?

Thanks.
Comment 19 Volodymyr Kostyrko 2014-09-19 20:01:20 UTC
Created attachment 147486 [details]
squid2.diff

Added a note about tweaking kern.ipc for DiskD to work. I also added a link to the original doc in case something goes wrong. I added only the three sysctls that need changing; the other two have been at the recommended values since 1999.

Included one more fix - bumped the version. Sorry, I can't restrain myself :)
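
For reference, the substance of the note is roughly the following (the exact wording is in squid2.diff):

  To use the diskd storage scheme, raise the SysV IPC message limits,
  e.g. in /boot/loader.conf:

    kern.ipc.msgmnb=8192
    kern.ipc.msgssz=64
    kern.ipc.msgtql=2048

  See http://wiki.squid-cache.org/Features/DiskDaemon for details.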
Comment 20 Guido Falsi freebsd_committer 2014-09-19 20:46:38 UTC
Committed. Thanks for the fast feedback!
Comment 21 commit-hook freebsd_committer 2014-09-19 20:47:08 UTC
A commit references this bug:

Author: madpilot
Date: Fri Sep 19 20:46:14 UTC 2014
New revision: 368596
URL: http://svnweb.freebsd.org/changeset/ports/368596

Log:
  - Make diskd and rock storage schemes options (diskd enabled by default) [1]
  - Add note in pkg-message about diskd needing some tuning [1]
  - Fix transparent proxy with PF [1] [2]
  - Fix a pair of typos [1]

  PR:		193705 [1], 193752 [2]
  Submitted by:	arcade@b1t.name [1], kib@ [2]

Changes:
  head/www/squid/Makefile
  head/www/squid/files/pkg-message.in
  head/www/squid/pkg-plist