For a few days now (I believe since the upgrade of mesa to 17.1.0), VBoxSVC continuously consumes 100% CPU and does not shut down after all sessions have been closed. This happens whenever a graphical window (VirtualBox) is used; when using only VBoxHeadless, VBoxSVC behaves normally. Also, this happens on only one of two systems: the affected one has an Intel CPU and uses an Nvidia graphics card with the proprietary driver, whereas the unaffected one uses an AMD CPU and displays graphical output over the network (X11 connection).

# top -Utest
last pid:  9924;  load averages:  0.88,  0.90,  1.40    up 0+05:30:35  22:43:46
122 processes: 2 running, 119 sleeping, 1 zombie
CPU:  0.5% user, 12.5% nice,  0.4% system,  0.0% interrupt, 86.5% idle
Mem: 820M Active, 2747M Inact, 5324M Wired, 109M Buf, 15G Free
ARC: 2998M Total, 618M MFU, 2329M MRU, 444K Anon, 11M Header, 38M Other
Swap: 9216M Total, 9216M Free

  PID USERNAME    THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
 9916 test         14  25    5   172M 50820K select  5   0:15 103.65% VBoxSVC
 9911 test          9  26    5   394M   107M select  2   0:01   0.08% VirtualBo
 9917 test          1  20    0 30364K  7076K CPU5    5   0:00   0.05% top
 9914 test          1  25    5   123M 43400K select  0   0:00   0.02% VBoxXPCOM
 9898 test          1  20    0 75696K 13228K select  0   0:00   0.01% xterm
 9899 test          1  20    0 35068K  8796K pause   4   0:00   0.00% zsh

When exiting VirtualBox via the "exit" menu command, the window vanishes, but VBoxSVC continues to run at 100% CPU, and the VirtualBox process remains (at 0%). In this state, VBoxSVC can be killed with kill -1, which then also terminates VirtualBox and VBoxXPCOM. All ports are up-to-date as of now.
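(A quick way to see where VBoxSVC is spinning, offered here only as a suggestion and not something actually done in this report, would be to inspect its thread stacks or system calls with standard tools:

# procstat -kk 9916
# truss -p 9916

where 9916 is the VBoxSVC PID from the top output above.)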
The problem could lie with the preview in VirtualBox - it stays blank (black) on the machine where VBoxSVC consumes 100% CPU.
After some experimentation, the problem turned out to have quite a different cause. On both hosts, the same emulated hard disk is available: on the AMD machine directly via a vmdk file pointing to it, on the Intel machine via iSCSI. However, the file holding the emulated hard disk was not available. On the AMD machine, this only caused VirtualBox to note that the machine cannot be started. On the Intel machine, however, the file seems to get exported as a zero-sized disk, which in turn seems to trip up VBoxSVC. So there are two issues: first, VBoxSVC gets confused by the announced but not actually existing iSCSI target, and second, the iSCSI daemon blindly announces that it exports a file (with zero size) which does not exist (and cannot be created, since its path is not available). I think the bigger problem lies with the iSCSI subsystem announcing a target which does not exist. I'll try to reassign the issue to the base system.
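For reference, a ctl.conf target of the kind involved here might look roughly like the following sketch; the backing path and portal group are made up for illustration, and the target name is the one used later in this thread:

portal-group pg0 {
        listen 0.0.0.0
}

target iqn.1995-06.xyzzy.hal:disk1 {
        auth-group no-authentication
        portal-group pg0
        lun 0 {
                path /storage/vbox/disk1.img
        }
}

As described above, if the file at that path does not exist (e.g. because the filesystem holding it is not mounted), ctld apparently still announces the LUN, just as a zero-sized disk without usable backing media.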
I think it's a feature, not a bug. A nonexistent file is a valid LUN, just one without backing media (think of an empty DVD drive). SCSI-compliant clients are supposed to know how to handle this.
Hmmm... but if I then make the backing file available (by mounting the filesystem on which it resides), this fact is not reflected when I run 'ctladm devlist'. The same goes for the client: attaching the disk using 'iscsictl -A' indeed first shows that the disk is not ready:

da0: Attempt to query device size failed: NOT READY, Logical unit not ready, manual

But even after mounting the backing file, this does not change. -- Martin
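For context, the commands involved look roughly like this (the portal address is made up for illustration; the target name is the one from this thread).

On the target host, to list the CTL LUNs and their backing files:

# ctladm devlist

On the initiator, to attach the target and then show the session state and the assigned device node:

# iscsictl -A -p 192.168.1.10 -t iqn.1995-06.xyzzy.hal:disk1
# iscsictl -L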
You need to notify CTL that something has changed by doing "service ctld reload". Could you check whether it works and report back? Thanks!
This indeed works; I do not even need to rescan the bus on the client. The following message is logged when trying "dd if=/dev/da0 of=/dev/null bs=1m count=1" on the client after making the target available:

May 31 19:43:46 mizar kernel: (da0:iscsi1:0:0:0): Capacity data has changed

However, I then proceeded to remove the target from the configuration. On the client this leads to the messages

May 31 17:44:41 mizar iscsid[4380]: hal.xyzzy (iqn.1995-06.xyzzy.hal:disk1): target returned error: Not found
May 31 19:44:41 mizar kernel: WARNING: hal.xyzzy (iqn.1995-06.xyzzy.hal:disk1): connection error; reconnecting
May 31 19:44:41 mizar iscsid[4220]: child process 4380 terminated with exit status 1

but device da0 on the client still exists. If I now run "dd if=/dev/da0 of=/dev/null bs=1m count=1" on the client again, I get a non-interruptible hang until I make the target available again, and even then the hang only ends after a timeout of 60 seconds:

May 31 19:45:42 mizar iscsid[4387]: hal.xyzzy (iqn.1995-06.xyzzy.hal:disk1): target returned error: Not found
May 31 19:45:42 mizar kernel: WARNING: hal.xyzzy (iqn.1995-06.xyzzy.hal:disk1): login timed out after 61 seconds; reconnecting
May 31 19:45:42 mizar iscsid[4220]: child process 4387 terminated with exit status 1
May 31 19:46:43 mizar kernel: WARNING: hal.xyzzy (iqn.1995-06.xyzzy.hal:disk1): login timed out after 61 seconds; reconnecting

The dd then succeeds.
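(For reference, "removing the target from the configuration" here means, in a typical setup, commenting out or deleting its block in /etc/ctl.conf and then running

# service ctld reload

on the target host; re-adding the block plus another reload makes the target available again.)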
This too is expected: by default, the initiator will handle errors by just reconnecting until it succeeds. You don't want your iSCSI disks to disappear because of transient network problems, right? ;-) You can disable this behaviour by changing the kern.iscsi.fail_on_disconnection sysctl to 1, as described in man.freebsd.org/iscsi.
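For the record, a minimal sketch of applying that on the initiator (the sysctl name is taken from the comment above; putting it in /etc/sysctl.conf is the usual way to make it persistent across reboots):

# sysctl kern.iscsi.fail_on_disconnection=1

and in /etc/sysctl.conf:

kern.iscsi.fail_on_disconnection=1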
Thanks for all the pointers! -- Martin