| Summary: | (zfs+i386 No PAE) panic: kmem_malloc(36864): kmem_map too small: 431976448 total allocated | | |
|---|---|---|---|
| Product: | Base System | Reporter: | Michelle Sullivan <michelle> |
| Component: | kern | Assignee: | freebsd-fs (Nobody) <fs> |
| Status: | New --- | | |
| Severity: | Affects Some People | CC: | michelle, mmoll, ota |
| Priority: | --- | Keywords: | crash |
| Version: | 9.3-RELEASE | | |
| Hardware: | i386 | | |
| OS: | Any | | |
Description
Michelle Sullivan
2015-02-18 15:14:48 UTC
Getting lots of cores for the same issue (URL in the original report). Set loader.conf to:

    vm.kmem_size_max="1024M"
    vfs.zfs.arc_meta_limit=11381616
    vfs.zfs.arc_min=3375104
    vfs.zfs.arc_max=16875520

The system stays up longer, but still dies.

Solution (it's stayed up so far) seems to be to remove a processor. Running on one processor it has been up 12 hours, where before it was around an hour (45 minutes after starting poudriere).

Correction... that was not the last thing I did: I also changed from using MFSSIZE to TMPFS.

OK, I've got it down to something a little clearer. Changing vm.kmem_size_max has no effect except delaying the issue, just as vm.kmem_size doesn't. What does stop the panic: reducing the CPU count to one CPU. (So far I've built 800+ packages using poudriere - with 'svn update' beforehand - whereas before that change it didn't even finish; after going through several boots it managed around 20 packages before panicking, if it even got past the sanity-check phase.) Interestingly, with the maximum ARC size set to 40M, within minutes of running on multiple CPUs it is over 400M... before it panics. With one CPU it grows to 80(ish)M and doesn't panic.

Some 40 or 50 cores are publicly available at the link I posted earlier. Both 9.2 and 9.3 are affected; let's hope someone will see to patching it before 9.4. I wouldn't have a clue where to look for the issue or I'd take a shot at it myself.

Michelle, did you compile that kernel with raised KVA_PAGES as described in https://wiki.freebsd.org/ZFSTuningGuide#i386 ?

No I haven't - and quite deliberately (mainly because I saw that later, and the systems are set to update using freebsd-update and I don't want a custom kernel to interfere with the patching). I also wanted to see if I could get to the bottom of the cause, or at least make it reproducible. I see no reason why ZFS on i386 shouldn't work without the need to recompile the kernel (or ZFS should be removed, so that anyone wishing to use it has to compile). And I've finally got progress: single CPU and no panic, multiple CPUs and a reproducible panic. Can it be looked at and resolved? Because in reality it has to be some form of bug - the ARC should not exhaust memory, and it should be constrained by the limits in loader.conf.

It has been quite a while since I used ZFS on i386, but from what I remember:

- Default kernels can allocate 512MB max. as kmem (ALL kmem, not only ARC!)
  o That means the ARC should be limited to 256MB or so, to still leave room for other kernel tasks and some safety buffer.
- Limiting the memory down to such values will make ZFS _very_ slow.
- In general, ZFS was not really designed for 32-bit systems anyway.
- I used ZFS on i386 successfully with 4GB of RAM by setting:
  o options KVA_PAGES=512 in the custom kernel
  o vm.kmem_size and vm.kmem_size_max to 1536MB in loader.conf

IMHO, at the end of the day the only advice here can be to move on to amd64 or, if that's not possible, to use a custom kernel with increased KVA_PAGES.
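For reference, a minimal sketch of the tuning described in the comment above (raised KVA_PAGES plus kmem limits). The kernel config name MYKERNEL is a placeholder, and the vfs.zfs.arc_max line is an illustrative addition, not a value taken from this report:

```
# Custom i386 kernel config, e.g. /usr/src/sys/i386/conf/MYKERNEL
# (MYKERNEL is a placeholder name):
include GENERIC
ident   MYKERNEL
options KVA_PAGES=512       # enlarge the kernel virtual address space (i386 default is 256)

# /boot/loader.conf on the same machine:
vm.kmem_size="1536M"
vm.kmem_size_max="1536M"
vfs.zfs.arc_max="512M"      # illustrative cap to leave kmem headroom; not from this report
```

The point of the kernel rebuild is that vm.kmem_size has to fit inside the KVA the kernel was compiled with, which is presumably why raising vm.kmem_size_max in loader.conf alone only delayed the panic in the reports above.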
100% right with the 512MB limit. Setting arc_max to 40M is ignored, though. I have that set now with a single CPU and "top" is showing:

    Total: 52M, 1858K MFU, 24M MRU, 400K Anon, 1975K Header, 24M Other

If I set more than one CPU in the VM, the 'Total' gets to around 467M and then it panics (same loader.conf settings). This, I believe, is the bug (or a bug). Running ZFS on i386 should not be recommended, 100% with you... however:

1/ It is available in default kernels.
2/ KVA_PAGES needs to be set for default kernels.
3/ I'm not using ZFS in production anywhere - it's used for poudriere, otherwise I'd disable it completely.

So we have a couple of things/statements here:

1/ It's available for use by default, recommended or not.
2/ It seems to work (albeit slowly) on i386 with the correct tuning.
3/ With more than one CPU it doesn't work.
4/ KVA_PAGES is not set by default.

At least some of that should be resolved, and this is how I see it:

1/ Either it should be enabled by default and KVA_PAGES=512 set by default, or
2/ It should be disabled by default, with a warning that KVA_PAGES needs to be set if enabling it.

and:

A/ Someone should look into and resolve (if possible) the fact that arc_max is not respected when multiple CPUs are present.
B/ The documentation covering (1) and/or (2) should be updated (currently there is the link that you indicated; it probably should be expanded to say either that ZFS is disabled and KVA_PAGES should be added if enabling it, or that ZFS is enabled and so is KVA_PAGES by default, and whatever risk that may entail - it should probably also add that ZFS is really not for i386, because it will be really slow since it wasn't designed for 32-bit... [does that make sense?])

Thoughts? Not trying to be a pain here - but the default should work (even if that means "disable ZFS in the default kernel") - and I really think that 1 CPU working, 2 CPUs = panic is a bug, and possibly an important one that may even exist on amd64, just not noticed because of the platform difference.

Regards,
Michelle

I think I might be one step closer to the cause. I've noticed that when using a memory-based disk, on both i386 and amd64, the memory used doesn't (in some or all cases?) appear in the memory stats shown in 'top', yet the memory is 'missing'. Could this be fooling the VM manager into thinking there is memory free when there is not, and therefore breaking the memory-pressure handling?

It looks like this bug report is too old to investigate further. On the other hand, I run 12.0-RELEASE/STABLE and 13-CURRENT on an i386 system with 512MB of RAM, ZFS and tmpfs, and a non-PAE kernel, and haven't seen this type of error for years. It is possible that this bug has been fixed as well.
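For anyone revisiting this on a newer release, a rough sketch of how the behaviour reported above could be observed while poudriere is running. The sysctl names are the stock FreeBSD ones, though the exact kstat.zfs.misc.arcstats entries may vary between releases:

```
# Compare the configured limits with what the ARC actually uses.
sysctl vm.kmem_size vm.kmem_size_max       # kernel memory ceiling
sysctl vfs.zfs.arc_min vfs.zfs.arc_max     # configured ARC bounds
sysctl kstat.zfs.misc.arcstats.size        # current ARC size in bytes

# Poll the ARC size under load; in this report it grew well past a
# 40M arc_max once more than one CPU was enabled.
while :; do
    sysctl -n kstat.zfs.misc.arcstats.size
    sleep 10
done
```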