Bug 218911

Summary: [uma] Memory corruption with certain item sizes
Product: Base System Reporter: Fabian Keil <fk>
Component: kernAssignee: freebsd-bugs (Nobody) <bugs>
Status: Closed FIXED    
Severity: Affects Some People CC: avg, emaste, kib, markj, oliver.pinter, op
Priority: --- Keywords: patch
Version: 11.0-STABLE   
Hardware: Any   
OS: Any   
Attachments:
Description Flags
sys/vm: Set UMA_ZONE_OFFPAGE if the offset would be too large to fit into keg->uk_pgoff
none
graid modification to reproduce the issue more conveniently none

Description Fabian Keil 2017-04-27 11:15:19 UTC
Created attachment 182122 [details]
sys/vm: Set UMA_ZONE_OFFPAGE if the offset would be too large to fit into keg->uk_pgoff

When creating an uma zone without specifying the UMA_ZONE_OFFPAGE
flag uma may use the slab itself for its book-keeping purposes if
it contains enough unused space.

keg->uk_pgoff is an uint16_t. It can overflow in which case
valid writes to correctly allocated items corrupt uma's
book-keeping data.

With INVARIANTS enabled, this results in a panic in uma_dbg_free(),
without INVARIANTS the effect is less obvious.

Here's a backtrace from a panic with INVARIANTS enabled
using a test zone with item size 129024 which is one of
the sizes that trigger the problem:

(kgdb) where
#0  __curthread () at ./machine/pcpu.h:222
#1  doadump (textdump=1) at /usr/src/sys/kern/kern_shutdown.c:298
#2  0xffffffff80555465 in kern_reboot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:366
#3  0xffffffff80555a40 in vpanic (fmt=<optimized out>, ap=0xfffffe0113d69980) at /usr/src/sys/kern/kern_shutdown.c:759
#4  0xffffffff80555876 in kassert_panic (fmt=0xffffffff808c521f "Assertion %s failed at %s:%d") at /usr/src/sys/kern/kern_shutdown.c:649
#5  0xffffffff807dd333 in keg_fetch_slab (keg=0xfffff80005b52480, zone=0xfffff80005b53700, flags=2) at /usr/src/sys/vm/uma_core.c:2343
#6  0xffffffff807dc9ae in zone_fetch_slab (zone=0xfffff80005b53700, keg=0xfffff80005b52480, flags=2) at /usr/src/sys/vm/uma_core.c:2368
#7  0xffffffff807dca40 in zone_import (zone=0xfffff80005b53700, bucket=0xfffff800d7eae9f8, max=1, flags=2) at /usr/src/sys/vm/uma_core.c:2494
#8  0xffffffff807d9341 in zone_alloc_bucket (zone=0xfffff80005b53700, udata=0x0, flags=2) at /usr/src/sys/vm/uma_core.c:2524
#9  uma_zalloc_arg (zone=0xfffff80005b53700, udata=0x0, flags=<optimized out>) at /usr/src/sys/vm/uma_core.c:2250
#10 0xffffffff8182329f in uma_zalloc (flags=2, zone=<optimized out>) at /usr/src/sys/vm/uma.h:336
#11 g_raid_init (mp=0xffffffff81856838 <g_raid_class>) at /usr/src/sys/modules/geom/geom_raid/../../../geom/raid/g_raid.c:2496
#12 0xffffffff804d1125 in g_load_class (arg=<optimized out>, flag=<optimized out>) at /usr/src/sys/geom/geom_subr.c:124
#13 0xffffffff804cd7c7 in one_event () at /usr/src/sys/geom/geom_event.c:264
#14 g_run_events () at /usr/src/sys/geom/geom_event.c:286
#15 0xffffffff80517e64 in fork_exit (callout=0xffffffff804cfa60 <g_event_procbody>, arg=0x0, frame=0xfffffe0113d69c00) at /usr/src/sys/kern/kern_fork.c:1040
#16 <signal handler called>
[...]
(kgdb) f 5
#5  0xffffffff807dd333 in keg_fetch_slab (keg=0xfffff80005b52480, zone=0xfffff80005b53700, flags=2) at /usr/src/sys/vm/uma_core.c:2343
2343                            MPASS(slab->us_keg == keg);
(kgdb) p slab->us_keg
$1 = (uma_keg_t) 0xdeadc0dedeadc0de
(kgdb) f 9
#9  uma_zalloc_arg (zone=0xfffff80005b53700, udata=0x0, flags=<optimized out>) at /usr/src/sys/vm/uma_core.c:2250
2250            bucket = zone_alloc_bucket(zone, udata, flags);
(kgdb) p zone
$2 = (uma_zone_t) 0xfffff80005b53700
(kgdb) p *zone
$3 = {uz_lock = {lock_object = {lo_name = 0xffffffff8184f85a "uma_test", lo_flags = 21168128, lo_data = 0, lo_witness = 0xfffffe0000a0cf80}, mtx_lock = 4}, uz_lockptr = 0xfffff80005b52480,
  uz_name = 0xffffffff8184f85a "uma_test", uz_link = {le_next = 0x0, le_prev = 0xfffff80005b52510}, uz_buckets = {lh_first = 0x0}, uz_kegs = {lh_first = 0xfffff80005b537b0}, uz_klink = {kl_link = {le_next = 0x0,
      le_prev = 0xfffff80005b537a8}, kl_keg = 0xfffff80005b52480}, uz_slab = 0xffffffff807dc940 <zone_fetch_slab>, uz_ctor = 0xffffffff807dd730 <trash_ctor>, uz_dtor = 0xffffffff807dd780 <trash_dtor>, uz_init = 0x0,
  uz_fini = 0x0, uz_import = 0xffffffff807dc9f0 <zone_import>, uz_release = 0xffffffff807dcc90 <zone_release>, uz_arg = 0xfffff80005b53700, uz_flags = 0, uz_size = 129024, uz_allocs = 0, uz_fails = 0, uz_frees = 0,
  uz_sleeps = 0, uz_count = 1, uz_count_min = 1, uz_warning = 0x0, uz_ratecheck = {tv_sec = 0, tv_usec = 0}, uz_maxaction = {ta_link = {stqe_next = 0x0}, ta_pending = 0, ta_priority = 0, ta_func = 0x0, ta_context = 0x0},
0x0},
  uz_cpu = {{uc_freebucket = 0x0, uc_allocbucket = 0x0, uc_allocs = 0, uc_frees = 0}}}
(kgdb) f 11
#11 g_raid_init (mp=0xffffffff81856838 <g_raid_class>) at /usr/src/sys/modules/geom/geom_raid/../../../geom/raid/g_raid.c:2496
2496                    p = uma_zalloc(uma_test_zone, M_WAITOK);
(kgdb) p i
$4 = 0
(kgdb) l -
2491
2492            uma_test_zone = uma_zcreate("uma_test", uma_test_zone_item_size, NULL, NULL,
2493                NULL, NULL, 0, 0);
2494
2495            for (i = 0; i <= uma_test_zone_item_size; i++) {
2496                    p = uma_zalloc(uma_test_zone, M_WAITOK);
2497                    memset(p, 0x10, i);
2498                    G_RAID_DEBUG(1, "Freeing item after filling %d bytes.", i);
2499                    uma_zfree(uma_test_zone, p);
2500            }
(kgdb) f 9
#9  uma_zalloc_arg (zone=0xfffff80005b53700, udata=0x0, flags=<optimized out>) at /usr/src/sys/vm/uma_core.c:2250
2250            bucket = zone_alloc_bucket(zone, udata, flags);
(kgdb) p zone->uz_kegs->lh_first->kl_keg
$6 = (uma_keg_t) 0xfffff80005b52480
(kgdb) p *zone->uz_kegs->lh_first->kl_keg
$7 = {uk_lock = {lock_object = {lo_name = 0xffffffff8184f85a "uma_test", lo_flags = 21168128, lo_data = 0, lo_witness = 0xfffffe0000a0cf80}, mtx_lock = 18446735277663592448}, uk_hash = {uh_slab_hash = 0x0, uh_hashsize = 0,
    uh_hashmask = 0}, uk_zones = {lh_first = 0xfffff80005b53700}, uk_part_slab = {lh_first = 0x0}, uk_free_slab = {lh_first = 0x0}, uk_full_slab = {lh_first = 0x0}, uk_align = 0, uk_pages = 32, uk_free = 1, uk_reserve = 0,
  uk_size = 129024, uk_rsize = 129024, uk_maxpages = 0, uk_init = 0xffffffff807dd7b0 <trash_init>, uk_fini = 0xffffffff807dd7e0 <trash_fini>, uk_allocf = 0xffffffff807db690 <page_alloc>,
  uk_freef = 0xffffffff807db770 <page_free>, uk_offset = 0, uk_kva = 0, uk_slabzone = 0x0, uk_pgoff = 65424, uk_ppera = 32, uk_ipers = 1, uk_flags = 0, uk_name = 0xffffffff8184f85a "uma_test", uk_link = {
    le_next = 0xfffff80005b52300, le_prev = 0xffffffff80f45d00 <uma_kegs>}}
(kgdb) p sizeof(struct uma_slab)
$8 = 112
(kgdb) p zone->uz_kegs->lh_first->kl_keg->uk_ppera
$9 = 32
(kgdb) p 4096 * zone->uz_kegs->lh_first->kl_keg->uk_ppera
$10 = 131072
(kgdb) p 4096 * zone->uz_kegs->lh_first->kl_keg->uk_ppera - sizeof(struct uma_slab)
$11 = 130960
(kgdb) p 4096 * zone->uz_kegs->lh_first->kl_keg->uk_ppera - sizeof(struct uma_slab) - 65536
$12 = 65424
(kgdb) p zone->uz_kegs->lh_first->kl_keg->uk_size
$13 = 129024
(kgdb) p 4096 * zone->uz_kegs->lh_first->kl_keg->uk_ppera -  zone->uz_kegs->lh_first->kl_keg->uk_rsize
$14 = 2048

The attached patch lets uma set the UMA_ZONE_OFFPAGE if keg->uk_pgoff
would otherwise be used but is too small to hold the correct offset.

Changing keg->uk_pgoff's type should work as well but I haven't tested it.

Obtained from: ElectroBSD
Comment 1 Fabian Keil 2017-04-27 11:20:23 UTC
Created attachment 182123 [details]
graid modification to reproduce the issue more conveniently

This patch modifies graid (which I don't need) to quickly reproduce the problem upon kldload.

Obviously it should only be used for testing purposes.
Comment 2 Konstantin Belousov freebsd_committer freebsd_triage 2017-04-30 09:25:55 UTC
This looks reasonable to me, but I added several more people who did actual uma changes recently, to confirm my understanding.
Comment 3 Mark Johnston freebsd_committer freebsd_triage 2017-04-30 16:19:31 UTC
This looks reasonable to me too.

I note that the assignment keg->uk_slabsize = keg->uk_ppera * PAGE_SIZE will overflow when uk_ppera >= 16. This field isn't actually used after the keg is constructed, so a different solution may be to remove uk_slabsize and make uk_pgoff 32 bits wide.
Comment 4 Mark Johnston freebsd_committer freebsd_triage 2017-04-30 16:45:08 UTC
(In reply to Mark Johnston from comment #3)
Oops, I missed that uk_slabsize was already removed in r315077.
Comment 5 Andriy Gapon freebsd_committer freebsd_triage 2017-06-12 11:02:37 UTC
The patch looks okay to me as well.  It seems that in practice it is extremely rare to use such larg UMA item sizes that are not multiple of the page size at the same time.
Comment 6 commit-hook freebsd_committer freebsd_triage 2017-09-13 21:54:57 UTC
A commit references this bug:

Author: markj
Date: Wed Sep 13 21:54:38 UTC 2017
New revision: 323564
URL: https://svnweb.freebsd.org/changeset/base/323564

Log:
  Widen uk_pgoff, the slab header offset field.

  16 bits is only wide enough for kegs with an item size of up to 64KB.
  At that size or larger, slab headers are typically offpage because the
  item size is a multiple of the page size, but there is no requirement
  that this be the case.

  We can widen the field without affecting the layout of struct uma_keg
  since the removal of uk_slabsize in r315077 left an adjacent hole.

  PR:		218911
  MFC after:	2 weeks

Changes:
  head/sys/vm/uma_int.h
Comment 7 commit-hook freebsd_committer freebsd_triage 2017-09-27 14:19:26 UTC
A commit references this bug:

Author: markj
Date: Wed Sep 27 14:18:20 UTC 2017
New revision: 324057
URL: https://svnweb.freebsd.org/changeset/base/324057

Log:
  MFC r323564:
  Widen uk_pgoff, the slab header offset field.

  PR:	218911

Changes:
_U  stable/11/
  stable/11/sys/vm/uma_int.h