224330 – head -r326347 breaks USB on rpi2 and pine64 (and likely more)

Bug 224330 - head -r326347 breaks USB on rpi2 and pine64 (and likely more)

Summary: head -r326347 breaks USB on rpi2 and pine64 (and likely more)

Status:	Closed FIXED

Alias:	None

Product:	Base System
Classification:	Unclassified
Component:	kern (show other bugs)
Version:	CURRENT
Hardware:	Any Any

Importance:	--- Affects Some People
Assignee:	Andrew Turner

URL:
Keywords:	regression

Depends on:
Blocks:

Reported:	2017-12-14 05:39 UTC by Mark Millard
Modified:	2018-08-20 18:41 UTC (History)
CC List:	8 users (show)

See Also:	224080

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Mark Millard 2017-12-14 05:39:29 UTC

[This may be related to bugzilla 224080 that is also for
-r326347 problems.]

See Emmanuel Vadot's pine64 USB-breakage report at:

https://lists.freebsd.org/pipermail/svn-src-head/2017-December/106954.html

He writes in part:

 This seems to break USB (EHCI and OHCI) on Pine64 (arm64)
 The board just hang when plugin usb device



In my case I saw this as a boot problem because I use
a USB SSD on a powered hub for the root file system.
(And I did not know about the above report yet.)

See my list messages for the rpi2 example:

https://lists.freebsd.org/pipermail/freebsd-current/2017-December/067771.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-December/067799.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-December/067807.html
https://lists.freebsd.org/pipermail/freebsd-current/2017-December/067811.html

My reports have some material from db> prompt
commands (but I'm no expert at what to pick
to show for this).

In my case I did a bisect sequence to find the
-r326346-and-before good / -r325347-and-later bad
boundary: I had jumped from -r326192 to -r326726
but had the problem.

Some db> material is from -r326726 before the
bisect and other material is from -r326347 .

The kernel thread (100001) involves:

. . .
uma_zcreate() at uma_zcreate+0x10c
         pc = 0xc052b024  lr = 0xc050cc90 (ffs_mount+0x80)
         sp = 0xd42a5858  fp = 0xd42a5980
         r4 = 0xd7844000  r5 = 0x00000000
         r6 = 0x00000003  r7 = 0xc09a99c0
         r8 = 0x00000000  r9 = 0xd429f000
        r10 = 0xd828c400
ffs_mount() at ffs_mount+0x80
         pc = 0xc050cc90  lr = 0xc032e1d4 (vfs_donmount+0xeec)
         sp = 0xd42a5988  fp = 0xd42a5b38
         r4 = 0xffffffff  r5 = 0xd74ec120
         r6 = 0xd429f000  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc425acd0
        r10 = 0xd828c400
. . .

The uma thread involves:

. . .
uma_reclaim_locked() at uma_reclaim_locked+0x200
         pc = 0xc052d9f8  lr = 0xc052de70 (uma_reclaim_worker+0x4c)
         sp = 0xd87dadf0  fp = 0xd87dae20
         r4 = 0xc09a9c68  r5 = 0xc4242d80
         r6 = 0xc06df86e  r7 = 0x00000000
         r8 = 0x00000100  r9 = 0xc09b3d08
        r10 = 0xc09a9c7c
uma_reclaim_worker() at uma_reclaim_worker+0x4c
         pc = 0xc052de70  lr = 0xc0230f10 (fork_exit+0xa0)
         sp = 0xd87dae28  fp = 0xd87dae40
         r4 = 0xd7ad8740  r5 = 0xd70ce000
         r6 = 0xc052de24  r7 = 0x00000000
         r8 = 0xd87dae48  r9 = 0xd7ada3a0
        r10 = 0xc0824e8c
. . .

Comment 1 Cy Schubert freebsd_committer

2017-12-14 06:02:47 UTC

I currently use zfs on USB <-> IDE and USB <-> SATA as well (on amd64). No problems there however I also  have UFS on USB <-> SATA. I'll check that out tonight.

Comment 2 Mark Millard 2017-12-14 06:28:32 UTC

(In reply to Mark Millard from comment #0)

I should have noted:

The rpi2 is a v1.1 one,  a Cortex-A7, armv7 context.

Pine64 has a Cortex-A53, aarch64 context.

(rpi2 v1.2 would be Cortext-A53, thus the need for
the note even for folks familiar with the boards.)

Comment 3 Mark Linimon freebsd_committer

2017-12-14 14:08:47 UTC

Over to committer of r326347 for review.

Comment 4 Jeff Roberson freebsd_committer

2017-12-14 21:24:13 UTC

Can someone who is experiencing a hang give me the following values from ddb:

uma_kmem_limit
uma_kmem_total
uma_recliam_needed

These are also indirectly visible via:
sysctl vm.kmem_map_free
sysctl vm.kmem_map_size

I have not noticed anything irregular in the arm usage of kmem after a review.  So we'll need to debug a bit further.  Thank you for your patience.

Comment 5 Jeff Roberson freebsd_committer

2017-12-14 22:15:05 UTC

mi_switch() at mi_switch+0x258
         pc = 0xc0275f28  lr = 0xc052d9f8 (uma_reclaim_locked+0x200)

Can someone on arm report what source line is at this offset?  You can use addr2line or gdb list *(uma_reclaim_locked+0x200)

also
show vmem kernel_arena

Comment 6 Mark Millard 2017-12-14 23:12:14 UTC

(In reply to Jeff Roberson from comment #5)

This reply is for things I can do without the rpi2
at the moment. . .

addr2line (on the cross build host since lack
of booting the rpi2 is part of my context):

I show both the pc and the lr addresses in:

         pc = 0xc0275f28  lr = 0xc052d9f8 (uma_reclaim_locked+0x200)


# addr2line -e /usr/obj/DESTDIRs/clang-armv7dbg-installkernel_bisect/usr/lib/debug/boot/kernel/kernel.debug  0xc0275f28
/usr/src_bisect/sys/kern/kern_synch.c:447

So:

head/sys/kern/kern_synch.c 326023

in mi_switch on the if line below:

        /*
         * If the last thread was exiting, finish cleaning it up.
         */
        if ((td = PCPU_GET(deadthread))) {
                PCPU_SET(deadthread, NULL);
                thread_stash(td);
        }


# addr2line -e /usr/obj/DESTDIRs/clang-armv7dbg-installkernel_bisect/usr/lib/debug/boot/kernel/kernel.debug  0xc052d9f8
./machine/pcpu.h:96

So interpreting that as /usr/src_bisect/sys/arm/include/pcpu.h
in the source tree used for the build:

head/sys/arm/include/pcpu.h 326258

in the __asm line in:

static inline struct thread *
get_curthread(void)
{
        void *ret;

        __asm __volatile("mrc p15, 0, %0, c13, c0, 4" : "=r" (ret));
        return (ret);
}

When I'm using the rpi2 at a db> prompt I'll try to confirm
that the mrc is what I see.

Comment 7 Jeff Roberson freebsd_committer

2017-12-14 23:14:11 UTC

Can you gradually reduce the offset until it gives a location in actual uma code?

Comment 8 Mark Millard 2017-12-14 23:25:11 UTC

(In reply to Jeff Roberson from comment #5)

[I'll see in a bit about finding a uma code
address.]

show vmem kernel_arena crashes after a few lines:

db> show vmem kernel_arena
vmem 0xc07eb71c ','
        quantum:        1
        size:   0
        inuse:  3230964176
        free:   1064003120
        busy tags:      -1072951294
        free tags:      2147482985
KDB: reentering
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05622b8  lr = 0xc005f0d4 (db_trace_self_wrapper+0x30)
         sp = 0xd42b2278  fp = 0xd42b2390
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
         pc = 0xc005f0d4  lr = 0xc02b2e1c (kdb_reenter+0x44)
         sp = 0xd42b2398  fp = 0xd42b23a0
         r4 = 0xc0943e28  r5 = 0x5155005d
         r6 = 0x00000005  r7 = 0x00000005
kdb_reenter() at kdb_reenter+0x44
         pc = 0xc02b2e1c  lr = 0xc0585d4c (abort_handler+0x110)
         sp = 0xd42b23a8  fp = 0xd42b2438
         r4 = 0x00000005 r10 = 0xd42b2440
abort_handler() at abort_handler+0x110
         pc = 0xc0585d4c  lr = 0xc0564cfc (exception_exit)
         sp = 0xd42b2440  fp = 0xd42b2898
         r4 = 0xc07eb71c  r5 = 0xd42b2790
         r6 = 0x51550049  r7 = 0xd42b25c0
         r8 = 0xd42b26a8  r9 = 0xd42b24d8
        r10 = 0x00000000
exception_exit() at exception_exit
         pc = 0xc0564cfc  lr = 0xc02cfe50 (vmem_summ+0xd8)
         sp = 0xd42b24d0  fp = 0xd42b2898
         r0 = 0xc07eb840  r1 = 0xd42b25c0
         r2 = 0xd42b2790  r3 = 0x51550049
         r4 = 0xc07eb71c  r5 = 0xd42b2790
         r6 = 0x51550049  r7 = 0xd42b25c0
         r8 = 0xd42b26a8  r9 = 0xd42b24d8
        r10 = 0x00000000 r12 = 0xd42b25c0
vmem_summ() at vmem_summ+0xf8
         pc = 0xc02cfe70  lr = 0xc005bb34 (db_command+0x268)
         sp = 0xd42b28a0  fp = 0xd42b2940
         r4 = 0x00000001  r5 = 0x00000001
         r6 = 0xc069da80  r7 = 0xc02cfd78
         r8 = 0xc09b02b8  r9 = 0xc07eb7ec
        r10 = 0x00000000
db_command() at db_command+0x268
         pc = 0xc005bb34  lr = 0xc005b8bc (db_command_loop+0x74)
         sp = 0xd42b2948  fp = 0xd42b2958
         r4 = 0xc067b689  r5 = 0xc06ad511
         r6 = 0xc09b02a4  r7 = 0xd42b2b28
         r8 = 0xc0824ea0  r9 = 0xc07b3ee8
        r10 = 0xc0943e10
db_command_loop() at db_command_loop+0x74
         pc = 0xc005b8bc  lr = 0xc005f25c (db_trap+0x12c)
         sp = 0xd42b2960  fp = 0xd42b2a78
         r4 = 0x00000000  r5 = 0xc09b02b0
         r6 = 0xc0943e30 r10 = 0xc0943e10
. . .

Comment 9 Jeff Roberson freebsd_committer

2017-12-14 23:28:37 UTC

Can you get me the uma_kmem_* variables?

I think you actually need to do:
x/x kernel_arena

to extract the address to pass to show vmem.

Comment 10 Mark Millard 2017-12-14 23:30:34 UTC

(In reply to Jeff Roberson from comment #4)

db> print *uma_kmem_limit    
10bde000
db> print *uma_kmem_total
  e05000
db> print *uma_reclaim_needed 
       7

Comment 11 Jeff Roberson freebsd_committer

2017-12-14 23:34:47 UTC

show page

?

Comment 12 Mark Millard 2017-12-14 23:35:33 UTC

(In reply to Jeff Roberson from comment #9)

As for kernel_arena . . .

db> x/x kernel_arena
kernel_arena:   c0948e40
db> show vmem 0xc0948e40     
vmem 0xc0948e40 'kernel arena'
        quantum:        4096
        size:   362807296
        inuse:  362754048
        free:   53248
        busy tags:      1966
        free tags:      3
                        inuse   size            free    size
        4096            1802    7380992         0       0
        8192            14      114688          0       0
        12288           85      1044480         0       0
        16384           15      245760          0       0
        20480           2       40960           0       0
        32768           5       163840          0       0
        36864           1       36864           0       0
        53248           0       0               1       53248
        65536           11      720896          0       0
        94208           1       94208           0       0
        131072          8       1134592         0       0
        262144          13      4153344         0       0
        524288          1       888832          0       0
        2097152         2       6844416         0       0
        4194304         2       12742656        0       0
        8388608         1       10035200        0       0
        16777216        2       52510720        0       0
        134217728       1       264601600       0       0

But it turns out that *kernel_arena works:

db> show vmem *kernel_arena  
vmem 0xc0948e40 'kernel arena'
        quantum:        4096
        size:   362807296
        inuse:  362754048
        free:   53248
        busy tags:      1966
        free tags:      3
                        inuse   size            free    size
        4096            1802    7380992         0       0
        8192            14      114688          0       0
        12288           85      1044480         0       0
        16384           15      245760          0       0
        20480           2       40960           0       0
        32768           5       163840          0       0
        36864           1       36864           0       0
        53248           0       0               1       53248
        65536           11      720896          0       0
        94208           1       94208           0       0
        131072          8       1134592         0       0
        262144          13      4153344         0       0
        524288          1       888832          0       0
        2097152         2       6844416         0       0
        4194304         2       12742656        0       0
        8388608         1       10035200        0       0
        16777216        2       52510720        0       0
        134217728       1       264601600       0       0

db> show vmem *kernel_arena  
vmem 0xc0948e40 'kernel arena'
        quantum:        4096
        size:   362807296
        inuse:  362754048
        free:   53248
        busy tags:      1966
        free tags:      3
                        inuse   size            free    size
        4096            1802    7380992         0       0
        8192            14      114688          0       0
        12288           85      1044480         0       0
        16384           15      245760          0       0
        20480           2       40960           0       0
        32768           5       163840          0       0
        36864           1       36864           0       0
        53248           0       0               1       53248
        65536           11      720896          0       0
        94208           1       94208           0       0
        131072          8       1134592         0       0
        262144          13      4153344         0       0
        524288          1       888832          0       0
        2097152         2       6844416         0       0
        4194304         2       12742656        0       0
        8388608         1       10035200        0       0
        16777216        2       52510720        0       0
        134217728       1       264601600       0       0

Comment 13 Mark Millard 2017-12-14 23:36:16 UTC

(In reply to Jeff Roberson from comment #11)

db> show page
vm_cnt.v_free_count: 201160
vm_cnt.v_inactive_count: 0
vm_cnt.v_active_count: 0
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 4563
vm_cnt.v_free_reserved: 333
vm_cnt.v_free_min: 1360
vm_cnt.v_free_target: 4441

Comment 14 Jeff Roberson freebsd_committer

2017-12-14 23:42:22 UTC

Can you confirm that you have r326664?

Comment 15 Jeff Roberson freebsd_committer

2017-12-15 00:02:27 UTC

Is there a way I can get remote access to a serial console on this board for interactive debugging?  Or can I buy one cheaply somewhere?

Comment 16 Mark Millard 2017-12-15 00:09:45 UTC

(In reply to Jeff Roberson from comment #5)

I have confirmed:

db> x/i 0xc052d9f8
uma_reclaim_locked+0x200:       mrc     p15, 0, r0, c13, c0, 4

But near by addresses show line numbers in
blank/comment lines before the while loop
for:

/*
 * Drain the cached buckets from a zone.  Expects a locked zone on entry.
 */
static void
bucket_cache_drain(uma_zone_t zone)
{
        uma_bucket_t bucket;

        /*
         * Drain the bucket queues and free the buckets, we just keep two per
         * cpu (alloc/free).
         */
        while ((bucket = LIST_FIRST(&zone->uz_buckets)) != NULL) {
                LIST_REMOVE(bucket, ub_link);
                ZONE_UNLOCK(zone);
                bucket_drain(zone, bucket);
                bucket_free(zone, bucket, NULL);
                ZONE_LOCK(zone);
        }

        /*
         * Shrink further bucket sizes.  Price of single zone lock collision
         * is probably lower then price of global cache drain.
         */
        if (zone->uz_count > zone->uz_count_min)
                zone->uz_count--;
}

in /usr/src_bisect/sys/vm/uma_core.c .

So for reference to see if the near by code
makes sense: there is a +0x274: bl uma_zfree_arg
See below.

db> x/i,20        
uma_reclaim_locked+0x1d8:       beq     uma_reclaim_locked+0x4c8
uma_reclaim_locked+0x1dc:       mrc     p15, 0, r0, c13, c0, 4
uma_reclaim_locked+0x1e0:       mov     r1, #0x00000000
uma_reclaim_locked+0x1e4:       mov     r2, r10
uma_reclaim_locked+0x1e8:       und     e3003317
uma_reclaim_locked+0x1ec:       bl      _thread_lock
uma_reclaim_locked+0x1f0:       mrc     p15, 0, r0, c13, c0, 4
uma_reclaim_locked+0x1f4:       mov     r1, r4
uma_reclaim_locked+0x1f8:       str     r4, [r13, #0x004]
uma_reclaim_locked+0x1fc:       bl      sched_bind
uma_reclaim_locked+0x200:       mrc     p15, 0, r0, c13, c0, 4
uma_reclaim_locked+0x204:       mov     r1, #0x00000000
uma_reclaim_locked+0x208:       mov     r2, r10
uma_reclaim_locked+0x20c:       und     e3003319
uma_reclaim_locked+0x210:       ldr     r0, [r0]
uma_reclaim_locked+0x214:       add     r0, r0, #0x00000010
uma_reclaim_locked+0x218:       bl      __mtx_unlock_spin_flags
uma_reclaim_locked+0x21c:       mov     r0, r9
uma_reclaim_locked+0x220:       mov     r1, r10
uma_reclaim_locked+0x224:       und     e30026cd
uma_reclaim_locked+0x228:       bl      __rw_rlock
uma_reclaim_locked+0x22c:       und     e3090cb8
uma_reclaim_locked+0x230:       cmp     r12, #0x0000009a
uma_reclaim_locked+0x234:       ldr     r0, [r0]
uma_reclaim_locked+0x238:       cmp     r0, #0x00000000
uma_reclaim_locked+0x23c:       bne     uma_reclaim_locked+0x254
uma_reclaim_locked+0x240:       b       uma_reclaim_locked+0x4b0
uma_reclaim_locked+0x244:       ldr     r0, [r13, #0x008]
uma_reclaim_locked+0x248:       ldr     r0, [r0, #0x0a4]
uma_reclaim_locked+0x24c:       cmp     r0, #0x00000000
uma_reclaim_locked+0x250:       beq     uma_reclaim_locked+0x4b0
uma_reclaim_locked+0x254:       str     r0, [r13, #0x008]
db> x/i,20
uma_reclaim_locked+0x258:       ldr     r7, [r0, #0x04c]
uma_reclaim_locked+0x25c:       cmp     r7, #0x00000000
uma_reclaim_locked+0x260:       bne     uma_reclaim_locked+0x2a8
uma_reclaim_locked+0x264:       b       uma_reclaim_locked+0x244
uma_reclaim_locked+0x268:       sub     r0, r0, #0x00000010
uma_reclaim_locked+0x26c:       ldr     r0, [r0]
uma_reclaim_locked+0x270:       mov     r1, r9
uma_reclaim_locked+0x274:       bl      uma_zfree_arg
uma_reclaim_locked+0x278:       b       uma_reclaim_locked+0x29c
uma_reclaim_locked+0x27c:       und     e30f0df4
uma_reclaim_locked+0x280:       cmp     r12, #0x0000006d
uma_reclaim_locked+0x284:       bl      kassert_panic
uma_reclaim_locked+0x288:       b       uma_reclaim_locked+0x400
uma_reclaim_locked+0x28c:       und     e30f0df4
uma_reclaim_locked+0x290:       cmp     r12, #0x0000006d
uma_reclaim_locked+0x294:       bl      kassert_panic
uma_reclaim_locked+0x298:       b       uma_reclaim_locked+0x468
uma_reclaim_locked+0x29c:       ldr     r7, [r7, #0x048]
uma_reclaim_locked+0x2a0:       cmp     r7, #0x00000000
uma_reclaim_locked+0x2a4:       beq     uma_reclaim_locked+0x244
uma_reclaim_locked+0x2a8:       ldrb    r0, [r7, #0x087]
uma_reclaim_locked+0x2ac:       tst     r0, #0x00000020
uma_reclaim_locked+0x2b0:       bne     uma_reclaim_locked+0x29c
uma_reclaim_locked+0x2b4:       ldr     r0, [r7, #0x040]
uma_reclaim_locked+0x2b8:       mov     r1, #0x00000000
uma_reclaim_locked+0x2bc:       mov     r2, r10
uma_reclaim_locked+0x2c0:       und     e30032e7
uma_reclaim_locked+0x2c4:       add     r0, r0, #0x00000010
uma_reclaim_locked+0x2c8:       mov     r9, #0x00000000
uma_reclaim_locked+0x2cc:       bl      __mtx_lock_flags
uma_reclaim_locked+0x2d0:       bl      critical_enter
uma_reclaim_locked+0x2d4:       mrc     p15, 0, r0, c0, c0, 5

Comment 17 Mark Millard 2017-12-15 00:19:32 UTC

(In reply to Jeff Roberson from comment #14)

This is running -r326347 directly, not later.
It is a debug kernel build (unusual for me).

If needed I can update to some specific version
of the kernel after I build it. Do you want
-r326664 on my end of things?

It is built from my sources that are different
mostly by powerpc64 and powerpc experiments.
My kernel configurations include the standard
one and then adjust the debug vs. not details.
build with -mcpu=cortex-a7 in use.

# svnlite status /usr/src_bisect/ | sort                                                                                                                                                        ?       /usr/src_bisect/sys/amd64/conf/GENERIC-DBG
?       /usr/src_bisect/sys/amd64/conf/GENERIC-NODBG
?       /usr/src_bisect/sys/arm/conf/GENERIC-DBG
?       /usr/src_bisect/sys/arm/conf/GENERIC-NODBG
?       /usr/src_bisect/sys/arm64/conf/GENERIC-DBG
?       /usr/src_bisect/sys/arm64/conf/GENERIC-NODBG
?       /usr/src_bisect/sys/powerpc/conf/GENERIC64vtsc-DBG
?       /usr/src_bisect/sys/powerpc/conf/GENERIC64vtsc-NODBG
?       /usr/src_bisect/sys/powerpc/conf/GENERICvtsc-DBG
?       /usr/src_bisect/sys/powerpc/conf/GENERICvtsc-NODBG
M       /usr/src_bisect/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
M       /usr/src_bisect/contrib/llvm/tools/lld/ELF/Arch/PPC64.cpp
M       /usr/src_bisect/crypto/openssl/crypto/armcap.c
M       /usr/src_bisect/lib/libkvm/kvm_powerpc.c
M       /usr/src_bisect/lib/libkvm/kvm_private.c
M       /usr/src_bisect/stand/defs.mk
M       /usr/src_bisect/stand/powerpc/boot1.chrp/Makefile
M       /usr/src_bisect/stand/powerpc/kboot/Makefile
M       /usr/src_bisect/sys/arm64/arm64/identcpu.c
M       /usr/src_bisect/sys/conf/kmod.mk
M       /usr/src_bisect/sys/conf/ldscript.powerpc
M       /usr/src_bisect/sys/kern/subr_pcpu.c
M       /usr/src_bisect/sys/powerpc/aim/mmu_oea64.c
M       /usr/src_bisect/sys/powerpc/ofw/ofw_machdep.c
M       /usr/src_bisect/sys/powerpc/powerpc/interrupt.c
M       /usr/src_bisect/sys/powerpc/powerpc/mp_machdep.c
M       /usr/src_bisect/sys/powerpc/powerpc/trap.c

/usr/src_bisect/sys/kern/subr_pcpu.c just has a comment about
pcpu_find in it (powerpc tied content).

Comment 18 Mark Millard 2017-12-15 00:51:25 UTC

(In reply to Jeff Roberson from comment #15)

No remote access to the serial console, sorry.

Cortext-A7 (armv7) based rpi2's (so, V1.1 or before) are no
longer in production. [V1.2 changed to Cortex-A53 (aarch64).]

The one I'm using has its slot for mmcsd0 with
a broken "hold the sdcard in" mechanism but an
eject mechanism that still works. I actively
hold the sdcard in until it is no longer needed
(has switched to the USB SSD on the powered hub).

I tried to get my hands on an advertised Cortex-A7
RPI2-B but what was delivered was a V1.2 (Cortex-A53)
one. This may be typical these days. So I'm still
stuck holding the sdcard in during early power-up.

However, based on Emmanuel Vadot's pine64 report
I expect that lots of other types of devices also
show the problem for USB drives. So far I've
avoided bringing forward the problem to any other
types of device.

Out of (from snapshots):

o 12.0-CURRENT arm GUMSTIX
o 12.0-CURRENT armv6 RPI-B
o 12.0-CURRENT armv7 BEAGLEBONE
o 12.0-CURRENT armv7 CUBIEBOARD
o 12.0-CURRENT armv7 CUBIEBOARD2
o 12.0-CURRENT armv7 CUBOX-HUMMINGBOARD
o 12.0-CURRENT armv7 RPI2
o 12.0-CURRENT armv7 PANDABOARD
o 12.0-CURRENT armv7 WANDBOARD
o 12.0-CURRENT aarch64 GENERIC
o 12.0-CURRENT aarch64 RPI3
o 12.0-CURRENT aarch64 PINE64

I have access to one each of the last 2
types as well and could build for them
if needed.

I also have that RPI2 V1.2 that was not
what was advertised, which may well be
covered by the RPI3 type of build.

You might want to talk to Emmanuel Vadot
since he is a committer and works on
supporting these types of boards. He
knows vastly more than I do. He may
have access to more board types as well.

RPI3's are fairly cheap and are easy to
find/order. Pine64+ 2GB's have more RAM
and are faster but not as common. There
are Pine64's with less RAM and more modern
variants that Emmanuel has provided support
for as I understand (lpddr3 memory instead,
for example).

[The banana Pi M3 (Cortex-A7) that I have
access to seems to have a power-connector
problem so I've not mentioned it above.
Still I could try dealing with it if needed.]

Comment 19 Mark Millard 2017-12-15 01:01:54 UTC

(In reply to Mark Millard from comment #17)

Actually I need to be clearer about versions:

kernel: -r326347

world: -r326726

This started for me when I jumped from
-r326192 to -r326726. I held world
constant and bisect'd just the kernel,
using a separate source tree.

So my world is from my /usr/src/ and
my kernel is from my /usr/src-bisect/ .

I could make both world and kernel be based
on a specific version if you want --or just
the kernel as I have been doing.

Also present are materials from:

# ls -lTd /usr/local/share/rpi-firmware /usr/local/share/u-boot/u-boot-rpi2
drwxr-xr-x  2 root  wheel  512 Nov 22 17:22:52 2017 /usr/local/share/rpi-firmware
drwxr-xr-x  2 root  wheel  512 Nov 22 17:22:52 2017 /usr/local/share/u-boot/u-boot-rpi2

from sysutils/rpi-firmware and sysutils/u-boot-rpi2 .

Comment 20 Jeff Roberson freebsd_committer

2017-12-15 02:12:19 UTC

Please pull in the patch from r326664 and retest.  It is safe to pull just that rev.

Comment 21 Mark Millard 2017-12-15 03:01:24 UTC

(In reply to Mark Millard from comment #16)

Why the source lines were odd. . .

Stupid /usr/src_bisect/ vintage mistake on my part:
it was at -r326346 instead of -r326347 (from something
prior that I'd looked at).

Updating /usr/src_bisect back to -r326347 leads to
the addr2line based lookup making sense. . .

# addr2line -e /usr/obj/DESTDIRs/clang-armv7dbg-installkernel_bisect/usr/lib/debug/boot/kernel/kernel.debug  0xc052d9f4
/usr/src_bisect/sys/vm/uma_core.c:792

line 792 is the sched_bind(curthread, cpu) line
in:
 
static void
cache_drain_safe(uma_zone_t zone)
{
        int cpu;
 
        /*
         * Polite bucket sizes shrinking was not enouth, shrink aggressively.
         */
        if (zone)
                cache_shrink(zone);
        else
                zone_foreach(cache_shrink);

        CPU_FOREACH(cpu) {
                thread_lock(curthread);
                sched_bind(curthread, cpu);
                thread_unlock(curthread);
 
                if (zone)
                        cache_drain_safe_cpu(zone);
                else
                        zone_foreach(cache_drain_safe_cpu);
        }
        thread_lock(curthread);
        sched_unbind(curthread);
        thread_unlock(curthread);
}

Comment 22 Mark Millard 2017-12-15 03:07:18 UTC

(In reply to Jeff Roberson from comment #20)

I'll try that patch but remember that I started
the bisect because jumping to -r326726 had the
problem. (I jumped from -r326192.)

So I started with this patch present when I
started my bisect effort.

Comment 23 Jeff Roberson freebsd_committer

2017-12-15 03:34:15 UTC

Thank you Mark, that makes more sense.  It looks like we bound to some cpu but never ran again.  Does arm do anything strange with its cpus?  are they 0..mp_maxid?  Anything odd with scheduling?

Comment 24 Mark Millard 2017-12-15 03:39:30 UTC

(In reply to Jeff Roberson from comment #20)


Interesting. I had a successful boot of
-r326347 with that patch present.

I'll spend some time testing if it seems
to reliably boot --after getting something
to eat.

So I may have to retry -r326726 and bisect
again if -r326347 with the patch seems to
reliably work but -r326726 does not seem
to reliably work. (We will see.)

Comment 25 Mark Millard 2017-12-15 03:43:37 UTC

(In reply to Jeff Roberson from comment #23)

I'm afraid I do not have the right background
knowledge for questions like "are they 0..mp_maxid? 
Anything odd with scheduling?" for the cores.

(Sometimes I can analyze source enough to find
answers to specific questions --other times not.
Such tends to be time consuming relative to where
I start from.)

Comment 26 Jeff Roberson freebsd_committer

2017-12-15 03:45:40 UTC

Sorry, the output of:
sysctl kern.smp
sysctl kern.sched

should tell me what I need to know.  Thanks!

Comment 27 Mark Millard 2017-12-15 04:28:29 UTC

(In reply to Jeff Roberson from comment #26)

The following is from the rpi2 booted
with the patched -r326347:

# sysctl kern.smp
kern.smp.forward_signal_enabled: 1
kern.smp.topology: 0
kern.smp.cpus: 4
kern.smp.disabled: 0
kern.smp.active: 1
kern.smp.maxcpus: 4
kern.smp.maxid: 3

# sysctl kern.sched
kern.sched.topology_spec: <groups>
 <group level="1" cache-level="2">
  <cpu count="4" mask="f">0, 1, 2, 3</cpu>
 </group>
</groups>

kern.sched.steal_thresh: 2
kern.sched.steal_idle: 1
kern.sched.balance_interval: 127
kern.sched.balance: 1
kern.sched.affinity: 1
kern.sched.idlespinthresh: 157
kern.sched.idlespins: 10000
kern.sched.static_boost: 152
kern.sched.preempt_thresh: 80
kern.sched.interact: 30
kern.sched.slice: 12
kern.sched.quantum: 94488
kern.sched.name: ULE
kern.sched.preemption: 1
kern.sched.cpusetsize: 4

Comment 28 Mark Millard 2017-12-15 05:07:25 UTC

So far the patched -r326347 has booted
fine, both debug-kernel and non-debug-kernel
builds.

I'm now building -r326664 in hopes that it will
be a good low bound if I have to start another
bisect when I retry -r326726.

For various reasons this is a full build. I
buildworld and buildkernel even when I only
install a trial kernel for a bisect. So, it
will be a while. (It is a cross build,
however.)

I'm doing non-debug, in other words, the way
I usually do things.

Comment 29 Mark Millard 2017-12-15 10:08:54 UTC

When I looked I found at least one
change between -r326664 and -r326726
that had later been reverted.

So I updated to -r326871 instead of
testing -r326726 :

# uname -apKU
FreeBSD rpi2 12.0-CURRENT FreeBSD 12.0-CURRENT  r326871M  arm armv7 1200054 1200054

It seems to be booting from the USB
SSD just fine: no hang-ups so far.

Comment 30 Mark Millard 2017-12-16 19:55:04 UTC

(In reply to Mark Millard from comment #29)

I have progressed the rpi2 V1.1 to -r32888 and it
is still booting and operating fine.

See:

https://lists.freebsd.org/pipermail/freebsd-arm/2017-December/017195.html

for the details of how the boot is set up to get
the kernel and world from the USB SSD, getting
very little from the sdcard first.

Comment 31 Emmanuel Vadot freebsd_committer

2017-12-19 22:52:49 UTC

On Pine64-LTS running c4bc9a29b59/r326935 it's still hanging when plugging an usb device

db> print *uma_kmem_limit
29400000
db> print *uma_kmem_total
 1813000
db> print *uma_reclaim_needed
       0
db> show page
vm_cnt.v_free_count: 487944
vm_cnt.v_inactive_count: 269
vm_cnt.v_active_count: 2995
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 15673
vm_cnt.v_free_reserved: 726
vm_cnt.v_free_min: 3259
vm_cnt.v_free_target: 10858
vm_cnt.v_inactive_target: 16287
db> show vmem *kernel_arena
vmem 0xffff000000a2ec00 'kernel arena'
        quantum:        4096
        size:   413138944
        inuse:  411766784
        free:   1372160
        busy tags:      1031
        free tags:      4
                        inuse   size            free    size
        4096            541     2215936         3       12288
        8192            343     2809856         0       0
        16384           8       131072          0       0
        20480           108     2211840         0       0
        32768           5       163840          0       0
        36864           1       36864           0       0
        65536           8       524288          0       0
        69632           1       69632           0       0
        98304           1       98304           0       0
        131072          5       720896          0       0
        262144          6       1572864         0       0
        524288          1       524288          0       0
        1048576         1       1187840         1       1359872
        4194304         1       7905280         0       0
        268435456       1       391593984       0       0
db> c
root@pine64-lts:~ # sysctl hw.usb.debug=6     
hw.usb.debug: 0 -> 6                          
root@pine64-lts:~ # usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x000
4                                                       
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0x23 bRequest=0x01 wValue=0x0010 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_req_reset_port:                      
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0x23 bRequest=0x01 wValue=0x0014 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0x23 bRequest=0x03 wValue=0x0004 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0x23 bRequest=0x01 wValue=0x0010 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set       
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_req_reset_port: port 1 reset returning error=USB_ERR_NORMAL_COMPLETION
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0x23 bRequest=0x01 wValue=0x0010 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_req_reset_port:
usbd_req_reset_port:
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0x23 bRequest=0x01 wValue=0x0014 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0x23 bRequest=0x03 wValue=0x0004 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0x23 bRequest=0x01 wValue=0x0014 wIndex=0x0001 wLength=0x0000
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_do_request_flags: udev=0xfffffd0000b4e000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usbd_req_reset_port: port 1 reset returning error=USB_ERR_NORMAL_COMPLETION
usbd_do_request_flags: udev=0xfffffd0000b4f000 bmRequestType=0xa3 bRequest=0x00 wValue=0x0000 wIndex=0x0001 wLength=0x0004
usbd_do_request_flags: Handle Request function is set
usb_alloc_device: parent_dev=0xfffffd00008d4900, bus=0xffff00005aef7320, parent_hub=0xfffffd0000b4f000, depth=1, port_index=0, port_no=1, spe
ed=1, usb_mode=0
usb_alloc_device: device_index=2
usb_set_device_state: udev 0xfffffd0043231000 state DETACHED -> POWERED
usb_set_device_state: done
usb_alloc_device: calling usbd_req_set_address
usbd_req_set_address: setting device address=2
usbd_do_request_flags: udev=0xfffffd0043231000 bmRequestType=0x00 bRequest=0x05 wValue=0x0002 wIndex=0x0000 wLength=0x0000
usbd_do_request_flags: calling usbd_ctrl_transfert_setup
usbd_ctrl_transfer_setup: unsetup existing transfer
usbd_ctrl_transfer_setup: Setup new transfer
usbd_transfer_setup: Do stuff
usbd_transfer_setup: Starting loop
usbd_transfer_setup: Get matching endpoint
usbd_transfer_setup: Done
usbd_transfer_setup: set transfer endpoint pointer
usbd_transfer_setup: Call host or device ctrl transfer setup
usbd_transfer_setup: Done
usbd_transfer_setup: Get matching endpoint
usbd_transfer_setup: Done
usbd_transfer_setup: set transfer endpoint pointer
usbd_transfer_setup: Call host or device ctrl transfer setup
usbd_transfer_setup: Done
usbd_transfer_setup: Get matching endpoint
usbd_transfer_setup: Done
usbd_transfer_setup: set transfer endpoint pointer
usbd_transfer_setup: Call host or device ctrl transfer setup


Then it's just hanged and I can't enter ddb again.
The patch for the extra debug printf is here : https://people.freebsd.org/~manu/0001-Add-a-lot-of-debug-printf.patch

Comment 32 Mark Millard 2017-12-20 06:42:36 UTC

(In reply to Jeff Roberson from comment #26)

FYI: That same rpi2 with the same -r326888
head version just hung up at boot the same
way as the earlier versions had. It suggests
some sort of race condition or other source
of variability.

chain 34:
 thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
 thread 100072 (pid 23, uma) is on a run queue

db> bt
Tracing pid 23 tid 100072 td 0xd6dd3ae0
cpu_switch() at cpu_switch+0x10
         pc = 0xc05b8be8  lr = 0xc02af88c (sched_switch+0x7e4)
         sp = 0xd7dbdd48  fp = 0xd7dbdd78
sched_switch() at sched_switch+0x7e4
         pc = 0xc02af88c  lr = 0xc0288b40 (mi_switch+0x1ac)
         sp = 0xd7dbdd80  fp = 0xd7dbdda8
         r4 = 0x00000034  r5 = 0x00000000
         r6 = 0xd6dd3ae0  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc0816768
        r10 = 0x00000100
mi_switch() at mi_switch+0x1ac
         pc = 0xc0288b40  lr = 0xc05609c0 (uma_reclaim_locked+0x258)
         sp = 0xd7dbddb0  fp = 0xd7dbddd8
         r4 = 0xc0833600  r5 = 0x00000002
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0xc0834168  r9 = 0x00000000
        r10 = 0x00000000
uma_reclaim_locked() at uma_reclaim_locked+0x258
         pc = 0xc05609c0  lr = 0xc0560f4c (uma_reclaim_worker+0xd8)
         sp = 0xd7dbdde0  fp = 0xd7dbde18
         r4 = 0xc095e528  r5 = 0xc095e538
         r6 = 0xc09685c8  r7 = 0x00000000
         r8 = 0xc0834168  r9 = 0xc095e53c
        r10 = 0x00000000
uma_reclaim_worker() at uma_reclaim_worker+0xd8
         pc = 0xc0560f4c  lr = 0xc0240144 (fork_exit+0xc0)
         sp = 0xd7dbde20  fp = 0xd7dbde40
         r4 = 0xd6dd3ae0  r5 = 0xd6b4b000
         r6 = 0xc0560e74  r7 = 0x00000000
         r8 = 0xd7dbde48  r9 = 0xc09518c0
        r10 = 0xd6b5b000
fork_exit() at fork_exit+0xc0
         pc = 0xc0240144  lr = 0xc0599c58 (swi_exit)
         sp = 0xd7dbde48  fp = 0x00000000
         r4 = 0xc0560e74  r5 = 0x00000000
         r6 = 0xc1ed63a0  r7 = 0x00002738
         r8 = 0xc0816768 r10 = 0xd6b5b000
swi_exit() at swi_exit
         pc = 0xc0599c58  lr = 0xc0599c58 (swi_exit)
         sp = 0xd7dbde48  fp = 0x00000000

lr = 0xc05609c0 (uma_reclaim_locked+0x258) ends up
being /usr/src_bisect/sys/vm/uma_core.c:792 (like before).

db> bt
Tracing pid 1 tid 100001 td 0xc3770000
cpu_switch() at cpu_switch+0x10
         pc = 0xc05b8be8  lr = 0xc02af88c (sched_switch+0x7e4)
         sp = 0xc3776718  fp = 0xc3776748
sched_switch() at sched_switch+0x7e4
         pc = 0xc02af88c  lr = 0xc0288b40 (mi_switch+0x1ac)
         sp = 0xc3776750  fp = 0xc3776778
         r4 = 0x000004c5  r5 = 0x00000000
         r6 = 0xc3770000  r7 = 0x00000000
         r8 = 0x00000000  r9 = 0xc0816768
        r10 = 0x00000104
mi_switch() at mi_switch+0x1ac
         pc = 0xc0288b40  lr = 0xc02d7964 (sleepq_wait+0x2c)
         sp = 0xc3776780  fp = 0xc3776798
         r4 = 0xc3770000  r5 = 0x00000000
         r6 = 0xc095e528  r7 = 0xc095e538
         r8 = 0x00000001  r9 = 0x00000001
        r10 = 0xc095e528
sleepq_wait() at sleepq_wait+0x2c
         pc = 0xc02d7964  lr = 0xc028789c (_sx_slock_hard+0x308)
         sp = 0xc37767a0  fp = 0xc37767f8
         r4 = 0x00000001  r5 = 0x00000000
         r6 = 0x00000000 r10 = 0xc095e528
_sx_slock_hard() at _sx_slock_hard+0x308
         pc = 0xc028789c  lr = 0xc055dbe8 (uma_zcreate+0xa4)
         sp = 0xc3776800  fp = 0xc3776840
         r4 = 0x00000000  r5 = 0xc095e528
         r6 = 0x00000078  r7 = 0xc06f8205
         r8 = 0xd6f32048  r9 = 0xc3770000
        r10 = 0xd6f32000
uma_zcreate() at uma_zcreate+0xa4
         pc = 0xc055dbe8  lr = 0xc053b8c0 (ffs_mount+0x80)
         sp = 0xc3776848  fp = 0xc3776978
         r4 = 0xd6f32000  r5 = 0x00000000
         r6 = 0x00000003  r7 = 0xc095e2a8
ffs_mount() at ffs_mount+0x80
         pc = 0xc053b8c0  lr = 0xc0344fbc (vfs_donmount+0x1490)
         sp = 0xc3776980  fp = 0xc3776b38
         r4 = 0xffffffff  r5 = 0xc0812870
         r6 = 0xc419afb0  r7 = 0x00000000
         r8 = 0xd6f32048  r9 = 0x00000000
        r10 = 0xd6f32000
vfs_donmount() at vfs_donmount+0x1490
         pc = 0xc0344fbc  lr = 0xc03481c8 (kernel_mount+0x3c)
         sp = 0xc3776b40  fp = 0xc3776b78
         r4 = 0xd7826000  r5 = 0x00000000
         r6 = 0xd7826000  r7 = 0x00000000
         r8 = 0xd6f2d010  r9 = 0xc418ffb0
        r10 = 0x00000000
kernel_mount() at kernel_mount+0x3c
         pc = 0xc03481c8  lr = 0xc034ad64 (parse_mount+0x458)
         sp = 0xc3776b80  fp = 0xc3776c58
         r4 = 0xd784c100  r5 = 0xd6f2d014
         r6 = 0xd7826000 r10 = 0x00000000
parse_mount() at parse_mount+0x458
         pc = 0xc034ad64  lr = 0xc034950c ($a.2+0x24)
         sp = 0xc3776c60  fp = 0xc3776dc0
         r4 = 0xd6f33018  r5 = 0xfffffff7
         r6 = 0xc3770000  r7 = 0xc41a67da
         r8 = 0xd6f2c080  r9 = 0x00000000
        r10 = 0xd6f33000
$a.2() at $a.2+0x24
         pc = 0xc034950c  lr = 0xc02168bc (start_init+0x60)
         sp = 0xc3776dc8  fp = 0xc3776e18
         r4 = 0xc3770000  r5 = 0xc376c000
         r6 = 0xc021685c  r7 = 0x00000001
         r8 = 0xc376c000  r9 = 0x00000000
        r10 = 0x00000000
start_init() at start_init+0x60
         pc = 0xc02168bc  lr = 0xc0240144 (fork_exit+0xc0)
         sp = 0xc3776e20  fp = 0xc3776e40
         r4 = 0xc3770000  r5 = 0xc376c000
         r6 = 0xc021685c  r7 = 0x00000000
         r8 = 0xc3776e48  r9 = 0x00000000
        r10 = 0x00000000
fork_exit() at fork_exit+0xc0
         pc = 0xc0240144  lr = 0xc0599c58 (swi_exit)
         sp = 0xc3776e48  fp = 0x00000000
         r4 = 0xc021685c  r5 = 0x00000000
         r6 = 0x00000000  r7 = 0x00000000
         r8 = 0x00000000 r10 = 0x00000000
swi_exit() at swi_exit
         pc = 0xc0599c58  lr = 0xc0599c58 (swi_exit)
         sp = 0xc3776e48  fp = 0x00000000

db> print *uma_kmem_limit
107f2000
db> print *uma_kmem_total
  8d8000
db> print *uma_reclaim_needed
       7
db> show page
vm_cnt.v_free_count: 199675
vm_cnt.v_inactive_count: 0
vm_cnt.v_active_count: 0
vm_cnt.v_laundry_count: 0
vm_cnt.v_wire_count: 3035
vm_cnt.v_free_reserved: 329
vm_cnt.v_free_min: 1341
vm_cnt.v_free_target: 4377
vm_cnt.v_inactive_target: 6565
db> show vmem *kernel_arena
vmem 0xc0958c80 'kernel arena'
        quantum:        4096
        size:   352321536
        inuse:  351490048
        free:   831488
        busy tags:      1143
        free tags:      3
                        inuse   size            free    size
        4096            985     4034560         0       0
        8192            11      90112           0       0
        12288           84      1032192         0       0
        16384           14      229376          0       0
        20480           2       40960           0       0
        32768           5       163840          0       0
        36864           1       36864           0       0
        65536           11      720896          0       0
        94208           1       94208           0       0
        131072          19      2990080         1       184320
        262144          1       270336          0       0
        524288          2       1708032         1       647168
        2097152         1       2097152         0       0
        4194304         2       12357632        0       0
        8388608         1       9773056         0       0
        16777216        2       52510720        0       0
        134217728       1       263340032       0       0

Comment 33 Jeff Roberson freebsd_committer

2017-12-20 06:45:53 UTC

(In reply to Mark Millard from comment #32)
Can you set uma_kmem_limit = ULONG_MAX; where it is declared in uma_core.c and test a theory?

Comment 34 Mark Millard 2017-12-20 07:11:58 UTC

(In reply to Jeff Roberson from comment #33)

[Note: The problem was reproduced with a non-debug
kernel build and this is also such. I normally do
not run debug builds.]

For reference:

/* kmem soft limit. */
static unsigned long uma_kmem_limit = ULONG_MAX;
static volatile unsigned long uma_kmem_total;


It still hung up. But it does look a little
different:

chain 36:
 thread 100000 (pid 0, swapper) blocked on lock 0xc0967474 (sleep mutex) "process lock"
 thread 100081 (pid 0, if_io_tqg_3) is on a run queue

if_io_tqg_3 is new.

But the allchain output eventually got:

chain 45:
 thread 100056 (pid 0, CAM taskq) sleeping on 0xc3764680 "-"
chain 46:
 thread 100073 (pid 0, if_config_tqg_0) blocked on lockmgr (null)EXCL
KDB: reentering
KDB: stack backtrace:
db_trace_self() at db_trace_self
         pc = 0xc05972cc  lr = 0xc0060f98 (db_trace_self_wrapper+0x30)
. . .
exception_exit() at exception_exit
         pc = 0xc0599cc8  lr = 0xc0258684 (lockmgr_chain+0x60)
         sp = 0xc3783858  fp = 0xc3783880
         r0 = 0x75657571  r1 = 0xc06cbd2d
         r2 = 0x00000000  r3 = 0x600001d3
         r4 = 0xc06cbd2d  r5 = 0xc06ddb13
         r6 = 0xc0964bf8  r7 = 0xc415b220
         r8 = 0xc06ddb69  r9 = 0xc3783860
        r10 = 0xc07fa250 r12 = 0x00000003
print_lockchain() at print_lockchain+0x58
         pc = 0xc02e1898  lr = 0xc02e1ad0 (db_show_allchains+0x9c)
         sp = 0xc3783888  fp = 0xc37838a0
         r4 = 0x0000002e  r5 = 0xc09673c8
         r6 = 0xc0964bf8  r7 = 0xd6dd3740
         r8 = 0xc06ddbb2  r9 = 0xc06cbd2d
        r10 = 0x00000000
db_show_allchains() at db_show_allchains+0x9c
         pc = 0xc02e1ad0  lr = 0xc005d934 (db_command+0x268)
         sp = 0xc37838a8  fp = 0xc3783948
         r4 = 0x00000001  r5 = 0x00000000
         r6 = 0xc06f7397  r7 = 0xc02e1a34
         r8 = 0xc0964b74  r9 = 0xc07fb448
. . .

Comment 35 Mark Millard 2017-12-20 07:20:55 UTC

(In reply to Mark Millard from comment #34)

In:

QUOTE
It still hung up. But it does look a little
different:

chain 36:
 thread 100000 (pid 0, swapper) blocked on lock 0xc0967474 (sleep mutex) "process lock"
 thread 100081 (pid 0, if_io_tqg_3) is on a run queue

if_io_tqg_3 is new.
END QUOTE.

I also should have said the swapper is new.
And "a little" was silly.

Comment 36 Mark Millard 2017-12-20 08:00:15 UTC

(In reply to Mark Millard from comment #35)

I'm sorry, I was a little distracted and missed
sending the part that was similar to the past
reports:

chain 35:
 thread 100001 (pid 1, kernel) blocked on sx "umadrain" XLOCK
 thread 100072 (pid 23, uma) is on a run queue


The chain that I reported in #33 and #34 is in addition to the
the above, chains repeated below:

chain 36:
 thread 100000 (pid 0, swapper) blocked on lock 0xc0967474 (sleep mutex) "process lock"
 thread 100081 (pid 0, if_io_tqg_3) is on a run queue
. . .
chain 46:
 thread 100073 (pid 0, if_config_tqg_0) blocked on lockmgr (null)EXCL
KDB: reentering
KDB: stack backtrace:
. . .

Comment 37 Jeff Roberson freebsd_committer

2017-12-20 08:04:22 UTC

In uma_reclaim_wakeup() can you add:

printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);

I'm not sure why you're hitting the limit at all during boot.

Comment 38 Mark Millard 2017-12-20 08:39:27 UTC

(In reply to Jeff Roberson from comment #37)

Is the ULONG_MAX still required?

It did not look like I'd be likely to ever
boot with it.

Note: I reverted and tried some more and
it booted the last time that I tried. So
it is currently at a sh prompt.

Racy or dependent on uninitialized memory
that is not stable in value or some such?

Comment 39 Jeff Roberson freebsd_committer

2017-12-20 08:41:46 UTC

I think the ulong makes sense.  There are possibly a few allocations that happen before the value is initialized and these could trigger wakeups.

Can you add the printf and show me the output from it?

Comment 40 Mark Millard 2017-12-20 08:53:11 UTC

(In reply to Jeff Roberson from comment #39)

With ULONG_MAX it died in a state that
could not even complete a show allchains
at the db> prompt. This might interfere
with gathering other information.

For reference:

/* kmem soft limit. */
static unsigned long uma_kmem_limit = ULONG_MAX;
static volatile unsigned long uma_kmem_total;

. . .

void
uma_reclaim_wakeup(void)
{
printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);
        if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
                wakeup(uma_reclaim);
}

I'll reply when I've got it in place.

Comment 41 Jeff Roberson freebsd_committer

2017-12-20 09:14:15 UTC

ah it should be LONG_MAX not ULONG_MAX.

Comment 42 Mark Millard 2017-12-20 09:16:25 UTC

(In reply to Mark Millard from comment #40)

Successful boots do not show the printf output
(so far).

A bunch of attempts have not gotten a boot
hangup.

It looks like I'll just have to use it with the
two changes and hope for a failure with some
output from the new printf at some point. I
have no clue how to force a failure on demand.

I'll note that after the -r326871 on the 15th
(comment #29) the rpi2 spent most of its time
doing poudriere build experiments while I did
other things. So it has not been rebooted
all that much since then.

Comment 43 Mark Millard 2017-12-20 09:18:34 UTC

(In reply to Jeff Roberson from comment #41)

Okay. I'll update to LONG_MAX.

But my guess is that it will still be a
"use it until failure" sort of context
with no quick result.

Comment 44 Mark Millard 2017-12-20 09:37:04 UTC

(In reply to Mark Millard from comment #43)

Looks like I installed the wrong kernel
initially, no printf available to output.

So this is the first test with the printf
in place. . .

Still no failures. Looks like "use until
a failure shows up".

For reference:

/* kmem soft limit. */
static unsigned long uma_kmem_limit = LONG_MAX;
static volatile unsigned long uma_kmem_total;

. . .

void
uma_reclaim_wakeup(void)
{
printf("limit %lX, total %lX, needed %d\n", uma_kmem_limit, uma_kmem_total, uma_reclaim_needed);
        if (atomic_fetchadd_int(&uma_reclaim_needed, 1) == 0)
                wakeup(uma_reclaim);
}

Comment 45 Jeff Roberson freebsd_committer

2017-12-20 20:45:13 UTC

What it looks like to me is that we woke up the reclaim thread multiple times early in boot before arm was prepared to run the thread and bind to multiple CPUs.  There may be a race that makes it work sometimes.  I believe the = LONG_MAX fix should resolve the issue.

Comment 46 Mark Millard 2017-12-20 20:51:23 UTC

(In reply to Jeff Roberson from comment #45)

Okay. I'll report here if I see another hang.

If I progress past -r326888 I'll be sure that
the LONG_MAX is still present --and for now I'll
leave the printf as well.

Thanks.

Comment 47 Emmanuel Vadot freebsd_committer

2017-12-20 21:10:51 UTC

The LONG_MAX doesn't change anything for me on Pine64.

Comment 48 Mark Millard 2017-12-21 20:39:44 UTC

(In reply to Emmanuel Vadot from comment #47)

I have replicated your hangs-on-plugin-of-
USB-drive problem on a Pine64+ 2GB. The
printf did not output.

I first updated the Pine64+ 2GB that I have
access to be based on the patched -r326888
based source that I've been using for my
recent rpi2 experiments.

It boots via the usdcard slot, not USB.
(eMMC via an adapter.) Plugging in a USB
drive (via plugging the a USB hub it is
on) hangs the Pine64+ 2GB.

<CR>~^B does not get to the db> prompt.

This was with a non-debug kernel.

Comment 49 Jeff Roberson freebsd_committer

2017-12-21 20:45:14 UTC

After debugging this further on IRC we believe this may be an existing pmap race that is made worse by combining kernel/kmem arenas.

Disabling SMP makes the problem go away.

Adding printfs makes it clear that it hangs somewhere in vm_page_alloc_contig, but printing in this function changes timing sufficiently that the problem goes away.  JHB noticed that there are no IPIs in pmap_change_attr on arm.  Could this be a problem with the direct map?

Comment 50 Mark Millard 2017-12-21 20:49:05 UTC

(In reply to Mark Millard from comment #48)

I tried plugging in just the hub, without the
drive. That worked. On the console serial port
it shows the following when the hub is plugged
in:

ugen0.2: <vendor 0x05e3 product 0x0610> at usbus0
uhub2 on uhub0
uhub2: <vendor 0x05e3 product 0x0610, class 9/0, rev 2.00/92.10, addr 2> on usbus0
uhub2: MTT enabled
uhub2: 4 ports with 1 removable, self powered

And the following when unplugged:

ugen0.2: <vendor 0x05e3 product 0x0610> at usbus0 (disconnected)
uhub2: at uhub0, port 1, addr 2 (disconnected)
uhub2: detached

Comment 51 Mark Millard 2017-12-21 20:59:40 UTC

(In reply to Mark Millard from comment #50)

I have no such problem with plugging in
USB drives on the rpi3 that is running
-r326888 (unpatched as yet). For example:

ugen0.4: <vendor 0x05e3 product 0x0610> at usbus0
uhub2 on uhub1
uhub2: <vendor 0x05e3 product 0x0610, class 9/0, rev 2.00/92.10, addr 4> on usbus0
uhub2: MTT enabled
uhub2: 4 ports with 1 removable, self powered
ugen0.5: <OWC Envoy Pro mini> at usbus0
umass0 on uhub2
umass0: <OWC Envoy Pro mini, class 0/0, rev 2.10/1.00, addr 5> on usbus0
umass0:  SCSI over Bulk-Only; quirks = 0x0100
umass0:0:0: Attached to scbus0
da0 at umass-sim0 bus 0 scbus0 target 0 lun 0
da0: <OWC Envoy Pro mini 0> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number <REPLACED>
da0: 40.000MB/s transfers
da0: 228936MB (468862128 512 byte sectors)
da0: quirks=0x2<NO_6_BYTE>

(Same hub and drive as I used on the Pine64+ 2GB.)

Comment 52 Mark Millard 2017-12-22 08:44:30 UTC

(In reply to Jeff Roberson from comment #45)

I've tried building -r327075 for various
TARGET_ARCH's and they fail to compile with
LONG_MAX in place:

TARGET_ARCH=powerpc (via gcc 4.2.1, llvm fail to build it normally)
--- uma_core.o ---
Compiler executable checksum: 9060ea2e8b86c7caeab9ffe5a850239f
/usr/src/sys/vm/uma_core.c:151: error: 'LONG_MAX' undeclared here (not in a function)
*** [uma_core.o] Error code 1

make[2]: stopped in /usr/obj/powerpcvtsc_clang_gcc421/powerpc.powerpc/usr/src/powerpc.powerpc/sys/GENERICvtsc-NODBG

TARGET_ARCH=powerpc64 (via system-clang)
--- uma_core.o ---
static unsigned long uma_kmem_limit = LONG_MAX;
                                      ^
--- uma_core.o ---
/usr/src/sys/vm/uma_core.c:151:39: error: use of undeclared identifier 'LONG_MAX'
--- uma_core.o ---
1 error generated.

And so on.

Apparently no header that defines it has been included.

Comment 53 commit-hook freebsd_committer

2018-01-02 04:37:00 UTC

A commit references this bug:

Author: jeff
Date: Tue Jan  2 04:35:57 UTC 2018
New revision: 327485
URL: https://svnweb.freebsd.org/changeset/base/327485

Log:
  Fix arc after r326347 broke various memory limit queries.  Use UMA features
  rather than kmem arena size to determine available memory.

  Initialize the UMA limit to LONG_MAX to avoid spurious wakeups on boot before
  the real limit is set.

  PR:		224330 (partial), 224080
  Reviewed by:	markj, avg
  Sponsored by:	Netflix / Dell EMC Isilon
  Differential Revision:	https://reviews.freebsd.org/D13494

Changes:
  head/sys/cddl/compat/opensolaris/sys/kmem.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  head/sys/vm/uma.h
  head/sys/vm/uma_core.c
  head/sys/vm/uma_int.h

Comment 54 Jeff Roberson freebsd_committer

2018-01-23 21:29:42 UTC

Andy, I am assigning this to you because JHB and I believe it is an existing race in pmap.  Changing timing between calls to the allocator and pmap_change_attr() resolves the issue.

Comment 55 gergely.czuczy 2018-02-01 05:17:12 UTC

May I ask when can we expect a fix for this? I would like to update an rpi3 and a pine64, but this is blocking them from booting on their own. boot -v works this around, however I can't always have a keyboard/console attached to manually aid the booting process.

Comment 56 Mark Millard 2018-02-01 05:48:00 UTC

(In reply to gergely.czuczy from comment #55)

So far as I know, at this point no one knows
just what source code is wrong. No specific
source code fixes have been identified from
what I can tell.

So it looks to be an open-ended effort at
this point: no pre-kniown schedule is likely.

Comment 57 Kyle Evans freebsd_committer

2018-02-07 05:10:10 UTC

(In reply to gergely.czuczy from comment #55)

For what it's worth, setting boot_verbose="YES" in loader.conf(5) will also do the trick and alleviate the need for serial on boot. This is what I do for my Pine64, at least.

Comment 58 gergely.czuczy 2018-02-07 05:16:40 UTC

That works for my pine64, however that doesn't do the trick for the rpi3. And unfortunately my rpi3 is the brewery controller, where i don't have the opportunity to attach a serial console.

So that's why it's a very serious issue for me, I can't update my beer brewing stack because of this bug :)

Comment 59 Ed Maste freebsd_committer

2018-02-15 14:51:32 UTC

For reference USB is also non-functional on my arm64 Softiron Overdrive 1000, at r329060.

Comment 60 Mark Millard 2018-05-14 03:21:22 UTC

(In reply to Ed Maste from comment #59)

FYI: on a Pine64+ 2GiB running head -r333157 I can plug in
a USB drive into the bottom USB port, and mount and use
/dev/da0p1:

=>       34  468862061  da0  GPT  (224G)
         34       2014       - free -  (1.0M)
       2048  413138944    1  PINE642Groot  (197G)
  413140992   12582912    2  PINE642Gswap  (6.0G)
  425723904     131072    3  Basic data partition  (64M)
  425854976   43007119       - free -  (21G)

As I remember the upper USB port was ignored when I
tried there.

Note: I'm not booting from USB currently. Instead I'm
using a EMMC on an adapter to provide /dev/mmcsd0 and
I'm booting it from that.

Comment 61 Emmanuel Vadot freebsd_committer

2018-08-20 18:17:16 UTC

Fixed a while ago

Comment 62 Mark Millard 2018-08-20 18:23:30 UTC

(In reply to Emmanuel Vadot from comment #61)

Reconfirmed recently for head -r337400 on a Pine64+ 2GB
on the lower USB port. (It had been a very long time
since I'd updated/tried.) In this case the root file
system and swap partition were on the USB drive, unlike
comment #60.

Comment 63 Emmanuel Vadot freebsd_committer

2018-08-20 18:28:47 UTC

(In reply to Mark Millard from comment #62)

No, this is not what this bug is about (whatever you are talking about)
The board doesn't hang
If you have another problem please fill a new PR with info on how to reproduce.

Comment 64 Mark Millard 2018-08-20 18:41:48 UTC

(In reply to Emmanuel Vadot from comment #63)

My wording was unclear: I was trying to report
that things were working for the port I happened
to use: no hangs.

I've recently done over 80 hours of builds (world,
kernel, ports) based on USB root file systems and
swap partitions, multiple reboots, power downs, and
power ups involved. It all worked fine. I happened
to use the lower USB port.