Bug 232149

Summary: graphics/graphviz: (x11/pixman) Segmentation Fault (FreeBSD12/aarch64)
Product: Ports & Packages
Component: Individual Port(s)
Status: Closed FIXED
Severity: Affects Some People
Priority: ---
Version: Latest
Hardware: arm64
OS: Any
Reporter: Stefan Rink <stefanrink>
Assignee: Michal Meloun <mmel>
CC: emaste, mikael, mmel, val, w.schwarzenfeld
Bug Depends on: 233204
Bug Blocks:

Description Stefan Rink 2018-10-10 13:28:19 UTC
Trying to get graphviz to make an image results in a segmentation fault.
# cat /tmp/test.dot | dot -Tpng
Segmentation fault (core dumped)


Already at latest base and kernel (FreeBSD 12)
# uname -a
FreeBSD NODE001 12.0-ALPHA8 FreeBSD 12.0-ALPHA8 #3 r339012M: Mon Oct  8 20:23:15 UTC 2018     freebsd@NODE005:/usr/obj/usr/src/arm64.aarch64/sys/sopine  arm64

I started troubleshooting this myself but got a bit stuck at this weird curthread pointer (0x801800000000001 in frame #0 below).


# gdb /usr/local/bin/dot dot.core
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "aarch64-portbld-freebsd12.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/local/bin/dot...done.
[New LWP 100082]
Core was generated by `dot -v -Tpng'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  _thr_rtld_rlock_acquire (lock=0x411dec80) at /usr/src/lib/libthr/thread/thr_rtld.c:125
125             THR_CRITICAL_ENTER(curthread);
(gdb) bt full
#0  _thr_rtld_rlock_acquire (lock=0x411dec80) at /usr/src/lib/libthr/thread/thr_rtld.c:125
        l = 0x411dec80
        curthread = 0x801800000000001
        errsave = <error reading variable errsave (Cannot access memory at address 0x80180000000018d)>
#1  0x00000000402390c4 in rlock_acquire (lock=0x40270090 <rtld_locks>, lockstate=0xffffffffab40) at /usr/src/libexec/rtld-elf/rtld_lock.c:209
No locals.
#2  0x0000000040232bec in _rtld_bind (obj=0x409bf000, reloff=96) at /usr/src/libexec/rtld-elf/rtld.c:789
        lockstate = {lockstate = 1, env = {{_sjb = {5192296858134625181781816927844096, 3541774862153317871616, 281474976688960, 
                5192296858162646120099333491982448, 5192296858205737714255517928390658, 0, 19849560668804171569190382992, 
                85264437479619916114209298589211951104, 13844966854071681024, 4624070917402656768, 0, 20249735019133469358624320736, 
                5192296858152168369465466474459136, 85, 36893769622395796048, 36893488147419103233, 19951829344161559554618773900, 
                19999637623203440009873805312, 20147593866117581453433744528, 1084182528, 20147734725455328299569884368, 19851772949927161933322518824, 
                19995557498562240636417077544, 20250678256497908875248176320, 19849563546496247067880435504, 36893488147419103233, 
                5192296858142299027316480101316352, 11510768301995844169728, 281474976689376, 5192296858162646120099333491982448, 
                5192296858205737714255517928390658, 0}}}}
        rel = <optimized out>
        defobj = <optimized out>
        def = <optimized out>
        where = <optimized out>
        target = <optimized out>
#3  0x000000004023007c in _rtld_bind_start () at /usr/src/libexec/rtld-elf/aarch64/rtld_start.S:93
No locals.
#4  0x00000000416efd50 in pixman_image_composite32 (op=PIXMAN_OP_SRC, src=0x42631400, mask=0x0, dest=0x42631b00, src_x=0, src_y=0, mask_x=0, mask_y=0, 
    dest_x=0, dest_y=0, width=11, height=11) at pixman.c:686
        src_format = PIXMAN_a8
        mask_format = 0
        dest_format = PIXMAN_a8
        region = {extents = {x1 = 0, y1 = 0, x2 = 11, y2 = 11}, data = 0x0}
        extents = {x1 = 0, y1 = 0, x2 = 11, y2 = 11}
        imp = 0x424efa00
        func = 0x418efa74 <fast_composite_src_memcpy>
        info = {op = 1116379648, src_image = 0xffffffffb010, mask_image = 0x1, dest_image = 0x42631400, src_x = 0, src_y = 0, mask_x = 1113791232, 
          mask_y = 0, dest_x = -2145384446, dest_y = -2145384446, width = 0, height = 1048576, src_flags = 0, mask_flags = 1073741824, 
          dest_flags = 1074791425}
        pbox = 0xffffffffaff0
        n = 0
#5  0x0000000041937648 in pixman_glyph_cache_insert (cache=0x428a9a00, font_key=0x42713e00, glyph_key=0x2a, origin_x=0, origin_y=11, image=0x42631400)
    at pixman-glyph.c:286
        glyph = 0x4257fbd0
        width = 11
        height = 11
#6  0x00000000410cebd0 in ?? () from /usr/local/lib/libcairo.so.2
No symbol table info available.
#7  0x0000ffffffffbb04 in ?? ()
No symbol table info available.
Backtrace stopped: not enough registers or memory available to unwind further
Comment 1 Walter Schwarzenfeld freebsd_triage 2018-10-10 14:15:37 UTC
The command
 cat test.dot | dot -Tpng
also does not work on amd64. It does not segfault, but it "hangs" somewhere.

The command
dot -Tpng test.dot -o test.png

works.
Comment 2 Walter Schwarzenfeld freebsd_triage 2018-10-10 14:19:34 UTC
Oops, correction:
cat test.dot | dot -Tpng
works, but it hangs with -o:
cat test.dot | dot -Tpng -o test.png
Comment 3 Stefan Rink 2018-10-10 14:24:41 UTC
# dot -Tpng test.dot -o test.png
Segmentation fault (core dumped)

As a workaround I made a dot shell script that runs ssh to my i7 machine:

ssh test@FreeBSDi7 -C clusterdot $*

On the i7 I have a clusterdot shell script containing:

dot $*

This works fine, but it isn't the best solution, I guess. :-)
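
A note on the wrapper above: an unquoted $* re-splits arguments that contain spaces, and ssh joins its arguments into one string for the remote shell anyway. Below is a minimal sketch of a variant that quotes each argument first. The helper names sh_quote and remote_dot_cmd are hypothetical; the host test@FreeBSDi7 and the clusterdot script are the ones from this comment.

```sh
#!/bin/sh
# Sketch only: sh_quote and remote_dot_cmd are hypothetical helper names.

# Wrap one argument in single quotes so the remote shell sees it as one word.
# Embedded single quotes are rewritten as '\'' (close, escaped quote, reopen).
sh_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Build the command line for the remote side: clusterdot plus every
# argument, each one individually quoted.
remote_dot_cmd() {
    cmd="clusterdot"
    for arg in "$@"; do
        cmd="$cmd $(sh_quote "$arg")"
    done
    printf '%s\n' "$cmd"
}
```

With this, the local wrapper would presumably call ssh -C test@FreeBSDi7 "$(remote_dot_cmd "$@")", and clusterdot on the i7 would use dot "$@" instead of dot $*. This has not been tested on the setup described here.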

It only fails on ARM64. I could also try 32-bit ARM, but that will take a while to set up.


---

# dot -v -Tpng test.dot -o test.png
dot - graphviz version 2.40.1 (20161225.0304)
Using render: cairo:cairo
Using device: png:cairo:cairo
libdir = "/usr/local/lib/graphviz"
Activated plugin library: libgvplugin_dot_layout.so.6
Using layout: dot:dot_layout
The plugin configuration file:
        /usr/local/lib/graphviz/config6
                was successfully loaded.
    render      :  cairo dot dot_json fig gd json json0 map mp pic pov ps svg tk vml vrml xdot xdot_json
    layout      :  circo dot fdp neato nop nop1 nop2 osage patchwork sfdp twopi
    textlayout  :  textlayout
    device      :  canon cmap cmapx cmapx_np dot dot_json eps fig gd gd2 gif gv imap imap_np ismap jpe jpeg jpg json json0 mp pdf pic plain plain-ext png pov ps ps2 svg svgz tk vml vmlz vrml wbmp x11 xdot xdot1.2 xdot1.4 xdot_json xlib
    loadimage   :  (lib) eps gd gd2 gif jpe jpeg jpg png ps svg xbm
fontname: "Helvetica" resolved to: (ps:pango  DejaVu Sans, ) (PangoCairoFcFont) "DejaVu Sans, Book" /usr/local/share/fonts/dejavu/DejaVuSans.ttf
pack info:
  mode   undefined
  size   0
  flags  0
  margin 8
pack info:
  mode   node
  size   0
  flags  0
network simplex:  20 nodes 19 edges maxiter=2147483647 balance=2
network simplex: 20 nodes 19 edges 0 iter 0.00 sec
network simplex:  4 nodes 4 edges maxiter=2147483647 balance=2
network simplex: 4 nodes 4 edges 0 iter 0.00 sec
network simplex:  5 nodes 4 edges maxiter=2147483647 balance=2
network simplex: 5 nodes 4 edges 0 iter 0.00 sec
network simplex:  4 nodes 5 edges maxiter=2147483647 balance=2
network simplex: 4 nodes 5 edges 0 iter 0.00 sec
network simplex:  2 nodes 1 edges maxiter=2147483647 balance=1
network simplex: 2 nodes 1 edges 0 iter 0.00 sec
Maxrank = 1, minrank = 0
mincross: pass 0 iter 0 trying 0 cur_cross 0 best_cross 0
mincross oneDegreeRelationshipsDiagram: 0 crossings, 0.00 secs.
network simplex:  3 nodes 2 edges maxiter=2147483647 balance=2
network simplex: 3 nodes 2 edges 0 iter 0.00 sec
routesplines: 1 edges, 3 boxes 0.00 sec
Using render: cairo:cairo
Using device: png:cairo:cairo
dot: allocating a 1337K cairo image surface (657 x 521 pixels)
Segmentation fault (core dumped)
Comment 4 Stefan Rink 2018-10-10 21:26:37 UTC
The first tests were on our Sopine cluster, and I don't have a 32-bit world for it yet.

I do run a slightly customized kernel. Nothing in that part of the kernel has changed, but to be sure I got another brand of ARM64 CPU.

So I just tested this on the RPI-III with the corresponding aarch64 image and got the same problem.
I also tested the latest ARM (32-bit) image for the RPI-II and that worked, so it's aarch64-specific.

To reproduce:
Boot the aarch64 image on RPI3 or Pine64+/Sopine
pkg install graphviz

echo 'digraph "test" { "test":"testx" } ' | dot -Tpng -otest.png
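
To tell the crash apart from an ordinary dot error when scripting the steps above, the exit status can be checked: a shell reports a process killed by signal N as status 128+N, and SIGSEGV is signal 11, so a segfault shows up as 139. A minimal sketch follows; the function names classify_status and run_repro are hypothetical, while the graph and the dot invocation are the ones from the line above.

```sh
#!/bin/sh
# Sketch only: classify_status and run_repro are hypothetical names.

# Map an exit status to a short label.
classify_status() {
    case "$1" in
        0)   echo "ok" ;;
        139) echo "segfault" ;;      # 128 + 11 (SIGSEGV)
        *)   echo "failed($1)" ;;
    esac
}

# Run the reproduction and label the result. The status of a pipeline is
# the status of its last command, which is dot here.
run_repro() {
    echo 'digraph "test" { "test":"testx" } ' | dot -Tpng -otest.png
    classify_status "$?"
}
```

Calling run_repro on an affected aarch64 machine should then print "segfault", going by the output in the earlier comments.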

I think this should be moved to aarch64, since it seems arch-specific.
Comment 5 Mikael Urankar freebsd_committer freebsd_triage 2018-11-14 16:11:00 UTC
Cf. bug #233204 for a WIP patch.
Comment 6 Stefan Rink 2018-11-16 01:24:46 UTC
(In reply to mikael.urankar from comment #5)

You, sir, have earned a cookie or a beer or something!

That indeed fixed the issue. The first tests seem to work, so I'll roll it out on our cluster, test some more in a bigger setting, and cross my fingers that all nodes stay stable. :-)

I also have a feeling that this will fix some other threading issues as well (e.g. the Python < 3.7 issues).
 
Fixed with: https://github.com/strejda/freebsd/commit/981459604061136fc68c020ff6124fab0d1196aa
Comment 7 Val Packett 2018-11-17 11:58:22 UTC
(In reply to Stefan Rink from comment #6)

Please don't mark this FIXED until the fix actually lands in upstream FreeBSD.
Comment 8 Michal Meloun freebsd_committer freebsd_triage 2018-12-12 12:06:05 UTC
The final (and much more complex) version of this patch is now under review:
https://reviews.freebsd.org/D18417

Michal
Comment 9 Stefan Rink 2018-12-13 14:52:21 UTC
All 42 nodes are still up and running with https://github.com/strejda/freebsd/commit/981459604061136fc68c020ff6124fab0d1196aa!
Total crashcounter: 0

Will test https://reviews.freebsd.org/D18417 when I have some spare time.
Comment 10 commit-hook freebsd_committer freebsd_triage 2018-12-15 10:39:04 UTC
A commit references this bug:

Author: mmel
Date: Sat Dec 15 10:38:10 UTC 2018
New revision: 342113
URL: https://svnweb.freebsd.org/changeset/base/342113

Log:
  Improve R_AARCH64_TLSDESC relocation.
  The original code did not support dynamically loaded libraries and used
  suboptimal access to TLS variables.
  The new implementation removes lazy resolving of TLS relocations: due to a
  flaw in the TLSDESC design, it is impossible to switch the resolver function
  at runtime without expensive locking.

  Due to this, 3 specialized resolvers are implemented:
   - load time resolver for TLS relocation from libraries loaded with main
     executable (thus with known TLS offset).
   - resolver for undefined thread weak symbols.
   - slower lazy resolver for dynamically loaded libraries with fast path for
     already resolved symbols.

  PR:		228892, 232149, 233204, 232311
  MFC after:	2 weeks
  Differential Revision:	https://reviews.freebsd.org/D18417

Changes:
  head/libexec/rtld-elf/aarch64/reloc.c
  head/libexec/rtld-elf/aarch64/rtld_start.S
  head/libexec/rtld-elf/amd64/reloc.c
  head/libexec/rtld-elf/arm/reloc.c
  head/libexec/rtld-elf/i386/reloc.c
  head/libexec/rtld-elf/mips/reloc.c
  head/libexec/rtld-elf/powerpc/reloc.c
  head/libexec/rtld-elf/powerpc64/reloc.c
  head/libexec/rtld-elf/riscv/reloc.c
  head/libexec/rtld-elf/rtld.c
  head/libexec/rtld-elf/rtld.h
  head/libexec/rtld-elf/sparc64/reloc.c
Comment 11 Walter Schwarzenfeld freebsd_triage 2019-08-28 08:28:02 UTC
Was closing this forgotten?
Comment 12 Mikael Urankar freebsd_committer freebsd_triage 2019-08-28 08:46:06 UTC
(In reply to Walter Schwarzenfeld from comment #11)
Yes, can you close it, please?