Trying to copy files to an msdosfs file system on a USB stick causes the system to panic. This is reproducible; the backtrace always looks the same.

Unread portion of the kernel message buffer:

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xe
fault code              = supervisor write data, page not present
instruction pointer     = 0x8:0xffffff00705ba1f0
stack pointer           = 0x10:0xffffffffaf0fe2e0
frame pointer           = 0x10:0xffffffffaf0fe390
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 2047 (cp)
trap number             = 12
panic: page fault
cpuid = 0
Uptime: 13m32s
Physical memory: 2030 MB
Dumping 209 MB: 194 178 162 146 130 114 98 82 66 50 34 18 2

#0  doadump () at pcpu.h:194
194     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff801ff9c1 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff801ffdf2 in panic (fmt=0x104 <Address 0x104 out of bounds>) at /usr/src/sys/kern/kern_shutdown.c:572
#4  0xffffffff803dbe8a in trap_fatal (frame=0xffffff0001ff46a0, eva=18446742974281863168) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff803dc231 in trap_pfault (frame=0xffffffffaf0fe230, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:641
#6  0xffffffff803dcaef in trap (frame=0xffffffffaf0fe230) at /usr/src/sys/amd64/amd64/trap.c:410
#7  0xffffffff803c392e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#8  0xffffff00705ba1f0 in ?? ()
#9  0x00000009802813e4 in ?? ()
#10 0xffffff00705ba1f0 in ?? ()
#11 0xffffff0001ff46a0 in ?? ()
#12 0xffffff0005659700 in ?? ()
#13 0xffffffffaf0fe4e0 in ?? ()
#14 0x0000000000003041 in ?? ()
#15 0xffffff0001ff46a0 in ?? ()
#16 0xffffffff80416924 in cdrom_rootdevnames ()
#17 0x000000000000080e in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0xffffff00705ba1f0 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0xffffff007cf86ec8 in ?? ()
#22 0xffffff0001ff46a0 in ?? ()
#23 0xffffff0005d5a820 in ?? ()
#24 0x0000000000009000 in ?? ()
#25 0xffffff00705ba1f0 in ?? ()
#26 0xffffffffaf0fe4e0 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000004 in ?? ()
#29 0xffffffff803bceba in vnode_pager_getpages (object=0xffffff0001ff46a0, m=0x0, count=Variable "count" is not available.
) at vnode_if.h:1129
#30 0xffffffff803a87d0 in vm_fault (map=0xffffff0005b9f000, vaddr=34368442368, fault_type=1 '\001', fault_flags=0) at vm_pager.h:130
#31 0xffffffff803dc0ae in trap_pfault (frame=0xffffffffaf0fe740, usermode=0) at /usr/src/sys/amd64/amd64/trap.c:618
#32 0xffffffff803dcaef in trap (frame=0xffffffffaf0fe740) at /usr/src/sys/amd64/amd64/trap.c:410
#33 0xffffffff803c392e in calltrap () at /usr/src/sys/amd64/amd64/exception.S:169
#34 0xffffffff803db4ed in copyin () at /usr/src/sys/amd64/amd64/support.S:303
#35 0xffffffff802063f7 in uiomove (cp=0xffffffff9bc99000, n=4096, uio=0xffffffffaf0feb10) at /usr/src/sys/kern/kern_subr.c:170
#36 0xffffffff801a5fb2 in msdosfs_write (ap=Variable "ap" is not available.
) at /usr/src/sys/fs/msdosfs/msdosfs_vnops.c:812
#37 0xffffffff803f89ae in VOP_WRITE_APV (vop=0xffffffff805421a0, a=0xffffffffaf0fea20) at vnode_if.c:691
#38 0xffffffff80282797 in vn_write (fp=0xffffff0070379000, uio=0xffffffffaf0feb10, active_cred=Variable "active_cred" is not available.
) at vnode_if.h:373
#39 0xffffffff80233a0f in dofilewrite (td=0xffffff0001ff46a0, fd=4, fp=0xffffff0070379000, auio=0xffffffffaf0feb10, offset=Variable "offset" is not available.
) at file.h:254
#40 0xffffffff80233cbb in kern_writev (td=0xffffff0001ff46a0, fd=4, auio=0xffffffffaf0feb10) at /usr/src/sys/kern/sys_generic.c:401
#41 0xffffffff80233d28 in write (td=Variable "td" is not available.
) at /usr/src/sys/kern/sys_generic.c:317
#42 0xffffffff803dc49c in syscall (frame=0xffffffffaf0fec70) at /usr/src/sys/amd64/amd64/trap.c:852
#43 0xffffffff803c3b3b in Xfast_syscall () at /usr/src/sys/amd64/amd64/exception.S:290
#44 0x000000080070c5bc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

How-To-Repeat:
Just write a file on an msdosfs mount.
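In the backtrace, the panic happens inside copyin() under msdosfs_write(). The nested page fault is resolved through vm_fault() and vnode_pager_getpages(). That pattern is consistent with the buffer passed to write(2) being an mmap(2)ed view of the *source* file, which cp(1) appears to use for smaller regular files. The minimal C sketch below follows that copy path. It can serve as a reproducer that takes cp(1) itself out of the picture. The function name mmap_copy and any paths passed to it are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical reproducer: copy `src` to `dst` by mapping the source
 * read-only and handing the mapping straight to write(2).  Any page of
 * the mapping that is not resident is then faulted in from the source
 * file system while the destination's write routine is running.
 * Returns the number of bytes written, or -1 on error.  Empty sources
 * are rejected because a zero-length mmap(2) fails.
 */
long mmap_copy(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);
    if (in < 0) {
        perror(src);
        return -1;
    }

    struct stat st;
    if (fstat(in, &st) < 0 || st.st_size <= 0) {
        close(in);
        return -1;
    }
    size_t len = (size_t)st.st_size;

    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, in, 0);
    close(in); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        perror(dst);
        munmap(map, len);
        return -1;
    }

    size_t done = 0;
    while (done < len) {
        /* write() may be short; continue from where it stopped */
        ssize_t n = write(out, (const char *)map + done, len - done);
        if (n <= 0) {
            perror("write");
            break;
        }
        done += (size_t)n;
    }
    assert(done <= len); /* sanity check: never more than was mapped */

    close(out);
    munmap(map, len);
    return done == len ? (long)done : -1;
}
```

Pointing `src` at a file on the suspect source file system and `dst` at the msdosfs mount exercises the same path as the backtrace. If this panics while a plain read()/write() loop does not, the source side is a more likely culprit than msdosfs.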
State Changed
From-To: open->feedback

To submitter: are you able to connect the USB stick to a machine running Windows and run chkdsk, to confirm that the filesystem is not invalid? (Although we should ideally be resilient to corrupt filesystems, if it still panics after a chkdsk then it's a more serious problem...)

Also, can you give some detail about the system in question? How big is the USB stick? Are there any modifications to your custom kernel that may be related in any way?
Responsible Changed
From-To: freebsd-bugs->gavin

Track
gavin@FreeBSD.org wrote:
> To submitter: are you able to connect the USB stick to a machine
> running Windows and run chkdsk, to confirm that the filesystem
> is not invalid? (Although we should ideally be resilient to
> corrupt filesystems, if it still panics after a chkdsk then it's
> a more serious problem...)
>

I have already checked the stick under Windows. Chkdsk did not find any problems, but the panic still occurs. The problem started after I updated RELENG_7 on my machine this weekend. The previous RELENG_7 build was ~2 months old.

> Also, can you give some detail about the system in question? How big
> is the USB stick? Are there any modifications to your custom kernel
> that may be related in any way?
>

The stick is 8 GB. I'll just post anything that might be useful.

This is my (compacted) kernel config:

cpu HAMMER
ident HP6510b
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options SCHED_ULE
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options MD_ROOT # MD is a potential root device
options NFSCLIENT # Network Filesystem Client
options NFSSERVER # Network Filesystem Server
options NFS_ROOT # NFS usable as /, requires NFSCLIENT
options NTFS # NT File System
options MSDOSFS # MSDOS Filesystem
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!]
options COMPAT_IA32 # Compatible with i386 binaries
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options STOP_NMI # Stop CPUS using NMI instead of IPI
options AUDIT # Security event auditing
options SMP # Symmetric MultiProcessor Kernel
options ALTQ
options ALTQ_CBQ # Class Bases Queueing
options ALTQ_RED # Random Early Detection
options ALTQ_RIO # RED In/Out
options ALTQ_HFSC # Hierarchical Packet Scheduler
options ALTQ_CDNR # Traffic conditioner
options ALTQ_PRIQ # Priority Queueing
device acpi
device pci
options ATA_STATIC_ID # Static device numbering
options AHC_REG_PRETTY_PRINT # Print register bitfields in debug
                             # output. Adds ~128k to driver.
options AHD_REG_PRETTY_PRINT # Print register bitfields in debug
                             # output. Adds ~215k to driver.
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device sc
device loop # Network loopback
device ether # Ethernet support
device pty # Pseudo-ttys (telnet etc)
device bpf # Berkeley packet filter

This is my loader.conf:

# Boot loader.
autoboot_delay="2"
loader_logo="beastie"
# ATA controller drivers
atadisk_load="YES"
atapci_load="YES"
# Deactivate write cache
#hw.ata.wc=0
# USB drivers
usb_load="YES"
ubsa_load="YES"
umass_load="YES"
ums_load="YES"
ugen_load="YES"
# network driver
if_bge_load="YES"
# random device
random_load="YES"
# agp bus
agp_load="YES"
# CD/DVD driver
acd_load="YES"
atapicam_load="YES"
# Required to create memory disks.
geom_md_load="YES"
# Intel 3945ABG Wireless LAN IEEE 802.11 driver.
legal.intel_wpi.license_ack=1
if_wpi_load="YES"
wlan_load="YES"
wlan_amrr_load="YES"
firmware_load="YES"
wpifw_load="YES"
wlan_scan_sta_load="YES"
# Sound driver.
snd_hda_load="YES"
# Sound multiplexer.
hw.snd.maxautovchans="8"
# Synaptics support.
#hw.psm.synaptics_support="1"
# Linux compat
linux_load="YES"
# Sync PDA over USB.
uvisor_load="YES"

# kenv | grep smbios
smbios.bios.reldate="01/11/2008"
smbios.bios.vendor="Hewlett-Packard"
smbios.bios.version="68DDU Ver. F.10"
smbios.chassis.maker="Hewlett-Packard"
smbios.chassis.serial="CNU74808MK"
smbios.chassis.tag="CNU74808MK"
smbios.planar.maker="Hewlett-Packard"
smbios.planar.product="30C0"
smbios.planar.version="KBC Version 71.2E"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.maker="Hewlett-Packard"
smbios.system.product="HP Compaq 6510b (GR695EA#ABD)"
smbios.system.serial="CNU74808MK"
smbios.system.uuid="e85c3fb2-3f15-e011-08a0-6d990e4acd29"
smbios.system.version="F.10"

# pciconf -lv
hostb0@pci0:0:0:0: class=0x060000 card=0x30c0103c chip=0x2a008086 rev=0x0c hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile PM965/GM965/GL960 Express Processor to DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
vgapci0@pci0:0:2:0: class=0x030000 card=0x30c0103c chip=0x2a028086 rev=0x0c hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 965 Express Integrated Graphics Controller'
    class      = display
    subclass   = VGA
vgapci1@pci0:0:2:1: class=0x038000 card=0x30c0103c chip=0x2a038086 rev=0x0c hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'Mobile 965 Express Integrated Graphics Controller'
    class      = display
uhci0@pci0:0:26:0: class=0x0c0300 card=0x30c0103c chip=0x28348086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB UHCI'
    class      = serial bus
    subclass   = USB
uhci1@pci0:0:26:1: class=0x0c0300 card=0x30c0103c chip=0x28358086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB UHCI'
    class      = serial bus
    subclass   = USB
ehci0@pci0:0:26:7: class=0x0c0320 card=0x30c0103c chip=0x283a8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '81EC1043 (?) ICH8 Enhanced USB2 Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
pcm0@pci0:0:27:0: class=0x040300 card=0x30c0103c chip=0x284b8086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H &SUBSYS_81EC1043&REV_02\3&11583659&0&D8'
    class      = multimedia
pcib1@pci0:0:28:0: class=0x060400 card=0x30c0103c chip=0x283f8086 rev=0x03 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) PCIe Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:28:1: class=0x060400 card=0x30c0103c chip=0x28418086 rev=0x03 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) PCIe Port 2'
    class      = bridge
    subclass   = PCI-PCI
pcib3@pci0:0:28:2: class=0x060400 card=0x30c0103c chip=0x28438086 rev=0x03 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) PCIe Port 3'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:4: class=0x060400 card=0x30c0103c chip=0x28478086 rev=0x03 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) PCIe Port 5'
    class      = bridge
    subclass   = PCI-PCI
uhci2@pci0:0:29:0: class=0x0c0300 card=0x30c0103c chip=0x28308086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB UHCI'
    class      = serial bus
    subclass   = USB
uhci3@pci0:0:29:1: class=0x0c0300 card=0x30c0103c chip=0x28318086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB UHCI'
    class      = serial bus
    subclass   = USB
uhci4@pci0:0:29:2: class=0x0c0300 card=0x30c0103c chip=0x28328086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB UHCI'
    class      = serial bus
    subclass   = USB
ehci1@pci0:0:29:7: class=0x0c0320 card=0x30c0103c chip=0x28368086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) USB2 EHCI'
    class      = serial bus
    subclass   = USB
pcib5@pci0:0:30:0: class=0x060401 card=0x30c0103c chip=0x24488086 rev=0xf3 hdr=0x01
    vendor     = 'Intel Corporation'
    device     = '82801BAM/CAM/DBM (ICH2-M/3-M/4-M) Hub Interface to PCI Bridge'
    class      = bridge
    subclass   = PCI-PCI
isab0@pci0:0:31:0: class=0x060100 card=0x30c0103c chip=0x28158086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = 'ICH8M-E (ICH8 Family) LPC Interface Controller'
    class      = bridge
    subclass   = PCI-ISA
atapci0@pci0:0:31:1: class=0x01018a card=0x30c0103c chip=0x28508086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801H (ICH8 Family) Ultra ATA Storage Controllers'
    class      = mass storage
    subclass   = ATA
atapci1@pci0:0:31:2: class=0x010601 card=0x30c0103c chip=0x28298086 rev=0x03 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '82801 Intel(R) 82801HEM/HBM SATA AHCI Controller'
    class      = mass storage
wpi0@pci0:16:0:0: class=0x028000 card=0x135c103c chip=0x42228086 rev=0x02 hdr=0x00
    vendor     = 'Intel Corporation'
    device     = '10418086 Intel 3945ABG Wireless LAN controller'
    class      = network
bge0@pci0:24:0:0: class=0x020000 card=0x30c0103c chip=0x169314e4 rev=0x02 hdr=0x00
    vendor     = 'Broadcom Corporation'
    device     = 'BCM 5787A Ethernet Controller Broadcom Netlink Gigabit'
    class      = network
    subclass   = ethernet
none0@pci0:2:4:0: class=0x060700 card=0x30c0103c chip=0x04761180 rev=0xb6 hdr=0x02
    vendor     = 'Ricoh Company, Ltd.'
    device     = 'unknown Ricoh R/RL/5C476(II)'
    class      = bridge
    subclass   = PCI-CardBus
none1@pci0:2:4:1: class=0x0c0010 card=0x30c0103c chip=0x08321180 rev=0x02 hdr=0x00
    vendor     = 'Ricoh Company, Ltd.'
    device     = 'unknown IEEE 1394 (4 pin firewire) chip)'
    class      = serial bus
    subclass   = FireWire

# vmstat -i
interrupt                          total       rate
irq1: atkbd0                        6445          1
irq9: acpi0                         3757          0
irq12: psm0                          780          0
irq14: ata0                          113          0
irq16: pcm0 uhci0+                    11          0
irq17: wpi0 uhci1+                 27081          4
irq18: bge0 ehci0+                 31750          5
irq20: uhci2 ehci1                  6157          1
irq21: uhci3                      188839         34
cpu0: timer                     10901466       1998
cpu1: timer                     10893486       1997
Total                           22059885       4044

I'm using the following patch for mount.c:
http://www.freebsd.org/cgi/query-pr.cgi?prp=120784-5-diff&n=/patch-5.diff

# mount
/dev/ufs/2root on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
/dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
/dev/ufs/2var on /var (ufs, local, soft-updates)
pid874@mobileKamikaze:/var/run/automounter.amd.mnt on /var/run/automounter.amd.mnt (nfs)
/dev/msdosfs/APRIL RYAN on /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a (msdosfs, local, noatime, noexec)

# df -h
Filesystem                Size    Used   Avail Capacity  Mounted on
/dev/ufs/2root            496M    362M     94M    79%    /
devfs                     1.0K    1.0K      0B   100%    /dev
/dev/ufs/2tmp             1.9G     44K    1.8G     0%    /tmp
/dev/ufs/2usr              38G     18G     17G    52%    /usr
/dev/ufs/2var             3.9G    2.0G    1.5G    58%    /var
/dev/msdosfs/APRIL RYAN   7.5G    2.9G    4.7G    38%    /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a

That's all I can think of right now.
Bruce Evans wrote:
> On Mon, 21 Apr 2008, Dominic Fandrey wrote:
> This seems to be a bug in usb (umass) or the particular usb drive...
> ...
>
> To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
> under RELENG_7 or later (the bug also affects RELENG_6, but these mount
> options are broken in RELENG_6). ...

# mount -t msdosfs -o noatime,noexec,-L=en_GB.UTF-8,noclusterr,noclusterw /dev/da0 /mnt/tmp
mount_msdosfs: /dev/da0: mount option <noclusterw> is unknown: Invalid argument

It seems they are also broken under RELENG_7 from the day before yesterday.

>
>> # mount
>> /dev/ufs/2root on / (ufs, local)
>> devfs on /dev (devfs, local)
>> /dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
>> /dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
>> /dev/ufs/2var on /var (ufs, local, soft-updates)
>> pid874@mobileKamikaze:/var/run/automounter.amd.mnt on
>> /var/run/automounter.amd.mnt (nfs)
>> /dev/msdosfs/APRIL RYAN on
>> /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a
>> (msdosfs, local, noatime, noexec)
>
> The labels obfuscate the device type for all mountpoints very well.

The ufs mounts are on a SATA drive. The msdosfs slice is the USB stick at /dev/da0. The nfs mount is from amd.
On Tue, 22 Apr 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
>> On Mon, 21 Apr 2008, Dominic Fandrey wrote:
>> This seems to be a bug in usb (umass) or the particular usb drive...
>> ...
>>
>> To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
>> under RELENG_7 or later (the bug also affects RELENG_6, but these mount
>> options are broken in RELENG_6). ...
>
> # mount -t msdosfs -o noatime,noexec,-L=en_GB.UTF-8,noclusterr,noclusterw
> /dev/da0 /mnt/tmp
> mount_msdosfs: /dev/da0: mount option <noclusterw> is unknown: Invalid
> argument
>
> It seems they are also broken under RELENG_7 from the day before yesterday.

Oops, it is indeed broken there too. msdosfs in RELENG_7 is in the state that I thought RELENG_6 was in (missing only the critical MFCs for nocluster* in the options list and for fixing panics related to the dirty flag), while msdosfs in RELENG_6 is just too old to support clustering.

The broken nocluster* can be worked around by upgrading to a version of mount_msdosfs(8) that hasn't been broken by using nmount(2). mount_msdosfs(8) from RELENG_5 should work.

Bruce
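The "mount option <...> is unknown" error comes from the kernel checking the options it receives through nmount(2) against a per-file-system list of accepted names. In FreeBSD this check appears to be vfs_filteropt() in sys/kern/vfs_mount.c. An option is rejected when it is missing from that list, which is why an incomplete list breaks nocluster*. Below is a simplified, hypothetical user-space model of such a check. It is not the kernel code. The name first_unknown_opt and the accepted list used with it are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of a mount-option whitelist.  Each comma-separated
 * option in `opts` must appear in the NULL-terminated `allowed` list,
 * either literally or with a leading "no" added or removed (so an
 * allowed "noatime" also admits "atime").  Any "=value" part is ignored.
 * Returns 1 and copies the first rejected option name into `bad` if one
 * is found, else 0.  Option strings longer than 255 bytes are truncated.
 */
int first_unknown_opt(const char *opts, const char *const *allowed,
                      char *bad, size_t badlen)
{
    assert(opts != NULL && allowed != NULL && bad != NULL && badlen > 0);

    char buf[256];
    snprintf(buf, sizeof buf, "%s", opts); /* strtok_r modifies its input */

    char *save = NULL;
    for (char *opt = strtok_r(buf, ",", &save); opt != NULL;
         opt = strtok_r(NULL, ",", &save)) {
        char *eq = strchr(opt, '=');
        if (eq != NULL)
            *eq = '\0'; /* compare the option name only */

        int known = 0;
        for (const char *const *a = allowed; *a != NULL && !known; a++) {
            if (strcmp(opt, *a) == 0)
                known = 1; /* exact match */
            else if (strncmp(opt, "no", 2) == 0 && strcmp(opt + 2, *a) == 0)
                known = 1; /* "noX" given, "X" allowed */
            else if (strncmp(*a, "no", 2) == 0 && strcmp(*a + 2, opt) == 0)
                known = 1; /* "X" given, "noX" allowed */
        }
        if (!known) {
            snprintf(bad, badlen, "%s", opt);
            return 1;
        }
    }
    return 0;
}
```

Take the made-up list `{ "noatime", "noexec", "noclusterr", "from", NULL }` and the option string `noatime,noexec,noclusterr,noclusterw`. The model rejects `noclusterw`, which a mount utility would then report in the form quoted above.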
Bruce Evans wrote:
> On Tue, 22 Apr 2008, Dominic Fandrey wrote:
> ...
> The broken nocluster* can be worked around by upgrading to a version of
> mount_msdosfs(8) that hasn't been broken by using nmount(2).
> mount_msdosfs(8) from RELENG_5 should work.

I feel reluctant about downgrading to the 5.x mount_msdosfs; however, I can confirm that cp with large files does _not_ cause a panic. As far as I understand, this confirms your theory.

How can I provide more useful information?
On Wed, 23 Apr 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
>> The broken nocluster* can be worked around by upgrading to a version of
>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>> mount_msdosfs(8) from RELENG_5 should work.
>
> I feel reluctant about downgrading to 5.x mount_msdosfs,

But it would be an upgrade :-). Anyway, running mount_msdosfs on one disposable file system that might panic should be safe.

> however I can
> confirm that cp with large files does _not_ cause a panic. As far as I
> understand this confirms your theory.

Not quite. I would have expected the problem to affect read() and write() too unless the file system is mounted with -nocluster*.

> How can I provide more useful information?

Check if the cp of large files actually works. A previous report mentioned data corruption, but I don't remember it saying anything about panics. Maybe mmap() does something different that causes more serious corruption. I'll have to think more about adding debugging code to mmap() and the device driver.

Meanwhile, can you try changing this code in msdosfs_vnops.c:

%%%
	mp = vp->v_mount;
	maxio = mp->mnt_iosize_max / mp->mnt_stat.f_iosize;
	bnpercn = de_cn2bn(pmp, 1);
%%%

o Add a printf to print out maxio (might need rate limiting).
o Try lower values of maxio until you find the largest one that works
  (keep dividing by 2; only try one value per boot or per mount, of
  course). I think it is always 128K initially, and small values will
  work. A value of the cluster size (typically 4K) or smaller should
  give the old behaviour.

or one or more of the following:

o Check that large i/o's to the raw device work.
o Check for the problem with other file systems that implement clustering.
  ffs is easiest.
o On an older version of FreeBSD that doesn't seem to have the problem,
  check for the problem with msdosfs with a large cluster size (the
  cluster can be up to 64K, which is large enough to show the problem that
  I suspect). Check on file systems that implement clustering too (now
  the block size doesn't need to be large to cause large i/o's).

Bruce
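The maxio in the snippet above is a count of file-system blocks (msdosfs clusters), not a byte count: mnt_iosize_max / f_iosize. Suppose the maximum i/o size is 128 KiB, which is the initial value guessed above, and clusters are 4 KiB. Then maxio starts at 32, and the halving experiment steps through 32, 16, 8, 4, 2, 1. With the 64 KiB clusters mentioned for older versions it is only 2, so each single-cluster i/o is already large. The user-space sketch below models only the arithmetic. maxio_blocks is a hypothetical helper. In the kernel, the experiment itself would be a printf of maxio right after it is computed, plus a temporary division.

```c
#include <assert.h>

/*
 * Hypothetical model of the maxio experiment: compute
 * iosize_max / f_iosize (blocks per clustered i/o) as msdosfs_vnops.c
 * does, then halve it `halvings` times.  The result is clamped to at
 * least one block, i.e. one cluster per i/o, which should approximate
 * the old, unclustered behaviour.
 */
long maxio_blocks(long iosize_max, long f_iosize, unsigned halvings)
{
    assert(iosize_max > 0 && f_iosize > 0);

    long maxio = iosize_max / f_iosize;
    for (unsigned i = 0; i < halvings && maxio > 1; i++)
        maxio /= 2; /* one trial value per boot or mount */
    return maxio > 0 ? maxio : 1;
}
```

For example, with a 128 KiB limit, maxio_blocks(131072, 4096, 2) gives 8 clusters (32 KiB per i/o). maxio_blocks(131072, 65536, 0) gives 2.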
Bruce Evans wrote:
> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
>
>> I feel reluctant about downgrading to 5.x mount_msdosfs,
>
> But it would be an upgrade :-). Anyway, running mount_msdosfs on one
> disposable file system that might panic should be safe.

If it really is of help, I will downgrade. Not before the weekend, though.

>> How can I provide more useful information?
>
> Check if the cp of large files actually works. A previous report mentioned
> data corruption but I don't remember it saying anything about panics.
> Maybe mmap() does something different that causes more serious corruption.

I copied a 1.2 GB DVD rip and watched it afterwards. No corruption. MD5 checksums show that the file on the stick and the original are identical.

> Meanwhile, can you try changing this code in msdosfs_vnops.c:
> ...
> o Add a printf to print out maxio (might need rate limiting).
> o Try lower values of maxio until you find the largest one that works
>   (keep dividing by 2. Only try one value per boot or per mount of
>   course). I think it is always 128K initially, and small values will
>   work. A value of the cluster size (typically 4K) or smaller should
>   give the old behaviour.

I will give it a try.

> or one or more of the following:
>
> o Check that large i/o's to the raw device work.
> o Check for the problem with other file systems that implement clustering.
>   ffs is easiest.
> o On an older version of FreeBSD that doesn't seem to have the problem,
>   check for the problem with msdosfs with a large cluster size (the
>   cluster can be up to 64K, which is large enough to show the problem that
>   I suspect). Check on file systems that implement clustering too (now
>   the block size doesn't need to be large to cause large i/o's).

These ones are harder. I will also defer them to the weekend.
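The first of the deferred checks, large i/o's to the raw device, can be done read-only, so it cannot damage the data on the stick. The sketch below assumes the stick is /dev/da0, as stated earlier in the thread. read_in_chunks is a hypothetical helper. The 128 KiB chunk size is an assumption chosen to match the suspected maximum clustered i/o size.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical check: read up to `total` bytes from `path` in reads of
 * at most `chunk` bytes.  Returns the number of bytes read (less than
 * `total` if end of file/device is reached first), or -1 on error.
 * Read-only, so it is safe on a raw device node; on such nodes keep
 * `chunk` and `total` multiples of the sector size (e.g. 512).
 */
long read_in_chunks(const char *path, size_t chunk, size_t total)
{
    assert(chunk > 0);

    char *buf = malloc(chunk);
    if (buf == NULL)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror(path);
        free(buf);
        return -1;
    }

    size_t done = 0;
    int failed = 0;
    while (done < total) {
        size_t want = total - done < chunk ? total - done : chunk;
        ssize_t n = read(fd, buf, want);
        if (n < 0) {
            perror("read");
            failed = 1;
            break;
        }
        if (n == 0)
            break; /* end of file or device */
        done += (size_t)n;
    }

    close(fd);
    free(buf);
    return failed ? -1 : (long)done;
}
```

For example, read_in_chunks("/dev/da0", 131072, 64 * 1048576) should return 67108864 if the first 64 MiB of the stick can be read cleanly in 128 KiB transfers.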
Bruce Evans wrote:
> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
> ...
>> however I can confirm that cp with large files does _not_ cause a
>> panic. As far as I understand this confirms your theory.
>
> Not quite. I would have expected the problem to affect read() and write()
> too unless the file system is mounted with -nocluster*.

This can be closed.

Your suggestions have been very helpful. It turned out that fusefs-ntfs was causing the panic when I copied files from it.
State Changed
From-To: feedback->closed

Submitter reports that this was actually caused by fusefs-ntfs and not msdosfs.
On Fri, 2 May 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
> ...
> This can be closed.
>
> Your suggestions have been very helpful. It turned out that fusefs-ntfs was
> causing the panic when I copied files from it.

Now we have a better argument for not axing non-port ntfs :-). I think it sort of works read-only. Too bad we're no closer to understanding the msdosfs problem.

Bruce
Bruce Evans wrote:
> On Fri, 2 May 2008, Dominic Fandrey wrote:
> ...
>> Your suggestions have been very helpful. It turned out that
>> fusefs-ntfs was causing the panic when I copied files from it.
>
> Now we have a better argument for not axing non-port ntfs :-). I think
> it sort of works read-only. Too bad we're no closer to understanding the
> msdosfs problem.
>
> Bruce

It was really all my fault: I forgot to rebuild fusefs-kmod after updating my kernel. I'm sorry (not really) that I cannot provide you with helpful data for the msdosfs problems other people have reported.