Bug 122961 - [msdosfs] write operation on msdosfs file system causes panic
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: kern
Version: unspecified
Hardware: Any Any
Importance: Normal Affects Only Me
Assignee: Gavin Atkinson
Reported: 2008-04-21 12:30 UTC by kamikaze
Modified: 2008-05-02 15:50 UTC (History)
Description kamikaze 2008-04-21 12:30:02 UTC
Trying to copy files to an msdosfs file system on a USB stick causes the system to panic. This is reproducible, and the backtrace always looks the same.

Unread portion of the kernel message buffer:


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address	= 0xe
fault code		= supervisor write data, page not present
instruction pointer	= 0x8:0xffffff00705ba1f0
stack pointer	        = 0x10:0xffffffffaf0fe2e0
frame pointer	        = 0x10:0xffffffffaf0fe390
code segment		= base rx0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2047 (cp)
trap number		= 12
panic: page fault
cpuid = 0
Uptime: 13m32s
Physical memory: 2030 MB
Dumping 209 MB: 194 178 162 146 130 114 98 82 66 50 34 18 2

#0  doadump () at pcpu.h:194
194	pcpu.h: No such file or directory.
	in pcpu.h
(kgdb) bt
#0  doadump () at pcpu.h:194
#1  0x0000000000000004 in ?? ()
#2  0xffffffff801ff9c1 in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:418
#3  0xffffffff801ffdf2 in panic (fmt=0x104 <Address 0x104 out of bounds>)
    at /usr/src/sys/kern/kern_shutdown.c:572
#4  0xffffffff803dbe8a in trap_fatal (frame=0xffffff0001ff46a0, 
    eva=18446742974281863168) at /usr/src/sys/amd64/amd64/trap.c:724
#5  0xffffffff803dc231 in trap_pfault (frame=0xffffffffaf0fe230, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:641
#6  0xffffffff803dcaef in trap (frame=0xffffffffaf0fe230)
    at /usr/src/sys/amd64/amd64/trap.c:410
#7  0xffffffff803c392e in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#8  0xffffff00705ba1f0 in ?? ()
#9  0x00000009802813e4 in ?? ()
#10 0xffffff00705ba1f0 in ?? ()
#11 0xffffff0001ff46a0 in ?? ()
#12 0xffffff0005659700 in ?? ()
#13 0xffffffffaf0fe4e0 in ?? ()
#14 0x0000000000003041 in ?? ()
#15 0xffffff0001ff46a0 in ?? ()
#16 0xffffffff80416924 in cdrom_rootdevnames ()
#17 0x000000000000080e in ?? ()
#18 0x0000000000000000 in ?? ()
#19 0xffffff00705ba1f0 in ?? ()
#20 0x0000000000000000 in ?? ()
#21 0xffffff007cf86ec8 in ?? ()
#22 0xffffff0001ff46a0 in ?? ()
#23 0xffffff0005d5a820 in ?? ()
#24 0x0000000000009000 in ?? ()
#25 0xffffff00705ba1f0 in ?? ()
#26 0xffffffffaf0fe4e0 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000004 in ?? ()
#29 0xffffffff803bceba in vnode_pager_getpages (object=0xffffff0001ff46a0, 
    m=0x0, count=Variable "count" is not available.
) at vnode_if.h:1129
#30 0xffffffff803a87d0 in vm_fault (map=0xffffff0005b9f000, vaddr=34368442368, 
    fault_type=1 '\001', fault_flags=0) at vm_pager.h:130
#31 0xffffffff803dc0ae in trap_pfault (frame=0xffffffffaf0fe740, usermode=0)
    at /usr/src/sys/amd64/amd64/trap.c:618
#32 0xffffffff803dcaef in trap (frame=0xffffffffaf0fe740)
    at /usr/src/sys/amd64/amd64/trap.c:410
#33 0xffffffff803c392e in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:169
#34 0xffffffff803db4ed in copyin () at /usr/src/sys/amd64/amd64/support.S:303
#35 0xffffffff802063f7 in uiomove (cp=0xffffffff9bc99000, n=4096, 
    uio=0xffffffffaf0feb10) at /usr/src/sys/kern/kern_subr.c:170
#36 0xffffffff801a5fb2 in msdosfs_write (ap=Variable "ap" is not available.
)
    at /usr/src/sys/fs/msdosfs/msdosfs_vnops.c:812
#37 0xffffffff803f89ae in VOP_WRITE_APV (vop=0xffffffff805421a0, 
    a=0xffffffffaf0fea20) at vnode_if.c:691
#38 0xffffffff80282797 in vn_write (fp=0xffffff0070379000, 
    uio=0xffffffffaf0feb10, active_cred=Variable "active_cred" is not available.
) at vnode_if.h:373
#39 0xffffffff80233a0f in dofilewrite (td=0xffffff0001ff46a0, fd=4, 
    fp=0xffffff0070379000, auio=0xffffffffaf0feb10, offset=Variable "offset" is not available.
) at file.h:254
#40 0xffffffff80233cbb in kern_writev (td=0xffffff0001ff46a0, fd=4, 
    auio=0xffffffffaf0feb10) at /usr/src/sys/kern/sys_generic.c:401
#41 0xffffffff80233d28 in write (td=Variable "td" is not available.
) at /usr/src/sys/kern/sys_generic.c:317
#42 0xffffffff803dc49c in syscall (frame=0xffffffffaf0fec70)
    at /usr/src/sys/amd64/amd64/trap.c:852
#43 0xffffffff803c3b3b in Xfast_syscall ()
    at /usr/src/sys/amd64/amd64/exception.S:290
#44 0x000000080070c5bc in ?? ()
Previous frame inner to this frame (corrupt stack?)
(kgdb)

How-To-Repeat: Just write a file on an msdosfs mount.
Comment 1 Gavin Atkinson freebsd_committer freebsd_triage 2008-04-21 15:24:29 UTC
State Changed
From-To: open->feedback

To submitter: are you able to connect the USB stick to a machine 
running Windows and run chkdsk, to confirm that the filesystem 
is not invalid?  (Although we should ideally be resilient to 
corrupt filesystems, if it still panics after a chkdsk then it's 
a more serious problem...) 

Also, can you give some detail about the system in question?  How big 
is the USB stick?  Are there any modifications to your custom kernel 
that may be related in any way? 


Comment 2 Gavin Atkinson freebsd_committer freebsd_triage 2008-04-21 15:24:29 UTC
Responsible Changed
From-To: freebsd-bugs->gavin

Track
Comment 3 kamikaze 2008-04-21 20:56:32 UTC
gavin@FreeBSD.org wrote:
> To submitter: are you able to connect the USB stick to a machine
> running Windows and run chkdsk, to confirm that the filesystem
> is not invalid?  (Although we should ideally be resilient to
> corrupt filesystems, if it still panics after a chkdsk then it's
> a more serious problem...)
> 

I have already checked the stick under Windows. chkdsk did not find any
problems, but the panic still occurs.

The problem started after I updated RELENG_7 on my machine this weekend. The
previous RELENG_7 build was ~2 months old.

> Also, can you give some detail about the system in question?  How big
> is the USB stick?  Are there any modifications to your custom kernel
> that may be related in any way?
> 

The stick is 8 GB in size. I'll just post anything that might be useful:

This is my (compacted) kernel config:
cpu		HAMMER
ident		HP6510b
makeoptions	DEBUG=-g		# Build kernel with gdb(1) debug symbols
options 	SCHED_ULE
options 	PREEMPTION		# Enable kernel thread preemption
options 	INET			# InterNETworking
options 	INET6			# IPv6 communications protocols
options 	SCTP			# Stream Control Transmission Protocol
options 	FFS			# Berkeley Fast Filesystem
options 	SOFTUPDATES		# Enable FFS soft updates support
options 	UFS_ACL			# Support for access control lists
options 	UFS_DIRHASH		# Improve performance on big directories
options 	UFS_GJOURNAL		# Enable gjournal-based UFS journaling
options 	MD_ROOT			# MD is a potential root device
options 	NFSCLIENT		# Network Filesystem Client
options 	NFSSERVER		# Network Filesystem Server
options 	NFS_ROOT		# NFS usable as /, requires NFSCLIENT
options 	NTFS			# NT File System
options 	MSDOSFS			# MSDOS Filesystem
options 	CD9660			# ISO 9660 Filesystem
options 	PROCFS			# Process filesystem (requires PSEUDOFS)
options 	PSEUDOFS		# Pseudo-filesystem framework
options 	GEOM_PART_GPT		# GUID Partition Tables.
options 	GEOM_LABEL		# Provides labelization
options 	COMPAT_43TTY		# BSD 4.3 TTY compat [KEEP THIS!]
options 	COMPAT_IA32		# Compatible with i386 binaries
options 	COMPAT_FREEBSD6		# Compatible with FreeBSD6
options 	SCSI_DELAY=5000		# Delay (in ms) before probing SCSI
options 	KTRACE			# ktrace(1) support
options 	SYSVSHM			# SYSV-style shared memory
options 	SYSVMSG			# SYSV-style message queues
options 	SYSVSEM			# SYSV-style semaphores
options 	_KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options 	KBD_INSTALL_CDEV	# install a CDEV entry in /dev
options 	ADAPTIVE_GIANT		# Giant mutex is adaptive.
options 	STOP_NMI		# Stop CPUS using NMI instead of IPI
options 	AUDIT			# Security event auditing
options 	SMP			# Symmetric MultiProcessor Kernel
options		ALTQ
options		ALTQ_CBQ		# Class Bases Queueing
options		ALTQ_RED		# Random Early Detection
options		ALTQ_RIO		# RED In/Out
options		ALTQ_HFSC		# Hierarchical Packet Scheduler
options		ALTQ_CDNR		# Traffic conditioner
options		ALTQ_PRIQ		# Priority Queueing
device		acpi
device		pci
options 	ATA_STATIC_ID	# Static device numbering
options 	AHC_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~128k to driver.
options 	AHD_REG_PRETTY_PRINT	# Print register bitfields in debug
					# output.  Adds ~215k to driver.
device		atkbdc		# AT keyboard controller
device		atkbd		# AT keyboard
device		psm		# PS/2 mouse
device		kbdmux		# keyboard multiplexer
device		vga		# VGA video card driver
device		sc
device		loop		# Network loopback
device		ether		# Ethernet support
device		pty		# Pseudo-ttys (telnet etc)
device		bpf		# Berkeley packet filter



This is my loader.conf:

# Boot loader.
autoboot_delay="2"
loader_logo="beastie"
# ATA controller drivers
atadisk_load="YES"
atapci_load="YES"
# Deactivate write cache
#hw.ata.wc=0
# USB drivers
usb_load="YES"
ubsa_load="YES"
umass_load="YES"
ums_load="YES"
ugen_load="YES"
# network driver
if_bge_load="YES"
# random device
random_load="YES"
# agp bus
agp_load="YES"
# CD/DVD driver
acd_load="YES"
atapicam_load="YES"
# Required to create memory disks.
geom_md_load="YES"
# Intel 3945ABG Wireless LAN IEEE 802.11 driver.
legal.intel_wpi.license_ack=1
if_wpi_load="YES"
wlan_load="YES"
wlan_amrr_load="YES"
firmware_load="YES"
wpifw_load="YES"
wlan_scan_sta_load="YES"
# Sound driver.
snd_hda_load="YES"
# Sound multiplexer.
hw.snd.maxautovchans="8"
# Synaptics support.
#hw.psm.synaptics_support="1"
# Linux compat
linux_load="YES"
# Sync PDA over USB.
uvisor_load="YES"


# kenv | grep smbios
smbios.bios.reldate="01/11/2008"
smbios.bios.vendor="Hewlett-Packard"
smbios.bios.version="68DDU Ver. F.10"
smbios.chassis.maker="Hewlett-Packard"
smbios.chassis.serial="CNU74808MK"
smbios.chassis.tag="CNU74808MK"
smbios.planar.maker="Hewlett-Packard"
smbios.planar.product="30C0"
smbios.planar.version="KBC Version 71.2E"
smbios.socket.enabled="1"
smbios.socket.populated="1"
smbios.system.maker="Hewlett-Packard"
smbios.system.product="HP Compaq 6510b (GR695EA#ABD)"
smbios.system.serial="CNU74808MK"
smbios.system.uuid="e85c3fb2-3f15-e011-08a0-6d990e4acd29"
smbios.system.version="F.10"


# pciconf -lv
hostb0@pci0:0:0:0:	class=0x060000 card=0x30c0103c chip=0x2a008086 rev=0x0c
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Mobile PM965/GM965/GL960 Express Processor to DRAM Controller'
     class      = bridge
     subclass   = HOST-PCI
vgapci0@pci0:0:2:0:	class=0x030000 card=0x30c0103c chip=0x2a028086 rev=0x0c
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Mobile 965 Express Integrated Graphics Controller'
     class      = display
     subclass   = VGA
vgapci1@pci0:0:2:1:	class=0x038000 card=0x30c0103c chip=0x2a038086 rev=0x0c
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'Mobile 965 Express Integrated Graphics Controller'
     class      = display
uhci0@pci0:0:26:0:	class=0x0c0300 card=0x30c0103c chip=0x28348086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB UHCI'
     class      = serial bus
     subclass   = USB
uhci1@pci0:0:26:1:	class=0x0c0300 card=0x30c0103c chip=0x28358086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB UHCI'
     class      = serial bus
     subclass   = USB
ehci0@pci0:0:26:7:	class=0x0c0320 card=0x30c0103c chip=0x283a8086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '81EC1043 (?) ICH8 Enhanced USB2 Enhanced Host Controller'
     class      = serial bus
     subclass   = USB
pcm0@pci0:0:27:0:	class=0x040300 card=0x30c0103c chip=0x284b8086 rev=0x03 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H &SUBSYS_81EC1043&REV_02\3&11583659&0&D8'
     class      = multimedia
pcib1@pci0:0:28:0:	class=0x060400 card=0x30c0103c chip=0x283f8086 rev=0x03
hdr=0x01
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) PCIe Port 1'
     class      = bridge
     subclass   = PCI-PCI
pcib2@pci0:0:28:1:	class=0x060400 card=0x30c0103c chip=0x28418086 rev=0x03
hdr=0x01
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) PCIe Port 2'
     class      = bridge
     subclass   = PCI-PCI
pcib3@pci0:0:28:2:	class=0x060400 card=0x30c0103c chip=0x28438086 rev=0x03
hdr=0x01
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) PCIe Port 3'
     class      = bridge
     subclass   = PCI-PCI
pcib4@pci0:0:28:4:	class=0x060400 card=0x30c0103c chip=0x28478086 rev=0x03
hdr=0x01
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) PCIe Port 5'
     class      = bridge
     subclass   = PCI-PCI
uhci2@pci0:0:29:0:	class=0x0c0300 card=0x30c0103c chip=0x28308086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB UHCI'
     class      = serial bus
     subclass   = USB
uhci3@pci0:0:29:1:	class=0x0c0300 card=0x30c0103c chip=0x28318086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB UHCI'
     class      = serial bus
     subclass   = USB
uhci4@pci0:0:29:2:	class=0x0c0300 card=0x30c0103c chip=0x28328086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB UHCI'
     class      = serial bus
     subclass   = USB
ehci1@pci0:0:29:7:	class=0x0c0320 card=0x30c0103c chip=0x28368086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) USB2 EHCI'
     class      = serial bus
     subclass   = USB
pcib5@pci0:0:30:0:	class=0x060401 card=0x30c0103c chip=0x24488086 rev=0xf3
hdr=0x01
     vendor     = 'Intel Corporation'
     device     = '82801BAM/CAM/DBM (ICH2-M/3-M/4-M) Hub Interface to PCI Bridge'
     class      = bridge
     subclass   = PCI-PCI
isab0@pci0:0:31:0:	class=0x060100 card=0x30c0103c chip=0x28158086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = 'ICH8M-E (ICH8 Family) LPC Interface Controller'
     class      = bridge
     subclass   = PCI-ISA
atapci0@pci0:0:31:1:	class=0x01018a card=0x30c0103c chip=0x28508086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801H (ICH8 Family) Ultra ATA Storage Controllers'
     class      = mass storage
     subclass   = ATA
atapci1@pci0:0:31:2:	class=0x010601 card=0x30c0103c chip=0x28298086 rev=0x03
hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '82801 Intel(R) 82801HEM/HBM SATA AHCI Controller'
     class      = mass storage
wpi0@pci0:16:0:0:	class=0x028000 card=0x135c103c chip=0x42228086 rev=0x02 hdr=0x00
     vendor     = 'Intel Corporation'
     device     = '10418086 Intel 3945ABG Wireless LAN controller'
     class      = network
bge0@pci0:24:0:0:	class=0x020000 card=0x30c0103c chip=0x169314e4 rev=0x02 hdr=0x00
     vendor     = 'Broadcom Corporation'
     device     = 'BCM 5787A Ethernet Controller Broadcom Netlink Gigabit'
     class      = network
     subclass   = ethernet
none0@pci0:2:4:0:	class=0x060700 card=0x30c0103c chip=0x04761180 rev=0xb6 hdr=0x02
     vendor     = 'Ricoh Company, Ltd.'
     device     = 'unknown Ricoh R/RL/5C476(II)'
     class      = bridge
     subclass   = PCI-CardBus
none1@pci0:2:4:1:	class=0x0c0010 card=0x30c0103c chip=0x08321180 rev=0x02 hdr=0x00
     vendor     = 'Ricoh Company, Ltd.'
     device     = 'unknown IEEE 1394 (4 pin firewire) chip)'
     class      = serial bus
     subclass   = FireWire


# vmstat -i
interrupt                          total       rate
irq1: atkbd0                        6445          1
irq9: acpi0                         3757          0
irq12: psm0                          780          0
irq14: ata0                          113          0
irq16: pcm0 uhci0+                    11          0
irq17: wpi0 uhci1+                 27081          4
irq18: bge0 ehci0+                 31750          5
irq20: uhci2 ehci1                  6157          1
irq21: uhci3                      188839         34
cpu0: timer                     10901466       1998
cpu1: timer                     10893486       1997
Total                           22059885       4044


I'm using the following patch for mount.c:
http://www.freebsd.org/cgi/query-pr.cgi?prp=120784-5-diff&n=/patch-5.diff


# mount
/dev/ufs/2root on / (ufs, local)
devfs on /dev (devfs, local)
/dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
/dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
/dev/ufs/2var on /var (ufs, local, soft-updates)
pid874@mobileKamikaze:/var/run/automounter.amd.mnt on /var/run/automounter.amd.mnt (nfs)
/dev/msdosfs/APRIL RYAN on /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a (msdosfs, local, noatime, noexec)


# df -h
Filesystem                 Size    Used   Avail Capacity  Mounted on
/dev/ufs/2root             496M    362M     94M    79%    /
devfs                      1.0K    1.0K      0B   100%    /dev
/dev/ufs/2tmp              1.9G     44K    1.8G     0%    /tmp
/dev/ufs/2usr               38G     18G     17G    52%    /usr
/dev/ufs/2var              3.9G    2.0G    1.5G    58%    /var
/dev/msdosfs/APRIL RYAN    7.5G    2.9G    4.7G    38%    /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a


That's all I can think of right now.
Comment 4 kamikaze 2008-04-22 18:25:50 UTC
Bruce Evans wrote:
> On Mon, 21 Apr 2008, Dominic Fandrey wrote:
> This seems to be a bug in usb (umass) or the particular usb drive...
> ...
> 
> To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
> under RELENG_7 or later (the bug also affects RELENG_6, but these mount
> options are broken in RELENG_6). ...

# mount -t msdosfs -o noatime,noexec,-L=en_GB.UTF-8,noclusterr,noclusterw /dev/da0 /mnt/tmp
mount_msdosfs: /dev/da0: mount option <noclusterw> is unknown: Invalid argument

It seems they are also broken under RELENG_7 from the day before yesterday.

> 
>> # mount
>> /dev/ufs/2root on / (ufs, local)
>> devfs on /dev (devfs, local)
>> /dev/ufs/2tmp on /tmp (ufs, local, soft-updates)
>> /dev/ufs/2usr on /usr (ufs, NFS exported, local, soft-updates)
>> /dev/ufs/2var on /var (ufs, local, soft-updates)
>> pid874@mobileKamikaze:/var/run/automounter.amd.mnt on 
>> /var/run/automounter.amd.mnt (nfs)
>> /dev/msdosfs/APRIL RYAN on 
>> /var/run/automounter.mnt/msdosfs/bb8a40b99a061c33a35f4e7275d1842a 
>> (msdosfs, local, noatime, noexec)
> 
> The labels obfuscate the device type for all mountpoints very well.

The ufs mounts are on an SATA drive. The msdosfs slice is the USB stick at 
/dev/da0. The nfs mount is from amd.
Comment 5 Bruce Evans freebsd_committer 2008-04-23 02:18:25 UTC
On Tue, 22 Apr 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
>> On Mon, 21 Apr 2008, Dominic Fandrey wrote:
>> This seems to be a bug in usb (umass) or the particular usb drive...
>> ...
>> 
>> To check that this is the bug, mount msdosfs with -o noclusterr,noclusterw
>> under RELENG_7 or later (the bug also affects RELENG_6, but these mount
>> options are broken in RELENG_6). ...
>
> # mount -t msdosfs -o noatime,noexec,-L=en_GB.UTF-8,noclusterr,noclusterw 
> /dev/da0 /mnt/tmp
> mount_msdosfs: /dev/da0: mount option <noclusterw> is unknown: Invalid 
> argument
>
> It seems they are also broken under RELENG_7 from the day before yesterday.

Oops, it is indeed broken there too.  msdosfs in RELENG_7 is in the state
that I thought RELENG_6 was in (missing critical MFCs only for nocluster*
in the options list and to fix panics related to the dirty flag), while
msdosfs in RELENG_6 is just too old to support clustering.

The broken nocluster* can be worked around by upgrading to a version of
mount_msdosfs(8) that hasn't been broken by using nmount(2).
mount_msdosfs(8) from RELENG_5 should work.

Bruce
Comment 6 kamikaze 2008-04-23 21:37:35 UTC
Bruce Evans wrote:
> On Tue, 22 Apr 2008, Dominic Fandrey wrote:
> 
>> Bruce Evans wrote:
>>> On Mon, 21 Apr 2008, Dominic Fandrey wrote:
>>> This seems to be a bug in usb (umass) or the particular usb drive...
>>> ...
>>>
>>> To check that this is the bug, mount msdosfs with -o 
>>> noclusterr,noclusterw
>>> under RELENG_7 or later (the bug also affects RELENG_6, but these mount
>>> options are broken in RELENG_6). ...
>>
>> # mount -t msdosfs -o 
>> noatime,noexec,-L=en_GB.UTF-8,noclusterr,noclusterw /dev/da0 /mnt/tmp
>> mount_msdosfs: /dev/da0: mount option <noclusterw> is unknown: Invalid 
>> argument
>>
>> It seems they are also broken under RELENG_7 from the day before 
>> yesterday.
> 
> Oops, it is indeed broken there too.  msdosfs in RELENG_7 is in the state
> that I thought RELENG_6 was in (missing critical MFCs only for nocluster*
> in the options list and to fix panics related to the dirty flag), while
> msdosfs in RELENG_6 is just too old to support clustering.
> 
> The broken nocluster* can be worked around by upgrading to a version of
> mount_msdosfs(8) that hasn't been broken by using nmount(2).
> mount_msdosfs(8) from RELENG_5 should work.

I feel reluctant about downgrading to 5.x mount_msdosfs, however I can confirm 
that cp with large files does _not_ cause a panic. As far as I understand this 
confirms your theory.

How can I provide more useful information?
Comment 7 Bruce Evans freebsd_committer 2008-04-24 04:57:04 UTC
On Wed, 23 Apr 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
>> The broken nocluster* can be worked around by upgrading to a version of
>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>> mount_msdosfs(8) from RELENG_5 should work.
>
> I feel reluctant about downgrading to 5.x mount_msdosfs,

But it would be an upgrade :-).  Anyway, running mount_msdosfs on one
disposable file system that might panic should be safe.

> however I can 
> confirm that cp with large files does _not_ cause a panic. As far as I 
> understand this confirms your theory.

Not quite.  I would have expected the problem to affect read() and write()
too unless the file system is mounted with -nocluster*.

> How can I provide more useful information?

Check if the cp of large files actually works.  A previous report mentioned
data corruption but I don't remember it saying anything about panics.  Maybe
mmap() does something different that causes more serious corruption.
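
One way to probe the mmap() question from userland is to copy a file by mapping the source and handing the mapping straight to write(2), so that the kernel has to fault the source pages in while it copies from user space. That is the sequence visible in frames #29 to #36 of the backtrace (uiomove, copyin, vm_fault, vnode_pager_getpages). The following is only a sketch under that assumption: copy_via_mmap() is a hypothetical helper, not an existing utility, and whether cp(1) takes this path for a given file is not confirmed here.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper (not part of FreeBSD): copy src to dst by
 * mapping the source and passing the mapping to write(2).  The kernel
 * then faults the source pages in while copying from user space.
 * Returns 0 on success, -1 on error.
 */
int
copy_via_mmap(const char *src, const char *dst)
{
	struct stat st;
	char *p;
	off_t done;
	ssize_t n;
	int from, to, rc = -1;

	assert(src != NULL && dst != NULL);
	from = open(src, O_RDONLY);
	if (from < 0)
		return (-1);
	to = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (to < 0) {
		close(from);
		return (-1);
	}
	if (fstat(from, &st) < 0)
		goto out;
	if (st.st_size == 0) {		/* mmap() rejects zero-length maps */
		rc = 0;
		goto out;
	}
	p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, from, 0);
	if (p == MAP_FAILED)
		goto out;
	/* write(2) may be partial, so loop until everything is written. */
	for (done = 0; done < st.st_size; done += n) {
		n = write(to, p + done, (size_t)(st.st_size - done));
		if (n <= 0)
			break;
	}
	if (done == st.st_size)
		rc = 0;
	munmap(p, (size_t)st.st_size);
out:
	close(from);
	close(to);
	return (rc);
}
```

Calling it with the source on one file system and the destination on the msdosfs mount, and then swapping in a plain read()/write() loop, would show whether only the mapped path misbehaves.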

I'll have to think more about adding debugging code to mmap() and the
device driver.

Meanwhile, can you try changing this code in msdosfs_vnops.c:

%%%
 	mp = vp->v_mount;
 	maxio = mp->mnt_iosize_max / mp->mnt_stat.f_iosize;
 	bnpercn = de_cn2bn(pmp, 1);
%%%

o Add a printf to print out maxio (might need rate limiting).
o Try lower values of maxio until you find the largest one that works
   (keep dividing by 2.  Only try one value per boot or per mount of
   course).  I think it is always 128K initially, and small values will
   work.  A value of the cluster size (typically 4K) or smaller should
   give the old behaviour.

or one or more of the following:

o Check that large i/o's to the raw device work.
o Check for the problem with other file systems that implement clustering.
   ffs is easiest.
o On an older version of FreeBSD that doesn't seem to have the problem,
   check for the problem with msdosfs with a large cluster size (the
   cluster can be up to 64K, which is large enough to show the problem that
   I suspect).   Check on file systems that implement clustering too (now
   the block size doesn't need to be large to cause large i/o's).
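
The maxio experiment above can be planned with a little arithmetic before touching the kernel. The sketch below is a userland illustration only: it mirrors the quoted expression maxio = mnt_iosize_max / f_iosize and lists the values to try, one per boot or mount, by repeated halving. The function names are made up for this example, and the sample figures (128K for mnt_iosize_max, 4K for f_iosize) are assumptions based on the "128K" and "typically 4K" remarks, not measured values.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userland illustration only: the same division as the quoted kernel
 * line, maxio = mp->mnt_iosize_max / mp->mnt_stat.f_iosize.
 */
unsigned long
maxio_blocks(unsigned long iosize_max, unsigned long f_iosize)
{
	assert(f_iosize != 0);
	return (iosize_max / f_iosize);
}

/*
 * Fill out[] with the values to try, one per mount: the starting
 * value, then repeated halving down to 1.  Returns how many entries
 * were stored (at most cap).
 */
size_t
maxio_candidates(unsigned long start, unsigned long *out, size_t cap)
{
	size_t n = 0;

	while (start >= 1 && n < cap) {
		out[n++] = start;
		start /= 2;
	}
	return (n);
}
```

With the assumed figures, maxio_blocks(128 * 1024, 4096) gives 32 blocks, and maxio_candidates(32, ...) yields 32, 16, 8, 4, 2, 1. The last value corresponds to one block per transfer, which should match the "old behaviour" case described above.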

Bruce
Comment 8 kamikaze 2008-04-24 05:30:40 UTC
Bruce Evans wrote:
> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
> 
>> Bruce Evans wrote:
>>> The broken nocluster* can be worked around by upgrading to a version of
>>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>>> mount_msdosfs(8) from RELENG_5 should work.
>>
>> I feel reluctant about downgrading to 5.x mount_msdosfs,
> 
> But it would be an upgrade :-).  Anyway, running mount_msdosfs on one
> disposable file system that might panic should be safe.

If it really is of help, I will downgrade. Not before the weekend, though.

>> however I can confirm that cp with large files does _not_ cause a 
>> panic. As far as I understand this confirms your theory.
> 
> Not quite.  I would have expected the problem to affect read() and write()
> too unless the file system is mounted with -nocluster*.
> 
>> How can I provide more useful information?
> 
> Check if the cp of large files actually works.  A previous report mentioned
> data corruption but I don't remember it saying anything about panics.  
> Maybe
> mmap() does something different that causes more serious corruption.

I copied a 1.2 GB DVD rip and watched it afterwards. No corruption. MD5 
checksums show that the file on the stick and the original are identical.
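
For anyone repeating this check without an MD5 tool at hand, a direct byte-for-byte comparison answers the same question. Note that this swaps the checksum for a plain comparison; it is not what was run above. files_identical() is a hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: compare two files byte for byte.
 * Returns 1 if identical, 0 if they differ, -1 on open or read error.
 */
int
files_identical(const char *a, const char *b)
{
	static unsigned char ba[65536], bb[65536];
	size_t na, nb;
	int same = 1;
	FILE *fa, *fb;

	assert(a != NULL && b != NULL);
	fa = fopen(a, "rb");
	if (fa == NULL)
		return (-1);
	fb = fopen(b, "rb");
	if (fb == NULL) {
		fclose(fa);
		return (-1);
	}
	do {
		na = fread(ba, 1, sizeof(ba), fa);
		nb = fread(bb, 1, sizeof(bb), fb);
		/* A length mismatch or differing bytes ends the scan. */
		if (na != nb || memcmp(ba, bb, na) != 0) {
			same = 0;
			break;
		}
	} while (na > 0);
	if (ferror(fa) || ferror(fb))
		same = -1;
	fclose(fa);
	fclose(fb);
	return (same);
}
```

Running it on the original and on the copy read back from the stick (ideally after unmounting and remounting, so the data comes from the device rather than the cache) would catch silent corruption of the kind mentioned in the earlier report.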

> I'll have to think more about adding debugging code to mmap() and the
> device driver.
> 
> Meanwhile, can you try changing this code in msdosfs_vnops.c:
> 
> %%%
>     mp = vp->v_mount;
>     maxio = mp->mnt_iosize_max / mp->mnt_stat.f_iosize;
>     bnpercn = de_cn2bn(pmp, 1);
> %%%
> 
> o Add a printf to print out maxio (might need rate limiting).
> o Try lower values of maxio until you find the largest one that works
>   (keep dividing by 2.  Only try one value per boot or per mount of
>   course).  I think it is always 128K initially, and small values will
>   work.  A value of the cluster size (typically 4K) or smaller should
>   give the old behaviour.

I will give it a try.

> or one or more of the following:
> 
> o Check that large i/o's to the raw device work.
> o Check for the problem with other file systems that implement clustering.
>   ffs is easiest.
> o On an older version of FreeBSD that doesn't seem to have the problem,
>   check for the problem with msdosfs with a large cluster size (the
>   cluster can be up to 64K, which is large enough to show the problem that
>   I suspect).   Check on file systems that implement clustering too (now
>   the block size doesn't need to be large to cause large i/o's).
> 
> Bruce
> 

These ones are harder. I will also defer them to the weekend.
Comment 9 kamikaze 2008-05-02 09:55:56 UTC
Bruce Evans wrote:
> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
> 
>> Bruce Evans wrote:
>>> The broken nocluster* can be worked around by upgrading to a version of
>>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>>> mount_msdosfs(8) from RELENG_5 should work.
>>
>> I feel reluctant about downgrading to 5.x mount_msdosfs,
> 
> But it would be an upgrade :-).  Anyway, running mount_msdosfs on one
> disposable file system that might panic should be safe.
> 
>> however I can confirm that cp with large files does _not_ cause a 
>> panic. As far as I understand this confirms your theory.
> 
> Not quite.  I would have expected the problem to affect read() and write()
> too unless the file system is mounted with -nocluster*.

This can be closed.

Your suggestions have been very helpful. It turned out that fusefs-ntfs is 
causing the panic, when I copy files from it.
Comment 10 Gavin Atkinson freebsd_committer freebsd_triage 2008-05-02 10:55:32 UTC
State Changed
From-To: feedback->closed

Submitter reports that this was actually caused by fusefs-ntfs and not 
msdosfs.
Comment 11 Bruce Evans freebsd_committer 2008-05-02 15:25:36 UTC
On Fri, 2 May 2008, Dominic Fandrey wrote:

> Bruce Evans wrote:
>> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
>> 
>>> Bruce Evans wrote:
>>>> The broken nocluster* can be worked around by upgrading to a version of
>>>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>>>> mount_msdosfs(8) from RELENG_5 should work.
>>> 
>>> I feel reluctant about downgrading to 5.x mount_msdosfs,
>> 
>> But it would be an upgrade :-).  Anyway, running mount_msdosfs on one
>> disposable file system that might panic should be safe.
>> 
>>> however I can confirm that cp with large files does _not_ cause a panic. 
>>> As far as I understand this confirms your theory.
>> 
>> Not quite.  I would have expected the problem to affect read() and write()
>> too unless the file system is mounted with -nocluster*.
>
> This can be closed.
>
> Your suggestions have been very helpful. It turned out that fusefs-ntfs is 
> causing the panic, when I copy files from it.

Now we have a better argument for not axing non-port ntfs :-).  I think
it sort of works read-only.  Too bad we're no closer to understanding the
msdosfs problem.

Bruce
Comment 12 kamikaze 2008-05-02 15:40:23 UTC
Bruce Evans wrote:
> On Fri, 2 May 2008, Dominic Fandrey wrote:
> 
>> Bruce Evans wrote:
>>> On Wed, 23 Apr 2008, Dominic Fandrey wrote:
>>>
>>>> Bruce Evans wrote:
>>>>> The broken nocluster* can be worked around by upgrading to a 
>>>>> version of
>>>>> mount_msdosfs(8) that hasn't been broken by using nmount(2).
>>>>> mount_msdosfs(8) from RELENG_5 should work.
>>>>
>>>> I feel reluctant about downgrading to 5.x mount_msdosfs,
>>>
>>> But it would be an upgrade :-).  Anyway, running mount_msdosfs on one
>>> disposable file system that might panic should be safe.
>>>
>>>> however I can confirm that cp with large files does _not_ cause a 
>>>> panic. As far as I understand this confirms your theory.
>>>
>>> Not quite.  I would have expected the problem to affect read() and 
>>> write()
>>> too unless the file system is mounted with -nocluster*.
>>
>> This can be closed.
>>
>> Your suggestions have been very helpful. It turned out that 
>> fusefs-ntfs is causing the panic, when I copy files from it.
> 
> Now we have a better argument for not axing non-port ntfs :-).  I think
> it sort of works read-only.  Too bad we're no closer to understanding the
> msdosfs problem.
> 
> Bruce

It was really all my fault: I forgot to rebuild fusefs-kmod after updating my 
kernel. I'm sorry (not really) that I cannot provide helpful data for the 
msdosfs problems other people have reported.