Bug 17643

Summary: 3.4 to 4.0 upgrade: ATAPI drivers damage the filesystem
Product: Base System
Reporter: grg <grg>
Component: kern
Assignee: Søren Schmidt <sos>
Status: Closed FIXED    
Severity: Affects Only Me    
Priority: Normal    
Version: 3.4-STABLE   
Hardware: Any   
OS: Any   

Description grg 2000-03-28 18:40:01 UTC
 I CVSupped the 4.0-STABLE sources on Mar 20.
 I had already upgraded one 3.4-STABLE machine without any problems,
 and today wanted to upgrade another.
 I ran buildworld, buildkernel, and installkernel.

 During the boot I've seen:

 Mar 28 17:48:56 koch2 /kernel: ata0: at 0x1f0 irq 14 on atapci0
 Mar 28 17:48:56 koch2 /kernel: ata0: at 0x1f0 irq 14 on atapci0
 Mar 28 17:49:07 koch2 /kernel: ed0: <Longshine LCS-8634P Ethernet Card
 Mar 28 17:49:07 koch2 /kernel: > at port 0x240-0x25f iomem 0xc0000-0xc003f irq 11 on isa0
 Mar 28 17:49:07 koch2 /kernel: ed0: supplying EUI64: 08:00:00:ff:fe:00:10:37
 Mar 28 17:49:07 koch2 /kernel: ed0: address 08:00:00:00:10:37, type NE2000 (16 bit) 
 ...
 Mar 28 17:49:07 koch2 /kernel: ed0: starting DAD for fe80:0001::0a00:00ff:fe00:1037
 Mar 28 17:49:07 koch2 /kernel: ed0: DAD complete for fe80:0001::0a00:00ff:fe00:1037 - no duplicates found
 Mar 28 17:49:07 koch2 /kernel: ed0: device timeout
 Mar 28 17:49:07 koch2 /kernel: ed0: device timeout
 Mar 28 17:49:07 koch2 /kernel: ad0: WRITE command timeout - resetting
 Mar 28 17:49:07 koch2 /kernel: ata0: resetting devices .. done
 Mar 28 17:49:11 koch2 /kernel: ed0: device timeout
 Mar 28 17:49:26 koch2 last message repeated 2 times
 Mar 28 17:49:48 koch2 /kernel: ad0: WRITE command timeout - resetting
 Mar 28 17:49:48 koch2 /kernel: ata0: resetting devices .. done
 Mar 28 17:50:23 koch2 /kernel: ed0: device timeout

Two strange things here: the ed0 timeouts and the ad0 timeouts.
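To see how often each device timed out, the log excerpt above can be tallied with a short script. This is only a minimal sketch: the function name count_timeouts and both regular expressions are my own assumptions based on the lines quoted in this report, not part of any FreeBSD tool, and other syslog layouts may need different patterns.

```python
import re
from collections import Counter

# Assumed patterns, derived only from the log lines quoted in this report, e.g.
#   "... /kernel: ed0: device timeout"
#   "... /kernel: ad0: WRITE command timeout - resetting"
TIMEOUT_RE = re.compile(r"/kernel: (\w+): .*\btimeout\b")
# syslogd collapses identical consecutive messages into this form.
REPEAT_RE = re.compile(r"last message repeated (\d+) times")


def count_timeouts(lines):
    """Count timeout messages per device name (hypothetical helper)."""
    counts = Counter()
    last_device = None  # device of the preceding timeout line, if any
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            last_device = match.group(1)
            counts[last_device] += 1
            continue
        repeat = REPEAT_RE.search(line)
        if repeat and last_device is not None:
            # The repeated message is the timeout line just before it.
            counts[last_device] += int(repeat.group(1))
        else:
            # Any other line breaks the "repeated" chain.
            last_device = None
    return counts


# The tail of the excerpt quoted above.
SAMPLE = [
    "Mar 28 17:49:07 koch2 /kernel: ed0: device timeout",
    "Mar 28 17:49:07 koch2 /kernel: ed0: device timeout",
    "Mar 28 17:49:07 koch2 /kernel: ad0: WRITE command timeout - resetting",
    "Mar 28 17:49:07 koch2 /kernel: ata0: resetting devices .. done",
    "Mar 28 17:49:11 koch2 /kernel: ed0: device timeout",
    "Mar 28 17:49:26 koch2 last message repeated 2 times",
    "Mar 28 17:49:48 koch2 /kernel: ad0: WRITE command timeout - resetting",
    "Mar 28 17:49:48 koch2 /kernel: ata0: resetting devices .. done",
    "Mar 28 17:50:23 koch2 /kernel: ed0: device timeout",
]

if __name__ == "__main__":
    for device, total in sorted(count_timeouts(SAMPLE).items()):
        print(device, total)
```

On the quoted lines this counts six ed0 timeouts and two ad0 timeouts, all within about a minute and a half of boot.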

The boot process continued.

Despite the fact that ed0 was detected,
"ifconfig ed0" gave "interface does not exist"
(also note the 'ed0: device timeout' messages above).

Then I shut down the machine
and turned it off and on again, just in case there was
a genuine minor problem with the hardware.

Second boot:

fsck reported for /usr: "UNEXPECTED INCONSISTENCY, RUN FSCK MANUALLY".

After I ran fsck -y several times, quite a few files
had been either deleted or moved to lost+found.
/usr/local and my home directory disappeared.

Then I rebooted again in order to turn 'UDMA' off in the BIOS.

Again I saw 'ad0: WRITE command timeout - resetting'
messages, and again /usr was damaged, which resulted
in the loss of many other files and directories. About
2/3 of /usr is lost.

I restored parts of /usr and now this machine is back
on 3.4, with no hardware problems on either the
HDD or the network card.

Fix: 

I don't know where to look for the source of the problem.
Comment 1 Sheldon Hearn freebsd_committer freebsd_triage 2000-03-29 12:02:52 UTC
Responsible Changed
From-To: freebsd-bugs->sos

Over to the ata maintainer. 

Comment 2 grg 2000-03-29 15:19:20 UTC
I want to add that my chipset is
detected as

   pcib2: <VIA 82C598MVP (Apollo MVP3) PCI-PCI (AGP) bridge> at device 1.0 on pci0


Several other users have written to the freebsd-stable mailing list
about the same problem with this chipset.

-- 
=== Grigoriy Strokin, Lomonosov University (MGU), Moscow ===
=== contact info: http://isabase.philol.msu.ru/~grg/     ===
Comment 3 grg 2000-04-22 15:37:17 UTC
Further details:

When I use sysctl -w hw.atamodes=pio,pio,pio,pio,
I can live with 4.0 quite happily, but I sometimes see
strange FreeBSD behaviour:
   1) sometimes, after a clean shutdown,
      there is a message
        WARNING: / was not properly dismounted
   2) sometimes, just after the message
      'mounting root from ufs:/dev/ad0s3a'
      I see '/sbin/iu4yefbkljhf: not found',
      where 'iu4yefbkljhf' is a random string that varies.
      The system doesn't go multi-user in such cases.
      Then I turn the machine off and on again,
      and the system boots fine.
   I suppose that something nasty happens
   just after the kernel is booted and
   before sysctl -w hw.atamodes=pio,pio,pio,pio
   from /etc/rc is executed, and also
   during the system shutdown.

   This happens seldom, but it does happen.

Comment 4 Søren Schmidt freebsd_committer freebsd_triage 2000-11-14 08:19:28 UTC
State Changed
From-To: open->closed


Problems with the VIA 586 chipset should be fixed in 4.2 and later.