|Summary:||[ichwd] New Intel TCO Watchdog Timer (Skylake/Sunrise Point and newer) is not supported|
|Product:||Base System||Reporter:||Greg V <greg>|
|Component:||kern||Assignee:||Andriy Gapon <avg>|
|Severity:||Affects Some People||CC:||0mp, avg, dnelson_1901, driesm.michiels, emaste, kib, t_uemura|
Description Greg V 2017-09-05 18:47:05 UTC
Looks like Intel moved the watchdog from LPC to SMBus in Sunrise Point, so the ichwd driver does not work anymore. Relevant Linux patches: https://lkml.org/lkml/2015/7/27/456
Comment 1 Konstantin Belousov 2017-10-10 15:28:46 UTC
I have some WIP but it is far from being able to drive the hardware. The current state is that I have attachments for the power controller, added code which understands location of the TCO registers in SMBUS but the stumbling block is to correctly reserve TCO window in rman. It is already reserved by motherboard PNP0C02 on the server where I debug this.
Comment 2 Andriy Gapon 2018-09-19 10:42:27 UTC
Just curious if there are any news about this.
Comment 3 Konstantin Belousov 2018-09-19 12:00:13 UTC
(In reply to Andriy Gapon from comment #2) It is in the same state, nothing happens. I did not received the advise about co-existence of drivers.
Comment 4 Andriy Gapon 2018-10-16 14:43:01 UTC
A proposed patch (first iteration as of now): https://reviews.freebsd.org/D17585 Please see the review request for its limitations. If PCI ID of your pci0:0:31:4 device is different, then please add it before testing.
Comment 5 commit-hook 2018-10-22 14:44:48 UTC
A commit references this bug: Author: avg Date: Mon Oct 22 14:44:45 UTC 2018 New revision: 339591 URL: https://svnweb.freebsd.org/changeset/base/339591 Log: ichwd: add support for TCO watchdog timer in Lewisburg PCH (C620) The change is based on public documents listed below as well as Linux changes and the code developed by Kostik. The documents: - Intel? C620 Series Chipset Platform Controller Hub Datasheet - Intel? 100 Series and Intel? C230 Series Chipset Family Platform Controller Hub (PCH) Datasheet - Volume 2 of 2 Interesting Linux commits: - https://github.com/torvalds/linux/commit/9424693035a57961a8eb09e96aab315a7096535d - https://github.com/torvalds/linux/commit/2a7a0e9bf7b32e838d873226808ab8a6c00148f7 The peculiarity of the new chipsets is that the watchdog resources are configured in PCI registers of SMBus controller and Power Management function as opposed to the LPC bridge. I took a simplistic approach of querying the resources from the respective PCI devices. ichwd is still a device on isa bus. The PCI devices are found by their slot and function defined in the datasheets as siblings of the upstream LPC bridge. There are some shortcuts and missing features. First of all, I have not implemented the functionality required to clear the no-reboot bit. That would require writing to a special PCI configuration register of a hidden / invisible PCI device after which the device would start responding to accesses to other registers. The no-reboot bit was not set on my test hardware, so I decided to leave its handling for the later time. Also, I did not try to handle the case where the watchdog resources are not configured by the hardware as well as the case where ACPI defined operational region conflicts with the watchdog resources. My test system did not have either of those problem, so, again, I decided to leave those cases until later. See this Linux commit for some details of the ACPI problem: https://github.com/torvalds/linux/commit/a7ae81952cdab56a1277bd2f9ed7284c0f575120 Finally, I have added only the PCI ID found on my test system. I think that more IDs can be added as the change gets tested. Tested on Dell PowerEdge R740. PR: 222079 Reviewed by: mav, kib MFC after: 3 weeks Relnotes: maybe Sponsored by: Panzura Differential Revision: https://reviews.freebsd.org/D17585 Changes: head/sys/dev/ichwd/ichwd.c head/sys/dev/ichwd/ichwd.h
Comment 6 t_uemura 2018-11-01 04:14:26 UTC
The patch (with the relevant PCI device ID added) can be applied cleanly against 11.2-STABLE as of 28th Sep. on my Shuttle DS77U mini PC (Sunrise Point-LP chipset), compiled with no error, and the patched ichwd now is able to probe/attach the watchdog timer as follows. ichwd0: <Sunrise Point-LP watchdog timer> at port 0x400-0x41f on isa0 Though, the timer never fires by killing the running watchdogd daemon, as it should. /etc/rc.d/watchdogd onestart sleep 10 killall -KILL watchdogd I can see the following two verbose lines when I start watchdogd, just for information. ichwd0: timer enabled ichwd0: timeout set to 229 ticks Is this related to the missing no-reboot bit handling? P.S. The added PCI device ID is 0x9d23 .
Comment 7 Andriy Gapon 2018-11-01 06:36:46 UTC
(In reply to t_uemura from comment #6) I think that the missing no-reboot handling is the most likely issue here. If you are interested, this is what Linux does for that bit: https://github.com/torvalds/linux/blob/9424693035a57961a8eb09e96aab315a7096535d/drivers/i2c/busses/i2c-i801.c#L1201 On FreeBSD/x86 pci_cfgregread() and pci_cfgregwrite() [see sys/x86/include/pci_cfgreg.h] are functions that can be used to directly access the hardware even if it's hidden from the pci bus driver.
Comment 8 t_uemura 2018-11-06 04:48:17 UTC
Created attachment 198998 [details] Attempt to clear NO_REBOOT bit on tco_version==4 hardware
Comment 9 t_uemura 2018-11-06 04:54:58 UTC
Andriy Thank you for your pointer. I just added some lines to try to clear NO_REBOOT bit but no success. The readout of SMBus PCR GC register (offset 0xc6000c of SBREG_BAR) is always 0xffffffff and I can't change the value. I don't know whether this is because of programming error or WDT is disabled completely.
Comment 10 t_uemura 2018-11-06 06:11:50 UTC
Reply to myself. My hardware seems to be a culprit. The watchdog doesn't fire also on Linux even it is probed and attached successfully. The Linux test detail is described in the following page. http://www.madore.org/~david/linux/iTCO-wdt-test.html
Comment 11 Andriy Gapon 2018-11-06 10:11:17 UTC
(In reply to t_uemura from comment #8) Thank you very much for the patch! Does it work as far as device attachment goes? I mean, are you able to read the BAR and to create the resource based on it? What is the BAR that you get? My own preference would be NOT to create a device_t handle for 31:1. I would instead just set the resource described by the BAR on the ichwd device.
Comment 12 Andriy Gapon 2018-11-06 10:12:50 UTC
(In reply to t_uemura from comment #10) I see... Maybe the BIOS on your system configures the watchdog to generate an SMI and locks that configuration. Maybe something else. Thank you for the investigation and the patch.
Comment 13 commit-hook 2018-11-06 13:55:15 UTC
A commit references this bug: Author: avg Date: Tue Nov 6 13:54:25 UTC 2018 New revision: 340182 URL: https://svnweb.freebsd.org/changeset/base/340182 Log: MFC r339591: ichwd: add support for TCO watchdog timer in Lewisburg PCH (C620) PR: 222079 Relnotes: maybe Sponsored by: Panzura Changes: _U stable/11/ stable/11/sys/dev/ichwd/ichwd.c stable/11/sys/dev/ichwd/ichwd.h
Comment 14 Dries Michiels 2018-11-06 16:26:56 UTC
Hi, Can this be comitted to stable 12 too? Or is this already in? Thanks
Comment 15 commit-hook 2018-11-06 17:31:22 UTC
A commit references this bug: Author: avg Date: Tue Nov 6 17:31:10 UTC 2018 New revision: 340190 URL: https://svnweb.freebsd.org/changeset/base/340190 Log: MFC r339591: ichwd: add support for TCO watchdog timer in Lewisburg PCH (C620) PR: 222079 Approved by: re (rgrimes) Relnotes: maybe Sponsored by: Panzura Changes: _U stable/12/ stable/12/sys/dev/ichwd/ichwd.c stable/12/sys/dev/ichwd/ichwd.h
Comment 16 t_uemura 2018-11-07 00:53:57 UTC
Andriy > I mean, are you able to read the BAR and to create the resource based on it? > What is the BAR that you get? With my patch applied, I can read SBREG_BAR by pci_cfgregread() calls. For my hardware, the BAR value is 0xfd000000, so the SMBus PCR GC register should be at 0xfdc6000c . In order to access to the pointed location, I think I must call bus_alloc_resource() with relevant device_t specified as its first argument. By specifying dev or something like device_get_parent(dev), all calls fail to allocate resource. I think dev must be PCI one, for pci_alloc_resource() to be called. > My own preference would be NOT to create a device_t handle for 31:1. I would > instead just set the resource described by the BAR on the ichwd device. Sure. If there is an API without needing device_t, I would use that. Anyway, I've got the following verbose message with my patch applied. ichwd0: <Sunrise Point-LP watchdog timer> at port 0x400-0x41f on isa0 pcib0: allocated type 4 (0x400-0x41f) for rid 0 of ichwd0 pcib0: allocated type 4 (0x1830-0x1837) for rid 1 of ichwd0 found-> vendor=0x8086, dev=0x9d20, revid=0x21 domain=0, bus=0, slot=31, func=1 class=05-80-00, hdrtype=0x00, mfdev=0 cmdreg=0x0006, statreg=0x0000, cachelnsz=0 (dwords) lattimer=0x00 (0 ns), mingnt=0x00 (0 ns), maxlat=0x00 (0 ns) map: type Memory, range 64, base rxfd000000, size 24, enabled pcib0: allocated type 3 (0xfd000000-0xfdffffff) for rid 10 of pci0:0:31:1 ichwd0: ICH WDT present but disabled in BIOS or hardware device_attach: ichwd0 attach returned 6
Comment 17 t_uemura 2018-11-07 05:49:43 UTC
I noticed that my ThinkPad T460s also had a Sunrise Point-LP PCH. The Linux test worked correctly (successfully rebooted) on this ThinkPad, though my patched ichwd didn't. Now I think there is programming error in my patch, maybe around mapping SMBus PCR GC register (around line 783 in patched ichwd.c) or accessing the register (around line 505).
Comment 18 Andriy Gapon 2018-11-07 06:57:40 UTC
(In reply to t_uemura from comment #16) See how ichwd steals / borrows a resource based at ICH_TCOBASE. You would need to call bus_set_resource() before you can call bus_alloc_resource(). Also, you would need to invent a unique rid for the new resource and consistently use it in both set and alloc.
Comment 19 Andriy Gapon 2018-11-07 07:03:59 UTC
(In reply to t_uemura from comment #17) I do not see any obvious mistakes in your patch _except_ for using a resource associated with a PCI device after hiding that PCI device. It's possible that the resource becomes invalid.
Comment 20 Andriy Gapon 2018-11-07 07:08:08 UTC
(In reply to t_uemura from comment #16) Oh, based on this > pcib0: allocated type 3 (0xfd000000-0xfdffffff) for rid 10 of pci0:0:31:1 I suspect that ichwd_read_p2sb_4 / ichwd_write_p2sb_4 operate on address 0xfd000000 instead of 0xfdc6000c. Resource allocation is hard.
Comment 21 t_uemura 2018-11-08 06:18:46 UTC
Created attachment 199074 [details] (Maybe) fixed resource allocation.
Comment 22 t_uemura 2018-11-08 06:47:55 UTC
Andriy By calling bus_set_resource() instantly panics my system. I think this device doesn't need bus_set_resource() because the target memory range is mapped correctly in pci_add_map(), perhaps called within pci_add_child() . > map: type Memory, range 64, base rxfd000000, size 24, enabled I don't know why bus_alloc_resource() in my previous patch allocates not only 0xfdc6000c ~ 0xfdc6000f but whole 0xfd000000 ~ 0xfdffffff. In the modified patch, bus_alloc_resource() now try to allocate whole 16 MiB range, and ichwd_read_p2sb_4 / ichwd_write_p2sb_4 is accessing offset 0xc6000c of that range. On both Shuttle mini PC and ThinkPad, the initial readout value of ichwd_read_p2sb_4 is 0x00000000. Now I can set/unset the NO_REBOOT bit, so at least the resource allocation issue is fixed. On the other hand, with or without my new patch, ThinkPad now reboots successfully upon watchdog timeout. However Shuttle mini PC doesn't. This may be a BIOS or even a hardware issue.
Comment 23 Andriy Gapon 2018-12-05 13:27:02 UTC
Created attachment 199847 [details] my update of the NO_REBOOT patch (In reply to t_uemura from comment #22) Sorry for the long silence. Could you please test the patch that I am attaching and see if it doesn't cause any regressions comparing to your patch?
Comment 24 Andriy Gapon 2018-12-05 13:30:39 UTC
Created attachment 199848 [details] my update of the NO_REBOOT patch Oops, fixed a copy+paste error.
Comment 25 t_uemura 2018-12-12 01:23:24 UTC
Sorry for my delay. Just tested but to no avail. Specifically, it probed and attached successfully, enabled once watchdogd was started, and never fired after the running watchdogd was killed by SIGKILL. The verbose dmesg is as follows. ichwd0: <Sunrise Point-LP watchdog timer> at port 0x400-0x41f iomem 0xfdc6000c-0xfdc6000f on isa0 pcib0: allocated type 4 (0x400-0x41f) for rid 0 of ichwd0 pcib0: allocated type 3 (0xfdc6000c-0xfdc6000f) for rid 1 of ichwd0 pcib0: allocated type 4 (0x1830-0x1837) for rid 2 of ichwd0 ichwd0: timer disabled random: harvesting attach, 8 bytes (4 bits) from ichwd0 ichwd0: timer enabled ichwd0: timeout set to 229 ticks Note that a person at Shuttle Technical Support replied to my inquiry that their DS77U "cannot support onboard hardware watchdog function". On the other hand, the original ichwd surely worked on ThinkPad T460s, so at least the corresponding device ID 0x9d23 can be added.
Comment 26 Andriy Gapon 2018-12-12 07:41:43 UTC
(In reply to t_uemura from comment #25) Thank you for testing! I didn't suppose that my change will fix the watchdog functionality on DS77U. I just wanted to check that it works no worse than the original patch that you developed.
Comment 27 t_uemura 2018-12-12 07:57:06 UTC
OK, I'll try your patch on ThinkPad shortly.
Comment 28 t_uemura 2018-12-14 06:23:11 UTC
I just confirmed on ThinkPad T460s that ichwd with your patch applied worked correctly.