After some (relatively short) period of time on 11-Current with the Raspberry Pi2 audio fails in a "mute" state. Some programs attempting to access audio (e.g. mpg321) will print a diagnostic on the console, others (e.g. espeak) simply hang. If there is a diagnostic it is of the form: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead There appears to be no way to clear this condition once it occurs other than a reboot.
BTW I'm happy to run this down but need a place to start looking.... if the folk(s) who worked on the bcm2835 code originally could chime in it would go a long way toward pointing me in the right direction. I can rapid-build kernels via cross-compilation and can force the problem to occur within a half-hour or so -- this smells like either an interrupt isn't being enabled when it should be or is remaining masked off when it shouldn't. Even a workaround that cleared the problem when a new open occurred on the channel would render the problem a nuisance rather than the far-more serious issue that it currently is.
Update: I have built a test system and stuck a looping shell process on it playing mp3 files. It did *NOT* fail over a period of more than 24 hours, where my "general purpose" examples that try to play audio samples and speech tend to go mute within a few hours of operation. That test machine was doing nothing else and was continually playing audio using mpg321, which strongly suggests that some other actor that touches the bcm2835 during "normal" operation is improperly masking the interrupt for audio.
Created attachment 166534 [details] This appears to sort of fix the problem.... This is a flat-out shot in the dark.... it doesn't completely fix the problem, in that the audio still goes "deaf", but once it does a short time later it comes back on and works from that point forward without further failures (~2 weeks+ without a second problem.) It's probably wrong-headed in that this shouldn't be necessary in the first place if I'm reading the code right, but I'm not arguing with success :)
A commit references this bug: Author: gonzo Date: Mon Nov 7 17:38:40 UTC 2016 New revision: 308424 URL: https://svnweb.freebsd.org/changeset/base/308424 Log: Fix locking in bcm2835_audio driver - Move all VCHI activity to worker thread: channel methods are called with non-sleepable lock held and VCHI uses sleepable lock. - In worker thread use sx(9) lock instead of mutex(9) for the same reason. PR: 213801, 205979 Changes: head/sys/arm/broadcom/bcm2835/bcm2835_audio.c
(In reply to commit-hook from comment #4) Is this headed to MFC-land? :)
(In reply to karl from comment #5) Yes, I just forgot to add MFC field :( I'm going to try to keep STABLE-11 in sync with -CURRENT as much as possible.
I am still not seeing this last commit on the Stable/11 branch.....
(In reply to karl from comment #7) Hans Peter Selasky suggested some improvements. I am going to investigate and commit both patches together
(In reply to Oleksandr Tymoshenko from comment #8) Thanks... will look for it.
Is there any update in status available on this?
(In reply to karl from comment #10) Not yet. Got distracted by other stuff. It is finished now and I am planning to spend some time working on it this week
Update on the progress. I rewrote FreeBSD driver to be more robust (I believe). The problem is it looks like there was bug in VC firmware that caused PWM audio output deadlocks. I don't see it with latest firmware but I haven't ran long-term tests. HDMI audio output doesn't seem to be affected. More details: https://github.com/raspberrypi/firmware/issues/696 I'd appreciate more tests of new patch with new firmware. If you use serial port make sure you have following line in config.txt: init_uart_clock=3000000
Created attachment 178126 [details] rpi-audio-fix.diff Add WIP patch
What is this against? I get a LOT of failures against r310285 (11-STABLE)....
Created attachment 178127 [details] rpi-audio-fix-11.diff Add patch agains STABLE-11
(In reply to karl from comment #14) -CURRENT. I attached patch agains 11
Testing now.... will be a bit for it to compile and update and then (of course) know if it fails or not.
I have my test Pi2 running an infinite loop playing a short mp3 chime; we'll see if it survives the day running this without locking up. If the patch fixes the problem, it should.
So far looks good -- no hangs, been running for several hours looping on a mpg123 play command for recorded chime I have set up and is still playing on the headphones.
Nope -- got another hang, although it took a LONG time to show up. pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead pcm0: chn_write(): pcm0:virtual:dsp0.vp2: play interrupt timeout, channel dead 73394 - S 1:15.71 /usr/local/bin/espeak -v en-us -s 175 "espeak" is being asked to make an announcement, and it is spinning (accumulating runtime but not going anywhere) I will attempt to craft a means of causing this more-readily.
FreeBSD 11.0-STABLE #1 r310285M: Tue Dec 20 07:18:19 CST 2016 karl@NewFS.denninger.net:/pics/CrossBuild/obj/arm.armv6/pics/CrossBuild/src/sys/RPI2
(In reply to karl from comment #20) Could you show sysctl dev.pcm.0 output? Also look for word "weird" in /var/log/mesages There is debug message for inconsistent VC firmware callback.
(In reply to karl from comment #20) Also, did you update firmware to the latest version?
Where is the firmware to update with found and where does it go? This is what I have in the boot messages; I have updated the kernel sources but if there's a blob that has to go somewhere I probably don't have that. vchiq0: <BCM2835 VCHIQ> mem 0xb800-0xb84f irq 28 on simplebus0 vchiq: local ver 8 (min 3), remote ver 8. pcm0: <VCHIQ audio> on vchiq0 There is nothing interesting in /var/log/messages other than the pcm errors: Dec 21 08:54:44 HD-MCP ntpd[546]: leapsecond file ('/etc/ntp/leap-seconds'): expired less than 360 days ago Dec 21 09:45:04 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Dec 21 09:45:09 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead Dec 21 09:45:10 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp2: play interrupt timeout, channel dead root@HD-MCP:/var/log # sysctl dev.pcm.0 dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 40000 dev.pcm.0.underruns: 8 dev.pcm.0.retrieved: 0 dev.pcm.0.submitted: 40000 dev.pcm.0.callbacks: 1376 dev.pcm.0.dest: 0 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 3 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio
(In reply to karl from comment #24) There is FAT partition on SD card with VC firmware bits. You can download latest binaries from here: https://github.com/raspberrypi/firmware/tree/master/boot/ You need *.elf, *.dat, and bootcode.bin. Make sure you added "init_uart_clock=3000000" to config.txt on FAT partition if you have UART console connect. Backup content of FAT partition before updating it, just in case.
Picking up those files got me a non-booting unit... will investigate (have to hook up a physical console....)
It appears that my test machine (which was running a custom u-boot) didn't like the updated files -- I've reverted back to the stock uboot and it comes up; running the tests once again.
So far so good....
Failed and machine is now in a persistent state (good, which means I can run whatever tests you may wish) where the .0 device is dead. See my other report here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213687
A commit references this bug: Author: gonzo Date: Tue Dec 27 19:08:08 UTC 2016 New revision: 310636 URL: https://svnweb.freebsd.org/changeset/base/310636 Log: [rpi] Fix bcm2835_audio locking and samples starvation Rework general approach to locking and working with audio worker thread: - Use flags to signal requested worker action - Fix submitted buffer calculations to avoid samples starvation - Protect buffer pointers with locks to fix race condition between callback and audio worker thread - Remove unnecessary vchi_service_use - Do not use lock to serialize VCHI requests since only one thread issues them now - Fix unloading signaling per hselasky@ suggestion - Add output to detect inconsistent callback data caused by possible firmware bug https://github.com/raspberrypi/firmware/issues/696 - Add stats/debug sysctls to troubleshoot possible bugs PR: 213687, 205979, 215194 MFC after: 1 week Changes: head/sys/arm/broadcom/bcm2835/bcm2835_audio.c
Closing since original problem was fixed. There is one more outstanding issue introduced by fix: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213687
A commit references this bug: Author: gonzo Date: Mon May 8 19:57:16 UTC 2017 New revision: 317974 URL: https://svnweb.freebsd.org/changeset/base/317974 Log: MFC r308424, r310636 r308424: Fix locking in bcm2835_audio driver - Move all VCHI activity to worker thread: channel methods are called with non-sleepable lock held and VCHI uses sleepable lock. - In worker thread use sx(9) lock instead of mutex(9) for the same reason. PR: 213801, 205979 r310636: [rpi] Fix bcm2835_audio locking and samples starvation Rework general approach to locking and working with audio worker thread: - Use flags to signal requested worker action - Fix submitted buffer calculations to avoid samples starvation - Protect buffer pointers with locks to fix race condition between callback and audio worker thread - Remove unnecessary vchi_service_use - Do not use lock to serialize VCHI requests since only one thread issues them now - Fix unloading signaling per hselasky@ suggestion - Add output to detect inconsistent callback data caused by possible firmware bug https://github.com/raspberrypi/firmware/issues/696 - Add stats/debug sysctls to troubleshoot possible bugs PR: 213687, 205979, 215194 Changes: _U stable/11/ stable/11/sys/arm/broadcom/bcm2835/bcm2835_audio.c