Hello, I am the happy owner of the raspberry pi 2, analogue sounds works GREAT but I've always had problem after a few days. I listen to music several hours every day with my Pi, if I play music everyday everything is fine, but I have noticed that if I don't play music for more than 48 hours, I get this error trying to relaunch music (no log before): pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Any idea? I have this verbose log just before it dies (cut version): pcm0: WARNING: PCMDIR_PLAY DMA completion too fast/slow ! hwptr=44000, old=44000 delta=0 amt=0 ready=8192 free=119808 feed_root: (virtual) appending 3676 bytes (count=3676 l=0 feed=7524878) pcm0: WARNING: PCMDIR_PLAY DMA completion too fast/slow ! hwptr=48000, old=48000 delta=0 amt=0 ready=8192 free=119808 feed_root: (virtual) appending 1852 bytes (count=1852 l=0 feed=7524879) feed_root: (virtual) appending 1824 bytes (count=1824 l=0 feed=7524880) pcm0: WARNING: PCMDIR_PLAY DMA completion too fast/slow ! hwptr=52000, old=52000 delta=0 amt=0 ready=8192 free=119808 pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Gonzo said in an earlier bug these WARNINGs are inherent to the way the raspberry pi sends sound to the firmware so I'm not worried about it. # sysctl -a | grep pcm: dev.pcm.0.dest: 1 dev.pcm.0.eq_preamp: +0.0dB dev.pcm.0.eq: 1 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 1 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio dev.pcm.%parent: # sysctl -a | grep snd hw.snd.maxautovchans: 16 hw.snd.default_unit: 0 hw.snd.version: 2009061500/armv6 hw.snd.default_auto: 1 hw.snd.verbose: 4 hw.snd.vpc_mixer_bypass: 1 hw.snd.feeder_rate_quality: 1 hw.snd.feeder_rate_round: 25 hw.snd.feeder_rate_max: 2016000 hw.snd.feeder_rate_min: 1 hw.snd.feeder_rate_polyphase_max: 183040 hw.snd.feeder_rate_presets: 100:8:0.85 100:36:0.92 100:164:0.97 hw.snd.feeder_eq_exact_rate: 0 hw.snd.feeder_eq_presets: PEQ:16000,0.2500,62,0.2500:-9,9,1.0:44100,48000,88200,96000,176400,192000 hw.snd.basename_clone: 1 hw.snd.compat_linux_mmap: 0 hw.snd.syncdelay: -1 hw.snd.usefrags: 0 hw.snd.vpc_reset: 0 hw.snd.vpc_0db: 45 hw.snd.vpc_autoreset: 1 hw.snd.timeout: 5 hw.snd.latency_profile: 1 hw.snd.latency: 5 hw.snd.report_soft_matrix: 1 hw.snd.report_soft_formats: 1
The only fix is to restart (shutdown -r now) the Pi (as I unfortunately cannot reload the sound or pcm or bcm_audio module). My kernel version is CURRENT as of today (but bug has been there on my weekly builds for more than 1 year).
This is almost-certainly the same bug as I reported here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=205979 Reported in January and while I've done some work on it I've not been able to squash it.
Oh, incidentally, it will clear on its own. Sometimes. And then it comes back. Detecting it is not hard (since it logs the error) but I've yet to figure out how to trigger a full re-init on the device when it occurs, which should (in theory anyway) clear it -- or why it's happening in the first place.
I think it's somewhat different from the bug Karl described. Locking was done wrong in RPi audio driver and bug reported by Karl was most likely fixed by 308424. This issue looks like either firmware bug or VCHI driver bug. Because I managed to get device in this state by playing mp3s for ~48 hours, and once I tried to use test VCHI app kernel panicked. I am going to close bug opened by Karl and will use this one for tracking development.
Thanks Oleksandr
Hi Sylvain, Could you test WIP patch in bug #205979? Thank you
Patched applied cleanly, will take some time to compile but I will definitely give you feedback early next week after Christmas.
I got an error in the system log but it apparently reset itself, and didn't kill the interface permanently or hang the writing proces (which is a positive change) Dec 22 22:31:39 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead This is the same error I've seen before incidentally.
(In reply to karl from comment #8) Could you check logs for "weird" string? There should be some number along with that word. Also when this error happens, could you save output of sysctl dev.pcm.0 and attach to this issue? Thank you
No "weird" string is found in the logs.... It has only happened once so far since I put the new code and firmware in, and it has NOT locked up. The test machine didn't freeze OR print any errors in over 24 hours, so I stuck it on my home controller, which "talks" from time to time, and that's where I got the pcm error -- but no lock, which is a major improvement. This is what the home controller has for PCM: root@HD-MCP:/var/log # sysctl -a|grep dev.pcm dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 40000 dev.pcm.0.underruns: 15 dev.pcm.0.retrieved: 714244 dev.pcm.0.submitted: 752000 dev.pcm.0.callbacks: 4125 dev.pcm.0.dest: 0 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 1 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio dev.pcm.%parent: And "snd": root@HD-MCP:/var/log # sysctl -a |grep snd hw.snd.maxautovchans: 16 hw.snd.default_unit: 0 hw.snd.version: 2009061500/armv6 hw.snd.default_auto: 1 hw.snd.verbose: 0 hw.snd.vpc_mixer_bypass: 1 hw.snd.feeder_rate_quality: 1 hw.snd.feeder_rate_round: 25 hw.snd.feeder_rate_max: 2016000 hw.snd.feeder_rate_min: 1 hw.snd.feeder_rate_polyphase_max: 183040 hw.snd.feeder_rate_presets: 100:8:0.85 100:36:0.92 100:164:0.97 hw.snd.feeder_eq_exact_rate: 0 hw.snd.feeder_eq_presets: PEQ:16000,0.2500,62,0.2500:-9,9,1.0:44100,48000,88200,96000,176400,192000 hw.snd.basename_clone: 1 hw.snd.compat_linux_mmap: 0 hw.snd.syncdelay: -1 hw.snd.usefrags: 0 hw.snd.vpc_reset: 0 hw.snd.vpc_0db: 45 hw.snd.vpc_autoreset: 1 hw.snd.timeout: 5 hw.snd.latency_profile: 1 hw.snd.latency: 5 hw.snd.report_soft_matrix: 1 hw.snd.report_soft_formats: 1 I don't know if the underruns are related to the error in the log or not. It occurred yesterday but I have not seen it again (so far) and it also hasn't frozen the PCM output so whatever it was the error cleared itself (although the individual announcement that caused it might not have gone through; not sure on that one.)
Just caught it dead again -- and now it appears I have it in a state where I can keep it dead, thus if you have things that you want me to do, I can. Here's what's in the log: Dec 25 13:11:29 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Dec 25 13:11:33 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead No "wierd" log messages. root@HD-MCP:/var/log # sysctl -a | grep pcm dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 40000 dev.pcm.0.underruns: 24 dev.pcm.0.retrieved: 722648 dev.pcm.0.submitted: 760000 dev.pcm.0.callbacks: 5992 dev.pcm.0.dest: 0 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 2 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio dev.pcm.%parent: root@HD-MCP:/var/log # sysctl -a | grep snd hw.snd.maxautovchans: 16 hw.snd.default_unit: 0 hw.snd.version: 2009061500/armv6 hw.snd.default_auto: 1 hw.snd.verbose: 0 hw.snd.vpc_mixer_bypass: 1 hw.snd.feeder_rate_quality: 1 hw.snd.feeder_rate_round: 25 hw.snd.feeder_rate_max: 2016000 hw.snd.feeder_rate_min: 1 hw.snd.feeder_rate_polyphase_max: 183040 hw.snd.feeder_rate_presets: 100:8:0.85 100:36:0.92 100:164:0.97 hw.snd.feeder_eq_exact_rate: 0 hw.snd.feeder_eq_presets: PEQ:16000,0.2500,62,0.2500:-9,9,1.0:44100,48000,88200,96000,176400,192000 hw.snd.basename_clone: 1 hw.snd.compat_linux_mmap: 0 hw.snd.syncdelay: -1 hw.snd.usefrags: 0 hw.snd.vpc_reset: 0 hw.snd.vpc_0db: 45 hw.snd.vpc_autoreset: 1 hw.snd.timeout: 5 hw.snd.latency_profile: 1 hw.snd.latency: 5 hw.snd.report_soft_matrix: 1 hw.snd.report_soft_formats: 1 If I queue something now it hangs and produces the first line: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Then shortly appears to open dsp0.vp1 and plays it (with a ~5 second delay.) Underruns is NOT getting incremented. If you have something you'd like me to try shoot it to me here -- I'll endeavor not to reboot the impacted device, since it is now in a "persistent" screwed state.
Another note -- there are in fact two "sounds" that occur normally on these events -- a call to mpg321 to play a chime, then a call to "espeak" to announce the event (text-to-speech.) The mpg321 call is the one that is failing, and the other appears to be going out to the .1 device.
Also, the string "weird" isn't anywhere in your patch here: https://bz-attachments.freebsd.org/attachment.cgi?id=178127 Are you sure it's supposed to show up?
(In reply to karl from comment #13) You're right. I removed that debug output. Look for "ch->available_space".
(In reply to karl from comment #11) Could you save output of "sysctl dev.pcm.0", try to play something, and then save output of sysctl again. Then paste both outputs here.
No "available_space" entries. root@HD-MCP:/usr/local/bin/hd-scripts # sysctl -a|grep dev.pcm.0 dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 40000 dev.pcm.0.underruns: 24 dev.pcm.0.retrieved: 706140 dev.pcm.0.submitted: 744000 dev.pcm.0.callbacks: 7132 dev.pcm.0.dest: 0 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 2 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio root@HD-MCP:/usr/local/bin/hd-scripts # cat ambient-announce #!/bin/sh # OUTDOOR="$(cat /tmp/HD-MCP/variables/Lanai_Multi_ambient/state)" #echo 'The outdoor temperature is' $OUTDOOR 'degrees.' ( /usr/local/bin/mpg321 -q /usr/local/sounds/chime.mp3; echo 'The outdoor temperature is ' $OUTDOOR ' degrees.' | /usr/local/bin/espeak -v en-us -s 175) & root@HD-MCP:/usr/local/bin/hd-scripts # ./ambient-announce (So you can see exactly what is being done in the script. The chime does NOT play -- the mpg321 command, although it runs and produces no visible errors -- but the announcement through espeak DOES.) This generates the following in both dmesg and /var/log/messages: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead And now the sysctl again: root@HD-MCP:/usr/local/bin/hd-scripts # sysctl -a | grep dev.pcm.0 dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 40000 dev.pcm.0.underruns: 24 dev.pcm.0.retrieved: 703680 dev.pcm.0.submitted: 740000 dev.pcm.0.callbacks: 7505 dev.pcm.0.dest: 0 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 2 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio
(In reply to karl from comment #16) Just to confirm. When you say "espeak DOES" - does it mean espeak plays sound and then generates error, or just generates error?
It appears that mpg321 is generating the error and not playing any sound. espeak appears to not be generating an error and *does* play the sound. However, I just ran it again (with an active "tail -f /var/log/messages" to try to confirm the exact order of events), and got a very different outcome -- this time the chime *and* espeak both ran and produced sound, but with a very long (~10 second) pause before espeak actually produced sound output, during which time it was consuming CPU. And there was no error logged either..... A second run a few seconds later produced the formerly-encountered results: 1. mpg321 runs, accumulates runtime as expected in terms of execution time and exits, but produces no sound. 2. On mpg321 exit (approximately) the "channel dead" message is logged to the syslog and dmesg 3. espeak runs and produces sound output, then exits normally, but does accumulate a fair bit more runtime than usual (spinning?) before it actually starts to speak. It exits normally without error. When things are behaving normally the two events are effectively seamless; you get the chime and then immediately (for all intents) the voicing of the message. This machine runs very little and typically has a load average around 0.5 -- in fact, other than ntp (for time sync), a vestigal sendmail (so it can send alarms to other places) and sshd (so I can get into it) its only "normal" program is a binary executable that runs my house. That code multiplexes two serial interfaces (an X10 CM11 and a Zwave stick) and several TCP sockets (it is the "master" in a master/slave relationship with the same code running at my pool equipment and in addition has a built-in web server functionality so I can both monitor and control it from afar, all via SSL.) The incantation at the pool hardware uses GPIO and I2c extensively to talk to physical sensors and relays, but while that code is in this box it's not used so I doubt there's an interference problem coming from any of the physical I/O stuff. The "heartbeat" is done through /dev/led/act but I can't see why that would be at issue here....
Thanks for detailed explanation. Two more things: - I understand you updated firmware on your test machine. Does this box have the latest version of firmware also? - Could you attach chime mp3 to this bug so I could reproduce exact setup? Change in bitrate or something along these lines may trigger this condition so I'd like to have exactly the same files. I don't think other components contribute to this issue. Most likely it's firmware + kernel + combination of audio activities.
Created attachment 178273 [details] chime.mp3 file as requested
Yes on the firmware. Without it the lockup, once it happened, was permanent until you rebooted, and it would hang the "espeak" process spinning on CPU. Now it doesn't but whether the difference is due to the firmware, your code change or both I don't know. Added the chime file for you as well.
(In reply to Oleksandr Tymoshenko from comment #6) Hi Oleskandr, as far as I am concerned, my original problem seems to be solved in 12-CURRENT with the patch you attached: I started playing some music through analogue output last Saturday, didn't play anything for the last 48h, and retried some music today => sound did play and no entry in /var/log/messages. Thanks! P.S.: not related to this patch, but although master mixer volume is set to 100, I feel like the sound output is not as powerful as it could maybe be (for what it's worth, I have audio_pwm_mode=2 in config.txt).
Last evening I *did* get this in the syslog, apparently from an attempt to play an announcement that I never heard: Dec 25 17:41:49 HD-MCP kernel: perr, count=4 Found that at line 232 in bcm2835_audio.c -- it has only shown up once in the log, and this morning announcements are working normally (no delays or errors, and the unit has not been rebooted.) This error was uncorrelated with the channel dead messages, although bracketed by two of them -- here's the log from the pertinent time of all kernel logs. Dec 25 17:12:41 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Dec 25 17:41:49 HD-MCP kernel: perr, count=4 Dec 25 20:56:51 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead
(In reply to karl from comment #23) Hi Karl, just to be sure you are testing with the same configuration than I do, are you using audio_pwm_mode=2 in config.txt? Analogue output sounds way better with this setting, see https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445
No, but I will..... that will require a reboot, but since it stopped misbehaving for the moment I'm not losing information doing that.
A commit references this bug: Author: gonzo Date: Tue Dec 27 19:08:08 UTC 2016 New revision: 310636 URL: https://svnweb.freebsd.org/changeset/base/310636 Log: [rpi] Fix bcm2835_audio locking and samples starvation Rework general approach to locking and working with audio worker thread: - Use flags to signal requested worker action - Fix submitted buffer calculations to avoid samples starvation - Protect buffer pointers with locks to fix race condition between callback and audio worker thread - Remove unnecessary vchi_service_use - Do not use lock to serialize VCHI requests since only one thread issues them now - Fix unloading signaling per hselasky@ suggestion - Add output to detect inconsistent callback data caused by possible firmware bug https://github.com/raspberrypi/firmware/issues/696 - Add stats/debug sysctls to troubleshoot possible bugs PR: 213687, 205979, 215194 MFC after: 1 week Changes: head/sys/arm/broadcom/bcm2835/bcm2835_audio.c
Since adding: audio_pwm_mode=2 to config.txt two days ago -- and rebooting -- I have not had any hangs or errors on the console (or in /var/log/messages) I have made no other changes. Hmmmm....
I did reproduce it, below is my dmesg log and the output of the new sysctl values. Admitted, my rpi2 was under a lot of stress, I was compiling a port on it and got a lot of MII and sums warnings before. [root@rpi2 ~]# dmesg ... warning: no time-of-day clock registered, system time will not be set accurately smsc0: chip 0xec00, rev. 0002 ue0: link state changed to DOWN ue0: link state changed to UP local_intc0: Spurious interrupt detected smsc0: warning: Failed to read register 0x114 smsc0: warning: MII is busy smsc0: warning: Failed to write register 0x114 smsc0: warning: Failed to read register 0x114 smsc0: warning: MII is busy smsc0: warning: Failed to read register 0x114 smsc0: warning: MII is busy smsc0: warning: Failed to read register 0x114 smsc0: warning: MII is busy smsc0: warning: Failed to read register 0x118 smsc0: warning: Failed to write register 0x114 smsc0: warning: Failed to read register 0x118 smsc0: warning: Failed to read register 0x114 smsc0: warning: MII is busy smsc0: warning: Failed to read register 0x118 smsc0: warning: Failed to write register 0x114 smsc0: warning: Failed to read register 0x118 smsc0: warning: Failed to read register 0x114 smsc0: warning: MII read timeout intc0: Spurious interrupt detected intc0: Spurious interrupt detected intc0: Spurious interrupt detected starve starve smsc0: warning: Failed to read register 0x118 smsc0: warning: Failed to write register 0x114 smsc0: warning: Failed to write register 0x114 pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead [root@rpi2 ~]# sysctl -a | grep pcm dev.pcm.0.starved: 2 dev.pcm.0.freebuffer: 36768 dev.pcm.0.underruns: 3 dev.pcm.0.retrieved: 10127596768 dev.pcm.0.submitted: 10127600000 dev.pcm.0.callbacks: 5274774 dev.pcm.0.dest: 1 dev.pcm.0.eq_preamp: +0.0dB dev.pcm.0.eq: 1 dev.pcm.0.bitperfect: 0 dev.pcm.0.buffersize: 0 dev.pcm.0.play.vchanformat: s16le:2.0 dev.pcm.0.play.vchanrate: 48000 dev.pcm.0.play.vchanmode: fixed dev.pcm.0.play.vchans: 1 dev.pcm.0.%parent: vchiq0 dev.pcm.0.%pnpinfo: dev.pcm.0.%location: dev.pcm.0.%driver: pcm dev.pcm.0.%desc: VCHIQ audio dev.pcm.%parent:
Actually, I can reproduce the error as soon as sound is playing and at the same time I launch "make" in some random /usr/port/*/*/ directory. After a few seconds of compilation, I get the channel dead error. I guess IO intensive operations make the sound driver crash. Output with sysctl hw.snd.verbose=4 ... Jan 3 21:48:08 rpi2 kernel: pcm0: WARNING: PCMDIR_PLAY DMA completion too fast/slow ! hwptr=4000, old=4000 delta=0 amt=0 ready=8192 free=31808 Jan 3 21:48:08 rpi2 last message repeated 19 times Jan 3 21:48:13 rpi2 kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead Jan 3 21:49:25 rpi2 kernel: smsc0: warning: Failed to read register 0x114 Jan 3 21:49:25 rpi2 kernel: smsc0: warning: MII read timeout -- dev.pcm.0.starved: 0 dev.pcm.0.freebuffer: 39120 dev.pcm.0.underruns: 1 dev.pcm.0.retrieved: 20043120 dev.pcm.0.submitted: 20044000 dev.pcm.0.callbacks: 10430 dev.pcm.0.dest: 1
I created separate bug 217434 for tracking this new issue since it's something that was introduced during the fix and that was not in code original bug request was filed againts
A commit references this bug: Author: gonzo Date: Mon May 8 19:57:16 UTC 2017 New revision: 317974 URL: https://svnweb.freebsd.org/changeset/base/317974 Log: MFC r308424, r310636 r308424: Fix locking in bcm2835_audio driver - Move all VCHI activity to worker thread: channel methods are called with non-sleepable lock held and VCHI uses sleepable lock. - In worker thread use sx(9) lock instead of mutex(9) for the same reason. PR: 213801, 205979 r310636: [rpi] Fix bcm2835_audio locking and samples starvation Rework general approach to locking and working with audio worker thread: - Use flags to signal requested worker action - Fix submitted buffer calculations to avoid samples starvation - Protect buffer pointers with locks to fix race condition between callback and audio worker thread - Remove unnecessary vchi_service_use - Do not use lock to serialize VCHI requests since only one thread issues them now - Fix unloading signaling per hselasky@ suggestion - Add output to detect inconsistent callback data caused by possible firmware bug https://github.com/raspberrypi/firmware/issues/696 - Add stats/debug sysctls to troubleshoot possible bugs PR: 213687, 205979, 215194 Changes: _U stable/11/ stable/11/sys/arm/broadcom/bcm2835/bcm2835_audio.c