Bug 205979 - Raspberry Pi2 loses audio after some time
Summary: Raspberry Pi2 loses audio after some time
Status: Closed FIXED
Alias: None
Product: Base System
Classification: Unclassified
Component: arm (show other bugs)
Version: CURRENT
Hardware: arm Any
: --- Affects Many People
Assignee: Oleksandr Tymoshenko
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-06 19:27 UTC by karl
Modified: 2017-05-08 19:58 UTC (History)
1 user (show)

See Also:


Attachments
This appears to sort of fix the problem.... (512 bytes, patch)
2016-02-04 04:39 UTC, karl
no flags Details | Diff
rpi-audio-fix.diff (20.08 KB, patch)
2016-12-20 02:27 UTC, Oleksandr Tymoshenko
no flags Details | Diff
rpi-audio-fix-11.diff (20.19 KB, patch)
2016-12-20 03:43 UTC, Oleksandr Tymoshenko
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description karl 2016-01-06 19:27:35 UTC
After some (relatively short) period of time on 11-Current with the Raspberry Pi2 audio fails in a "mute" state.  Some programs attempting to access audio (e.g. mpg321) will print a diagnostic on the console, others (e.g. espeak) simply hang.

If there is a diagnostic it is of the form:

pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead

There appears to be no way to clear this condition once it occurs other than a reboot.
Comment 1 karl 2016-01-15 14:51:03 UTC
BTW I'm happy to run this down but need a place to start looking.... if the folk(s) who worked on the bcm2835 code originally could chime in it would go a long way toward pointing me in the right direction.

I can rapid-build kernels via cross-compilation and can force the problem to occur within a half-hour or so -- this smells like either an interrupt isn't being enabled when it should be or is remaining masked off when it shouldn't.  Even a workaround that cleared the problem when a new open occurred on the channel would render the problem a nuisance rather than the far-more serious issue that it currently is.
Comment 2 karl 2016-01-19 20:20:25 UTC
Update:  I have built a test system and stuck a looping shell process on it playing mp3 files.  It did *NOT* fail over a period of more than 24 hours, where my "general purpose" examples that try to play audio samples and speech tend to go mute within a few hours of operation.

That test machine was doing nothing else and was continually playing audio using mpg321, which strongly suggests that some other actor that touches the bcm2835 during "normal" operation is improperly masking the interrupt for audio.
Comment 3 karl 2016-02-04 04:39:34 UTC
Created attachment 166534 [details]
This appears to sort of fix the problem....

This is a flat-out shot in the dark.... it doesn't completely fix the problem, in that the audio still goes "deaf", but once it does a short time later it comes back on and works from that point forward without further failures (~2 weeks+ without a second problem.)

It's probably wrong-headed in that this shouldn't be necessary in the first place if I'm reading the code right, but I'm not arguing with success :)
Comment 4 commit-hook freebsd_committer freebsd_triage 2016-11-07 17:39:27 UTC
A commit references this bug:

Author: gonzo
Date: Mon Nov  7 17:38:40 UTC 2016
New revision: 308424
URL: https://svnweb.freebsd.org/changeset/base/308424

Log:
  Fix locking in bcm2835_audio driver

  - Move all VCHI activity to worker thread: channel methods are called with
      non-sleepable lock held and VCHI uses sleepable lock.

  - In worker thread use sx(9) lock instead of mutex(9) for the same reason.

  PR:		213801, 205979

Changes:
  head/sys/arm/broadcom/bcm2835/bcm2835_audio.c
Comment 5 karl 2016-11-07 19:14:25 UTC
(In reply to commit-hook from comment #4)

Is this headed to MFC-land? :)
Comment 6 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-11-07 19:19:55 UTC
(In reply to karl from comment #5)

Yes, I just forgot to add MFC field :(

I'm going to try to keep STABLE-11 in sync with -CURRENT as much as possible.
Comment 7 karl 2016-11-08 14:04:03 UTC
I am still not seeing this last commit on the Stable/11 branch.....
Comment 8 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-11-08 19:26:51 UTC
(In reply to karl from comment #7)

Hans Peter Selasky suggested some improvements. I am going to investigate and commit both patches together
Comment 9 karl 2016-11-08 19:27:33 UTC
(In reply to Oleksandr Tymoshenko from comment #8)

Thanks... will look for it.
Comment 10 karl 2016-11-30 14:13:18 UTC
Is there any update in status available on this?
Comment 11 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-11-30 20:36:09 UTC
(In reply to karl from comment #10)

Not yet. Got distracted by other stuff. It is finished now and I am planning to spend some time working on it this week
Comment 12 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-20 02:26:57 UTC
Update on the progress. I rewrote FreeBSD driver to be more robust (I believe). The problem is it looks like there was bug in VC firmware that caused PWM audio output deadlocks. I don't see it with latest firmware but I haven't ran long-term tests. HDMI audio output doesn't seem to be affected. 

More details: https://github.com/raspberrypi/firmware/issues/696

I'd appreciate more tests of new patch with new firmware. If you use serial port make sure you have following line in config.txt:

init_uart_clock=3000000
Comment 13 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-20 02:27:44 UTC
Created attachment 178126 [details]
rpi-audio-fix.diff

Add WIP patch
Comment 14 karl 2016-12-20 02:46:58 UTC
What is this against?  I get a LOT of failures against r310285 (11-STABLE)....
Comment 15 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-20 03:43:10 UTC
Created attachment 178127 [details]
rpi-audio-fix-11.diff

Add patch agains STABLE-11
Comment 16 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-20 03:43:31 UTC
(In reply to karl from comment #14)
 -CURRENT. I attached patch agains 11
Comment 17 karl 2016-12-20 13:18:25 UTC
Testing now.... will be a bit for it to compile and update and then (of course) know if it fails or not.
Comment 18 karl 2016-12-20 15:05:15 UTC
I have my test Pi2 running an infinite loop playing a short mp3 chime; we'll see if it survives the day running this without locking up.  If the patch fixes the problem, it should.
Comment 19 karl 2016-12-20 22:00:39 UTC
So far looks good -- no hangs, been running for several hours looping on a mpg123 play command for recorded chime I have set up and is still playing on the headphones.
Comment 20 karl 2016-12-21 15:47:28 UTC
Nope -- got another hang, although it took a LONG time to show up.

pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead
pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead
pcm0: chn_write(): pcm0:virtual:dsp0.vp2: play interrupt timeout, channel dead

73394  -  S       1:15.71 /usr/local/bin/espeak -v en-us -s 175

"espeak" is being asked to make an announcement, and it is spinning (accumulating runtime but not going anywhere)

I will attempt to craft a means of causing this more-readily.
Comment 21 karl 2016-12-21 15:47:59 UTC
FreeBSD 11.0-STABLE #1 r310285M: Tue Dec 20 07:18:19 CST 2016     karl@NewFS.denninger.net:/pics/CrossBuild/obj/arm.armv6/pics/CrossBuild/src/sys/RPI2
Comment 22 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-21 16:16:34 UTC
(In reply to karl from comment #20)

Could you show sysctl dev.pcm.0 output?

Also look for word "weird" in /var/log/mesages
There is debug message for inconsistent VC firmware callback.
Comment 23 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-21 16:25:38 UTC
(In reply to karl from comment #20)

Also, did you update firmware to the latest version?
Comment 24 karl 2016-12-21 16:41:06 UTC
Where is the firmware to update with found and where does it go?  This is what I have in the boot messages; I have updated the kernel sources but if there's a blob that has to go somewhere I probably don't have that.

vchiq0: <BCM2835 VCHIQ> mem 0xb800-0xb84f irq 28 on simplebus0
vchiq: local ver 8 (min 3), remote ver 8.
pcm0: <VCHIQ audio> on vchiq0


There is nothing interesting in /var/log/messages other than the pcm errors:

Dec 21 08:54:44 HD-MCP ntpd[546]: leapsecond file ('/etc/ntp/leap-seconds'): expired less than 360 days ago
Dec 21 09:45:04 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp0: play interrupt timeout, channel dead
Dec 21 09:45:09 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp1: play interrupt timeout, channel dead
Dec 21 09:45:10 HD-MCP kernel: pcm0: chn_write(): pcm0:virtual:dsp0.vp2: play interrupt timeout, channel dead

root@HD-MCP:/var/log # sysctl dev.pcm.0
dev.pcm.0.starved: 0
dev.pcm.0.freebuffer: 40000
dev.pcm.0.underruns: 8
dev.pcm.0.retrieved: 0
dev.pcm.0.submitted: 40000
dev.pcm.0.callbacks: 1376
dev.pcm.0.dest: 0
dev.pcm.0.bitperfect: 0
dev.pcm.0.buffersize: 0
dev.pcm.0.play.vchanformat: s16le:2.0
dev.pcm.0.play.vchanrate: 48000
dev.pcm.0.play.vchanmode: fixed
dev.pcm.0.play.vchans: 3
dev.pcm.0.%parent: vchiq0
dev.pcm.0.%pnpinfo:
dev.pcm.0.%location:
dev.pcm.0.%driver: pcm
dev.pcm.0.%desc: VCHIQ audio
Comment 25 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2016-12-21 16:55:34 UTC
(In reply to karl from comment #24)

There is FAT partition on SD card with VC firmware bits. You can download latest binaries from here: https://github.com/raspberrypi/firmware/tree/master/boot/
You need *.elf, *.dat, and bootcode.bin. Make sure you added "init_uart_clock=3000000" to config.txt on FAT partition if you have UART console connect. 

Backup content of FAT partition before updating it, just in case.
Comment 26 karl 2016-12-21 17:02:59 UTC
Picking up those files got me a non-booting unit... will investigate (have to hook up a physical console....)
Comment 27 karl 2016-12-21 17:41:46 UTC
It appears that my test machine (which was running a custom u-boot) didn't like the updated files -- I've reverted back to the stock uboot and it comes up; running the tests once again.
Comment 28 karl 2016-12-22 04:01:48 UTC
So far so good....
Comment 29 karl 2016-12-25 19:26:51 UTC
Failed and machine is now in a persistent state (good, which means I can run whatever tests you may wish) where the .0 device is dead.  See my other report here: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213687
Comment 30 commit-hook freebsd_committer freebsd_triage 2016-12-27 19:08:33 UTC
A commit references this bug:

Author: gonzo
Date: Tue Dec 27 19:08:08 UTC 2016
New revision: 310636
URL: https://svnweb.freebsd.org/changeset/base/310636

Log:
  [rpi] Fix bcm2835_audio locking and samples starvation

  Rework general approach to locking and working with audio worker thread:

  - Use flags to signal requested worker action
  - Fix submitted buffer calculations to avoid samples starvation
  - Protect buffer pointers with locks to fix race condition between callback
    and audio worker thread
  - Remove unnecessary vchi_service_use
  - Do not use lock to serialize VCHI requests since only one thread issues them now
  - Fix unloading signaling per hselasky@ suggestion
  - Add output to detect inconsistent callback data caused by possible firmware bug
    https://github.com/raspberrypi/firmware/issues/696
  - Add stats/debug sysctls to troubleshoot possible bugs

  PR:		213687, 205979, 215194
  MFC after:	1 week

Changes:
  head/sys/arm/broadcom/bcm2835/bcm2835_audio.c
Comment 31 Oleksandr Tymoshenko freebsd_committer freebsd_triage 2017-02-28 20:31:04 UTC
Closing since original problem was fixed. There is one more outstanding issue introduced by fix:  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=213687
Comment 32 commit-hook freebsd_committer freebsd_triage 2017-05-08 19:58:40 UTC
A commit references this bug:

Author: gonzo
Date: Mon May  8 19:57:16 UTC 2017
New revision: 317974
URL: https://svnweb.freebsd.org/changeset/base/317974

Log:
  MFC r308424, r310636

  r308424:
  Fix locking in bcm2835_audio driver

  - Move all VCHI activity to worker thread: channel methods are called with
      non-sleepable lock held and VCHI uses sleepable lock.

  - In worker thread use sx(9) lock instead of mutex(9) for the same reason.

  PR:		213801, 205979

  r310636:
  [rpi] Fix bcm2835_audio locking and samples starvation

  Rework general approach to locking and working with audio worker thread:

  - Use flags to signal requested worker action
  - Fix submitted buffer calculations to avoid samples starvation
  - Protect buffer pointers with locks to fix race condition between callback
    and audio worker thread
  - Remove unnecessary vchi_service_use
  - Do not use lock to serialize VCHI requests since only one thread issues them now
  - Fix unloading signaling per hselasky@ suggestion
  - Add output to detect inconsistent callback data caused by possible firmware bug
    https://github.com/raspberrypi/firmware/issues/696
  - Add stats/debug sysctls to troubleshoot possible bugs

  PR:		213687, 205979, 215194

Changes:
_U  stable/11/
  stable/11/sys/arm/broadcom/bcm2835/bcm2835_audio.c
Comment 33 commit-hook freebsd_committer freebsd_triage 2017-05-08 19:58:44 UTC
A commit references this bug:

Author: gonzo
Date: Mon May  8 19:57:16 UTC 2017
New revision: 317974
URL: https://svnweb.freebsd.org/changeset/base/317974

Log:
  MFC r308424, r310636

  r308424:
  Fix locking in bcm2835_audio driver

  - Move all VCHI activity to worker thread: channel methods are called with
      non-sleepable lock held and VCHI uses sleepable lock.

  - In worker thread use sx(9) lock instead of mutex(9) for the same reason.

  PR:		213801, 205979

  r310636:
  [rpi] Fix bcm2835_audio locking and samples starvation

  Rework general approach to locking and working with audio worker thread:

  - Use flags to signal requested worker action
  - Fix submitted buffer calculations to avoid samples starvation
  - Protect buffer pointers with locks to fix race condition between callback
    and audio worker thread
  - Remove unnecessary vchi_service_use
  - Do not use lock to serialize VCHI requests since only one thread issues them now
  - Fix unloading signaling per hselasky@ suggestion
  - Add output to detect inconsistent callback data caused by possible firmware bug
    https://github.com/raspberrypi/firmware/issues/696
  - Add stats/debug sysctls to troubleshoot possible bugs

  PR:		213687, 205979, 215194

Changes:
_U  stable/11/
  stable/11/sys/arm/broadcom/bcm2835/bcm2835_audio.c