0.4 Watt Less During Audio Playback - (Updated:) Power Performance: Pulseaudio + Interrupt-Less Alsa

OK, so with some help from Pierre-Louis from Intel I've managed to get it working and run some performance/power tests. But let me start at the beginning: recently, pulseaudio switched to a more power-efficient (and otherwise better) timing system, as far as I understand a callback API. It also provides the infrastructure to use ALSA devices without causing any interrupts ("period wakeup disabling"), so your CPU can stay in deep sleep states longer (e.g. higher "C6 residency"), saving power and avoiding playback glitches at the same time. See here and here for more background information. With kernel 2.6.38, the first driver, snd-hda-intel, supports this infrastructure out of the box. This combination is what I tested for power efficiency...

And the results are impressive and a bit surprising. All tests were done on my netbook (Poulsbo), optimized to reduce wakeups. As the player I used ogg123 -q, which was much better than e.g. mplayer -quiet in my experience. I'm running kernel 2.6.39, and I needed to get alsa-lib 1.0.24 and pulseaudio-git and compile them from source, first installing alsa-lib, then compiling and installing pulseaudio. And yes, I've got KDE running in the background, so these measurements would be more exact without it, but I think its effect is well within the margin of error, as you'll see.

Below I will show the powertop output in different configurations and then draw conclusions.

First, here's my idle output
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 0.4%)         1.60 Ghz     3.2%
polling           0.0ms ( 0.0%)         1333 Mhz     0.0%
C1 mwait          0.1ms ( 0.0%)         1067 Mhz     0.0%
C2 mwait          0.5ms ( 0.1%)          800 Mhz    96.8%
C4 mwait         18.7ms ( 0.5%)
C6 mwait         69.8ms (99.0%)
Wakeups-from-idle per second : 17.0     interval: 4.1s
Power usage (ACPI estimate): 5.7W (9.6 hours)

Top causes for wakeups:
  37.8% (  7.8)   [      ] [extra timer interrupt]
  12.2% (  2.5)   [     0] [kernel scheduler] Load balancing tick
  12.2% (  2.5)   [     0] kworker/0:0
  12.2% (  2.5)   [  2173] plasma-desktop
   6.1% (  1.2)   [  2171] kwin
   4.9% (  1.0)   [  2191] krunner
   3.7% (  0.8)   [  2201] konsole
   1.2% (  0.2)   [      ] PS/2 keyboard/mouse/touchpad interrupt
   1.2% (  0.2)   [  1227] Xorg
   1.2% (  0.2)   [   981] NetworkManager
   1.2% (  0.2)   [  2043] klauncher
   1.2% (  0.2)   [  2558] kwalletd
   1.2% (  0.2)   [   654] rsyslogd
   1.2% (  0.2)   [  2045] kded4
   1.2% (  0.2)   [  2557] knetworkmanager

Here's my powertop output with alsa
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 3.6%)         1.60 Ghz    47.4%
polling           0.0ms ( 0.0%)         1333 Mhz     0.0%
C1 mwait          0.2ms ( 0.0%)         1067 Mhz     0.4%
C2 mwait          0.6ms ( 0.4%)          800 Mhz    52.2%
C4 mwait          0.9ms ( 0.3%)
C6 mwait          6.8ms (95.7%)
Wakeups-from-idle per second : 150.2    interval: 2.3s
Power usage (ACPI estimate): 6.5W (8.4 hours)

Top causes for wakeups:
  55.0% ( 99.5)   [      ] [hda_intel] <interrupt>
  27.3% ( 49.5)   [      ] [extra timer interrupt]
   7.5% ( 13.5)   [     0] kworker/0:0
   5.5% ( 10.0)   [     0] [kernel scheduler] Load balancing tick
   1.7% (  3.0)   [  2173] plasma-desktop
   0.8% (  1.5)   [  2171] kwin
   0.6% (  1.0)   [  2201] konsole
   0.6% (  1.0)   [  2191] krunner
   0.3% (  0.5)   [      ] PS/2 keyboard/mouse/touchpad interrupt
   0.3% (  0.5)   [  1227] Xorg
   0.3% (  0.5)   [  2558] kwalletd
   0.3% (  0.5)   [  2045] kded4
This is the result with Pulse
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 6.0%)         1.60 Ghz    27.3%
polling           0.0ms ( 0.0%)         1333 Mhz     0.0%
C1 mwait          0.1ms ( 0.0%)         1067 Mhz     0.3%
C2 mwait          0.5ms ( 0.4%)          800 Mhz    72.4%
C4 mwait          0.9ms ( 0.3%)
C6 mwait         23.5ms (93.3%)
Wakeups-from-idle per second : 52.9     interval: 17.2s
Power usage (ACPI estimate): 6.4W (8.5 hours) (long term: 18.6W,/2.9h)
Top causes for wakeups:
  26.7% ( 25.0)   [      ] [acpi] <interrupt>
  17.4% ( 16.3)   [      ] [Rescheduling interrupts] <kernel IPI>
  15.7% ( 14.7)   [     0] kworker/0:0
  14.6% ( 13.6)   [     0] [kernel scheduler] Load balancing tick
   9.4% (  8.8)   [  7322] alsa-sink
   6.5% (  6.1)   [      ] [extra timer interrupt]
   2.8% (  2.6)   [  2173] plasma-desktop
   1.1% (  1.1)   [  2171] kwin
   1.1% (  1.0)   [  2191] krunner
   0.9% (  0.9)   [      ] PS/2 keyboard/mouse/touchpad interrupt
   0.9% (  0.9)   [  9829] threaded-ml
   0.6% (  0.5)   [  2201] konsole
   0.5% (  0.5)   [      ] [Function call interrupts] <kernel IPI>
   0.5% (  0.5)   [  1227] Xorg
   0.3% (  0.2)   [  2558] kwalletd
We can see that the wakeups per second dropped to about one third of the previous value (150.2 to 52.9): the roughly 100 interrupts per second from the driver disappeared completely, replaced by about 9 alsa-sink wakeups.
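To compare two listings without eyeballing them, the "Top causes for wakeups" rows can be parsed with a few lines of Python. This is only a rough sketch: the regular expression and the name `parse_wakeups` are my own and assume the row layout shown in the listings above, which other powertop versions may not share.

```python
import re

# One row of the "Top causes for wakeups" table, e.g.
#   "  55.0% ( 99.5)   [      ] [hda_intel] <interrupt>"
# The layout is assumed from the listings in this post.
ROW = re.compile(r"^\s*([\d.]+)%\s*\(\s*([\d.]+)\)\s+\[\s*(\d*)\s*\]\s+(.+?)\s*$")

def parse_wakeups(text):
    """Return {source name: wakeups per second} for every matching row."""
    rates = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _share, per_second, _pid, name = match.groups()
            rates[name] = float(per_second)
    return rates

# A few rows copied from the ALSA and Pulse listings.
alsa = parse_wakeups("""
  55.0% ( 99.5)   [      ] [hda_intel] <interrupt>
  27.3% ( 49.5)   [      ] [extra timer interrupt]
   7.5% ( 13.5)   [     0] kworker/0:0
""")
pulse = parse_wakeups("""
  15.7% ( 14.7)   [     0] kworker/0:0
   9.4% (  8.8)   [  7322] alsa-sink
   6.5% (  6.1)   [      ] [extra timer interrupt]
""")

# Sources that show up in only one of the two snapshots.
only_alsa = sorted(set(alsa) - set(pulse))
only_pulse = sorted(set(pulse) - set(alsa))
print("only with plain ALSA:", only_alsa)
print("only with Pulse:", only_pulse)
```

With these rows, the driver interrupt is the only source unique to the plain ALSA run, and alsa-sink is the only one unique to the Pulse run.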

And now ESD (via Pulse's compatibility layer)
Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 3.7%)         1.60 Ghz    29.8%
polling           0.0ms ( 0.0%)         1333 Mhz     0.0%
C1 mwait          0.0ms ( 0.0%)         1067 Mhz     0.0%
C2 mwait          0.4ms ( 0.1%)          800 Mhz    70.2%
C4 mwait          1.0ms ( 0.1%)
C6 mwait         34.3ms (96.1%)
Wakeups-from-idle per second : 32.8     interval: 7.9s
Power usage (ACPI estimate): 6.4W (8.5 hours) (long term: 15.5W,/3.5h)

Top causes for wakeups:
  58.4% ( 58.3)   [      ] [acpi] <interrupt>
  14.7% ( 14.7)   [     0] [kernel scheduler] Load balancing tick
   8.7% (  8.7)   [  7322] alsa-sink
   6.9% (  6.9)   [      ] [extra timer interrupt]
   5.3% (  5.3)   [     0] kworker/0:0
   2.6% (  2.6)   [  2173] plasma-desktop
   1.1% (  1.1)   [  2171] kwin
   0.4% (  0.4)   [      ] [Rescheduling interrupts] <kernel IPI>
   0.4% (  0.4)   [   787] hald
   0.3% (  0.3)   [  2045] kded4
   0.1% (  0.1)   [      ] PS/2 keyboard/mouse/touchpad interrupt
   0.1% (  0.1)   [      ] [Function call interrupts] <kernel IPI>
   0.1% (  0.1)   [   654] rsyslogd
   0.1% (  0.1)   [  2001] ssh-agent
   0.1% (  0.1)   [  2558] kwalletd


Now the wakeups per second dropped by another 20, and the average C6 residency increased to about 34 ms.




Pierre-Louis let me know that with some further tweaking you can get all the way down to < 1 wake/s. I'll have to try that sometime to see how much the power consumption changes! :)


Update 20/04/11
The exact command is "echo 512 | sudo tee /proc/asound/card0/pcm0*p/sub0/prealloc" in my case. That makes the final difference in combination with paplay or a patched version of pulse that uses the maximum latency by default for all clients. While this breaks some clients' buffer management and can crash them, it works quite well with others, as you can see below. In src/pulse/stream.c (currently line 139 in git), search for
s->buffer_attr.tlength = pa_usec_to_bytes(250*PA_USEC_PER_MSEC, ss); /* 250ms of buffering */
and change it to
s->buffer_attr.tlength = (uint32_t) -1;
then recompile and install. But make a backup of your previous pulse version, because many clients won't be ready for this yet! Of course, you may also get some power savings with better compatibility by just increasing the 250 to a higher value, e.g. 2000 (2 seconds).
Oh, and of course you need ALSA 1.0.24 (libs) and a current pulseaudio version, possibly all self-compiled (first alsa-lib, then pulse).
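To get a feeling for the buffer sizes involved, here is a small back-of-the-envelope sketch in Python. The function `usec_to_bytes` is only a stand-in for what I understand pa_usec_to_bytes to compute (whole frames times frame size). The sample format (44.1 kHz, stereo, 16 bit) is an assumption, and so is reading the prealloc value as kilobytes, which is how I understand the ALSA procfs interface.

```python
PA_USEC_PER_SEC = 1_000_000
PA_USEC_PER_MSEC = 1_000

def usec_to_bytes(usec, rate, channels, bytes_per_sample):
    """Python stand-in for pa_usec_to_bytes: whole frames times frame size."""
    frames = usec * rate // PA_USEC_PER_SEC
    return frames * channels * bytes_per_sample

# Assumed sample spec (not from the post): CD-quality s16 stereo.
RATE, CHANNELS, SAMPLE_BYTES = 44_100, 2, 2

# Default target length (250 ms) versus the milder 2000 ms tweak.
default_tlength = usec_to_bytes(250 * PA_USEC_PER_MSEC, RATE, CHANNELS, SAMPLE_BYTES)
long_tlength = usec_to_bytes(2000 * PA_USEC_PER_MSEC, RATE, CHANNELS, SAMPLE_BYTES)

# (uint32_t) -1 wraps around to the largest 32-bit value, which the post
# describes as asking for the maximum latency.
UINT32_MAX = 2**32 - 1

# 512 written to prealloc, assumed to mean 512 KB of driver buffer.
prealloc_bytes = 512 * 1024
bytes_per_second = RATE * CHANNELS * SAMPLE_BYTES
prealloc_seconds = prealloc_bytes / bytes_per_second

print("250 ms  ->", default_tlength, "bytes")
print("2000 ms ->", long_tlength, "bytes")
print("(uint32_t) -1 ->", UINT32_MAX)
print("512 KB holds about", round(prealloc_seconds, 2), "seconds of audio")
```

Under these assumptions the preallocated buffer holds just under three seconds of audio, so a 2 second target length would still fit into it.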


And now optimized pulse with high latency

Cn                Avg residency       P-states (frequencies)
C0 (cpu running)        ( 2.1%)         1.60 Ghz    50.7%
polling           0.0ms ( 0.0%)         1333 Mhz     0.0%
C1 mwait          0.2ms ( 0.0%)         1067 Mhz     0.0%
C2 mwait          0.6ms ( 0.2%)          800 Mhz    49.3%
C4 mwait          1.1ms ( 0.1%)
C6 mwait         73.3ms (97.6%)
Wakeups-from-idle per second : 18.7     interval: 6.0s
Power usage (ACPI estimate): 6.1W (8.7 hours) (long term: 15.0W,/3.5h)

Top causes for wakeups:
  24.0% (  7.4)   [     0] kworker/0:0
  22.1% (  6.8)   [      ] [Rescheduling interrupts] <kernel IPI>
  15.6% (  4.8)   [     0] [kernel scheduler] Load balancing tick
  11.0% (  3.4)   [      ] [extra timer interrupt]
   9.7% (  3.0)   [  2498] plasma-desktop
   4.5% (  1.4)   [  2496] kwin
   2.6% (  0.8)   [  3771] alsa-sink
   2.6% (  0.8)   [  5326] threaded-ml
   1.3% (  0.4)   [  2526] konsole
   1.3% (  0.4)   [  2371] kded4
   0.6% (  0.2)   [      ] PS/2 keyboard/mouse/touchpad interrupt
   0.6% (  0.2)   [      ] [Function call interrupts] <kernel IPI>
   0.6% (  0.2)   [  2327] ssh-agent
   0.6% (  0.2)   [  2239] Xorg
   0.6% (  0.2)   [  2516] krunner



Conclusion (Updated 20/04/11)
The power reduction is significant at 0.4 W (6.5 W with plain ALSA versus 6.1 W with the optimized pulse), though it will be less with video playing or other activities on the side (TODO: test in combination with VA-API acceleration). And the improvement is very measurable. The audio device interrupts (100 per second) completely disappeared thanks to pulse. The total wakeups went from 150 down to 19 per second (-88%, close to the idle value of 17), only 0.8 seem to be directly related to pulse, and the average C6 residency increased from 7 ms to about 73 ms. Quite a success! Congrats to Pierre-Louis for writing the necessary patches and Lennart for writing pulseaudio, which made this possible in the first place.
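For anyone who wants to check the numbers, the headline figures can be recomputed from the powertop summaries above. A minimal sketch in Python; the dictionary layout and variable names are just my own way of writing the measurements down.

```python
# (wakeups per second, ACPI power estimate in W, average C6 residency in ms),
# copied from the powertop summaries in this post.
measurements = {
    "idle":                (17.0, 5.7, 69.8),
    "alsa":                (150.2, 6.5, 6.8),
    "pulse":               (52.9, 6.4, 23.5),
    "esd via pulse":       (32.8, 6.4, 34.3),
    "pulse, high latency": (18.7, 6.1, 73.3),
}

idle_wakes, _, _ = measurements["idle"]
alsa_wakes, alsa_watts, alsa_c6 = measurements["alsa"]
best_wakes, best_watts, best_c6 = measurements["pulse, high latency"]

# Relative drop in wakeups, plain ALSA versus optimized pulse.
wake_reduction_pct = (alsa_wakes - best_wakes) / alsa_wakes * 100
# Difference of the two ACPI power estimates.
saved_watts = alsa_watts - best_watts
# How far the optimized playback is above the idle wakeup rate.
wakes_vs_idle = best_wakes / idle_wakes
# How much longer the CPU stays in C6 on average.
c6_gain = best_c6 / alsa_c6

print(f"wakeups: -{wake_reduction_pct:.0f}%")
print(f"power:   -{saved_watts:.1f} W")
print(f"wakeups relative to idle: {wakes_vs_idle:.2f}x")
print(f"C6 residency: {c6_gain:.1f}x longer")
```

Keep in mind that the ACPI estimate is only reported to one decimal place, so the 0.4 W difference carries a fair amount of rounding uncertainty.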

Update
I've uploaded a pre-packaged .deb version of a long-latency patched pulseaudio (x86, Ubuntu 9.10+); you can try it on newer Ubuntu versions as well. (If you want to compile from source, e.g. because the package didn't work, see above for the line you have to change; it's not worth creating a patch for.) You also need kernel 2.6.38+. But it's strictly for testing! It's a randomly timed development source build that may eat your machine alive! ;)


6 comments:

  1. Your link to Lennart's blog post is broken - shouldn't it be http://0pointer.de/blog/projects/pulse-glitch-free.html ?

  2. Thanks! Blogger likes to do that to me if there's no www in front of the url...

  3. And how have the extra latencies affected the sound quality? Or are we talking here about MP3 or Flash videos on cheap Chinese $5 speakers, whose power supply negates any marginal power savings?

  4. @Ihar: well, I haven't noticed increased latencies, but I haven't really measured them or tweaked for maximum power efficiency yet, as in today's updated hints.

  5. Actually the second link describes quite clearly how the latencies are affected. It depends on the audio client to pulse, but even low latency response and low power sound card delivery are not totally exclusive - the client can discard the buffer while it's in queue to reduce response times.

  6. btw. as these power savings come from the CPU and my test system runs with an Atom CPU, most systems should actually save more power with these settings.

