Re: [AD] mixers gone?



On Mon, Apr 28, 2008 at 4:42 AM, Peter Wang <novalazy@xxxxxxxxxx> wrote:
On 2008-04-28, Ryan Dickie <goalieca@xxxxxxxxxx> wrote:
>
> Hmm, sorry. I guess I got a bit carried away trying to refactor in order to
> make things more manageable.
>
> I was on #allegro-dev and I found out the 'mixer' was meant to be more of a
> 'filter' system where people could do fancy audio effects processing. I
> could add something like what I describe below.

Actually, there is a much simpler purpose: we need to be able to support sound
cards/drivers which don't support multiple voices and we need to be able to
work with samples/streams with differing sampling rates or formats.
AFAICS all that code is gone and I guess you rely on OpenAL to handle that.
But OpenAL won't be the only backend.

Yes, I rely on OpenAL for this. From what I can gather, ALSA, PulseAudio, CoreAudio, DirectSound, and the Vista/Xbox API all handle this as well.

If we do need it, then the correct place to put the mixer code would be in the driver and not part of the user framework. The mixing should happen transparently and not require the user to manage it.
 

> Each voice has an attached sample. Each voice could also have a set of
> attached 'filters'. We could do something like this:
>
> //generate kernels based on sample properties like depth, frequency, etc.
> //a small set of stock kernels; users will obviously have to make their own
> //to do custom effects
> AL_SAMPLE *spl = ...;
> void *kernel1 = al_filter_make_kernel(spl, ALLEGRO_GAUSSIAN_KERNEL);
> ...
> //adding a filter simply convolves the kernels, always yielding a single kernel
> //this single kernel will be convolved with the sound sample when it is time
> // f * (g * h) = (f * g) * h
> // a scalar for amplification/attenuation can be factored in as well
> al_voice_add_filter(kernel1, kernel1_len);
> al_voice_add_filter(kernel2, kernel2_len);
> al_voice_clear_filters();
> ...

You'll have to explain this idea more.

Peter

Sorry if some of this is review; I don't know if everyone is familiar with DSP, so I'll just start with a rough overview.

Any audio stream can be described as a function of time, y = x(t), where y is the output of the sound card. Basically you can think of y as a vector (each component is a channel) and t as the sample position.
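For a concrete picture, a raw interleaved buffer might look something like this (purely a hypothetical helper for illustration, not the Allegro API):

#include <stdint.h>

/* Hypothetical interleaved 16-bit buffer, for illustration only.
 * y(t) is the vector of channel values at sample position t. */
typedef struct {
    int16_t *data;     /* interleaved: L0 R0 L1 R1 ... */
    int channels;      /* number of components of the vector y */
    int length;        /* number of sample positions t */
} RAW_BUFFER;

/* Read one component of y(t). */
static int16_t raw_buffer_get(const RAW_BUFFER *buf, int t, int channel)
{
    return buf->data[t * buf->channels + channel];
}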

There is a way to transform this sound wave from a function of time into a function of frequency, so that the entire sound is described in terms of its frequencies. In this form it is really easy to manipulate: you can shift frequencies up or down, cut off low frequencies, or amplify a certain range of frequencies, simply by multiplying the function with a filter (like a mask). The problem is that the transforms to and from the frequency domain (FFTs and IFFTs) eat up too much processor time unless you have a dedicated core.

Fortunately for us, there is a duality between the frequency and time domains. Instead of transforming the audio sample into the frequency domain, we can transform the frequency-domain filter into the time domain. Anything that was multiplication in the frequency domain becomes convolution in the time domain.
Wikipedia has a decent article on it here: http://en.wikipedia.org/wiki/Convolution
For example, a reverb is done by convolving the sample with the impulse response of the room you want. A low-pass filter basically attenuates higher frequencies by averaging neighbouring samples using something like a Gaussian kernel. A kernel is essentially the filter that you convolve the sample with.
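To make the time-domain side concrete, here is a minimal sketch of direct convolution of a mono float buffer with a small kernel (hypothetical helper, not the Allegro API; samples before the start of the input are treated as silence):

#include <stddef.h>

/* Direct (time-domain) convolution of a mono float signal with a kernel.
 * out must have room for n samples. */
static void convolve(const float *in, size_t n,
                     const float *kernel, size_t klen,
                     float *out)
{
    for (size_t t = 0; t < n; t++) {
        float acc = 0.0f;
        for (size_t k = 0; k < klen; k++) {
            if (t >= k)
                acc += kernel[k] * in[t - k];
        }
        out[t] = acc;
    }
}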

One nice thing about convolution is that it is associative and commutative: f * g = g * f, and (f * g) * h = f * (g * h). That way we can merge all of the kernels before we convolve the entire sample. My plan was for a voice to have a pointer to a single kernel; whenever you add a kernel, you convolve it into that one. The default kernel will be the Dirac delta, i.e. f * delta = f: basically a kernel of size 1 that multiplies each point by 1. Doing a _clear_filters() will simply restore this kernel.
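As a rough sketch of how that merging could work (hypothetical helper names, not the proposed API itself): adding a filter would convolve the new kernel into the voice's single merged kernel, and the identity kernel is just { 1.0 }.

#include <stdlib.h>

/* Merge two kernels into one by full convolution; the result has length
 * alen + blen - 1.  The caller frees the returned buffer. */
static float *merge_kernels(const float *a, size_t alen,
                            const float *b, size_t blen,
                            size_t *outlen)
{
    *outlen = alen + blen - 1;
    float *out = calloc(*outlen, sizeof(float));
    if (!out)
        return NULL;
    for (size_t i = 0; i < alen; i++)
        for (size_t j = 0; j < blen; j++)
            out[i + j] += a[i] * b[j];
    return out;
}

/* The identity ("Dirac delta") kernel: length 1, value 1, so f * delta = f.
 * _clear_filters() would simply reset the voice's kernel to this. */
static const float DELTA_KERNEL[1] = { 1.0f };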

Kernels themselves shouldn't be too long, because direct convolution is O(n*m) for a sample of n points and a kernel of m points (O(n^2) when they are the same length); it's practically linear when the kernel is much smaller than the sample. We could look into doing an FFT anyway, because modern computers are fast. I've done this in real time before from a microphone input with a few milliseconds of delay.

Last but not least, I was hoping to include a kernel generator. Kernels will have to match the audio sample's format (depth, frequency, channels), otherwise the output will be garbage. I was thinking about including some basic filters for amplification, attenuation, low-pass, high-pass, band-pass, reverb, ...
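For instance, two of the simplest stock generators could look something like this (hypothetical helpers; real ones would be derived from the sample's depth, frequency and channel layout):

#include <stdlib.h>

/* Gain kernel: length 1, scales every sample by `gain`
 * (amplification when gain > 1, attenuation when gain < 1). */
static float *make_gain_kernel(float gain, size_t *len)
{
    float *k = malloc(sizeof(float));
    if (k) {
        k[0] = gain;
        *len = 1;
    }
    return k;
}

/* Crude low-pass: a moving-average (boxcar) kernel of `width` taps.
 * A Gaussian kernel would give a smoother roll-off. */
static float *make_boxcar_lowpass_kernel(size_t width, size_t *len)
{
    float *k = malloc(width * sizeof(float));
    if (k) {
        for (size_t i = 0; i < width; i++)
            k[i] = 1.0f / (float)width;
        *len = width;
    }
    return k;
}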

All in all, I'm not sure how much complexity to expose to the user. At the moment, any user has raw access to sample->buffer. I would prefer that the user/filters didn't touch sample->buffer; rather, it would belong to a voice, and the voice would process it the moment before it is played. The idea I've been moving towards during the refactoring is that a sample only contains the sample data, and the voice holds all the playback/state information. In other words, the voice now has a sample rather than a sample having a voice. This way, a single sample can be sent out over multiple voices, each with its own filters.
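To illustrate the ownership I have in mind (a hypothetical struct layout, not the actual Allegro types): the sample holds only the raw data, while each voice holds its own playback state and one merged filter kernel.

#include <stddef.h>

typedef struct SAMPLE_DATA {
    void  *buffer;        /* raw PCM data, untouched by filters */
    size_t length;        /* in sample positions                */
    int    depth, freq, channels;
} SAMPLE_DATA;

typedef struct VOICE {
    SAMPLE_DATA *sample;  /* shared; several voices may point at one sample */
    size_t       pos;     /* playback position                              */
    float        gain;
    float       *kernel;  /* merged filter kernel, applied just before play */
    size_t       kernel_len;
} VOICE;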

Hopefully I've included enough to make it clearer, but I'm sure I left some things out.

--ryan

