Decoding audio files with ffmpeg

in #programming, 6 years ago (edited)

In this tutorial we will put together some C code that will be able to decode audio files using ffmpeg version 3.1.

Disclaimer: This post was not originally written for Steemit. I initially wrote it for my blog at www.targodan.de, but I thought, why not get some more exposure by posting it here?

From now on I will release blog posts both on my blog site and here on Steemit. Feel free to give me feedback and discuss things either here or on my website.

The original blog post can be read here.

The footnotes can be found at the bottom of the page. The links to them might not work, because Steemit does not support them.


TL;DR

The code can be found in this GitHub gist, or you can download the archive here. The code is well commented, and any seasoned programmer should be able to work with it. However, I still recommend that you at least read the following sections, as they explain the reasons for certain decisions.

  • handleFrame
  • getSample

The code can be used as a whole or in part, as the terms of the MIT License allow.

What is audio anyway?

Audio, i.e. anything you or I can hear, is basically just differences in air pressure. These differences form waves that are translated into electrical signals by our ear or a microphone.

So, how does a microphone turn differences in air pressure into something a computer can work with? It is actually pretty easy, at least if you know the principle of electromagnetic induction. A microphone has a membrane that vibrates at the same frequency as the audio signal it is picking up. These vibrations are transferred to a permanent magnet, which moves inside a coil and thus generates an analog signal representing the audio waves.

The peak voltage of that signal will depend on the properties of the microphone in question and the amplitude of the audio signal. A signal picked up by a microphone might look like this.[^1]

[Figure: a sine wave as picked up by a microphone (sinewave.png)]

As you all know, we live in an analog world, which means that audio signals are of course continuous, i.e. between any two points in time there is still an infinite number of points. However, a computer can only work with finite amounts of digital data.

To break this infinite amount of analog data down into something the computer can use, we, or rather our sound card, do something called sampling. Sampling is done by reading from an ADC[^2] at a fixed frequency. Each data point the ADC creates is called a sample. The standard sampling frequency on your average non-musician PC is 44100 Hz. I will go more in depth on why 44100 Hz is a decent number in a future post on general audio compression.
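To make the idea of sampling a bit more concrete, here is a minimal sketch in plain standard C. It reads a test signal at fixed intervals of 1/44100 s and stores each reading as a signed 16-bit sample. The names triangleWave and sampleSignal are hypothetical. A triangle wave stands in for a real microphone signal only so that the sketch does not need the math library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test signal: a triangle wave with values in [-1, 1].
 * Used instead of a sine only so this sketch needs no math library.
 * Assumes t >= 0. */
double triangleWave(double freq, double t) {
    double cycles = freq * t;
    double phase = cycles - (double)(long long)cycles; /* fractional part, in [0, 1) */
    /* Rise from -1 to 1 in the first half of each period, fall back in the second. */
    return phase < 0.5 ? 4.0 * phase - 1.0 : 3.0 - 4.0 * phase;
}

/* Hypothetical sampler: sample n is the signal's value at t = n / rate,
 * scaled from [-1, 1] to the signed 16-bit range and rounded to nearest. */
void sampleSignal(int16_t* out, size_t count, double freq, double rate) {
    for (size_t n = 0; n < count; ++n) {
        double value = triangleWave(freq, (double)n / rate) * 32767.0;
        out[n] = (int16_t)(value >= 0.0 ? value + 0.5 : value - 0.5);
    }
}
```

For example, sampleSignal(buf, 100, 441.0, 44100.0) captures exactly one period of a 441 Hz tone, because 44100 / 441 = 100 samples per period.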

Usually ADCs output unsigned data of various bit widths. The number of bits is called the sample depth. In microcontrollers like Atmel's ATmega and ATtiny ranges, the most common depths are 10, 11 or 12 bits, but I am sure there are more expensive ones for audio purposes. This data is often converted into signed integer values of 8, 16, 24 or 32 bits, or even into 32- or 64-bit floating-point formats.
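As an illustration of that conversion, here is a sketch that turns an unsigned ADC reading into a signed 16-bit sample. It assumes the midpoint of the ADC's range corresponds to the zero line of the audio signal. adcToS16 is a hypothetical helper, not part of any microcontroller library.

```c
#include <stdint.h>

/* Hypothetical helper: convert an unsigned reading from an ADC with `bits`
 * bits of resolution into a signed 16-bit sample.
 * Assumes 1 <= bits <= 16 and that `raw` fits into `bits` bits. */
int16_t adcToS16(uint16_t raw, unsigned bits) {
    /* Assumption: the midpoint of the unsigned range is the signal's zero line. */
    int32_t midpoint = INT32_C(1) << (bits - 1);
    int32_t centered = (int32_t)raw - midpoint; /* now in [-midpoint, midpoint - 1] */
    /* Multiplying by 2^(16 - bits) stretches the value over the full 16-bit range. */
    return (int16_t)(centered * (INT32_C(1) << (16 - bits)));
}
```

With a 10-bit ADC, a reading of 512 becomes 0, 0 becomes -32768, and 1023 becomes 32704.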

Ok, but what does "decode audio files" mean?

Unless you are able to create disk space out of thin air, you want to save your music in a compressed format. Let's quickly see how much data you would need for your average song. Here are some estimates of common values:

  • Sample frequency: 44100 Hz
  • Sample depth: 16 bit
  • Song length: 4 minutes

So let's calculate the space needed to save all these samples.

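$$44100\,\tfrac{\text{samples}}{\text{s}} \cdot 16\,\tfrac{\text{bit}}{\text{sample}} \cdot 240\,\text{s} = 169\,344\,000\,\text{bit} = 21\,168\,000\,\text{bytes} \approx 20.2\,\text{MiB}$$

Oh boy, roughly 20 MiB for your average song, and that is for a single channel. Stereo doubles it. I don't think anyone wants that, which is why audio files are usually compressed. To compress audio data, a so-called codec is used, which does some fancy things to reduce the file size without sacrificing too much sound quality.[^3]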
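The size estimate is just a product of the values listed above, so it is easy to double-check in C. This is a small sketch with hypothetical helper names. It adds a channel count, because a stereo song needs twice the space of a mono one.

```c
#include <stdint.h>

/* Hypothetical helper: bytes needed for uncompressed PCM audio,
 * i.e. samples per second * bytes per sample * channels * seconds. */
uint64_t uncompressedSize(uint32_t sampleRate, uint32_t bitsPerSample,
                          uint32_t channels, uint32_t seconds) {
    return (uint64_t)sampleRate * (bitsPerSample / 8) * channels * seconds;
}

/* Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes). */
double toMiB(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}
```

uncompressedSize(44100, 16, 1, 240) computes the size for a single channel. Pass 2 channels to get the stereo size.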

In conclusion, "decoding audio files" means extracting the individual samples from a compressed audio file.

Let's get to it already

Ok, so for the actual decoding I assume that you have reasonable experience in C and/or C++. The code we will write will compile nicely with both C and C++, whatever you prefer, although the code style is definitely C.

In this gist you can find the final code and a little build script. Feel free to download and use it within the terms of the MIT license.

How to compile the code against ffmpeg

First of all, I will quickly show you how to compile the code we will be writing, so that you can experiment with it from the get-go.

FFmpeg consists of several libraries that work together closely but can also be used individually. For this little project we need three of the ffmpeg libraries: libavformat, libavcodec and libavutil. If you want to extend our little project, keep in mind that you may need to add more libraries to the compilation command. In addition to the ffmpeg libraries, we need to link against the math library, so add -lm to your linker command.

The preferred way of linking against the ffmpeg libraries is pkg-config, a little tool that helps find the necessary parameters for the libraries we are using.

If you are working on a larger project with a Makefile, you probably have your make process split up into compilation and linking. In that case you need two calls to pkg-config.

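# For compilation. Inside a Makefile, $(shell ...) runs the command; a bare $(pkg-config ...) would be read as a make variable.
C_FLAGS += $(shell pkg-config --cflags libavformat libavcodec libavutil)
# For linking
LD_FLAGS += -lm $(shell pkg-config --libs libavformat libavcodec libavutil)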

If pkg-config is not available on your system, these flags may work.

# C_FLAGS does not need any additional entries.
LD_FLAGS += -lm -lavformat -lavcodec -lavutil

In our little project we only have one file called decode.c so we will use this simple line for compilation.

$ gcc -g decode.c -o decode -lm $(pkg-config --cflags --libs libavformat libavcodec libavutil)

What to include

In order for this tutorial to be complete you need to know what to include.[^5]
And here it is.

#ifdef __cplusplus
extern "C" {
#endif

    #include <libavformat/avformat.h>
    #include <libavcodec/avcodec.h>

#ifdef __cplusplus
}
#endif

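#include <stdint.h> // uint8_t
#include <stdio.h>  // fprintf, fopen, fwrite, fclose
#include <stdlib.h> // malloc, free
#include <string.h> // strlen, strcpy, strcat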

Note that because ffmpeg is a C library, you need to wrap its headers in an extern "C" {/*...*/} block when compiling with a C++ compiler. To make the code compile with both C and C++, we in turn wrap that block in #ifdef preprocessor directives. They remove the extern "C" directive during preprocessing if the code is not being compiled by a C++ compiler.

Looking at some helpers

Before we dive deep into the ffmpeg-specific code, let's look at some helpers that will make our project compile nicely with both C and C++ compilers.

Since pure C traditionally has no bool keyword (C99 only adds _Bool and a bool macro in <stdbool.h>), let's define it ourselves. We only do this when not compiling as C++, because there it would result in a redefinition error.

#ifndef __cplusplus
    typedef uint8_t bool;
    #define true 1
    #define false 0
#endif

Like I said, I want this code to compile nicely, and I am a great fan of the C++ way of casting, because it is more verbose and prevents typos from ruining your day.[^4] To emulate this in C and have it compile in both C and C++, consider these fairly self-explanatory macros.

#ifdef __cplusplus
    #define REINTERPRET_CAST(type, variable) reinterpret_cast<type>(variable)
    #define STATIC_CAST(type, variable) static_cast<type>(variable)
#else
    #define C_CAST(type, variable) ((type)(variable))
    #define REINTERPRET_CAST(type, variable) C_CAST(type, variable)
    #define STATIC_CAST(type, variable) C_CAST(type, variable)
#endif
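Wrapping the macro argument in parentheses matters as soon as you pass an expression instead of a plain variable. The following sketch uses two hypothetical macros to show the difference. Without the inner parentheses, only the first operand gets cast.

```c
/* Two hypothetical variants of a C-style cast macro that differ only in
 * the parentheses around the argument. */
#define C_CAST_UNSAFE(type, variable) ((type)variable)
#define C_CAST_SAFE(type, variable) ((type)(variable))

/* Expands to ((int)2.5 + 2.5): only 2.5 is cast, so the result is
 * 2 + 2.5 = 4.5, which becomes 4 when returned as an int. */
int castSumUnsafe(void) { return C_CAST_UNSAFE(int, 2.5 + 2.5); }

/* Expands to ((int)(2.5 + 2.5)): the whole sum is cast, giving 5. */
int castSumSafe(void) { return C_CAST_SAFE(int, 2.5 + 2.5); }
```

castSumUnsafe() returns 4, while castSumSafe() returns 5.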

Another useful function will make finding errors and debugging easier.

/**
 * Print an error string describing the errorCode to stderr.
 */
int printError(const char* prefix, int errorCode) {
    if(errorCode == 0) {
        return 0;
    } else {
        const size_t bufsize = 64;
        char buf[bufsize];

        if(av_strerror(errorCode, buf, bufsize) != 0) {
            strcpy(buf, "UNKNOWN_ERROR");
        }
        fprintf(stderr, "%s (%d: %s)\n", prefix, errorCode, buf);

        return errorCode;
    }
}

I don't think this function needs further explanation. The av_strerror function just gives us a string description of the error code and returns a negative value if it cannot find a description.

Let's also add a little flag that enables an optimization for when we don't care too much about the output format.

#define RAW_OUT_ON_PLANAR true

This flag is used in the handleFrame function. I will explain it when we get to that function.

The main program

Our little program will read compressed audio files and write the raw audio data out to another file. To test the correctness of our output, we can import the data with software like Audacity, but more on that later.

In this section we will only look at the program from a fairly high level. I will not yet explain or show any self-written lower-level functions. That comes later.

Here is a little preface figuring out the parameters of our program.

FILE* outFile;
int main(int argc, char *argv[]) {
    if(argc != 2) {
        printf("Usage: decode <audiofile>\n");
        return 1;
    }

    // Get the filename.
    char* filename = argv[1];
    // Open the outfile called "<infile>.raw".
    // +5: four characters for ".raw" plus the terminating '\0'.
    char* outFilename = REINTERPRET_CAST(char*, malloc(strlen(filename)+5));
    if(outFilename == NULL) {
        fprintf(stderr, "Out of memory.\n");
        return 1;
    }
    strcpy(outFilename, filename);
    strcat(outFilename, ".raw");
    // Open in binary mode so the samples are written byte for byte on every platform.
    outFile = fopen(outFilename, "wb");
    if(outFile == NULL) {
        fprintf(stderr, "Unable to open output file \"%s\".\n", outFilename);
        free(outFilename);
        return 1;
    }
    free(outFilename);

    // ...
}
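The +5 in the malloc call accounts for the four characters of ".raw" plus the terminating '\0'. If you would rather not count by hand, a sketch like the following derives the size from the suffix and lets snprintf do a bounded copy. The helper makeOutFilename is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "<inFilename>.raw" in a newly allocated string.
 * Returns NULL if allocation fails; the caller must free() the result. */
char* makeOutFilename(const char* inFilename) {
    const char* suffix = ".raw";
    /* +1 for the terminating '\0'. */
    size_t size = strlen(inFilename) + strlen(suffix) + 1;
    char* out = (char*)malloc(size);
    if (out == NULL) {
        return NULL;
    }
    /* snprintf never writes more than `size` bytes, terminator included. */
    snprintf(out, size, "%s%s", inFilename, suffix);
    return out;
}
```

makeOutFilename("song.mp3") returns "song.mp3.raw", which the caller has to free().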

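That's the output file sorted. Now, before we do anything with ffmpeg, we need to initialize it and tell it to register all its components, including the codecs. (av_register_all is required in ffmpeg 3.1, which this tutorial targets. Since FFmpeg 4.0 it is deprecated and no longer needed.)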

// Initialize the libavformat. This registers all muxers, demuxers and protocols.
av_register_all();

After this we can open the input file. Most audio files are so called container files. These files contain extra information like which codec was used or even multiple audio streams. In order to crack these containers open and scoop out the delicious audio goop, ffmpeg brings along the avformat library. Let's open the input file.

int err = 0;
AVFormatContext *formatCtx = NULL;
// Open the file and read the header.
if ((err = avformat_open_input(&formatCtx, filename, NULL, NULL)) != 0) {
    return printError("Error opening file.", err);
}

Not all container files include information about the codec that was used. In that case avformat can read a bit of the file and try to determine the codec by looking at that data. Don't worry, the data it reads is buffered and not lost, so we don't need to seek back to the start of the file afterwards.

// In case the file had no header, read some frames and find out which format and codecs are used.
// This does not consume any data. Any read packets are buffered for later use.
avformat_find_stream_info(formatCtx, NULL);

Like I said earlier, container files can contain multiple streams. If we were to read a video file (.mov, .avi, .mp4, etc.), we would probably find one video stream and one or more audio streams (e.g. English and German audio). Yes, ffmpeg can do more than just audio. It can also do video en-/de-/transcoding, but we are not interested in that here.

We are only interested in audio, but I want to support as many files as possible, so we need to find out which of the contained streams is an audio stream. For simplicity's sake we will just use the first audio stream we can find and then stop looking. You can of course ask the user which stream to decode if you wish.

// Try to find an audio stream.
int audioStreamIndex = findAudioStream(formatCtx);
if(audioStreamIndex == -1) {
    // No audio stream was found.
    fprintf(stderr, "None of the available %u streams are audio streams.\n", formatCtx->nb_streams);
    avformat_close_input(&formatCtx);
    return -1;
}

The function findAudioStream is explained later.

After selecting a stream we need to find the appropriate codec. Here the formatCtx can help us. It contains all streams, which contain the codec parameters, which in turn contain the id of the codec.

// Find the correct decoder for the codec.
AVCodec* codec = avcodec_find_decoder(formatCtx->streams[audioStreamIndex]->codecpar->codec_id);
if (codec == NULL) {
    // Decoder not found.
    fprintf(stderr, "Decoder not found. The codec is not supported.\n");
    avformat_close_input(&formatCtx);
    return -1;
}

Before we can start decoding we need to open the decoder. But before we can do that we need to allocate the needed memory and initialize the codec with the appropriate parameters.

// Initialize codec context for the decoder.
AVCodecContext* codecCtx = avcodec_alloc_context3(codec);
if (codecCtx == NULL) {
    // Something went wrong. Cleaning up...
    avformat_close_input(&formatCtx);
    fprintf(stderr, "Could not allocate a decoding context.\n");
    return -1;
}

// Fill the codecCtx with the parameters of the codec used in the read file.
if ((err = avcodec_parameters_to_context(codecCtx, formatCtx->streams[audioStreamIndex]->codecpar)) != 0) {
    // Something went wrong. Cleaning up...
    avcodec_close(codecCtx);
    avcodec_free_context(&codecCtx);
    avformat_close_input(&formatCtx);
    return printError("Error setting codec context parameters.", err);
}

Now we have an AVCodecContext that is ready to be opened. Before we open it, we still have the chance to influence the decoder a little.

// Explicitly request non planar data.
codecCtx->request_sample_fmt = av_get_alt_sample_fmt(codecCtx->sample_fmt, 0);

For optimization purposes we will explicitly request non-planar data. However, not all codecs support all sample formats, so our code still supports each and every format. More on sample formats later.

Now we can open the decoder. When it is opened, the decoder may completely ignore our request if it doesn't support the requested sample format. That makes the moment right after opening the right time to print some information about the stream, codec and so on.

// Initialize the decoder.
if ((err = avcodec_open2(codecCtx, codec, NULL)) != 0) {
    avcodec_close(codecCtx);
    avcodec_free_context(&codecCtx);
    avformat_close_input(&formatCtx);
    return printError("Error opening decoder.", err);
}

// Print some interesting file information.
printStreamInformation(codec, codecCtx, audioStreamIndex);

Again printStreamInformation is explained later.

OK, that is most of the prep work done. When decoding, we give the decoder AVPackets and it gives us AVFrames containing the decoded data. The AVPackets come from our AVFormatContext.
So let's prepare those containers.

AVFrame* frame = NULL;
if ((frame = av_frame_alloc()) == NULL) {
    avcodec_close(codecCtx);
    avcodec_free_context(&codecCtx);
    avformat_close_input(&formatCtx);
    return -1;
}

// Prepare the packet.
AVPacket packet;
// Set default values.
av_init_packet(&packet);

Now that everything is prepared, we can read the file until there is nothing left to read. Don't forget to handle errors, though.

while ((err = av_read_frame(formatCtx, &packet)) != AVERROR_EOF) {
    if(err != 0) {
        // Something went wrong.
        printError("Read error.", err);
        break; // Don't return, so we can clean up nicely.
    }
    // ...
}

The formatCtx reads packets from any stream it can find though, so we need to filter each packet.

// Does the packet belong to the correct stream?
if(packet.stream_index != audioStreamIndex) {
    // Free the buffers used by the packet and reset all fields.
    av_packet_unref(&packet);
    continue;
}

The remaining packets belong to the correct stream, so let's send them to the decoder. The process of sending packets and receiving frames can be nicely threaded. We are not going to do that here but in order for that to work correctly you need to check for the error code AVERROR(EAGAIN) which simply means "all buffers are full, receive some frames and try again".

// We have a valid packet => send it to the decoder.
if((err = avcodec_send_packet(codecCtx, &packet)) == 0) {
    // The packet was sent successfully. We don't need it anymore.
    // => Free the buffers used by the packet and reset all fields.
    av_packet_unref(&packet);
} else {
    // Something went wrong.
    // EAGAIN is technically no error here but if it occurs we would need to buffer
    // the packet and send it again after receiving more frames. Thus we handle it as an error here.
    printError("Send error.", err);
    // Release the packet we could not send before leaving the loop.
    av_packet_unref(&packet);
    break; // Don't return, so we can clean up nicely.
}

After sending, we need to receive as many frames as possible, because depending on the codec, each packet may generate multiple frames. This is handled in the function receiveAndHandle. More on that later.

// Receive and handle frames.
// EAGAIN means we need to send before receiving again. So thats not an error.
if((err = receiveAndHandle(codecCtx, frame)) != AVERROR(EAGAIN)) {
    // Not EAGAIN => Something went wrong.
    printError("Receive error.", err);
    break; // Don't return, so we can clean up nicely.
}

A codec implementation may buffer to its heart's content, so we need to drain the decoder after we've read the file. Again, more on that later.

// Drain the decoder.
drainDecoder(codecCtx, frame);

And last but not at all least we need to clean up everything we opened.

// Free all data used by the frame.
av_frame_free(&frame);

// Close the context and free all data associated to it, but not the context itself.
avcodec_close(codecCtx);

// Free the context itself.
avcodec_free_context(&codecCtx);

// We are done here. Close the input.
avformat_close_input(&formatCtx);

// Close the outfile.
fclose(outFile);

A deeper look

Now that we established the skeleton of our little program let's add some guts and get the heart pumping.

findAudioStream

Finding the audio stream is fairly easy. Just iterate over the streams and check for the stream type.

/**
 * Find the first audio stream and return its index. If there is no audio stream, return -1.
 */
int findAudioStream(const AVFormatContext* formatCtx) {
    int audioStreamIndex = -1;
    for(size_t i = 0; i < formatCtx->nb_streams; ++i) {
        // Use the first audio stream we can find.
        // NOTE: There may be more than one, depending on the file.
        if(formatCtx->streams[i]->codecpar->codec_type == AVMEDIA_TYPE_AUDIO) {
            audioStreamIndex = i;
            break;
        }
    }
    return audioStreamIndex;
}

printStreamInformation

Let's show some interesting information about the codec and sample rate. First of all, we'll output the name of the codec, followed by the supported sample formats. Any of these can be requested before opening the codec via codecCtx->request_sample_fmt = AV_SAMPLE_FMT_<NAME>. After that, the function prints some information about the selected stream: its index, but also the sample format, rate and size. The number of channels is interesting as well (e.g. 1 = mono, 2 = stereo). And last but not least, we'll tell the user whether the output is going to be float samples. If it isn't float, it's going to be whatever the original sample format was.

/*
 * Print information about the input file and the used codec.
 */
void printStreamInformation(const AVCodec* codec, const AVCodecContext* codecCtx, int audioStreamIndex) {
    fprintf(stderr, "Codec: %s\n", codec->long_name);
    if(codec->sample_fmts != NULL) {
        fprintf(stderr, "Supported sample formats: ");
        for(int i = 0; codec->sample_fmts[i] != AV_SAMPLE_FMT_NONE; ++i) {
            fprintf(stderr, "%s", av_get_sample_fmt_name(codec->sample_fmts[i]));
            if(codec->sample_fmts[i+1] != AV_SAMPLE_FMT_NONE) {
                fprintf(stderr, ", ");
            }
        }
        fprintf(stderr, "\n");
    }
    fprintf(stderr, "---------\n");
    fprintf(stderr, "Stream:        %7d\n", audioStreamIndex);
    fprintf(stderr, "Sample Format: %7s\n", av_get_sample_fmt_name(codecCtx->sample_fmt));
    fprintf(stderr, "Sample Rate:   %7d\n", codecCtx->sample_rate);
    fprintf(stderr, "Sample Size:   %7d\n", av_get_bytes_per_sample(codecCtx->sample_fmt));
    fprintf(stderr, "Channels:      %7d\n", codecCtx->channels);
    fprintf(stderr, "Float Output:  %7s\n", !RAW_OUT_ON_PLANAR || av_sample_fmt_is_planar(codecCtx->sample_fmt) ? "yes" : "no");
}

receiveAndHandle

Here we just receive frames, handle them in a separate function and unreference their buffers to prevent memory leaks.

/**
 * Receive as many frames as available and handle them.
 */
int receiveAndHandle(AVCodecContext* codecCtx, AVFrame* frame) {
    int err = 0;
    // Read the frames from the decoder.
    // NOTE: Each packet may generate more than one frame, depending on the codec.
    while((err = avcodec_receive_frame(codecCtx, frame)) == 0) {
        // Let's handle the frame in a function.
        handleFrame(codecCtx, frame);
        // Free any buffers and reset the fields to default values.
        av_frame_unref(frame);
    }
    return err;
}

handleFrame

This time I'll give you the code first and explain later.

/**
 * Write the frame to an output file.
 */
void handleFrame(const AVCodecContext* codecCtx, const AVFrame* frame) {
    if(av_sample_fmt_is_planar(codecCtx->sample_fmt) == 1) {
        // This means that the data of each channel is in its own buffer.
        // => frame->extended_data[i] contains data for the i-th channel.
        for(int s = 0; s < frame->nb_samples; ++s) {
            for(int c = 0; c < codecCtx->channels; ++c) {
                float sample = getSample(codecCtx, frame->extended_data[c], s);
                fwrite(&sample, sizeof(float), 1, outFile);
            }
        }
    } else {
        // This means that the data of each channel is in the same buffer.
        // => frame->extended_data[0] contains data of all channels.
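        if(RAW_OUT_ON_PLANAR) {
            // Write only the bytes that hold samples; linesize[0] may include alignment padding.
            size_t dataSize = STATIC_CAST(size_t, frame->nb_samples) * codecCtx->channels
                              * av_get_bytes_per_sample(codecCtx->sample_fmt);
            fwrite(frame->extended_data[0], 1, dataSize, outFile);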
        } else {
            for(int s = 0; s < frame->nb_samples; ++s) {
                for(int c = 0; c < codecCtx->channels; ++c) {
                    float sample = getSample(codecCtx, frame->extended_data[0], s*codecCtx->channels+c);
                    fwrite(&sample, sizeof(float), 1, outFile);
                }
            }
        }
    }
}

A word on the used buffer

The observant reader probably noticed that we use the buffer frame->extended_data and might think: "Is there a frame->data as well? If so, why don't we use that one instead?" Glad you asked. Yes, frame->data exists, but it is a fixed-size array of AV_NUM_DATA_POINTERS (8) plane pointers. For planar audio with more channels than that, it cannot point to all of them.

frame->extended_data, however, always gives access to all channels. If frame->data is large enough, frame->extended_data simply points to frame->data. If not, it points to a separate array of pointers that covers all channels. Therefore frame->extended_data is the safe bet.

Planar or tightly packed

The function handleFrame depends heavily on the getSample function. That function takes the codecCtx, the buffer that contains the data and the index of the sample that should be read and converted. This means handleFrame has to deal with whether the data is planar or tightly packed.

If the data is planar, each channel has its own buffer and the sample index translates directly to the buffer index. If the data is tightly packed, however, all channels are contained in the first buffer and the channels' samples alternate within it.

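Planar:

| Address | buff[0][0] | buff[0][1] | buff[1][0] | buff[1][1] |
|---|---|---|---|---|
| Content | Left / Sample 0 | Left / Sample 1 | Right / Sample 0 | Right / Sample 1 |

Packed:

| Address | buff[0][0] | buff[0][1] | buff[0][2] | buff[0][3] |
|---|---|---|---|---|
| Content | Left / Sample 0 | Right / Sample 0 | Left / Sample 1 | Right / Sample 1 |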

Translating the sample index into the buffer index for tightly packed data is not that hard. Just think of it as a two-dimensional matrix that is saved into a one-dimensional array row by row.

| row \ col | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0 | 1 | 2 | 3 |
| 1 | 4 | 5 | 6 | 7 |
| 2 | 8 | 9 | 10 | 11 |

To translate (row, col) coordinates into array indices, just follow this simple equation, where numCols is the number of columns (4 in the example above).

$$\text{index} = \text{row} \cdot \text{numCols} + \text{col}$$

Transferred to our audio data, with one row per sample s and one column per channel c, that becomes:

$$\text{index} = s \cdot \text{channels} + c$$

In our code you can find that inside the nested for loops of the packed branch of handleFrame.

getSample(codecCtx, frame->extended_data[0], s*codecCtx->channels+c);
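To see the index formula at work outside of ffmpeg, here is a sketch that goes the other way: it interleaves planar channels into one packed buffer using s * channels + c. The function interleave and its plain float arrays are hypothetical stand-ins for frame->extended_data and the decoder's sample buffers.

```c
#include <stddef.h>

/* Hypothetical helper: interleave planar audio into one packed buffer.
 * planes[c][s] is sample s of channel c (planar layout);
 * packed[s * channels + c] receives the same sample (packed layout). */
void interleave(const float* const* planes, int channels, int nbSamples, float* packed) {
    for (int s = 0; s < nbSamples; ++s) {
        for (int c = 0; c < channels; ++c) {
            /* Row s, column c of the sample matrix -> index s * channels + c. */
            packed[(size_t)s * channels + c] = planes[c][s];
        }
    }
}
```

With two channels of three samples each, packed ends up as L0, R0, L1, R1, L2, R2, which is the packed layout described above.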

Optimization

That covers all of handleFrame except for the RAW_OUT_ON_PLANAR part. RAW_OUT_ON_PLANAR is a flag that is either true or false, and it exists to show you a small optimization opportunity for our purpose. We want to import the audio into Audacity, and Audacity wants imported raw data tightly packed but supports pretty much all sample types. That means we can copy packed data from the buffer straight into the file without any conversion.

fwrite(frame->extended_data[0], 1, frame->linesize[0], outFile);

This line writes frame->linesize[0] bytes to the outFile. For audio, frame->linesize[0] is the size in bytes of each buffer frame->extended_data[i]. Be aware that this size may be padded for alignment, so the buffer can extend past the last sample. To write only actual samples, use frame->nb_samples * codecCtx->channels * av_get_bytes_per_sample(codecCtx->sample_fmt) bytes instead.

getSample

There are several sample formats with different sample depths, covering both integer and floating-point types.

| enum | size (bytes) | description |
|---|---|---|
| AV_SAMPLE_FMT_U8 | 1 | unsigned 8 bits |
| AV_SAMPLE_FMT_S16 | 2 | signed 16 bits |
| AV_SAMPLE_FMT_S32 | 4 | signed 32 bits |
| AV_SAMPLE_FMT_FLT | 4 | float |
| AV_SAMPLE_FMT_DBL | 8 | double |
| AV_SAMPLE_FMT_U8P | 1 | unsigned 8 bits, planar |
| AV_SAMPLE_FMT_S16P | 2 | signed 16 bits, planar |
| AV_SAMPLE_FMT_S32P | 4 | signed 32 bits, planar |
| AV_SAMPLE_FMT_FLTP | 4 | float, planar |
| AV_SAMPLE_FMT_DBLP | 8 | double, planar |

Source: FFmpeg documentation

To support all those sample formats, getSample converts any of them into a float with a value between -1 and 1.

/**
 * Extract a single sample and convert to float.
 */
float getSample(const AVCodecContext* codecCtx, uint8_t* buffer, int sampleIndex) {
    int64_t val = 0;
    float ret = 0;
    int sampleSize = av_get_bytes_per_sample(codecCtx->sample_fmt);
    // ...
}

The first step is to interpret the buffer as the original type, read the sample at the given index and store the result in an int64_t, which can hold anything the decoder throws at us.

switch(sampleSize) {
    case 1:
        // 8bit samples are always unsigned
        val = REINTERPRET_CAST(uint8_t*, buffer)[sampleIndex];
        // make signed: 128 is the midpoint (silence) of the unsigned 8-bit range
        val -= 128;
        break;

    case 2:
        val = REINTERPRET_CAST(int16_t*, buffer)[sampleIndex];
        break;

    case 4:
        val = REINTERPRET_CAST(int32_t*, buffer)[sampleIndex];
        break;

    case 8:
        val = REINTERPRET_CAST(int64_t*, buffer)[sampleIndex];
        break;

    default:
        fprintf(stderr, "Invalid sample size %d.\n", sampleSize);
        return 0;
}

There are actually more sample depths, e.g. 24-bit audio. Those odd depths are implemented by giving us the next bigger format and padding the least significant bits with 0. The AVCodecContext can tell us how many bits are actually used (codecCtx->bits_per_raw_sample). If you wanted the original sample value, you would need to shift it down like this.

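// bits_per_raw_sample may be 0 if the decoder does not report it, so only shift when it is set.
if(codecCtx->bits_per_raw_sample > 0) {
    val = (val >> (sampleSize * 8 - codecCtx->bits_per_raw_sample));
}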
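To get a concrete feeling for what that shift does, here is a sketch for 24-bit samples stored in a 32-bit container. It assumes the layout described above, where the unused low bits are zero. unpadSample is a hypothetical helper. It divides instead of shifting. Because the low bits are zero, the result is the same, and it avoids right-shifting negative values, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Hypothetical helper: recover a sample with `rawBits` significant bits
 * (e.g. 24) that is stored left-aligned in a 32-bit container with the
 * unused least significant bits set to 0. Assumes 0 < rawBits <= 32. */
int64_t unpadSample(int32_t padded, int rawBits) {
    int64_t divisor = INT64_C(1) << (32 - rawBits);
    /* The dropped bits are zero, so this division is exact and equals the
     * right shift from the text, even for negative samples. */
    return (int64_t)padded / divisor;
}
```

unpadSample(0x12345600, 24) yields 0x123456, and a padded -256 becomes -1.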

For our purposes we don't need to do that, though. If we shifted it down, we would end up with a quieter signal than the original, because getSample scales by the maximum value of the larger container format.

Now that we have the correct size we need to handle the actual format and convert it into a float from -1 to 1.

// Check which data type is in the sample.
switch(codecCtx->sample_fmt) {
    case AV_SAMPLE_FMT_U8:
    case AV_SAMPLE_FMT_S16:
    case AV_SAMPLE_FMT_S32:
    case AV_SAMPLE_FMT_U8P:
    case AV_SAMPLE_FMT_S16P:
    case AV_SAMPLE_FMT_S32P:
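        // integer => Scale to [-1, 1] and convert to float.
        // Shift a 64-bit one: with a plain int, 1 << 31 overflows for 32-bit samples.
        ret = val / STATIC_CAST(float, ((STATIC_CAST(int64_t, 1) << (sampleSize*8-1))-1));
        break;

    case AV_SAMPLE_FMT_FLT:
    case AV_SAMPLE_FMT_FLTP:
        // float => read it directly from the buffer.
        // (Reinterpreting &val would depend on the byte order of the machine.)
        ret = REINTERPRET_CAST(float*, buffer)[sampleIndex];
        break;

    case AV_SAMPLE_FMT_DBL:
    case AV_SAMPLE_FMT_DBLP:
        // double => read it directly from the buffer and then static cast down
        ret = STATIC_CAST(float, REINTERPRET_CAST(double*, buffer)[sampleIndex]);
        break;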

    default:
        fprintf(stderr, "Invalid sample format %s.\n", av_get_sample_fmt_name(codecCtx->sample_fmt));
        return 0;
}

return ret;
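For the floating-point cases, another option is to copy the bytes straight out of the buffer with memcpy instead of going through an integer variable or a cast pointer. This sketch uses hypothetical helper names. It works regardless of the buffer's alignment and does not depend on how a float would sit inside a larger integer, because the bytes go directly into a real float or double object.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read the float at `sampleIndex` from a raw byte buffer.
 * memcpy copies the bytes into a real float object, so alignment and C's
 * strict aliasing rules are not an issue. */
float readFloatSample(const uint8_t* buffer, int sampleIndex) {
    float value;
    memcpy(&value, buffer + (size_t)sampleIndex * sizeof(float), sizeof(float));
    return value;
}

/* Same for double samples, converted down to float as in getSample. */
float readDoubleSample(const uint8_t* buffer, int sampleIndex) {
    double value;
    memcpy(&value, buffer + (size_t)sampleIndex * sizeof(double), sizeof(double));
    return (float)value;
}
```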

I think both float and double are pretty obvious, but let's look a bit more into the math of the integer conversion. Here is an equivalent but more split-up version of that line, to help with the explanation.

First we calculate the number of bits available for the magnitude of the signed sample. Keep in mind that this is one bit less than the sample size suggests, because one bit is taken by the sign.

int64_t numSignedBits = sampleSize*8 - 1;

Then we can calculate the maximum value of the sample format in question.

int64_t signedMaxValue = (STATIC_CAST(int64_t, 1) << numSignedBits) - 1;

Now we divide by the maximum to get a value between -1 and 1. Strictly speaking, the most negative value lands slightly below -1, e.g. -32768 / 32767 for 16-bit samples.

ret = val / STATIC_CAST(float, signedMaxValue);
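Put together, the integer branch can be tried out on its own. The following sketch uses the hypothetical name normalizeIntSample and mirrors the steps above for 1-, 2- and 4-byte samples. It assumes that 8-bit samples are unsigned with silence at 128, and it shifts a 64-bit one so that the 32-bit case does not overflow an int.

```c
#include <stdint.h>

/* Hypothetical stand-alone version of getSample's integer branch:
 * scale an integer sample of `sampleSize` bytes (1, 2 or 4) to roughly [-1, 1].
 * 8-bit samples are assumed unsigned with silence at 128; wider ones are signed. */
float normalizeIntSample(int64_t val, int sampleSize) {
    if (sampleSize == 1) {
        val -= 128; /* move unsigned 8-bit samples to a signed range around 0 */
    }
    int64_t numSignedBits = sampleSize * 8 - 1;
    /* A 64-bit one: with a plain int, 1 << 31 would overflow for 32-bit samples. */
    int64_t signedMaxValue = (INT64_C(1) << numSignedBits) - 1;
    return (float)val / (float)signedMaxValue;
}
```

normalizeIntSample(32767, 2) gives 1.0f and normalizeIntSample(128, 1) gives 0.0f, while -32768 as a 16-bit sample lands slightly below -1.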

drainDecoder

To drain the decoder, we simply send NULL as the packet, which activates drain mode. The decoder now knows that no more packets will come and that it must hand out any frames it is still holding back. We handle those frames just as above. Note, however, that now we can get not only AVERROR(EAGAIN) but also AVERROR_EOF, and neither of them is an error in this case.

/*
 * Drain any buffered frames.
 */
void drainDecoder(AVCodecContext* codecCtx, AVFrame* frame) {
    int err = 0;
    // Some codecs may buffer frames. Sending NULL activates drain-mode.
    if((err = avcodec_send_packet(codecCtx, NULL)) == 0) {
        // Read the remaining packets from the decoder.
        err = receiveAndHandle(codecCtx, frame);
        if(err != AVERROR(EAGAIN) && err != AVERROR_EOF) {
            // Neither EAGAIN nor EOF => Something went wrong.
            printError("Receive error.", err);
        }
    } else {
        // Something went wrong.
        printError("Send error.", err);
    }
}

Importing the decoded data with Audacity

Our little program is done. Now I will quickly show you how to import the decoded data with Audacity.

First of all open the import dialog via the menu "File" -> "Import" -> "Raw Data...".

[Screenshot: Audacity's "File" -> "Import" -> "Raw Data..." menu (audacity_01.png)]

After selecting the file we need to specify the format. The needed parameters can be read from the output of our printStreamInformation function.

  • If the output shows Float Output: yes, you always need to select "32-bit float". If not, it depends on the output of Sample Format. For example, if it says s16 or s16p, you need to select "Signed 16-bit PCM".
  • The byte order depends on the byte order of your local machine, for x86 processors it is "Little-endian".
  • The number of channels is printed as well: Channels: #
  • The "Start offset:" is always "0"
  • "Amount to import:" is always "100 %"
  • "Sample rate:" can be seen printed at Sample Rate: ###
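Before importing, you can also sanity-check a float output file programmatically. This is a minimal sketch. It assumes the file was written with Float Output: yes, i.e. as 32-bit floats in the machine's native byte order. rawFloatPeak is a hypothetical helper that counts the values and reports the largest magnitude, which for normalized audio should usually stay close to 1.

```c
#include <stdio.h>

/* Hypothetical sanity check: read a raw file of native-endian 32-bit floats,
 * store the largest absolute value in *peak and return the number of values
 * read. Returns -1 if the file cannot be opened. */
long rawFloatPeak(const char* path, float* peak) {
    FILE* f = fopen(path, "rb");
    if (f == NULL) {
        return -1;
    }
    long count = 0;
    float sample;
    *peak = 0.0f;
    while (fread(&sample, sizeof sample, 1, f) == 1) {
        float magnitude = sample < 0.0f ? -sample : sample;
        if (magnitude > *peak) {
            *peak = magnitude;
        }
        ++count;
    }
    fclose(f);
    return count;
}
```

The returned count covers all channels. Divide it by the channel count and the sample rate to get the duration in seconds.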

[Screenshot: Audacity's raw data import dialog with the parameters filled in (audacity_02.png)]

Now just click the "Import" button and you are good to go.

[Screenshot: the imported audio in Audacity (audacity_03.png)]

Conclusion and Up Next

I hope this made the ffmpeg library a little more accessible for you, at least the audio part. If you have any questions left or if you have any suggestions or critique, feel free to leave a comment below.

If you liked this post and want to support my blog, please please please spread the word, tell your friends and post links to my blog.

Up Next is a little post about audio compression in general. In that post I'll try to answer questions like "Why 44100 Hz?", "What influences the audio quality?" and so on. Let me know if you are interested in the electronics involved with audio recording and playback too.


  1. The actual voltages are not necessarily accurate for any real world microphone but you get the idea.

  2. At this point I expect that you are familiar with ADCs (Analog-to-Digital Converters).

  3. Again, this will be covered more in depth in a future post.

  4. Imagine you want to do a reinterpret cast but forget the *. That will most likely take hours before you see that it is missing and ruin your day.

  5. I hate it when people give you code snippets without telling you which header files to include.
