Catching elusive bugs

Libav contains numerous bugs. Many have already been fixed, some remain, and occasionally one reappears. Complex code has plenty of corner cases, and many of them can lead to memory corruption, crashes, infinite loops, and memory leaks. Fortunately, a variety of useful tools are available to catch them. Consider using Libav in a sandbox.

The Libav build system provides built-in support for most of the instrumentation tools described below. Patches to support additional tools or suggestions regarding new useful tools are always welcome.

Where to start

FATE has a number of instances running instrumented to catch regressions on the normal codepaths, with a variety of compilers and platforms, giving us some level of confidence that normal decoding continues to work as expected as Libav continually evolves.

On the other hand, corner cases, non-standard samples, and corrupted samples are bound to exercise codepaths that are not checked routinely: this is where most of the remaining bugs hide.

Dynamic Instrumentation

These tools let you run unmodified binaries through an emulation layer or a just-in-time binary patcher, so they are the fastest to get started with: you do not need to build Libav, and you can get useful information from a single command.

Valgrind

Valgrind is a suite of tools for detecting memory-related errors. memcheck and massif usually yield interesting results. Unfortunately, helgrind has problems tracking our non-standard threading system. Valgrind works well on Linux and Mac OS X; it does not work on Windows. See the Valgrind documentation for more information.

Memcheck

Memcheck is a memory error detector. It catches illegal reads/writes, use of uninitialized values, illegal frees, and some memory leaks, among other issues. See the Memcheck manual for more details.

Using Memcheck

valgrind --tool=memcheck --leak-check=full --track-origins=yes avplay yourfile.wav
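
To check several samples in one go, the Memcheck command above can be wrapped in a small shell function. This is only a sketch: memcheck_samples, VALGRIND and AVCONV are hypothetical names introduced here for illustration, and avconv with a null output is used instead of avplay so that no window is opened. --error-exitcode is a standard Valgrind option.

```shell
# Sketch: run Memcheck over several samples and record the ones that fail.
# VALGRIND and AVCONV are hypothetical override variables, not part of Libav.
memcheck_samples() {
    log=$1
    shift
    for sample in "$@"; do
        # --error-exitcode=1 makes valgrind exit with status 1 when it reports
        # errors; a non-zero exit from avconv itself is logged as well.
        "${VALGRIND:-valgrind}" --tool=memcheck --leak-check=full \
            --track-origins=yes --error-exitcode=1 \
            "${AVCONV:-./avconv}" -i "$sample" -f null - \
            >/dev/null 2>&1 || echo "$sample" >> "$log"
    done
}

# Usage: memcheck_samples failed.log samples/*.wav
```

Files listed in the log can then be rerun by hand with the full Memcheck command to read the detailed report.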

Massif

Massif is a heap profiler, and can catch some leaks that memcheck cannot. See the Massif manual for more details.

Using Massif

valgrind --tool=massif avplay yourfile.wav
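
By default Massif writes its results to a file named massif.out.<pid> in the current directory, and Valgrind ships the ms_print tool to render such a file as text. The helper below is a minimal sketch for picking the newest result file; latest_massif_out is a hypothetical name, not something provided by Valgrind or Libav.

```shell
# Print the path of the most recently modified Massif output file
# in the given directory (default: the current directory).
latest_massif_out() {
    ls -t "${1:-.}"/massif.out.* 2>/dev/null | head -n 1
}

# Usage (requires Valgrind's ms_print):
#   valgrind --tool=massif avplay yourfile.wav
#   ms_print "$(latest_massif_out)"
```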

Dr. Memory

Dr. Memory is similar to memcheck feature-wise but faster and known to work on Linux, Windows, and Mac OS X. It is less mature than memcheck.

Compiler-specific instrumentation

The following tools work with a specific compiler and instrument the binary produced. Using them requires building Libav, after specifying the instrumentation to use while configuring Libav.

AddressSanitizer (gcc, clang)

AddressSanitizer instruments binaries to check for faulty memory accesses, such as out-of-bounds accesses or use-after-free bugs. You still need Memcheck and Massif to track memory leaks.

The Clang AddressSanitizer documentation claims the runtime slowdown is about a factor of 2.

The simplest way to use asan is to pass --toolchain=clang-asan or --toolchain=gcc-asan to configure.

Using AddressSanitizer with GDB

There are several ways to use AddressSanitizer with GDB.

  • It is possible to call some of its internal functions to introspect pointers.

define p_a
    print __asan_describe_address($arg0)
end

  • It is possible to set breakpoints on entry points and inspect the state using the normal commands (e.g. frame and bt).

define b_a
    br __asan_report_load1
    br __asan_report_load2
    br __asan_report_load4
    br __asan_report_load8
    br __asan_report_load16
    br __asan_report_store1
    br __asan_report_store2
    br __asan_report_store4
    br __asan_report_store8
    br __asan_report_store16
    br __asan_report_error
end

Lists of breakpoints like the above can be added to your .gdbinit and then set by simply typing b_a in gdb.

Recent versions of asan support AsanDie, which covers all the reports above and a few more.

MemorySanitizer (gcc, clang)

MemorySanitizer is an experimental tool, which tries to closely mimic Valgrind's Memcheck. It is currently much harder to use, because it needs to have every library instrumented, including libc.

define b_m
    br __msan_warning
    br __msan_warning_noreturn
end

The DrMsan helper will make it more useful.

UndefinedBehaviorSanitizer (gcc, clang)

Undefined behaviour is particularly tricky to track, since you might or might not get the expected behaviour, depending on the architecture, compiler version, optimization level, etc. Luckily, once found, this kind of issue is straightforward to fix.

The simplest way to use ubsan is to pass --toolchain=clang-usan or --toolchain=gcc-usan to configure. For historical reasons, the toolchain names refer to usan, not ubsan. Unlike asan, ubsan does not halt execution by default. Pass --extra-cflags=-fno-sanitize-recover to configure to have undefined behavior trigger runtime asserts that stop execution.
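
The two configurations can be sketched as one small helper. configure_ubsan and its "halt" argument are hypothetical names used only for illustration; the configure options themselves are the ones named in the text, and the function is meant to be run from the top of the Libav source tree.

```shell
# Sketch: configure Libav for ubsan with a compiler family ("clang" or "gcc").
# Pass "halt" as the second argument to stop execution at the first report.
configure_ubsan() {
    if [ "${2:-}" = halt ]; then
        ./configure --toolchain="$1-usan" --extra-cflags=-fno-sanitize-recover
    else
        ./configure --toolchain="$1-usan"
    fi
}

# Usage: configure_ubsan clang halt && make
```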

The undefined behavior sanitizer adds significant overhead, and makes compilation much slower.

Like the AddressSanitizer, it is possible to integrate it with gdb and break on (some) undefined code. Combined with the fact that ubsan does not abort execution by default, it is quite handy for evaluating a group of issues in a single session.

GDB breakpoints for ubsan

  • Breakpoints

define b_u
    br __ubsan_handle_add_overflow
    br __ubsan_handle_mul_overflow
    br __ubsan_handle_negate_overflow

    br __ubsan_handle_builtin_unreachable
    br __ubsan_handle_divrem_overflow
    br __ubsan_handle_out_of_bounds

    br __ubsan_handle_float_cast_overflow
    br __ubsan_handle_shift_out_of_bounds

    br __ubsan_handle_function_type_mismatch
    br __ubsan_handle_sub_overflow

    br __ubsan_handle_load_invalid_value
    br __ubsan_handle_type_mismatch

    br __ubsan_handle_missing_return
    br __ubsan_handle_vla_bound_not_positive
end

Using the UndefinedBehaviorSanitizer

After configuring with ubsan and building, run commands as usual; avconv is particularly useful.

./avconv -i yourfile.mp4 -f null -

In addition to the normal output, there is information printed about undefined behavior:

/path/to/libav/libavcodec/h264_mb.c:304:27: runtime error: left shift of negative value -1
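
Because ubsan keeps running after each report, a single run can print the same line many times. A minimal sketch for condensing the output, assuming reports contain the text "runtime error:" as in the example above and are written to standard error; summarize_ubsan is a hypothetical helper name.

```shell
# Count distinct ubsan reports read from standard input, most frequent first.
summarize_ubsan() {
    grep 'runtime error:' | sort | uniq -c | sort -rn
}

# Usage: ./avconv -i yourfile.mp4 -f null - 2>&1 | summarize_ubsan
```

Each output line is a count followed by the source location and message, which helps decide which issue to look at first.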

Fuzzing

The simplest type of fuzzing consists of generating a large number of samples with some bits flipped at random, then checking whether they trigger uncaught issues using the other tools on this page. Potentially invalid input data can exercise branches that would otherwise not be run in Libav, and can provoke incorrect memory accesses. The fuzzing project provides good tutorials on a number of useful tools.

zzuf

zzuf is among the easiest and fastest fuzzers to use.

How to use zzuf

while true; do
    # Pick a new seed for each pass over the samples.
    SEED=$RANDOM
    for file in SAMPLES; do
        zzuf -M -1 -q -U 60 -s $SEED ./avconv -i "$file" -f null - || echo "$SEED $file" >> fuzz
    done
done

  • -M sets the max memory to use (-1 means unlimited)
  • -q hides the output
  • -U kills the process after a given time (60s) (useful for exiting out of infinite loops)
  • -s sets the random seed

Leave this running for a while and magic will happen. When your application crashes, zzuf prints the seed and ratio parameters you need to reproduce the crash. For example:

zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)

means that the application crashed because of a segfault and by calling zzuf -s 5115 -r 0.004 you will make it crash again.
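
The seed and ratio can also be pulled out of such a line mechanically. The function below is a minimal sketch that assumes the report format shown in the example; zzuf_args_from_report is a hypothetical helper name, not part of zzuf.

```shell
# Turn a zzuf crash line into the matching "-s SEED -r RATIO" arguments.
# Prints nothing if the line does not look like a zzuf report.
zzuf_args_from_report() {
    printf '%s\n' "$1" |
        sed -n 's/^zzuf\[s=\([0-9]*\),r=\([^]]*\)].*/-s \1 -r \2/p'
}

# Usage:
#   zzuf_args_from_report 'zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)'
```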

If you want to debug the application, you cannot run it under zzuf directly; instead, fuzz the file, dump the result, and feed it to avconv under your favourite debugger. Using the data from the example above:

zzuf -s 5115 -r 0.004 cat working_input.file > fuzzed_output.file

Note that sometimes invalid reads/writes do not cause a crash during debugging, so Valgrind might be a good alternative too.
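
When a fuzzing loop has logged many failures as "SEED FILE" lines, the fuzzed inputs can be regenerated in bulk for later debugging. This is a sketch under stated assumptions: the log holds one "SEED FILE" pair per line, the fuzzing run used zzuf's default ratio (no -r option), and regenerate_fuzzed and the ZZUF override variable are hypothetical names introduced here.

```shell
# Sketch: regenerate the fuzzed input for each "SEED FILE" line of a log.
# ZZUF is a hypothetical override variable; it defaults to the real zzuf.
regenerate_fuzzed() {
    while read -r seed file; do
        out="fuzzed_${seed}_$(basename "$file")"
        # stdin is redirected so the command cannot consume the log itself.
        "${ZZUF:-zzuf}" -s "$seed" cat "$file" > "$out" </dev/null ||
            echo "failed: $seed $file" >&2
    done < "$1"
}

# Usage: regenerate_fuzzed fuzz
```

The resulting fuzzed_SEED_NAME files are written to the current directory and can be fed to avconv under a debugger or Valgrind.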

Static analysis

Some errors can be detected by analyzing source code without running it. Static analysis tools are a way to find some bugs, though they suffer from false positives and cannot catch every problem.

Clang Static Analyzer

Clang offers scan-build to easily analyze projects by adding an extra phase in the normal build process.

It generates a descriptive HTML report. However, the number of false positives is sometimes high enough that the report is not always useful.

