Catching elusive bugs
Libav contains numerous bugs. Many have already been fixed, some remain, and occasionally one reappears. Complex code has plenty of corner cases, and many of them can lead to memory corruption, crashes, infinite loops, or memory leaks. Fortunately, a variety of useful tools are available to catch them. Since such bugs can be exploitable, consider running Libav in a sandbox when processing untrusted input.
The Libav build system provides built-in support for most of the instrumentation tools described below. Patches to support additional tools or suggestions regarding new useful tools are always welcome.
Where to start
FATE runs a number of instrumented instances, with a variety of compilers and platforms, to catch regressions on the normal codepaths. This gives us some confidence that normal decoding keeps working as expected as Libav evolves.
On the other hand, corner cases, non-standard samples, and corrupted samples are bound to exercise codepaths that are not checked routinely: this is where most of the remaining bugs hide.
The following tools run unmodified binaries through an emulation layer or a just-in-time binary patcher, so they are the fastest to get started with: you do not need to build Libav, and a single command can already give you useful information.
Valgrind is a suite of tools for, among other things, detecting memory errors. Memcheck and Massif usually yield the most interesting results. Unfortunately, Helgrind has problems tracking our non-standard threading system. Valgrind works well on Linux and Mac OS X, but not on Windows. See the Valgrind documentation for more information.
Memcheck is a memory error detector. It catches illegal reads/writes, use of uninitialized values, illegal frees, and some memory leaks, among other issues. See the Memcheck manual for more details.
valgrind --tool=memcheck --leak-check=full --track-origins=yes avplay yourfile.wav
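Memcheck's report is interleaved with avplay's own output, so when comparing many runs it can help to send it to a file with Valgrind's --log-file option and pull out only the summary lines. The sketch below assumes Memcheck's usual summary format ("ERROR SUMMARY: N errors from M contexts" and "definitely lost: X bytes in Y blocks"); the function name memcheck_summary is hypothetical.

```shell
# Hypothetical helper: print the error count and the definitely-lost bytes
# from a Memcheck log, e.g. one produced with:
#   valgrind --tool=memcheck --leak-check=full --log-file=memcheck.log \
#       avplay yourfile.wav
# Assumes Memcheck's usual summary lines:
#   ==PID== ERROR SUMMARY: N errors from M contexts (suppressed: ...)
#   ==PID==    definitely lost: X bytes in Y blocks
# Valgrind may print large numbers with thousands separators (1,024).
memcheck_summary() {
    mc_log=$1
    # Last reported error count; empty if the summary line is missing.
    mc_errors=$(sed -n 's/.*ERROR SUMMARY: \([0-9,]*\) errors.*/\1/p' "$mc_log" | tail -n 1)
    # Definitely leaked bytes; empty if there is no leak summary.
    mc_lost=$(sed -n 's/.*definitely lost: \([0-9,]*\) bytes.*/\1/p' "$mc_log" | tail -n 1)
    echo "errors=${mc_errors:-0} definitely_lost=${mc_lost:-0}"
}
```

For example, memcheck_summary memcheck.log prints a single line such as errors=3 definitely_lost=1,024, which is easy to compare across samples.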
Massif is a heap profiler, and can catch some leaks that Memcheck cannot. Its output file can be viewed with ms_print. See the Massif manual for more details.
valgrind --tool=massif avplay yourfile.wav
Dr. Memory offers features similar to Memcheck's, but is faster and is known to work on Linux, Windows, and Mac OS X. It is, however, less mature than Memcheck.
Compiler-specific instrumentation
The following tools work with a specific compiler and instrument the binary it produces. Using them requires rebuilding Libav with the desired instrumentation selected at configure time.
AddressSanitizer (gcc, clang)
AddressSanitizer instruments binaries to check for faulty memory accesses, such as out-of-bounds accesses or use-after-free bugs. Older versions do not detect memory leaks, so you may still need Memcheck and Massif to track them.
The Clang AddressSanitizer documentation claims the runtime slowdown is about a factor of 2.
The simplest way to use asan is to pass --toolchain=clang-asan or --toolchain=gcc-asan to configure.
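When running an asan build over many samples, it can be handy to classify the reports by bug type. AddressSanitizer writes its reports to stderr (or, with ASAN_OPTIONS=log_path=asan.log, to files named asan.log.<pid>). The sketch below assumes the usual report header format shown in the comment; the function name asan_bug_types is hypothetical.

```shell
# Hypothetical helper: print the bug type named in each AddressSanitizer
# report header found in the given log files, one per line.
# Assumes the usual header format, e.g.
#   ==12345==ERROR: AddressSanitizer: heap-buffer-overflow on address ...
asan_bug_types() {
    sed -n 's/^==[0-9]*==ERROR: AddressSanitizer: \([^ ]*\).*/\1/p' "$@"
}
```

For example, ./avconv -i yourfile.mp4 -f null - 2> asan.log followed by asan_bug_types asan.log | sort | uniq -c gives a quick count of each kind of report.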
Using AddressSanitizer with GDB
There are several ways to use AddressSanitizer with GDB.
- It is possible to call some of its internal functions to introspect pointers.
define p_a
  print __asan_describe_address($arg0)
end
- It is possible to set breakpoints on the report entry points and inspect the state using the normal commands (e.g. frame and bt).
define b_a
  br __asan_report_load1
  br __asan_report_load2
  br __asan_report_load4
  br __asan_report_load8
  br __asan_report_load16
  br __asan_report_store1
  br __asan_report_store2
  br __asan_report_store4
  br __asan_report_store8
  br __asan_report_store16
  br __asan_report_error
end
Lists of breakpoints like the above can be added to your .gdbinit and then set by simply typing b_a in gdb.
Recent versions of asan route all the reports above, and a few more, through AsanDie, so a single breakpoint on it covers them all.
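Typing long breakpoint lists by hand is error-prone, and gdb expects each command of a define block on its own line. The sketch below generates such a block from a name and a list of symbols; the function name gdb_define_breaks is hypothetical.

```shell
# Hypothetical helper: emit a gdb user-defined command NAME that sets a
# breakpoint on each listed symbol, one command per line as gdb expects.
gdb_define_breaks() {
    gdb_name=$1
    shift
    echo "define $gdb_name"
    for gdb_sym in "$@"; do
        echo "  break $gdb_sym"
    done
    echo "end"
}
```

In bash, brace expansion reproduces the asan list above: gdb_define_breaks b_a __asan_report_{load,store}{1,2,4,8,16} __asan_report_error >> ~/.gdbinit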
MemorySanitizer (clang)
MemorySanitizer is an experimental tool, which tries to closely mimic Valgrind's Memcheck. It is currently much harder to use, because it needs to have every library instrumented, including libc.
define b_m
  br __msan_warning
  br __msan_warning_noreturn
end
The DrMsan helper will make it more useful.
UndefinedBehaviourSanitizer (gcc, clang)
Undefined behaviour is particularly tricky to track, since you might or might not get the expected behaviour, depending on the architecture, compiler version, optimization level, etc. Luckily, once found, this kind of issue is straightforward to fix.
The simplest way to use ubsan is to pass --toolchain=clang-usan or --toolchain=gcc-usan to configure. For historical reasons, the toolchain names refer to usan, not ubsan. Unlike asan, ubsan does not halt execution by default. Pass --extra-cflags=-fno-sanitize-recover to configure to make undefined behavior abort execution instead.
The undefined behavior sanitizer adds significant runtime overhead and makes compilation much slower.
As with AddressSanitizer, it is possible to integrate ubsan with gdb and break on (some) undefined code. Combined with the fact that ubsan does not abort execution by default, this is quite handy for evaluating a group of issues in a single session.
GDB breakpoints for ubsan
define b_u
  br __ubsan_handle_add_overflow
  br __ubsan_handle_mul_overflow
  br __ubsan_handle_negate_overflow
  br __ubsan_handle_builtin_unreachable
  br __ubsan_handle_divrem_overflow
  br __ubsan_handle_out_of_bounds
  br __ubsan_handle_float_cast_overflow
  br __ubsan_handle_shift_out_of_bounds
  br __ubsan_handle_function_type_mismatch
  br __ubsan_handle_sub_overflow
  br __ubsan_handle_load_invalid_value
  br __ubsan_handle_type_mismatch
  br __ubsan_handle_missing_return
  br __ubsan_handle_vla_bound_not_positive
end
Using the UndefinedBehaviourSanitizer
After configuring with ubsan and building, run commands as usual; avconv is particularly useful:
./avconv -i yourfile.mp4 -f null -
In addition to the normal output, information about each instance of undefined behavior is printed:
/path/to/libav/libavcodec/h264_mb.c:304:27: runtime error: left shift of negative value -1
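Since ubsan keeps running after each report, a single avconv run can print the same issue many times. The sketch below groups the reports by source location and counts them, most frequent first; it assumes the "file:line:column: runtime error: message" format shown above, and the function name ubsan_summary is hypothetical.

```shell
# Hypothetical helper: count UBSan reports per source location in the
# given log files, most frequent first, as "COUNT file:line:column".
# Grouping by location (not message) merges reports whose messages only
# differ in the offending value, e.g. "-1" versus "-4".
ubsan_summary() {
    sed -n 's/^\(.*:[0-9][0-9]*:[0-9][0-9]*\): runtime error: .*/\1/p' "$@" |
        sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

For example: ./avconv -i yourfile.mp4 -f null - 2> ubsan.log, then ubsan_summary ubsan.log.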
Fuzzing
The simplest type of fuzzing consists of generating a large number of samples with some bits flipped randomly, then checking with the other tools on this page whether they trigger uncaught issues: invalid input data can exercise branches in Libav that would otherwise never run, and can cause incorrect memory accesses. The Fuzzing Project provides good tutorials for a number of useful tools.
zzuf is among the easiest and fastest fuzzers to use.
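The basic mutation behind this kind of fuzzing is flipping a single bit in the input. The sketch below does that in place with standard dd, od, and printf; the function name flip_bit is hypothetical, and a real fuzzer such as zzuf chooses the offsets and bits pseudo-randomly from a seed and ratio rather than taking them as arguments.

```shell
# Hypothetical helper: flip bit BIT (0-7) of the byte at OFFSET in FILE,
# in place. Returns 1 if OFFSET is past the end of the file.
flip_bit() {
    fb_file=$1 fb_offset=$2 fb_bit=$3
    # Read the byte as an unsigned decimal number.
    fb_byte=$(dd if="$fb_file" bs=1 skip="$fb_offset" count=1 2>/dev/null |
        od -An -tu1 | tr -d ' \n')
    [ -n "$fb_byte" ] || return 1
    fb_new=$(( fb_byte ^ (1 << fb_bit) ))
    # Write the modified byte back (as an octal escape) without truncating.
    printf "\\$(printf '%03o' "$fb_new")" |
        dd of="$fb_file" bs=1 seek="$fb_offset" count=1 conv=notrunc 2>/dev/null
}
```

Flipping the same bit twice restores the original file, which makes the mutation easy to undo while investigating.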
How to use zzuf
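while true; do
    SEED=$RANDOM
    for file in SAMPLES; do
        zzuf -M -1 -q -U 60 -s $SEED ./avconv -i "$file" -f null - || echo "$SEED $file" >> fuzz
    done
done
- -M sets the maximum memory, in MiB, the program may allocate (-1 removes the limit)
- -q hides the output of the fuzzed program
- -U kills the process after it has used the given amount of user time (60 s), which is useful for breaking out of infinite loops
- -s sets the random seed, so that a failing run can be reproduced later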
Leave this running for a while and magic will happen. When the application crashes, zzuf prints the seed and ratio parameters you need to reproduce the crash. For example,
zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)
means that the application crashed because of a segmentation fault, and running it again under zzuf -s 5115 -r 0.004 will make it crash again.
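When scanning long zzuf sessions, the reproduction options can be extracted mechanically from the crash messages. The sketch below assumes the message format shown above (zzuf[s=SEED,r=RATIO]: ...); the function name zzuf_repro_args is hypothetical.

```shell
# Hypothetical helper: read zzuf crash messages on stdin, such as
#   zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)
# and print the matching "-s SEED -r RATIO" options, one line per message.
zzuf_repro_args() {
    sed -n 's/^zzuf\[s=\([0-9]*\),r=\([0-9.]*\)].*/-s \1 -r \2/p'
}
```

The result can be passed back to zzuf, e.g. zzuf $(echo "$msg" | zzuf_repro_args) ./avconv -i "$file" -f null -, where $msg holds the crash message.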
If you want to debug the application you cannot use zzuf directly; instead, fuzz the file, dump it, and feed it to avconv under your favourite debugger. Using the data from the example above:
zzuf -s 5115 -r 0.004 cat working_input.file > fuzzed_output.file
gdb --args ./avconv -i fuzzed_output.file -f null -
Note that invalid reads or writes do not always cause a crash while debugging, so running the fuzzed file under Valgrind might be a good alternative too.
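The fuzz log written by the loop above records one "SEED FILE" pair per failure. The sketch below turns each entry into the corresponding dump command, printing rather than running them so they can be reviewed first (or piped to sh). It assumes the loop kept zzuf's default ratio, taken here to be 0.004 as in the example message; the function name fuzz_dump_commands and the fuzzed_SEED_NAME output names are hypothetical, and file names containing double quotes are not handled.

```shell
# Hypothetical helper: for each "SEED FILE" line of the fuzz log (default
# file name: fuzz), print a zzuf command dumping the fuzzed copy of FILE.
# Assumes the default ratio 0.004 was used; adjust -r if it was changed.
fuzz_dump_commands() {
    while read -r fd_seed fd_file; do
        [ -n "$fd_seed" ] || continue   # skip blank lines
        printf 'zzuf -s %s -r 0.004 cat "%s" > "fuzzed_%s_%s"\n' \
            "$fd_seed" "$fd_file" "$fd_seed" "$(basename "$fd_file")"
    done < "${1:-fuzz}"
}
```

For example, fuzz_dump_commands fuzz | sh regenerates every recorded fuzzed file in the current directory.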
Static analysis
Some errors can be detected by analyzing source code without running it. Static analysis tools can find some bugs, though they suffer from false positives and cannot catch every problem.
Clang static analyzer
Clang offers scan-build to easily analyze projects by adding an extra phase in the normal build process.
It generates a descriptive HTML report. However, the number of false positives is sometimes high enough to limit its usefulness.