Catching elusive bugs
Libav contains bugs: many have already been fixed, some remain, and a few might appear again. Complex code has plenty of corner cases, and many of them can lead to memory corruption, crashes, infinite loops, and memory leaks. Fortunately, a variety of useful tools are available to catch them. Consider using Libav in a sandbox.
The Libav build system provides built-in support for most of the instrumentation tools described below. Patches to support additional tools or suggestions regarding new useful tools are always welcome.
Where to start
FATE has a number of instrumented instances running to catch regressions on the normal codepaths, which gives us some confidence that normal decoding works as expected.
Corner cases, non-standard samples, and corrupted samples, on the other hand, are bound to exercise codepaths that are not checked routinely: this is where most of the remaining bugs hide.
The following tools let you run unmodified binaries through an emulation layer or a just-in-time binary patcher.
Valgrind is a suite of tools for detecting memory-related errors. Memcheck and Massif usually provide good and precise results. Unfortunately, Helgrind has problems tracking our non-standard threading system. Valgrind works well on Linux and Mac OS X. See the Valgrind documentation for more information.
Memcheck is a memory error detector, and catches illegal reads/writes, use of uninitialized values, illegal frees, and some memory leaks, among other issues. See the Memcheck manual for more details.
valgrind --tool=memcheck avplay yourfile.wav
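To check a whole set of samples rather than a single file, the Memcheck invocation above can be wrapped in a loop. The following is only a sketch: the function name check_samples and the CHECKER variable are hypothetical, and it assumes an avconv binary in the current directory. It relies on Valgrind's --error-exitcode option to tell Memcheck findings apart from ordinary decoding failures.

```shell
#!/bin/sh
# CHECKER is a hypothetical override hook; by default it runs Memcheck quietly
# and makes Valgrind exit with status 42 when it has reported errors.
CHECKER=${CHECKER:-"valgrind --tool=memcheck --error-exitcode=42 -q"}

# check_samples FILE...
# Decodes each file and prints the name of every sample for which the
# checker exited with status 42 (that is, Memcheck reported errors).
check_samples() {
    for sample in "$@"; do
        status=0
        # "-f null -" decodes the input and discards the output
        $CHECKER ./avconv -i "$sample" -f null - >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 42 ]; then
            echo "$sample"
        fi
    done
}
```

For example, check_samples samples/*.wav > memcheck-failures.txt collects the names of the samples worth a closer look.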
Massif is a heap profiler, and can catch some leaks that Memcheck cannot. See http://valgrind.org/docs/manual/ms-manual.html for details.
valgrind --tool=massif avplay yourfile.wav
Dr. Memory is similar to memcheck feature-wise but faster and known to work on Linux, Windows, and Mac OS X. It is less mature than memcheck.
Compiler specific instrumentation
The following tools work with a specific compiler and instrument the binary produced.
AddressSanitizer (gcc, clang)
AddressSanitizer is a compiler instrumentation that checks for faulty memory accesses. It is somewhat like a faster Memcheck, without the memory bookkeeping. You still need Memcheck and Massif to track memory leaks.
The runtime overhead is bearable and the compile-time overhead is negligible.
The simplest way to use asan is to pass --toolchain=clang-asan or --toolchain=gcc-asan to configure.
Using AddressSanitizer with GDB
It integrates nicely with gdb:
- It is possible to call some of its internal functions to introspect pointers.
define p_a
  print __asan_describe_address($arg0)
end
- It is possible to set breakpoints on entry points and inspect the state using the normal commands (e.g. frame and bt).
define b_a
  br __asan_report_load1
  br __asan_report_load2
  br __asan_report_load4
  br __asan_report_load8
  br __asan_report_load16
  br __asan_report_store1
  br __asan_report_store2
  br __asan_report_store4
  br __asan_report_store8
  br __asan_report_store16
end
MemorySanitizer (gcc, clang)
MemorySanitizer is an experimental tool, which tries to closely mimic Valgrind's Memcheck. It is currently much harder to use, because it needs to have every library instrumented, including libc.
UndefinedBehaviourSanitizer (gcc, clang)
Undefined behaviour is a particularly tricky kind of fault to track, since you might or might not get the expected behaviour, depending on the architecture, compiler version, optimization level, etc. Luckily, once found, this kind of issue is among the easiest to fix.
The simplest way to use ubsan is to pass --toolchain=clang-usan or --toolchain=gcc-usan to configure. For historical reasons, the toolchain arguments refer to usan, not ubsan. Unlike asan, ubsan does not halt execution by default. Pass -fno-sanitize-recover through --extra-cflags= to turn each report into a runtime assert that stops execution.
This compiler instrumentation also makes undefined behaviour quite easy to spot.
The undefined behavior sanitizer adds significant overhead, and makes compilation much slower.
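The configure invocations described for asan and ubsan follow the same pattern, so they can be wrapped in a small helper. This is a minimal sketch, assuming it is run from the top of a Libav source tree; the function name build_instrumented is hypothetical, while the toolchain names and the --extra-cflags option are the ones mentioned above.

```shell
#!/bin/sh
# build_instrumented TOOLCHAIN [CONFIGURE_ARGS...]
# Configures the tree with the given sanitizer toolchain, passes any
# remaining arguments to configure unchanged, then builds.
build_instrumented() {
    toolchain=$1
    shift
    ./configure --toolchain="$toolchain" "$@" && make
}
```

For instance, build_instrumented clang-asan produces an asan build, and build_instrumented clang-usan --extra-cflags=-fno-sanitize-recover produces a ubsan build that stops at the first report.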
As with AddressSanitizer, it is possible to integrate it with gdb and break on (some) undefined code. Because ubsan does not abort execution by default, this comes in handy when evaluating a group of issues in a single session.
GDB breakpoints for ubsan
define b_u
  br __ubsan_handle_add_overflow
  br __ubsan_handle_mul_overflow
  br __ubsan_handle_negate_overflow
  br __ubsan_handle_builtin_unreachable
  br __ubsan_handle_divrem_overflow
  br __ubsan_handle_out_of_bounds
  br __ubsan_handle_float_cast_overflow
  br __ubsan_handle_shift_out_of_bounds
  br __ubsan_handle_function_type_mismatch
  br __ubsan_handle_sub_overflow
  br __ubsan_handle_load_invalid_value
  br __ubsan_handle_type_mismatch
  br __ubsan_handle_missing_return
  br __ubsan_handle_vla_bound_not_positive
end
Using the UndefinedBehaviourSanitizer
After configuring with ubsan and building, run commands as usual; avconv is particularly useful.
./avconv -i yourfile.mp4 -f null -
In addition to the normal output, there is information printed about undefined behavior:
/path/to/libav/libavcodec/h264_mb.c:304:27: runtime error: left shift of negative value -1
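Since ubsan keeps running after a report, a single decode can print the same finding many times. The sketch below condenses such output into one line per distinct finding, most frequent first. The function name ubsan_summary is hypothetical, and it assumes every report contains the text "runtime error:" as in the example above.

```shell
#!/bin/sh
# ubsan_summary
# Reads a log on stdin and prints each distinct "runtime error:" line
# once, prefixed by the number of times it occurred.
ubsan_summary() {
    grep 'runtime error:' | sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

A typical use would be ./avconv -i yourfile.mp4 -f null - 2>&1 | ubsan_summary.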
The simplest type of fuzzing consists of generating a large number of samples with some bits flipped randomly. We then use the rest of the tools on this page to check whether these samples trigger uncaught issues, since invalid input data can reach branches in Libav that would otherwise never run and can cause incorrect memory accesses.
zzuf is among the easiest and fastest fuzzers to use.
How to use zzuf
while true; SEED=$RANDOM; do
    for file in SAMPLES; do
        zzuf -M -1 -q -U 60 -s $SEED ./avconv -i "$file" -f null - || echo $SEED $file >> fuzz
    done
done
- -M sets the max memory to use (1MB)
- -q hides the output
- -U kills the process after a given time (60s) (useful for exiting out of infinite loops)
Leave this running for a while and magic will happen. When your application crashes, zzuf prints the seed and ratio parameters you need to reproduce the crash. For example:
zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)
means that the application crashed because of a segfault and by calling zzuf -s 5115 -r 0.004 you will make it crash again.
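When many crash lines accumulate, the seed and ratio can be pulled out mechanically. This is a minimal sketch that assumes the crash line has exactly the shape shown in the example above; the function name zzuf_repro_args is hypothetical.

```shell
#!/bin/sh
# zzuf_repro_args LINE
# Turns a crash line such as "zzuf[s=5115,r=0.004]: signal 11 (SIGSEGV)"
# into the matching "-s SEED -r RATIO" arguments. Prints nothing if the
# line does not have that shape.
zzuf_repro_args() {
    printf '%s\n' "$1" |
        sed -n 's/^zzuf\[s=\([0-9]*\),r=\([0-9.]*\)]:.*/-s \1 -r \2/p'
}
```

The result can be spliced into a new invocation, for example zzuf $(zzuf_repro_args "$line") ./avconv -i "$file" -f null -, where $line and $file hold the saved crash line and the sample that triggered it.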
If you want to debug the application, you cannot run it under zzuf directly. Instead, fuzz the file, dump the result, and feed it to avconv under your favourite debugger. Using the data from the example above:
zzuf -s 5115 -r 0.004 cat working_input.file > fuzzed_output.file
Note that sometimes invalid reads/writes do not cause a crash during debugging, so Valgrind might be a good alternative too.
Some errors can be detected by analyzing source code without running it. Static analysis tools are a way to find some bugs, though they suffer from false positives and cannot catch every problem.
clang static analyzer
Clang offers scan-build to easily analyze projects by adding an extra phase in the normal build process.
It generates a descriptive HTML report. However, the number of false positives is sometimes high enough that the report is not always useful.