Subtitles are (rich) text or pixmap data with spatial and timing information. A subtitle should appear within the video frame at defined coordinates at a specific time, stay overlaid for a certain amount of time and then disappear.

They are overlaid on video frames to deliver information such as a textual rendering of the current dialogue, lyrics (as in karaoke), or unrelated commentary.

Current state

Libav uses AVSubtitle to provide a list of rectangles that all share the same timing; each rectangle can contain plain text, rich text, or a pixmap.
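To make the current model concrete, here is a minimal sketch of how a consumer iterates an AVSubtitle. The structs below are simplified, self-contained mirrors of the real definitions in libavcodec/avcodec.h (which carry more fields, e.g. bitmap data and palettes); `count_text_rects` is a hypothetical helper, not part of the lavc API:

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified mirrors of the lavc definitions (illustrative only). */
enum AVSubtitleType {
    SUBTITLE_NONE,
    SUBTITLE_BITMAP,   /* pixmap data */
    SUBTITLE_TEXT,     /* plain text in text */
    SUBTITLE_ASS,      /* rich text (ASS markup) in ass */
};

typedef struct AVSubtitleRect {
    int x, y, w, h;              /* position and size within the video frame */
    enum AVSubtitleType type;
    char *text;                  /* plain-text variant */
    char *ass;                   /* rich-text variant */
} AVSubtitleRect;

typedef struct AVSubtitle {
    uint32_t start_display_time; /* relative to pts, in ms */
    uint32_t end_display_time;   /* relative to pts, in ms */
    unsigned num_rects;
    AVSubtitleRect **rects;      /* all rects share the same timing */
    int64_t pts;
} AVSubtitle;

/* Count how many rectangles a renderer would draw as text
 * (plain or rich); the rest are pixmaps. */
static unsigned count_text_rects(const AVSubtitle *sub)
{
    unsigned n = 0;
    for (unsigned i = 0; i < sub->num_rects; i++)
        if (sub->rects[i]->type == SUBTITLE_TEXT ||
            sub->rects[i]->type == SUBTITLE_ASS)
            n++;
    return n;
}
```

The key constraint visible here is that the timing fields live on the AVSubtitle, not on the rectangles, so one decoded subtitle event cannot mix display windows.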


Proposed evolutions

Things that need to be decided:

So here it is, I'd like to move on with the API. Currently we use
AVSubtitle (and its AVSubtitleRects), which is defined in lavc. We wanted
at some point to move it to libavutil, possibly with slight changes.
OTOH, the idea of reusing AVFrame was disregarded quickly for various
reasons. Still, it might have various benefits:

 - since it doesn't introduce a new struct with its own semantics, it
   would greatly ease integration with, for instance, libavfilter, and
   probably allow various code refactoring

 - AVFrames have a ref-counting system: since audio and video are ref
   counted, we expect subtitles to be as well; otherwise it can become
   a source of pain for users

 - using AVFrame also suggests that we could embed subtitle data
   within existing AVFrames: think closed captioning. "Freelance" subtitles
   OTOH would just use an empty/zero frame shell. Note that this conflicts
   with the ref-counting idea, since such a frame can't share the data buffers.
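The ref-counting benefit in the list above can be illustrated with the buffer model AVFrames already use. The snippet below is a simplified, self-contained mock of libavutil's AVBuffer semantics (the real API is av_buffer_ref()/av_buffer_unref(); the names and struct here are assumptions for illustration only):

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/* Mock of a ref-counted data buffer, in the spirit of AVBufferRef. */
typedef struct Buffer {
    uint8_t *data;
    size_t   size;
    int      refcount;
} Buffer;

/* Allocate a buffer with an initial reference. */
static Buffer *buffer_alloc(size_t size)
{
    Buffer *b = malloc(sizeof(*b));
    b->data = calloc(1, size);
    b->size = size;
    b->refcount = 1;
    return b;
}

/* Take an additional reference: no copy, just a counter bump. */
static Buffer *buffer_ref(Buffer *b)
{
    b->refcount++;
    return b;
}

/* Drop a reference; the data is freed only when the last owner lets go. */
static void buffer_unref(Buffer **pb)
{
    Buffer *b = *pb;
    *pb = NULL;
    if (--b->refcount == 0) {
        free(b->data);
        free(b);
    }
}
```

With this model, a subtitle passed through a filter graph could be held by several consumers at once without copies, which is exactly what audio and video frames already get and what a non-ref-counted AVSubtitle cannot offer.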