Use two packet_buffers, each a doubly linked list of AVPacket structs. This lets us control which packets are fed to the decoders and when. This fixes the playback problem with the MP4 files.
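A minimal sketch of one such buffer, to illustrate the idea rather than reproduce the actual implementation. It assumes a FIFO discipline (append at the tail, take from the head), which the note above does not spell out. All names here (DemoPacket, PacketNode, PacketBuffer, packet_buffer_push, packet_buffer_pop, packet_buffer_peek) are hypothetical. DemoPacket is a stand-in so the sketch compiles without FFmpeg; real code would presumably store AVPacket pointers instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for AVPacket so this compiles without FFmpeg.
 * Only the two fields used for scheduling decisions are modelled. */
typedef struct DemoPacket {
    int     stream_index;
    int64_t pts;
} DemoPacket;

/* One node of the doubly linked list. */
typedef struct PacketNode {
    DemoPacket         pkt;
    struct PacketNode *prev;
    struct PacketNode *next;
} PacketNode;

/* Assumed layout: head is the oldest packet, tail is the newest. */
typedef struct PacketBuffer {
    PacketNode *head;
    PacketNode *tail;
    size_t      count;
} PacketBuffer;

/* Append a packet at the tail. Returns 0 on success, -1 if malloc fails. */
static int packet_buffer_push(PacketBuffer *buf, DemoPacket pkt)
{
    assert(buf != NULL);
    PacketNode *node = malloc(sizeof(*node));
    if (!node)
        return -1;
    node->pkt  = pkt;
    node->next = NULL;
    node->prev = buf->tail;
    if (buf->tail)
        buf->tail->next = node;
    else
        buf->head = node;       /* list was empty */
    buf->tail = node;
    buf->count++;
    return 0;
}

/* Look at the oldest packet without removing it; NULL if the buffer is empty. */
static const DemoPacket *packet_buffer_peek(const PacketBuffer *buf)
{
    assert(buf != NULL);
    return buf->head ? &buf->head->pkt : NULL;
}

/* Remove the oldest packet into *out. Returns 1 on success, 0 if empty. */
static int packet_buffer_pop(PacketBuffer *buf, DemoPacket *out)
{
    assert(buf != NULL && out != NULL);
    PacketNode *node = buf->head;
    if (!node)
        return 0;
    *out = node->pkt;
    buf->head = node->next;
    if (buf->head)
        buf->head->prev = NULL;
    else
        buf->tail = NULL;       /* list is now empty */
    buf->count--;
    free(node);
    return 1;
}
```

With two such buffers (for instance one per stream, which is an assumption), the caller can peek at the head of each and decide which packet to hand to a decoder next, instead of decoding packets in the order the demuxer returns them.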