Use two packet_buffers, each a doubly linked list of AVPacket
structs. This way we can control which packets are fed to the
decoders, and when.
This solves the playback problem with the MP4 files.
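
A minimal sketch of the packet_buffer idea, not the code from this
commit. Packet is a hypothetical stand-in for AVPacket, and all
function names here are made up for illustration. It shows why a
doubly linked list helps: a packet can be unlinked from anywhere in
the list, so the caller decides which packet goes to a decoder next.
The sketch is not thread-safe.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for AVPacket: only the fields this sketch needs. */
typedef struct Packet {
    int     stream_index;
    int64_t pts;
} Packet;

typedef struct PacketNode {
    Packet             pkt;
    struct PacketNode *prev;
    struct PacketNode *next;
} PacketNode;

typedef struct PacketBuffer {
    PacketNode *head;   /* oldest queued packet */
    PacketNode *tail;   /* newest queued packet */
    int         count;
} PacketBuffer;

/* Append a packet at the tail. Returns 0 on success, -1 if malloc fails. */
static int packet_buffer_push(PacketBuffer *pb, Packet pkt)
{
    PacketNode *n = malloc(sizeof(*n));
    if (!n)
        return -1;
    n->pkt  = pkt;
    n->next = NULL;
    n->prev = pb->tail;
    if (pb->tail)
        pb->tail->next = n;
    else
        pb->head = n;
    pb->tail = n;
    pb->count++;
    return 0;
}

/* Unlink any node in O(1) and return its packet to the caller. */
static Packet packet_buffer_take(PacketBuffer *pb, PacketNode *n)
{
    Packet pkt = n->pkt;
    if (n->prev)
        n->prev->next = n->next;
    else
        pb->head = n->next;
    if (n->next)
        n->next->prev = n->prev;
    else
        pb->tail = n->prev;
    pb->count--;
    free(n);
    return pkt;
}

/* Take the first queued packet whose pts is <= limit.
 * Returns 1 and fills *out if one was found, 0 otherwise. */
static int packet_buffer_pop_due(PacketBuffer *pb, int64_t limit, Packet *out)
{
    for (PacketNode *n = pb->head; n; n = n->next) {
        if (n->pkt.pts <= limit) {
            *out = packet_buffer_take(pb, n);
            return 1;
        }
    }
    return 0;
}
```

With real AVPacket structs the node would own a packet reference
instead of a plain copy; that part is left out here.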
The video fifo can be removed, since we have a ring buffer in its
place. This removes unneeded copy operations and, as a positive side
effect, improves overall decoding speed.
This makes 8k60p SW and 4k60p HW decoding possible on my system.
For now the ring buffer is 32 images deep. This limitation will
be removed once the audio and video decoders have their own
packet handling.
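
A rough sketch of a fixed-depth ring buffer like the one described
above, assuming a single producer (decoder) and a single consumer
(display). Frame and the ring_* names are hypothetical; only the
depth of 32 comes from this message. The decoder writes straight into
the slot it is handed, which is how the extra copy through a fifo is
avoided.

```c
#include <stddef.h>

#define RING_DEPTH 32   /* depth stated in the commit message */

/* Hypothetical stand-in for a decoded image. */
typedef struct Frame {
    int id;
} Frame;

typedef struct FrameRing {
    Frame    slots[RING_DEPTH];
    unsigned write;   /* total number of frames committed */
    unsigned read;    /* total number of frames released  */
} FrameRing;

/* Slot the decoder may fill next, or NULL if the ring is full. */
static Frame *ring_write_slot(FrameRing *r)
{
    if (r->write - r->read == RING_DEPTH)
        return NULL;
    return &r->slots[r->write % RING_DEPTH];
}

/* Publish the slot returned by ring_write_slot(). */
static void ring_commit(FrameRing *r)
{
    r->write++;
}

/* Oldest committed frame, or NULL if the ring is empty. */
static Frame *ring_peek(FrameRing *r)
{
    if (r->write == r->read)
        return NULL;
    return &r->slots[r->read % RING_DEPTH];
}

/* Give the slot returned by ring_peek() back to the decoder. */
static void ring_release(FrameRing *r)
{
    r->read++;
}
```

Because the depth is fixed, the decoder has to wait whenever
ring_write_slot() returns NULL.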