[Libav-user] questions about decoding outline and decoder state

Bobby Shen
Hello readers,

I am curious about some algorithmic / numerical aspects of specifically decoding (not encoding) an AC3 or AAC stream. Let's assume that all sample rates are 48000 and that all audio is mono.

Main section 1. Is my outline of the decoding process correct? Which points are wrong? Some of the points are from https://libav.org/documentation/doxygen/master/group__lavc__encdec.html.

1. When decoding an audio stream, we segment the stream into an ordered list of packets, e.g. P[0], P[1], ..., P[999]. (Assume 1000 packets.)
2. This segmentation involves syncwords in order to guard against total data corruption in case 1 byte is lost. If 1 byte is lost, then usually only 1 or 2 packets are affected.
3. For AAC, the packets have different numbers of bytes. AC3 files usually have a constant packet size.
4. In the C process, a decoding object D is initialized.
5. We pass packet P[0] to the D.avcodec_send_packet() method, returning output Y[0]. This effectively passes a small binary string on the order of 500 bytes.
6. Since I'm assuming everything is mono audio, this method returns a 1-d array of floats. This method may change the internal state of D. We then pass packet P[1], then P[2], ..., P[999]. These successive calls return Y[1], Y[2], ..., Y[999], respectively. Because of the possible state change, it is important to pass the packets in a specific order.
7. This array has length (always?) 1024 for AAC and 1536 for AC3.
8. This page claims that frames stand alone. Does that mean that packets are decoded independently, or does it just mean that 1024-sample frames are encoded independently, or am I misunderstanding?
 https://wiki.multimedia.cx/index.php/Understanding_AAC
9. (less important for me) If the packet timestamps of the stream are very uniform, then we will simply concatenate all of the returned arrays Y[0], ..., Y[999] into the full array, and this is the decoded array. If the packets have nonuniform timestamps, then we still might concatenate all of the arrays, or maybe insert zero samples, depending on the other parameters of the FFmpeg call.
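
The syncword-based segmentation in points 1 to 3 can be sketched in a few lines of Python. This is only an illustration, not how libavcodec's parsers work: it assumes a raw AC-3 elementary stream in which every frame begins with the 16-bit syncword 0x0B77, and the helper names find_syncwords and split_at are made up for this example. A real parser also reads the frame size from the header, because the same two bytes can occur by chance inside a frame's payload.

```python
AC3_SYNCWORD = b"\x0b\x77"  # 16-bit AC-3 syncword


def find_syncwords(data: bytes, syncword: bytes = AC3_SYNCWORD) -> list:
    """Return every byte offset at which `syncword` occurs in `data`."""
    offsets = []
    pos = data.find(syncword)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(syncword, pos + 1)
    return offsets


def split_at(data: bytes, offsets: list) -> list:
    """Cut `data` into candidate packets, one per syncword offset."""
    bounds = offsets + [len(data)]
    return [data[start:end] for start, end in zip(bounds, bounds[1:])]


# Three fake 8-byte "frames": syncword plus six filler payload bytes each.
stream = b"".join(AC3_SYNCWORD + bytes([i]) * 6 for i in range(3))

# Drop one payload byte from the second frame (byte index 10).
damaged = stream[:10] + stream[11:]

intact_packets = split_at(stream, find_syncwords(stream))
damaged_packets = split_at(damaged, find_syncwords(damaged))
```

In this toy stream the lost byte shortens only the second candidate packet; the first and third are recovered unchanged, which is the property point 2 describes.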

--

Main section 2. Let's suppose that my outline in section 1 is accurate. If not, then the rest of my message might be moot.

Let's suppose we have initial decoder object D and either the AAC or AC3 codec and packets P[0], P[1], ..., P[999]. Assuming that the decoder state matters a lot, I'd like to consider 3 orders of passing the packets to D.

Order 1: The same order as the packets. P[0], P[1], ..., P[999]
Order 2: we remove P[0] completely.  P[1], P[2], ..., P[999]
Order 3: We replace P[0] with an arbitrary packet, P_new. (e.g. P_new = P[1], but P_new could be an arbitrary packet not in the list.) P_new, P[1], ..., P[999]

In order 1, suppose that the output arrays are Y[0], Y[1], ..., Y[999].
In order 2, since the state may matter, we can't say that the first array output is Y[1]. Instead, we use different symbols Y2[1], Y2[2], ..., Y2[999]. (Indexing starts from 1; this output list has 999 elements.)
In order 3, suppose that the output arrays are Y3[0], Y3[1], ..., Y3[999]. (1000 elements).

My main questions are: Is the state of D flushed fairly quickly, or is the state very persistent such that any sequence 'mutation' will significantly change the state, or somewhere in between? Although the lists Y, Y2, and Y3 are clearly similar waveforms perceptually, are they completely different at a low level or do they converge?
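
The "flushed fairly quickly" end of this spectrum can be illustrated with a toy model. It is not an AAC or AC3 decoder: ToyOverlapDecoder, run, and the packet layout are invented for illustration. The sketch assumes that the only state carried between packets is an overlap buffer holding the tail of the previous packet, as in textbook overlap-add transform coding. Real decoders may keep more state than this, so the behaviour of actual FFmpeg decoders still has to be checked empirically.

```python
class ToyOverlapDecoder:
    """Toy decoder whose only state is the tail of the previous packet."""

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.overlap = [0.0] * frame_len  # state carried between packets

    def decode(self, packet):
        # A packet holds 2 * frame_len samples: a head and a tail.
        head = packet[:self.frame_len]
        tail = packet[self.frame_len:]
        # Output = head of this packet + tail saved from the previous one.
        out = [h + o for h, o in zip(head, self.overlap)]
        self.overlap = list(tail)
        return out


def run(packets, frame_len=2):
    """Decode a packet list with a fresh decoder; return all outputs."""
    decoder = ToyOverlapDecoder(frame_len)
    return [decoder.decode(p) for p in packets]


# P[i] has head samples (i + 1) and tail samples 10 * (i + 1).
P = [[float(i + 1)] * 2 + [10.0 * (i + 1)] * 2 for i in range(5)]
P_new = [9.0, 9.0, 90.0, 90.0]  # arbitrary substitute for P[0]

Y = run(P)                 # order 1: P[0], P[1], ...
Y2 = run(P[1:])            # order 2: P[0] deleted (Y2[0] is the output for P[1])
Y3 = run([P_new] + P[1:])  # order 3: P[0] replaced by P_new
```

Under this one-packet-memory assumption, the outputs for P[1] differ between the three orders, but from P[2] onward all three orders produce identical arrays. A decoder with longer-lived state would not behave this way.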

If hypothetically the state of D is flushed after 50 packets, then would Y[n], Y2[n], Y3[n] be approximately equal length-1024 float arrays for n >= 51? Is there any such value of n? Or maybe the state of D depends on how many packets have been decoded and is otherwise flushed after 50 packets? If so, is Y[n] ~ Y3[n] for n >= 51 but Y[n] != Y2[n] for any large n, because the decoder processed n packets before outputting Y[n] but only n-1 packets before Y2[n]?
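
One way to make this question testable is to compare the per-packet outputs numerically. The sketch below is plain Python with made-up names (max_abs_diff, first_converged_index) and an arbitrary tolerance. It assumes each output is already available as a list of float samples, for example converted from the frames a decoder returns, and that the two lists are aligned so that the same index in both refers to the output for the same packet.

```python
def max_abs_diff(a, b):
    """Largest absolute sample difference; infinite if lengths differ."""
    if len(a) != len(b):
        return float("inf")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def first_converged_index(ref, other, tol=1e-6):
    """Smallest n with ref[m] ~ other[m] for every m >= n, else None."""
    converged_from = None
    for i in range(min(len(ref), len(other))):
        if max_abs_diff(ref[i], other[i]) <= tol:
            if converged_from is None:
                converged_from = i  # start of a matching run
        else:
            converged_from = None   # run broken, start over
    return converged_from


# Tiny hand-made example standing in for Y and Y3.
Y_demo = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
Y3_demo = [[0.9, 0.9], [0.3, 0.45], [0.5, 0.6], [0.7, 0.8]]
never = [[9.0, 9.0]] * 4

converged_at = first_converged_index(Y_demo, Y3_demo)
no_convergence = first_converged_index(Y_demo, never)
```

For the deletion case, the reference list would be Y[1:] so that both lists start at the output for P[1]. A result of None over a long stream would suggest that the two decoder runs never converge at the sample level.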

Note that I have experimented with PyAV, and I suspect that for the AC3 codec and a deletion mutation there is no such value of n: the decoder states will always be different. I do not know about a substitution mutation or the AAC codec, or whether I am doing my PyAV analysis correctly. I don't know for sure, and I would be obliged if a reader knows. I have only experimented with PyAV since I am not used to using C.

Sincerely,
Bobby

_______________________________________________
Libav-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/libav-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: questions about decoding outline and decoder state

Carl Eugen Hoyos-2
On Sat, 23 May 2020 at 04:27, Bobby Shen
<[hidden email]> wrote:

> Is my outline of the decoding process correct?

You are not mentioning the parser that is nearly always needed
for AAC and AC3 decoding.

Before Opus, audio decoding states were very limited.

> Which points are wrong?

Experience indicates that the type of your question rarely leads
to useful answers (here).

Carl Eugen