[Libav-user] Resample frame to specified number of samples


Kerry Loux
Hello all,

I have an application where I am opening an audio file that was sampled at 44100 Hz, decoding it, resampling to 16000 Hz, encoding it again (AAC) then broadcasting it on an RTSP stream.  On the receiving end, I decode the incoming AAC packets and render them.

The rendered audio is very slow.

It appears to me that the problem is related to the AVFrame.nb_samples field.  When I read a packet from file (using av_read_frame()), the packet size is 1024 samples (at 44100 Hz).  After I resample to 16000 Hz, I have ~1/3 the samples that I had in the original frame (as expected).  Then, the frame gets encoded, streamed and decoded.  After decoding, the AVFrame.nb_samples is 1024 when I expect it to be 372 or so.  The AVCodecContext passed to avcodec_receive_frame() has frame_size = 1024, so I assume that the decoder is setting the number of samples of the decoded frame to 1024 regardless of the number of samples actually contained in the input packet?  Or maybe it's my job to ensure that the input packets always contain 1024 samples?

I'm not entirely sure what's going on.  My thoughts include:
- Try buffering 3x number of input frames prior to resampling so the resulting frame will be ~1024 samples
- Calculate the number of samples manually (how to do this is unclear) and override the number of samples assigned by the decoder (this seems wrong...)

Any recommendations?  Can I just stick multiple frames together in a larger buffer prior to resampling (i.e. calling swr_convert())?

Thanks,

Kerry
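
For reference, the numbers in this post can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the output count is simply proportional to the rate ratio and ignoring any resampler delay. `resampledCount` and `playbackSpeed` are hypothetical helpers, not FFmpeg functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FFmpeg function): number of output samples when
// inSamples at inRate Hz are resampled to outRate Hz, rounded to nearest.
// Resampler delay is ignored and inputs are assumed non-negative.
std::int64_t resampledCount(std::int64_t inSamples, std::int64_t inRate, std::int64_t outRate)
{
    assert(inSamples >= 0 && inRate > 0 && outRate > 0);
    return (inSamples * outRate + inRate / 2) / inRate;
}

// Hypothetical helper: apparent playback speed when realSamples of audio are
// rendered as if they filled a frame of renderedSamples.
double playbackSpeed(std::int64_t realSamples, std::int64_t renderedSamples)
{
    return static_cast<double>(realSamples) / static_cast<double>(renderedSamples);
}
```

`resampledCount(1024, 44100, 16000)` gives 372, which matches the figure above. If each 372-sample frame ends up occupying a full 1024-sample AAC frame, `playbackSpeed(372, 1024)` is about 0.36, which would be consistent with audio that sounds very slow.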

_______________________________________________
Libav-user mailing list
[hidden email]
http://ffmpeg.org/mailman/listinfo/libav-user

Re: Resample frame to specified number of samples

Anton Shekhovtsov


2017-07-19 20:54 GMT+03:00 Kerry Loux <[hidden email]>:
[...]

Try studying the examples (resampling_audio, transcoding_audio; I don't remember which is most relevant).
You are not supposed to resample individual frames in isolation. You must feed the resampler continuously. AFAIK this is clearly explained in the swr docs.
AAC wants input of a fixed size (1024 samples per frame).
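
To illustrate why the resampler is meant to be fed as a continuous stream, here is a simplified model of the bookkeeping involved. It is only a sketch under stated assumptions, not how libswresample is implemented: `OutputCounter` is a hypothetical class that tracks how many output samples each input chunk yields while carrying the fractional remainder over to the next call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a streaming rate converter's bookkeeping.
class OutputCounter
{
public:
    OutputCounter(std::int64_t inRate, std::int64_t outRate)
        : inRate_(inRate), outRate_(outRate), remainder_(0)
    {
        assert(inRate > 0 && outRate > 0);
    }

    // Returns how many output samples become available after pushing
    // inSamples more input samples, keeping the leftover fraction for later.
    std::int64_t push(std::int64_t inSamples)
    {
        const std::int64_t scaled = inSamples * outRate_ + remainder_;
        remainder_ = scaled % inRate_;
        return scaled / inRate_;
    }

private:
    std::int64_t inRate_;
    std::int64_t outRate_;
    std::int64_t remainder_;  // leftover, in units of 1/inRate_ output samples
};
```

Pushing three 1024-sample chunks through `OutputCounter(44100, 16000)` yields 371, 372 and 371 samples. None of these is 1024, and the count is not even constant from call to call, so the output has to be regrouped before it reaches an encoder that expects fixed-size frames.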



Re: Resample frame to specified number of samples

Kerry Loux

On Thu, Jul 20, 2017 at 1:19 PM, Anton Shekhovtsov <[hidden email]> wrote:


[...]
 
Yes, I am feeding it continuously.  I am doing this:

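AVPacket* ADTSEncoderInterface::EncodeAudio(const AVFrame& inputFrame)
{
    // Hand the frame to the encoder; give up on any error.
    if (avcodec_send_frame(encoderContext, &inputFrame) != 0)
        return nullptr;

    AVPacket* lastOutputPacket, *nextOutputPacket(nullptr);
    bool nextPacketIsA(true);
    int returnCode;
    do
    {
        // Alternate between two packets so that the final, failing call to
        // avcodec_receive_packet() does not overwrite the last good packet.
        lastOutputPacket = nextOutputPacket;
        nextPacketIsA = !nextPacketIsA;
        if (nextPacketIsA)
            nextOutputPacket = &outputPacketA;
        else
            nextOutputPacket = &outputPacketB;

        returnCode = avcodec_receive_packet(encoderContext, nextOutputPacket);
    } while (returnCode == 0);

    // Anything other than EAGAIN is an error; no packet means nothing to send yet.
    if (returnCode != AVERROR(EAGAIN) || !lastOutputPacket)
        return nullptr;

    // Note: only the most recent packet produced by this call is returned.
    return lastOutputPacket;
}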

I assumed (possibly incorrectly) that if AAC requires packets containing 1024 samples, I would get AVERROR(EAGAIN) from avcodec_receive_packet() when there were not enough input samples available.  It seems that this is not the case; instead, I need to ensure myself that the encoder has at least 1024 samples before I call avcodec_receive_packet().

I haven't found anything in the documentation to suggest that it is the caller's responsibility to do this.  Maybe this wouldn't be found in the FFmpeg docs, but in documentation describing the AAC format?  If that were the case, it would have been helpful if the call to avcodec_send_frame() had failed with some kind of "wrong number of input samples" error.

I did find a solution, although it seems rather inefficient.  I introduced an additional AVFrame object, fullSizeFrame, and prior to calling the encoder (my EncodeAudio method pasted above), I do this:

while (fullSizeFrame->nb_samples < packetSampleCount)// packetSampleCount == 1024
{
    assert(!dataQueue.empty());
    nextFrame = dataQueue.front();
    if (!nextFrame)
    {
        dataQueue.pop();// discard the null entry, otherwise the loop never advances
        continue;
    }

    // Copy as many samples as still fit.  sampleSize is the size in bytes of
    // one sample; this assumes all audio data lives in data[0] (packed or mono).
    const int samplesToCopy(std::min(packetSampleCount - fullSizeFrame->nb_samples, nextFrame->nb_samples));
    memcpy(fullSizeFrame->data[0] + fullSizeFrame->nb_samples * sampleSize, nextFrame->data[0], samplesToCopy * sampleSize);
    fullSizeFrame->nb_samples += samplesToCopy;
    pendingSamples -= samplesToCopy;

    if (samplesToCopy == nextFrame->nb_samples)
    {
        // The queued frame has been fully consumed.
        dataQueue.pop();
        av_frame_free(&nextFrame);
    }
    else
    {
        // Move the unused tail to the start of the frame for the next pass.
        memmove(nextFrame->data[0], nextFrame->data[0] + samplesToCopy * sampleSize, (nextFrame->nb_samples - samplesToCopy) * sampleSize);
        nextFrame->nb_samples -= samplesToCopy;
    }
}

Thanks for your help.

-Kerry
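
The regrouping described above can also be expressed as a sample FIFO: write resampled blocks of any length, and read back only when a full encoder frame is available. The following is a minimal sketch, assuming interleaved float samples. `SampleFifo` is a hypothetical stand-in written with the standard library only; FFmpeg's libavutil offers AVAudioFifo (av_audio_fifo_write(), av_audio_fifo_read(), av_audio_fifo_size()) for the same job, and the transcode_aac example uses it.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical FIFO of interleaved samples; a stand-in for a real audio FIFO.
class SampleFifo
{
public:
    explicit SampleFifo(std::size_t channels) : channels_(channels)
    {
        assert(channels > 0);
    }

    // Append one resampled block of any length (interleaved samples).
    void write(const std::vector<float>& interleaved)
    {
        assert(interleaved.size() % channels_ == 0);
        buffer_.insert(buffer_.end(), interleaved.begin(), interleaved.end());
    }

    // Number of complete sample frames (one sample per channel) stored.
    std::size_t size() const { return buffer_.size() / channels_; }

    // Move exactly frameSize sample frames into out.  Returns false and
    // leaves the FIFO untouched if fewer are available.
    bool read(std::size_t frameSize, std::vector<float>& out)
    {
        if (size() < frameSize)
            return false;
        const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(frameSize * channels_);
        out.assign(buffer_.begin(), buffer_.begin() + count);
        buffer_.erase(buffer_.begin(), buffer_.begin() + count);
        return true;
    }

private:
    std::size_t channels_;
    std::deque<float> buffer_;
};
```

With mono audio and blocks of 372 samples, two writes are not enough for `read(1024, out)` to succeed; after the third write the FIFO holds 1116 samples, one read returns 1024 of them, and 92 remain for the next frame.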


Re: Resample frame to specified number of samples

Anton Shekhovtsov


2017-07-20 22:48 GMT+03:00 Kerry Loux <[hidden email]>:

[...]

I am not an FFmpeg expert by any means, but I was able to figure these details out somehow.
Look at encode_audio.c:
...
    frame->nb_samples     = c->frame_size;
...
This should give you some idea. frame_size is indeed 1024 for AAC.

My comment about "feed it continuously" was about calling swr_convert.


Re: Resample frame to specified number of samples

Andy Shaules
In reply to this post by Kerry Loux


On Jul 19, 2017 10:59 AM, "Kerry Loux" <[hidden email]> wrote:
[...]

Two things, perhaps: what sample rate does your SDP advertise? What scale are your audio RTP timestamps in?


Re: Resample frame to specified number of samples

Kerry Loux


On Fri, Jul 21, 2017 at 11:38 PM, Andy Shaules <[hidden email]> wrote:


[...]

SDP correctly advertises 16000 Hz and my timestamps are scaled to microseconds.

Thanks,

Kerry
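
Since the question was about timestamp scale, here is a small sketch of the arithmetic. It assumes one encoded AAC frame covers 1024 samples at 16000 Hz. Whether the stream clock should tick in microseconds or at the sample rate depends on the muxer setup, which this thread does not show. `rescaleTicks` is a hypothetical helper; FFmpeg's av_rescale_q() performs this kind of conversion between time bases.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: convert a non-negative tick count from a clock running
// at fromHz to one running at toHz, rounding to nearest.
std::int64_t rescaleTicks(std::int64_t ticks, std::int64_t fromHz, std::int64_t toHz)
{
    assert(ticks >= 0 && fromHz > 0 && toHz > 0);
    return (ticks * toHz + fromHz / 2) / fromHz;
}
```

`rescaleTicks(1024, 16000, 1000000)` gives 64000, so a full AAC frame at 16000 Hz lasts 64 ms, while `rescaleTicks(372, 16000, 1000000)` gives 23250, about 23 ms. If timestamps advance by roughly 23 ms per packet while each packet decodes to 64 ms of audio, the two will disagree, which fits the symptom described at the start of the thread.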


Re: Resample frame to specified number of samples

Andy Shaules


On Jul 24, 2017 5:44 AM, "Kerry Loux" <[hidden email]> wrote:


[...]

Then yes, you most likely need to concatenate frames to the correct duration for the sample rate.


Re: Resample frame to specified number of samples

salsaman
Yes, you need to buffer sufficient audio frames to feed to the encoder.

Calculate the number of input samples:

    /* compute src number of samples */
    in_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx, c->sample_rate) + out_nb_samples,
                    in_sample_rate, c->sample_rate, AV_ROUND_DOWN);

Then allocate buffers to concatenate the input samples until you have enough to pass to swr_ctx.


Gabriel.
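
As a worked example of the calculation above, the sketch below plugs in the numbers from this thread (44100 Hz in, 16000 Hz out, 1024 output samples). It assumes a resampler delay of zero, which will not generally hold once the stream is running. `rescale` and `inputSamplesNeeded` are hypothetical stand-ins that only mimic the a * b / c rescaling with rounding; they are not the FFmpeg functions.

```cpp
#include <cassert>
#include <cstdint>

enum class Rounding { Down, Up };

// Hypothetical stand-in for the a * b / c rescaling in the snippet above,
// for non-negative values only.
std::int64_t rescale(std::int64_t a, std::int64_t b, std::int64_t c, Rounding mode)
{
    assert(a >= 0 && b >= 0 && c > 0);
    const std::int64_t product = a * b;
    return mode == Rounding::Up ? (product + c - 1) / c : product / c;
}

// Input samples needed to obtain outSamples output samples.  delay is the
// resampler's buffered sample count as used in the formula above (assumed 0
// in the worked example).
std::int64_t inputSamplesNeeded(std::int64_t outSamples, std::int64_t delay,
                                std::int64_t inRate, std::int64_t outRate, Rounding mode)
{
    return rescale(delay + outSamples, inRate, outRate, mode);
}
```

`inputSamplesNeeded(1024, 0, 44100, 16000, Rounding::Down)` gives 2822 (2823 when rounding up). That is about 2.76 decoded frames of 1024 samples per encoder frame, so the original estimate of buffering roughly three input frames was close.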




On Tue, Jul 25, 2017 at 10:37 PM, Andy Shaules <[hidden email]> wrote:


[...]