[Libav-user] investigating runtime demux performance


Blake Senftner

I have an ffmpeg / libav based video playback library which is tuned for computer vision use:

it discards any non-video packets, and attempts to deliver video as quickly as possible while only consuming 1 thread per video stream.
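As a rough illustration of the "discard non-video packets" step, here is a minimal, library-free sketch. The `Packet` struct and `keep_video_packets` function are hypothetical stand-ins and not part of the player library described here; with libav the same check would compare `AVPacket::stream_index` against the index of the selected video stream after each `av_read_frame` call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a demuxed packet. With libav this role is played
// by AVPacket, whose stream_index field identifies the source stream.
struct Packet {
    int stream_index;
    std::size_t size_bytes;
};

// Keep only the packets that belong to the chosen video stream, in order.
std::vector<Packet> keep_video_packets(const std::vector<Packet>& demuxed,
                                       int video_stream_index) {
    assert(video_stream_index >= 0);
    std::vector<Packet> video;
    for (const Packet& p : demuxed) {
        if (p.stream_index == video_stream_index) {
            video.push_back(p);
        }
        // Packets from any other stream (audio, subtitles, data) are dropped.
    }
    return video;
}
```

In a real read loop the filtering would happen packet by packet rather than on a vector, so that rejected packets are released immediately and never reach the decoder.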

 

For USB and IP streams, video frames are delivered as soon as they are received, and media files play back as fast as the disk can deliver frames.

 

Besides discarding non-video packets and playing as fast as possible, my player library can detect when a USB or IP stream drops, typically because someone tripped over or disconnected the wire.

To support this stream monitoring, I need to recompile ffmpeg.

 

Between ffmpeg versions 3.2.2 and 4.2.2 I am seeing a fairly significant difference in playback performance.

The new version is noticeably slower.

Case in point, decompressing 3000 frames ASAP without any display:

Version 3.2.2: 0.002319 seconds per frame (measured using boost high_resolution_clock::time_point)

Version 4.2.2: 0.004250 seconds per frame

 

In a computer vision context, this increase in processing for version 4.2.2 triggers a need for significantly faster hardware to maintain the same computer vision application performance.
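To make the comparison concrete, here is a minimal sketch of how a per-frame average like the ones above could be measured and converted to a frame rate. It is not the code used for the figures in this post: it swaps in `std::chrono::steady_clock` for the boost clock mentioned above, and `average_seconds_per_frame` and `frames_per_second` are hypothetical helper names.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Average wall-clock seconds per call of process_frame over frame_count calls.
// process_frame is any callable that handles exactly one frame.
template <typename ProcessFrame>
double average_seconds_per_frame(ProcessFrame&& process_frame,
                                 std::size_t frame_count) {
    assert(frame_count > 0);
    using clock = std::chrono::steady_clock;  // monotonic, safe for intervals
    const auto start = clock::now();
    for (std::size_t i = 0; i < frame_count; ++i) {
        process_frame();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return elapsed.count() / static_cast<double>(frame_count);
}

// Convert a per-frame cost into the highest frame rate one thread can sustain.
double frames_per_second(double seconds_per_frame) {
    assert(seconds_per_frame > 0.0);
    return 1.0 / seconds_per_frame;
}
```

Applied to the numbers above, 0.002319 s per frame is roughly 431 frames per second and 0.004250 s per frame is roughly 235 frames per second, so the newer build takes about 1.83 times as long per frame.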

So I am investigating to learn if it is possible to get better performance out of version 4.2.2.  

 

Between the two ffmpeg versions, my player library requires slightly different setup to play a stream, but the playback code handling receiving of packets, conversion to video frames, delivery to the player library, and final presentation is unmodified.

 

Besides switching the source of the libav libraries from ffmpeg 3.2.2 to ffmpeg 4.2.2, the two versions of my video player library differ as follows:

  • The 3.2.2 version was built by cross compiling with the Zeranoe build script (back when 3.2.2 was “new”), after I added the USB/IP stream dropping support to the source code.
  • The 3.2.2 version is used as DLLs, while the 4.2.2 libav libraries are static builds, so the final executable requires no libav DLLs.
  • The 4.2.2 version is built by native compiling with Visual Studio’s build tools, from an Msys2 shell launched from a “VS2015 x64 Native Tools Command Prompt”, using this configure line:
    • ./configure --prefix="./ffmpeg_build"  --toolchain=msvc  --arch=x86_64 --target-os=win64 --extra-cflags=-MT --extra-cxxflags=-MT --optflags=-O2 --enable-x86asm --enable-asm --enable-static --disable-shared --disable-debug --enable-gpl --disable-w32threads
    • The “--optflags=-O2” is an experiment to see whether I get any faster demux processing. It results in slightly different sizes for a few of the libraries, but no other noticeable difference.

 

Does anyone have any ideas why demuxing frames with the new version requires so much more CPU?

Is anyone seeing better performance from the 4.2 series than the 3.2 series?

Any ideas on this topic?

 

Blake Senftner

Sr. Software Scientist | CyberExtruder

1401 Valley Road, Wayne, New Jersey 07470

cel: 213 400 6424 (pacific daylight savings timezone)

[hidden email]

 


_______________________________________________
Libav-user mailing list
[hidden email]
https://ffmpeg.org/mailman/listinfo/libav-user

To unsubscribe, visit link above, or email
[hidden email] with subject "unsubscribe".

Re: investigating runtime demux performance

Carl Eugen Hoyos-2
On Mon, 11 May 2020 at 20:15, <[hidden email]> wrote:

> Between ffmpeg versions 3.2.2 and 4.2.2 I am seeing a fairly
> significant difference in playback performance.
>
> The new version is noticeably slower.
>
> Case in point, decompressing 3000 frames ASAP without any display:
>
> Version 3.2.2: 0.002319 seconds per frame (measured using boost
> high_resolution_clock::time_point)
>
> Version 4.2.2: 0.004250 seconds per frame

Please remember that only current FFmpeg git head is supported,
both here and on the bug tracker.
Can you reproduce with ffmpeg, the command line application?

Please understand that you cannot report performance regressions
for builds using "optflags", feel free to run a bisect if the issue is
not reproducible with ffmpeg and if it is reproducible without using
"optflags" (and with using the same toolchain and either static or
dynamic linking for both builds, mixing them makes no sense for
performance comparison).

Carl Eugen

Re: investigating runtime demux performance

Blake Senftner
>> The new version is noticeably slower.
>> Case in point, decompressing 3000 frames ASAP without any display:
>> Version 3.2.2: 0.002319 seconds per frame (measured using boost high_resolution_clock::time_point)
>> Version 4.2.2: 0.004250 seconds per frame

>Please remember that only current FFmpeg git head is supported, both here and on the bug tracker.
>Can you reproduce with ffmpeg, the command line application?

>Please understand that you cannot report performance regressions for builds using "optflags", feel free to run a bisect if the issue is not reproducible with ffmpeg and if it is reproducible without using "optflags" (and with using the same toolchain and either static or dynamic linking for both builds, mixing them makes no sense for performance comparison).

I removed "optflags" from my ffmpeg builds. Since my goal is demux only, and ffmpeg.exe will not isolate a demux-only operation, I tried two things instead: building the 4.2.2 libraries with the cross-compiling gcc toolchain, and building the 3.2.14 libraries as static libraries with the MSVC toolchain.
I seem to have runtime library linking issues with the libraries produced by the gcc toolchain, but the 3.2.14 libav libraries built with the MSVC toolchain work fine.

So now I can compare ffmpeg 3.2.14 and ffmpeg 4.2.2 both built as static libraries with the same MSVC toolchain.
My tests are using h.265 media files with the "wait for next packet" code disabled, causing the media files to play as fast as the file loads off disk.
The 3.2.14 libraries are clearly twice as fast as the 4.2.2 libraries, at least with the h.265 media files I'm testing with.
(I'm testing h.265 because that is what most of our clients' IP cameras stream, and this video playback library is for our computer vision applications.)
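For readers unfamiliar with this kind of switch, here is a minimal sketch of what a "wait for next packet" step that can be disabled might look like. This is an assumption about the general technique, not the player library's actual code: `wait_before_delivery` and its parameters are hypothetical, and times are plain seconds for simplicity.

```cpp
#include <algorithm>
#include <cassert>

// Seconds to wait before delivering a frame.
//   frame_pts_s:    presentation time of the frame, in seconds from stream start
//   elapsed_s:      wall-clock seconds since playback started
//   pacing_enabled: false means "play as fast as the data arrives"
double wait_before_delivery(double frame_pts_s, double elapsed_s,
                            bool pacing_enabled) {
    assert(frame_pts_s >= 0.0 && elapsed_s >= 0.0);
    if (!pacing_enabled) {
        return 0.0;  // benchmark mode: never wait
    }
    // Wait only when the frame is early; a late frame is delivered at once.
    return std::max(0.0, frame_pts_s - elapsed_s);
}
```

With pacing disabled the function always returns zero, so the measured time per frame reflects only reading and processing, which is what a throughput comparison between builds needs.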

Is anyone else noticing this pretty significant increase in processing? Any idea why?


Blake Senftner
Sr. Software Scientist | CyberExtruder
1401 Valley Road, Wayne, New Jersey 07470
cel: 213 400 6424 (pacific daylight savings timezone)
[hidden email]



 


Re: investigating runtime demux performance

Carl Eugen Hoyos-2
On Wed, 13 May 2020 at 13:12, <[hidden email]> wrote:

> The 3.2.14 libraries are clearly twice as fast as the 4.2.2 libraries, at least
> with the h.265 media files I'm testing with.

Then please run git bisect, but allow me to repeat that it may make sense to
reproduce with ffmpeg (which would make your bug report much, much
simpler).

Carl Eugen

Re: investigating runtime demux performance

Blake Senftner

>> The 3.2.14 libraries are clearly twice as fast as the 4.2.2 libraries,
>> at least with the h.265 media files I'm testing with.

>Then please run git bisect, but allow me to repeat that it may make sense to reproduce with ffmpeg (which would make your bug report much, much simpler).
>
>Carl Eugen

I have found the source of the timing differences I've been investigating, and the problem is outside ffmpeg/libav.
The two builds were using two different versions of a threading library, each configured for different performance goals.


Blake Senftner
Sr. Software Scientist | CyberExtruder
1401 Valley Road, Wayne, New Jersey 07470
cel: 213 400 6424 (pacific daylight savings timezone)
[hidden email]






Re: investigating runtime demux performance

Carl Eugen Hoyos-2
On Tue, 19 May 2020 at 14:49, <[hidden email]> wrote:

>
>
> >> The 3.2.14 libraries are clearly twice as fast as the 4.2.2 libraries,
> >> at least with the h.265 media files I'm testing with.
>
> > Then please run git bisect, but allow me to repeat that it may make
> > sense to reproduce with ffmpeg (which would make your bug report
> > much, much simpler).
>
> The source of the timing differences I've been investigating is found,
> and the problem is outside ffmpeg/libav.
> It was two different versions of a threading library, each configured
> for different performance goals.

(Just for completeness)
Which I would have told you if I had seen the ffmpeg console outputs.

Carl Eugen