Python and FFmpeg: create video with audio in one pass or otherwise speed up the process

I'm trying to get Python and FFmpeg to work together under Windows.

I need to make one long video from a set of images and audio tracks (held in the program's memory as numpy arrays).

I could do it in 2 passes:

  • first create a video file without an audio track,
  • then add the audio to it,

but I would like something more optimal and faster:

ffmpeg -y -f rawvideo -vcodec rawvideo -s 1920x1080 -pix_fmt bgr24 -r 5.00 ^
       -i pipe:0 -an -vcodec libx264 -preset medium -pix_fmt yuv420p video.avi

- creating a video

ffmpeg -y -f s16le -acodec pcm_s16le -ar 44100 -ac 1 ^
       -i pipe:0 -i video.avi -c:v h264 -c:a ac3 videoANDaudio.avi

- adding audio.
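
For reference, I drive the first command from Python roughly like this (a simplified sketch; in my real program the frames are numpy arrays already held in memory rather than generated):

import subprocess
import numpy as np

# Dummy data standing in for the real frames:
# 50 solid-gray 1920x1080 BGR frames, i.e. 10 seconds at 5 fps.
frames = [np.full((1080, 1920, 3), 128, dtype=np.uint8) for _ in range(50)]

cmd = ['ffmpeg', '-y',
       '-f', 'rawvideo', '-vcodec', 'rawvideo',
       '-s', '1920x1080', '-pix_fmt', 'bgr24', '-r', '5',
       '-i', 'pipe:0',
       '-an', '-vcodec', 'libx264', '-preset', 'medium',
       '-pix_fmt', 'yuv420p', 'video.avi']

proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
for frame in frames:
    proc.stdin.write(frame.tobytes())  # raw bgr24 bytes, one frame at a time
proc.stdin.close()
proc.wait()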

Is it possible to do this in 1 pass? That is, how do I transmit the video and the audio simultaneously, as 2 streams? I thought about a named pipe, but I could not find any information on the Internet about how to create one under Windows.

PS. Suggest a technology, preferably FFmpeg, that solves the problem, if one exists.

Author: MarianD, 2016-05-03

2 answers

About video processing in Python

PS. Suggest a technology, preferably FFmpeg, that solves the problem, if one exists.

PyAV

There is a remarkable project: PyAV. It provides Python bindings to libav, and it is what I use to work with video in Python. PyAV's API is very different from the arguments of ffmpeg and vanilla libav, but it seems very clear and logical in the context of Python.

PyAV: Example

https://gist.github.com/w495/7d843bd5d42fc35e15486ec60a87d9bf

import av
from av.video.frame import VideoFrame
from av.video.stream import VideoStream

# Frames will be stored in this list as numpy arrays.
array_list = []

# Open the input container for reading.
input_container = av.open('input.mp4')

# Apply "inverse multiplexing" =)
# Get the packets from the stream.
input_packets = input_container.demux()

# Get all the video frames and put them into `array_list`.
for packet in input_packets:
    if isinstance(packet.stream, VideoStream):
        # Get all the frames of the packet.
        frames = packet.decode()
        for raw_frame in frames:
            # Reformat the frames to the required size and pixel format.
            # It is better to do this with pyav (libav) itself,
            #   because it is faster.
            frame = raw_frame.reformat(32, 32, 'rgb24')
            # Turn each frame into a numpy array (dtype=int).
            array = frame.to_nd_array()
            # Put it into the list of numpy arrays.
            array_list.append(array)

# Open the output container for writing.
output_container = av.open('out.mp4', mode='w', format='mp4')

# Add a stream with the h264 codec to the container.
output_stream = output_container.add_stream('h264', rate=25)

# Packets of the output stream will be stored in this list.
output_packets = []

# Walk over the list of arrays and pack them into packets of the output stream.
for array in array_list:
    # Build a video frame from the array.
    frame = VideoFrame.from_ndarray(array, format='rgb24')
    # Encode the resulting frame.
    packet = output_stream.encode(frame)
    # Put it into the list of packets.
    output_packets.append(packet)

# Apply "direct multiplexing" =)
# Call the muxer for every packet.
for packet in output_packets:
    if packet:
        output_container.mux(packet)

# Flush the frames still buffered inside the encoder.
while True:
    packet = output_stream.encode()
    if not packet:
        break
    output_container.mux(packet)

output_container.close()
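
The question also involves an audio track. A recent PyAV can encode audio from numpy arrays into the same kind of container; below is a minimal sketch that writes a one-second test tone (note that in recent PyAV versions encode() returns a list of packets, unlike the older single-packet API in the example above, and exact details vary between versions):

import numpy as np
import av

rate = 44100
# One second of a 440 Hz tone as int16 mono samples.
t = np.arange(rate) / rate
tone = (0.4 * np.iinfo(np.int16).max * np.sin(2 * np.pi * 440 * t)).astype(np.int16)

container = av.open('tone.mp4', mode='w')
stream = container.add_stream('aac', rate=rate)

# Packed s16 mono expects the shape (channels, samples) = (1, n).
frame = av.AudioFrame.from_ndarray(tone.reshape(1, -1), format='s16', layout='mono')
frame.sample_rate = rate

for packet in stream.encode(frame):
    container.mux(packet)
for packet in stream.encode():  # flush the samples buffered in the encoder
    container.mux(packet)
container.close()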

More examples:

  • encode_frames.py - creates a video from a sequence of supplied images; uses OpenCV to work with the images. Actually, you can do without OpenCV here.
  • gen_rgb_rotate.py - creates a video in which the frame color cycles through the rainbow.
  • encode.py - re-encodes frames of the source video until the number of video frames exceeds 100.

I actively use PyAV myself in this project: Video Shot Detector. Perhaps you will find something useful in its code.

https://github.com/w495/python-video-shot-detector

PyAV: Installation

There is one small problem: PyAV is quite hard to build, especially under Windows. Building from source requires specific versions of its dependencies (ffmpeg, h264). But there are already ready-made builds for the Python package manager conda. I haven't tried it, but installing conda on Windows seems simple enough:

  1. Select the installer you need here: Miniconda. I assume it will be Python 2.7 64-bit (exe installer).
  2. Then run it and follow its instructions.
  3. Next, as described in Windows Miniconda Install, check the installation by running conda list on the Windows command line.
  4. After that, install the necessary packages:
conda install numpy
conda install -c danielballan pyav
# or: conda install -c soft-matter pyav
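
A quick sanity check after installation (my addition, not from the conda docs; if the attribute is missing in a very old build, a plain import succeeding is already a good sign):

import av
print(av.__version__)  # should print the installed PyAV version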

I'm not sure that all of this will run under Windows, since I haven't tried it. If not, the official website has instructions on how to build it yourself.

The authors of the library are very responsive, and you can safely ask them questions and report problems here: PyAV Issues.

Alternatives

Among the alternatives, I have come across:

  • Avpy - just bindings to ffmpeg and libav;
  • pyffmpeg - also bindings to ffmpeg;
  • ffmpeg-cffi-py - another binding; works only on Windows;
  • pyVideoInput - uses its own video handling without ffmpeg; it looks too convoluted;
  • Python GStreamer - uses its own video pipeline; I haven't mastered it yet, but at a cursory glance it looks convenient;
  • Python OpenCV - processing video through OpenCV feels like hammering nails with a microscope from the start;
  • MoviePy - a library for non-linear video editing, but it can encode too; partly, this is also hammering nails with a microscope (see the sketch after this list).
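
For illustration, the MoviePy route for exactly the asked task looks roughly like this (a sketch with made-up data; MoviePy wants RGB frames and float samples in [-1, 1], and the API shown is the 1.x one):

import numpy as np
from moviepy.editor import ImageSequenceClip
from moviepy.audio.AudioClip import AudioArrayClip

# Made-up data: 50 gray RGB frames (10 s at 5 fps) and 10 s of mono silence.
frames = [np.full((1080, 1920, 3), 128, dtype=np.uint8) for _ in range(50)]
samples = np.zeros((44100 * 10, 1))  # shape (n_samples, n_channels)

clip = ImageSequenceClip(frames, fps=5)
clip = clip.set_audio(AudioArrayClip(samples, fps=44100))
clip.write_videofile('out.mp4', codec='libx264', audio_codec='aac')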


Quality of alternatives

  • I could not get Avpy to run.
  • pyffmpeg also does not work right away; you have to edit its code before running the installation.
  • ffmpeg-cffi-py could not find the paths to the required libraries.
  • I haven't tried the others yet.
Author: Ilia w495 Nikitin, 2019-12-20 18:21:03

About FFMpeg

Briefly

Is it possible to do this in 1 pass?

Yes, you can. To do this, you need to hand ffmpeg both the audio and the video in one command. This is easily done by simply adding a second input stream to your original command:

ffmpeg \
    -f rawvideo -codec:v rawvideo \
        -s 1920x1080 \
        -pix_fmt bgr24 \
        -r 5.00 \
        -i pipe:0 \
    -f s16le -codec:a pcm_s16le \
        -ar 44100 \
        -ac 1 \
        -i pipe:0 \
    -codec:v libx264 \
        -preset medium \
        -pix_fmt yuv420p \
    -codec:a ac3 \
    -y videoANDaudio.avi

Here the first -i pipe:0 is the video input and the second one is the audio input. But the problem is that you are sending both the images and the samples through stdin (pipe:0), and a single stdin cannot carry two independent streams. The simplest solution would be:

  1. Write the images from your program into one file.
  2. Write the samples into another file.
    • these two actions can be performed in parallel (multiprocessing);
  3. Feed both files to ffmpeg as -i filename.

I assume that this will work faster.

But I would like something more optimal and faster.

For FFmpeg, the -threads option with the value 0 enables multithreading and distributes the computation across all processors.
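
Putting this together: a minimal Python sketch of the two-file approach (the shapes and file names are my assumptions about your data: uint8 BGR frames and int16 mono samples):

import subprocess
import numpy as np

# Stand-ins for the data held in program memory.
frames = [np.full((1080, 1920, 3), 128, dtype=np.uint8) for _ in range(50)]
samples = np.zeros(44100 * 10, dtype=np.int16)  # 10 s of silence

# Steps 1-2: dump the raw frames into one file and the raw samples
# into another (these two steps could run in parallel processes).
with open('video.raw', 'wb') as f:
    for frame in frames:
        f.write(frame.tobytes())
samples.tofile('audio.raw')

# Step 3: a single ffmpeg call consumes both files and muxes in one pass.
subprocess.run([
    'ffmpeg', '-y', '-threads', '0',
    '-f', 'rawvideo', '-codec:v', 'rawvideo',
    '-s', '1920x1080', '-pix_fmt', 'bgr24', '-r', '5',
    '-i', 'video.raw',
    '-f', 's16le', '-codec:a', 'pcm_s16le', '-ar', '44100', '-ac', '1',
    '-i', 'audio.raw',
    '-codec:v', 'libx264', '-preset', 'medium', '-pix_fmt', 'yuv420p',
    '-codec:a', 'ac3',
    'videoANDaudio.avi',
], check=True)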

The long version

ffmpeg has a peculiar feature: you can pass many arguments with the same name, and the order of those arguments matters. In many situations this feature is of little use, but in some cases it pays off handsomely.

The order of the arguments follows this logic:

  1. Input stream parameters:
    • general ones first:
      • format, time offset, etc., something like:
        • -ss '00:05:00' - encode from the fifth minute;
    • then per-stream ones:
      • video, audio, subtitles:
        • -f rawvideo -codec:v rawvideo ...;
        • -f s16le -codec:a pcm_s16le ...;
  2. Output stream parameters:
    • per-stream ones first:
      • video, audio, subtitles:
        • -codec:v 'libx264' -profile:v 'main' -b:v '1000k' -filter:v "yadif=1:-1:0,scale=0:576";
        • -strict 'experimental' -codec:a 'aac' -b:a '196k' -ac '6';
    • then general ones: time offset, container format, etc., something like:
      • -ss '00:05:01' -to '00:05:30';
      • -movflags '+faststart' -f 'mp4' -y file_name.mp4.

Following this logic, you can assemble a video from more than one input stream in a single command. And even produce more than one output: Creating Multiple Outputs.

I have some examples at hand (from Bulk Video Converter). It seems that recording video from the screen and audio from the microphone is very similar to your situation.

/usr/bin/ffmpeg                                     \
    -threads '0'                                    \
    -f x11grab                                      \
        -s wxga                                     \
        -i ':0.0'                                   \
    -f alsa                                         \
        -i hw:0                                     \
    -codec:v 'libx264'                              \
        -profile:v 'main'                           \
        -b:v '1000k'                                \
        -filter:v "yadif=1:-1:0,scale=0:576"        \
    -codec:a 'libmp3lame'                           \
        -b:a '196k'                                 \
    -f 'mp4' -y 'video_pal_sd.mp4'

This works on *nix. I assume that under Windows it will work up to the names of the grabbers and encoders (see the sketch at the end of this answer). Since I don't know your problem in full detail, I will now describe what each of the arguments means.

Input stream settings:

  1. Input video:
    • -f x11grab - the input video format; in my case, the name of the screen-grabbing driver;
    • -s wxga - the size of the input video frame; here we tell ffmpeg how to interpret what we feed it - in my case, the size of my monitor;
    • -i ':0.0' - here, the display number, but any other source will do;
  2. Input audio:
    • -f 'alsa' - the input audio format; in my case, the name of the sound card driver;
    • -i 'hw:0' - here, the sound card (as an input stream, i.e. the microphone), but any other source is possible.

Settings of the resulting streams:

  1. Resulting video:
    • -codec:v 'libx264' - define the codec that we will use to encode the video (h264).
    • -profile:v 'main' - encoding profile for h264.
    • -b:v '1000k' - bitrate for video stream.
    • -filter:v "yadif=1:-1:0,scale=0:576" - filters for the video stream:
      • yadif=1:-1:0 - removes interlacing;
      • scale=0:576 - scales the output video stream down to 576 lines (PAL SD); this size does not have to match the size of the input stream.
  2. Resulting audio:
    • -codec:a 'libmp3lame' - define the codec that we will use to encode the audio (mp3).
    • -b:a '196k' - bitrate for the audio stream.

Settings of the resulting file (container):

  • -f 'mp4' - the container format; you may omit it, and then ffmpeg will try to "guess" it itself;
  • -y 'video_pal_sd.mp4' - the overwrite flag -y and the output file name.
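
For completeness, a hedged Windows analogue of the command above, driven from Python. The gdigrab and dshow grabbers are standard in Windows builds of ffmpeg, but the microphone name below is a placeholder; list your devices with: ffmpeg -list_devices true -f dshow -i dummy.

import subprocess

mic = 'audio=Microphone (Realtek High Definition Audio)'  # placeholder name

subprocess.run([
    'ffmpeg', '-threads', '0',
    # input 1: the Windows desktop via the gdigrab screen grabber
    '-f', 'gdigrab', '-framerate', '25', '-i', 'desktop',
    # input 2: the microphone via DirectShow
    '-f', 'dshow', '-i', mic,
    # resulting streams: H.264 video and MP3 audio, as in the *nix example
    '-codec:v', 'libx264', '-profile:v', 'main', '-b:v', '1000k',
    '-codec:a', 'libmp3lame', '-b:a', '196k',
    '-f', 'mp4', '-y', 'screen_and_mic.mp4',
], check=True)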

Author: Ilia w495 Nikitin, 2019-12-15 11:17:33