What machine learning methods can be used to solve the problem of video classification?

Question

I need to implement multi-class classification of videos: the goal is to identify a person by their gestures. The dataset consists of more than 100 hours of video, which comes to more than 5,000,000 images (frames). Which methods are best suited to this? Which solutions would be optimal in terms of memory consumption and running time? I'm using Python 3.6 on an NVIDIA GeForce GTX 970.

0

python-3.x machine-learning video classification

Author: Alex, 2020-04-01

1 answer

score 0 · Accepted Answer

The problem is not trivial. Rather than looking for an "optimal" solution, you should look for any solution at all, as long as it actually solves the problem. You also need to think about where to get the resources for it (and I don't mean computer hardware; that matters too, but it is the least of your concerns).

First, you need to learn to recognize the gestures themselves. Even for sign language, which consists not of arbitrary movements but of fairly standardized gestures used by deaf people, Google has only recently built a more or less working recognition system, and it seems they were the first to get real results, although people have been working on this for several years.

https://dev.by/news/google-sozdala-algoritm-raspoznavaniya-zhestov-rabotayushii-cherez-kameru-smartfona

Here are a couple more:

https://cyberleninka.ru/article/n/sistema-raspoznavaniya-zhestov-na-osnove-neyrosetevyh-tehnologiy/viewer

http://openarchive.nure.ua/bitstream/document/9173/1/kulishova_kazakova_PMW2019.pdf

As I understand it, your gestures can be arbitrary (at least you didn't tell us otherwise): scratching your nose, smoothing your hair, smiling, winking, making a face :-). So you need to start by being able to recognize them. I don't think there are ready-made solutions for that. The question is how to cope with this task. Compete with Google?

The second problem. Let's say you have extracted the gestures. Now you need to classify people by their gestures. And not by one gesture common to everyone, but under the condition that different people make different gestures (at least, that is what follows from your description of the task). At the same time, you need to represent the gestures in such a way that adequate discriminating features can be extracted from them. It is not clear what you mean by "methods", but most likely you will need neural networks here, most likely convolutional ones or architectures derived from them. Keep in mind that even recognition by voice or by photo is not yet a fully solved problem (I'm not talking about advertising claims, I'm talking about real tasks and projects), and you want to do it by gestures.
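To make the "convolutional networks on video" idea a little more concrete: such models are usually fed short fixed-length clips rather than whole videos, with frames subsampled to save memory. Below is a minimal sketch of that clip-sampling step only, not of a model. The function name and the default values (`clip_len=16`, `frame_step=4`, `clip_stride=32`) are my assumptions for illustration, not something taken from the question.

```python
def clip_frame_indices(num_frames, clip_len=16, frame_step=4, clip_stride=32):
    """Return a list of clips, each given as a list of frame indices.

    Each clip contains `clip_len` frames taken every `frame_step` frames;
    consecutive clips start `clip_stride` frames apart.
    All default values are illustrative assumptions.
    """
    if clip_len < 1 or frame_step < 1 or clip_stride < 1:
        raise ValueError("clip_len, frame_step and clip_stride must be >= 1")

    # Number of source frames spanned by one clip, first to last inclusive.
    span = (clip_len - 1) * frame_step + 1

    clips = []
    # Stop early enough that every clip fits entirely inside the video.
    for start in range(0, num_frames - span + 1, clip_stride):
        clips.append(list(range(start, start + span, frame_step)))
    return clips


if __name__ == "__main__":
    # Tiny example: a 10-frame video, clips of 3 frames taken every 2nd frame.
    for clip in clip_frame_indices(10, clip_len=3, frame_step=2, clip_stride=4):
        print(clip)
```

Only the frames listed in one batch of clips have to be decoded and held in memory at a time, which is what matters with a dataset of this size.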

I think the hardware you described is clearly not up to this task. Otherwise you will be training your networks for years :-). In addition, much as I like Python, I don't think it is the tool you will end up using for this task, at least at the training stage.
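To put the hardware concern in rough numbers, here is a back-of-the-envelope estimate. The 100 hours and 5,000,000 frames come from the question; the 224x224 RGB frame size is my assumption (a common input size for image networks), and the 4 GiB figure is the nominal memory of a GTX 970. Treat it as a sketch, not a benchmark.

```python
def estimate_dataset(hours=100, num_frames=5_000_000,
                     height=224, width=224, channels=3,
                     gpu_memory_gib=4):
    """Rough size estimate for a video dataset stored as raw uint8 frames.

    height/width/channels are assumed values, not taken from the question.
    """
    fps = num_frames / (hours * 3600)            # average frames per second
    bytes_per_frame = height * width * channels  # one byte per channel value
    total_bytes = bytes_per_frame * num_frames
    # Upper bound: ignores the model, activations and gradients entirely.
    frames_fitting_in_gpu = (gpu_memory_gib * 1024 ** 3) // bytes_per_frame
    return {
        "fps": fps,
        "bytes_per_frame": bytes_per_frame,
        "total_bytes": total_bytes,
        "frames_fitting_in_gpu": frames_fitting_in_gpu,
    }


if __name__ == "__main__":
    est = estimate_dataset()
    print(f"average frame rate: {est['fps']:.1f} fps")
    print(f"raw dataset size:   {est['total_bytes'] / 1e9:.0f} GB")
    print(f"frames that fit in GPU memory (upper bound): "
          f"{est['frames_fitting_in_gpu']}")
```

Under these assumptions the raw frames alone come to roughly 750 GB, so the data has to be streamed from disk in small batches and cannot be held in RAM or GPU memory as a whole.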

Generally speaking, the task amounts to a serious project with elements of scientific research. It is very unlikely that it can be done by an "amateur" without serious training in the field of machine learning.

As for the rest, of course, I wish you good luck.