A selection of datasets for building machine learning models
This question collects lists of datasets and data labeling utilities. Please edit the existing answer.
2
Author: hedgehogues, 2020-04-23
1 answer
- Russian-Chinese parallel texts, ~8 million rows
- An old collection of data repositories; much of the data is outdated
- ODS Community dataset collection
- Data with photos of people's faces by nationality
- Data labeling tools
- Kaggle datasets
- Collection of datasets on text summarization
- Apple's statistics on the mobility of its users in connection with COVID-19
- CSSEGISandData COVID-19
- Utility for manual data labeling
- Photos of roofs of houses 1
- Similarweb dataset. Domain Categorizer
- Data on 1941-1945 veterans from the website of the Ministry of Defense, about a million records
- A 12.9-billion-token sample of the lib.rus.ec book collection (150 GB of raw text)
- Collection of books in Russian
- Toxicity dataset in Russian
- Russian Language Toxic Comments
- Collection of photos of people in masks (Chinese)
- COVID-19 data collection in Russia
- Collection of data on COVID-19 in Russia, created with the support of INVITRO
- Images and descriptions
- Text corpora in Portuguese
1
Author: hedgehogues, 2020-05-26 17:48:05