This page lists some datasets used for deep learning, neural networks, classification, and related tasks. I try to update it with new datasets as soon as I can.
I'm currently looking for translation datasets. If you know of any (top quality only) that are not listed on this page, please contact me.
If you find a broken link or a mistake on this page, please contact me.
Database of handwritten digits, with a training set of 60,000 examples and a test set of 10,000 examples.
The entire corpus of training materials for handprinted document and character recognition. It contains over 800,000 images with hand-checked classifications.
Consists of 60,000 32x32 colour images in 10 classes (airplane, bird, cat, truck, ...) with 6,000 images per class. There are 50,000 training images and 10,000 test images.
Pictures of objects belonging to 101 categories. Each category has about 40 to 800 images, and each image is roughly 300 x 200 pixels.
Pictures of objects belonging to 256 categories.
A collection of approximately 20,000 newsgroup documents, distributed (almost) evenly across 20 different newsgroups.
A large collection of Reuters News articles for use in the research and development of natural language processing, information retrieval, and machine learning systems.
Provides semantic role annotation and predicate sense disambiguation for roughly 50,000 predicates, corresponding to all verbs, all adjectives in equational clauses and all nouns considered to be predicative.
Contains over 1.8 million articles written and published by the New York Times between January 1, 1987 and June 19, 2007.
Contains English word n-grams and their observed frequency counts. The length of the n-grams ranges from unigrams (single words) to five-grams.
Wikipedia offers free copies of all available content to interested users.
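For the word n-gram dataset described above (unigrams up to five-grams with their observed frequency counts), here is a minimal sketch of what such counts look like when computed on a toy sentence. The function name `ngram_counts`, the whitespace tokenization, and the sample sentence are my own assumptions for illustration; the dataset's actual tokenization and file format may differ.

```python
from collections import Counter


def ngram_counts(tokens, max_n=5):
    """Count every n-gram of length 1..max_n in a list of tokens.

    Hypothetical helper for illustration only; it is not part of any
    dataset's tooling.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy input, split on whitespace (an assumption, not the dataset's tokenizer).
tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)

# Print the unigrams with their frequency counts, most frequent first.
for ngram, freq in counts.most_common():
    if len(ngram) == 1:
        print(" ".join(ngram), freq)
```

Each key is a tuple of words, so the same `Counter` holds unigrams and five-grams side by side. Real n-gram corpora are far too large for this in-memory approach and are usually read as pre-computed count files.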