Datasets

whatUseek Directory Site Listings:
 
AdEater data - AdEater is a program that learns to remove Internet advertisements. The machine learning dataset is available from this page.
 
Bilkent University Function Approximation Repository - Datasets used for the experimental analysis of function approximation techniques and for training and demonstration by the machine learning and statistics community.
 
DELVE - Data for Evaluating Learning in Valid Experiments - A standardized environment designed to evaluate the performance of methods that learn relationships based primarily on empirical data. Delve makes it possible for users to compare their learning methods with other methods on many datasets.
 
DNA microarray gene expression data - A collection of public gene expression data sources maintained by A. Brazma.
 
Dataset generator - Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets can be used to validate learning algorithms.
 
Face recognition dataset - A dataset of face images for face recognition algorithms.
 
ILP Applications and Datasets - A collection of datasets used with Inductive Logic Programming algorithms. Includes drug structure-activity, mutagenesis, protein secondary structure, chess, etc.
 
Learning Relational Concepts from Sensor Data of a Mobile Robot - A collection of data sets, each represented in first-order logic. Maintained at the University of Dortmund, Germany.
 
Leukemia ALL and AML Datasets - Gene expression data used for molecular classification of cancer.
 
NIST Special Database 4 - This NIST database of fingerprint images contains 2000 8-bit grayscale fingerprint image pairs.
 
National Space Science Data Center - Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight missions, in addition to selected other data and some models and software.
 
Penn Treebank Project - A corpus of parsed sentences, used by many researchers for training data-driven parsing algorithms.
 
RISE: Repository of Information Sources used in information Extraction tasks - Repository of online information sources that serve as test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
 
TREC Data - Text datasets used in information retrieval and learning in text domains.
 
The 20 Newsgroups Data Set - A widely used collection of postings from 20 newsgroups for text categorization.
 
The RCSB Protein Data Bank (PDB) - Archive of experimentally determined 3-D structures of biological macromolecules from the Brookhaven National Laboratory.
 
The Reuters-21578 Text Categorization Test Collection - A classic benchmark for text categorization algorithms. Maintained at AT&T.
 
The StatLib Datasets Archive - A repository of datasets used in statistics and machine learning.
 
Time Series Data Library - A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject.
 
UCI Machine Learning Repository
 
 


© 2018 whatUseek