get-dataset
function
Usage: (get-dataset dataset-key & {:keys [incanter-home from-repo], :or {incanter-home (or (getProperty incanter.home) (getenv INCANTER_HOME)), from-repo true}})
Returns the sample dataset associated with the given key. Most datasets
are from R's sample data sets, as are the descriptions below.
Options:
:incanter-home -- if the incanter.home property is not set when the JVM is
started (using -Dincanter.home) or there is no INCANTER_HOME
environment variable set, use the :incanter-home options to
provide the parent directory of the sample data directory.
:from-repo (default false) -- If true, retrieves the dataset from the online repository
instead of locally, it will do this by default if incanter-home is not set.
Datasets:
:iris -- the Fisher's or Anderson's Iris data set gives the
measurements in centimeters of the variables sepal
length and width and petal length and width,
respectively, for 50 flowers from each of 3 species
of iris.
:cars -- The data give the speed of cars and the distances taken
to stop. Note that the data were recorded in the 1920s.
:survey -- survey data used in Scott Lynch's 'Introduction to Applied Bayesian Statistics
and Estimation for Social Scientists'
:us-arrests -- This data set contains statistics, in arrests per 100,000
residents for assault, murder, and rape in each of the 50 US
states in 1973. Also given is the percent of the population living
in urban areas.
:flow-meter -- flow meter data used in Bland Altman Lancet paper.
:co2 -- has 84 rows and 5 columns of data from an experiment on the cold tolerance
of the grass species _Echinochloa crus-galli_.
:chick-weight -- has 578 rows and 4 columns from an experiment on the effect of diet
on early growth of chicks.
:plant-growth -- Results from an experiment to compare yields (as measured by dried
weight of plants) obtained under a control and two different
treatment conditions.
:pontius -- These data are from a NIST study involving calibration of load cells.
The response variable (y) is the deflection and the predictor variable
(x) is load.
See http://www.itl.nist.gov/div898/strd/lls/data/Pontius.shtml
:filip -- NIST data set for linear regression certification,
see http://www.itl.nist.gov/div898/strd/lls/data/Filip.shtml
:longely -- This classic dataset of labor statistics was one of the first used to
test the accuracy of least squares computations. The response variable
(y) is the Total Derived Employment and the predictor variables are GNP
Implicit Price Deflator with Year 1954 = 100 (x1), Gross National Product
(x2), Unemployment (x3), Size of Armed Forces (x4), Non-Institutional
Population Age 14 & Over (x5), and Year (x6).
See http://www.itl.nist.gov/div898/strd/lls/data/Longley.shtml
:Chwirut -- These data are the result of a NIST study involving ultrasonic calibration.
The response variable is ultrasonic response, and the predictor variable is
metal distance.
See http://www.itl.nist.gov/div898/strd/nls/data/LINKS/DATA/Chwirut1.dat
:thurstone -- test data for non-linear least squares.
:austres -- Quarterly Time Series of the Number of Australian Residents
:hair-eye-color -- Hair and eye color of sample of students
:airline-passengers -- Monthly Airline Passenger Numbers 1949-1960
:math-prog -- Pass/fail results for a high school mathematics assessment test
and a freshmen college programming course.
:iran-election -- Vote counts for 30 provinces from the 2009 Iranian election.
Examples:
(def data (get-dataset :cars))
(def data2 (get-dataset :cars :incanter.home "/usr/local/packages/incanter"))
Source