Main Usage
To load a dataset, do the following:
In [3]: import statsmodels.api as sm
In [4]: data = sm.datasets.longley.load()
The Dataset object follows the bunch pattern as explained in the proposal.
Most datasets have two attributes of particular interest for examples, endog and exog:
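For readers unfamiliar with the term, the bunch pattern generally means an object whose fields can be read both as attributes and as dictionary keys. The sketch below shows the idea only. The `Bunch` class and the `example` object are hypothetical and are not the statsmodels `Dataset` implementation.

```python
class Bunch(dict):
    """Hypothetical minimal bunch: a dict whose keys are also attributes."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # Use the dict itself as the instance namespace, so that
        # obj.key and obj["key"] always refer to the same value.
        self.__dict__ = self


# Illustrative values only: the first three TOTEMP observations of longley.
example = Bunch(endog=[60323.0, 61122.0, 60171.0], endog_name="TOTEMP")

print(example.endog_name)      # attribute access
print(example["endog_name"])   # dictionary access
```

The appeal of the pattern is that a loaded dataset can carry an open-ended set of named fields without a dedicated class per dataset.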
In [5]: data.endog
Out[5]:
array([ 60323., 61122., 60171., 61187., 63221., 63639., 64989.,
63761., 66019., 67857., 68169., 66513., 68655., 69564.,
69331., 70551.])
In [6]: data.exog
Out[6]:
array([[ 83. , 234289. , 2356. , 1590. , 107608. , 1947. ],
[ 88.5, 259426. , 2325. , 1456. , 108632. , 1948. ],
[ 88.2, 258054. , 3682. , 1616. , 109773. , 1949. ],
[ 89.5, 284599. , 3351. , 1650. , 110929. , 1950. ],
[ 96.2, 328975. , 2099. , 3099. , 112075. , 1951. ],
[ 98.1, 346999. , 1932. , 3594. , 113270. , 1952. ],
[ 99. , 365385. , 1870. , 3547. , 115094. , 1953. ],
[ 100. , 363112. , 3578. , 3350. , 116219. , 1954. ],
[ 101.2, 397469. , 2904. , 3048. , 117388. , 1955. ],
[ 104.6, 419180. , 2822. , 2857. , 118734. , 1956. ],
[ 108.4, 442769. , 2936. , 2798. , 120445. , 1957. ],
[ 110.8, 444546. , 4681. , 2637. , 121950. , 1958. ],
[ 112.6, 482704. , 3813. , 2552. , 123366. , 1959. ],
[ 114.2, 502601. , 3931. , 2514. , 125368. , 1960. ],
[ 115.7, 518173. , 4806. , 2572. , 127852. , 1961. ],
[ 116.9, 554894. , 4007. , 2827. , 130081. , 1962. ]])
Univariate datasets, however, do not have an exog attribute. You can find the variable names with:
In [7]: data.endog_name
Out[7]: 'TOTEMP'
In [8]: data.exog_name
Out[8]: ['GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
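As a small illustration of how the names line up with the arrays, the sketch below pairs each name with its value in the first observation. The values are copied by hand from the output above rather than loaded from statsmodels, and it assumes the columns of exog appear in the same order as exog_name, which the YEAR column in the output suggests.

```python
# Values copied from the output above: the variable names, the first
# element of data.endog, and the first row of data.exog.
endog_name = "TOTEMP"
exog_name = ["GNPDEFL", "GNP", "UNEMP", "ARMED", "POP", "YEAR"]
first_endog = 60323.0
first_exog = [83.0, 234289.0, 2356.0, 1590.0, 107608.0, 1947.0]

# Pair each regressor name with its value in the first observation,
# assuming exog columns follow the order of exog_name.
first_obs = dict(zip(exog_name, first_exog))
first_obs[endog_name] = first_endog

for name, value in first_obs.items():
    print(f"{name:>8}: {value}")
```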
If the dataset does not have a clear interpretation of what should be an endog and exog, then you can always access the data or raw_data attributes. This is the case for the macrodata dataset, which is a collection of US macroeconomic data rather than a dataset with a specific example in mind. The data attribute contains a record array of the full dataset. The raw_data attribute contains a plain ndarray of the same values, with the column names given by the names attribute.
In [9]: type(data.data)
Out[9]: numpy.core.records.recarray
In [10]: type(data.raw_data)
Out[10]: numpy.ndarray
In [11]: data.names
Out[11]: ['TOTEMP', 'GNPDEFL', 'GNP', 'UNEMP', 'ARMED', 'POP', 'YEAR']
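Because raw_data is a plain ndarray, columns have to be looked up by position using names. The sketch below shows one way to do that on a hand-built stand-in array. It assumes the columns are stored in the order given by names and that the first name is the endog variable, as the longley output above suggests. The variable `raw` is illustrative and is not loaded from statsmodels.

```python
import numpy as np

names = ["TOTEMP", "GNPDEFL", "GNP", "UNEMP", "ARMED", "POP", "YEAR"]

# Stand-in for data.raw_data: the first two observations shown above,
# assuming the columns are stored in the order given by `names`.
raw = np.array([
    [60323.0, 83.0, 234289.0, 2356.0, 1590.0, 107608.0, 1947.0],
    [61122.0, 88.5, 259426.0, 2325.0, 1456.0, 108632.0, 1948.0],
])

# Look up a single column by its name.
gnp = raw[:, names.index("GNP")]

# Split into response and regressors, assuming the first column is endog.
endog, exog = raw[:, 0], raw[:, 1:]

print(gnp)
print(endog.shape, exog.shape)
```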