load_lalonde_nsw
Load the Lalonde NSW dataset from a local file, downloading it first if it is missing.
Data description
The dataset contains the treated and control units from the male sub-sample of the National Supported Work Demonstration, as used by Lalonde in his paper.
Features | 7 |
Treatment | 2 |
Samples total | 722 |
Features description
- treat - an indicator variable for treatment status.
- age - age in years.
- educ - years of schooling.
- black - indicator variable for blacks.
- hisp - indicator variable for Hispanics.
- married - indicator variable for marital status.
- nodegr - indicator variable for having no high school diploma.
- re75 - real earnings in 1975.
- re78 - real earnings in 1978.
More information about the dataset can be found here.
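To make the column layout concrete, the sketch below pairs one row of feature values with its column names. The column order is an assumption (the seven covariates in the order listed above, with treat and re78 excluded because they are returned separately as the treatment and the target), and both the helper `row_to_record` and the sample row are hypothetical, not taken from pyuplift or the real data.

```python
# Assumed column order: the seven covariates in the order listed above.
# treat and re78 are left out because they are the treatment and the outcome.
FEATURE_NAMES = ["age", "educ", "black", "hisp", "married", "nodegr", "re75"]


def row_to_record(row, feature_names=FEATURE_NAMES):
    """Pair each value in a feature row with its column name."""
    if len(row) != len(feature_names):
        raise ValueError("row length does not match the number of features")
    return dict(zip(feature_names, row))


# Made-up row for illustration only; not read from the dataset.
example_row = [30, 11, 1, 0, 0, 1, 0.0]
record = row_to_record(example_row)
print(record["educ"], record["nodegr"])
```

Naming the columns this way makes it harder to confuse the indicator variables (black, hisp, married, nodegr) with the numeric ones (age, educ, re75).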
Parameters: | data_home: str, default=None
Specify another download and cache folder for the dataset.
By default the dataset will be stored in the data folder in the same folder.
download_if_missing: bool, default=True
Download the dataset if it is not already available locally.
|
Returns: | dataset: dict
Dictionary object with the following attributes:
dataset.description : str
Description of the Lalonde NSW dataset.
dataset.data: numpy ndarray of shape (722, 7)
Each row corresponds to the 7 feature values, in order.
dataset.feature_names: list, size 7
List of feature names.
dataset.treatment: numpy ndarray of shape (722,)
Each value corresponds to the treatment.
dataset.target: numpy ndarray of shape (722,)
Each value corresponds to the outcome for one sample. By default, the outcome is re78.
|
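As a rough illustration of how the treatment and target fields fit together, the sketch below computes a naive difference in mean outcome between treated and control units. It runs on a small made-up dictionary that mimics the documented keys; the values are invented, plain lists stand in for NumPy arrays, and the helper `naive_effect` is hypothetical, not part of pyuplift. It also assumes that treatment is coded 1 for treated and 0 for control.

```python
from statistics import mean


def naive_effect(treatment, target):
    """Return mean(target | treated) - mean(target | control)."""
    treated = [y for t, y in zip(treatment, target) if t == 1]
    control = [y for t, y in zip(treatment, target) if t == 0]
    if not treated or not control:
        raise ValueError("both treated and control units are required")
    return mean(treated) - mean(control)


# Toy stand-in for the returned dictionary; values are invented.
toy = {
    "treatment": [1, 0, 1, 0],
    "target": [6000.0, 4000.0, 5000.0, 3000.0],
}
effect = naive_effect(toy["treatment"], toy["target"])
print(effect)
```

The same function could be applied to the treatment and target fields of the real dataset. A raw difference in means ignores the covariates, so it is only a sanity check and not an uplift model.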
Examples
from pyuplift.datasets import load_lalonde_nsw

# The loader returns a dictionary, not a DataFrame
dataset = load_lalonde_nsw()
print(dataset)