load_lalonde_nsw

Loads the Lalonde NSW dataset from the local file.

Data description

The dataset contains the treated and control units from the male sub-sample from the National Supported Work Demonstration as used by Lalonde in his paper.

Features	7
Treatment	2
Samples total	722

Features description

treat - an indicator variable for treatment status.
age - age in years.
educ - years of schooling.
black - indicator variable for blacks.
hisp - indicator variable for Hispanics.
married - indicator variable for marital status.
nodegr - indicator variable for the absence of a high school diploma.
re75 - real earnings in 1975.
re78 - real earnings in 1978.

More information about the dataset can be found here.

Parameters:

data_home: str, default=None

Specify another download and cache folder for the dataset.

By default, the dataset is stored in the data folder in the same folder.

download_if_missing: bool, default=True

Download the dataset if it is not already available locally.

Returns:

dataset: dict

Dictionary object with the following attributes:

dataset.description : str

Description of the Lalonde NSW dataset.

dataset.data: numpy ndarray of shape (722, 7)

Each row corresponds to the 7 feature values, in order.

dataset.feature_names: list, size 7

List of feature names.

dataset.treatment: numpy ndarray, shape (722,)

Each value corresponds to the treatment status of one sample.

dataset.target: numpy array of shape (722,)

Each value corresponds to one of the outcomes. By default, it is the re78 outcome.

Examples

from pyuplift.datasets import load_lalonde_nsw
# The loader returns a dictionary object, not a DataFrame.
dataset = load_lalonde_nsw()
print(dataset)
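As a rough illustration of how the treatment and target arrays described above might be combined, the sketch below computes a naive difference in mean outcomes between treated and control units. The helper name `difference_in_means` is hypothetical and not part of pyuplift, the code assumes that a treatment value of 1 marks a treated unit and 0 a control unit, and the toy values are made-up placeholders rather than real Lalonde NSW records.

```python
def difference_in_means(treatment, target):
    """Mean outcome of treated units minus mean outcome of control units.

    Hypothetical helper (not part of pyuplift). Assumes treatment == 1
    marks a treated unit and treatment == 0 marks a control unit.
    Works on any pair of equal-length sequences, including NumPy arrays.
    """
    treated = [float(y) for t, y in zip(treatment, target) if t == 1]
    control = [float(y) for t, y in zip(treatment, target) if t == 0]
    if not treated or not control:
        raise ValueError("both treated and control units are required")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Made-up placeholder values, NOT taken from the Lalonde NSW data.
toy_treatment = [1, 1, 0, 0]
toy_target = [10.0, 14.0, 8.0, 6.0]
print(difference_in_means(toy_treatment, toy_target))  # 5.0
```

With the real loader, the same function could presumably be given the treatment and target arrays of the returned object. The result is only a raw comparison of group means and makes no adjustment for the features.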