load_hillstrom_email_marketing

Loads the Hillstrom Email Marketing dataset from a local file.

Data description

This dataset contains 64,000 customers who last purchased within twelve months. The customers were involved in an e-mail test.

  • 1/3 were randomly chosen to receive an e-mail campaign featuring Mens merchandise.
  • 1/3 were randomly chosen to receive an e-mail campaign featuring Womens merchandise.
  • 1/3 were randomly chosen to not receive an e-mail campaign.

During a period of two weeks following the e-mail campaign, results were tracked. Your job is to tell the world if the Mens or Womens e-mail campaign was successful.
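One simple way to approach that question is to compare the average outcome of each e-mailed group with the group that received no e-mail. The sketch below is illustrative only: the helper names are hypothetical, the data is a toy example rather than the real dataset, and the treatment encoding (0 = no e-mail, 1 = Mens, 2 = Womens) is an assumption, not something this page specifies.

```python
from collections import defaultdict


def mean_outcome_by_group(treatment, target):
    """Return {treatment_label: mean outcome} for two parallel sequences."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, outcome in zip(treatment, target):
        totals[group] += float(outcome)
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in counts}


def uplift_vs_control(treatment, target, control_label):
    """Return the mean outcome of each treated group minus the control mean."""
    means = mean_outcome_by_group(treatment, target)
    control_mean = means[control_label]
    return {
        group: mean - control_mean
        for group, mean in means.items()
        if group != control_label
    }


# Toy data, not taken from the real dataset.
# Assumed encoding: 0 = no e-mail, 1 = Mens e-mail, 2 = Womens e-mail.
toy_treatment = [0, 0, 1, 1, 2, 2]
toy_visit = [0, 0, 1, 0, 1, 1]

toy_uplift = uplift_vs_control(toy_treatment, toy_visit, control_label=0)
print(toy_uplift)
```

A positive value suggests the campaign raised the outcome relative to no e-mail. A difference in means alone says nothing about statistical significance, so a real analysis would also need a significance test or confidence interval.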

Features 8
Treatment 3
Samples total 64,000
Average spend rate 1.05091
Average visit rate 0.14678
Average conversion rate 0.00903

More information about the dataset can be found in the official paper.

Parameters:
data_home: str, default=None
Specify another download and cache folder for the dataset.
By default the dataset will be stored in the data folder in the same folder.
load_raw_data: bool, default=False
Whether to load the raw data instead of the preprocessed data.
download_if_missing: bool, default=True
Download the dataset if it is not already available locally.
Returns:
dataset: dict
Dictionary object with the following attributes:
dataset.description : str
Description of the Hillstrom email marketing dataset.
dataset.data: numpy ndarray of shape (64000, 8)
Each row corresponds to the 8 feature values, in order.
dataset.feature_names: list, size 8
List of feature names.
dataset.treatment: numpy ndarray, shape (64000,)
Each value corresponds to the treatment.
dataset.target: numpy array of shape (64000,)
Each value corresponds to one of the outcomes. By default, it is the spend outcome (see target_spend below).
dataset.target_spend: numpy array of shape (64000,)
Each value corresponds to how much customers spent during a two-week outcome period.
dataset.target_visit: numpy array of shape (64000,)
Each value corresponds to whether people visited the site during a two-week outcome period.
dataset.target_conversion: numpy array of shape (64000,)
Each value corresponds to whether they purchased at the site (“conversion”) during a two-week outcome period.
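
As a rough illustration of how the fields described above fit together, the sketch below checks that they have consistent lengths and pairs feature names with the values of one row. It assumes the fields are read with dictionary keys, which may differ from how the real loader exposes them. The helper check_dataset, the toy dictionary, and its feature names are hypothetical stand-ins, not part of pyuplift.

```python
def check_dataset(dataset):
    """Check that the documented fields have matching lengths.

    Returns the number of samples, or raises ValueError on a mismatch.
    """
    n_samples = len(dataset['data'])
    n_features = len(dataset['feature_names'])
    for row in dataset['data']:
        if len(row) != n_features:
            raise ValueError('row length does not match feature_names')
    outcome_keys = ('treatment', 'target', 'target_spend',
                    'target_visit', 'target_conversion')
    for key in outcome_keys:
        if len(dataset[key]) != n_samples:
            raise ValueError(
                f'{key} has {len(dataset[key])} values, expected {n_samples}'
            )
    return n_samples


# Toy stand-in with the documented keys; names and values are made up.
toy_dataset = {
    'description': 'toy stand-in',
    'feature_names': ['feature_a', 'feature_b'],
    'data': [[1, 10.0], [2, 20.0], [3, 30.0]],
    'treatment': [0, 1, 2],
    'target': [0.0, 5.0, 0.0],
    'target_spend': [0.0, 5.0, 0.0],
    'target_visit': [0, 1, 1],
    'target_conversion': [0, 1, 0],
}

n_samples = check_dataset(toy_dataset)
# Pair each feature name with the matching value of the first row.
first_row = dict(zip(toy_dataset['feature_names'], toy_dataset['data'][0]))
print(n_samples, first_row)
```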

Examples

from pyuplift.datasets import load_hillstrom_email_marketing
# The loader returns a dictionary, not a DataFrame.
dataset = load_hillstrom_email_marketing()
print(dataset)