make_linear_regression¶
Generate data by formula.
Data description¶
Synthetic data generated by Generate data by formula:
Y' = X1 + X2 * T + E
Y = Y', if Y' - int(Y') > eps,
Y = 0, otherwise.
Statistics for default parameters and size equals 100,000:
Features | 3 |
Treatment | 2 |
Samples total | size |
Y not equals 0 | 0.49438 |
Y values | 0 to 555.93 |
Parameters: | size: integer
The number of observations.
x1_params : tuple(mu, sigma), default: (0, 1)
The feature with gaussian distribution and mean=mu, sd=sigma.
X1 ~ N(mu, sigma)
x2_params : tuple(mu, sigma), default: (0, 0.1)
The feature with gaussian distribution and mean=mu, sd=sigma.
X2 ~ N(mu, sigma)
x3_params : tuple(mu, sigma), default: (0, 1)
The feature with gaussian distribution and mean=mu, sd=sigma.
X3 ~ N(mu, sigma)
t_params : tuple(mu, sigma), default: (0, 1)
The treatment with uniform distribution. Min value=min, Max value=max-1
T ~ R(min, max)
e_params : tuple(mu, sigma), default: (0, 1)
The error with gaussian distribution and mean=mu, sd=sigma.
E ~ N(mu, sigma)
eps : tuple(mu, sigma), default: (0, 1)
The border value.
random_state : integer, default=777
random_state is the seed used by the random number generator.
|
Returns: | dataset: pandas DataFrame
Generated data.
|
Examples¶
from pyuplift.datasets import make_linear_regression
df = make_linear_regression(10000)
print(df)