PROMO: Simple causal effects in time series

This dataset was proposed as part of the Causality Workbench. See also its page in the repository.

Summary

The PROMO dataset poses the task of identifying which promotions affect sales. It provides synthetic data for 1000 promotion variables and the sales of 100 products. The goal is to predict a 1000x100 boolean influence matrix in which element (i,j) indicates whether the ith promotion has a causal influence on the sales of the jth product. The data are given as time series, with one value per day for each variable over three years (i.e., 1095 days).
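As a point of reference, a naive (non-causal) baseline can be sketched: score every (promotion, product) pair by the absolute Pearson correlation between their daily series, then threshold the score to obtain a boolean 1000x100 matrix. This is a hypothetical illustration, not a method proposed by the dataset authors; the function names and the 0.2 threshold are arbitrary assumptions. Plain correlation ignores the seasonal baseline and can flag promotions that merely co-occur with influential ones, so it is only a starting point.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows are days, columns are variables


def column(m: Matrix, j: int) -> List[float]:
    """Extract column j (one variable's daily time series)."""
    return [row[j] for row in m]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length series; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def predict_influence(promotions: Matrix, products: Matrix,
                      threshold: float = 0.2) -> List[List[bool]]:
    """Hypothetical baseline: (i, j) is True when |corr(promotion i, product j)| > threshold."""
    promo_cols = [column(promotions, i) for i in range(len(promotions[0]))]
    prod_cols = [column(products, j) for j in range(len(products[0]))]
    return [[abs(pearson(p, s)) > threshold for s in prod_cols] for p in promo_cols]
```

For example, with four days, two promotions with daily states [0, 1, 0, 1] and [0, 0, 1, 1], and one product whose sales are 1 + 4 × (first promotion), the sketch returns [[True], [False]]. In pure Python the full 1000x100 problem is slow; a NumPy version would vectorize the same computation.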

Each of the 100 products has its own seasonal baseline, which repeats every year. The strength of the seasonal effect ranges from almost nonexistent to major. Promotions act on top of this baseline: each product is influenced by between 1 and 50 of the 1000 available promotions. Promotions usually increase sales relative to the baseline but can occasionally reduce them (e.g., promoting a similar, competing product may lower the sales of the current product). Daily variations are added on top of both.
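To make this structure concrete, the sales of one product can be sketched as: seasonal baseline + sum of the effects of the active promotions + daily noise. The exact functional forms used to build PROMO are not given here; the sinusoidal yearly baseline, the additive promotion effects and the Gaussian noise below are assumptions, and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List


def seasonal_baseline(level: float, amplitude: float, n_days: int,
                      period: int = 365) -> List[float]:
    """Assumed yearly-repeating baseline; amplitude near 0 means almost no seasonality."""
    return [level + amplitude * math.sin(2 * math.pi * (t % period) / period)
            for t in range(n_days)]


def simulate_product(baseline: List[float], effects: Dict[int, float],
                     promotions: List[List[int]], noise_sd: float,
                     rng: random.Random) -> List[float]:
    """Daily sales = baseline + effects of active promotions + Gaussian noise.

    effects maps a promotion index to its (usually positive, occasionally
    negative) additive effect on this product; promotions[t][i] is 1 when
    promotion i is active on day t.
    """
    sales = []
    for t, base in enumerate(baseline):
        lift = sum(w * promotions[t][i] for i, w in effects.items())
        sales.append(base + lift + rng.gauss(0.0, noise_sd))
    return sales
```

Keeping the effects in a small dictionary mirrors the sparsity described above: each product depends on only 1 to 50 of the 1000 promotions.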

Each of the 1000 promotions may or may not be seasonal; that is, its pattern may repeat from one year to the next or be completely different each year. However, the average time a promotion stays active, and the average time it stays inactive, is constant for each promotion.
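One simple way to obtain constant average active and inactive durations, offered here as an assumption rather than the documented generator, is a two-state Markov chain that ends an active spell with probability 1/mean_on per day and ends an inactive spell with probability 1/mean_off, which yields geometrically distributed spell lengths with those means. A seasonal promotion can then reuse one simulated year for every year. All names are hypothetical.

```python
import random
from typing import List


def markov_schedule(n_days: int, mean_on: float, mean_off: float,
                    rng: random.Random) -> List[int]:
    """0/1 daily schedule whose active/inactive spells have the given mean lengths (>= 1 day)."""
    p_stop, p_start = 1.0 / mean_on, 1.0 / mean_off
    # Start in the chain's stationary distribution (fraction of time active).
    active = rng.random() < mean_on / (mean_on + mean_off)
    schedule = []
    for _ in range(n_days):
        schedule.append(int(active))
        # Leave the current state with the per-day switching probability.
        if rng.random() < (p_stop if active else p_start):
            active = not active
    return schedule


def promotion_schedule(n_days: int, mean_on: float, mean_off: float,
                       seasonal: bool, rng: random.Random,
                       year: int = 365) -> List[int]:
    """Seasonal promotions repeat the same one-year pattern; others are drawn afresh."""
    if seasonal:
        one_year = markov_schedule(year, mean_on, mean_off, rng)
        return [one_year[t % year] for t in range(n_days)]
    return markov_schedule(n_days, mean_on, mean_off, rng)
```

Under this model, seasonality changes only whether the pattern repeats, while the mean spell lengths stay the same in both cases.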

The weighted, normalized influence matrix is provided for evaluating results. It is normalized so that the maximum positive contribution is 1 and the maximum negative contribution is –1; each nonzero (i,j) entry is weighted by how strongly promotion i affects product j. Algorithms may be asked to output either a boolean influence matrix or a weighted matrix similar to the one provided for evaluation.
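The normalization can be sketched as follows, assuming a raw matrix of signed effect sizes and assuming a single maximum is taken over the whole matrix (the text does not say whether it is per matrix or per product). Positive entries are divided by the largest positive entry and negative entries by the magnitude of the most negative one; the boolean matrix is the pattern of nonzero entries. The precision/recall helper is only an illustrative way to compare a boolean prediction against it, not the official scoring rule. Function names are hypothetical.

```python
from typing import List, Tuple


def normalize_influence(raw: List[List[float]]) -> List[List[float]]:
    """Scale so the largest positive entry is 1 and the most negative is -1."""
    flat = [v for row in raw for v in row]
    max_pos = max((v for v in flat if v > 0), default=0.0)
    max_neg = max((-v for v in flat if v < 0), default=0.0)

    def scale(v: float) -> float:
        if v > 0:
            return v / max_pos
        if v < 0:
            return v / max_neg
        return 0.0

    return [[scale(v) for v in row] for row in raw]


def to_boolean(weighted: List[List[float]]) -> List[List[bool]]:
    """Boolean influence matrix: True wherever the weighted entry is nonzero."""
    return [[v != 0 for v in row] for row in weighted]


def precision_recall(pred: List[List[bool]],
                     truth: List[List[bool]]) -> Tuple[float, float]:
    """Illustrative comparison of a predicted boolean matrix with the true one."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

For example, the raw matrix [[4, 0], [-2, 2], [0, -0.5]] becomes [[1, 0], [-1, 0.5], [0, -0.25]].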

Files

The zipped archive contains three text files, each describing a matrix. Each line of a file holds one row of the matrix, with values separated by spaces. The files are:

  • products.dat: a 1095x100 matrix. Column i holds the daily sales of product i over the three years, as a time series of a continuous variable;
  • promotions.dat: a 1095x1000 binary matrix indicating on which days each of the 1000 promotions is active;
  • influence.dat: the 1000x100 influence matrix. The (i,j)th entry is zero if promotion i has no influence on product j; otherwise, it is a continuous value between –1 and 1.
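Because each file is plain whitespace-delimited text, it can be read with the standard library alone. The sketch below assumes PROMO.zip has been extracted so that the three .dat files sit directly in one directory (the archive's internal layout is not described here); the helper names are hypothetical. It checks the shapes listed above, which catches a transposed or truncated file early.

```python
import os
from typing import Iterable, List, Tuple

Matrix = List[List[float]]


def read_matrix(lines: Iterable[str]) -> Matrix:
    """Parse one matrix row per line, values separated by whitespace; skip blank lines."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def check_shape(m: Matrix, n_rows: int, n_cols: int, name: str) -> None:
    """Raise ValueError unless m has exactly n_rows rows of n_cols values."""
    if len(m) != n_rows or any(len(row) != n_cols for row in m):
        raise ValueError(f"{name}: expected a {n_rows}x{n_cols} matrix")


def load_promo(directory: str = ".") -> Tuple[Matrix, Matrix, Matrix]:
    """Load products, promotions and influence from an extracted PROMO.zip."""
    expected = {"products.dat": (1095, 100),
                "promotions.dat": (1095, 1000),
                "influence.dat": (1000, 100)}
    result = []
    for name, (rows, cols) in expected.items():
        with open(os.path.join(directory, name)) as f:
            m = read_matrix(f)
        check_shape(m, rows, cols, name)
        result.append(m)
    return result[0], result[1], result[2]
```

With NumPy available, numpy.loadtxt would read each file in a single call instead.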

Download

Download PROMO.zip