HIVA is the HIV infection database

The task of HIVA is to predict which compounds are active against HIV, the virus that causes AIDS. The original data has 3 classes (active, moderately active, and inactive). We reduced it to a two-class classification problem (active vs. inactive), but we provide the original three-class labels for the "prior knowledge" track. For the "prior knowledge" track, the compounds are represented by their 3D molecular structure. For the "agnostic" track, they are represented by 2000 sparse binary input variables, which encode properties of the molecule inferred from its structure. The problem is therefore to relate structure to activity (a quantitative structure-activity relationship, or QSAR, problem) in order to screen new compounds before actually testing them (a high-throughput screening, or HTS, problem).
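The two data transformations described above (collapsing three activity classes into two, and encoding each compound as 2000 sparse binary variables) can be sketched as follows. This is an illustrative sketch, not the official preprocessing. Several details are assumptions not stated in the text: the +1/-1 label coding, the 0-based feature indices, the helper names `binarize_label` and `to_dense`, and especially the grouping of "moderately active" compounds, which is exposed as the hypothetical `moderate_is_active` parameter because the description does not say which side of the split they fall on.

```python
from typing import List, Set

N_FEATURES = 2000  # number of sparse binary variables in the "agnostic" track


def binarize_label(label: str, moderate_is_active: bool = True) -> int:
    """Collapse a three-class activity label to +1 (active) or -1 (inactive).

    Assumption: where "moderately active" goes is not stated in the dataset
    description, so it is configurable (positive class by default here).
    """
    if label == "active":
        return 1
    if label == "inactive":
        return -1
    if label == "moderately active":
        return 1 if moderate_is_active else -1
    raise ValueError(f"unknown label: {label!r}")


def to_dense(active_indices: Set[int], n_features: int = N_FEATURES) -> List[int]:
    """Expand the set of 'on' feature indices (assumed 0-based) into a 0/1 vector."""
    row = [0] * n_features
    for j in active_indices:
        if not 0 <= j < n_features:
            raise IndexError(f"feature index {j} out of range")
        row[j] = 1
    return row


# Hypothetical compounds: (indices of structural properties present, original label)
compounds = [
    ({3, 17, 1999}, "active"),
    ({0, 5}, "inactive"),
    ({42}, "moderately active"),
]
X = [to_dense(indices) for indices, _ in compounds]
y = [binarize_label(label) for _, label in compounds]
print(y)  # [1, -1, 1]
```

Because most variables are 0 for any given molecule, storing only the indices of the variables that are 1 is much more compact than the dense 2000-element row; the dense form is built only when a learning algorithm requires it.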