TabArena Benchmark Results
Context: TabArena is a NeurIPS 2025 benchmark suite of 38 classification datasets within a larger living tabular benchmark. This page positions the current tabnetics general run against the current TabArena leaderboard using the same core scoring model: task-weighted pairwise battles, MLE Elo with bootstrap confidence intervals, average rank, win rate, MRR, and normalized score.
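As a rough illustration of the scoring model named above, the sketch below fits Bradley-Terry strengths to a matrix of pairwise battle wins, maps them to an Elo scale, and derives a percentile bootstrap interval by resampling tasks. It is not the TabArena implementation: the function names, the MM fitting loop, the 400-point logistic scale, and anchoring method 0 at 1000 are all assumptions chosen for clarity.

```python
import numpy as np

def bradley_terry_elo(wins, anchor=0, anchor_elo=1000.0, iters=1000, tol=1e-12):
    """Fit Bradley-Terry strengths by MM iterations and map them to Elo.

    wins[i, j] is the (possibly fractional) number of battles method i won
    against method j. Every method needs at least one win and one loss,
    otherwise the maximum-likelihood estimate is not finite.
    """
    wins = np.asarray(wins, dtype=float)
    games = wins + wins.T                      # battles played per pair
    total_wins = wins.sum(axis=1)
    p = np.full(len(wins), 1.0 / len(wins))    # uniform starting strengths
    for _ in range(iters):
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        new_p = total_wins / denom
        new_p /= new_p.sum()                   # fix the arbitrary scale
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    # Hypothetical calibration: the anchor method is pinned to anchor_elo.
    return anchor_elo + 400.0 * np.log10(p / p[anchor])

def bootstrap_elo_interval(task_wins, rounds=200, seed=0):
    """Percentile 95% interval for Elo, resampling tasks with replacement.

    task_wins has shape (n_tasks, n_methods, n_methods). Each task's matrix
    is assumed to be pre-scaled so that every task carries equal weight.
    """
    task_wins = np.asarray(task_wins, dtype=float)
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(rounds):
        idx = rng.integers(0, len(task_wins), size=len(task_wins))
        samples.append(bradley_terry_elo(task_wins[idx].sum(axis=0)))
    low, high = np.percentile(samples, [2.5, 97.5], axis=0)
    return low, high
```

With only 38 tasks, resampling changes the battle totals noticeably from round to round, which is one plausible reason the reported intervals are wide, especially for the small multiclass subset.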
Note: this is an informational snapshot showing how tabnetics competes on general tabular data. General-tabular optimization is not currently the focus of the library, and this page should be read as a reference point rather than as a formal leaderboard submission.
Interactive browser: The same public snapshot is also available in the static Results Browser alongside the HDLSS benchmark explorer.
General profile
Run configuration
| Parameter | Value |
|---|---|
| Profile | general |
| Seeds | 42 |
| Max training samples | 50000 |
| Task timeout | 3600 s |
| Workers | 12 |
| Classifier oracle | MNPO hybrid |
| Leaderboard bootstrap | 200 rounds |
Current run snapshot
On the 38 classification datasets currently covered by the merged general profile run, tabnetics (general) receives Elo 1012.1 in the overall leaderboard-style comparison, with normalized score 0.105.
The overall, binary, and multiclass leaderboard rows are shown below, followed by the dataset-level results from this run against the current official benchmark table.
Tabnetics row
| method | Elo | Elo 95% CI | Score | Rank | Winrate | MRR |
|---|---|---|---|---|---|---|
| tabnetics (general) | 1012.1 | +105/-121 | 0.105 | 32.56 | 0.283 | 0.116 |
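
The Rank, Winrate, and MRR columns can all be derived from per-dataset error values. The sketch below shows one way to do this, assuming lower error is better, ties share an average rank, and a tie counts as half a win. The function name `rank_metrics` and the toy error matrix are hypothetical, and the official leaderboard code may differ in detail.

```python
import numpy as np

def rank_metrics(errors):
    """Return (average rank, win rate, MRR) per method.

    errors has shape (n_datasets, n_methods); lower values are better.
    """
    errors = np.asarray(errors, dtype=float)
    n_methods = errors.shape[1]
    # For each dataset and method i, count methods j with strictly lower error.
    better = (errors[:, None, :] < errors[:, :, None]).sum(axis=2)
    # Count ties with other methods (subtract 1 to exclude the method itself).
    ties = (errors[:, None, :] == errors[:, :, None]).sum(axis=2) - 1
    ranks = 1.0 + better + 0.5 * ties          # average rank under ties
    avg_rank = ranks.mean(axis=0)
    # Share of the other methods beaten on each dataset, then averaged.
    win_rate = ((n_methods - ranks) / (n_methods - 1)).mean(axis=0)
    mrr = (1.0 / ranks).mean(axis=0)           # mean reciprocal rank
    return avg_rank, win_rate, mrr

# Toy example: 2 datasets, 3 methods.
avg_rank, win_rate, mrr = rank_metrics([[0.1, 0.2, 0.3],
                                        [0.3, 0.1, 0.2]])
```

Under these assumptions the win rate is a linear function of the average rank. With 45 compared methods (the largest dataset rank appearing on this page), an average rank of 32.56 gives (45 - 32.56) / 44 ≈ 0.283, which matches the Winrate column above.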
Nearby overall leaderboard rows
| method | Elo | Score | Rank | Winrate |
|---|---|---|---|---|
| NN_TORCH (default) | 1079.2 | 0.017 | 29.68 | 0.348 |
| FASTAI (default) | 1040.5 | 0.038 | 31.39 | 0.309 |
| tabnetics (general) | 1012.1 | 0.105 | 32.56 | 0.283 |
| RF (default) | 1000.0 | 0.022 | 33.04 | 0.272 |
| LR (tuned + ensemble) | 980.6 | 0.027 | 33.78 | 0.255 |
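
To put the Elo gaps between neighbouring rows in perspective, an Elo difference can be converted into an expected head-to-head win probability. The sketch below assumes the conventional logistic Elo scale, where 400 points correspond to 10:1 odds. Whether the leaderboard uses exactly this scale is an assumption here, and `elo_win_probability` is a hypothetical helper name.

```python
def elo_win_probability(elo_a, elo_b, scale=400.0):
    """Expected probability that method A beats method B on a random task."""
    return 1.0 / (1.0 + 10.0 ** ((elo_b - elo_a) / scale))

# Values taken from the table above.
p_vs_rf = elo_win_probability(1012.1, 1000.0)        # tabnetics vs RF (default)
p_vs_nn = elo_win_probability(1012.1, 1079.2)        # tabnetics vs NN_TORCH (default)
```

Under that assumption, the 12.1-point lead over RF (default) corresponds to a win probability of roughly 0.52, so the two are close to evenly matched, and the gap is well inside the reported confidence interval.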
Binary leaderboard row
| method | Elo | Elo 95% CI | Score | Rank | Winrate | MRR |
|---|---|---|---|---|---|---|
| tabnetics (general) | 1008.3 | +135/-151 | 0.076 | 32.84 | 0.276 | 0.099 |
Multiclass leaderboard row
| method | Elo | Elo 95% CI | Score | Rank | Winrate | MRR |
|---|---|---|---|---|---|---|
| tabnetics (general) | 1027.3 | +269/-481 | 0.213 | 31.49 | 0.307 | 0.176 |
Per-dataset comparison against the current official best method
| Dataset | problem_type | Bal. Acc. | Tabnetics metric_error | Best official metric_error | Best official method | Delta vs best | Dataset rank | Selected model |
|---|---|---|---|---|---|---|---|---|
| APSFailure | binary | 0.957 | 0.0139 | 0.0071 | TABICL (default) | 0.0069 | 42 | mnpo_lr |
| Amazon_employee_access | binary | 0.756 | 0.1437 | 0.1168 | CAT (tuned) | 0.0269 | 11 | mnpo_rf |
| Bank_Customer_Churn | binary | 0.766 | 0.1589 | 0.1256 | TABPFNV2 (tuned) | 0.0333 | 39 | mnpo_lr |
| Bioresponse | binary | 0.788 | 0.1459 | 0.1243 | XGB (tuned + ensemble) | 0.0216 | 36 | mnpo_rf |
| Diabetes130US | binary | 0.573 | 0.39 | 0.3277 | GBM (tuned + ensemble) | 0.0623 | 41 | mnpo_lr |
| E-CommereShippingData | binary | 0.688 | 0.2715 | 0.2557 | TABPFNV2 (default) | 0.0159 | 42 | mnpo_lr |
| Fitness_Club | binary | 0.735 | 0.1911 | 0.1781 | TABPFNV2 (default) | 0.013 | 32 | mnpo_lr |
| GiveMeSomeCredit | binary | 0.71 | 0.2281 | 0.1329 | TABM (tuned + ensemble) | 0.0953 | 42 | mnpo_lr |
| HR_Analytics_Job_Change_of_Data_Scientists | binary | 0.73 | 0.2208 | 0.1947 | TABICL (default) | 0.0261 | 42 | mnpo_rf |
| Is-this-a-good-customer | binary | 0.678 | 0.298 | 0.2495 | EBM (default) | 0.0485 | 41 | mnpo_lr |
| Marketing_Campaign | binary | 0.784 | 0.1342 | 0.0806 | TABPFNV2 (tuned + ensemble) | 0.0536 | 42 | mnpo_lr |
| NATICUSdroid | binary | 0.931 | 0.0199 | 0.0126 | TABICL (default) | 0.0074 | 40 | mnpo_lr |
| bank-marketing | binary | 0.697 | 0.2395 | 0.2344 | CAT (default) | 0.0051 | 27 | mnpo_lr |
| blood-transfusion-service-center | binary | 0.734 | 0.2189 | 0.2445 | FASTAI (tuned + ensemble) | -0.0256 | 1 | mnpo_lr |
| churn | binary | 0.856 | 0.0934 | 0.0695 | MNCA (default) | 0.0238 | 38 | mnpo_xgb |
| coil2000_insurance_policies | binary | 0.632 | 0.3008 | 0.2268 | TABPFNV2 (tuned + ensemble) | 0.0739 | 40 | mnpo_lr |
| credit-g | binary | 0.661 | 0.2452 | 0.2037 | GBM (tuned + ensemble) | 0.0416 | 42 | mnpo_lr |
| credit_card_clients_default | binary | 0.695 | 0.271 | 0.2121 | TABICL (default) | 0.0589 | 42 | mnpo_nb |
| customer_satisfaction_in_airline | binary | 0.939 | 0.0138 | 0.0049 | REALMLP (tuned + ensemble) | 0.0089 | 36 | mnpo_rf |
| diabetes | binary | 0.805 | 0.1137 | 0.1556 | TABPFNV2 (default) | -0.0419 | 1 | mnpo_lr |
| hazelnut-spread-contaminant-detection | binary | 0.927 | 0.0244 | 0.0076 | TABDPT (default) | 0.0168 | 22 | mnpo_lgbm |
| heloc | binary | 0.715 | 0.2051 | 0.1987 | TABPFNV2 (tuned + ensemble) | 0.0064 | 27 | mnpo_lr |
| in_vehicle_coupon_recommendation | binary | 0.75 | 0.1748 | 0.1483 | TABM (tuned + ensemble) | 0.0265 | 23 | mnpo_lgbm |
| jm1 | binary | 0.654 | 0.2904 | 0.2239 | TABICL (default) | 0.0665 | 44 | mnpo_lr |
| kddcup09_appetency | binary | 0.74 | 0.1867 | 0.1542 | CAT (default) | 0.0325 | 25 | mnpo_lr |
| online_shoppers_intention | binary | 0.821 | 0.0996 | 0.0627 | TABPFNV2 (tuned + ensemble) | 0.0369 | 42 | mnpo_lr |
| polish_companies_bankruptcy | binary | 0.777 | 0.0681 | 0.0187 | TABPFNV2 (tuned + ensemble) | 0.0494 | 31 | mnpo_lgbm |
| qsar-biodeg | binary | 0.865 | 0.0822 | 0.0615 | TABICL (default) | 0.0207 | 40 | mnpo_lr |
| seismic-bumps | binary | 0.651 | 0.2824 | 0.2166 | TABICL (default) | 0.0658 | 42 | mnpo_lr |
| taiwanese_bankruptcy_prediction | binary | 0.837 | 0.0993 | 0.0547 | REALMLP (tuned + ensemble) | 0.0446 | 42 | mnpo_lr |
| MIC | multiclass | 0.373 | 2.2882 | 0.4303 | TABM (tuned + ensemble) | 1.858 | 45 | mnpo_elastic_net_lr |
| SDSS17 | multiclass | 0.964 | 0.1221 | 0.0723 | RF (tuned + ensemble) | 0.0498 | 37 | mnpo_rf |
| anneal | multiclass | 0.894 | 0.1004 | 0.0156 | TABPFNV2 (default) | 0.0847 | 42 | mnpo_lr |
| hiva_agnostic | multiclass | 0.349 | 1.43 | 0.1738 | RF (tuned) | 1.2562 | 45 | mnpo_knn |
| maternal_health_risk | multiclass | 0.84 | 0.3628 | 0.4048 | TABDPT (default) | -0.0419 | 1 | mnpo_xgb |
| splice | multiclass | 0.975 | 0.1048 | 0.0993 | TABPFNV2 (tuned + ensemble) | 0.0056 | 5 | mnpo_xgb |
| students_dropout_and_academic_success | multiclass | 0.693 | 0.6439 | 0.5266 | TABPFNV2 (tuned + ensemble) | 0.1173 | 42 | mnpo_lr |
| website_phishing | multiclass | 0.885 | 0.2802 | 0.2215 | TABPFNV2 (tuned + ensemble) | 0.0587 | 32 | mnpo_rf |
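
The "Delta vs best" column is the tabnetics metric_error minus the best official metric_error, so a negative value means tabnetics outperformed the best official method on that dataset. The short sketch below recomputes it for a handful of rows copied from the table. The tuple layout and variable names are illustrative only, and small mismatches in the last digit are expected because the table shows rounded values.

```python
# (dataset, tabnetics metric_error, best official metric_error), from the table above
rows = [
    ("APSFailure", 0.0139, 0.0071),
    ("blood-transfusion-service-center", 0.2189, 0.2445),
    ("diabetes", 0.1137, 0.1556),
    ("maternal_health_risk", 0.3628, 0.4048),
    ("splice", 0.1048, 0.0993),
]

# Delta vs best: positive means tabnetics trails the best official method.
deltas = {name: ours - best for name, ours, best in rows}

# Datasets where tabnetics has the lower error (negative delta).
ahead = [name for name, delta in deltas.items() if delta < 0]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
```

In this sample, the three datasets with a negative delta are the same three that show a dataset rank of 1 in the full table.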
Interpretation
This page is intended as an informational view of the current tabnetics run against the current TabArena leaderboard. It provides a general-tabular reference point while the library remains focused primarily on HDLSS problems.
Documentation and webpages on this site are generated from authoritative internal sources using a combination of deterministic rules and generative AI. Errors are possible. Please report issues via GitHub Discussions or by email.