Results Browser

This browser complements the narrative Benchmark Results and TabArena Results pages with an interactive explorer backed by the published public data bundles.

The benchmark tab covers the public HDLSS Val-18 / Val-19 / Val-20 / Val-21 bundle, including per-run metrics, profile summaries, dataset metadata, and the SOTA comparison bands shown on the Benchmark Results page. The Auto Router tab summarizes the packaged V25 router evidence and candidate policy. The TabArena tab loads the public general-tabular snapshot when that bundle is available at publish time.

Use the benchmark tab to slice the HDLSS validation surface by family, campaign, campaign scope, tier, or domain. Use the Auto Router tab to inspect the V25 training-CV policy, candidate selections, and current holdout status. Use the TabArena tab to inspect the general-tabular comparison snapshot and the per-dataset gap to the current official best method. For a compact guide to the campaign families and datasets exposed here, see Browser Data Guide. For the published seeds and run settings behind a profile, open the Profile Config Browser.

Interactive browser

Interactive result explorer

Explore the published benchmark results at your own pace, compare the strongest profiles, and jump into the exact seeds and run settings behind any profile when you want the full picture.

Dataset landscape

Best filtered profile per dataset. Click a point to focus the dataset detail view.
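
As a rough illustration of what "best filtered profile per dataset" means, the sketch below filters a list of runs and keeps the top-scoring profile for each dataset. The record fields (`dataset`, `profile`, `family`, `score`), the sample values, and the assumption that a higher score is better are all hypothetical; the published bundle may use different names and metrics.

```python
from typing import Dict, Iterable, Optional, TypedDict


class Run(TypedDict):
    # Hypothetical record shape; not the actual bundle schema.
    dataset: str
    profile: str
    family: str
    score: float  # assumed higher-is-better


def best_profile_per_dataset(
    runs: Iterable[Run], family: Optional[str] = None
) -> Dict[str, Run]:
    """Return the highest-scoring run per dataset after an optional family filter."""
    best: Dict[str, Run] = {}
    for run in runs:
        if family is not None and run["family"] != family:
            continue  # drop runs outside the active filter
        current = best.get(run["dataset"])
        if current is None or run["score"] > current["score"]:
            best[run["dataset"]] = run
    return best


# Made-up sample data for illustration only.
runs = [
    Run(dataset="ds_a", profile="p1", family="fam_x", score=0.81),
    Run(dataset="ds_a", profile="p2", family="fam_y", score=0.85),
    Run(dataset="ds_b", profile="p1", family="fam_x", score=0.70),
]

unfiltered = best_profile_per_dataset(runs)
only_fam_x = best_profile_per_dataset(runs, family="fam_x")
```

Changing the filter can change which profile is plotted for a dataset: in the sample data, `ds_a` is represented by `p2` without a filter and by `p1` once only `fam_x` runs are kept.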

Family frontier

Mean filtered profile performance by experiment family.

Dataset detail

Top profiles for the selected dataset, or the global filtered frontier when nothing is selected.

V25 policy slices

Training-CV aggregate and the latest available frozen-router holdout context.

Candidate selection counts

How often each supported V25 candidate was selected by the calibrated policy.

Training datasets

Out-of-fold V25 policy deltas against the current default, aggregated by dataset.
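
One plausible reading of this panel is a per-fold difference between the policy's score and the default's score, averaged within each dataset. The sketch below shows that aggregation under stated assumptions: the tuple layout, the sample numbers, the sign convention (policy minus default), and the use of a plain mean are illustrative guesses, not the browser's documented computation.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical out-of-fold records: (dataset, fold, policy_score, default_score).
FoldRecord = Tuple[str, int, float, float]

folds: List[FoldRecord] = [
    ("ds_a", 0, 0.82, 0.80),
    ("ds_a", 1, 0.78, 0.80),
    ("ds_b", 0, 0.91, 0.88),
]


def mean_delta_by_dataset(records: Iterable[FoldRecord]) -> Dict[str, float]:
    """Average (policy - default) over the folds of each dataset."""
    deltas: Dict[str, List[float]] = defaultdict(list)
    for dataset, _fold, policy_score, default_score in records:
        deltas[dataset].append(policy_score - default_score)
    return {dataset: mean(values) for dataset, values in deltas.items()}


summary = mean_delta_by_dataset(folds)
```

With this convention, a dataset whose folds gain and lose by equal amounts averages out to roughly zero, which is why per-dataset aggregates can hide fold-level variation.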

Leaderboard snapshot

Overall Elo ladder with the current `tabnetics (general)` row highlighted.

Per-dataset gap to best official method

A positive delta means tabnetics trails the official best method on that dataset; a negative delta means tabnetics wins that dataset slice.
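
The sign convention can be made concrete with a small sketch. It assumes a higher-is-better score and defines the gap as the best official score minus the tabnetics score; the function name, method names, and numbers are hypothetical, and the browser's actual metric or normalization is not specified on this page.

```python
from typing import Dict


def gap_to_best(tabnetics_score: float, official_scores: Dict[str, float]) -> float:
    """Gap to the best official method, assuming higher scores are better.

    Positive: tabnetics is behind the best official method.
    Negative: tabnetics beats every official method on this dataset.
    """
    best_official = max(official_scores.values())
    return best_official - tabnetics_score


# Made-up scores for one dataset.
official = {"method_a": 0.90, "method_b": 0.87}

behind = gap_to_best(0.88, official)  # tabnetics trails method_a, so the gap is positive
ahead = gap_to_best(0.93, official)   # tabnetics leads, so the gap is negative
```

For a lower-is-better metric such as an error rate, the subtraction would be reversed to keep the same reading of the sign.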


Documentation and webpages on this site are generated from authoritative internal sources using a combination of deterministic rules and generative AI. Errors are possible. Please report issues via GitHub Discussions or email [email protected].