Conversation
Related PR for inverting the label cleaner: autogluon/autogluon#5482
We ran a few benchmarks and are in contact with the first author of the model to improve the method and fix some bugs in their package. We are waiting for the final method and for all bugs to be fixed before running the full benchmark.
Notes for later:
# Conflicts:
#	tabarena/pyproject.toml
#	tabflow_slurm/run_setup_slurm_jobs.py
I updated to the newest version and verified that the label column name is passed to the model. Once compute is available, I will rerun the results with tuning, submit them to the official leaderboard, and then merge TabSTAR.
Started TabArena-Lite runs as a sanity check; then I will move on to the full benchmark and get results for the submission.
I reduced the maximum number of configurations for HPO to 50 for now. Since I stopped the runs midway (after about two weeks), the final submission may contain datasets with more than 50 configs. If needed, we can re-evaluate running more configs later.
It is still running into issues and taking too long, so I reduced it to 25 random configs.
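For illustration, capping random-search HPO at 25 configurations can be sketched as below. This is a minimal, hypothetical example: `sample_hpo_configs` and the search-space shape are assumptions for this sketch, not the benchmark's actual config sampler.

```python
import random

def sample_hpo_configs(search_space, max_configs=25, seed=0):
    """Draw at most `max_configs` random configurations from a discrete search space.

    `search_space` maps hyperparameter names to lists of candidate values.
    (Hypothetical helper, not the benchmark's real sampler.)
    """
    rng = random.Random(seed)  # fixed seed so the sampled configs are reproducible
    return [
        {name: rng.choice(values) for name, values in search_space.items()}
        for _ in range(max_configs)
    ]

# Example usage with a toy search space
space = {"learning_rate": [1e-4, 3e-4, 1e-3], "batch_size": [32, 64, 128]}
configs = sample_hpo_configs(space, max_configs=25)
print(len(configs))  # 25
```

Each config is drawn independently, so duplicates are possible for small spaces; the cap only bounds how many configs get evaluated.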
We now have results with at least 25 configs per dataset:
Raw data: https://data.lennart-purucker.com/tabarena/leaderboard_submissions/data_TabSTAR_02032026.zip
I will merge this PR once I am done with perpetual as well (and then rebase, etc.).
# Conflicts:
#	tabarena/pyproject.toml
#	tabarena/tabarena/benchmark/models/model_registry.py
#	tabarena/tabarena/models/utils.py
#	tabflow_slurm/run_setup_slurm_jobs.py
#	tabflow_slurm/setup_slurm_base.py
#	tabflow_slurm/simple_evaluation/run_eval_for_new_model.py
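The "at least 25 configs per dataset" check can be sketched with a simple count over the result records. This is purely illustrative: `datasets_with_min_configs` and the `(dataset, config_id)` record shape are assumptions for this sketch, not the actual leaderboard tooling.

```python
from collections import Counter

def datasets_with_min_configs(results, min_configs=25):
    """Return dataset names whose number of evaluated configs meets the threshold.

    `results` is an iterable of (dataset_name, config_id) records
    (hypothetical shape for this sketch).
    """
    counts = Counter(dataset for dataset, _ in results)
    return sorted(d for d, n in counts.items() if n >= min_configs)

# Toy example: one dataset with 30 evaluated configs, one with only 10
records = [("adult", i) for i in range(30)] + [("covertype", i) for i in range(10)]
print(datasets_with_min_configs(records))  # ['adult']
```

Datasets below the threshold would then be flagged for additional runs rather than silently included.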

Add support for the TabSTAR model (https://arxiv.org/abs/2505.18125).
Working on a few more TODOs before starting the benchmark.