Bug description
When running mace_run_train with --wandb, training fails during setup_wandb() because the code tries to JSON-serialize the full args namespace. If at least one attribute is a function (or other callable) and the custom JSON encoder doesn't handle callables, json.dumps(args_dict, cls=CustomEncoder) raises an exception (a TypeError, if the encoder falls back to the default json.JSONEncoder behavior).
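The failure mode can be illustrated outside of MACE with a minimal sketch. The `CustomEncoder` below is only a hypothetical stand-in for MACE's encoder (assumed here to have no special case for callables), and `some_callable` and the `activation` attribute are illustrative names, not taken from the MACE code base.

```python
import argparse
import json


class CustomEncoder(json.JSONEncoder):
    """Hypothetical stand-in: no special handling for callables."""

    def default(self, o):
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def some_callable(x):
    return x


# Illustrative namespace with one plain value and one function-valued attribute.
args = argparse.Namespace(lr=0.01, activation=some_callable)
args_dict = vars(args)

try:
    json.dumps(args_dict, cls=CustomEncoder)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under these assumptions, the call fails as soon as the encoder reaches the function-valued attribute, regardless of how many other attributes are serializable.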
To Reproduce
Steps to reproduce the behavior:
- Install MACE with wandb support (e.g. pip install mace-torch[wandb], or install wandb separately).
- Run training with wandb flags, for example:
```
mace_run_train \
  --name=$exp_name \
  --default_dtype="float32" \
  --energy_key='energy' \
  --forces_key='forces' \
  --stress_key='stress' \
  --model='MACELES' \
  --train_file="train.extxyz" \
  --valid_fraction=0.05 \
  --test_file="test.extxyz" \
  --E0s="{xx: xxx, yy: yyy, zz: zzz}" \
  --loss='universal' \
  --energy_weight=100 \
  --forces_weight=1000 \
  --eval_interval=1 \
  --error_table='PerAtomMAE' \
  --r_max=6.0 \
  --num_radial_basis=10 \
  --pair_repulsion \
  --distance_transform="Agnesi" \
  --num_channels=32 \
  --max_L=0 \
  --num_interactions=2 \
  --correlation=3 \
  --max_ell=3 \
  --scaling='rms_forces_scaling' \
  --num_workers=8 \
  --lr=0.01 \
  --weight_decay=1e-8 \
  --ema \
  --ema_decay=0.99 \
  --scheduler_patience=5 \
  --batch_size=32 \
  --valid_batch_size=32 \
  --max_num_epochs=1 \
  --patience=50 \
  --amsgrad \
  --distributed \
  --device=cuda \
  --seed=1 \
  --clip_grad=10 \
  --keep_checkpoints \
  --save_cpu \
  --wandb --wandb_project irp --wandb_entity astagroup --wandb_name $exp_name \
  --restart_latest
```
wandb probably takes the full config and converts it to a JSON string, and this is where the problem lies: MACE tries to log the full args to wandb as JSON, but args contains at least one function, which JSON cannot represent.
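One possible workaround, sketched here under the assumption that a readable name is an acceptable substitute for the function itself in the logged config, is to have the encoder convert callables to strings. `CallableSafeEncoder`, `example_fn` and the `transform` attribute are hypothetical names for illustration and are not part of MACE.

```python
import argparse
import json


class CallableSafeEncoder(json.JSONEncoder):
    """Hypothetical encoder that replaces callables with a readable name."""

    def default(self, o):
        if callable(o):
            # Use the qualified name when available, otherwise fall back to repr().
            return getattr(o, "__qualname__", repr(o))
        # Anything else unsupported still raises TypeError as usual.
        return super().default(o)


def example_fn(x):
    return x


# Illustrative namespace mixing a plain value with a function-valued attribute.
args = argparse.Namespace(lr=0.01, transform=example_fn)
config_json = json.dumps(vars(args), cls=CallableSafeEncoder)
print(config_json)
```

An alternative with the same effect would be to drop callable entries from the dictionary before serializing it; the encoder approach keeps a record that the attribute existed.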