DeepRWCap is a machine learning-guided random walk solver that accelerates capacitance extraction by predicting the transition quantities required to guide each step of the walk. This repository contains the implementation described in the paper:
- Hector R. Rodriguez, Jiechen Huang, and Wenjian Yu, "DeepRWCap: Neural-guided random-walk capacitance solver for IC design," in Proc. AAAI Conference on Artificial Intelligence (AAAI), Singapore, Jan. 2026.
- Python 3.10+
- CUDA 12.6+ (for GPU support)
- CMake 3.18+
- GCC/G++ compiler
Containerized approach (run the container in shell mode with the `/workspace` binding):

```shell
singularity pull pytorch-24.12-py3.sif docker://nvcr.io/nvidia/pytorch:24.12-py3
singularity shell --nv --bind /path/to/deepRWCap:/workspace pytorch-24.12-py3.sif
```

Note: Replace `/path/to/deepRWCap` with the actual path to your repository directory. The bind mount makes the repository contents available inside the container at `/workspace`.
```shell
cd ggft
./run_ggft.sh
```

Warning: Check the GGFT documentation to generate the training datasets using a finite difference method for model training. Alternatively, use the provided script: `cd ggft` and `source run_ggft.sh`.
Each dataset file is a binary file with the following format:

Header (2 values):

- `N`: Grid resolution (e.g., 16, 21, 23)
- `block_w`: Block width parameter (set to 1)

Body (repeated samples). Each sample contains:

- Dielectric data: `N³` values representing the permittivity distribution
- Structure data: `7 × n_structures` values (geometric structure parameters, unused)
- Poisson[^1]/Gradient data: `6 × N²` values for the 6 faces of the cube
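The layout above can be parsed with a short helper. This is an illustrative sketch, not code from the repository: it assumes every value (including the header) is stored as a little-endian 64-bit double, and that `n_structures` is known in advance (it is not recorded per sample). Check the GGFT documentation for the actual encoding before relying on it.

```python
import struct

def read_samples(path, n_structures, dtype="d"):
    """Parse a GGFT-style dataset file (layout as described above).

    Assumption: all values are little-endian doubles ("d"); adjust
    `dtype` if the real files use a different width.
    """
    size = struct.calcsize(dtype)
    with open(path, "rb") as f:
        # Header: grid resolution N and block width (set to 1).
        N, block_w = struct.unpack("<2" + dtype, f.read(2 * size))
        N = int(N)
        per_sample = N**3 + 7 * n_structures + 6 * N**2
        samples = []
        # Read one fixed-size sample at a time until EOF.
        while chunk := f.read(per_sample * size):
            vals = struct.unpack(f"<{per_sample}{dtype}", chunk)
            samples.append({
                "dielectric": vals[: N**3],
                "structure": vals[N**3 : N**3 + 7 * n_structures],
                "faces": vals[N**3 + 7 * n_structures :],
            })
    return N, int(block_w), samples
```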
Install the other dependencies in the container with:

```shell
pip install thop neuraloperator
```

Then run:

```shell
cd training_pytorch
./run_training.sh     # to train the models from scratch
./run_compilation.sh  # to compile with TensorRT and copy to /workspace/models/
```

`training_pytorch/src/main.py` manages training and optimization of the presented models using PyTorch and TensorRT.
- Trains multiple predefined models on GPU(s) with multiprocessing
- Automatically measures FLOPs and parameter counts
- Exports best models in TorchScript format
- Benchmarks and compiles models with TensorRT (FP32 & FP16)
- Reports latency and throughput improvements after compilation
```shell
python src/main.py [train] [compile]
```

- `train` → Run training only
- `compile` → Run TensorRT compilation only
- No arguments → Run both training and compilation
Model configurations and datasets are predefined in the script (see MODELS_TO_TRAIN and DATASET_BASE_CONFIGS).
- Trained models saved in: `/workspace/training_pytorch/models/`
- Logs saved in: `/workspace/training_pytorch/runs/`
The C++ backend provides high-performance inference using LibTorch and TensorRT with CUDA acceleration.
The shared library for dynamic linking will be created at `inference_cpp/build/lib/dnnsolver.so`.

```shell
unset CUDACXX
cd inference_cpp
mkdir build && cd build
cmake ..
make -j$(nproc)
```

Note: The DeepRWCap binary expects `dnnsolver.so` to be in the `/workspace/executable` directory.
- Activate the Singularity container.
- Make sure that the `dnnsolver.so` and `models.txt` files are inside the `executable` directory.
- To ensure correct CUDA stream synchronization, ensure the system is using a single GPU with `export CUDA_VISIBLE_DEVICES=0`.
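The preconditions above can be verified with a small preflight helper before launching the solver. This is an illustrative sketch, not part of the repository; the default directory comes from the note about `/workspace/executable`.

```python
import os

def preflight(exec_dir="/workspace/executable"):
    """Return a list of setup problems (empty list means ready to run).

    Illustrative helper only: checks the files and the single-GPU
    environment variable mentioned in the setup steps.
    """
    problems = []
    # The binary dynamically loads dnnsolver.so and reads models.txt.
    for name in ("dnnsolver.so", "models.txt"):
        if not os.path.isfile(os.path.join(exec_dir, name)):
            problems.append(f"missing {name} in {exec_dir}")
    # Single-GPU requirement for correct CUDA stream synchronization.
    if os.environ.get("CUDA_VISIBLE_DEVICES") != "0":
        problems.append("set CUDA_VISIBLE_DEVICES=0")
    return problems
```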
To run a capacitance extraction task directly, use:

```shell
/path/to/binary -f <input_file.cap3d> -n <num_cores> [accuracy_options]
```

Required Arguments:

- `-f <input_file.cap3d>`: Input file containing the 3D capacitance structure definition
- `-n <num_cores>`: Number of CPU cores to use for parallel processing
Accuracy Control Options:

- `-p <value>`: Convergence threshold for self-capacitance
- `-c <value>`: Convergence threshold for the capacitance matrix
- `--c-ratio <value>`: Fraction of the capacitance matrix elements that must meet the convergence threshold
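As a reading aid for the `--c-ratio` semantics, the stopping condition can be pictured as follows. The solver's internal error estimator is its own; this sketch only illustrates the "fraction of entries below threshold" idea on a plain matrix of relative errors.

```python
def matrix_converged(rel_err, c_thresh, c_ratio):
    """Illustration of --c-ratio: declare the capacitance matrix
    converged once at least `c_ratio` of its entries have estimated
    relative error below `c_thresh`.  Not the solver's actual code.
    """
    flat = [e for row in rel_err for e in row]
    ok = sum(e < c_thresh for e in flat)
    return ok / len(flat) >= c_ratio
```

With `-c 0.01 --c-ratio 0.95`, a run keeps going while more than 5% of the matrix entries still have estimated relative error above 1%.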
Example:

```shell
./bin/deepRWCap -f /workspace/testcases/cap3d/case3.cap3d -n 16 -p 0.01 -c 0.01 --c-ratio 0.95
```

Expected output files:

- `case3.cap3d.out`: Capacitance extraction results
- `case3.cap3d.log`: Detailed execution log
To replicate the capacitance extraction results from the paper, use the Python script `run_script.py`. It provides:
- Automated testing: Runs each test case multiple times for statistical analysis
- Multi-core scaling: Tests performance across different core counts (1, 2, 4, 8, 16 cores)
- Error analysis: Computes relative errors against reference solutions
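`run_script.py` performs the error analysis itself; its exact metric is defined in the script. A plausible entrywise definition of "relative error against a reference solution", shown only for illustration:

```python
def relative_error(computed, reference):
    """Relative error of a single capacitance value."""
    return abs(computed - reference) / abs(reference)

def max_relative_error(computed_matrix, reference_matrix):
    """Worst-case entrywise relative error between two capacitance
    matrices (lists of rows), skipping zero reference entries.
    Illustrative only; run_script.py may aggregate differently.
    """
    return max(
        relative_error(c, r)
        for crow, rrow in zip(computed_matrix, reference_matrix)
        for c, r in zip(crow, rrow)
        if r != 0
    )
```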
```shell
python run_script.py /path/to/binary <number_of_runs> [test_cases...]
```

Parameters:

- `/path/to/binary`: `./bin/deepRWCap` for our method; `./baselines/rwcap_agf` or `./baselines/rwcap_microwalk` for the baseline methods
- `<number_of_runs>`: Number of iterations per test case (e.g., 10)
- `[test_cases...]`: Optional list of test cases (`case1`, `case2`, etc.) or `all` for all cases
Example:

```shell
python run_script.py ./bin/deepRWCap 10 all
```

Footnotes
[^1]: The surface Green's function is equivalent to the Poisson kernel.