zama-ai · agnesLeroy · Oct 22, 2025
diff --git a/tfhe/docs/getting-started/benchmarks/cpu/README.md b/tfhe/docs/getting-started/benchmarks/cpu/README.md
@@ -9,4 +9,5 @@ All CPU benchmarks were launched on an `AWS hpc7a.96xlarge` instance equipped wi
 {% endhint %}
 
 * [Integer operations](cpu-integer-operations.md)
+* [ERC20](cpu-erc20.md)
 * [Programmable Bootstrapping](cpu-programmable-bootstrapping.md)
diff --git a/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md b/tfhe/docs/getting-started/benchmarks/cpu/cpu-erc20.md
@@ -0,0 +1,66 @@
+TFHE-rs is the underlying library of the Zama Confidential Blockchain Protocol. To illustrate real-world performance,
+consider an encrypted ERC20 transfer, which requires executing the following sequence of operations:
+```rust
+fn erc20_transfer_whitepaper(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = from_amount.ge(amount);
+
+    let mut new_to_amount = to_amount + amount;
+    new_to_amount = has_enough_funds.select(&new_to_amount, to_amount);
+
+    let mut new_from_amount = from_amount - amount;
+    new_from_amount = has_enough_funds.select(&new_from_amount, from_amount);
+
+    (new_from_amount, new_to_amount)
+}
+```
+This is one way to compute an encrypted ERC20 transfer, but it is not the most efficient.
+The same transfer can be computed more efficiently without the `select` operation, by multiplying `amount` by the comparison result so that the transferred amount becomes zero when funds are insufficient:
+```rust
+fn erc20_transfer_no_select(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let has_enough_funds = from_amount.ge(amount);
+
+    let amount = amount * FheUint64::cast_from(has_enough_funds);
+
+    let new_to_amount = to_amount + &amount;
+    let new_from_amount = from_amount - &amount;
+
+    (new_from_amount, new_to_amount)
+}
+```
+An even more efficient way to compute an encrypted ERC20 transfer is to use the `overflowing_sub` operation, which returns both the difference and an overflow flag and so replaces the separate comparison:
+```rust
+use tfhe::{prelude::*, FheUint64};
+fn erc20_transfer_overflow(
+    from_amount: &FheUint64,
+    to_amount: &FheUint64,
+    amount: &FheUint64,
+) -> (FheUint64, FheUint64) {
+    let (new_from, did_not_have_enough) = from_amount.overflowing_sub(amount);
+    let did_not_have_enough = &did_not_have_enough;
+    let had_enough_funds = !did_not_have_enough;
+
+    let (new_from_amount, new_to_amount) = rayon::join(
+        || did_not_have_enough.if_then_else(from_amount, &new_from),
+        || to_amount + (amount * FheUint64::cast_from(had_enough_funds)),
+    );
+    (new_from_amount, new_to_amount)
+}
+```
+In a blockchain protocol, FHE operations are not the only cost of a transfer:
+ciphertext compression, decompression, and rerandomization are also required,
+and network communication introduces significant overhead.
+For simplicity, only the performance of the FHE operations is considered here.
+The latency and throughput of these three ERC20 FHE transfer implementations are compared in the following table:
+
+TODO add SVG
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on CPU, in an ideal scenario.
+In a blockchain protocol, the throughput would be limited by network latency and by the other operations listed above (compression, decompression, rerandomization).
diff --git a/tfhe/docs/getting-started/benchmarks/gpu/README.md b/tfhe/docs/getting-started/benchmarks/gpu/README.md
@@ -9,4 +9,5 @@ All GPU benchmarks were launched on H100 GPUs, and rely on the multithreaded PBS
 {% endhint %}
 
 * [Integer operations](gpu-integer-operations.md)
+* [ERC20](gpu-erc20.md)
 * [Programmable Bootstrapping](gpu-programmable-bootstrapping.md)
diff --git a/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md b/tfhe/docs/getting-started/benchmarks/gpu/gpu-erc20.md
@@ -0,0 +1,7 @@
+As in the [CPU benchmarks](../cpu/cpu-erc20.md), the latency and throughput of a confidential ERC20 token transfer are measured on GPU.
+
+TODO add SVG
+
+The throughput shown here is the maximum that can be achieved with TFHE-rs on an 8xH100 GPU node, in an ideal scenario.
+In a blockchain protocol, the throughput would be limited by network latency and by the need to apply
+other operations (compression, decompression, rerandomization).
diff --git a/tfhe/docs/getting-started/benchmarks/hpu/README.md b/tfhe/docs/getting-started/benchmarks/hpu/README.md
@@ -9,3 +9,4 @@ All HPU benchmarks were launched on AMD Alveo v80 FPGAs.
 {% endhint %}
 
 * [Integer operations](hpu-integer-operations.md)
+* [ERC20](hpu-erc20.md)
diff --git a/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md b/tfhe/docs/getting-started/benchmarks/hpu/hpu-erc20.md
diff --git a/tfhe/src/test_user_docs.rs b/tfhe/src/test_user_docs.rs
@@ -15,6 +15,12 @@ mod test_cpu_doc {
         configuration_rust_configuration
     );
 
+    // BENCHMARKS
+    doctest!(
+        "../docs/getting-started/benchmarks/cpu/cpu-erc20.md",
+        benchmarks_cpu_erc20
+    );
+
     // FHE COMPUTATION
 
     // ADVANCED FEATURES
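
A quick way to sanity-check that the three variants in `cpu-erc20.md` compute the same transfer is a plaintext model. The sketch below is not TFHE-rs code: it replaces `FheUint64` with `u64` and `FheBool` with `bool`, and it assumes that encrypted 64-bit arithmetic wraps modulo 2^64, which is why `wrapping_add` and `wrapping_sub` are used. The function names (`transfer_select`, `transfer_no_select`, `transfer_overflow`, `all_agree`) are hypothetical and exist only for this illustration.

```rust
// Plaintext stand-ins for the three encrypted ERC20 transfer variants.
// Assumption: FheUint64 arithmetic wraps modulo 2^64, mirrored here with
// wrapping operations on u64. Each function returns (new_from, new_to).

/// Mirrors the `select`-based variant: compute both results, then pick.
fn transfer_select(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from >= amount;
    let candidate_to = to.wrapping_add(amount);
    let candidate_from = from.wrapping_sub(amount);
    let new_to = if has_enough_funds { candidate_to } else { to };
    let new_from = if has_enough_funds { candidate_from } else { from };
    (new_from, new_to)
}

/// Mirrors the variant without `select`: zero the amount when funds are short.
fn transfer_no_select(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let has_enough_funds = from >= amount;
    // Multiplying by 0 or 1 cannot overflow.
    let amount = amount * u64::from(has_enough_funds);
    (from.wrapping_sub(amount), to.wrapping_add(amount))
}

/// Mirrors the `overflowing_sub` variant: the overflow flag replaces the comparison.
fn transfer_overflow(from: u64, to: u64, amount: u64) -> (u64, u64) {
    let (diff, did_not_have_enough) = from.overflowing_sub(amount);
    let had_enough_funds = !did_not_have_enough;
    let new_from = if did_not_have_enough { from } else { diff };
    let new_to = to.wrapping_add(amount * u64::from(had_enough_funds));
    (new_from, new_to)
}

/// Returns true when all three variants produce the same balances.
fn all_agree(from: u64, to: u64, amount: u64) -> bool {
    let reference = transfer_select(from, to, amount);
    reference == transfer_no_select(from, to, amount)
        && reference == transfer_overflow(from, to, amount)
}
```

For example, `all_agree(100, 5, 30)` and `all_agree(10, 5, 30)` exercise the funded and the underfunded case respectively. The model only speaks to functional equivalence; it says nothing about the relative latency or throughput of the encrypted versions.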