CHANGES:
We're excited to announce the release of Raven 1.0.0~alpha2! Less than a month after alpha1, this release notably includes contributions from Outreachy applicants in preparation for the two upcoming internships.
Some highlights from this release include:
- NumPy-compatible text I/O with `Nx_io.{save,load}_txt`.
- Lots of new functions in Nx/Rune, including neural-net ones like `dropout`, `log_softmax`, `batch_norm`, and `layer_norm`, activation functions like `celu`, and generic ones like `conjugate`, `index_put`, and more.
- Addition of `.top` libraries for `nx`, `rune`, and `hugin` that auto-install pretty-printers in the OCaml toplevel. You can run e.g. `#require "nx.top"`.
- Addition of a visualization API in Fehu via the new `fehu.visualize` library, supporting video recording.
- Redesign of the Kaun core data structure and checkpointing subsystem for complete snapshotting.
- Many, many bug fixes and correctness improvements.
We've also made numerous performance improvements across the board:
- Nx elementwise ops: 5–50× faster (e.g., Add 50×50 f32 88.81 µs → 1.83 µs, 48×; Mul 100×100 f32 78.51 µs → 2.41 µs, 33×).
- Nx conv2d: 4–5× faster on common shapes; up to 115× on heavy f64 batched cases (e.g., B16 C64→128 16×16 K3 f64 1.61 s → 13.96 ms).
- Rune autodiff: 1.2–3.7× faster on core grads (e.g., MatMulGrad Medium 34.04 ms → 11.91 ms, 2.86×; Large 190.19 ms → 50.97 ms, 3.73×).
- Talon dataframes: big wins in joins and group-bys (Join 805.35 ms → 26.10 ms, 31×; Group-by 170.80 ms → 19.03 ms, 9×; Filter 9.93 ms → 3.39 ms, 3×).
- Saga tokenizers: realistic workloads 4–17% faster (e.g., WordPiece encode single 136.05 µs → 115.92 µs, 1.17×; BPE batch_32 24.52 ms → 22.27 ms, 1.10×).
This release closes 8 user-reported issues or feature requests and totals 30 community contributions from 8 unique contributors.
Nx
- Fix einsum output axis ordering for free axes (e.g., `i,jk->jki`, `ij,klj->kli`) by correcting the final transpose permutation and intermediate left-axis reordering (@tmattio)
- Add `Nx_io.Cache_dir` module with consolidated cache directory utilities respecting `RAVEN_CACHE_ROOT`, `XDG_CACHE_HOME`, and the `HOME` fallback, replacing project-specific cache logic across the whole Raven ecosystem (#134, @Arsalaan-Alam)
- Add `Nx_io.save_txt`/`Nx_io.load_txt` with NumPy-compatible formatting, comments, and dtype support (#120, @six-shot)
- Optimize `multi_dot` for matrix chains, reducing intermediate allocations and improving performance (@tmattio)
- Add public `index_put` function for indexed updates (@tmattio)
- Clarify `reshape` documentation to match its view-only semantics (@tmattio)
- Provide `nx.top`, `rune.top`, and `hugin.top` libraries that auto-install pretty-printers in the OCaml toplevel and update Quill to load them (@tmattio)
- Add `ifill` for explicit in-place fills and make `fill` return a copied tensor (@tmattio)
- Speed up contiguous elementwise ops via vectorized loops (@tmattio)
- Fast-path contiguous single-axis reductions to avoid iterator fallback (@tmattio)
- Speed up float reductions with contiguous multi-axis fast paths (@tmattio)
- Fast-path padding-free `unfold` to lower conv2d overhead (@tmattio)
- Move neural-network operations (softmax, log_softmax, relu, gelu, silu, sigmoid, tanh) from Kaun to Nx (@tmattio)
- Add public `conjugate` function for complex number conjugation (#125, @Arsalaan-Alam)
- Fix complex `vdot` to conjugate the first tensor before multiplication, ensuring correct mathematical behavior (#123, @Arsalaan-Alam)
- Update comparison and conditional operations to use boolean tensors (#115, @nirnayroy)
- Add support for the `rcond` parameter and underdetermined systems to `lstsq` (#102, @Shocker444)
- Fix `matrix_rank`/`pinv` Hermitian fast paths to use eigen-decomposition and match NumPy for complex inputs (#96, @six-shot, @tmattio)
- Optimize matmul BLAS dispatch for strided tensors, improving matrix multiplication performance (@tmattio)
- Fix slow builds reported since alpha1 (#88, @tmattio)
- Fix macOS ARM crash when loading extended bigarray kinds (@tmattio)
- Add float16 and bfloat16 support to safetensors I/O, including precise conversions that preserve denormals/NaNs (#84, @six-shot, @tmattio)
- Refine `View` internals for leaner contiguity checks and stride handling, cutting redundant materialization on hot paths (@tmattio)
- Merge `Lazy_view` into the core `View` API so movement ops operate on a single composed view (@tmattio)
- Document the reworked `View` interface (@tmattio)
- Document the `Symbolic_shape` interface (@tmattio)
- Add the Accelerate framework flag when compiling on macOS, fixing issues in some environments (#129, @nirnayroy)
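The `Nx_io.Cache_dir` precedence above can be sketched in a few lines. This is an illustrative model only: the `cache_root` function, the injected `getenv`, and the `raven` subdirectory layout are assumptions for the sketch, not the actual `Nx_io.Cache_dir` API.

```ocaml
(* Sketch of the cache-directory precedence described above:
   RAVEN_CACHE_ROOT wins, then XDG_CACHE_HOME, then a HOME fallback.
   [getenv] is injected so the sketch stays pure and testable. *)
let cache_root ~getenv =
  match getenv "RAVEN_CACHE_ROOT" with
  | Some dir -> dir
  | None -> (
      match getenv "XDG_CACHE_HOME" with
      | Some dir -> Filename.concat dir "raven"
      | None -> (
          match getenv "HOME" with
          | Some home -> Filename.concat home ".cache/raven"
          | None -> Filename.concat "." ".raven-cache"))
```

In real code the lookup would use `Sys.getenv_opt`; passing it explicitly here just makes the precedence easy to exercise.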
Hugin
- Let `Hugin.show` windows close cleanly via the window button or `Esc`/`q`, avoiding frozen macOS REPL sessions (@tmattio)
Rune
- Add `Rune.no_grad` and `Rune.detach` to mirror JAX stop-gradient semantics (@tmattio)
- Improve gradient performance slightly by replacing the reverse-mode tape's linear `PhysicalTbl` with an identity hash table (@tmattio)
- Fix `Rune.Rng.shuffle` flattening outputs for multi-dimensional tensors; the shuffle now gathers along axis 0 and keeps shapes intact (@tmattio)
- Replace `Rune.Rng.truncated_normal` clipping with rejection sampling so samples stay inside the requested interval without boundary spikes (@tmattio)
- Add support for categorical sampling with `Rune.Rng.categorical` (#89, @nirnayroy)
- Allow plain `llvm-config` in discovery, fixing the build on some platforms (#71, @stepbrobd)
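The rejection-sampling change to `Rune.Rng.truncated_normal` can be pictured with a standalone sketch: draw standard normals (Box-Muller here) and keep only in-interval samples, so no probability mass piles up at the boundaries the way clipping caused. Function names and bounds handling are assumptions for illustration, not Rune's API.

```ocaml
(* Illustrative rejection sampling for a truncated standard normal.
   Out-of-interval draws are discarded rather than clipped. *)
let pi = 4.0 *. atan 1.0

(* One standard-normal draw via the Box-Muller transform. *)
let box_muller () =
  let u1 = Random.float 1.0 +. epsilon_float in
  let u2 = Random.float 1.0 in
  sqrt (-2.0 *. log u1) *. cos (2.0 *. pi *. u2)

(* Resample until the draw lands inside [lo, hi]. *)
let rec truncated_normal ~lo ~hi =
  let x = box_muller () in
  if lo <= x && x <= hi then x else truncated_normal ~lo ~hi
```

For narrow intervals a production implementation would use a smarter proposal than plain resampling, but the boundary-spike fix is the same idea.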
Kaun
- Add similarity and polysemy analysis to the BERT example (#137, @nirnayroy)
- Support attention masks via the new `Kaun.Attention` module (@tmattio)
- Support loading sharded Hugging Face safetensors (@tmattio)
- Fix BERT and GPT‑2 model loading (@tmattio)
- Simplify the API: remove type parameters from public types; `Ptree` now supports mixed-dtype trees via packed tensors with typed getters (@tmattio)
- Overhaul checkpointing: versioned `Train_state` with schema tagging, explicit `Checkpoint.{Snapshot,Artifact,Manifest,Repository}` (retention, tags, metadata), and simple save/load helpers for snapshots and params (@tmattio)
- Overhaul dataset combinators: derive tensor specs from Rune dtypes, fix sampling/window bugs, validate weighted sampling, and respect `drop_remainder` (@tmattio)
- Make dataset `prefetch` truly asynchronous with background domains and allow reusing an external Domainslib pool via `parallel_map ~pool` (@tmattio)
- Use `Dataset.iter` for epoch batches to reduce overhead (@tmattio)
- Update the BERT and GPT-2 tokenizer cache to use `Nx.Cache` for consistent cache directory resolution (#134, @Arsalaan-Alam)
- Honor text dataset encodings via incremental Uutf decoding (#122, @Satarupa22-SD)
- Preserve empty sequential modules when unflattening so indices stay aligned for checkpoint round-tripping (@tmattio)
- Prevent `Training.fit`/`evaluate` from consuming entire datasets eagerly and fail fast when a dataset yields no batches, avoiding hangs and division-by-zero crashes (@tmattio)
- Allow metric history to tolerate metrics that appear or disappear between epochs so dynamic metric sets no longer raise during training (@tmattio)
- Make `Optimizer.clip_by_global_norm` robust to zero gradients and empty parameter trees to avoid NaNs during training (@tmattio)
- Split the CSV loader into `from_csv` and `from_csv_with_labels` to retain labels when requested (#114, @Satarupa22-SD)
- Implement AUC-ROC and AUC-PR in Kaun metrics and simplify their signatures (#124, #131, @Shocker444)
- Add mean absolute percentage error, explained variance, R² (with optional adjustment), KL-divergence, and top-k accuracy to Kaun metrics (@tmattio)
- Add NDCG, MAP, and MRR ranking metrics to Kaun metrics (@tmattio)
- Add BLEU, ROUGE, and METEOR metrics to Kaun for pre-tokenized sequences, removing tokenizer dependencies (@tmattio)
- Add SSIM, IoU, and Dice metrics for vision workloads in Kaun (@tmattio)
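As one concrete example of the new regression metrics, mean absolute percentage error is the mean of |y_true − y_pred| / |y_true|, scaled to percent. The sketch below works over plain float arrays; the name `mape` and its labeled arguments are illustrative, and Kaun's actual signature may differ.

```ocaml
(* Generic mean absolute percentage error over float arrays. Assumes
   no zero entries in y_true, since each term divides by it. *)
let mape ~y_true ~y_pred =
  let n = Array.length y_true in
  assert (n > 0 && n = Array.length y_pred);
  let total = ref 0.0 in
  Array.iteri
    (fun i t -> total := !total +. abs_float ((t -. y_pred.(i)) /. t))
    y_true;
  100.0 *. !total /. float_of_int n
```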
Talon
- Remove automatic sentinel-based null detection for numeric columns; explicit masks (via `_opt` constructors) now define missing-data semantics (@tmattio)
- Replace join nested loops with hashed join indices, cutting lookup from O(n·m) to near O(n) (@tmattio)
- Reuse a shared Nx-based column reindexer so filter/sample paths avoid repeated array copies (@tmattio)
- Fix `fillna` to honor column null masks and replacements, restoring expected nullable semantics (@tmattio)
- Preserve null masks when reindexing during joins so sentinel values remain valid data (@tmattio)
- Handle numeric index columns in `pivot`, preventing distinct keys from collapsing into a single bucket (@tmattio)
- Respect null masks when serializing numeric columns to JSON, emitting JSON `null` instead of sentinel values (@tmattio)
- Detect big integers as int64 in the Talon CSV loader (#121, @Arsalaan-Alam)
- Allow forcing column types in Talon JSON loader (#104, @nirnayroy)
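The hashed-join change above replaces a nested scan with a build-then-probe strategy: index the right side's keys once, then probe per left row, for O(n + m) expected work instead of O(n·m). A self-contained sketch over toy (key, value) rows, with names and types that are illustrative rather than Talon's API:

```ocaml
(* Inner join on the first component of each pair. Build a Hashtbl
   index over the right rows, then probe it once per left row. *)
let hash_join left right =
  let index = Hashtbl.create (List.length right) in
  List.iter (fun (k, v) -> Hashtbl.add index k v) right;
  List.concat_map
    (fun (k, lv) ->
      List.map (fun rv -> (k, lv, rv)) (Hashtbl.find_all index k))
    left
```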
Saga
- Remove the legacy `Normalizers.nmt` and `Normalizers.precompiled` constructors (and their JSON serializers) so the public surface only advertises supported normalizers (@tmattio)
- Tighten template processor JSON parsing: require integer type ids, drop the legacy special-token list format, and ensure multi-id special tokens round-trip with the new record fields (@tmattio)
- Make tokenizer JSON loading tolerant of HuggingFace quirks (missing `model.type`, string-encoded merges), restoring compatibility with upstream `tokenizer.json` files (@tmattio)
- Cache byte-level encode/decode lookup tables to avoid rebuilding them during tokenization, trimming avoidable allocations (@tmattio)
- Skip BPE dropout sampling when dropout is disabled, removing redundant RNG work on common hot paths (@tmattio)
- Fix Unigram tokenization so longest matches are emitted without aborting the sequence when a vocab hit occurs (@tmattio)
- Recompute pad token ids when the pad special string changes, preventing padding with stale ids (@tmattio)
- Fix Unigram `token_to_id`/`id_to_token` vocabulary lookups (#117, @RidwanAdebosin)
- Optimize `Pre_tokenizers.whitespace` to reduce allocations and improve tokenization performance (@tmattio)
- Simplify the tokenizers interface (@tmattio)
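The Unigram longest-match fix can be pictured with a generic greedy longest-match segmenter: at each position, emit the longest vocabulary entry that matches and keep going, with unknown bytes falling back to single characters. This is a deliberately simplified picture (real Unigram tokenization scores whole segmentations), and the vocab and names below are toys, not Saga's implementation.

```ocaml
(* Greedy longest-match segmentation over a list-based vocabulary.
   Never aborts on a vocab hit: after emitting a match it continues
   from the next position. *)
let longest_match vocab text =
  let n = String.length text in
  let matches_at i tok =
    let l = String.length tok in
    i + l <= n && String.sub text i l = tok
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      let best =
        List.fold_left
          (fun b tok ->
            if
              matches_at i tok
              && (match b with
                 | Some b' -> String.length tok > String.length b'
                 | None -> true)
            then Some tok
            else b)
          None vocab
      in
      match best with
      | Some tok -> go (i + String.length tok) (tok :: acc)
      | None -> go (i + 1) (String.make 1 text.[i] :: acc)
  in
  go 0 []
```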
Sowilo
- Add `resize` (nearest & bilinear) that works for 2D, batched, and NHWC tensors (@tmattio)
- Update grayscale conversion and RGB/BGR channel swaps to run entirely on Rune ops, keeping batched inputs compatible with JIT backends (@tmattio)
- Make `median_blur` compute the true median so salt-and-pepper noise is removed as expected (@tmattio)
- Fix `erode`/`dilate` so custom structuring elements (e.g. cross vs. square) and batched tensors produce the correct morphology result (@tmattio)
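The `median_blur` fix comes down to taking a true order statistic per filter window. Over a flattened window the computation is just the following (a generic sketch, not Sowilo's tensor implementation):

```ocaml
(* True median of one filter window: sort a copy and take the middle
   element. Salt (255) and pepper (0) outliers land at the ends of the
   sorted window and never reach the output pixel. *)
let median window =
  let a = Array.copy window in
  Array.sort compare a;
  a.(Array.length a / 2)
```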
Fehu
- Add snapshot-based save/load for DQN and REINFORCE agents (#127, @RidwanAdebosin, @tmattio)
- Add typed `Render` payloads with enforced `render_mode` selection in `Env.create`, auto human-mode rendering, and vectorized `Env.render` accessors so environments consistently expose frames for downstream tooling (@tmattio)
- Introduce the `Fehu_visualize` library with ffmpeg/gif/W&B sinks, overlay combinators, rollout/evaluation recorders, and video wrappers for single and vectorized environments, providing a cohesive visualization stack for Fehu (@tmattio)
- Add a `Fehu.Policy` helper module (random/deterministic/greedy) and sink `with_*` guards so visualization sinks handle directory creation and cleanup automatically (@tmattio)
- Add `Buffer.Replay.sample_tensors` to streamline batched training loops and exploration handling (@tmattio)
- Rework `Fehu_algorithms.Dqn` around `init`/`step`/`train` primitives with functional state, warmup control, and snapshotting helpers (@tmattio)
- Rebuild `Fehu_algorithms.Reinforce` on the same `init`/`step`/`train` interface with optional baselines, tensor-based rollouts, snapshot save/load, and updated tests/examples/docs using the new workflow (@tmattio)
- Upgrade the GridWorld environment to return ANSI and RGB-array frames using the new render types, and update the DQN example to optionally record pre- and post-training rollouts via `FEHU_DQN_RECORD_DIR` using `Fehu_visualize` sinks (@tmattio)
- Rework space sampling to return `(value, next_rng)` and split keys internally, fixing correlated draws in Box/Multi-discrete/Tuple/Dict/Sequence/Text samplers while adding `Space.boundary_values` for deterministic compatibility checks (@tmattio)
- Extend vectorized environments to reuse space boundary probes and to store structured `final_observation` payloads in `Info`, improving downstream consumption (@tmattio)
- Add `Buffer.Replay.add_many` and `Buffer.Replay.sample_arrays`, preserve backing storage on `clear`, and expose struct-of-arrays batches for vectorized learners (@tmattio)
- Tighten `Env.create` diagnostics with contextual error messages and an optional `~validate_transition` hook for custom invariants (@tmattio)
- Enrich `Wrapper` utilities with `map_info`, Box `clip_action`/`clip_observation`, and time-limit info reporting elapsed steps (@tmattio)
- Upgrade `Info` values to carry int/float/bool arrays with stable JSON round-tripping (handling NaN/∞) and sorted metadata serialization for deterministic diffs (@tmattio)
- Improve training helpers: Welford-based normalization with optional unbiased variance, document `done = terminated || truncated`, and return `nan` when explained variance is undefined (@tmattio)
- Treat time-limit truncations as terminals when computing rollout advantages and expose the `truncated` flag in buffer steps (@tmattio)
- Require callers of `Training.compute_gae` to pass final bootstrapping values and ensure `Training.evaluate` feeds the current observation to policies (@tmattio)
- Allow `Space.Sequence.create` to omit `max_length`, keeping sequences unbounded above while preserving validation and sampling semantics (@tmattio)
- Validate vectorized environments by round-tripping sample actions/observations across every instance, preventing incompatible spaces from slipping through (@tmattio)
- Finish clipped value loss support in `Fehu.Training` (#119, @nirnayroy)
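The Welford-based normalization mentioned above maintains a running mean and variance in a single numerically stable pass. A minimal sketch with the optional unbiased (n−1) variance; the record and function names are illustrative, not Fehu's API:

```ocaml
(* Welford's online algorithm: [m2] accumulates the sum of squared
   deviations from the running mean. *)
type stats = { n : int; mean : float; m2 : float }

let empty = { n = 0; mean = 0.0; m2 = 0.0 }

(* Fold one observation into the running statistics. *)
let update s x =
  let n = s.n + 1 in
  let delta = x -. s.mean in
  let mean = s.mean +. (delta /. float_of_int n) in
  { n; mean; m2 = s.m2 +. (delta *. (x -. mean)) }

(* Population variance by default; [~unbiased:true] divides by n-1.
   Returns nan when undefined, mirroring the note above. *)
let variance ?(unbiased = false) s =
  let d = if unbiased then s.n - 1 else s.n in
  if d <= 0 then nan else s.m2 /. float_of_int d
```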
Nx-datasets
- Migrate to `Nx.Cache` for cache directory resolution, enabling consistent behavior (#133, @Arsalaan-Alam)
- Fix cache directory resolution to respect `RAVEN_CACHE_ROOT` (or fall back to `XDG_CACHE_HOME`/`HOME`), allowing custom cache locations (#128, @Arsalaan-Alam)
- Switch the CIFAR-10 loader to the binary archive so parsing succeeds again (@tmattio)
- Add a CIFAR-10 example (@tmattio)
- Standardize dataset examples on `Logs` (@tmattio)
- Use `Logs` for dataset loader logging (#95, @Satarupa22-SD)