Commit d5df442

Merge pull request #25 from DerekStride/claude/uuidv7-ulid-comparison-7iG1v

---
layout: post
title: UUIDv7 vs ULID
excerpt: >
  Both UUIDv7 and ULID solve the same problem: time-sortable, globally unique identifiers. They share the same
  fundamental structure but differ in standardization, encoding, and ecosystem support. Here's how they compare and when
  to use each.
---

If you've been building systems that need unique identifiers, you've probably run into the limitations of UUIDv4. Random
UUIDs fragment B-tree indexes, cause excessive page splits, and destroy cache locality in databases. Two formats emerged
to fix this by putting a timestamp first: ULID (2016) and UUIDv7 (2024, via RFC 9562). They're remarkably similar in
structure but differ in important ways.

## The Problem They Both Solve

UUIDv4 fills 122 of its 128 bits with random data (the other 6 hold the version and variant fields). When used as a
primary key, each insert targets a random location in a B-tree index. Sequential IDs cause 10-20 page splits per million
records; UUIDv4 causes 5,000-10,000+. Index pages average ~69% full instead of ~90%, wasting disk space and I/O. Buffer
cache effectiveness drops because hot pages are spread across the entire index.

Both UUIDv7 and ULID fix this by putting a millisecond-precision Unix timestamp in the most significant bits. New IDs
land at or near the end of the index, much like auto-incrementing integers, while remaining 128-bit identifiers that any
node can generate without coordination.
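The following toy model shows why the timestamp prefix helps. It is an illustration, not a real B-tree. It keeps keys in a sorted list, which stands in for the index's leaf level, and counts how often a new key lands at the very end. The key generators are hypothetical stand-ins. UUIDv4-style keys are plain random 128-bit integers. ULID/UUIDv7-style keys are a 48-bit millisecond counter above 80 random bits, assuming exactly one ID per millisecond.

```python
import bisect
import random

def appended_fraction(keys):
    """Return the fraction of keys that were inserted at the end of a sorted list.

    The sorted list is a crude stand-in for the leaf level of a B-tree index:
    an insert at the end only touches the rightmost page, while an insert
    anywhere else touches whichever page covers that key.
    """
    index = []
    appended = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appended += 1
        index.insert(pos, key)
    return appended / len(keys)

def random_keys(n, rng):
    # UUIDv4-style (simplified): every key is independent random bits.
    return [rng.getrandbits(128) for _ in range(n)]

def time_ordered_keys(n, rng, start_ms=1_700_000_000_000):
    # ULID/UUIDv7-style: 48-bit millisecond timestamp on top, 80 random bits below.
    # Assumes one ID per millisecond, so the timestamp strictly increases.
    return [((start_ms + i) << 80) | rng.getrandbits(80) for i in range(n)]

rng = random.Random(42)
print(f"random keys appended at the end:       {appended_fraction(random_keys(10_000, rng)):.2%}")
print(f"time-ordered keys appended at the end: {appended_fraction(time_ordered_keys(10_000, rng)):.2%}")
```

With strictly increasing timestamps, every time-ordered key is an append, while random keys almost never are. Several generators sharing a millisecond would interleave slightly within it, so real inserts land close to the end rather than always exactly at it.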

## Structure

Both formats are 128 bits. Both dedicate the leading 48 bits to a Unix epoch millisecond timestamp. The difference is
in how they use the remaining 80 bits.

### ULID

```
 01AN4Z07BY      79KA1307SR9X4MV3
|----------|    |----------------|
 Timestamp          Randomness
  48 bits            80 bits
```

All 80 remaining bits are available for randomness. No bits are reserved for format metadata.
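In code, the ULID layout comes down to two shifts and a mask. The Python sketch below assumes the caller supplies the timestamp and randomness as plain integers. `pack_ulid` and `unpack_ulid` are illustrative names, not part of any ULID library.

```python
import os
import time

TIMESTAMP_BITS = 48
RANDOM_BITS = 80

def pack_ulid(timestamp_ms: int, randomness: int) -> int:
    """Place a 48-bit ms timestamp above 80 random bits, giving one 128-bit integer."""
    if not 0 <= timestamp_ms < 1 << TIMESTAMP_BITS:
        raise ValueError("timestamp must fit in 48 bits")
    if not 0 <= randomness < 1 << RANDOM_BITS:
        raise ValueError("randomness must fit in 80 bits")
    return (timestamp_ms << RANDOM_BITS) | randomness

def unpack_ulid(value: int) -> tuple[int, int]:
    """Split a 128-bit ULID value back into (timestamp_ms, randomness)."""
    return value >> RANDOM_BITS, value & ((1 << RANDOM_BITS) - 1)

now_ms = time.time_ns() // 1_000_000
randomness = int.from_bytes(os.urandom(10), "big")  # 10 bytes = 80 bits
value = pack_ulid(now_ms, randomness)
print(f"{value:032x}", unpack_ulid(value) == (now_ms, randomness))
```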

### UUIDv7

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

Six bits are consumed by the version (`ver`, 4 bits, set to `0111`) and variant (`var`, 2 bits, set to `10`) fields.
That leaves 74 bits for randomness: 12 in `rand_a` and 62 in `rand_b`. Those 6 bits are the cost of UUID format
compliance.
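The RFC 9562 layout translates directly into shifts and masks. The Python sketch below assembles a UUIDv7 from its fields and wraps the result in the standard library's `uuid.UUID`, whose `version` property then reports 7. `uuid7_from_parts` is a hypothetical helper, not a standard-library function.

```python
import os
import time
import uuid

def uuid7_from_parts(unix_ts_ms: int, rand_a: int, rand_b: int) -> uuid.UUID:
    """Assemble a UUIDv7 from its RFC 9562 fields.

    unix_ts_ms: 48 bits, rand_a: 12 bits, rand_b: 62 bits.
    The 4-bit version (0111) and 2-bit variant (10) are filled in here.
    """
    for name, val, bits in (("unix_ts_ms", unix_ts_ms, 48), ("rand_a", rand_a, 12), ("rand_b", rand_b, 62)):
        if not 0 <= val < 1 << bits:
            raise ValueError(f"{name} must fit in {bits} bits")
    # Bit positions below count bit 0 as the least significant bit.
    value = (
        (unix_ts_ms << 80)   # bits 127..80: timestamp
        | (0x7 << 76)        # bits 79..76: ver = 0111
        | (rand_a << 64)     # bits 75..64: rand_a
        | (0b10 << 62)       # bits 63..62: var = 10
        | rand_b             # bits 61..0: rand_b
    )
    return uuid.UUID(int=value)

now_ms = time.time_ns() // 1_000_000
rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
u = uuid7_from_parts(now_ms, rand_a, rand_b)
print(u, u.version)
```

Feeding in the fields of the example ID `01932c08-e690-7584-b945-253de779b977` (timestamp `0x01932c08e690`, `rand_a` `0x584`, `rand_b` `0x3945253de779b977`) reproduces that string exactly.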
60+
61+
## Encoding
62+
63+
This is where the two formats diverge most visibly.
64+
65+
**ULID** uses Crockford's Base32 encoding (5 bits per character), producing a 26-character string:
66+
67+
```
68+
01ARZ3NDEKTSV4RRFFQ69G5FAV
69+
```
70+
71+
The alphabet (`0123456789ABCDEFGHJKMNPQRSTVWXYZ`) omits I, L, O, and U to avoid visual ambiguity. It's case
72+
insensitive, contains no hyphens or special characters, and is URL-safe.
73+
74+
**UUIDv7** uses the standard UUID hex-with-dashes encoding, producing a 36-character string:
75+
76+
```
77+
01932c08-e690-7584-b945-253de779b977
78+
```
79+
80+
The `7` after the second dash is the version nibble. This format is instantly recognizable as a UUID by any system that
81+
handles UUIDs.
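To make the difference concrete, the Python sketch below renders one 128-bit value in both text forms. The helper names are illustrative. The decoder accepts lowercase input but simply rejects characters outside the alphabet, rather than applying any lenient substitutions.

```python
CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def to_crockford32(value: int) -> str:
    """Encode a 128-bit integer as 26 Crockford Base32 characters (ULID text form)."""
    if not 0 <= value < 1 << 128:
        raise ValueError("value must fit in 128 bits")
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits; the top 2 are always zero
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

def from_crockford32(text: str) -> int:
    """Decode a 26-character ULID string (case-insensitive) back to an integer."""
    if len(text) != 26:
        raise ValueError("ULID strings are 26 characters")
    value = 0
    for ch in text.upper():
        digit = CROCKFORD32.find(ch)
        if digit < 0:
            raise ValueError(f"invalid character {ch!r}")
        value = (value << 5) | digit
    if value >= 1 << 128:
        raise ValueError("first character must be 0-7")
    return value

def to_uuid_hex(value: int) -> str:
    """Encode the same 128 bits in 8-4-4-4-12 hex-with-dashes form."""
    h = f"{value:032x}"
    return f"{h[0:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:32]}"

example = "01932c08-e690-7584-b945-253de779b977"
value = int(example.replace("-", ""), 16)
print(to_crockford32(value))   # ULID-style text: 26 characters
print(to_uuid_hex(value))      # UUID text: 36 characters, identical to `example`
print(to_uuid_hex(value)[14])  # the version nibble
```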
82+
83+
## Monotonicity
84+
85+
Both specs address what happens when multiple IDs are generated within the same millisecond.
86+
87+
**ULID** increments the random component by 1 in the least significant bit position. If the 80-bit random space
88+
overflows within a single millisecond, generation fails. In practice, 2^80 IDs per millisecond is not a realistic
89+
concern.
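The ULID rule can be sketched as a small Python generator class that produces 128-bit integers. Pair it with a Base32 encoder for the text form. The class name and the injectable clock and random source are hypothetical, added so the behavior can be tested. The sketch follows only the spec's same-millisecond rule and makes no attempt to handle a system clock that moves backwards.

```python
import os
import time

RANDOM_BITS = 80
RANDOM_MAX = (1 << RANDOM_BITS) - 1

class MonotonicULIDGenerator:
    """Hypothetical generator following the ULID spec's same-millisecond rule."""

    def __init__(self, clock_ms=None, random80=None):
        # Defaults: wall-clock milliseconds and 80 bits from the OS random source.
        self._clock_ms = clock_ms if clock_ms is not None else (lambda: time.time_ns() // 1_000_000)
        self._random80 = random80 if random80 is not None else (lambda: int.from_bytes(os.urandom(10), "big"))
        self._last_ms = None
        self._last_random = 0

    def generate(self) -> int:
        now = self._clock_ms()
        if now == self._last_ms:
            # Same millisecond: increment the previous random component by 1.
            if self._last_random == RANDOM_MAX:
                raise OverflowError("random component overflowed within one millisecond")
            self._last_random += 1
        else:
            # Different millisecond: start again from fresh randomness.
            self._last_ms = now
            self._last_random = self._random80()
        return (self._last_ms << RANDOM_BITS) | self._last_random

gen = MonotonicULIDGenerator()
print([f"{i:032x}" for i in (gen.generate() for _ in range(3))])
```

Incrementing instead of re-randomizing is what keeps IDs from one generator sorted within a millisecond. IDs from different generators in the same millisecond are still ordered only by their random bits.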

**UUIDv7** offers three methods (RFC 9562, Section 6.2):

1. **Fixed-length counter** in `rand_a` or the leading bits of `rand_b`
2. **Monotonic random**: treat the random bits as a randomly seeded counter and increment it on each generation
3. **Sub-millisecond precision**: use up to 12 bits of `rand_a` for sub-millisecond clock precision

The approach is left to the implementation. PostgreSQL 18's built-in `uuidv7()` uses method 3, storing sub-millisecond
precision in `rand_a` to guarantee monotonicity within a single backend connection.
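Method 3 can be sketched in Python as shown below. This is a minimal illustration, not PostgreSQL's implementation, and it rests on three assumptions:

- the 12-bit `rand_a` field holds the fraction of the current millisecond in steps of 1/4096 ms (about 244 ns);
- `rand_b` is random;
- when the clock fails to advance, the generator steps one tick past its previous value, so IDs from one generator never go backwards.

The class name and the injectable clock are hypothetical.

```python
import os
import time
import uuid

SUBMS_BITS = 12                # width of rand_a
SUBMS_STEPS = 1 << SUBMS_BITS  # 4096 steps per millisecond (~244 ns each)

class SubMillisecondUUIDv7:
    """Hypothetical UUIDv7 generator using RFC 9562 method 3 (extra clock precision in rand_a)."""

    def __init__(self, clock_ns=None):
        self._clock_ns = clock_ns if clock_ns is not None else time.time_ns
        self._last_stamp = -1  # previous (unix_ts_ms << 12 | fraction), a 60-bit value

    def generate(self) -> uuid.UUID:
        ms, rem_ns = divmod(self._clock_ns(), 1_000_000)
        fraction = rem_ns * SUBMS_STEPS // 1_000_000  # scale 0..999_999 ns to 0..4095
        stamp = (ms << SUBMS_BITS) | fraction
        if stamp <= self._last_stamp:
            # Clock did not advance (or went backwards): step one tick past the last value.
            stamp = self._last_stamp + 1
        self._last_stamp = stamp
        ms, fraction = stamp >> SUBMS_BITS, stamp & (SUBMS_STEPS - 1)
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
        value = (ms << 80) | (0x7 << 76) | (fraction << 64) | (0b10 << 62) | rand_b
        return uuid.UUID(int=value)

gen = SubMillisecondUUIDv7()
for _ in range(3):
    print(gen.generate())
```

The millisecond and the fraction are compared as one 60-bit value, so a fraction that passes 4095 carries into the millisecond field.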

## Standardization

**UUIDv7** is defined in RFC 9562, published May 2024 by the IETF. It supersedes RFC 4122 and carries the weight of a
formal internet standard. The version and variant bits make it self-describing: any system that understands UUIDs can
parse a UUIDv7 and detect its version, and one that implements RFC 9562 can also extract the timestamp.

**ULID** is a community specification hosted on GitHub ([ulid/spec](https://github.com/ulid/spec)). It has broad
adoption and multiple implementations across languages but is not an IETF or ISO standard. There is no version/variant
metadata in the format itself.
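Python's standard `uuid` module makes the self-describing part of UUIDv7 easy to demonstrate, because its `UUID` objects expose `variant` and `version`. The `uuid7_timestamp` helper below is an illustrative name. It checks the version and variant, then reads the top 48 bits as Unix milliseconds. A ULID string offers no equivalent check, so a parser has to be told out of band which format it is reading.

```python
import datetime
import uuid

UNIX_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)

def uuid7_timestamp(u: uuid.UUID) -> datetime.datetime:
    """Return the creation time embedded in a UUIDv7, at millisecond precision."""
    if u.variant != uuid.RFC_4122 or u.version != 7:
        raise ValueError(f"not a UUIDv7: {u}")
    unix_ts_ms = u.int >> 80  # the most significant 48 bits
    # timedelta arithmetic avoids float rounding of the millisecond value
    return UNIX_EPOCH + datetime.timedelta(milliseconds=unix_ts_ms)

u = uuid.UUID("01932c08-e690-7584-b945-253de779b977")
print(u.version)  # 7
print(uuid7_timestamp(u).isoformat())
```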

## Ecosystem Support

UUIDv7 benefits from the UUID ecosystem. Nearly every mainstream language, database, and framework already understands
UUIDs. PostgreSQL's `uuid` column type, `BINARY(16)` columns in MySQL (which has no dedicated UUID type), and most ORMs
store and parse them natively, although not every UUID library can generate version 7 yet. PostgreSQL 18 adds a
built-in `uuidv7()` function. For earlier versions, the `pg_uuidv7` extension fills the gap.

ULID requires dedicated libraries. Most languages have mature ULID implementations, but you'll need to add a dependency
rather than relying on standard library support. In a database, ULIDs are typically stored either as `CHAR(26)` text or
as their raw 16 bytes in a `BINARY(16)` or UUID-typed column. The bytes are stored as-is, but they won't contain
meaningful UUID version or variant fields.
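To store a ULID in a `BINARY(16)` or UUID-typed column, you decode its Base32 text into the underlying 128-bit integer. The Python sketch below assumes the database driver accepts `uuid.UUID` values or raw `bytes`. Driver behavior varies, and the helper names are hypothetical. The resulting `UUID` object's `version` and `variant` reflect whatever the random bits happen to be, so they carry no meaning.

```python
import uuid

CROCKFORD32 = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def ulid_to_uuid(ulid: str) -> uuid.UUID:
    """Reinterpret a ULID's 128 bits as a uuid.UUID; no bits are changed."""
    if len(ulid) != 26:
        raise ValueError(f"ULIDs are 26 characters, got {len(ulid)}")
    value = 0
    for ch in ulid.upper():
        digit = CROCKFORD32.find(ch)
        if digit < 0:
            raise ValueError(f"invalid ULID character {ch!r}")
        value = (value << 5) | digit
    if value >= 1 << 128:
        raise ValueError("ULID value exceeds 128 bits")
    return uuid.UUID(int=value)

def uuid_to_ulid(u: uuid.UUID) -> str:
    """Render a 128-bit UUID value back into ULID Base32 text."""
    value, chars = u.int, []
    for _ in range(26):
        chars.append(CROCKFORD32[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))

u = ulid_to_uuid("01ARZ3NDEKTSV4RRFFQ69G5FAV")
print(u)        # UUID-formatted text, e.g. for a PostgreSQL uuid column
print(u.bytes)  # 16 raw bytes, e.g. for BINARY(16)
print(uuid_to_ulid(u))
```

Both representations are big-endian, and the Base32 alphabet is in ASCII order. Sorting by the 16 bytes therefore gives the same order as sorting the uppercase ULID strings.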

## Comparison

| | ULID | UUIDv7 |
|---|---|---|
| **Size** | 128 bits | 128 bits |
| **String length** | 26 chars | 36 chars |
| **Encoding** | Crockford Base32 | Hex with dashes |
| **Timestamp** | 48-bit ms | 48-bit ms |
| **Random bits** | 80 | 74 (6 used by ver/var) |
| **Standardized** | Community spec | RFC 9562 |
| **UUID-compatible** | No | Yes |
| **Case sensitive** | No | No |
| **URL safe** | Yes | Yes (hyphens need no escaping) |
| **Native DB support** | Limited | PostgreSQL 18+, growing |

## Which One Should You Use?

**Use UUIDv7** if you're working in an ecosystem that already uses UUIDs, need database-native support, or want the
backing of a formal standard. In most cases this is the right default. It slots into existing UUID columns, works with
existing UUID parsing and storage code, and will only gain more native support over time.

**Use ULID** if you need shorter, more human-readable identifiers, are already using ULIDs in your system, or are
working in a context where the 10-character savings matters (URLs, logs, user-facing IDs). The format is also a
reasonable choice when you want the full 80 bits of randomness per millisecond.

**Either way**, both are a significant improvement over UUIDv4 for database primary keys. The timestamp prefix means
sequential index inserts, better cache locality, and the ability to extract creation time directly from the ID. If
you're still using UUIDv4 as a primary key, switching to either format is worth it.

One thing to keep in mind: both formats embed a creation timestamp, which means they leak timing information. If that's
a concern for externally exposed identifiers, UUIDv4 remains the better choice there. A common pattern is UUIDv7 for
internal primary keys and UUIDv4 for externally exposed identifiers. For secrets such as API keys and session tokens,
don't rely on any UUID. RFC 9562 warns that UUIDs should not be assumed to be hard to guess or used as security
capabilities, so use a dedicated token from a cryptographically secure random generator instead.
