---
layout: post
title: UUIDv7 vs ULID
excerpt: >
  Both UUIDv7 and ULID solve the same problem: time-sortable, globally unique identifiers. They share the same
  fundamental structure but differ in standardization, encoding, and ecosystem support. Here's how they compare and when
  to use each.
---

If you've been building systems that need unique identifiers, you've probably run into the limitations of UUIDv4. Random
UUIDs fragment B-tree indexes, cause excessive page splits, and destroy cache locality in databases. Two formats emerged
to fix this by putting a timestamp first: ULID (2016) and UUIDv7 (2024, via RFC 9562). They're remarkably similar in
structure but differ in important ways.

## The Problem They Both Solve

UUIDv4 generates random 128-bit values (122 random bits plus 6 fixed version and variant bits). When used as a primary
key, each insert targets a random location in a B-tree index. Sequential keys only ever split the rightmost leaf page as
it fills; random keys split pages throughout the index, so a UUIDv4 index incurs far more page splits for the same number
of rows. Randomly split pages settle at roughly 69% full instead of the ~90% typical of sequential inserts, wasting disk
space and I/O. Buffer cache effectiveness drops because hot pages are spread across the entire index.

Both UUIDv7 and ULID fix this by putting a millisecond-precision Unix timestamp in the most significant bits. New IDs
land at or near the end of the index, much like auto-incrementing integers, while retaining 128-bit global uniqueness.

## Structure

Both formats are 128 bits. Both dedicate the leading 48 bits to a Unix epoch millisecond timestamp. The difference is
in how they use the remaining 80 bits.

### ULID

```
 01AN4Z07BY      79KA1307SR9X4MV3
|----------|    |----------------|
 Timestamp          Randomness
  48 bits            80 bits
```

All 80 remaining bits are available for randomness. No bits are reserved for format metadata.
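
Here's a minimal Python sketch of that split, packing a 48-bit millisecond timestamp and 80 random bits into one 128-bit
integer. The function `ulid_int` is a name made up for this example, not an API from any ULID library; randomness comes
from the standard library's `secrets` module.

```python
import secrets
import time

TIMESTAMP_BITS = 48
RANDOM_BITS = 80


def ulid_int(ts_ms: int) -> int:
    """Pack a Unix millisecond timestamp and 80 random bits into one 128-bit integer."""
    if not 0 <= ts_ms < 1 << TIMESTAMP_BITS:
        raise ValueError("timestamp does not fit in 48 bits")
    randomness = secrets.randbits(RANDOM_BITS)  # every one of the low 80 bits is random
    return (ts_ms << RANDOM_BITS) | randomness


now_ms = time.time_ns() // 1_000_000
value = ulid_int(now_ms)
timestamp_part = value >> RANDOM_BITS            # top 48 bits: the timestamp
random_part = value & ((1 << RANDOM_BITS) - 1)   # low 80 bits: randomness
```

Because the timestamp sits in the high bits, comparing two of these integers compares their timestamps first; the random
bits only decide the order of IDs from the same millisecond.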

### UUIDv7

```
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           unix_ts_ms                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          unix_ts_ms           |  ver  |       rand_a          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|var|                        rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            rand_b                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
```

Six bits are consumed by the version (`ver`, 4 bits, set to `0111`) and variant (`var`, 2 bits, set to `10`) fields.
That leaves 74 bits for randomness: 12 in `rand_a` and 62 in `rand_b`. Those 6 bits are the cost of UUID format
compliance.
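
For comparison, here's a sketch that assembles the same fields by hand with Python's standard `uuid` module, which
accepts a raw 128-bit integer through `uuid.UUID(int=...)`. The helper `uuid7_from_parts` is hypothetical; a real
generator also has to decide what to do when several IDs share a millisecond.

```python
import secrets
import time
import uuid


def uuid7_from_parts(ts_ms: int, rand_a: int, rand_b: int) -> uuid.UUID:
    """Assemble a UUIDv7 from its fields using the RFC 9562 bit layout."""
    if not 0 <= ts_ms < 1 << 48:
        raise ValueError("unix_ts_ms must fit in 48 bits")
    if not 0 <= rand_a < 1 << 12:
        raise ValueError("rand_a must fit in 12 bits")
    if not 0 <= rand_b < 1 << 62:
        raise ValueError("rand_b must fit in 62 bits")
    value = (
        (ts_ms << 80)      # 48-bit unix_ts_ms
        | (0b0111 << 76)   # 4-bit ver = 7
        | (rand_a << 64)   # 12-bit rand_a
        | (0b10 << 62)     # 2-bit var = 10
        | rand_b           # 62-bit rand_b
    )
    return uuid.UUID(int=value)


u = uuid7_from_parts(time.time_ns() // 1_000_000, secrets.randbits(12), secrets.randbits(62))
print(u, u.version)  # u.version is 7
```

The 6 fixed bits are why this takes 12 + 62 random bits rather than the 80 a ULID gets.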

## Encoding

This is where the two formats diverge most visibly.

**ULID** uses Crockford's Base32 encoding (5 bits per character), producing a 26-character string:

```
01ARZ3NDEKTSV4RRFFQ69G5FAV
```

The alphabet (`0123456789ABCDEFGHJKMNPQRSTVWXYZ`) omits I, L, and O to avoid confusion with `1` and `0`, and U to reduce
the chance of accidental obscenities. It's case-insensitive, contains no hyphens or special characters, and is URL-safe.
Because 26 characters carry 130 bits, the first character of a valid ULID only ranges from `0` to `7`.
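
A minimal Python sketch of this encoding, assuming the ULID is held as a 128-bit integer. `encode_ulid` and
`decode_ulid` are illustrative names, and this decoder is stricter than full Crockford Base32, which also accepts
substitutes such as `O` for `0`; here any character outside the alphabet is rejected.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(value: int) -> str:
    """Encode a 128-bit integer as 26 Crockford Base32 characters, most significant first."""
    if not 0 <= value < 1 << 128:
        raise ValueError("ULID must fit in 128 bits")
    chars = []
    for _ in range(26):                  # 26 * 5 = 130 bits; the top 2 are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def decode_ulid(text: str) -> int:
    """Decode a 26-character ULID string (case-insensitive) back to its 128-bit integer."""
    if len(text) != 26:
        raise ValueError("ULID strings are 26 characters")
    value = 0
    for ch in text.upper():
        value = (value << 5) | CROCKFORD.index(ch)  # ValueError for I, L, O, U or other symbols
    if value >= 1 << 128:
        raise ValueError("a ULID's first character must be 0-7")
    return value


example = "01ARZ3NDEKTSV4RRFFQ69G5FAV"
assert encode_ulid(decode_ulid(example)) == example
```

The alphabet is listed in ascending ASCII order, so sorting ULID strings lexicographically gives the same order as
sorting the underlying integers. That's what keeps the text form time-sortable.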

**UUIDv7** uses the standard UUID hex-with-dashes encoding, producing a 36-character string:

```
01932c08-e690-7584-b945-253de779b977
```

The `7` after the second dash is the version nibble. This format is instantly recognizable as a UUID by any system that
handles UUIDs.
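
As one example, Python's standard `uuid` module parses the string above with no v7-specific code and reads the version
straight from that nibble:

```python
import uuid

u = uuid.UUID("01932c08-e690-7584-b945-253de779b977")

print(u.version)                    # 7, read from the version nibble
print(u.variant == uuid.RFC_4122)   # True: hex b = 1011, so the variant bits are 10
print(str(u)[14])                   # 7, the first character after the second dash
```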

## Monotonicity

Both specs address what happens when multiple IDs are generated within the same millisecond.

**ULID** increments the random component by 1 in the least significant bit position, carrying as needed. If the 80-bit
random component overflows within a single millisecond, generation fails. Because the counter starts from a random value,
overflow can in principle happen after fewer than 2^80 IDs, but starting close enough to the maximum for that to matter
is vanishingly unlikely, so in practice it is not a realistic concern.
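
A minimal Python sketch of that rule, producing 128-bit integers rather than strings. `MonotonicULID` is a hypothetical
class written for this post; the injectable `clock_ms` parameter is an assumption that makes the same-millisecond path
easy to exercise, and the sketch does not handle a clock that moves backwards.

```python
import secrets
import time
from typing import Callable

RANDOM_MAX = (1 << 80) - 1


class MonotonicULID:
    """Sketch of the ULID spec's monotonic mode, producing 128-bit integers."""

    def __init__(self, clock_ms: Callable[[], int] = lambda: time.time_ns() // 1_000_000):
        self._clock_ms = clock_ms   # injectable clock, handy for testing
        self._last_ts = -1
        self._last_rand = 0

    def next_int(self) -> int:
        ts = self._clock_ms()
        if ts == self._last_ts:
            # Same millisecond: add 1 to the random component, carrying as needed.
            if self._last_rand == RANDOM_MAX:
                raise OverflowError("ULID random component overflowed within one millisecond")
            self._last_rand += 1
        else:
            # New millisecond: start again from fresh randomness.
            # (A clock that moves backwards is not handled in this sketch.)
            self._last_ts = ts
            self._last_rand = secrets.randbits(80)
        return (self._last_ts << 80) | self._last_rand


gen = MonotonicULID(clock_ms=lambda: 1_700_000_000_000)  # frozen clock: every call lands in the same ms
a, b = gen.next_int(), gen.next_int()                     # b == a + 1
```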

**UUIDv7** offers three methods (RFC 9562, Section 6.2):

1. **Fixed-length counter** in `rand_a`, optionally extending into the leading bits of `rand_b`
2. **Monotonic random**: treat the random bits as a randomly seeded counter and increment it on each generation
3. **Sub-millisecond precision**: use up to 12 bits of `rand_a` for sub-millisecond clock precision

The approach is left to the implementation. PostgreSQL 18's built-in `uuidv7()` uses method 3, storing sub-millisecond
precision in `rand_a` to guarantee monotonicity within a single backend connection.
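
Here is a minimal sketch of method 3 under our own assumptions, not PostgreSQL's source: the sub-millisecond remainder
from `time.time_ns()` is scaled into the 12 `rand_a` bits, and if the clock has not advanced since the last ID, the
generator steps one tick forward. The class name `UUIDv7SubMs` and its `clock_ns` parameter are hypothetical.

```python
import secrets
import time
import uuid
from typing import Callable


class UUIDv7SubMs:
    """Sketch of RFC 9562 method 3: rand_a carries a 12-bit fraction of the current millisecond."""

    def __init__(self, clock_ns: Callable[[], int] = time.time_ns):
        self._clock_ns = clock_ns
        self._last = -1  # last issued (unix_ts_ms << 12 | rand_a)

    def generate(self) -> uuid.UUID:
        ts_ms, sub_ns = divmod(self._clock_ns(), 1_000_000)
        rand_a = sub_ns * 4096 // 1_000_000   # scale 0..999_999 ns into 0..4095
        stamp = (ts_ms << 12) | rand_a
        if stamp <= self._last:
            # Clock did not advance (or went backwards): step one tick past the last ID.
            # A carry out of rand_a bumps unix_ts_ms, so the embedded time can run slightly ahead.
            stamp = self._last + 1
        self._last = stamp
        value = (
            ((stamp >> 12) << 80)        # unix_ts_ms
            | (0b0111 << 76)             # ver = 7
            | ((stamp & 0xFFF) << 64)    # rand_a: sub-millisecond fraction
            | (0b10 << 62)               # var = 10
            | secrets.randbits(62)       # rand_b stays random
        )
        return uuid.UUID(int=value)


gen = UUIDv7SubMs()
ids = [gen.generate() for _ in range(3)]   # strictly increasing, even within one millisecond
```

Because ordering is carried entirely by the timestamp and `rand_a`, the random `rand_b` never affects sort order
between two IDs from the same generator.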

## Standardization

**UUIDv7** is defined in RFC 9562, published May 2024 by the IETF. It supersedes RFC 4122 and carries the weight of a
formal internet standard. The version and variant bits make it self-describing: any system that understands UUIDs can
parse a UUIDv7 and detect its version, and any v7-aware library can extract the timestamp.

**ULID** is a community specification hosted on GitHub ([ulid/spec](https://github.com/ulid/spec)). It has broad
adoption and multiple implementations across languages but is not an IETF or ISO standard. There is no version/variant
metadata in the format itself.

## Ecosystem Support

UUIDv7 benefits from the UUID ecosystem. Mainstream languages, databases, and frameworks already know how to store and
transport UUIDs: PostgreSQL has a native `uuid` column type, MySQL deployments commonly use `BINARY(16)`, and most ORMs
map UUID fields out of the box. Even where generating a UUIDv7 still takes a library, the resulting values fit those
existing types unchanged. PostgreSQL 18 adds a built-in `uuidv7()` function. For earlier versions, the `pg_uuidv7`
extension fills the gap.

ULID requires dedicated libraries. Most languages have mature ULID implementations, but you'll need to add a dependency
rather than relying on standard library support. Database storage typically means storing ULIDs as `CHAR(26)`, or
converting them to their 16-byte binary form for a `BINARY(16)` or native `uuid` column.
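
Converting is cheap because a ULID is already 128 bits. This sketch (hypothetical helper names, standard library only)
reinterprets the decoded bits as a Python `uuid.UUID`, whose `.bytes` gives the 16 big-endian bytes for a `BINARY(16)`
column. Keep in mind the result is a UUID only in shape: its version and variant positions hold whatever ULID bits
landed there.

```python
import uuid

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_to_uuid(ulid: str) -> uuid.UUID:
    """Reinterpret a ULID's 128 bits as a UUID for storage in a uuid or BINARY(16) column."""
    if len(ulid) != 26:
        raise ValueError("ULID strings are 26 characters")
    value = 0
    for ch in ulid.upper():
        value = (value << 5) | CROCKFORD.index(ch)   # ValueError on characters outside the alphabet
    return uuid.UUID(int=value)                      # ValueError if the value exceeds 128 bits


def uuid_to_ulid(u: uuid.UUID) -> str:
    """Re-encode the same 128 bits as a 26-character ULID string."""
    value, chars = u.int, []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


stored = ulid_to_uuid("01ARZ3NDEKTSV4RRFFQ69G5FAV")
row_bytes = stored.bytes          # 16 big-endian bytes, suitable for BINARY(16)
assert uuid_to_ulid(stored) == "01ARZ3NDEKTSV4RRFFQ69G5FAV"
```

Because the bytes are big-endian, byte-wise comparison in the database orders them the same way as the original ULID
strings.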

## Comparison

| | ULID | UUIDv7 |
|---|---|---|
| **Size** | 128 bits | 128 bits |
| **String length** | 26 chars | 36 chars |
| **Encoding** | Crockford Base32 | Hex with dashes |
| **Timestamp** | 48-bit ms | 48-bit ms |
| **Random bits** | 80 | 74 (6 used by ver/var) |
| **Standardized** | Community spec | RFC 9562 |
| **UUID-compatible** | No | Yes |
| **Case-sensitive** | No | No |
| **URL-safe** | Yes | Yes (hyphens are unreserved URL characters) |
| **Native DB support** | Limited | PostgreSQL 18+, growing |

## Which One Should You Use?

**Use UUIDv7** if you're working in an ecosystem that already uses UUIDs, need database-native support, or want the
backing of a formal standard. In most cases this is the right default. It slots into existing UUID columns, works with
existing UUID libraries, and is likely to keep gaining native support over time.

**Use ULID** if you need shorter, more human-readable identifiers, are already using ULIDs in your system, or are
working in a context where the 10-character savings matters (URLs, logs, user-facing IDs). The format is also a
reasonable choice when you want the full 80 bits of randomness per millisecond.

**Either way**, both are a significant improvement over UUIDv4 for database primary keys. The timestamp prefix means
near-sequential index inserts, better cache locality, and the ability to extract creation time directly from the ID. If
you're still using UUIDv4 as a primary key, switching to either format is worth it.

One thing to keep in mind: both formats embed a creation timestamp, so anyone who sees the ID learns when it was
created. Where that leak matters, such as identifiers exposed to users or other systems, UUIDv4 remains the better
choice. A common pattern is UUIDv7 for internal primary keys and UUIDv4 for externally exposed identifiers. Secrets such
as API keys and session tokens are a separate case: RFC 9562 warns against treating any UUID as hard to guess, so
generate those from a cryptographically secure random source instead.
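
The leak is easy to demonstrate. This Python sketch, with helper names of our own, recovers the creation time from
either format using only the standard library:

```python
import uuid
from datetime import datetime, timedelta, timezone

CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def uuid7_created_at(text: str) -> datetime:
    """A UUIDv7's top 48 bits are Unix milliseconds."""
    return EPOCH + timedelta(milliseconds=uuid.UUID(text).int >> 80)


def ulid_created_at(text: str) -> datetime:
    """A ULID's first 10 characters (50 bits, the top 2 always zero) hold the 48-bit timestamp."""
    ms = 0
    for ch in text[:10].upper():
        ms = (ms << 5) | CROCKFORD.index(ch)
    return EPOCH + timedelta(milliseconds=ms)


print(uuid7_created_at("01932c08-e690-7584-b945-253de779b977"))
print(ulid_created_at("01ARZ3NDEKTSV4RRFFQ69G5FAV"))
```

No key or secret is involved: the timestamp sits in plain sight in the leading 48 bits of both formats.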