A comprehensive, production-ready TypeScript/JavaScript framework for creating, reading, and manipulating Microsoft Word (.docx) documents programmatically.
- Create DOCX files from scratch
- Read and modify existing DOCX files
- Buffer-based operations (load/save from memory)
- Document properties (core, extended, custom)
- Memory management with dispose pattern
- Bookmark pair validation and auto-repair (validateBookmarkPairs())
- App.xml metadata preservation (HeadingPairs, TotalTime, etc.)
- Document background color/theme support
- Character formatting: bold, italic, underline, strikethrough, subscript, superscript
- Font properties: family, size, color (RGB and theme colors), highlight
- Text effects: small caps, all caps, shadow, emboss, engrave
- Paragraph alignment, indentation, spacing, borders, shading
- Text search and replace with regex support
- Custom styles (paragraph, character, table)
- CJK/East Asian paragraph properties (kinsoku, wordWrap, overflowPunct, topLinePunct)
- Underline color and theme color attributes
- Theme font references (asciiTheme, hAnsiTheme, eastAsiaTheme, csTheme)
- Numbered lists (decimal, roman, alpha)
- Bulleted lists with various bullet styles
- Multi-level lists with custom numbering and restart control
- Tables with formatting, borders, shading
- Cell spanning (merge cells horizontally and vertically)
- Advanced table properties (margins, widths, alignment)
- Table navigation helpers (getFirstParagraph(), getLastParagraph())
- Legacy horizontal merge (hMerge) support
- Table layout parsing (fixed/auto)
- Table style shading updates (modify styles.xml colors)
- Cell content management (trailing blank removal with structure preservation)
- Images (PNG, JPEG, GIF, SVG, EMF, WMF) with positioning, text wrapping, and full ECMA-376 DrawingML attribute coverage
- Headers & footers (different first page, odd/even pages)
- Hyperlinks (external URLs, internal bookmarks)
- Hyperlink defragmentation utility (fixes fragmented links from Google Docs)
- Bookmarks and cross-references
- Body-level bookmark support (bookmarks between block elements)
- Shapes and text boxes
- Track changes (revisions for insertions, deletions, formatting)
- Granular character-level tracked changes (text diff-based)
- Comments and annotations
- Compatibility mode detection and upgrade (Word 2003/2007/2010/2013+ modes)
- Table of contents generation with customizable heading levels and relative indentation
- Fields: merge fields, date/time, page numbers, TOC fields
- Footnotes and endnotes (full round-trip with save pipeline, parsing, and clear API)
- Content controls (Structured Document Tags)
- Form field data preservation (text input, checkbox, dropdown per ECMA-376 §17.16)
- w14 run effects passthrough (Word 2010+ ligatures, numForm, textOutline, etc.)
- Expanded document settings (evenAndOddHeaders, mirrorMargins, autoHyphenation, decimalSymbol)
- People.xml auto-registration for tracked changes authors
- Style default attribute preservation (w:default="1")
- Namespace order preservation in generated XML
- Multiple sections with different page layouts
- Page orientation, size, and margins
- Preserved element round-trip (math equations, alternate content, custom XML)
- Unified shading model with theme color support and inheritance resolution
- Lossless image optimization (PNG re-compression, BMP-to-PNG conversion)
- Run property change tracking (w:rPrChange) with direct API access
- Paragraph mark revision tracking (w:del/w:ins in w:pPr/w:rPr) for full tracked-changes fidelity
- Normal/NormalWeb style linking with preservation flags
- Complete XML generation and parsing (ReDoS-safe, position-based parser)
- 40+ unit conversion functions (twips, EMUs, points, pixels, inches, cm)
- Validation utilities and corruption detection
- Text diff utility for character-level comparisons
- webSettings.xml auto-generation
- Safe OOXML parsing helpers (zero-value handling, boolean parsing)
- Full TypeScript support with comprehensive type definitions
- Error handling utilities
- Logging infrastructure with multiple log levels
npm install docxmlater
import { Document } from 'docxmlater';
// Create a new document
const doc = Document.create();
// Add a paragraph
const para = doc.createParagraph();
para.addText('Hello, World!', { bold: true, fontSize: 24 });
// Save to file
await doc.save('hello.docx');
// Don't forget to dispose
doc.dispose();
import { Document } from 'docxmlater';
// Load existing document
const doc = await Document.load('input.docx');
// Find and replace text
doc.replaceText(/old text/g, 'new text');
// Add a new paragraph
const para = doc.createParagraph();
para.addText('Added paragraph', { italic: true });
// Save modifications
await doc.save('output.docx');
doc.dispose();
import { Document } from 'docxmlater';
const doc = Document.create();
// Create a 3x4 table
const table = doc.createTable(3, 4);
// Set header row
const headerRow = table.getRow(0);
headerRow.getCell(0).addParagraph().addText('Column 1', { bold: true });
headerRow.getCell(1).addParagraph().addText('Column 2', { bold: true });
headerRow.getCell(2).addParagraph().addText('Column 3', { bold: true });
headerRow.getCell(3).addParagraph().addText('Column 4', { bold: true });
// Add data
table.getRow(1).getCell(0).addParagraph().addText('Data 1');
table.getRow(1).getCell(1).addParagraph().addText('Data 2');
// Apply borders
table.setBorders({
  top: { style: 'single', size: 4, color: '000000' },
  bottom: { style: 'single', size: 4, color: '000000' },
  left: { style: 'single', size: 4, color: '000000' },
  right: { style: 'single', size: 4, color: '000000' },
  insideH: { style: 'single', size: 4, color: '000000' },
  insideV: { style: 'single', size: 4, color: '000000' },
});
await doc.save('table.docx');
doc.dispose();
import { Document } from 'docxmlater';
import { readFileSync } from 'fs';
const doc = Document.create();
// Load image from file
const imageBuffer = readFileSync('photo.jpg');
// Add image to document
const para = doc.createParagraph();
await para.addImage(imageBuffer, {
  width: 400,
  height: 300,
  format: 'jpg',
});
await doc.save('with-image.docx');
doc.dispose();
import { Document } from 'docxmlater';
const doc = await Document.load('document.docx');
// Get all hyperlinks
const hyperlinks = doc.getHyperlinks();
console.log(`Found ${hyperlinks.length} hyperlinks`);
// Update URLs in batch (30-50% faster than manual iteration)
doc.updateHyperlinkUrls('http://old-domain.com', 'https://new-domain.com');
// Fix fragmented hyperlinks from Google Docs
const mergedCount = doc.defragmentHyperlinks({
  resetFormatting: true, // Fix corrupted fonts
});
console.log(`Merged ${mergedCount} fragmented hyperlinks`);
await doc.save('updated.docx');
doc.dispose();
import { Document, Style } from 'docxmlater';
const doc = Document.create();
// Create custom paragraph style
const customStyle = new Style('CustomHeading', 'paragraph');
customStyle.setName('Custom Heading');
customStyle.setRunFormatting({
  bold: true,
  fontSize: 32,
  color: '0070C0',
});
customStyle.setParagraphFormatting({
  alignment: 'center',
  spacingAfter: 240,
});
// Add style to document
doc.getStylesManager().addStyle(customStyle);
// Apply style to paragraph
const para = doc.createParagraph();
para.addText('Styled Heading');
para.applyStyle('CustomHeading');
await doc.save('styled.docx');
doc.dispose();
import { Document, CompatibilityMode } from 'docxmlater';
const doc = await Document.load('legacy.docx');
// Check compatibility mode
console.log(`Mode: ${doc.getCompatibilityMode()}`); // e.g., 12 (Word 2007)
if (doc.isCompatibilityMode()) {
  // Get detailed compatibility info
  const info = doc.getCompatibilityInfo();
  console.log(`Legacy flags: ${info.legacyFlags.length}`);
  // Upgrade to Word 2013+ mode (equivalent to File > Info > Convert)
  const report = doc.upgradeToModernFormat();
  console.log(`Removed ${report.removedFlags.length} legacy flags`);
  console.log(`Added ${report.addedSettings.length} modern settings`);
}
await doc.save('modern.docx');
doc.dispose();
Creation & Loading:
- Document.create(options?) - Create new document
- Document.load(filepath, options?) - Load from file
- Document.loadFromBuffer(buffer, options?) - Load from memory
Handling Tracked Changes:
By default, docXMLater accepts all tracked changes during document loading to prevent corruption:
// Default: accepts all changes (recommended)
const doc = await Document.load('document.docx');
// Explicit control: set revisionHandling to exactly one value
const explicitDoc = await Document.load('document.docx', {
  // 'accept'   - accept all changes (default)
  // 'strip'    - remove all revision markup
  // 'preserve' - keep tracked changes (should not cause corruption; please report it if it does)
  revisionHandling: 'accept',
});
Revision Handling Options:
- 'accept' (default): Removes revision markup, keeps inserted content, removes deleted content
- 'strip': Removes all revision markup completely
- 'preserve': Keeps tracked changes as-is (may cause Word "unreadable content" errors)
Why Accept By Default?
Documents with tracked changes can cause Word corruption errors during round-trip processing due to revision ID conflicts. Accepting changes automatically prevents this issue while preserving document content.
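To make the 'accept' behaviour concrete, here is a minimal sketch on a simplified model. It is not the library's implementation: the TrackedSegment type and the acceptAll function are hypothetical names introduced only for illustration. Inserted text is kept, deleted text is dropped, and no revision markers remain.

```typescript
// Simplified model: a paragraph is a list of text segments, each optionally
// wrapped in an insertion or deletion revision. (Hypothetical types.)
type TrackedSegment = {
  text: string;
  revision?: 'insertion' | 'deletion';
};

// "Accept all": keep inserted content, drop deleted content, and return
// plain segments with no revision markers left.
function acceptAll(segments: TrackedSegment[]): TrackedSegment[] {
  return segments
    .filter((seg) => seg.revision !== 'deletion')
    .map((seg) => ({ text: seg.text }));
}

const draftSegments: TrackedSegment[] = [
  { text: 'The ' },
  { text: 'quick', revision: 'deletion' },
  { text: 'slow', revision: 'insertion' },
  { text: ' fox' },
];

const acceptedSegments = acceptAll(draftSegments);
console.log(acceptedSegments.map((seg) => seg.text).join('')); // "The slow fox"
```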
Content Management:
- createParagraph() - Add paragraph
- createTable(rows, cols) - Add table
- createSection() - Add section
- getBodyElements() - Get all body content
Search & Replace:
- findText(pattern) - Find text matches
- replaceText(pattern, replacement) - Replace text
- findParagraphsByText(pattern) - Find paragraphs containing text/regex
- getParagraphsByStyle(styleId) - Get paragraphs with specific style
- getRunsByFont(fontName) - Get runs using a specific font
- getRunsByColor(color) - Get runs with a specific color
Bulk Formatting:
- setAllRunsFont(fontName) - Apply font to all text
- setAllRunsSize(size) - Apply font size to all text
- setAllRunsColor(color) - Apply color to all text
- getFormattingReport() - Get document formatting statistics
Hyperlinks:
- getHyperlinks() - Get all hyperlinks
- updateHyperlinkUrls(oldUrl, newUrl) - Batch URL update
- defragmentHyperlinks(options?) - Fix fragmented links
- collectAllReferencedHyperlinkIds() - Comprehensive scan of all hyperlink relationship IDs (includes nested tables, headers/footers, footnotes/endnotes)
Statistics:
- getWordCount() - Count words
- getCharacterCount(includeSpaces?) - Count characters
- estimateSize() - Estimate file size
Compatibility Mode:
- getCompatibilityMode() - Get document's Word version mode (11/12/14/15)
- isCompatibilityMode() - Check if document targets a legacy Word version
- getCompatibilityInfo() - Get full parsed compat settings
- upgradeToModernFormat() - Upgrade to Word 2013+ mode (removes legacy flags)
Footnotes & Endnotes:
- createFootnote(paragraph, text) - Add footnote
- createEndnote(paragraph, text) - Add endnote
- clearFootnotes() / clearEndnotes() - Remove all notes
- getFootnoteManager() / getEndnoteManager() - Access note managers
Numbering:
- restartNumbering(numId, level?, startValue?) - Restart list numbering (creates new instance with startOverride)
- cleanupUnusedNumbering() - Remove unused numbering definitions (scans body, headers, footers, footnotes, endnotes)
- consolidateNumbering(options?) - Merge duplicate abstract numbering definitions
- validateNumberingReferences() - Fix orphaned numId references
Shading:
- getComputedCellShading(table, row, col) - Resolve effective cell shading with inheritance
Document Sanitization:
- flattenFieldCodes() - Strip INCLUDEPICTURE field markup, preserving embedded images
- stripOrphanRSIDs() - Remove orphan RSIDs from settings.xml
- clearDirectSpacingForStyles(styleIds) - Remove direct spacing overrides from styled paragraphs
Image Optimization:
- optimizeImages() - Lossless PNG re-compression and BMP-to-PNG conversion (zero dependencies)
Saving:
- save(filepath) - Save to file
- toBuffer() - Save to Buffer
- dispose() - Free resources (important!)
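Because dispose() should run even when an operation throws, a try/finally wrapper is one common way to guarantee it. The sketch below is library-agnostic: DisposableResource, withResource, and the fake resource are hypothetical names, and only the dispose() method name is taken from the API listed here.

```typescript
// Anything that exposes dispose(), such as a loaded document.
interface DisposableResource {
  dispose(): void;
}

// Runs `work` against the resource and always disposes it afterwards,
// whether `work` resolves or throws.
async function withResource<T extends DisposableResource, R>(
  resource: T,
  work: (resource: T) => Promise<R> | R,
): Promise<R> {
  try {
    return await work(resource);
  } finally {
    resource.dispose();
  }
}

// Stand-in resource used only to demonstrate the helper.
function makeFakeResource() {
  return {
    disposed: false,
    dispose() {
      this.disposed = true;
    },
  };
}

async function demoDispose(): Promise<boolean> {
  const fake = makeFakeResource();
  try {
    await withResource(fake, () => {
      throw new Error('simulated failure');
    });
  } catch {
    // The error still reaches the caller; cleanup has already happened.
  }
  return fake.disposed;
}

demoDispose().then((disposed) => console.log(disposed)); // true
```

Assuming a loaded document satisfies this interface, a call might look like withResource(await Document.load('in.docx'), (doc) => doc.save('out.docx')).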
Content:
- addText(text, formatting?) - Add text run
- addRun(run) - Add custom run
- addHyperlink(hyperlink) - Add hyperlink
- addImage(buffer, options) - Add image
Formatting:
- setAlignment(alignment) - Left, center, right, justify
- setIndentation(options) - First line, hanging, left, right
- setSpacing(options) - Line spacing, before/after
- setBorders(borders) - Paragraph borders
- setShading(shading) - Background color
- applyStyle(styleId) - Apply paragraph style
Properties:
- setKeepNext(value) - Keep with next paragraph
- setKeepLines(value) - Keep lines together
- setPageBreakBefore(value) - Page break before
- clearSpacing() - Remove direct spacing (inherit from style)
Numbering:
- setNumbering(numId, level) - Apply list numbering
Text:
- setText(text) - Set run text
- getText() - Get run text
Character Formatting:
- setBold(value) - Bold text
- setItalic(value) - Italic text
- setUnderline(style?) - Underline
- setStrikethrough(value) - Strikethrough
- setFont(name) - Font family
- setFontSize(size) - Font size in points
- setColor(color) - Text color (hex)
- setHighlight(color) - Highlight color
Advanced:
- setSubscript(value) - Subscript
- setSuperscript(value) - Superscript
- setSmallCaps(value) - Small capitals
- setAllCaps(value) - All capitals
- clearMatchingFormatting(styleFormatting) - Remove formatting matching a style (for inheritance)
- getPropertyChangeRevision() - Get run property change revision (w:rPrChange)
- setPropertyChangeRevision(propChange) - Set run property change revision
Structure:
- addRow() - Add row
- getRow(index) - Get row by index
- getCell(row, col) - Get specific cell
Formatting:
- setBorders(borders) - Table borders
- setAlignment(alignment) - Table alignment
- setWidth(width) - Table width
- setLayout(layout) - Fixed or auto layout
Style:
- applyStyle(styleId) - Apply table style
Content:
- addParagraph() - Add paragraph to cell
- getParagraphs() - Get all paragraphs
Formatting:
- setBorders(borders) - Cell borders
- setShading(color) - Cell background
- setVerticalAlignment(alignment) - Top, center, bottom
- setWidth(width) - Cell width
Spanning:
- setHorizontalMerge(mergeType) - Horizontal merge
- setVerticalMerge(mergeType) - Vertical merge
Convenience Methods:
- setTextAlignment(alignment) - Set alignment for all paragraphs
- setAllParagraphsStyle(styleId) - Apply style to all paragraphs
- setAllRunsFont(fontName) - Apply font to all runs
- setAllRunsSize(size) - Apply font size to all runs
- setAllRunsColor(color) - Apply color to all runs
Content Management:
- removeTrailingBlankParagraphs(options?) - Remove trailing blank paragraphs from cell
- removeParagraph(index) - Remove paragraph at index (updates nested content positions)
- addParagraphAt(index, paragraph) - Insert paragraph at index (updates nested content positions)
Table Style Shading:
- updateTableStyleShading(oldColor, newColor) - Update shading colors in styles.xml
- updateTableStyleShadingBulk(settings) - Bulk update table style shading
- removeTrailingBlanksInTableCells(options?) - Remove trailing blanks from all table cells
Sorting:
- sortRows(columnIndex, options?) - Sort table rows by column
Line Numbering:
- setLineNumbering(options) - Enable line numbering
- getLineNumbering() - Get line numbering settings
- clearLineNumbering() - Disable line numbering
Resolution:
- resolve() - Mark comment as resolved
- unresolve() - Mark comment as unresolved
- isResolved() - Check if comment is resolved
Filtering:
- getResolvedComments() - Get all resolved comments
- getUnresolvedComments() - Get all unresolved comments
Unit Conversions:
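The unit helpers rest on fixed ratios: 20 twips per point, 1440 twips per inch, and 914400 EMUs per inch. The standalone sketch below reproduces that arithmetic so the numbers can be checked without the library. The function names (twipsToPt, inchToTwips, emuToPx) are hypothetical stand-ins, not the library's exports, and the rounding in emuToPx is an assumption.

```typescript
const TWIPS_PER_POINT = 20;
const TWIPS_PER_INCH = 1440;
const EMUS_PER_INCH = 914400;

function twipsToPt(twips: number): number {
  return twips / TWIPS_PER_POINT;
}

function inchToTwips(inches: number): number {
  return inches * TWIPS_PER_INCH;
}

// Pixels depend on display resolution, so DPI is an explicit parameter.
function emuToPx(emus: number, dpi = 96): number {
  return Math.round((emus / EMUS_PER_INCH) * dpi);
}

console.log(twipsToPt(240)); // 12
console.log(inchToTwips(1)); // 1440
console.log(emuToPx(914400, 96)); // 96
```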
import { twipsToPoints, inchesToTwips, emusToPixels } from 'docxmlater';
const points = twipsToPoints(240); // 240 twips = 12 points
const twips = inchesToTwips(1); // 1 inch = 1440 twips
const pixels = emusToPixels(914400, 96); // 914400 EMUs = 96 pixels at 96 DPI
Validation:
import { validateRunText, detectXmlInText, cleanXmlFromText } from 'docxmlater';
// Detect XML patterns in text
const result = validateRunText('Some <w:t>text</w:t>');
if (result.hasXml) {
  console.warn(result.message);
  const cleaned = cleanXmlFromText(result.text);
}
Corruption Detection:
import { Document, detectCorruptionInDocument } from 'docxmlater';
const doc = await Document.load('suspect.docx');
const report = detectCorruptionInDocument(doc);
if (report.isCorrupted) {
  console.log(`Found ${report.locations.length} corruption issues`);
  report.locations.forEach((loc) => {
    console.log(`Line ${loc.lineNumber}: ${loc.issue}`);
    console.log(`Suggested fix: ${loc.suggestedFix}`);
  });
}
Full TypeScript definitions included:
import {
  Document,
  Paragraph,
  Run,
  Table,
  RunFormatting,
  ParagraphFormatting,
  DocumentProperties,
} from 'docxmlater';
// Type-safe formatting
const formatting: RunFormatting = {
  bold: true,
  fontSize: 12,
  color: 'FF0000',
};
// Type-safe document properties
const properties: DocumentProperties = {
  title: 'My Document',
  author: 'John Doe',
  created: new Date(),
};
Current Version: 10.1.7
See CHANGELOG.md for detailed version history.
The framework includes comprehensive test coverage:
- 3,084 test cases across 143 test suites
- Tests cover all phases of implementation
- Integration tests for complex scenarios
- Performance benchmarks
- Edge case validation
Run tests:
npm test # Run all tests
npm run test:watch # Watch mode
npm run test:coverage # Coverage report
- Use dispose() to free resources after document operations
- Buffer-based operations are faster than file I/O
- Batch hyperlink updates are 30-50% faster than manual iteration
- Large documents (1000+ pages) supported with memory management
- Streaming support for very large files
The framework follows a modular architecture:
src/
├── core/ # Document, Parser, Generator, Validator
├── elements/ # Paragraph, Run, Table, Image, etc.
├── formatting/ # Style, Numbering managers
├── managers/ # Drawing, Image, Relationship managers
├── constants/ # Compatibility mode constants, limits
├── types/ # Type definitions (compatibility, formatting, lists)
├── tracking/ # Change tracking context
├── validation/ # Revision validation rules
├── helpers/ # Cleanup utilities
├── xml/ # XML generation and parsing
├── zip/ # ZIP archive handling
└── utils/ # Validation, units, error handling
Key design principles:
- KISS (Keep It Simple, Stupid) - no over-engineering
- Position-based XML parsing (ReDoS-safe)
- Defensive programming with comprehensive validation
- Memory-efficient with explicit disposal pattern
- Full ECMA-376 (OpenXML) compliance
docXMLater includes multiple security measures to protect against common attack vectors:
The XML parser uses position-based parsing instead of regular expressions, preventing catastrophic backtracking attacks that can cause denial of service.
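To illustrate the difference, the sketch below extracts element names by advancing an index with indexOf, so the scan position only moves forward and there is nothing for a regex engine to backtrack over. It is a deliberately small illustration, not the library's parser: listTagNames is a hypothetical name, and the sketch ignores CDATA sections and assumes ">" never appears inside an attribute value or comment.

```typescript
// Collects the names of opening tags by scanning positions left to right.
// Runs in linear time because `pos` only ever moves forward.
function listTagNames(xml: string): string[] {
  const names: string[] = [];
  let pos = 0;

  while (pos < xml.length) {
    const open = xml.indexOf('<', pos);
    if (open === -1) break;

    const close = xml.indexOf('>', open + 1);
    if (close === -1) break; // Unterminated tag: stop instead of guessing.

    const inner = xml.slice(open + 1, close);
    // Skip closing tags (</w:p>), declarations (<?xml ...?>) and <!...> markup.
    if (inner.length > 0 && !'/?!'.includes(inner.charAt(0))) {
      // The name ends at the first whitespace or at a self-closing slash.
      let end = 0;
      while (end < inner.length && !' \t\r\n/'.includes(inner.charAt(end))) end++;
      names.push(inner.slice(0, end));
    }
    pos = close + 1;
  }
  return names;
}

console.log(listTagNames('<w:p><w:r><w:t xml:space="preserve">Hi</w:t></w:r></w:p>'));
// [ 'w:p', 'w:r', 'w:t' ]
```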
Size Limits:
- Default document size limit: 150 MB (configurable)
- Warning threshold: 50 MB
- XML content size validation before parsing
// Configure size limits
const doc = await Document.load('large.docx', {
  sizeLimits: {
    warningSizeMB: 100,
    maxSizeMB: 500,
  },
});
Nesting Depth:
- Maximum XML nesting depth: 256 (configurable)
- Prevents stack overflow attacks
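A depth limit of this kind amounts to a counter that rises on each opening tag and falls on each closing tag. The sketch below shows the idea on pre-tokenized input; TagEvent and checkNestingDepth are hypothetical names and not part of the library.

```typescript
type TagEvent = 'open' | 'close';

// Returns the deepest nesting level reached, or throws once the limit is
// exceeded so that hostile input is rejected early.
function checkNestingDepth(events: TagEvent[], maxDepth = 256): number {
  let depth = 0;
  let deepest = 0;
  for (const ev of events) {
    if (ev === 'open') {
      depth++;
      if (depth > maxDepth) {
        throw new Error(`Nesting depth ${depth} exceeds limit of ${maxDepth}`);
      }
      deepest = Math.max(deepest, depth);
    } else {
      if (depth === 0) throw new Error('Closing tag without matching opening tag');
      depth--;
    }
  }
  return deepest;
}

console.log(checkNestingDepth(['open', 'open', 'close', 'open', 'close', 'close'])); // 2
```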
import { XMLParser } from 'docxmlater';
// Parse with custom depth limit
const obj = XMLParser.parseToObject(xml, {
  maxNestingDepth: 512, // Increase if needed
});
File paths within DOCX archives are validated to prevent directory traversal attacks:
- Blocks ../ path sequences
- Blocks absolute paths
- Validates URL-encoded path components
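As an illustration of these checks, the following sketch rejects unsafe archive entry names. It is an example built on assumptions rather than the library's actual validator; isSafeArchivePath is a hypothetical name.

```typescript
// Returns true only if `entryName` stays inside the archive root.
function isSafeArchivePath(entryName: string): boolean {
  let decoded: string;
  try {
    // Decode first so that "%2e%2e/" is treated the same as "../".
    decoded = decodeURIComponent(entryName);
  } catch {
    return false; // Malformed percent-encoding is rejected outright.
  }

  // Treat backslashes as separators so "..\" cannot slip through.
  const normalized = decoded.replace(/\\/g, '/');

  // Absolute paths: a leading slash or a Windows drive letter such as "C:".
  if (normalized.startsWith('/') || /^[A-Za-z]:/.test(normalized)) return false;

  // Any ".." segment could climb out of the extraction directory.
  return !normalized.split('/').includes('..');
}

console.log(isSafeArchivePath('word/document.xml')); // true
console.log(isSafeArchivePath('../etc/passwd')); // false
console.log(isSafeArchivePath('word/%2e%2e/%2e%2e/secret')); // false
console.log(isSafeArchivePath('/etc/passwd')); // false
```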
All text content is properly escaped using:
- XMLBuilder.escapeXmlText() for element content
- XMLBuilder.escapeXmlAttribute() for attribute values
This prevents injection of malicious XML elements through user-provided text content.
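The escaping rules themselves are standard XML. A standalone sketch of the two cases follows; escapeText and escapeAttribute are hypothetical stand-ins and may differ in detail from the XMLBuilder methods.

```typescript
// Element content: &, < and > must not be interpreted as markup.
function escapeText(value: string): string {
  return value
    .replace(/&/g, '&amp;') // Must run first so later entities are not re-escaped.
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need both quote characters escaped.
function escapeAttribute(value: string): string {
  return escapeText(value).replace(/"/g, '&quot;').replace(/'/g, '&apos;');
}

console.log(escapeText('a < b & c')); // a &lt; b &amp; c
console.log(escapeAttribute('say "hi"')); // say &quot;hi&quot;
```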
All text files are explicitly UTF-8 encoded per ECMA-376 specification, preventing encoding-related vulnerabilities.
- Node.js 18.0.0 or higher
- TypeScript 5.0+ (for development)
- jszip - ZIP archive handling
MIT
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass
- Submit a pull request
- GitHub Issues: https://github.com/ItMeDiaTech/docXMLater/issues
Built with careful attention to the ECMA-376 Office Open XML specification. Special thanks to the OpenXML community for comprehensive documentation and examples.