-
Notifications
You must be signed in to change notification settings - Fork 179
Add rendered HTML content support via Accept header #195
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Follow-up: I discovered Dataview/DataviewJS blocks were still empty unless the renderer had a real DOM host. The latest commit |
Implement DOM-based extraction of rendered Obsidian content with support for both plain text and structured JSON output formats. Features: - rendered-text content type: Plain text extraction from rendered DOM - rendered-json content type: Structured JSON with typed content blocks - Cache system for fast repeated access - Full Dataview/DataviewJS rendering support - Settings UI for cache configuration - Automatic cache invalidation on file changes Content Types: - application/vnd.olrapi.note+rendered-text (plain text) - application/vnd.olrapi.note+rendered-json (structured JSON) JSON Structure: - metadata: source path, render timestamp, format version - frontmatter: complete YAML frontmatter as object - content: array of typed blocks (heading, table, list, paragraph, code, callout) Table blocks include headers and rows as 2D arrays, making them easy to parse and analyze programmatically. Implementation: - RenderCacheManager: handles rendering and caching - StructuredExtractor: converts DOM to typed JSON blocks - Cache stored in .obsidian/render-cache/ with hash-based keys - 2-second content settlement wait for async plugin rendering - Restores original user view after extraction Benefits for AI Clients: - Tables as parseable 2D arrays - Document structure preserved with type tags - Frontmatter metadata accessible - Content queryable by type - No parsing ambiguity Testing: - Verified with complex Dataview dashboards - Tables extract correctly with headers and rows - All frontmatter fields preserved - Bundle size: 2.4mb (optimized, no PDF dependencies) 🤖 Generated with Claude Code https://claude.com/claude-code Co-Authored-By: Claude <[email protected]>
Document the new rendered-text and rendered-json content types, including usage examples, JSON schema, caching system, and troubleshooting guide. - Updated README.md with rendered content overview - Added comprehensive RENDERED_CONTENT.md guide - Includes API reference, examples, and use cases - Documents both text and JSON output formats 🤖 Generated with Claude Code Co-Authored-By: Claude <[email protected]>
coddingtonbear
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for the patch, @papertray3 — there are some great ideas here! I’m definitely open to supporting a feature that returns rendered HTML when the user supplies an appropriate Accept header (likely text/html, as I noted in a few comments).
I did notice that this PR bundles together a few larger changes that aren’t directly related to that HTML-rendering behavior. To keep things easy to review and to get your work merged more quickly, would you mind splitting this into smaller, focused PRs, each introducing a single change or feature? When you do, a quick update to the docs in /docs would also help people understand how to use the new functionality.
Once you’ve had a chance to break things out, feel free to ping me — I really appreciate the contribution, and I think the rendered-HTML feature in particular will be super useful to a lot of users!
|
|
||
| This was inspired by [Vinzent03](https://github.com/Vinzent03)'s [advanced-uri plugin](https://github.com/Vinzent03/obsidian-advanced-uri) with hopes of expanding the automation options beyond the limitations of custom URL schemes. | ||
|
|
||
| ## Rendered Content Support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have public docs in the /docs/ folder. Instead of documenting this in the readme, that should probably be in the docs themselves if you could.
| @@ -0,0 +1,540 @@ | |||
| # Rendered Content API | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm betting this is an artifact created when you asked an LLM to help you code this and I bet you didn't intend to include this in the PR.
| olrapiNoteJson = "application/vnd.olrapi.note+json", | ||
| olrapiNoteHtml = "application/vnd.olrapi.note+html", | ||
| olrapiRenderedText = "application/vnd.olrapi.note+rendered-text", | ||
| olrapiRenderedJson = "application/vnd.olrapi.note+rendered-json", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me, but this PR is already extremely long -- could I trouble you to break this apart into individual PRs for each of these types of content? Only one of these content types appears to be discussed in this PR; so I think sticking to rendered HTML in this one is probably the right choice, and I want to say: I might need some convincing before I'll be up for taking on a patch for the other two rendering options.
Second thing: you don't need to invent a new content type for things like html/text/json -- those already exist:
text/htmltext/plainapplication/json
| ); | ||
|
|
||
| // Cache statistics | ||
| if (this.plugin.renderCacheManager) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a particular performance problem you found that inspired you to want to cache renders? It seems like kind of a lot of complication to add unless there are severe performance problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for retrieving rendered HTML and structured content from Obsidian notes via HTTP content negotiation. The implementation introduces three new Accept header content types: HTML rendering for notes, structured JSON extraction for AI/LLM integration, and plain text extraction. A comprehensive caching system stores rendered content to improve performance on subsequent requests.
Key Changes
- New rendering capabilities: HTML rendering via
MarkdownRenderer.render(), structured JSON extraction from DOM, and plain text extraction - Caching infrastructure: MD5-based content-addressable cache with automatic invalidation on file modifications
- Settings UI: Added configuration options for cache directory, size limits, auto-cleanup, and render timeout
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 19 comments.
Show a summary per file
| File | Description |
|---|---|
src/structuredExtractor.ts |
New file implementing DOM-to-JSON structured content extraction with support for headings, tables, lists, code blocks, callouts, and paragraphs |
src/renderCacheManager.ts |
New caching manager with file-based cache storage, MD5 content hashing, and automatic invalidation via vault event listeners |
src/requestHandler.ts |
Added request handlers for three new content types (HTML, rendered JSON, rendered text) with proper error handling and content negotiation |
src/main.ts |
Integrated render cache manager initialization, lifecycle management, and settings UI with cache statistics and manual clear functionality |
src/types.ts |
Added cache-related settings: directory path, max size, auto-cleanup flag, and render timeout |
src/constants.ts |
Added three new content type constants and default cache settings |
package.json |
Added @types/pdf-parse and pdfjs-dist dependencies (unused) |
RENDERED_CONTENT.md |
Comprehensive documentation covering API usage, JSON schema, caching system, performance characteristics, and use cases |
README.md |
Added rendered content feature overview with examples and usage instructions |
Comments suppressed due to low confidence (2)
src/renderCacheManager.ts:5
- Unused import pdfParse.
import pdfParse from "pdf-parse";
src/structuredExtractor.ts:66
- Unused variable currentTable.
let currentTable: TableBlock | null = null;
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| contentEl: HTMLElement | ||
| ): StructuredContent { | ||
| const content: ContentBlock[] = []; | ||
| let currentTable: TableBlock | null = null; |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable currentTable is declared but never used in this method. Consider removing it or implementing the intended table-handling logic.
| let currentTable: TableBlock | null = null; |
| this.settings.cacheDirectory | ||
| ); | ||
| for (const dir of files.folders) { | ||
| if (!dir.endsWith("CacheIndex.json")) { |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic for clearing cache is incorrect. This checks if folder names end with "CacheIndex.json" (which they won't, since folders are directories not files). It should check if dir === this.cacheIndexPath or use path.basename(dir) !== "CacheIndex.json" to avoid deleting the index file.
| if (!dir.endsWith("CacheIndex.json")) { | |
| if (path.basename(dir) !== "CacheIndex.json") { |
| const content = await this.app.vault.cachedRead(file); | ||
| const fileHash = this.getFileHash(content); | ||
|
|
||
| if (this.cacheIndex[fileHash]) { |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cache invalidation reads the current file content to compute its hash, but if the file was deleted, this will fail. Consider catching errors from cachedRead for the delete event or using the file hash from the cache index based on file path instead.
| const content = await this.app.vault.cachedRead(file); | |
| const fileHash = this.getFileHash(content); | |
| if (this.cacheIndex[fileHash]) { | |
| let fileHash: string | undefined; | |
| try { | |
| const content = await this.app.vault.cachedRead(file); | |
| fileHash = this.getFileHash(content); | |
| } catch (error) { | |
| // File may have been deleted; try to find hash from cache index | |
| for (const [hash, metadata] of Object.entries(this.cacheIndex)) { | |
| if (metadata.sourcePath === file.path) { | |
| fileHash = hash; | |
| break; | |
| } | |
| } | |
| } | |
| if (fileHash && this.cacheIndex[fileHash]) { |
| console.error("Error rendering to text:", error); | ||
| this.returnCannedResponse(res, { | ||
| statusCode: 500, | ||
| message: `Failed to render content: ${error.message}`, |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error object should be properly typed or checked before accessing its message property. TypeScript's error in catch blocks is of type unknown. Consider using error instanceof Error ? error.message : String(error) for safer error message extraction.
| message: `Failed to render content: ${error.message}`, | |
| message: `Failed to render content: ${error instanceof Error ? error.message : String(error)}`, |
| private async extractTextFromDOM(file: TFile): Promise<string> { | ||
| // Store current active file to restore later | ||
| const originalFile = this.app.workspace.getActiveFile(); | ||
| const activeLeaf = this.app.workspace.getLeaf(false); | ||
| const previousMode = (activeLeaf.view as any).getMode?.(); | ||
|
|
||
| try { | ||
| // Open the file | ||
| await activeLeaf.openFile(file); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Opening files in the active leaf to render them can cause unexpected user experience issues if rendering fails or times out, as the original file might not be restored. Consider using a hidden/detached leaf or container for rendering to avoid disrupting the user's workspace.
| const pdfPath = this.getCachePath(fileHash, "pdf"); | ||
| const textPath = this.getCachePath(fileHash, "txt"); | ||
|
|
||
| try { | ||
| await this.app.vault.adapter.stat(pdfPath); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cache validation checks for the existence of both PDF and text files, but the code only generates text files (see cacheTextOnly method). This will cause cache misses even when text cache exists. Either remove the PDF check or implement PDF caching.
| const pdfPath = this.getCachePath(fileHash, "pdf"); | |
| const textPath = this.getCachePath(fileHash, "txt"); | |
| try { | |
| await this.app.vault.adapter.stat(pdfPath); | |
| const textPath = this.getCachePath(fileHash, "txt"); | |
| try { |
| } | ||
|
|
||
| // Wait for content to settle (Dataview, etc.) | ||
| await new Promise(resolve => setTimeout(resolve, 2000)); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout is hardcoded to 2000ms but should use this.settings.renderTimeoutMs to respect the user's configuration. Apply the same fix to the renderToJson method (line 189).
| await new Promise(resolve => setTimeout(resolve, 2000)); | |
| await new Promise(resolve => setTimeout(resolve, this.settings.renderTimeoutMs)); |
| await view.setState({ mode: "preview" }, {}); | ||
| } | ||
|
|
||
| await new Promise(resolve => setTimeout(resolve, 2000)); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timeout is hardcoded to 2000ms but should use this.settings.renderTimeoutMs to respect the user's configuration.
| if (req.headers.accept === ContentTypes.olrapiNoteHtml) { | ||
| const file = this.app.vault.getAbstractFileByPath(path) as TFile; | ||
| if (file && mimeType === ContentTypes.markdown) { | ||
| const markdown = await this.app.vault.cachedRead(file); | ||
| const html = await this.renderMarkdownToHtml(markdown, path); | ||
| res.setHeader("Content-Type", ContentTypes.olrapiNoteHtml); | ||
| res.send(html); | ||
| return; | ||
| } else { | ||
| this.returnCannedResponse(res, { | ||
| statusCode: 400, | ||
| message: "Rendered HTML is only available for markdown files", | ||
| }); | ||
| return; | ||
| } | ||
| } |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original handler for ContentTypes.olrapiNoteJson was removed in this PR. This content type returns file metadata via getFileMetadataObject(). If this was intentional (replaced by olrapiRenderedJson), the constant should be removed from ContentTypes enum and the bodyParser middleware (line 1348). If unintentional, the original handler should be restored to maintain backward compatibility.
| el.querySelectorAll("li").forEach((li) => { | ||
| const text = li.textContent?.trim(); | ||
| if (text) items.push(text); | ||
| }); |
Copilot
AI
Nov 23, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list extraction using querySelectorAll("li") will capture all nested list items, which could lead to incorrect flattening of nested lists. Consider using direct children only (> li) or implementing recursive handling to preserve list structure.
Summary
This PR adds support for retrieving rendered HTML content from markdown files using HTTP content negotiation. Clients can now request fully-rendered HTML by specifying
Accept: application/vnd.olrapi.note+htmlin GET requests.Changes
ContentTypes.olrapiNoteHtmlconstant (application/vnd.olrapi.note+html)renderMarkdownToHtml()method using Obsidian's nativeMarkdownRenderer.render()API_vaultGet()to handle HTML content type requestsFeatures
The rendered HTML output includes:
data-hrefattributesAPI Usage
Backward Compatibility
This feature:
/vault/,/active/,/periodic/)Testing
Tested with:
/vault/{filename}and/active/endpointsUse Cases
This feature enables:
🤖 Generated with Claude Code