This repository contains a Python-based PDF parser that can process both searchable and non-searchable PDF files. The parser extracts the title, headings, subheadings, and content from each PDF and renders the processed data in a custom-designed HTML representation that preserves the structure and formatting of the original document. The extracted data is also stored in a CSV file for easy retrieval and analysis.
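The output stage described above (structured blocks written to both HTML and CSV) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the `(kind, text)` block format, the `TAGS` mapping, and the function names `blocks_to_html` and `blocks_to_csv` are all assumptions made for the example, and the PDF extraction step itself is not shown.

```python
import csv
import html
import io

# Hypothetical parsed output: (kind, text) pairs in document order.
# The real parser's data model is not shown in the source.
SAMPLE_BLOCKS = [
    ("title", "Annual Report"),
    ("heading", "Overview"),
    ("content", "Revenue grew & costs fell."),
    ("subheading", "Details"),
    ("content", "See the appendix."),
]

# Assumed mapping from block kind to an HTML tag.
TAGS = {"title": "h1", "heading": "h2", "subheading": "h3", "content": "p"}


def blocks_to_html(blocks):
    """Render each block as one HTML element, escaping special characters."""
    lines = []
    for kind, text in blocks:
        tag = TAGS[kind]
        lines.append(f"<{tag}>{html.escape(text)}</{tag}>")
    return "\n".join(lines)


def blocks_to_csv(blocks):
    """Serialize the blocks as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["kind", "text"])
    writer.writerows(blocks)
    return buf.getvalue()


if __name__ == "__main__":
    print(blocks_to_html(SAMPLE_BLOCKS))
    print(blocks_to_csv(SAMPLE_BLOCKS))
```

Keeping one ordered list of blocks as the intermediate form means the HTML and CSV outputs are derived from the same data, so they stay consistent with each other.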
SurekhaSuresh/PDF-Parser