XML to PDF Converter
pdf_generator is a command-line utility that converts SciELO XML files into PDF articles.
- Python 3.x
- lxml
- python-docx
- LibreOffice
- Single-Language Support: Generates PDFs in a single language, ensuring consistency.
- Complex Tables: Supports tables with merged cells.
- Table Layout: Automatically decides whether a table should span the full page or a single column.
- Figures: Supports both remote and local figures.
- Figure Layout: Automatically decides whether a figure should span the full page or a single column.
- Two-Column Layout: Uses a default two-column layout for text.
- Section Styling: Automatically formats sections according to their hierarchy.
- Simple Citations: Formats citations in a clean and direct way.
- Headers and Footers:
  - First Page Header: Includes the journal name and article DOI.
  - Subsequent Page Headers: Includes the journal name and short article title.
  - Footers: Includes page numbering, issue details, and “cite as” (on the first page).
- Intermediate Formats: Supports generation of intermediate files in .docx format.
- Web Interface version
- Library version
- New document structures
- New PDF templates
To use the XML to PDF converter, you must have LibreOffice version 24.2 installed. You can download it directly from this link or visit the LibreOffice website.
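Before installing packtools, it can help to confirm which LibreOffice version is available. The following is a minimal sketch, not part of packtools: it assumes the LibreOffice launcher is reachable on the PATH as `soffice` (or `libreoffice`) and that `--version` prints a line starting with "LibreOffice" followed by the version number. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

REQUIRED = (24, 2)  # version named in this README


def parse_libreoffice_version(text):
    """Return (major, minor) parsed from `soffice --version` output, or None."""
    match = re.search(r"LibreOffice\s+(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_libreoffice_version():
    """Find the launcher on PATH and ask for its version; None if not found."""
    # Assumption: the executable is called soffice or libreoffice.
    exe = shutil.which("soffice") or shutil.which("libreoffice")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=False
    )
    return parse_libreoffice_version(result.stdout)


if __name__ == "__main__":
    version = installed_libreoffice_version()
    if version is None:
        print("LibreOffice not found on PATH")
    elif version != REQUIRED:
        print(f"Found LibreOffice {version[0]}.{version[1]}; this README asks for 24.2")
    else:
        print("LibreOffice 24.2 found")
```

On Windows the launcher is often not on the PATH by default, so a "not found" result there does not necessarily mean LibreOffice is missing.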
Packtools can be installed using pip. The following sections provide step-by-step instructions for installation on both Linux and Windows systems.
On Linux, create a folder, enter it, create a virtual environment called .venv, activate it, and install packtools:
mkdir scielo-packtools
cd scielo-packtools
python3 -m venv .venv
source .venv/bin/activate
pip install "packtools>=4.10.0"
On Windows, create a folder, enter it, create a virtual environment called .venv, activate it, and install packtools:
md scielo-packtools
cd scielo-packtools
python3 -m venv .venv
.venv\Scripts\activate
pip install "packtools>=4.10.0"
Access the scielo-packtools folder:
cd scielo-packtools
Activate the virtual environment (.venv\Scripts\activate on Windows, source .venv/bin/activate on Linux):
.venv\Scripts\activate
To use the utility, provide the path to the XML file and the desired output path for the PDF file. Optionally, you can provide a DOCX layout file for custom formatting. You can find a default layout file here. This file contains a set of predefined DOCX styles used to format the article content.
usage: pdf_generator [-h] -i PATH_TO_READ [-l LAYOUT] -o PATH_TO_WRITE
Convert XML file from SciELO format to PDF format.
optional arguments:
-h, --help show this help message and exit
-i PATH_TO_READ, --xml_scielo PATH_TO_READ
Path for reading the SciELO XML file.
-l LAYOUT, --layout LAYOUT
Path for reading the DOCX layout file.
-o PATH_TO_WRITE, --pdf PATH_TO_WRITE
Path for writing the PDF file.
Example:
pdf_generator -i path/to/article.xml -o path/to/article.pdf
If you have a custom DOCX layout file, you can include it as follows:
pdf_generator -i path/to/article.xml -l path/to/layout.docx -o path/to/article.pdf
Output:
Documento intermediário salvo em path/to/article.docx
convert /home/user/article.docx as a Writer document -> /home/user/article.pdf using filter : writer_pdf_Export
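The second output line has the form LibreOffice prints during a headless conversion, which suggests the intermediate .docx is handed to LibreOffice for the final step. As a sketch under that assumption, an intermediate .docx could be converted by hand with LibreOffice's `--headless --convert-to pdf` options. The helper names below are hypothetical, and this is not necessarily how pdf_generator invokes LibreOffice internally.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(docx_path, soffice="soffice"):
    """Build a headless LibreOffice command that writes the PDF next to the DOCX."""
    docx = Path(docx_path)
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(docx.parent),
        str(docx),
    ]


def convert_docx_to_pdf(docx_path):
    """Run the conversion and return the expected PDF path."""
    # Assumption: the LibreOffice launcher is on PATH under the name soffice.
    exe = shutil.which("soffice")
    if exe is None:
        raise FileNotFoundError("soffice not found on PATH")
    subprocess.run(build_convert_command(docx_path, exe), check=True)
    return Path(docx_path).with_suffix(".pdf")
```

This can be useful after editing the intermediate .docx manually, since the PDF can then be regenerated without rerunning the XML conversion.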
Figure 1. XML file used as input.
Figure 2. PDF file generated using the pdf_generator utility.
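To convert several articles at once, the command line shown above can be driven from a short script. The sketch below uses only the documented `-i`, `-o`, and `-l` options; the function names and folder layout are hypothetical, and it assumes that pdf_generator is on the PATH (virtual environment activated) and that it reports failure through a non-zero exit status, which this README does not state.

```python
import subprocess
from pathlib import Path


def build_command(xml_path, pdf_path, layout=None):
    """Assemble a pdf_generator call from the documented -i, -o and -l options."""
    cmd = ["pdf_generator", "-i", str(xml_path), "-o", str(pdf_path)]
    if layout is not None:
        cmd += ["-l", str(layout)]
    return cmd


def convert_folder(xml_dir, pdf_dir, layout=None):
    """Convert every .xml file in xml_dir; return the inputs that failed."""
    pdf_dir = Path(pdf_dir)
    pdf_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for xml_path in sorted(Path(xml_dir).glob("*.xml")):
        pdf_path = pdf_dir / (xml_path.stem + ".pdf")
        # Raises FileNotFoundError if pdf_generator is not installed or not on PATH.
        result = subprocess.run(build_command(xml_path, pdf_path, layout), check=False)
        if result.returncode != 0:  # assumption: non-zero status means failure
            failed.append(xml_path)
    return failed
```

For example, `convert_folder("xml", "pdf", layout="layout.docx")` would process each file in a folder named xml and write the PDFs into a folder named pdf.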