psychKG‑pilot

Extract structured construct–measured_by–justification triples from TEI‑encoded research papers using LLMs.

🚀 Aim

Built to extract construct–measurement–justification triples from psychology papers using LLMs with strong schema validation(medium.com, medium.com).

🧩 Scripts (in `src/`)

psychKG-IE-HuggingFace.py
Uses a local Hugging Face model via the Instructor + Pydantic pipeline.
psychKG-IE-OpenAI.py
Uses OpenAI GPT‑4 API with function‑calling and Pydantic validation. Outputs to data/IE_output/o3.
psychKG-IE-ChatAI.py
Connects to the KISSKI ChatAI API (via the GWDG/KISSKI HPC service) for various open-weights models including from the Qwen model family (e.g., Qwen 2.5‑72B), deepseek and GPT models. Outputs to data/IE_output/qwen2_5.

📥 Input

Raw TEI‑XML papers located in:

data/papers_input_tei_xml/

📤 Output

Extracted data saved as JSON to:

data/IE_output/
├── o3/       ← OpenAI‑ and ChatAI-based scripts output
└── qwen2_5/  ← ChatAI script output

Each JSON file contains a list of entries:

{
  "construct": "...",
  "measured_by": "...",
  "justification": "..."
}

⚙️ Requirements

Packages used:

transformers, instructor, pydantic, beautifulsoup4, openai
Access to KISSKI ChatAI endpoint (AcademicCloud / GWDG HPC)
GPU recommended for Hugging Face script

▶️ Usage

1. Hugging Face (local)

python src/psychKG-IE-HuggingFace.py \
  --input_dir data/papers_input_tei_xml \
  --output_dir data/IE_output/qwen2_5

2. OpenAI GPT‑4

python src/psychKG-IE-OpenAI.py \
  --input_dir data/papers_input_tei_xml \
  --output_dir data/IE_output/o3

3. ChatAI via KISSKI

Ensure you have API access to KISSKI ChatAI (see GWDG/KISSKI LLM‑Service) and appropriate credentials, then run:

python src/psychKG-IE-ChatAI.py

ℹ️ Notes

Qwen output also comes via the KISSKI ChatAI endpoint using Qwen 2.5‑72B weights hosted by the service.
KISSKI ChatAI API is a secure, OpenAI-compatible endpoint (supports GPT‑4 and open models) and adheres to data privacy rules (kisski.gwdg.de, dfn.de).

🔖 Citation

If you use this repository in your work, please cite:

D'Souza, J., & Wulff, D. (2025). psychKG-pilot: A Minimal Knowledge Graph for Psychology via LLM-based Structured Extraction (Version 0.1.0) [Computer software]. TIB & MPIB. https://github.com/sciknoworg/psychKG-pilot

Or use the CITATION.cff file for automatic citation formats.

BibTeX:

@software{dsouza2025psychkg,
  author       = {D'Souza, Jennifer and Wulff, Dirk},
  title        = {psychKG-pilot: A Minimal Knowledge Graph for Psychology via LLM-based Structured Extraction},
  year         = 2025,
  version      = {0.1.0},
  publisher    = {TIB & MPIB},
  url          = {https://github.com/sciknoworg/psychKG-pilot}
}

📬 License & Contact

This project is licensed under the MIT License.

If you have questions, feedback, or ideas to improve the project, feel free to open an issue or get in touch with us — we'd love to hear from you!

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
data		data
src		src
CITATION.cff		CITATION.cff
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

psychKG‑pilot

🚀 Aim

🧩 Scripts (in `src/`)

📥 Input

📤 Output

⚙️ Requirements

▶️ Usage

1. Hugging Face (local)

2. OpenAI GPT‑4

3. ChatAI via KISSKI

ℹ️ Notes

🔖 Citation

📬 License & Contact

About

Uh oh!

Releases

Packages

Languages

License

sciknoworg/psychKG-pilot

Folders and files

Latest commit

History

Repository files navigation

psychKG‑pilot

🚀 Aim

🧩 Scripts (in src/)

📥 Input

📤 Output

⚙️ Requirements

▶️ Usage

1. Hugging Face (local)

2. OpenAI GPT‑4

3. ChatAI via KISSKI

ℹ️ Notes

🔖 Citation

📬 License & Contact

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

🧩 Scripts (in `src/`)

Packages