I am trying to find a dataset (or a set of clinical notes) that was not used to train Llama 3.3 (specifically the 70B model). Which of the datasets below were not used in Llama 3.3 training? Can anyone help, please?
- MIMIC-III (Medical Information Mart for Intensive Care III)
- i2b2 (Informatics for Integrating Biology & the Bedside)
- n2c2 (National NLP Clinical Challenges)
- ShARe (Shared Annotated Resources)
- PhysioNet
- CLEF eHealth
- TREC Medical Records Track
- OpenNotes
- eICU Collaborative Research Database
- PubMed Central (PMC) Open Access Subset
Thanks!