1Department of Biomedical Informatics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA; 2Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA; 3Center for Autoimmune Genomics and Etiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA; 4Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA; 5Division of Developmental Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA; 6Division of Human Genetics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA; 7Division of Rheumatology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA; 8Division of Allergy & Immunology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA *Co-corresponding authors: Leah.Kottyan@cchmc.org; Matthew.Weirauch@cchmc.org
Pathogens can hide in our bodies for years after the initial infection. For example, the chickenpox virus can cause shingles decades later. Likewise, certain pathogens may contribute to the development or severity of later conditions that are not contagious, referred to as non-communicable diseases (NCDs). Our study analyzed a vast number of electronic medical records to discover potential pathogen-NCD connections and identified 206 such links. Future research based on these findings could revolutionize healthcare by enabling the development of vaccines against these pathogens, much as the introduction of the human papillomavirus vaccine has led to declining cervical cancer rates. Vaccines against the pathogens identified in this study could enable dramatic reductions in the NCDs linked to them.
Many relationships between pathogens and human disease are well-established. However, only a small fraction involve diseases considered non-communicable (NCDs). In this study, we sought to leverage the vast amount of newly available electronic health record data to identify potentially novel pathogen-NCD associations and find additional evidence supporting known associations.
We leverage data from the UK Biobank and TriNetX to perform a systematic survey across 20 pathogens and 426 diseases, primarily NCDs. To this end, we assess the association between disease status and proxies for infection history using a logistic regression-based statistical approach.
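The association test is described here only as logistic regression-based. As a minimal sketch, assuming a single binary infection-history proxy (x = 1 if a person's record suggests past infection) and binary disease status (y), with no covariates, the model can be fitted by Newton-Raphson using only the Python standard library. The function names `logistic_fit` and `expand_2x2`, the counts, and the absence of adjustment for covariates such as age or sex are illustrative assumptions, not details of the study's method.

```python
import math


def logistic_fit(xs, ys, iters=25):
    """Maximum-likelihood fit of P(y = 1 | x) = 1 / (1 + exp(-(b0 + b1 * x)))
    by Newton-Raphson. Returns (b0, b1, standard error of b1, Wald p-value)."""

    def score_and_info(b0, b1):
        # Score vector (g0, g1) and Fisher information entries (h00, h01, h11).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        return g0, g1, h00, h01, h11

    b0 = b1 = 0.0
    for _ in range(iters):
        g0, g1, h00, h01, h11 = score_and_info(b0, b1)
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I^{-1} * score (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det

    # Wald standard error of b1 from the information matrix at the estimate.
    _, _, h00, h01, h11 = score_and_info(b0, b1)
    se_b1 = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return b0, b1, se_b1, p_value


def expand_2x2(a, b, c, d):
    """Expand a 2x2 table into per-person (x, y) lists.
    a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls."""
    xs = [1] * (a + b) + [0] * (c + d)
    ys = [1] * a + [0] * b + [1] * c + [0] * d
    return xs, ys


# Hypothetical counts for illustration only (not data from the study).
xs, ys = expand_2x2(a=30, b=70, c=20, d=180)
b0, b1, se_b1, p_value = logistic_fit(xs, ys)
odds_ratio = math.exp(b1)
print(f"odds ratio = {odds_ratio:.3f}, p = {p_value:.2e}")
```

With one binary predictor and an intercept, exp(b1) reduces to the cross-product odds ratio ad/bc of the 2x2 table (here 30 * 180 / (70 * 20), about 3.857), and the Wald standard error equals sqrt(1/a + 1/b + 1/c + 1/d), so the sketch is easy to check by hand. A real analysis would add covariates as extra columns and would normally rely on an established statistics package.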
Our approach identifies 206 pathogen-disease pairs that replicate in both cohorts. We replicate many established relationships, including links between Helicobacter pylori and several gastroenterological diseases and between Epstein-Barr virus and both multiple sclerosis and lupus. Overall, our approach identifies evidence of association for 15 pathogens and 96 distinct diseases, including a currently controversial link between human cytomegalovirus (CMV) and ulcerative colitis (UC). We validate the CMV-UC connection through two orthogonal analyses, which reveal increased CMV gene expression in UC patients and enrichment of UC genetic risk signals near human genes whose expression is altered upon CMV infection.
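The summary says the 206 pairs "replicate in both cohorts" without defining replication on this page. One common convention, assumed here purely for illustration, is to require a pair to pass a Bonferroni-corrected significance threshold in each cohort and to show the same direction of effect (odds ratio above 1 in both, or below 1 in both). The sketch below applies that hypothetical rule; the `PairResult` record, the threshold, and the placeholder pathogen and disease names are assumptions, not the study's stated criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """Hypothetical per-cohort result for one pathogen-disease test."""
    pathogen: str
    disease: str
    odds_ratio: float
    p_value: float


def replicated_pairs(cohort_a, cohort_b, alpha=0.05):
    """Return sorted (pathogen, disease) keys that pass a Bonferroni threshold
    in both cohorts and share a direction of effect (assumed rule)."""
    a = {(r.pathogen, r.disease): r for r in cohort_a}
    b = {(r.pathogen, r.disease): r for r in cohort_b}
    # Bonferroni correction over the number of tests run in each cohort.
    thr_a = alpha / len(cohort_a)
    thr_b = alpha / len(cohort_b)
    kept = []
    for key in a.keys() & b.keys():  # only pairs tested in both cohorts
        ra, rb = a[key], b[key]
        significant = ra.p_value < thr_a and rb.p_value < thr_b
        same_direction = (ra.odds_ratio > 1) == (rb.odds_ratio > 1)
        if significant and same_direction:
            kept.append(key)
    return sorted(kept)


# Made-up placeholder values, not results from the study.
cohort_a = [
    PairResult("pathogen_1", "disease_A", 2.1, 1e-8),
    PairResult("pathogen_1", "disease_B", 1.3, 0.02),
    PairResult("pathogen_2", "disease_A", 0.6, 1e-6),
    PairResult("pathogen_2", "disease_C", 1.8, 1e-5),
]
cohort_b = [
    PairResult("pathogen_1", "disease_A", 1.7, 1e-4),
    PairResult("pathogen_1", "disease_B", 1.4, 1e-6),
    PairResult("pathogen_2", "disease_A", 1.5, 1e-7),  # opposite direction
    PairResult("pathogen_2", "disease_C", 1.6, 0.3),
]
print(replicated_pairs(cohort_a, cohort_b))
```

With four tests per cohort the assumed threshold is 0.05 / 4 = 0.0125, so only ("pathogen_1", "disease_A") survives: the other pairs either miss the threshold in one cohort or point in opposite directions.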
Collectively, these results form a foundation for future investigations into mechanistic roles played by pathogens in the processes underlying NCDs. All results are easily accessible on this website.
Lape, M., Schnell, D., Parameswaran, S. et al. A survey of pathogenic involvement in non-communicable human diseases. Commun Med 5, 242 (2025). https://doi.org/10.1038/s43856-025-00956-x