Publication

PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest

Authors

Aytan-Aktug, Derya; Clausen, Philip; Szarvas, Judit; Munk, Patrick; Otani, Saria; Nguyen, Marcus; Davis, James; Lund, Ole; Aarestrup, Frank

Abstract

Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Because plasmids are diverse and genetically plastic, identification of plasmid host ranges could be improved by using automated pattern detection methods rather than homology-based methods. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained the models with 8,519 plasmids from 359 different bacterial species per taxonomic level; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology.
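
For readers unfamiliar with the Matthews correlation coefficient (MCC) reported in the abstract, the following is a minimal sketch of the standard binary form of the metric, computed from confusion-matrix counts. The function name `binary_mcc` and all counts are hypothetical illustrations; this is not the authors' evaluation code, and the study's multi-class, per-taxonomic-level evaluation may be computed differently.

```python
import math


def binary_mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction). Returns 0.0 when the denominator
    is zero, a common convention for the undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Made-up counts, for illustration only (not data from the study).
perfect = binary_mcc(tp=50, tn=50, fp=0, fn=0)
chance = binary_mcc(tp=25, tn=25, fp=25, fn=25)
mixed = binary_mcc(tp=40, tn=45, fp=5, fn=10)
```

Unlike plain accuracy, MCC uses all four cells of the confusion matrix, which makes it less sensitive to class imbalance such as an uneven number of plasmids per host taxon.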