Publication

Uncertainty-Informed Deep Transfer Learning of Perfluoroalkyl and Polyfluoroalkyl Substance Toxicity

Authors

Feinstein, Jeremy; Sivaraman, Ganesh; Picel, Kurt; Peters, Brian; Vazquez-Mayagoitia, Alvaro; Ramanathan, Arvind; MacDonell, Margaret; Foster, Ian; Yan, Eugene

Abstract

Perfluoroalkyl and polyfluoroalkyl substances (PFAS) pose a significant hazard because of their widespread industrial uses, environmental persistence, and bioaccumulativity. A growing, increasingly diverse inventory of PFAS, including 8,163 chemicals, has recently been updated by the U.S. Environmental Protection Agency. But, with the exception of a handful of well-studied examples, little is known about their human toxicity potential because of the substantial resources required for in vivo toxicity experiments. We tackle the problem of expensive in vivo experiments by evaluating multiple machine learning (ML) methods, including random forests, deep neural networks (DNN), graph convolutional networks, and Gaussian processes, for predicting acute toxicity (e.g., median lethal dose, or LD50) of PFAS compounds. To address the scarcity of toxicity information for PFAS, publicly available datasets of oral rat LD50 for all organic compounds are aggregated and used to develop state-of-the-art ML source models for transfer learning. 518 fluorinated compounds containing 2 or more C-F bonds with known toxicity are used for knowledge transfer to ensembles of the best performing source model, DNN, to generate the target models for the PFAS domain with access to uncertainty. This study predicts toxicity for all 8,163 PFAS with a defined chemical structure. To further inform prediction confidence, the transfer-learned model is embedded within a SelectiveNet architecture, where the model is allowed to identify regions of prediction with greater confidence and abstain from those with high uncertainty using a calibrated cutoff rate.
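
The idea of pairing ensemble uncertainty with abstention can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the DNN ensemble for bootstrapped linear regressors and replaces SelectiveNet (which learns a selection head jointly with the predictor) with a simple cutoff on the ensemble standard deviation. The data are synthetic, and the names `fit_ensemble`, `predict_with_uncertainty`, `selective_mask`, and the `coverage` parameter are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_ensemble(X, y, n_members=10, seed=0):
    """Fit linear regressors on bootstrap resamples (a stand-in for a DNN ensemble)."""
    rng = np.random.default_rng(seed)
    Xb = np.column_stack([X, np.ones(len(X))])  # append a bias column
    weights = []
    for _ in range(n_members):
        idx = rng.integers(0, len(X), size=len(X))  # sample rows with replacement
        w, *_ = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)
        weights.append(w)
    return np.stack(weights)  # shape: (n_members, n_features + 1)

def predict_with_uncertainty(weights, X):
    """Return the ensemble mean and standard deviation for each sample."""
    Xb = np.column_stack([X, np.ones(len(X))])
    preds = Xb @ weights.T  # shape: (n_samples, n_members)
    return preds.mean(axis=1), preds.std(axis=1)

def selective_mask(std, coverage=0.8):
    """Keep the most confident fraction of predictions; abstain on the rest."""
    cutoff = np.quantile(std, coverage)
    return std <= cutoff, cutoff

# Synthetic stand-in for (descriptor, toxicity) pairs; not real PFAS data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

weights = fit_ensemble(X, y)
X_new = rng.normal(size=(50, 3))
mean, std = predict_with_uncertainty(weights, X_new)
keep, cutoff = selective_mask(std, coverage=0.8)
```

Here `keep` marks the predictions the sketch would report, and the remaining samples are the ones it abstains on. In this simplified version the cutoff is a quantile of the ensemble spread rather than a calibrated rate learned during training.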