Center for Trustworthy Machine Learning


Publications

2022

“Capturing failures of large language models via human cognitive biases”, In submission, Erik Jones and Jacob Steinhardt.

“Auditing Visualizations: Transparency methods struggle to detect anomalous behavior”, In submission, Jean-Stanislas Denain and Jacob Steinhardt.

“Generalized Resilience and Robust Statistics”, In submission, Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt.

“PixMix: dreamlike pictures comprehensively improve safety measures”, Dan Hendrycks, Andy Zou, Mantas Mazeika, Leonard Tang, Dawn Song, and Jacob Steinhardt, the IEEE Computer Vision and Pattern Recognition Conference (CVPR), New Orleans, LA, June 19-24, 2022.

“Scaling out-of-distribution detection for real-world settings”, Dan Hendrycks, Steven Basart, Mantas Mazeika, Mohammadreza Mostajabi, Jacob Steinhardt, and Dawn Song, the International Conference on Machine Learning, Baltimore, MD, July 17-23, 2022.

“Predicting out-of-distribution error with the projection norm”, Yaodong Yu, Zitong Yang, Alexander Wei, Yi Ma, and Jacob Steinhardt, the International Conference on Machine Learning, Baltimore, MD, July 17-23, 2022.

“Robust estimation via generalized quasi-gradients”, Banghua Zhu, Jiantao Jiao, and Jacob Steinhardt, Information and Inference: A Journal of the IMA, Volume 11, Issue 2, pages 581-636, June 2022.

“Incorporating Label Uncertainty in Understanding Adversarial Robustness”, Xiao Zhang and David Evans, the 10th International Conference on Learning Representations (ICLR), virtual conference, April 25-29, 2022.

“Extending the WILDS Benchmark for Unsupervised Adaptation”, Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Levine, Chelsea Finn, and Percy Liang, the 10th International Conference on Learning Representations (ICLR), virtual conference, April 25-29, 2022.

“Fine-tuning can distort pretrained features and underperform out-of-distribution”, Ananya Kumar, Aditi Raghunathan, Robbie Jones, Tengyu Ma, and Percy Liang, the 10th International Conference on Learning Representations (ICLR), virtual conference, April 25-29, 2022.

“The effects of reward misspecification: mapping and mitigating misaligned models”, Alexander Pan, Kush Bhatia, and Jacob Steinhardt, the 10th International Conference on Learning Representations (ICLR), virtual conference, April 25-29, 2022.

“Stealthy Backdoors as Compression Artifacts”, Yulong Tian, Fnu Suya, Fengyuan Xu and David Evans, the IEEE Transactions on Information Forensics and Security, Volume 17, March 16, 2022.

2021

“Consistent Non-Parametric Methods for Adaptive Robustness”, Robi Bhattacharjee and Kamalika Chaudhuri, the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), virtual conference, December 7-10, 2021.

“What Would Jiminy Cricket Do? Towards Agents that Behave Morally”, Dan Hendrycks, Mantas Mazeika, Andy Zou, Sahil Patel, Christine Zhu, Jesus Navarro, Dawn Song, Bo Li, and Jacob Steinhardt, the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), virtual conference, December 7-10, 2021.

“On the Robustness of Domain Constraints”, Ryan Sheatsley, Blaine Hoak, Eric Pauley, Yohan Beugin, Michael Weisman, and Patrick McDaniel, the Association for Computing Machinery’s Conference on Computer and Communications Security (ACM CCS), virtual conference, November 14-19, 2021.

“The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization”, Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer, the International Conference on Computer Vision (ICCV), virtual conference, October 11-17, 2021.

“Data Poisoning Won’t Save You from Facial Recognition”, Evani Radiya-Dixit and Florian Tramer, the International Conference on Machine Learning (ICML) Workshop on Adversarial Machine Learning (AdvML), virtual conference, July 24, 2021.

“Sample Complexity of Adversarially Robust Linear Classification on Separated Data”, Robi Bhattacharjee, Somesh Jha, and Kamalika Chaudhuri, the International Conference on Machine Learning (ICML), virtual conference, July 18-24, 2021.

“Model-Targeted Poisoning Attacks with Provable Convergence”, Fnu Suya, Saeed Mahloujifar, Anshuman Suri, David Evans, and Yuan Tian, the International Conference on Machine Learning (ICML), virtual conference, July 18-24, 2021.

“WILDS: A Benchmark of In-the-Wild Distribution Shifts”, Pang Wei Koh, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, Michihiro Yasunaga, Richard Lanas Phillips, Irena Gao, Tony Lee, Etienne David, Ian Stavness, Wei Guo, Berton A. Earnshaw, Imran S. Haque, Sara Beery, Jure Leskovec, Anshul Kundaje, Emma Pierson, Sergey Levine, Chelsea Finn, and Percy Liang, the International Conference on Machine Learning (ICML), virtual conference, July 18-24, 2021.

“Limitations of post-hoc feature alignment for robustness”, Collin Burns and Jacob Steinhardt, the IEEE Computer Vision and Pattern Recognition (CVPR), virtual conference, June 19-25, 2021.

“Natural Adversarial Examples”, Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song, the IEEE Computer Vision and Pattern Recognition (CVPR), virtual conference, June 19-25, 2021.

“Measuring Massive Multitask Language Understanding”, Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt, the International Conference on Learning Representations (ICLR), virtual conference, May 4-7, 2021.

“Differentially Private Learning Needs Better Features (or Much More Data)”, Florian Tramèr and Dan Boneh, at the International Conference on Learning Representations (ICLR), virtual conference, May 4-7, 2021.

“Improved Estimation of Concentration Under l_p-Norm Distance Metrics Using Half Spaces”, Jack Prescott, Xiao Zhang, and David Evans, the International Conference on Learning Representations (ICLR), virtual conference, May 4-7, 2021.

“Selective Classification can Magnify Disparities Across Groups”, Erik Jones, Shiori Sagawa, Pang Wei Koh, Ananya Kumar, and Percy Liang, the International Conference on Learning Representations (ICLR), virtual conference, May 4-7, 2021.

“In-N-Out: Pre-training and Self-training using Auxiliary Information for Out-of-Distribution Robustness”, Sang Michael Xie, Ananya Kumar, Robbie Jones, Fereshte Khani, Tengyu Ma, and Percy Liang, the International Conference on Learning Representations (ICLR), virtual conference, May 4-7, 2021.

2020

“A Closer Look at Accuracy vs. Robustness”, Yao-Yuan Yang, Cyrus Rashtchian, Hongyang Zhang, Ruslan Salakhutdinov, and Kamalika Chaudhuri, the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual conference, December 6-12, 2020.

“Enabling Certification of Verification-Agnostic Networks via Memory-Efficient Semidefinite Programming”, Sumanth Dathathri, Krishnamurthy Dvijotham, Alexey Kurakin, Aditi Raghunathan, Jonathan Uesato, Rudy Bunel, Shreya Shankar, Jacob Steinhardt, Ian Goodfellow, Percy Liang, and Pushmeet Kohli, the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Virtual conference, December 6-12, 2020.

“Finding Friends and Flipping Frenemies: Automatic Paraphrase Dataset Augmentation Using Graph Theory”, Hannah Chen, Yangfeng Ji, and David Evans, the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Virtual Conference, November 16-20, 2020.

“On the Importance of Adaptive Data Collection for Extremely Imbalanced Pairwise Tasks”, Stephen Mussmann, Robin Jia, and Percy Liang, the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), Virtual Conference, November 16-20, 2020.

“Preparing for the Age of Deepfakes and Disinformation”, Dan Boneh, Andrew Grotto, Patrick McDaniel, and Nicolas Papernot, Stanford University Institute for Human-Centered Artificial Intelligence (HAI), November 2020.

“Robustness for Non-parametric Classification: A Generic Attack and Defense”, Yao-Yuan Yang, Cyrus Rashtchian, Yizhen Wang, and Kamalika Chaudhuri, the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual Conference, August 26-28, 2020.

“Understanding the Intrinsic Robustness of Image Distributions using Conditional Generative Models”, Xiao Zhang, Jinghui Chen, Quanquan Gu, and David Evans, the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Virtual Conference, August 26-28, 2020.

“Exploring Connections Between Active Learning and Model Extraction”, Varun Chandrasekaran, Kamalika Chaudhuri, Irene Giacomelli, Somesh Jha, and Songbai Yan, the 29th USENIX Security Symposium, Boston, MA, August 12-14, 2020.

“Hybrid Batch Attacks: Finding Black-box Adversarial Examples with Limited Queries”, Fnu Suya, Jianfeng Chi, David Evans, and Yuan Tian, the 29th USENIX Security Symposium, Boston, MA, August 12-14, 2020.

“When are Non-Parametric Methods Robust?”, Robi Bhattacharjee and Kamalika Chaudhuri, the Thirty-seventh International Conference on Machine Learning (ICML), Virtual Conference, July 12-18, 2020.

“Understanding and mitigating the tradeoff between robustness and accuracy”, Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, and Percy Liang, Thirty-seventh International Conference on Machine Learning (ICML), Virtual Conference, July 12-18, 2020.

“Robust encodings: a framework for combating adversarial typos”, Erik Jones, Robin Jia, Aditi Raghunathan, and Percy Liang, Association for Computational Linguistics (ACL), Virtual Conference, July 5-8, 2020.

“How Relevant is the Turing Test in the Age of Sophisbots?”, Dan Boneh, Andrew J. Grotto, Patrick McDaniel, and Nicolas Papernot, IEEE Symposium on Security and Privacy (IEEE S&P 2020), Virtual Conference, May 18-20, 2020.

2019

“Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty”, Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, and Dawn Song, Advances in Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.

“Adversarial Training and Robustness for Multiple Perturbations”, Florian Tramer and Dan Boneh, Advances in Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.

“Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness”, Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, and David Evans, Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.

“Unlabeled data improves adversarial robustness”, Yair Carmon, Aditi Raghunathan, Ludwig Schmidt, Percy Liang, and John C. Duchi, Advances in Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada, December 8-14, 2019.

“Certified robustness to adversarial word substitutions”, Robin Jia, Aditi Raghunathan, Kerem Göksel, and Percy Liang, Empirical Methods in Natural Language Processing (EMNLP), Hong Kong, China, November 3-7, 2019.

“AdVersarial: Perceptual Ad Blocking meets Adversarial Machine Learning”, Florian Tramer, Pascal Dupre, Gili Rusak, Giancarlo Pellegrino, and Dan Boneh, the Association for Computing Machinery’s Conference on Computer and Communications Security (ACM CCS), London, UK, November 11-15, 2019.

“Profile-based privacy for locally private computations”, Joseph Geumlek and Kamalika Chaudhuri, IEEE International Symposium on Information Theory (ISIT 2019), Paris, France, July 7-12, 2019.

“A Theory of Selective Prediction”, Mingda Qiao and Gregory Valiant, Conference on Learning Theory (COLT 2019), Phoenix, AZ, June 25-28, 2019.

“Towards Understanding Limitations of Pixel Discretization Against Adversarial Attacks”, Jiefeng Chen, Xi Wu, Vaibhav Rastogi, Yingyu Liang, and Somesh Jha, IEEE European Symposium on Security and Privacy (EuroS&P 2019), Stockholm, Sweden, June 17-19, 2019.

“Using Pre-Training Can Improve Model Robustness and Uncertainty”, Dan Hendrycks, Kimin Lee and Mantas Mazeika, International Conference on Machine Learning (ICML 2019), Long Beach, CA, June 9-15, 2019.

“Adversarial Training Can Hurt Generalization”, Aditi Raghunathan, Sang Michael Xie, Fanny Yang, John C. Duchi, and Percy Liang, International Conference on Machine Learning (ICML 2019) Workshop on Identifying and Understanding Deep Learning Phenomena, Long Beach, CA, June 9-15, 2019.

“Slalom: Fast, Verifiable and Private Execution of Neural Networks in Trusted Hardware”, Florian Tramer and Dan Boneh, International Conference on Learning Representations (ICLR 2019), New Orleans, LA, May 6-9, 2019.

“Cost-Sensitive Robustness against Adversarial Examples”, Xiao Zhang and David Evans, International Conference on Learning Representations (ICLR 2019), New Orleans, LA, May 6-9, 2019.

“Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness”, Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, and David Evans, International Conference on Learning Representations (ICLR 2019) Workshop on Debugging Machine Learning Models, New Orleans, LA, May 6-9, 2019.

“Exploiting Excessive Invariance caused by Norm-Bounded Adversarial Robustness”, Jörn-Henrik Jacobsen, Jens Behrmann, Nicholas Carlini, Florian Tramèr, and Nicolas Papernot, International Conference on Learning Representations (ICLR 2019) Safe Machine Learning Workshop, New Orleans, LA, May 6-9, 2019.

“Context-aware Monitoring in Robotic Surgery”, Mohammad Samin Yasar, David Evans, and Homa Alemzadeh, International Symposium on Medical Robotics (ISMR 2019), Atlanta, GA, April 3-5, 2019.

2018

“A Spectral View of Adversarially Robust Features”, Shivam Garg, Vatsal Sharan, Brian Hu Zhang, and Gregory Valiant, Conference on Neural Information Processing Systems (NeurIPS 2018), Montreal, Canada, December 2-8, 2018.

“Enablers of Adversarial Attacks in Machine Learning”, Rauf Izmailov, Shridatt Sugrim, Ritu Chadha, Patrick McDaniel, and Ananthram Swami, IEEE Military Communications Conference (MILCOM 2018), Los Angeles, CA, October 29-31, 2018.

“Semantic Adversarial Deep Learning”, Tommaso Dreossi, Somesh Jha, and Sanjit Seshia, 30th International Conference on Computer Aided Verification (CAV 2018), Oxfordshire, UK, July 14-17, 2018.

“Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training”, Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, and Somesh Jha, International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, July 10-15, 2018.

“Analyzing the Robustness of Nearest Neighbors to Adversarial Examples”, Yizhen Wang, Somesh Jha, and Kamalika Chaudhuri, International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, July 10-15, 2018.

“Privacy Risk in Machine Learning: Analyzing the Connection to Overfitting”, Samuel Yeom, Irene Giacomelli, Matt Fredrikson, and Somesh Jha, IEEE Computer Security Foundations Symposium (CSF 2018), Oxford, UK, July 9-12, 2018.

“SoK: Security and Privacy in Machine Learning”, Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman, IEEE European Symposium on Security and Privacy (EuroS&P 2018), London, UK, April 24-26, 2018.

“Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning”, Nicolas Papernot and Patrick McDaniel.

“Making Machine Learning Robust Against Adversarial Inputs”, Ian Goodfellow, Patrick McDaniel and Nicolas Papernot, Communications of the ACM, ACM, July 2018.

“Ensemble Adversarial Training: Attacks and Defenses”, Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, and Patrick McDaniel, International Conference on Learning Representations (ICLR 2018), Vancouver, Canada, April 30-May 3, 2018.

 


Support for the Center for Trustworthy Machine Learning (CTML) is provided through NSF Grant #(CNS-1805310), part of the NSF Secure and Trustworthy Cyberspace Program. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Additional support is provided by Penn State University, Stanford University, UC Berkeley, UC San Diego, University of Wisconsin, and University of Virginia.