Mark R. Fahey

Deputy Director - High Performance and Institutional Computing

Biography

Background

I have been working with massively parallel and vector supercomputers since 1998 after receiving my Ph.D. in Numerical Analysis and Scientific Computing. My first job was with the DoD High Performance Computing Center in Vicksburg, MS at the Engineer Research and Development Center. I assisted research scientists in migrating codes to massively parallel machines, both distributed and shared memory architectures. We primarily helped parallelize and optimize scientific applications using MPI and OpenMP.

I then joined Oak Ridge National Laboratory in 2001. For many years I was part of the technical staff. I served as an NCCS liaison to fusion, combustion, materials, and chemistry researchers. Activities included assisting research scientists in parallelizing, porting, and debugging their applications on a wide variety of modern, parallel architectures. I developed two infrastructure software layers used by a variety of centers across the world: (1) a library tracking database (ALTD) that lets centers determine library and code usage, and (2) a mostly automated system (SWTools) for installing and maintaining third-party applications on our supercomputers.

In 2009, I received a joint appointment with the University of Tennessee, Knoxville and began working for the National Institute for Computational Sciences (NICS) as the Scientific Computing Group lead. This group comprised eight Ph.D.-level application scientists in a variety of fields including chemistry, CFD, and computer science. I coordinated and directed our advanced user support efforts with applications researchers in astrophysics, materials science, climate, chemistry, and computational biology so that they could effectively use the then-largest NSF supercomputer, Kraken, a Cray XT5 with 99K cores. While in this role, I also led the Extended Support for Research Teams area of the Extended Collaboration Support Services component of XSEDE, managing ~40 consultants across the country. In 2011, I became Deputy Director at NICS, where I was responsible for all day-to-day operations, including operations for the Kraken Cray XT5 petaflop supercomputer. I also acquired and deployed Cray XK6 and XC30 machines. The Deputy Director role included temporary assignments as Interim User Assistance Group Leader and Interim HPC Operations Group Leader, in which I provided leadership and hired staff, including full-time group leaders.

I became the Director of Operations for the Argonne Leadership Computing Facility (ALCF) in February of 2015 after leaving Oak Ridge. The group operated and supported leadership computers for the Department of Energy. In my time at ALCF, we operated a 10 PF BlueGene/Q machine, installed and operated an 11.69 PF Cray XC40 with Intel Knights Landing processors, and deployed two 100 PB filesystems with a novel community sharing ability.

Current Position

In April 2022, I started in a project management role within the CELS directorate, where I will work to align ALCF and LCRC operations.

Education

  • Ph.D. in Numerical Analysis and Scientific Computing, University of Kentucky, May 1999.
  • Master of Arts in Mathematics, University of Kentucky, May 1994, GPA: 4.0.
  • Bachelor of Arts in Mathematics, St. Norbert College, May 1992, Minor in Comp. Sci., GPA: 3.97, Summa Cum Laude.

Publication Highlights (recent only)

  • M. R. Fahey, et al., “Theta and Mira at Argonne National Laboratory.” In Jeff Vetter (Ed.), Contemporary High Performance Computing: From Petascale to Exascale, Volume 3 (pp. 31-62). CRC Computational Science Series, Taylor and Francis, Boca Raton, May 2019. ISBN 9781351036863, DOI: https://doi.org/10.1201/9781351036863.
  • Harms, G. McPheeters, B. Allen, M. Fahey, “Intel Enterprise Edition 3.0 for Lustre on Sonexion 3000,” proceedings of the 2017 Lustre Users Group, June 2017, Bloomington, IN.
  • Harms, T. Leggett, B. Allen, S. Coghlan, M. R. Fahey, C. Holohan, G. McPheeters, P. Rich, “Theta: Rapid Installation and Acceptance of an XC40 KNL System,” proceedings of the 2017 Cray User Group Conference, May 2017, Redmond, WA. Also published in Concurrency and Computation: Practice and Experience, DOI 10.1002/cpe.4336, accepted 23 August 2017, published 5 December 2017.
  • M. R. Fahey, K. Antypas, B. Archer, A. Bland, K. Cupps, E. Dart, G. Grider, B. Hendrickson, I. Monga, K. Riley, B. Springmeyer, “Facilities and ECP,” 2017 ECP Annual Meeting, Jan 2017 (presentation only).
  • R. Budiardja, K. Agrawal, M. R. Fahey, R. McLay, and D. James, “Library Function Tracking with XALT,” proceedings of the XSEDE16 Conference, July 2016, Miami, FL.
  • R. Budiardja, M. R. Fahey, P. Maddumage, B. Hadri, R. McLay, and D. James, “Community Use of XALT in its First Year in Production,” proceedings of the Second Workshop on HPC Tools for User Support (HUST15) held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), November 2015, Austin, TX.
  • D. James, R. McLay, S. Liu, T. Evans, W. L. Barth, A. Lamas-Linares, R. Budiardja, M. R. Fahey, “Tales from the Trenches: Can User Support Tools Make a Difference,” proceedings of the Second Workshop on HPC User Support Tools (HUST15) held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), November 2015, Austin, TX.
  • K. Agrawal, M. R. Fahey, R. McLay, and D. James, “User Environment Tracking and Problem Detection with XALT,” proceedings of the First Workshop on HPC Tools for User Support (HUST14) held in conjunction with the International Conference for High Performance Computing, Networking, Storage and Analysis (SC14), November 2014, New Orleans, LA.
  • Contributor to “Standing Together for Reproducibility in Large-Scale Computing,” principal editors Doug James, Nancy Wilkins-Diehr, Victoria Stodden, Dirk Colbry, and Carlos Rosales; Report on reproducibility@XSEDE - An XSEDE14 Workshop, July 14, 2014, Atlanta, GA. http://arxiv.org/abs/1412.5557 and https://www.xsede.org/web/reproducibility.
  • M. R. Fahey, Robert McLay, “Reproducibility responsibilities in the HPC arena,” position paper, Reproducibility Workshop held in conjunction with the 2014 Annual Conference of the Extreme Science and Engineering Discovery Environment (XSEDE 14), July 13 - 18, 2014, Atlanta, GA, USA.
  • M. R. Fahey, “A leap forward with UTK’s Cray XC30,” in Proceedings of the 2014 Annual Conference of the Extreme Science and Engineering Discovery Environment (XSEDE 14), July 13 - 18, 2014, Atlanta, GA, USA.
  • M. R. Fahey, R. Budiardja, L. Crosby, S. McNally, “Deploying Darter – A Cray XC30 System,” Lecture Notes in Computer Science, J.M. Kunkel, T. Ludwig, and H.W. Meuer (Eds.): ISC 2014, LNCS 8488, pp. 430–439, Springer International Publishing Switzerland, June 2014.
  • M. R. Fahey, “Performance of the fusion code GYRO on four generations of Cray computers,” Proceedings of the 56th Cray User Group (CUG2014), Lugano, Switzerland, May 2014.
  • B. Hadri and M. R. Fahey, “Mining Software Usage with the Automatic Library Tracking Database (ALTD),” 2013 International Conference on Computational Science, in Procedia Computer Science 18 (2013), Barcelona, Spain, June 2013. DOI 10.1016/j.procs.2013.05.352.
  • M. R. Fahey, L. Crosby, G. Rogers, V. Hazlewood, “Kraken: the First Academic Petaflop Computer.” In Jeff Vetter (Ed.), Contemporary High Performance Computing: From Petascale toward Exascale (pp. 453-491). CRC Computational Science Series, Taylor and Francis, Boca Raton, 2013. ISBN 978-1-4665-6834-1.
  • B. Rekepalli, P. Giblock, C. Reardon, S. Sarkar, M. R. Fahey, “Web-Enabled Systems Biology Science Gateway on Supercomputers,” Poster at the ACM Conference on Bioinformatics, Computational Biology and Biomedicine 2012, Orlando, FL, October 2012.
  • H. Zhang, H. You, B. Hadri, M. R. Fahey, “HPC Usage Behavior Analysis and Performance Estimation with Machine Learning Techniques,” 18th International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, July 2012.
  • M. Vanderlan, J. Celso, J. Wilck, X. Li, M. R. Fahey, “Analyzing supercomputer utilization under queuing with a priority formula and a strict backfill policy,” International Journal of Decision Sciences (IJDS), vol 3, no 1, 2012, pp. 95-114.