About me
I am a Senior Deep Learning Scientist at the Illumina AI Lab. I earned my Ph.D. in Computer Science from the Center for Computational Biology, Johns Hopkins University (August 2025), advised by Dr. Steven Salzberg and Dr. Mihaela Pertea. My research focuses on AI for genomics, including sequence-to-function modeling, genome annotation, and DNA language models. I hold a B.S. in Electrical Engineering from National Taiwan University and spent my final year as an exchange student at the College of Engineering & Computer Science at the Australian National University.
🧬 My research interests lie at the intersection of deep learning, genomics, and transcriptomics:
- In RNA-Seq gene expression prediction, I developed Shorkie, a masked-DNA language model pretrained on 165 fungal genomes and fine-tuned on high-resolution yeast TF induction time-course RNA-seq data (Paper; Slides; Talk).
- In splice site prediction, I built a dilated residual convolutional neural network to decode the complexities of RNA splicing, alternative splicing, and the impact of genetic variants on cryptic splicing (Paper I; Paper II; News; Talk).
- In genome annotation, I used graph-based methods to stitch together fragmented DNA and protein alignments, assembling them into more accurate annotations (Paper; Talk).
- In genome assembly, I assembled and annotated the first gapless Southern Han Chinese genome, Han1, using PacBio HiFi and Oxford Nanopore long reads, with T2T-CHM13 as a guide (Paper; Genome).
- For pangenome indexing, I applied new renaming heuristics and an SMT solver to make the Wheeler graph recognition problem computationally feasible (Paper; News; Talk).
💻 I am an advocate for open-source software, embracing the philosophy of “build what you need, use what you build”. I invite you to explore my NEWS page for the latest updates on my projects.
💬 Feel free to reach out for collaborations, discussions, a coffee chat, or just to say hi! ☕️
Selected Publications (* corresponding author, † co-first author)
- Kuan-Hao Chao*, Majed Mohamed Magzoub, Emily Stoops, Sean R Hackett, Johannes Linder*, David R Kelley* (2025). Predicting dynamic expression patterns in budding yeast with a fungal DNA language model, bioRxiv
- Kuan-Hao Chao*†, Alan Mao†, Anqi Liu, Steven L Salzberg*, Mihaela Pertea* (2025). OpenSpliceAI: An efficient, modular implementation of SpliceAI enabling easy retraining on non-human species, eLife
- Kuan-Hao Chao*, Jakob M. Heinz, Celine Hoh, Alan Mao, Mihaela Pertea, Steven L. Salzberg* (2025). Combining DNA and protein alignments to improve genome annotation with LiftOn, Genome Research
- Kuan-Hao Chao*, Alan Mao, Steven L. Salzberg, Mihaela Pertea* (2024). Splam: a deep-learning-based splice site predictor that improves spliced alignments, Genome Biology
- Kuan-Hao Chao*, A.V. Zimin, M. Pertea, S.L. Salzberg* (2023). The first gapless, reference-quality, fully annotated genome from a Southern Han Chinese individual, G3: Genes, Genomes, Genetics
- Kuan-Hao Chao*†, Pei-Wei Chen†, Sanjit A. Seshia, Ben Langmead* (2023). WGT: Tools and algorithms for recognizing, visualizing and generating Wheeler graphs, iScience
- Yu-Hsin Chen†, Kuan-Hao Chao†, Jin Yung Wong, Chien-Fu Liu, Jun-Yi Leu*, Huai-Kuang Tsai* (2023). A feature extraction free approach for protein interactome inference from co-elution data, Briefings in Bioinformatics
- Kuan-Hao Chao*, K. Barton, S. Palmer, and R. Lanfear* (2021). sangeranalyseR: simple and interactive processing of Sanger sequencing data in R, Genome Biology and Evolution
- Kuan-Hao Chao, Yi-Wen Hsiao, Yi-Fang Lee, Chien-Yueh Lee, Liang-Chuan Lai, Mong-Hsun Tsai, Tzu-Pin Lu, Eric Y. Chuang (2019). RNASeqR: an R package for automated two-group RNA-Seq analysis workflow, IEEE/ACM Transactions on Computational Biology and Bioinformatics
Honors & Awards
- Research highlight by JHU Department of Computer Science [Article], 2025
- Research highlight by JHU HUB, Whiting School of Engineering and CS Department [Article], 2024
- Mark O. Robbins Prize awarded by Advanced Research Computing at Hopkins (ARCH) [Article], 2024
- Taiwan Government Scholarship to Study Abroad (GSSA) awarded by Taiwan Ministry of Education, 2024
- Research highlight by JHU Whiting School of Engineering and CS Department [Article], 2024
- Best Poster Award, Bioconductor Conference (Bioc2021), 2021
- College Student Research Fellowship awarded by Taiwan Ministry of Science and Technology, 2019
- Elite Prize (1st prize), 2017 HackNTU (hackathon with 500+ participants) [Photo], 2019
Education
- Ph.D. in Computer Science, Johns Hopkins University, Sep/2021 - Aug/2025
- M.S.E. in Computer Science, Johns Hopkins University, Sep/2021 - May/2023
- B.S. in Electrical Engineering, National Taiwan University, Sep/2016 - Jan/2021
Experience
- Sr. Deep Learning Scientist & Engineer, Artificial Intelligence Lab, Illumina, Aug/2025 - Present
- Genomics Machine Learning Research Intern, Kelley Lab, Calico, May/2024 - Aug/2024
- Research Assistant, Institute of Information Science, Academia Sinica, Jul/2020 - Jan/2021
- Research Student, Research School of Biology, The Australian National University, Jul/2019 - Jun/2020
- Research Student, Centers of Genomic and Precision Medicine, National Taiwan University, Aug/2018 - Jul/2019
Selected Presentations
- Doctoral Dissertation Defense Seminar: "Decoding the Language of Genomes: Bridging Sequences and Function through Deep Learning", Baltimore, MD, August 2025, Slides, Website
- Robbins Prize Awardee Talk, JHU Symposium on High-Performance Computing (HPC 2025): "Teaching machines to learn biology: splice site prediction and gene expression prediction", Baltimore, MD, April 2025, Photo, Slides
- JHU Joint Biostats-Genomics Lab Meeting Talk: "Unifying ChIP-exo DNA-Binding and RNA-Seq Coverage Predictions with a Multi-Species Fungal Language Model", Baltimore, MD, Jan 2025, Video, Slides
- Calico internship 1-hour Talk, Calico: "Improving ChIP-exo DNA-binding and gene expression predictions with a multi-species fungal language model", South San Francisco, CA, August 2024, Photo, Slides
- Invited Google Deep Dive 1-hour Talk, Google Health: "Computational methods to improve genome annotation, splice site prediction, and gene expression prediction", Virtual & Mountain View, CA, August 2024, Video [Google internal only], Slides
- ISMB General Computational COSI Talk, International Conference on Intelligent Systems for Molecular Biology: "Combining DNA and protein alignments to improve genome annotation with LiftOn", Montréal, Canada, July 2024, Video, Slides
- JHU Joint Biostats-Genomics Lab Meeting Talk: "Predicting splice sites in DNA sequences with sequence models", Baltimore, MD, May 2024, Video, Slides
- RECOMB-seq Talk, Research in Computational Molecular Biology on Biological Sequence Analysis: "Combining DNA and protein alignments to improve genome annotation with LiftOn", Cambridge, MA, April 2024, Slides
- RECOMB-seq Proceedings Talk, Research in Computational Molecular Biology on Biological Sequence Analysis: "WGT: Tools and algorithms for recognizing, visualizing and generating Wheeler graphs", Istanbul, Türkiye, April 2023, Video, Slides
- ISMB/ECCB Poster, Intelligent Systems for Molecular Biology / European Conference on Computational Biology 2023: "Splam: a deep-learning-based splice site predictor that improves spliced alignments", Lyon, France, July 2023, Link
Teaching
- Johns Hopkins University
- EN.580.458 / 658 Computing the Transcriptome, Teaching assistant, Spring 2023 (Jan 2023 - May 2023). Taught by Dr. Mihaela Pertea.
- National Taiwan University
- CSX 4001 Data Science Programming, Teaching assistant, Spring 2019
- EE 1006 Cornerstone EECS Design and Implementation, Teaching assistant, Fall 2018
Mentorship & Community Leadership
- Alan Mao, JHU Computer Science & Biomedical Engineering undergrad (May 2023 – August 2025)
- Current: PhD Candidate, Department of Biomedical Data Science, Stanford University
- Co-founder and Organizer, Johns Hopkins Deep Learning + Genomics Study Group (Oct 2024 - Aug 2025)
- I hosted a biweekly deep learning seminar and organized speakers to spark discussion among Hopkins researchers working on deep learning + genomics. Slides, Repository, Schedule
I am open to mentoring students who are interested in computational genomics + AI research! We can work together on a focused six-month research project. I’ve been fortunate to learn from many wonderful mentors, and I’d love to give back to the community. If this sounds interesting to you, please feel free to reach out.
Selected open-source software
- Shorkie, yeast RNA-Seq coverage predictor
Code Paper
- OpenSpliceAI, splice site prediction framework
Code Documentation Poster Paper
- Splam, splice site predictor
Code Documentation Poster Paper
- LiftOn, annotation lift-over tool
Code Documentation Paper
- sangeranalyseR, R package for analyzing Sanger sequencing data
Code Documentation Poster Paper
- Wheeler Graph Toolkit
Code Poster Paper
Service
- Reviewer
- Human Genetics and Genomics Advances: 2025
- BMC Bioinformatics: 2025
- BMC Genomics: 2025
- Scientific Reports: 2025
- Genome Research: 2024
- G3: Genes, Genomes, Genetics: 2024
- BMC Bioinformatics: 2024
- International Society for Computational Biology (ISCB): 2024
- Chromatographia: 2023
- Sub-reviewer
- Genome Research: 2024
- Nature Machine Intelligence: 2023
- G3: Genes, Genomes, Genetics: 2022
Side Projects
- Biobaby, Unity WebGL game, ▶️ Play it now!
- Flappy penguin, Unity WebGL game, ▶️ Play it now!
- Tank fire, Unity WebGL game, ▶️ Play it now!

