Software engineer winter intern
Published in IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2019
This R package is designed for two-group (case-control) RNA-Seq analysis. The workflow has six steps: “RNASeqRParam S4 Object Creation”, “Environment Setup”, “Quality Assessment”, “Reads Alignment & Quantification”, “Gene-level Differential Analyses” and “Functional Analyses”. Each step corresponds to one function in the package, so running the functions in order completes a basic RNA-Seq analysis.
Citation:
K.H. Chao, Y.W. Hsiao, Y.F. Lee, C.Y. Lee, L.C. Lai, M.H. Tsai, T.P. Lu, and E.Y. Chuang* (2019). RNASeqR: an R package for automated two-group RNA-Seq analysis workflow, IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 18, no. 5, pp. 2023-2031, 1 Sept.-Oct. 2021, doi: 10.1109/TCBB.2019.2956708.
Published in Genome Biology and Evolution (GBE), 2021
This package builds on sangerseqR to allow users to create contigs from collections of Sanger sequencing reads. It provides a wide range of options for commonly performed actions, including read trimming, detecting secondary peaks, and detecting indels using a reference sequence. All parameters can be adjusted interactively, either in R or in the associated Shiny applications. There is extensive online documentation, and the package can output detailed HTML reports, including chromatograms.
Citation:
K.H. Chao*, K. Barton, S. Palmer, and R. Lanfear* (2021). sangeranalyseR: simple and interactive processing of Sanger sequencing data in R, Genome Biology and Evolution, Volume 13, Issue 3, March 2021, evab028, https://doi.org/10.1093/gbe/evab028.
I worked as a research assistant at Professor Robert Lanfear’s Molecular Evolution and Phylogenetics Lab during my exchange at the Australian National University (September 2019 - June 2020). My first project was to rewrite sangeranalyseR, an R package that Rob wrote a few years earlier. I converted the package to an object-oriented design, added Shiny applications, and submitted it to Bioconductor. The second project used machine learning to estimate phylogenetic trees from DNA sequence data.
I have been doing research at Professor Eric Y. Chuang’s Bioinformatics and Biostatistics Core Lab at the National Taiwan University Centers of Genomic and Precision Medicine (CGM) since my second year of university (February 2018). In the first few months, I learned basic bioinformatics data analysis techniques, such as quality control, short-read alignment, differential gene expression analysis, and functional analysis.
I have been working as a research assistant at Professor Huai-Kuang Tsai’s Bioinformatics Lab since July 2020. My main goal is to develop an elution profile-based algorithm for inferring protein complexes. Current methods calculate scores from entire protein elution profiles; the method I am working on instead calculates local scores for each fraction of an elution profile, which provide more in-depth information.
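The difference between a whole-profile score and per-fraction local scores can be illustrated with a minimal sketch. This is not the algorithm under development: it assumes, purely for illustration, that the score is a Pearson correlation and that a local score is that correlation computed in a sliding window around each fraction. The function names and the `half_window` parameter are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def global_score(profile_a, profile_b):
    """One score computed over the entire pair of elution profiles."""
    return pearson(profile_a, profile_b)


def local_scores(profile_a, profile_b, half_window=2):
    """One score per fraction, computed in a window centred on that fraction.

    half_window is a hypothetical parameter; windows are truncated at the
    ends of the profile.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of fractions")
    n = len(profile_a)
    scores = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        scores.append(pearson(profile_a[lo:hi], profile_b[lo:hi]))
    return scores
```

Calling `local_scores(a, b)` on two profiles returns a list with one value per fraction, so two proteins that co-elute in only part of the gradient can score highly in that region even when `global_score(a, b)` is low.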
I have been doing research at the Bioinformatics and Biostatistics Core Lab at the National Taiwan University Centers of Genomic and Precision Medicine (CGM) since my second year of university (February 2018). My research projects are mostly co-advised by Professor Eric Y. Chuang and Professor Tzu-Pin Lu.
I developed RNASeqR for two-group (case-control) RNA-Seq analysis, and it is now available in the Bioconductor 3.10 release. The workflow has six steps: “RNASeqRParam S4 Object Creation”, “Environment Setup”, “Quality Assessment”, “Reads Alignment & Quantification”, “Gene-level Differential Analyses” and “Functional Analyses”. Each step corresponds to one function in the package, so running the functions in order completes a basic RNA-Seq analysis.
I developed sangeranalyseR while working as a research assistant at the Molecular Evolution and Phylogenetics Lab, led by Prof. Robert Lanfear, during my exchange at the Australian National University. sangeranalyseR is now available in Bioconductor 3.12.
I am building a Markov chain model to simulate the number of people infected after vaccination with trivalent or quadrivalent inactivated influenza vaccines (TIV/QIV), and the cost-effectiveness of each vaccine at different levels of vaccine coverage. The main goal is to provide a vaccine cost-effectiveness website that the Taiwan Centers for Disease Control can use to assess public health policies. The back-end of the website is written in Python with Django and the Django-Q task scheduler.
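As a rough illustration of this kind of model, the sketch below propagates a cohort through a discrete-time Markov chain with susceptible, infected, and recovered states, then derives a simple cost per infection averted. It is a simplified stand-in, not the model used on the website: the states, the weekly transition probabilities, the function names, and every default value are assumptions chosen only to make the example runnable.

```python
def simulate_infections(population, coverage, vaccine_effectiveness,
                        weekly_infection_prob=0.02, weekly_recovery_prob=0.5,
                        weeks=26):
    """Expected cumulative infections in a cohort Markov chain.

    All parameter values are hypothetical. Vaccination lowers the weekly
    infection probability of vaccinated people by vaccine_effectiveness.
    """
    p_unvaccinated = weekly_infection_prob
    p_vaccinated = weekly_infection_prob * (1.0 - vaccine_effectiveness)

    # Expected number of people in each state at the start of the season.
    susceptible_vaccinated = population * coverage
    susceptible_unvaccinated = population * (1.0 - coverage)
    infected = 0.0
    cumulative_infections = 0.0

    for _ in range(weeks):
        # Apply the fixed transition probabilities to the current state.
        new_vaccinated = susceptible_vaccinated * p_vaccinated
        new_unvaccinated = susceptible_unvaccinated * p_unvaccinated
        newly_recovered = infected * weekly_recovery_prob

        susceptible_vaccinated -= new_vaccinated
        susceptible_unvaccinated -= new_unvaccinated
        infected += new_vaccinated + new_unvaccinated - newly_recovered
        cumulative_infections += new_vaccinated + new_unvaccinated

    return cumulative_infections


def cost_per_infection_averted(population, coverage, vaccine_effectiveness,
                               cost_per_dose, **model_kwargs):
    """Vaccination cost divided by infections averted versus no vaccination."""
    baseline = simulate_infections(population, 0.0, vaccine_effectiveness,
                                   **model_kwargs)
    vaccinated = simulate_infections(population, coverage,
                                     vaccine_effectiveness, **model_kwargs)
    averted = baseline - vaccinated
    if averted <= 0:
        return float("inf")  # the vaccine prevents nothing in this scenario
    return population * coverage * cost_per_dose / averted
```

Comparing TIV with QIV in this sketch would mean calling `cost_per_infection_averted` twice with different `vaccine_effectiveness` and `cost_per_dose` values across a range of `coverage` settings.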
I am developing new software to improve on two existing tools, EPIC and PrInCE, which predict protein-protein interactions (PPI) with machine learning approaches, by focusing on local features of an elution profile.
Wang Zhiming and I developed this text-mining website to help the NTU Medical Genie AI team obtain genotype information. There are two main steps. In the first step, we used MetaMap and ClinPhen to extract phenotype information from electronic health records (EHRs); in the second step, we used Phenolyzer and Variant Prioritizer to convert the selected phenotypes to genotype information. The website was built with the Python Django framework.
I am developing a website for Mycobacterium tuberculosis sequence analysis. It includes a reference-based sequence assembly pipeline. The back-end of the website is written in Python with Django, Snakemake, and the Celery distributed task queue.