Solina Kim
- Hi!
- I'm Solina Kim, a computer science major at the University of Notre Dame.
- As a rising junior dedicated to working as an engineer in the healthcare industry, I spent the past summer at Indiana University Health as an ML Engineering intern, evaluating Text Analytics for Health, a Named Entity Recognition model for healthcare data.
- I also have experience with other projects at the intersection of healthcare and computer science, such as transformer model development for SARS-CoV-2 inhibitors and bioinformatics for malaria in Sub-Saharan Africa.
- In my free time, I like to train models, build programs, play Tetris, work out, travel, and more!
Tools and Skills
Courses and Certificates
Experience
Indiana University Health
ML Engineering Intern
May - August 2022
- Conducted error analysis on output from Microsoft’s Named Entity Recognition model on IU Health’s doctors’ notes, using Python Pandas and Scala in a Databricks environment.
- Discovered case-sensitivity issues and a lack of knowledge transfer between model components.
- Delivered novel insights on ways to collaborate with Microsoft to improve model performance to fit IU Health's data.
- Proactively communicated and balanced the diverse needs of stakeholders at IU Health, Microsoft Cognitive Services, and AnalytiXIN.
Lucy Family Institute for Data and Society
Research Assistant
August 2021 - May 2022
- Generative AI Design and Exploration of Nucleoside Analogs
- Contributed to developing the Conditional Random Transformer (CRT) model, an ML-based algorithm that efficiently searches chemical space to generate a limited quantity of molecules that are qualitatively similar to SARS-CoV-2 inhibitors, using Python RDKit, tqdm, Pandas, and NumPy.
- Analyzed Tanimoto similarity, Morgan fingerprints, pairwise similarity, and validity of molecules generated by the CRT model using Python RDKit and Pandas.
World Health Organization & University of Notre Dame
Research Assistant
March 2021 - May 2022
- ITS2 and CO1 Gene Sequence Analysis
- Extracted and parsed all available CO1 and ITS2 sequences of sub-Saharan African Anopheles from NCBI and BOLD Systems using R. The script also detected genetic anomalies due to unknown species or human error in data submission.
- Further analyzed anomalous genes using SeqMan and flagged potential novel species for physiochemical analysis on-site in Africa.
University of Notre Dame
Research Assistant @ Lobo Lab
March 2021 - May 2022
- Accessibility to Malaria Treatment in Zambia
- Processed GPS tracker data into shapefiles and generated a map of roads to malaria treatment facilities using R.
- Compared the GPS tracker-generated map to satellite images of roads on ArcGIS to assess the realistic quality of access to malaria treatment in Zambia.
Korea Centers for Disease Control and Prevention
Research Intern @ Insect-Borne Diseases Department
July 2019 - August 2019
- Collected mosquito samples, manually identified species, and conducted PCR for the government’s insect-borne diseases surveillance system.
- Designed and conducted experiments to test mosquito repellent products for client companies.
Projects
- Developed a neural network, an XGBoost decision tree model, and a logistic regression model for credit card scam detection.
- Improved F1 scores by an average of 60% across all models to reach 0.85 to 0.90 on the test set.
- Implemented functions and pipelines to replicate and run experiments reliably and efficiently.
- Utilized GridSearchCV for hyperparameter tuning, and explored random undersampling (RUS), random oversampling (ROS), SMOTE, and normalization to improve model performance on the imbalanced dataset.
Machine Learning
Neural Networks
TensorFlow
Scikit-learn
ML Diagnostics
- Developed an expanded version of the Gale-Shapley algorithm to fit unequal sets and incomplete, noncardinal preferences.
- Implemented the ideated algorithm as a memory- and runtime-efficient program using deques and sets.
- Embedded the program into an automated email UI with Python Pandas, imaplib, and NumPy so that professors can use it easily.
- Identified deliverables, project dependencies, and design considerations, and distributed responsibilities among team members for a two-month project.
Data Structures
Algorithms
imaplib
Pandas
Teamwork
- Developed a program to request, collect, and pipe data from the API of Kiwoom Securities (South Korea’s #1 stock brokerage service).
- Designed an Excel file to receive data from the above program and output visualizations of metrics as requested by the client.
- Reduced client’s time spent on data parsing, processing, and analysis by 70% while maintaining his 40% annual average profit rate.
Python
Kiwoom API
PyQt5
Client Service
Communication
- Research was funded by the University of Notre Dame with a DaVinci Grant of $4,390.
- Defined a country-specific qualitative index of sociopolitical factors that affect data surveillance measures for pandemic control.
- Analyzed the relationship between the index and pandemic control effectiveness in China, South Korea, and the USA based on surveys conducted in the three countries.
Political Philosophy
Statistics
R
Problem Solving
Project Management