Lia Cappellari
Data Scientist

Enthusiastic and results-driven Data Engineer with a strong foundation in industrial engineering. Proficient in R, Python, HTML, SQL, and JavaScript, with a solid grasp of machine learning fundamentals. Experienced with AWS, conducting impactful data analyses, and implementing process improvements using Six Sigma methodologies. Proven leadership as President of Alpha Pi Mu, coupled with work experience optimizing operations at CAES, Restaurant World, and, currently, SteerBridge Strategies.

My Projects

Ribbit - An app for automated frog species identification and classification

This project aims to build an application that leverages an API and machine learning for real-time identification and classification of amphibian species from user-uploaded recordings of frog calls. Data gathered will be contributed to global biodiversity repositories to support conservation efforts.
Approach:
This project is currently in progress. Information will be updated when complete.
Impact:
This project is currently in progress. Information will be updated when complete.

Olympic Insights Webpage

This fully interactive webpage, built to explore historical Olympic data, provides users with in-depth insights into Olympic medal distributions, athlete performance, and country dominance across various sports and time periods. The goal of the project was to create a user-friendly interface that allows dynamic exploration of the data through multiple visualizations and filters, offering a comprehensive view of the Olympics.
Approach:
• Webpage Development: The webpage was developed using HTML, CSS, and JavaScript for frontend design and responsiveness. The visualizations were embedded using Tableau to offer a seamless user experience.
• Interactive Features: The webpage includes several dashboards that let users explore medal counts by medal type, athlete, sport, country, gender, or game year, with the chart type chosen to suit each view. Filters, including drop-down menus and sliders, let users narrow the analysis by game year, discipline, and country, and the visualizations update in real time as filters change.
• Tools Used: HTML, CSS, and JavaScript for webpage design and interactivity, Tableau for dynamic and interactive visualizations embedded in the webpage, Python (Pandas) for data cleaning and preparation, Flask for deployment.

Predicting Student Dropout Rates with Machine Learning

The goal of this project was to predict student dropout rates in higher education using a variety of machine learning models. We explored three different models: baseline logistic regression, random forest, and neural network.
Approach:
• Data Preprocessing: Cleaned and preprocessed data from the educational institution's student database, handling missing values and normalizing numerical features.
• Modeling: Developed a baseline logistic regression model and iterated over Random Forest and Neural Network models, tuning hyperparameters using grid search and cross-validation.
• Performance Metrics: Assessed model performance using accuracy, precision, recall, and F1 score. Random Forest yielded the best results with an F1 score of 0.78.
• Tools Used: Python (Pandas, Scikit-learn), Jupyter Notebooks for modeling and visualization.
Impact:
The Random Forest model provided the institution with insights into which factors were most indicative of student dropout, enabling targeted interventions for at-risk students.
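
The performance metrics listed above can be illustrated with a minimal sketch. It is not the project's actual evaluation code (which used Scikit-learn); the function name classification_metrics and the label lists are hypothetical, and the labels are made up purely to show how accuracy, precision, recall, and F1 relate to one another.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: of the students flagged as dropouts, how many really dropped out.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the students who dropped out, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (1 = dropped out, 0 = stayed enrolled).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

F1 is useful for dropout prediction because dropouts are usually the minority class, so accuracy alone can look good even when few at-risk students are caught.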

Effects of Expressing Gratitude on Tipping Behavior

This experiment explored how expressing gratitude affects tipping behavior at a New York City coffee shop. A sign thanking customers for "supporting a local business" was placed in front of the register for the treatment group, while the control group saw no sign.
Approach:
• Data Collection: Collected tipping percentages from both groups over a 2-week period.
• Statistical Analysis: Performed a two-sample t-test to determine whether there was a statistically significant difference in tipping behavior between the two groups. Also tested for heterogeneous treatment effects across days of the week.
• Results: The treatment group tipped 1.08% more on average. The difference was not statistically significant at the 5% level (p > 0.05), but it was significant at the 10% level.
• Tools Used: R for statistical analysis and visualizations, Excel for data aggregation.
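
The two-sample comparison described above can be sketched as follows. The original analysis was done in R and the exact t-test variant is not stated, so this Python version assumes Welch's unequal-variance form. The function name welch_t_test and the tip percentages are hypothetical, and the p-value uses a normal approximation that is only reasonable for large samples; a library routine such as scipy.stats.ttest_ind would give the exact t-distribution value.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t_test(treatment, control):
    """Welch's t statistic, degrees of freedom, and an approximate p-value."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = variance(treatment), variance(control)  # sample variances
    se = sqrt(v1 / n1 + v2 / n2)  # standard error of the difference in means
    t_stat = (mean(treatment) - mean(control)) / se

    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
    )

    # Two-sided p-value from a normal approximation (assumption: large samples).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, df, p_approx


# Made-up tip percentages for illustration only, not the experiment's data.
sign_group = [18.0, 20.0, 15.0, 22.0, 19.0, 17.0]
no_sign_group = [16.0, 18.0, 15.0, 20.0, 17.0, 16.0]
t_stat, df, p_approx = welch_t_test(sign_group, no_sign_group)
print(f"t = {t_stat:.3f}, df = {df:.1f}, approx. p = {p_approx:.3f}")
```

A result is called significant at the 10% level but not the 5% level when the two-sided p-value falls between 0.05 and 0.10.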

Scrap Reduction

The objective of this project was to identify inefficiencies in the material handling process at Thermo Fisher Scientific and implement data-driven improvements to reduce scrap and operational costs.
Approach:
• Data Analysis: Collected and analyzed operational data using SQL and R. Used clustering techniques to group product units by similarity, scrap rate, and potential savings. Identified key areas of waste and inefficiency, particularly in the handling of raw materials and defective parts.
• Process Optimization: Applied Six Sigma methodologies to propose changes in the material handling process, leading to more efficient workflows and reduced scrap.
• Results: Successfully reduced scrap by 10%, resulting in annual cost savings of $700,000. Created an interactive R Shiny dashboard for future use.
• Tools Used: R for statistical analysis, SQL for querying data from operational databases.

Drug Overdose Analysis

This project aimed to identify key factors contributing to drug overdoses, focusing on geographic and temporal trends as well as substance combinations that lead to higher risk of overdose.
Approach:
• Data Collection: Aggregated data from public health sources, including the CDC, on drug overdose rates, substance types, and demographic information.
• Exploratory Data Analysis (EDA): Used Python (Pandas, Matplotlib, Seaborn) for in-depth EDA, identifying trends by age, gender, and location. Conducted time series analysis to observe changes in overdose rates over time.
• Statistical Modeling: Built regression models to explore relationships between variables (e.g., opioid usage and overdose rates), identifying key predictors.
• Tools Used: Python for data wrangling and visualization, Tableau for presenting geographic and temporal trends.
Impact:
The analysis provided actionable insights for public health officials, identifying geographic areas and demographic groups most at risk for drug overdoses, potentially informing future policy interventions.
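
As a rough illustration of the regression step, here is a minimal sketch of an ordinary least squares trend fit. It is not the project's actual model. The function names fit_line and r_squared are hypothetical, and the yearly rates are placeholder numbers, not CDC figures.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Placeholder values for illustration only: years since the start of the
# series and a hypothetical overdose rate for each year.
years_since_start = [0, 1, 2, 3, 4]
rates = [10.0, 12.5, 13.5, 16.0, 18.0]

slope, intercept = fit_line(years_since_start, rates)
fit_quality = r_squared(years_since_start, rates, slope, intercept)
print(f"rate changes by about {slope:.2f} per year (R^2 = {fit_quality:.3f})")
```

The slope gives the average change in the rate per year, which is a simple way to compare trends across locations or demographic groups before moving to richer models.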

Data Analysis App

This project involved developing a user-friendly web portal using R Shiny to enable seamless data analysis and visualization for the Design of Experiments (DOE) methodology.
Approach:
• Web Portal Development: Built an interactive web interface using R Shiny that allows users to upload datasets and conduct a 7-step DOE analysis.
• Analysis Features: The portal performs tasks such as model significance testing, residual analysis, ANOVA assumptions verification, and data visualization.
• Customization: Included functionality for customizable plots, interactive tables, and downloadable reports, making the portal accessible for both novice and expert users.
• Tools Used: R Shiny for web development, ggplot2 for data visualization, R for statistical analysis.
Impact:
The portal enabled users to perform complex statistical analyses without needing advanced coding skills, making it a valuable tool for researchers and engineers conducting experimental designs.
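
To give a sense of the model significance test the portal runs, here is a minimal sketch of a one-way ANOVA F statistic. The portal itself is written in R, so this Python version is only an illustration. The function name one_way_anova and the response values are hypothetical.

```python
def one_way_anova(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up responses for three levels of a single factor (illustration only).
responses = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_between, df_within = one_way_anova(responses)
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value means the factor levels differ by more than the within-group noise would explain. The residual and assumption checks mentioned above then confirm whether that conclusion can be trusted.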

My Experience

Data Engineer
SteerBridge Strategies
January 2025 - Present
Data Scientist / Analyst
Restaurant World
May 2023 - May 2024
Continuous Improvement Intern
CAES
June 2022 - August 2022
President, Alpha Pi Mu
University of San Diego
March 2022 - March 2023
Lean Six-Sigma Intern
San Ysidro Health
September 2021 - December 2021

My Skills

Python
RStudio
Tableau
SQL
JavaScript
HTML
Jupyter Notebook
AWS
Experiments and Testing


Power BI
Excel
MATLAB
Data Analysis
Data Visualization
Six Sigma Green Belt
Git
Predictive Modeling
Machine Learning

My Education

University of California, Berkeley – MS in Data Science
University of San Diego – BS/BA in Industrial and Systems Engineering

Contact Me

E-Mail: liacappellari.1@gmail.com
Phone: (310)-721-6485
Github: https://github.com/lia-cappellari
LinkedIn: www.linkedin.com/in/lia-cappellari-aa5481144