Abby Fan
  480 60th Street, Brooklyn NY, 11220
  (917) 945-8397
  zf2169@columbia.edu
Profile
6+ years of academic experience focused on data analysis
3+ years of hands-on experience in data mining, data visualization, and machine learning
Skilled at interpreting trends and patterns in complex data sets (image, text, longitudinal, and financial data)
Experienced in drawing actionable business insights
Strong communication skills; fluent in English and Mandarin
Skills
Programming: Python • R • SQL • MATLAB • SAS • HTML

Databases & Tools: AWS • MySQL • Tableau • Excel (VBA, Power Pivot, VLOOKUP) • Oracle • Google Cloud • Hadoop

Machine Learning • Data Visualization • Data Interpretation • Big Data
Awards & Certifications
Data Analysis and Presentation Skills: the PwC Approach Specialization @Coursera
SQL for Data Science by University of California @Coursera
Machine Learning by Stanford University @Coursera
Machine Learning A-Z™: Hands-On Python & R in Data Science @Udemy
Complete Python Bootcamp @Udemy
Top 10 out of 60, Second Annual Columbia Data Science Student Challenge @CU


2019
Business Insight - Data Manager @Altice USA
May 2019 – Present, New York
2019
Data Analyst @Arecy LLC
Dec 2018 – Apr 2019, New York
• Big Data: used Google Cloud, AWS, and SQL to query and preprocess business and economic data from multiple sources
• Data Mining: performed data exploration and feature engineering in Python (transformation, imputation, one-hot encoding, TF-IDF), improving accuracy and computational efficiency
• Insight Driving: drew actionable business insights through data visualization; supported problem solving and decision making
• Project Automation: automated the data maintenance process for assigned portfolios, including acquiring raw data, applying feature engineering, and exporting and updating records with Python (xlwings)
2018
FinTech: Intelligent Personal Investment Consultant, Project @Arecy
• Used the Lending Club API to query historical loan data from 2014; applied data exploration, visualization, and feature engineering (NLP techniques, one-hot encoding, label encoding, transformation, binning, imputation)
• Fitted an XGBoost model, tuned hyperparameters based on AUC, and selected the optimal classification threshold, achieving a final recall of 0.71
• Built a web application that lets users adjust key loan parameters and obtain a predicted default rate
2018
Analyst, Product Engineering and Development @David Yurman
May 2018 – Dec 2018, New York
• Sourcing: initiated and maintained the new product development life cycle; assisted in managing and negotiating new product costs; worked with management to identify cost reductions and process improvements
• Data Manipulation and Analysis: extracted and updated key product data from the PLM system for e-commerce projects; analyzed and resolved inventory and pricing discrepancies in Oracle
• Project Development: helped optimize the timing of new product launches
2017
Data Science Analyst Intern @Somar Capital Management
Oct 2017 – May 2018, New York
• Web Scraping: collected open data including Amazon product reviews, product prices from various online stores, Google search counts, Facebook followers, app ratings, shop counts, and active customer counts
• NLP: analyzed customer reviews with NLP techniques including tokenization, bag-of-words, TF-IDF, cosine similarity, and topic modeling with latent Dirichlet allocation (LDA); built a recommendation system based on prices, shipping fees, and review analysis
• Machine Learning: implemented SVM, logistic regression, and time series models to predict future shop counts with 85% accuracy
• Data Visualization: built Tableau visualizations and drew conclusions to support the manager's investment decisions in the US and European markets
2017
The Fragile Families Challenge, Project @Data Science Competition
May 2017, New York
Goal: predict six outcomes for children at age 15 from a large questionnaire dataset covering birth to age 9
• Implemented feature engineering techniques including feature selection, transformation, one-hot encoding, K-means clustering, and missing-data processing on the background dataset (about 13,000 dimensions); built XGBoost models in Python
• Won first-place progress prizes and a second-place final prize on the Material Hardship outcome, and placed 4th on GPA, out of 200 teams
2017
Application of the Black-Litterman Model to Optimal Portfolio Construction @Math of Finance Class
Apr 2017, New York
• Built a baseline portfolio from Dow Jones 30 historical data, weighted by the market capitalization of each industry
• Improved expected return by 121.1% by incorporating investor views from a Morgan Stanley paper into the baseline
2017
Image Classification Using Machine Learning @Applied Data Science Class
Mar 2017, New York
• Extracted features from images of poodles and fried chicken using SIFT, HOG, and LBP; reduced dimensionality using K-means
• Achieved 90% accuracy via majority voting over random forest, backpropagation (BP) neural network, and SVM models trained on LBP features
2016
Columbia University, Graduate School of Arts and Sciences
Sep 2016 - Feb 2018, New York
• M.A. in Statistics, Data Science Focus (STEM OPT eligible)
• GPA: 3.7/4.0
• Coursework: Advanced Machine Learning, Applied Data Science, Time Series Analysis, Statistical Methods in Finance
2015
Quantitative Analyst Intern @QUANT Investment
Jun 2015 - Aug 2016, Shanghai, China
• Technical Skills: processed over 100,000 records of agricultural stock, futures, and other derivatives data; applied the discrete Fourier transform in Python to identify price cycles; built an optimal portfolio that improved annualized predicted return by 17%
2012
Southeast University, Department of Mathematics
Sep 2012 - Jun 2016, Nanjing, China
• B.S. in Mathematics, Statistics Focus
• GPA: 3.6/4.0 (top 15%)
• Coursework: Statistics of Prediction & Decision, Sampling Survey, Actuarial Mathematics, Analysis of Qualitative Data