Projects

Structural Discovery in Temporal Networks, PhD Project
January 2017 – present

  • Proposed a novel decomposition of temporal networks into a low-rank component that is stable over time and a sparse time-varying component, which can evolve smoothly or exhibit sharp changes.
  • Developed a fast alternating minimization algorithm and illustrated the results on synthetic and real data.

Statistical Modeling of Activities in Dynamic Social Networks, PhD Project
September 2015 – December 2016

  • Developed counting process models to measure user influence in social networks with community structure, such as Facebook or Twitter, and to quantify the connections between users based on their online behavior; these measures can support influencer marketing and customer segmentation, respectively.
  • Simulated user behavior data (posts, shares, and likes) from the models, estimated the parameters by maximizing the partial likelihood, and evaluated the results.

Ads Analysis
September 2016

  • Predicted the number of ads to be shown in the following month for each of 40 ad groups from their historical data using generalized linear models, helping the company set financial goals and allocate budget.
  • Grouped the ads into 3 groups, based on linear regression, according to whether the average cost-per-click was going up, going down, or staying flat. Ads in the "going up" and "going down" groups need further inspection to understand the cause and improve ad performance.
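The trend grouping in the second bullet can be illustrated with a minimal sketch. It assumes each ad has an equally spaced series of average cost-per-click values and labels the ad by the sign of its least-squares slope. The tolerance `tol`, the function names, and the sample data are hypothetical and are not taken from the project; the original analysis may have used a different rule, such as a significance test on the slope.

```python
from statistics import mean


def ols_slope(values):
    """Least-squares slope of `values` regressed on the time index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    t_mean = (n - 1) / 2
    v_mean = mean(values)
    num = sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def trend_group(values, tol=0.01):
    """Label a series 'up', 'down', or 'flat' (tol is an assumed cutoff)."""
    slope = ols_slope(values)
    if slope > tol:
        return "up"
    if slope < -tol:
        return "down"
    return "flat"


# Made-up cost-per-click series, one per ad (illustrative only).
cpc_by_ad = {
    "ad_a": [0.50, 0.55, 0.61, 0.66],
    "ad_b": [0.80, 0.74, 0.69, 0.62],
    "ad_c": [0.40, 0.41, 0.40, 0.40],
}
groups = {ad: trend_group(series) for ad, series in cpc_by_ad.items()}
```

Using a tolerance band around zero rather than the raw sign keeps nearly constant series from being flagged for inspection because of noise.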

Diagnosis of Mesothelioma Disease
March 2016 – April 2016

  • Discovered and discarded one variable that was identical to the class label; keeping it would give perfect classification but no useful information.
  • Classified whether a patient has mesothelioma with a random forest and achieved an accuracy of 82%. Platelet count, city, and keep side were found to be the most important variables according to the two measures of variable importance, which can help doctors make more accurate diagnoses.

Semi-parametric Modeling of Temporal Trends for Cholera Outbreak in Haiti, Master’s Project
May 2015 – January 2016

  • Wrote a program to extract cholera data from 1000+ poorly formatted daily reports issued after the 2010 earthquake.
  • Captured the temporal trend of cholera for each department of Haiti with smoothing spline Poisson regression models and Bayesian generalized linear mixed models. The models can be used to predict future cholera cases and thus inform decisions on healthcare resource allocation.

A Strategy to Build Connection with a Stranger in Social Networks
March 2015 – April 2016

  • Analyzed the characteristics of the Enron email network and detected communities with fast algorithms, including the Louvain method and label propagation.
  • Designed metrics based on community structure and eigenvector centrality to rank all shortest paths between two vertices.

Tournament Planner
May 2016

  • Wrote a Python module that uses a PostgreSQL database to keep track of players, matches, and standings in a single-elimination tournament.
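A module of this kind could be organized roughly as in the sketch below. To keep the example self-contained it uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL. The table layout and the function names (`connect`, `register_player`, `report_match`, `standings`) are assumptions made for illustration, not the project's actual schema or API.

```python
import sqlite3


def connect():
    """Open an in-memory SQLite database (stand-in for PostgreSQL) and create tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE matches (
            id INTEGER PRIMARY KEY,
            winner INTEGER NOT NULL REFERENCES players(id),
            loser INTEGER NOT NULL REFERENCES players(id)
        );
    """)
    return conn


def register_player(conn, name):
    """Insert a player and return the generated id."""
    cur = conn.execute("INSERT INTO players (name) VALUES (?)", (name,))
    return cur.lastrowid


def report_match(conn, winner, loser):
    """Record the outcome of a single match."""
    conn.execute("INSERT INTO matches (winner, loser) VALUES (?, ?)", (winner, loser))


def standings(conn):
    """Return (id, name, wins, matches_played) rows, most wins first."""
    return conn.execute("""
        SELECT p.id, p.name,
               COUNT(CASE WHEN m.winner = p.id THEN 1 END) AS wins,
               COUNT(m.id) AS played
        FROM players p
        LEFT JOIN matches m ON p.id IN (m.winner, m.loser)
        GROUP BY p.id, p.name
        ORDER BY wins DESC, p.id
    """).fetchall()


conn = connect()
ann = register_player(conn, "Ann")
bob = register_player(conn, "Bob")
cy = register_player(conn, "Cy")
report_match(conn, ann, bob)
rows = standings(conn)
```

Computing standings with a query over the match table, instead of storing win counts on the player rows, keeps the match records as the single source of truth.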

Online Forum Summarization
April 2016 – May 2016

  • Wrote MapReduce programs for Hadoop in Python to analyze user behavior in Udacity’s discussion forums.
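As an illustration of the MapReduce pattern in Python, the sketch below counts posts per user. It assumes tab-separated records with the author id in the first column; that layout, the function names, and the sample data are hypothetical. Under Hadoop Streaming the mapper and reducer would normally be separate scripts that read standard input and write tab-separated key/value lines, with Hadoop sorting by key in between. Here `run_local` imitates that shuffle step so the logic can be run without a cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (author_id, 1) for every well-formed post record."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip malformed records
        yield fields[0], 1


def reducer(pairs):
    """Sum the counts for each key; expects pairs sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield key, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> sort (shuffle) -> reduce on one machine."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


# Made-up records: author id, then post body.
sample = ["u1\thello", "u2\tquestion", "u1\tanswer", "malformed"]
post_counts = run_local(sample)
```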

Locations and Attendances of Joint Statistical Meetings
October 2016

  • Visualized the locations and attendance of the Joint Statistical Meetings from 1993 to 2012 with D3.js.

miRNA-Gene Regulatory Pattern Recognition
March 2013 – June 2013

  • Integrated multiple genomic data sets from 385 ovarian cancer patients and identified 9 miRNA-gene regulatory modules using spectral clustering.