Introduction
This project aims to build a predictive model for responses to LinkedIn messages and to emails. We have identified a number of factors that we think may hold predictive power. Because the time available for reaching out to potential partners is limited, we want to focus on the avenues that are most likely to result in a response.
Attributes of Interest
LinkedIn Connections (# of connections), LinkedIn Summary Complete (Yes/No), Age (current year - year graduated from college), Gender (M/F), Number of Skills Endorsed, Profile Picture (Yes/No), State, City, Month Request Was Sent, Day of Week Request Was Sent, Time of Day Request Was Sent
Target Attributes
Response to LinkedIn Message (Yes/No)
Analysis
Data was scraped from about 100 targeted LinkedIn profiles using Python and Selenium. The data cleaning step handled missing values, and feature generation followed. Age was binned into four groups: 'Below 5 Yrs', '5 - 15 Yrs', '15 - 30 Yrs', '30+ Yrs'. A Months Worked for Most Recent Job feature was also generated and binned into four groups: 'Below 12 Months', '12 - 36 Months', '36 - 60 Months', '60+ Months'.
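The binning described above can be sketched in a few lines of standard-library Python. This is an illustration, not the project's actual code: the helper names (`bucket`, `age_group`, `tenure_group`) are hypothetical, and the handling of boundary values (a value equal to an edge goes into the higher group) is an assumption, since the write-up does not say which group an exact 5, 15, or 30 falls into.

```python
from bisect import bisect_right

# Group edges and labels taken from the write-up.
AGE_EDGES = [5, 15, 30]
AGE_LABELS = ["Below 5 Yrs", "5 - 15 Yrs", "15 - 30 Yrs", "30+ Yrs"]
TENURE_EDGES = [12, 36, 60]
TENURE_LABELS = ["Below 12 Months", "12 - 36 Months", "36 - 60 Months", "60+ Months"]


def bucket(value, edges, labels):
    """Return the label of the interval that contains value.

    Assumption: lower bounds are inclusive, so a value equal to an
    edge is placed in the higher group.
    """
    return labels[bisect_right(edges, value)]


def age_group(current_year, grad_year):
    # "Age" is defined in the write-up as current year minus graduation year.
    return bucket(current_year - grad_year, AGE_EDGES, AGE_LABELS)


def tenure_group(months):
    return bucket(months, TENURE_EDGES, TENURE_LABELS)
```

For example, `age_group(2020, 2012)` gives an age of 8 and returns '5 - 15 Yrs'.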
Random over-sampling was used to correct the class imbalance, and categorical data was encoded as binary indicator variables. Next, Logistic Regression, Random Forest, and LightGBM (LGB) models were built. After tuning, the model with the highest ROC AUC score was LGB, which reached an accuracy of 0.928.
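As a rough illustration of the two preprocessing steps named above, the sketch below implements random over-sampling and binary (one-hot) encoding with the standard library only. The function names, the dictionary-per-row layout, and the fixed seed are assumptions made for the example; the project itself may well have used library implementations instead.

```python
import random
from collections import defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def one_hot(rows, column):
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = int(row[column] == cat)
        encoded.append(new_row)
    return encoded
```

Over-sampling is usually applied to the training split only, so that duplicated rows cannot appear in both training and evaluation data; the write-up does not state how the split was handled here.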
Conclusion
The top four features by importance were quite similar across the three methods used (the feature importance function built into Random Forest, SHAP, and Permutation Feature Importance): Number of Connections, Time Sent - 10:45, Time Sent - 11:45, and City - Greater Los Angeles Area.
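Of the three methods named above, permutation feature importance is the simplest to show from scratch. The sketch below is a minimal, model-agnostic version: it shuffles one feature column at a time and records how much accuracy drops. The `predict` callable, the list-of-lists data layout, the use of accuracy as the score, and the default `n_repeats` are all assumptions for illustration and do not reflect the project's actual setup.

```python
import random


def accuracy(predict, X, y):
    """Share of rows for which predict(row) equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    X is a list of rows (each a list of feature values); predict is
    any callable that maps one row to a predicted label.
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model ignores gets an importance of exactly zero under this scheme, because shuffling it cannot change any prediction.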