This page provides the material and scripts needed to replicate the results of our submitted article on developer retention in software ecosystems.
You can download the data and scripts here. The zip file contains 3 files:
– script.R: The R script that runs the survival analyses and exports all the survival curves (in PDF format)
– 2 csv files: One per ecosystem, listing the developers of that ecosystem
The columns of each csv file correspond to:
– user: The user id based on the GHTorrent dataset
– ta_abandoner (technical activity): Binary variable indicating technical abandonment (0: active, 1: abandoner)
– sa_abandoner (social activity): Variable indicating social abandonment (0: active, 1: abandoner, -1: the developer was never socially active)
– ta_duration_months (technical activity): Number of months between the first and last commit
– sta_duration (socio-technical activity): Number of months between the first and last commit or social message
– sa_messages (social activity): Number of messages that the developer has exchanged with other developers
– sa_activity_months (social activity): Number of distinct months in which the developer exchanged messages with other developers (differs from sta_duration_months because months with no messages are not counted in sa_activity_months)
– sa_largest_delta (social activity): Largest social inactivity gap measured in months
– ta_commit_contributions (technical activity): Number of commits
– ta_activity_months (technical activity): Number of distinct months in which the developer made commits (differs from sta_duration_months because months with no commits are not counted in ta_activity_months)
– ta_largest_delta (technical activity): Largest technical inactivity gap measured in months
– developerTechnicalIntensity (technical activity): TI value (refer to the paper for the definition)
– developerTechnicalSpread (technical activity): TS value (refer to the paper for the definition)
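
The script.R file in the archive remains the reference for the actual analyses. As a rough illustration of how the columns above can feed a survival analysis, here is a minimal Python sketch of a Kaplan-Meier estimate. It assumes that ta_duration_months is used as the time variable and ta_abandoner as the event indicator (1: abandonment observed, 0: censored); this pairing is an assumption, not something this page confirms. The sample rows and the function names load_rows, socially_active_only and kaplan_meier are hypothetical, and the values are invented for illustration.

```python
import csv
import io

# Hypothetical sample rows using the documented column names.
# The values are invented and do not come from the dataset.
SAMPLE_CSV = """user,ta_abandoner,sa_abandoner,ta_duration_months
1,1,1,2
2,0,-1,5
3,1,0,5
4,1,-1,8
5,0,0,12
"""


def load_rows(handle):
    """Read the csv and convert the coded columns to numbers."""
    rows = []
    for rec in csv.DictReader(handle):
        rows.append({
            "user": rec["user"],
            "ta_abandoner": int(rec["ta_abandoner"]),
            "sa_abandoner": int(rec["sa_abandoner"]),
            "ta_duration_months": float(rec["ta_duration_months"]),
        })
    return rows


def socially_active_only(rows):
    """Drop developers coded -1 in sa_abandoner (never socially active)."""
    return [r for r in rows if r["sa_abandoner"] != -1]


def kaplan_meier(durations, events):
    """Return [(t, S(t))] for each time t with at least one observed event.

    events[i] == 1 means abandonment was observed at durations[i];
    events[i] == 0 means the developer is still active (censored).
    """
    curve = []
    survival = 1.0
    for t in sorted(set(durations)):
        at_risk = sum(1 for d in durations if d >= t)
        abandoned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        if abandoned:
            survival *= 1 - abandoned / at_risk
            curve.append((t, survival))
    return curve


rows = load_rows(io.StringIO(SAMPLE_CSV))
curve = kaplan_meier(
    [r["ta_duration_months"] for r in rows],
    [r["ta_abandoner"] for r in rows],
)
for t, s in curve:
    print(f"month {t:g}: S = {s:.2f}")
```

On the invented sample, the estimated curve drops at months 2, 5 and 8, the months where an abandonment is recorded. To try it on the real data, replace io.StringIO(SAMPLE_CSV) with an open handle on one of the two csv files. The socially_active_only helper shows one way to exclude the -1 code before any analysis of sa_abandoner.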