In the 2nd week, I had exported the extracted data to an Airtable view.
I didn’t realise it until just last week but Airtable has a limit of 1200 records per base only for a free version and the SCMS could be implemented in any spreadsheet software as far as GrimoireLab is concerned.
Our initial goal was to collect as much data as possible to represent the community’s sentiments holistically and make SCMS usable by other open-source communities as well.
So we decided to shift the implementation to Google Sheets. Google Sheets has a limit at 5 million cells, which is adequately enormous [yeah....0.o].
So, I collected all data from Github, Mailing lists of Grimoirelab and IRC channel of CHAOSS. Then, randomly tagged it (as done in Week 2), and exported to Google Sheets using an API.
After looking at the data carefully, we also noticed that we had Github comments by Coveralls indicating the coverage increased or decreased. Other than this, we had lots of IRC messages indicating a person has joined the channel or has left the channel. This data is not providing us with any additional information about the community and is not a user sentiment. So, we planned to remove all those unnecessary records. We collected around 5.6k filtered data from Github, Mailing list, IRC channel.
A major part of the week was planned to be devoted to building a Codex Sheet.
Codices help us to rely on Qualitative data in an objective sense over time much like Quantitative data. To reduce the subjectivity of this qualitative data, it is imperative to define a codex table which can help in tagging data points accurately by better defining and honing in on the purpose of tags. It also helps in collaborating with the team to keep similar ideas running.
Codices contain the definitions of metrics within the organisation (here it is Grimoirelab), and an example to illustrate the definition. It also consists of “when to use” and “when-not-to-use” a metric.
I also made a short rough draft of an overview of SCMS to be published for the CHAOSS blog!
Other than this, since we have a full version of data present in google sheets, I converted this entire Excel sheet data to the ElasticSearch index (precisely as done in Week 2). The only difference is the number of records being used. Earlier, I could only have limited records containing the “extra_scms_data” in the Enriched index.
Now, every meaningful record (i.e. ignoring comments by coveralls) has an additional field present in its ElasticSearch index.
And the weekly meeting...
Per usual, I had a weekly meeting with the mentors on 19 June’20, minutes are here.
We discussed Dashboarding and expanding on the SCMS. We’ll be having a collaboration meeting on Friday,26 for discussing the findings of tagged data. Till then, in the next week, I’ll be focusing on writing tests and making a basic dashboard. ?
That's it for this week.? Make sure you have a look at the project updates on Github #ria18405/GSoC. All questions and comments are welcomed! Stay tuned for more weekly updates. ?