Studios in Scratch are collections of projects centered around a specific “theme”.
If we want to analyze the comments in a rather large Scratch studio, the best
option Scratch itself currently offers is to click the “load more” button many
times and then search the top-level comments with CTRL+F. This strategy is
inefficient when you want to analyze the whole data set or simply search all the
comments for specific keywords. Although Scratch's older site-api is considered
deprecated, it can still be used to retrieve the comments in HTML format, one
page at a time. To access a specific page, use the following URL, replacing
InsertPageNumberHere with the page number.
https://scratch.mit.edu/site-api/comments/gallery/146521/?page=InsertPageNumberHere
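As a small illustration of the URL pattern above, the address of one page can be assembled from a studio number and a page number. This is only a sketch: the helper name comments_page_url is hypothetical, it builds the string without contacting Scratch, and it assumes page numbering starts at 1, as the program in this post does.

```python
# Hypothetical helper; the URL shape is copied from the example in this post.
BASE = "https://scratch.mit.edu/site-api/comments/gallery/{studio_id}/?page={page}"


def comments_page_url(studio_id, page):
    """Return the comments URL for one page of a studio."""
    studio_id = int(studio_id)
    page = int(page)
    if page < 1:
        # Assumption: the first page of comments is page 1.
        raise ValueError("page numbers start at 1")
    return BASE.format(studio_id=studio_id, page=page)


print(comments_page_url(146521, 3))
```

Keeping the pattern in one place means only the studio number has to change when you point the scraper at a different studio.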
The following Python 3 program collects all the comments from a given studio and
writes them sequentially to a CSV file. This format can be opened in most
spreadsheet programs. You will need to change the URL in the requests.get() call
so that it contains the number associated with your studio (146521 in the
example).
How to find the number associated with a specific studio
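If you would rather not copy the number by hand, it can be pulled out of the studio's address. The sketch below assumes the studio address has the form https://scratch.mit.edu/studios/146521/ (the number comes right after "/studios/"); the function name studio_id_from_url is hypothetical.

```python
import re


def studio_id_from_url(url):
    """Extract the studio number from a studio address.

    Assumption: the number follows "/studios/" in the address.
    """
    match = re.search(r"/studios/(\d+)", url)
    if match is None:
        raise ValueError("no studio number found in: " + url)
    return int(match.group(1))


print(studio_id_from_url("https://scratch.mit.edu/studios/146521/"))
```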
There is also a code segment labelled “Analysis Portion”, which produces some high-level information about the studio (the number of comments, total characters, and average characters per comment for each user), as shown in the spreadsheet below. The rows are ordered by username rather than by any of these statistics, so you will need to use the sorting tools in your spreadsheet program, most likely Excel or Google Sheets.
Example of the top-level analysis for the S.D.S. Studio "Growing"
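If you prefer to sort in Python rather than in a spreadsheet, the same per-user statistics can be computed and ordered in a few lines. This is a minimal sketch using only the standard library, not the program from this post: the function name summarize and the sample users and comments are made up for illustration.

```python
from collections import defaultdict


def summarize(users, comments):
    """Return (user, comment count, total chars, average chars) rows.

    Rows are sorted by comment count, highest first; ties are broken
    by username.
    """
    counts = defaultdict(int)
    chars = defaultdict(int)
    for user, comment in zip(users, comments):
        counts[user] += 1
        chars[user] += len(comment)
    rows = [(u, counts[u], chars[u], chars[u] / counts[u]) for u in counts]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


# Made-up sample data standing in for the scraped lists.
sample_users = ["ada", "bob", "ada"]
sample_comments = ["hello", "hi", "bye"]
for row in summarize(sample_users, sample_comments):
    print(row)
```

Changing the sort key (for example to the average length, row[3]) gives a different ranking without touching the counting code.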
"""
Purpose: Track user contributions within the specific studios
Created by makethebrainhappy
"""
import collections

import requests
from bs4 import BeautifulSoup
import pandas as pd


def main():
    # Data Collection Portion
    users = []
    comments = []
    timestamps = []
    pages = 1
    while True:
        html_doc = requests.get("https://scratch.mit.edu/site-api/comments/gallery/146521/?page=" + str(pages))
        if html_doc.status_code == 200:
            print('Success!')
        elif html_doc.status_code == 404:
            # The loop treats a 404 as the end of the comment pages.
            print('Not Found.')
            break
        else:
            # Stop on any other status rather than parsing an error page.
            print('Unexpected status code:', html_doc.status_code)
            break
        soup = BeautifulSoup(html_doc.content, 'html.parser')
        # print(soup.prettify())
        # One pass per comment keeps the three lists aligned.
        for com in soup.find_all("div", class_="comment"):
            users.append(com.select("div.name a")[0].string)
            comments.append(com.select("div.content")[0].get_text(" ", strip=True))
            timestamps.append(com.select("span.time")[0].get_text(" ", strip=True))
        pages = pages + 1
    d = {"user": users, "comment": comments, "timestamp": timestamps}
    df = pd.DataFrame(data=d)
    df.to_csv("welcomingCommitteeComments.csv", encoding="utf-8")

    # Analysis Portion
    # Number of comments per user (Counter lives in the collections module).
    newUsers = collections.Counter(users)
    # Total number of characters written by each user.
    newDict = collections.Counter()
    for user, comment in zip(users, comments):
        newDict[user] += len(comment)
    newUsers = collections.OrderedDict(sorted(newUsers.items()))
    newDict = collections.OrderedDict(sorted(newDict.items()))
    # Look values up by user; dict views cannot be indexed in Python 3.
    avg = [newDict[user] / newUsers[user] for user in newUsers]
    d = {"user": list(newUsers.keys()),
         "Number of Comments": list(newUsers.values()),
         "Total Characters in Comments:": list(newDict.values()),
         "Average Characters per Comment": avg}
    df = pd.DataFrame(data=d)
    df.to_csv("studioAnalysis.csv")


main()
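Once the comments are in a CSV file, the keyword search mentioned earlier becomes a simple filter. The sketch below is one possible approach using only the standard library: the function names search_comments and load_rows are hypothetical, the sample rows are made up, and it assumes the CSV has the "user", "comment" and "timestamp" columns written by the program above.

```python
import csv


def search_comments(rows, keyword):
    """Return the rows whose comment contains keyword (case-insensitive)."""
    needle = keyword.lower()
    return [row for row in rows if needle in row["comment"].lower()]


def load_rows(path):
    """Read a comments CSV into a list of dictionaries, one per comment."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Made-up rows standing in for load_rows("welcomingCommitteeComments.csv").
sample = [
    {"user": "ada", "comment": "Welcome to Scratch!", "timestamp": "t1"},
    {"user": "bob", "comment": "Nice project", "timestamp": "t2"},
    {"user": "cy", "comment": "A warm WELCOME from me too", "timestamp": "t3"},
]
for row in search_comments(sample, "welcome"):
    print(row["user"], "-", row["comment"])
```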
Credit to apple502j for helping me with Beautiful Soup.