MakeTheBrainHappy: Utilizing the Scratch API to download comments from a Scratch Studio

Studios in Scratch are collections of projects centered around a specific “theme”.

If we want to analyze comments from a rather large Scratch studio, the only way built into Scratch at the moment is to click the “load more” button many times and then search the top-level comments with CTRL+F. This strategy is inefficient when you want to analyze the whole data set or simply search all the comments for specific keywords. Although the API is considered deprecated, it can still be used to retrieve the comments in HTML format. To access a specific page of comments, use the following URL, replacing InsertPageNumberHere with the page number.

https://scratch.mit.edu/site-api/comments/gallery/146521/?page=InsertPageNumberHere
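As a small sketch of how that URL is put together, the helper below fills in the studio number and the page number. The name comments_page_url is hypothetical (it is not part of Scratch), and the check that pages start at 1 is an assumption based on the program further down, which starts counting at 1. The helper only builds the string and never contacts Scratch.

```python
# Hypothetical helper: builds the comments URL shown above for any studio and page.
BASE = "https://scratch.mit.edu/site-api/comments/gallery/{studio}/?page={page}"

def comments_page_url(studio_id, page):
    """Return the site-api URL for one page of a studio's comments."""
    if int(page) < 1:
        # Assumption: page numbering starts at 1, as in the program below.
        raise ValueError("page numbers start at 1")
    return BASE.format(studio=int(studio_id), page=int(page))

print(comments_page_url(146521, 3))
```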

The following Python 3 program takes all the comments from a given studio and inserts them sequentially into a CSV file, a format that most spreadsheet programs can open. You will need to replace the number in the link passed to requests.get() with the number associated with your studio.

How to find the number associated with a specific studio
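If you would rather not copy the number by hand, a minimal sketch like the one below can pull it out of a studio link. It assumes that studio links have the form https://scratch.mit.edu/studios/NUMBER/, and the function name studio_id_from_url is made up for this example.

```python
import re

def studio_id_from_url(url):
    """Pull the numeric studio ID out of a studio URL.

    Assumption: the URL contains "/studios/<number>".
    """
    match = re.search(r"/studios/(\d+)", url)
    if match is None:
        raise ValueError("no studio number found in " + url)
    return int(match.group(1))

print(studio_id_from_url("https://scratch.mit.edu/studios/146521/"))
```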

There is also a code segment labelled “Analysis Portion”, which produces some high-level information about the studio (number of comments, total characters and average characters per comment for each user), as shown in the spreadsheet below. The output is sorted only by username, so to rank users you will need to use the sorting tools in your spreadsheet program, most likely Excel or Google Sheets.

Example of the top-level analysis for the S.D.S. Studio "Growing"
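If you prefer to sort in Python rather than in a spreadsheet, the idea can be sketched with the standard csv module. The function name sort_rows and the three sample rows are made up for illustration; only the column names come from the program in this post.

```python
import csv
import io

def sort_rows(csv_text, column, descending=True):
    """Parse CSV text and return its rows sorted by a numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)

# Made-up rows using two of the column names written to studioAnalysis.csv
sample = """user,Number of Comments
alice,3
bob,12
carol,7
"""

for row in sort_rows(sample, "Number of Comments"):
    print(row["user"], row["Number of Comments"])
```

To try it on the real file, read studioAnalysis.csv into a string first, for example with open("studioAnalysis.csv", newline="", encoding="utf-8").read().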

Enjoy utilizing the program for your interests. Please note that it may not work for some exceptionally large studios due to Scratch's built-in rate limit.
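One possible way to be gentler on the rate limit is to pause and retry when a page fails to load. The sketch below is not part of the original program. The name fetch_with_retry, the waiting times and the FakeResponse class are all assumptions, and the 429 status used in the demonstration is only a stand-in, since I have not checked which status code Scratch returns when it limits requests.

```python
import time

def fetch_with_retry(fetch, url, retries=3, wait=2.0, sleep=time.sleep):
    """Call fetch(url) and retry on failure, pausing longer after each failed try.

    A status of 200 or 404 is treated as final; anything else is retried.
    """
    response = fetch(url)
    for attempt in range(retries):
        if response.status_code in (200, 404):
            break
        sleep(wait * (2 ** attempt))  # 2s, 4s, 8s with the default settings
        response = fetch(url)
    return response

class FakeResponse:
    """Stand-in for a requests response so the demo runs without a network."""
    def __init__(self, status_code):
        self.status_code = status_code

# First reply fails, second succeeds; the sleep is replaced so the demo is instant.
replies = iter([FakeResponse(429), FakeResponse(200)])
result = fetch_with_retry(lambda url: next(replies), "https://example.invalid/",
                          sleep=lambda seconds: None)
print(result.status_code)
```

In the program below, this would mean calling fetch_with_retry(requests.get, url) in place of requests.get(url).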

"""
Purpose: Track user contributions within the specific studios

Created by makethebrainhappy
"""

import requests
from bs4 import BeautifulSoup
import pandas as pd
import collections

def main():
    #Data Collection Portion
    users = []
    comments = []
    timestamps = []
    pages = 1
    while True:
        html_doc = requests.get("https://scratch.mit.edu/site-api/comments/gallery/146521/?page="+str(pages))
        if html_doc.status_code == 404:
            #A 404 means we have gone past the last page of comments
            print('Not Found.')
            break
        elif html_doc.status_code != 200:
            #Stop on any other status (for example when rate limited) instead of silently skipping the page
            print('Stopped: unexpected status code '+str(html_doc.status_code))
            break
        print('Success!')
        soup = BeautifulSoup(html_doc.content, 'html.parser')
        #print(soup.prettify())
        #Read each comment once so that the three lists stay the same length
        for com in soup.find_all("div", class_="comment"):
            users.append(com.select("div.name a")[0].string)
            comments.append(com.select("div.content")[0].get_text(" ",strip=True))
            timestamps.append(com.select("span.time")[0].get_text(" ",strip=True))
        pages = pages + 1
    d = {"user":users,"comment":comments,"timestamp":timestamps}
    df = pd.DataFrame(data=d)
    df.to_csv("welcomingCommitteeComments.csv",encoding="utf-8")
    
    #Analysis Portion
    #Count how many comments each user posted
    newUsers = collections.Counter(users)
    #Add up the characters each user wrote across all of their comments
    newDict = collections.Counter()
    for j in range(0,len(comments)):
        newDict[users[j]] = newDict[users[j]]+len(comments[j])
    #Sort both by username so that the columns line up
    newUsers = collections.OrderedDict(sorted(newUsers.items()))
    newDict = collections.OrderedDict(sorted(newDict.items()))
    avg = []
    for name in newUsers:
        avg.append(newDict[name]/newUsers[name])
    #Dictionary views cannot be indexed in Python 3, so convert them to lists
    d = {"user":list(newUsers.keys()),"Number of Comments":list(newUsers.values()),"Total Characters in Comments":list(newDict.values()),"Average Characters per Comment":avg}
    df = pd.DataFrame(data=d)
    df.to_csv("studioAnalysis.csv")

if __name__ == "__main__":
    main()
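Once the comments are in a CSV file, the keyword search mentioned at the start of this post takes only a few lines. The sketch below uses the standard csv module. The function name find_comments is hypothetical, and the two sample rows (including their timestamp format) are made up; only the column names user, comment and timestamp come from the program above.

```python
import csv
import io

def find_comments(csv_text, keyword):
    """Return (user, comment) pairs whose comment contains keyword, ignoring case."""
    needle = keyword.lower()
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if needle in row["comment"].lower():
            matches.append((row["user"], row["comment"]))
    return matches

# Made-up rows in the layout of the comments CSV (the first column is the row index)
sample = """,user,comment,timestamp
0,alice,Welcome to Scratch!,2020-01-01
1,bob,Nice project,2020-01-02
"""

print(find_comments(sample, "welcome"))
```

For the real data, pass in the contents of welcomingCommitteeComments.csv, read with encoding="utf-8" to match how the file was written.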

Credit to apple502j for helping me with BeautifulSoup.
