Utilizing the Scratch API to download comments from a Scratch Studio


Studios in Scratch are collections of projects centered on a specific “theme”.

If we want to analyze the comments in a rather large Scratch studio, the best way that Scratch itself currently offers is to click the “load more” button many times and then search the loaded top-level comments with CTRL+F. This strategy is inefficient when you want to analyze the whole data set or simply search all the comments for specific keywords. Although the API is considered deprecated, it can still be used to retrieve the comments in HTML format. To access a specific page of comments, use the following URL, replacing the placeholder with a page number.

https://scratch.mit.edu/site-api/comments/gallery/146521/?page=InsertPageNumberHere
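
As a small sketch, the address of any page can be assembled from the studio number and the page number. The helper name comments_page_url is made up for this example; only the URL pattern itself comes from the address above.

```python
# URL pattern taken from the address above; the placeholders are filled in below
BASE = "https://scratch.mit.edu/site-api/comments/gallery/{studio}/?page={page}"

def comments_page_url(studio_id, page):
    """Return the address of one page of comments for a studio."""
    return BASE.format(studio=studio_id, page=page)

print(comments_page_url(146521, 1))
```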

The following Python 3 program takes all the comments from a given studio and inserts them sequentially into a CSV file. This format can be opened in most spreadsheet programs. You will need to change the number in the link passed to requests.get() to the number associated with your studio.

How to find the number associated with a specific studio
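
The number is the one that appears in the studio's own address, for example https://scratch.mit.edu/studios/146521/. If you prefer to copy the whole address, a sketch like the following pulls the number out of it. The function name studio_id_from_url is invented here and is not part of any Scratch library.

```python
import re

def studio_id_from_url(url):
    """Return the studio number found after /studios/ in a Scratch address."""
    match = re.search(r"/studios/(\d+)", url)
    if match is None:
        raise ValueError("no studio number found in: " + url)
    return int(match.group(1))

print(studio_id_from_url("https://scratch.mit.edu/studios/146521/"))
```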

        
There is also a code segment labelled “Analysis Portion”, which produces some high-level information about the studio, as shown in the spreadsheet below. The rows are not ordered by any of the statistics, so you will need to use the sorting tools within your spreadsheet program, most likely Excel or Google Sheets.
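
If you would rather sort before opening a spreadsheet, here is a minimal sketch that uses only Python's csv module. The rows are made-up sample data, not real studio results; the column names user and Number of Comments are the ones the program writes to studioAnalysis.csv.

```python
import csv
import io

# Made-up sample in the same shape as studioAnalysis.csv
sample = """user,Number of Comments
alice,3
bob,12
carol,7
"""

def sort_by_comment_count(csv_text):
    """Return the rows sorted from most to fewest comments."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: int(r["Number of Comments"]), reverse=True)

for row in sort_by_comment_count(sample):
    print(row["user"], row["Number of Comments"])
```

To use it on the real file, read the text of studioAnalysis.csv and pass it in place of the sample.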



Example of the top-level analysis for the S.D.S. Studio "Growing"
Enjoy using the program for your own interests. Please note that it may not work for some exceptionally large studios due to Scratch's built-in rate limit.
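
One possible way to be gentler on the rate limit is to pause and retry when a request is refused. The sketch below is separate from the program that follows and is only an illustration: fetch_with_retry and its parameters are invented names, and the delay values are guesses, not figures published by Scratch.

```python
import time

def fetch_with_retry(fetch, page, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(page) until it returns something other than None.

    fetch is any function that returns the page text, or None when the
    server refuses the request (for example because of a rate limit).
    """
    for attempt in range(retries):
        result = fetch(page)
        if result is not None:
            return result
        # wait a little longer after each failed attempt
        sleep(delay * (attempt + 1))
    return None

# Demonstration with a stand-in fetch function that fails twice, then succeeds
attempts = []

def fake_fetch(page):
    attempts.append(page)
    return "page text" if len(attempts) >= 3 else None

# sleep is replaced with a do-nothing function so the demonstration runs instantly
print(fetch_with_retry(fake_fetch, 1, sleep=lambda seconds: None))
```

In real use, fetch would wrap requests.get() and return None whenever the status code is not 200.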


"""
Purpose: Track user contributions within the specific studios

Created by makethebrainhappy
"""

import requests
from bs4 import BeautifulSoup
import pandas as pd
import collections

def main():
    #Data Collection Portion
    users = []
    comments = []
    timestamps = []
    pages = 1
    while True:
        html_doc = requests.get("https://scratch.mit.edu/site-api/comments/gallery/146521/?page="+str(pages))
        if html_doc.status_code != 200:
            #Stop on 404 (past the last page) and on any other error, such as a rate limit
            print('Stopped at page '+str(pages)+' with status code '+str(html_doc.status_code))
            break
        print('Success!')
        soup = BeautifulSoup(html_doc.content, 'html.parser')
        #print(soup.prettify())
        found = soup.find_all("div", class_="comment")
        if not found:
            #A page without comments is treated as the end of the studio
            break
        #Read the user, text and timestamp of each comment in a single pass
        for com in found:
            users.append(com.select("div.name a")[0].string)
            comments.append(com.select("div.content")[0].get_text(" ",strip=True))
            timestamps.append(com.select("span.time")[0].get_text(" ",strip=True))
        pages = pages + 1
    d = {"user":users,"comment":comments,"timestamp":timestamps}
    df = pd.DataFrame(data=d)
    df.to_csv("welcomingCommitteeComments.csv",encoding="utf-8")
    
    #Analysis Portion
    #Number of comments written by each user
    newUsers = collections.Counter(users)
    #Total number of characters written by each user
    newDict = collections.Counter()
    for user, comment in zip(users, comments):
        newDict[user] += len(comment)
    #Build the columns in the same (alphabetical) user order
    names = sorted(newUsers)
    counts = [newUsers[name] for name in names]
    totals = [newDict[name] for name in names]
    avg = [total/count for total, count in zip(totals, counts)]
    d = {"user":names,"Number of Comments":counts,"Total Characters in Comments":totals,"Average Characters per Comment":avg}
    df = pd.DataFrame(data=d)
    df.to_csv("studioAnalysis.csv")

main()

Credit to apple502j for helping me with BeautifulSoup.
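
Once the comments are in a CSV file, searching them for a keyword takes only a few lines. This is a minimal sketch with made-up sample rows (including invented timestamps); the column names user, comment and timestamp are the ones the program writes, and find_comments is a name invented for this example.

```python
import csv
import io

# Made-up sample in the same shape as the comments file
sample = """user,comment,timestamp
alice,Welcome to Scratch!,2019-01-01T00:00:00Z
bob,Check out my new project,2019-01-02T00:00:00Z
carol,welcome aboard,2019-01-03T00:00:00Z
"""

def find_comments(csv_text, keyword):
    """Return (user, comment) pairs whose comment contains keyword, ignoring case."""
    keyword = keyword.lower()
    return [(row["user"], row["comment"])
            for row in csv.DictReader(io.StringIO(csv_text))
            if keyword in row["comment"].lower()]

print(find_comments(sample, "welcome"))
```

To search the real data, read the text of welcomingCommitteeComments.csv (opened with encoding="utf-8") and pass it in place of the sample.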

Data Processing in Scratch

Data is information. This can be just the number ‘3’ or a whole book.
Computer programming languages manipulate (use or change) data.
We generally split data into two categories in Scratch: numbers and strings. Strings can be single characters, words or sentences.
Examples of numbers: 1, 1.0, 15, 18, -5
Examples of strings: “A”, “b”, “I like Pi”, “$400”, “(^_^)”, “Sentences etc.”
This is a variable in Scratch. Variables contain data.
Variables can contain either numbers or strings.


Putting Data into a Variable


Showing/Hiding Variables

Event Handler: The “when green flag clicked” block is an event handler. This means that it runs the code attached to it in response to an event, in this case a click on the green flag.

Operators: These green blocks let you manipulate data in simple ways. We will first focus on number operators.
These include the traditional addition, subtraction, multiplication and division operators.

Other “Number” Operators:
We can also generate random integers between two bounds with the “pick random” block. Scratch also has a built-in “round” block.
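
For readers who also know Python, roughly the same two ideas look like this. It is a comparison sketch, not Scratch code: the function names are invented, and rounding halves upward is an assumption about how the “round” block treats values such as 2.5.

```python
import math
import random

def pick_random(low, high):
    """Return a random whole number from low to high, both included."""
    return random.randint(low, high)

def round_half_up(x):
    """Round to the nearest whole number, with halves going up (assumed behaviour)."""
    return math.floor(x + 0.5)

print(pick_random(1, 10))    # a different whole number from 1 to 10 on each run
print(round_half_up(3.7))
```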

Variables + Operators: As you may have guessed, we can place variables inside of our operators. In order for the operators to work correctly, the variable needs to be holding a number.
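
The same idea can be sketched in Python for comparison. The variable names are made up for this example. Scratch itself does not stop with an error when a variable holds a string, so treat the second half only as an illustration of why the variable should hold a number.

```python
# Variables holding numbers can go straight into an operator
apples = 6
pears = 4
total = apples + pears
print(total)

# A variable holding a string cannot be added to a number in Python
price = "$400"
try:
    print(price + 1)
except TypeError:
    print("price does not hold a number")
```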
 
Find the final value of z in the following code:

Printing: This is a convenient way to examine the final data (output) of our project. The project “prints” by having a sprite say the data on the stage.



Problems:


Create a 3x3 table of variables on stage (nine variables total). Set each of these variables so that together they form an X in your table. Note: You can move variables around the screen with your mouse.
Find the rounded answer for 6012.345*1232.1234 = (Write it here)


Advanced Assignment: Code a program which generates random math problems on screen with division (see the image below for an example output).



Appendix 1: Advanced “Numbers” Operators
Scratch has many built-in operators that perform more advanced mathematical functions. These include the trigonometric functions (sin, cos, tan).
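
Scratch's trigonometric blocks take angles in degrees. As a comparison sketch in Python, whose math functions expect radians, the angle has to be converted first. The function names below are invented for this example.

```python
import math

def sin_degrees(angle):
    """Sine of an angle given in degrees."""
    return math.sin(math.radians(angle))

def cos_degrees(angle):
    """Cosine of an angle given in degrees."""
    return math.cos(math.radians(angle))

def tan_degrees(angle):
    """Tangent of an angle given in degrees."""
    return math.tan(math.radians(angle))

# Results are rounded to three decimals to hide tiny floating-point errors
print(round(sin_degrees(30), 3))
print(round(cos_degrees(60), 3))
print(round(tan_degrees(45), 3))
```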

Scratch also contains the modulo operator (mod), which finds the remainder of a division. Essentially, this looks at the “left-over” piece from division. For example, 4 divided by 3 equals 1 + 1/3: 3 goes into 4 once, with 1 left over. The mod operator would then return 1 because this is the piece which did not divide in evenly.
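
The 4 divided by 3 example can be checked with a short Python sketch, where the % operator plays the role of mod for these positive numbers.

```python
# divmod gives the whole-number part of the division and the remainder together
quotient, remainder = divmod(4, 3)
print(quotient, remainder)  # prints: 1 1

# % alone gives just the remainder
print(4 % 3)   # prints: 1
print(10 % 5)  # prints: 0, because 5 divides 10 evenly
```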