The advancement of AI in modern times is vast. As shown below, using Voyant Tools, we can go above and beyond in writing analysis. Like never before, we have programs that let us dive deeper into any text online. For this exercise, I downloaded a .txt file of Romeo and Juliet, and I was given a ton of data that can be useful in many ways. For example, in the word cloud below, the tool synthesizes a visual image of the most frequently used words, scaled by frequency: the bigger the word, the more often it is used, like “Romeo” in the example below. Beyond that, Voyant offers several other ways to visualize this data, including trend graphs. In the plot below, we can see how the frequency of the most used words varies across different segments of the script. I find this incredibly significant when I think about it. Can AI really do something so amazing, so easily? It still amazes me that data that would take humans days or weeks to analyze takes the computer mere seconds of processing.
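To make that feel a little less mysterious, here is a rough Python sketch of the same two ideas: overall word counts (what the word cloud sizes are based on) and per-segment counts (what the trend graph plots). This is only an illustration of the principle, not Voyant's actual code, and the file name and the choice of ten segments are placeholders I made up.

import re
from collections import Counter

def word_frequencies(path, segments=10):
    # Read the text, lowercase it, and keep only alphabetic words
    with open(path, 'r', encoding='utf-8') as f:
        words = re.findall(r'\b[a-z]+\b', f.read().lower())

    # Overall counts drive the word cloud: bigger count, bigger word
    overall = Counter(words)

    # Per-segment counts drive the trend graph: split the word list into
    # roughly equal chunks and count each chunk separately
    size = max(1, len(words) // segments)
    per_segment = [Counter(words[i:i + size]) for i in range(0, len(words), size)]
    return overall, per_segment

# Hypothetical usage ('romeo_and_juliet.txt' is a placeholder file name):
# overall, per_segment = word_frequencies('romeo_and_juliet.txt')
# print(overall.most_common(10))
# print([seg['romeo'] for seg in per_segment])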


Moreover, following Anastasia Salter’s instructions to see how Google Gemini can help with distant reading tasks, it was quite a sight to see how far AI has come. I downloaded a .txt file of another work, The Romance of Lust, a classic Victorian erotic novel by an anonymous writer, and asked Gemini for a Bag of Words (BoW). Gemini walks you through Python to achieve this goal and provides the fifty most frequent content words. Below is the code Gemini rapidly produced:
import re
from collections import Counter

def generate_bag_of_words(file_path):
    # 1. Read the file
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()

    # 2. Cleaning: Remove Gutenberg headers/footers (simplified)
    # Finding the start and end markers usually found in these files
    start_marker = "*** START OF THE PROJECT GUTENBERG EBOOK"
    end_marker = "*** END OF THE PROJECT GUTENBERG EBOOK"
    start_idx = text.find(start_marker)
    end_idx = text.find(end_marker)
    if start_idx != -1 and end_idx != -1:
        # Adjust indices to get the actual text between markers
        text = text[start_idx + len(start_marker):end_idx]

    # 3. Tokenization: Lowercase and split into words
    # We use regex to keep only words with 3+ letters to remove noise
    words = re.findall(r'\b[a-z]{3,}\b', text.lower())

    # 4. Stopword Removal: Common English words to ignore
    stopwords = set([
        'the', 'and', 'to', 'of', 'a', 'in', 'was', 'that', 'her', 'it', 'she',
        'my', 'i', 'had', 'with', 'for', 'as', 'on', 'you', 'not', 'at', 'be',
        'but', 'is', 'he', 'his', 'me', 'all', 'by', 'this', 'from', 'which',
        'so', 'one', 'were', 'have', 'would', 'or', 'an', 'been', 'no', 'could',
        'their', 'very', 'are', 'them', 'when', 'up', 'out', 'what', 'did', 'if',
        'who', 'they', 'then', 'now', 'more', 'about', 'some', 'into', 'him',
        'down', 'than', 'after', 'there', 'see', 'do', 'we', 'your', 'made'
    ])
    filtered_words = [w for w in words if w not in stopwords]

    # 5. Create the Bag (Counter)
    bag = Counter(filtered_words)
    return bag

# Usage
# bag = generate_bag_of_words('pg30254.txt')
# print(bag.most_common(50))
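For reference, Counter.most_common(50) simply returns the fifty highest-count (word, count) pairs from the bag, which is where the list of the fifty most frequent content words comes from.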
When I asked Gemini the simple prompt, “Can you visualize the most frequent phrases?”, I received an extensive bar graph of the most common phrases (a rough sketch of how that kind of chart can be produced appears after this paragraph). To me, this was highly worrying. While AI can be used to make life easier, its abilities also make it a tool that can replace human work. In light of this, I worry that the spread of AI could shrink the number of jobs available. I also worry about the loss of humanity in future online endeavors. AI can be made to sound human, but there will never be a real voice behind the words. For these reasons, I worry greatly about the future and the constant development of AI.
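As a rough illustration of what that visualization involves (not Gemini’s actual code), the sketch below counts two-word phrases from a filtered word list like the one built in generate_bag_of_words and plots the most common ones with matplotlib. The function name plot_top_phrases and the choice of bigrams are my own assumptions for the example.

import matplotlib.pyplot as plt
from collections import Counter

def plot_top_phrases(words, n=2, top=20):
    # Build n-word phrases (bigrams by default) from the ordered word list
    phrases = Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))
    labels, counts = zip(*phrases.most_common(top))

    # Horizontal bar chart of the most frequent phrases
    plt.barh(labels[::-1], counts[::-1])
    plt.xlabel('Frequency')
    plt.title(f'Top {top} {n}-word phrases')
    plt.tight_layout()
    plt.show()

# Hypothetical usage, assuming filtered_words is the list built inside
# generate_bag_of_words before it is turned into a Counter:
# plot_top_phrases(filtered_words, n=2)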

I thought the output you got from Gemini in terms of phrases was very interesting, specifically the separation of bigrams and trigrams. In my experience, AI will never replicate the human voice in writing and producing. In terms of ability, it is very powerful, but without extensive machine learning and prompting it lacks a lot of the nuance that human labor produces.
Brandon, I loved examining your work and seeing how different the phrases were from the word cloud you created using Voyant. The emphasis on actions in the phrases versus the emphasis on names in the single words is fascinating. Keep up the great work!
It is really cool how you showed that Voyant can take a huge play like Romeo and Juliet and turn it into a word cloud so quickly. Using the Python code to find the most common words in that Victorian novel was a smart way to show how distant reading works without having to read every single page yourself. I like that you were honest about the scary side of AI replacing jobs or making things feel less human, because it is a big worry for the future. The way the trend graphs show the story moving along through data points is super interesting, even if it feels a bit weird to see art turned into numbers like that!