The second season of Arcane, the latest blockbuster series on Netflix based on the universe of one of the most popular online video games ever, League of Legends, is set in a fantasy world with heavy steampunk design, packed with astonishing visuals and a record-breaking budget. As a network and data scientist with a particular interest in turning pop cultural items into data visualizations, this was all I needed after finishing the final season to map out the hidden connections and turn the storyline of Arcane into a network visualization, using Python. Hence, by the end of this tutorial, you will have hands-on skills for creating and visualizing the network behind Arcane.
However, these skills and methods are absolutely not specific to this story. In fact, they highlight the general approach network science offers to map out, design, visualize, and interpret networks of any complex system. Such systems can range from transportation and COVID-19 spreading patterns to brain networks to various social networks, such as that of the Arcane series.
All images created by the author.
Since here we are going to map out the connections behind all characters, first we need to get a list of each character. For this, the Arcane fan wiki site is an excellent source of free-to-use information (CC BY-SA 3.0), which we can easily access with simple web scraping techniques. In particular, we will use urllib to download, and with BeautifulSoup, we will extract the names and fan wiki profile URLs of each character listed on the main character page.
First, downloading the character listing site's html:
import urllib
import bs4 as bs
from urllib.request import urlopen

url_char = 'https://arcane.fandom.com/wiki/Category:Characters'
sauce = urlopen(url_char).read()
soup = bs.BeautifulSoup(sauce, 'lxml')
Then, I extracted all the potentially relevant names. One can easily figure out which tags to feed to the parsed html stored in the 'soup' variable by simply right-clicking on a desired element (in this case, a character profile) and selecting the element inspection option in any browser.
From this, I learned that the name and URL of a character are stored in a line which has 'title=' in it but does not contain ':' (which corresponds to categories). Additionally, I created a still_character flag, which helped me decide which subpages on the character listing page still belong to legitimate characters of the story.
import re

chars = soup.find_all('li')
still_character = True
names_urls = {}

for char in chars:
    if '" title="' in str(char) and ':' not in char.text and still_character:
        char_name = char.text.strip().rstrip()
        if char_name == 'Arcane':
            still_character = False
        char_url = 'https://arcane.fandom.com' + re.search(r'href="([^"]+)"', str(char)).group(1)
        if still_character:
            names_urls[char_name] = char_url
The previous code block creates a dictionary ('names_urls') which stores the name and URL of each character as key-value pairs. Now let's have a quick look at what we have and print the name-url dictionary and its total length:
for name, url in names_urls.items():
    print(name, url)
A sample of the output from this code block, where we can test each link, pointing to the biography profile of each character:
print(len(names_urls))
This code cell returns a result of 67, the total number of named characters we have to deal with. This means we are already done with the first task: we have a comprehensive list of characters as well as easy access to their full textual profiles on their fan wiki sites.
To map out the connections between characters, we need a way to quantify the relationship between each pair of characters. To capture this, I rely on how frequently the two characters' biographies reference each other. On the technical end, to achieve this, we will need to collect the full biographies we just got the links to. We will get them again using simple web scraping techniques, and then save the source of each site in a separate file locally as follows.
# output folder for the profile htmls
import os

folderout = 'fandom_profiles'
if not os.path.exists(folderout):
    os.makedirs(folderout)

# crawl and save the profile htmls
for ind, (name, url) in enumerate(names_urls.items()):
    if not os.path.exists(folderout + '/' + name + '.html'):
        fout = open(folderout + '/' + name + '.html', "w")
        fout.write(str(urlopen(url).read()))
        fout.close()
By the end of this section, our folder 'fandom_profiles' should contain the fan wiki profiles of each Arcane character, ready to be processed as we work our way towards building the Arcane network.
To build the network between characters, we assume that the intensity of interactions between two characters is signaled by the number of times each character's profile mentions the other. Hence, the nodes of this network are the characters, which are linked with connections of varying strength based on the number of times each character's wiki site source references any other character's wiki.
Building the network
In the following code block, we build up the edge list: the list of connections that contains both the source and the target node (character) of each connection, as well as the weight (co-reference frequency) between the two characters. Additionally, to conduct the in-profile search effectively, I create a names_ids dictionary which only contains the specific identifier of each character, without the rest of the web address.
# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}
names_ids = {n : u.split('/')[-1] for n, u in names_urls.items()}

for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:
    name = fn.split('.html')[0]
    with open(folderout + '/' + fn) as myfile:
        text = myfile.read()
    soup = bs.BeautifulSoup(text, 'lxml')
    text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
    soup = bs.BeautifulSoup(text, 'lxml')
    for n, i in names_ids.items():
        w = text.split('Image Gallery')[0].count('/' + i)
        if w > 0:
            edge = '\t'.join(sorted([name, n]))
            if edge not in edges:
                edges[edge] = w
            else:
                edges[edge] += w

len(edges)
As this code block runs, it should return around 180 edges.
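Before moving on, it can be worth sanity-checking the edge list by printing the strongest co-reference pairs. A minimal sketch, shown here with a toy stand-in for the `edges` dictionary (character names and weights are made up) so it runs standalone:

```python
# toy stand-in for the edges dictionary built above:
# keys are tab-joined, alphabetically sorted character pairs, values are weights
edges = {'Jinx\tVi': 42, 'Silco\tVi': 7, 'Caitlyn\tVi': 19}

# sort pairs by descending weight to see the strongest connections first
top_edges = sorted(edges.items(), key=lambda kv: -kv[1])
for pair, weight in top_edges:
    print(pair.replace('\t', ' -- '), weight)
```

On the real dictionary, the same sorting immediately surfaces the most heavily co-referenced character pairs.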
Next, we use the NetworkX graph analytics library to turn the edge list into a graph object and output the number of nodes and edges the graph has:
# create the networkx graph from the dict of edges
import networkx as nx

G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

G.remove_edges_from(nx.selfloop_edges(G))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())
The output of this code block:
This output tells us that while we started with 67 characters, 16 of them ended up not being connected to anyone in the network, hence the smaller number of nodes in the constructed graph.
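If you want to see exactly which characters dropped out, comparing the scraped name list against the graph's node set works. A sketch with toy data (in the real notebook you would use the `names_urls` dictionary and graph `G` built above; the names below are placeholders):

```python
import networkx as nx

# toy stand-ins for the real names_urls dict and graph G built above
names_urls = {'Vi': '...', 'Jinx': '...', 'Loris': '...'}
G = nx.Graph()
G.add_edge('Vi', 'Jinx', weight=5)

# characters that appear in the scraped list but have no edges in the network
isolated = sorted(set(names_urls) - set(G.nodes()))
print(isolated)
```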
Visualizing the network
Once we have the network, we can visualize it! First, let's create a simple draft visualization of the network using Matplotlib and the built-in tools of NetworkX.
# take a very brief look at the network
import matplotlib.pyplot as plt

f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, ax=ax, with_labels=True)
plt.savefig('test.png')
The output image of this cell:
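As a side note, the draft can be made somewhat more readable without leaving Matplotlib, for instance by fixing the layout seed and scaling node sizes with degree. A sketch under those assumptions (a built-in toy graph is used in place of the Arcane graph so the snippet is self-contained):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, render straight to file
import matplotlib.pyplot as plt
import networkx as nx

# toy graph standing in for the Arcane graph G built above
G = nx.karate_club_graph()

# deterministic layout and degree-proportional node sizes
pos = nx.spring_layout(G, seed=42)
sizes = [100 * G.degree(n) for n in G.nodes()]

f, ax = plt.subplots(1, 1, figsize=(15, 15))
nx.draw(G, pos=pos, ax=ax, with_labels=True, node_size=sizes)
plt.savefig('test_sized.png')
```

This way, hubs stand out already in the draft, and rerunning the cell always produces the same layout.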
While this network already gives a few hints about the main structure and the most frequent characteristics of the show, we can design a much more detailed visualization using the open-source network visualization software Gephi. For this, we need to export the network into a .gexf graph data file first, as follows.
nx.write_gexf(G, 'arcane_network.gexf')
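One quick way to make sure the export worked before switching tools is to read the .gexf file back with NetworkX and compare the counts. A sketch with a toy graph (in the real workflow the file name would be 'arcane_network.gexf' and the graph the Arcane network):

```python
import networkx as nx

# toy graph standing in for the Arcane network
G = nx.Graph()
G.add_edge('Vi', 'Jinx', weight=5)
G.add_edge('Vi', 'Caitlyn', weight=3)

# round-trip: write to .gexf, read it back, and compare sizes
nx.write_gexf(G, 'toy_network.gexf')
H = nx.read_gexf('toy_network.gexf')
print(H.number_of_nodes(), H.number_of_edges())
```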
Now, the tutorial on how to visualize this network using Gephi:
Extras
Here comes an extension part, which I refer to in the video. After exporting the node table, including the network community indices, I read that table using Pandas and assigned individual colors to each community. I got the colors (and their hex codes) from ChatGPT, asking it to align with the main color themes of the show. Then, this block of code exports the colors, which I again used in Gephi to color the final graph.
import pandas as pd

nodes = pd.read_csv('nodes.csv')

pink = '#FF4081'
blue = '#00FFFF'
gold = '#FFD700'
silver = '#C0C0C0'
green = '#39FF14'

cmap = {0 : green,
        1 : pink,
        2 : gold,
        3 : blue,
       }

nodes['color'] = nodes.modularity_class.map(cmap)
nodes.set_index('Id')[['color']].to_csv('arcane_colors.csv')
As we color the network based on the communities we found (communities meaning highly interconnected subgraphs of the original network), we uncover four major groups, each corresponding to specific sets of characters within the storyline. Not so surprisingly, the algorithm clustered together the main protagonist family with Jinx, Vi, and Vander (pink). We also see the cluster of the underground figures of Zaun (blue), such as Silco, while the elite of Piltover (gold) and the militarist enforcers (green) are also well-grouped together.
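Gephi computes these modularity classes with its built-in community detection; if you would rather stay in Python, NetworkX ships comparable methods. A sketch using greedy modularity maximization on a toy graph (the partition on the real Arcane graph may differ somewhat from Gephi's result):

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# toy graph: two triangles joined by a single bridge edge
G = nx.Graph()
G.add_edges_from([('a', 'b'), ('b', 'c'), ('a', 'c'),
                  ('d', 'e'), ('e', 'f'), ('d', 'f'),
                  ('c', 'd')])

# greedy modularity maximization should separate the two triangles
communities = greedy_modularity_communities(G)
print([sorted(c) for c in communities])
```

Each detected community is returned as a set of nodes, so mapping them to colors works exactly like the modularity_class column exported from Gephi.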
The beauty and use of such community structures is that while such explanations put them in context very easily, it would usually be very hard to come up with a similar map based on intuition alone. The methodology presented here clearly shows how we can use network science to extract the hidden connections of virtual (or real) social systems, be it the partners of a law firm, the co-workers of an accounting firm, or the HR department of a major oil company.