Toronto Elections data with Neo4j and Python part 1 of 3

As promised, I am pushing the envelope on the 2006 elections contributions datasets. This time I am going to do some analysis using Neo4j, but since the data needs to be loaded using the right syntax, I have a little preparation to do first. Currently, my data sits in a CSV file and looks like the following:

```
1;Robichon, Georges; ;H3R1R3;H3R;Mont-Royal (Quebec);200.00;CT0001;Cash;CR0001;Individual;LeDrew, Stephen;1.00;Mayor;
2;Rousseau, Remy; ;J4M2B3;J4M;Longueil (Quebec);1000.00;CT0001;Cash;CR0001;Individual;Pitfield, Jane;1.00;Mayor;
```
The first column (“1” and “2” above) is a unique ID I created for each donation. It will be useful for identifying unique contributions, as we’ll see later. Here is my sketch of what I need the Neo4j graph to look like:

So I basically need to create nodes and relationships for each donation and put them into a text file. That text file can then be cut and pasted, or otherwise imported, into Neo4j. There might be a better way to import, but there are some tricky conditional statements you have to make around contributors, because someone can contribute to a candidate multiple times and/or contribute to multiple candidates. As a result, a ‘straight CREATE’ statement on each line will produce duplicate entries. (EDIT: I still ended up with some duplicate nodes, which I had to delete, but you get the idea.)
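To see why a straight CREATE per line goes wrong, here is a minimal sketch on a few invented rows in the same semicolon-delimited layout (the names, postal codes, amounts and candidates are all hypothetical). It reads them with Python's `csv` module using `;` as the delimiter, then compares one contributor per line against one contributor per (name, postal code) pair, which is the rule the script below relies on.

```python
import csv
import io

# Hypothetical rows in the same 15-column, semicolon-delimited layout as above.
# One Jane Doe gives twice from the same postal code; a different Jane Doe gives from another.
SAMPLE = (
    "1;Doe, Jane; ;M4C1A1;M4C;Toronto (Ontario);100.00;CT0001;Cash;CR0001;Individual;Candidate, A;1.00;Mayor;\n"
    "2;Doe, Jane; ;M4C1A1;M4C;Toronto (Ontario);50.00;CT0001;Cash;CR0001;Individual;Candidate, B;1.00;Mayor;\n"
    "3;Doe, Jane; ;M6P2B2;M6P;Toronto (Ontario);75.00;CT0001;Cash;CR0001;Individual;Candidate, A;1.00;Mayor;\n"
    "4;Roe, Sam; ;M4C1A1;M4C;Toronto (Ontario);200.00;CT0001;Cash;CR0001;Individual;Candidate, A;1.00;Mayor;\n"
)

NAME, POSTAL_CODE, CANDIDATE = 1, 3, 11  # column positions in the original file

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter=";"))

# A 'straight CREATE' per line makes one contributor node per donation.
naive_contributors = [row[NAME] for row in rows]

# Keying on name plus postal code keeps repeat donors together
# while still separating different people who share a name.
unique_contributors = []
for row in rows:
    key = (row[NAME], row[POSTAL_CODE])
    if key not in unique_contributors:
        unique_contributors.append(key)

unique_candidates = sorted(set(row[CANDIDATE] for row in rows))

print(len(naive_contributors), len(unique_contributors), len(unique_candidates))  # 4 3 2
```

Four CREATEs would give four contributor nodes, but there are only three people behind those donations.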
EDIT: I tried out py2neo and liked it! The code is updated to use the latest version of this Python library.
```python
import csv
from operator import itemgetter
from py2neo import neo4j, node  # a handy add-in for working with Neo4j (py2neo 1.6-era API)

# obviously you want to adjust the path to suit what you need
source = r'C:\Users\jlalonde\Documents\personal\elections\tblMayor.csv'

graph_db = neo4j.GraphDatabaseService()
graph_db.clear()  # wipes all existing nodes and relationships

d = []     # complete dataset, which can then be sorted
p = []     # this will become a unique list of postal codes
c = []     # this will become a unique list of candidates
cont = []  # this will help with checking duplicate contributors

# from the original data:
# row[0] is the unique ID
# row[1] is the contributor name
# row[2] is the contributor address
# row[3] is the contributor postal code
# row[4] is the contributor postal code FSA
# row[5] is the contributor location or neighbourhood
# row[6] is the amount
# row[7] is the contribution code
# row[8] is the type (usually cash)
# row[9] is some kind of contributor code
# row[10] is the contributor type (individual vs. corporation)
# row[11] is the candidate name
with open(source, 'rb') as csvfile:  # Python 2: the csv module expects binary mode
    s = csv.reader(csvfile, delimiter=';')
    for row in s:
        if row[11] == 'Gold, Mitch':
            # only collect the data you need for this demo
            d.append([row[0], row[1], row[3], row[5], row[6], row[8], row[10], row[11]])

# the new rows in d (for reference):
# row[0] is the unique ID
# row[1] is the contributor name
# row[2] is the contributor postal code
# row[3] is the contributor location or neighbourhood
# row[4] is the amount
# row[5] is the type
# row[6] is the contributor type
# row[7] is the candidate name
d.sort(key=itemgetter(2, 1))  # sort by postal code, then by name

# create a unique list of nodes for candidates and postal codes;
# people are treated differently, later
for row in d:
    p.append(row[2])
    # yeah yeah yeah, I probably could have used a regex here
    c_nospace = str(row[7]).replace(' ', '').replace(',', '_').replace('-', '').replace('.', '').replace('&', '').replace('(', '').replace(')', '')
    c.append(c_nospace)
    contribution_create, = graph_db.create(node(contribution_id='ID' + str(row[0]), amount=row[4]))
    contribution_create.add_labels("contribution")

p2 = list(set(p))  # unique postal codes
c2 = list(set(c))  # unique candidates

for row in p2:
    postcode, = graph_db.create(node(p_name=str(row)))
    postcode.add_labels("PostalCode")

for row in c2:
    # Adam Sit was a candidate, and someone named Adam Sit also made a contribution.
    # Candidate nodes use their own property (candidate_name) and label, so the two don't clash.
    candidate, = graph_db.create(node(candidate_name=str(row)))
    candidate.add_labels("Candidate")

# now you can go through each line of the dataset, creating nodes if they are unique
```
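The long `replace()` chain could indeed be a regular expression. A minimal sketch, assuming the goal is exactly the same substitutions (commas become underscores; spaces, hyphens, periods, ampersands and parentheses are dropped); `clean_name` and `clean_name_chain` are illustrative names, not part of the original script.

```python
import re


def clean_name_chain(name):
    # The replace() chain used in the script, kept for comparison.
    return (str(name).replace(' ', '').replace(',', '_').replace('-', '')
            .replace('.', '').replace('&', '').replace('(', '').replace(')', ''))


def clean_name(name):
    # Regex version: drop spaces, hyphens, periods, ampersands and parentheses,
    # then turn commas into underscores.
    return re.sub(r"[ \-.&()]", "", str(name)).replace(",", "_")


print(clean_name("Gold, Mitch"))          # Gold_Mitch
print(clean_name("Mont-Royal (Quebec)"))  # MontRoyalQuebec
```

Running both contributor and candidate names through one function also guarantees that the names written when the nodes are created match the names used later in the MATCH queries.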

Now that I have built the list and created the contribution, postal code and candidate nodes, I can finish off by creating the contributor nodes and the relationships.

```python
# The reason for this next part is that someone could be donating to more than one
# candidate, or to the same candidate twice.
contributor1 = ''
pcode1 = ''
for row in d:
    contributor2 = str(row[1]).replace(' ', '').replace(',', '_').replace('-', '').replace('.', '').replace('&', '').replace('(', '').replace(')', '')
    pcode2 = row[2]  # read this row's postal code before the comparison below
    if contributor2 in cont and pcode1 != pcode2:
        # A name we've already seen, at a different postal code: append the unique ID so
        # two people who merely share a name don't end up as one node. Trust me.
        contributor2 = contributor2 + str(row[0])
    cont.append(contributor2)
    # clean the candidate name exactly as c_nospace was cleaned, so the MATCH finds the node
    candidate = str(row[7]).replace(' ', '').replace(',', '_').replace('-', '').replace('.', '').replace('&', '').replace('(', '').replace(')', '')
    if contributor1 == contributor2 and pcode1 == pcode2:
        # Same person as the previous row: no new contributor node, just a new contribution.
        # If they are NOT the same, a new contributor node gets created (the else branch).
        # You could have two people with the SAME name, so the postal code is thrown into the mix.
        string1 = 'MATCH (a {contributor_name: "' + contributor1 + '"}), (b {contribution_id: "ID' + str(row[0]) + '"})'
        string1 = string1 + ' CREATE UNIQUE a-[:CONTRIBUTED]->b'
        query1 = neo4j.CypherQuery(graph_db, string1)
        go1 = query1.execute()
        string2 = 'MATCH (c {contribution_id: "ID' + str(row[0]) + '"}), (d {candidate_name: "' + candidate + '"})'
        string2 = string2 + ' CREATE UNIQUE c-[:RECEIVED]->d'
        query2 = neo4j.CypherQuery(graph_db, string2)
        go2 = query2.execute()
    else:
        # The contributor is new: (1) set up the contributor,
        # (2) set up their relationships with their donation, the candidate and their postal code.
        contributor, = graph_db.create(node(contributor_name=contributor2))
        contributor.add_labels("contributor")
        string1 = 'MATCH (a {contributor_name: "' + contributor2 + '"}), (b {contribution_id: "ID' + str(row[0]) + '"})'
        string1 = string1 + ' CREATE UNIQUE a-[:CONTRIBUTED]->b'
        query1 = neo4j.CypherQuery(graph_db, string1)
        go1 = query1.execute()
        string2 = 'MATCH (c {contribution_id: "ID' + str(row[0]) + '"}), (d {candidate_name: "' + candidate + '"})'
        string2 = string2 + ' CREATE UNIQUE c-[:RECEIVED]->d'
        query2 = neo4j.CypherQuery(graph_db, string2)
        go2 = query2.execute()
        string3 = 'MATCH (e {contributor_name: "' + contributor2 + '"}), (f {p_name: "' + pcode2 + '"})'
        string3 = string3 + ' CREATE UNIQUE e-[:LIVES]->f'
        query3 = neo4j.CypherQuery(graph_db, string3)
        go3 = query3.execute()
    contributor1 = contributor2
    pcode1 = pcode2
```
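The branching in that loop is easier to check without a database. A minimal sketch that replays the same rule on a few hypothetical rows (names, postal codes and candidates are invented) and collects the CONTRIBUTED, RECEIVED and LIVES relationships as plain tuples instead of running Cypher. Identifying a contributor by a (name, postal code) tuple, rather than by appending the row ID to the name, is a simplification made for this sketch.

```python
from operator import itemgetter

# Hypothetical rows trimmed to [unique ID, contributor name, postal code, candidate name].
ROWS = [
    ["3", "Doe, Jane", "M6P2B2", "Candidate, A"],
    ["1", "Doe, Jane", "M4C1A1", "Candidate, A"],
    ["2", "Doe, Jane", "M4C1A1", "Candidate, B"],
    ["4", "Roe, Sam", "M4C1A1", "Candidate, A"],
]


def build_relationships(rows):
    """Replay the loop's rule: after sorting by postal code and name, a row whose
    (name, postal code) matches the previous row reuses that contributor;
    otherwise a new contributor (with its LIVES link) is created."""
    rows = sorted(rows, key=itemgetter(2, 1))  # postal code, then name
    contributors = []   # one entry per contributor node
    relationships = []  # (start, relationship type, end) tuples
    prev_key = None
    for uid, name, pcode, candidate in rows:
        contribution = "ID" + uid
        key = (name, pcode)
        if key != prev_key:
            contributors.append(key)
            relationships.append((key, "LIVES", pcode))
        relationships.append((key, "CONTRIBUTED", contribution))
        relationships.append((contribution, "RECEIVED", candidate))
        prev_key = key
    return contributors, relationships


contributors, relationships = build_relationships(ROWS)
for relationship in relationships:
    print(relationship)
```

The Jane Doe at M4C1A1 ends up as one contributor with two CONTRIBUTED links, while the Jane Doe at M6P2B2 gets a separate node.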

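EDIT: If you use the text-file approach instead, post the nodes first (obviously) and make sure the nodes and relationships are in the same box when entering them, since the node identifiers in a CREATE statement only exist within that one query. Don’t get caught like I did!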
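This bites because a Cypher identifier such as `p0` only exists for the duration of one query: if the relationship part is run as a separate query, `(p0)` is unbound and CREATE simply makes a new, empty node instead of linking the existing one. A minimal sketch of generating everything as one CREATE statement, assuming already-cleaned names; `create_statement` and the `p0`/`k0`/`c1` identifiers are my own illustrative choices, while the labels and property names follow the script above.

```python
def create_statement(contributions):
    """Build a single Cypher CREATE statement from
    (contribution_id, contributor_name, candidate_name, amount) tuples.

    Assumes names were already cleaned; a name containing a single quote
    would need escaping, which this sketch does not do.
    """
    parts = []
    people = {}      # contributor name -> identifier, e.g. "p0"
    candidates = {}  # candidate name -> identifier, e.g. "k0"
    for cid, person, candidate, amount in contributions:
        if person not in people:
            people[person] = "p%d" % len(people)
            parts.append("(%s:contributor {contributor_name:'%s'})" % (people[person], person))
        if candidate not in candidates:
            candidates[candidate] = "k%d" % len(candidates)
            parts.append("(%s:Candidate {candidate_name:'%s'})" % (candidates[candidate], candidate))
        # Each contribution node is created inline, between its contributor and candidate.
        parts.append(
            "(%s)-[:CONTRIBUTED]->(c%s:contribution {contribution_id:'ID%s', amount:'%s'})-[:RECEIVED]->(%s)"
            % (people[person], cid, cid, amount, candidates[candidate])
        )
    return "CREATE " + ",\n       ".join(parts)


stmt = create_statement([
    ("1", "Doe_Jane", "Candidate_A", "100.00"),
    ("2", "Doe_Jane", "Candidate_A", "50.00"),
])
print(stmt)
```

The repeat donor is declared once as `p0` and then reused by both contributions, which only works because everything sits in the same statement.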

Here is a screen capture of Mitch Gold’s network.

2 Thoughts to “Toronto Elections data with Neo4j and Python part 1 of 3”

zenbot – Faking a Graph Structure with Google Sheets

January 3, 2016 at 8:52 pm

[…] the graph database! Or not. I’ve played around with Neo4j in the past, and as cool as it is, I didn’t have a bandwidth to maintain a Neo4j database on top of […]
zenbot – CBC is adopting Neo4j

January 25, 2016 at 8:04 pm

[…] second chord this struck with me was the adoption of Neo4j: I’ve played around with the free version and it is pretty powerful. CBC’s reason for adopting it was that it allows for the ability to […]
