Toronto Elections data with Neo4j and Python part 3 of 3

As the title suggests, this is the home stretch for my 3-part series on Neo4j and Python. This last bit is more Neo4j focussed, with Python doing most of the heavy lifting in the first two posts.

Using a 2006 elections contribution dataset, I’ve loaded into Neo4j (2.0 Community Edition) the candidates, contributions, contributor names, postal codes. Additionally, I tried to get the distance between the postal codes for some geocoding. Now to try a few simple queries to test this out.

The first one, is to see the top number of contributors by postal code, who they are and how much they contributed. In a normalized key-value schema this could be a number of joins, not to mention the added complication of ‘rolling up’ everyone’s names into one field. It can be done, but using Cypher, I found it a little cleaner. The use case for this is a candidate who might want to hit a large number of postal codes very efficiently in order to get the most potential donors and the most dollar amounts.

Here is the query, and I’ll break it all down subsequently.

MATCH (x:contribution)--(n:contributor)--(p:PostalCode) return distinct p.p_name AS Postal_Code, collect(DISTINCT n.contributor_name) AS List, sum(x.amount) AS Total_Contribution, count(DISTINCT n.contributor_name) AS Number_Contributors ORDER BY count(DISTINCT n.contributor_name) DESC LIMIT 500

So the first match is to get the relationships between contributions to contributors all the way through to postal codes (“MATCH (x:contribution)--(n:contributor)--(p:PostalCode)“).

The second part is returning what I want from that set of nodes:

  • p.p_name AS Postal_Code
  • collect(DISTINCT n.contributor_name) (I need the distinct, otherwise contributor names were double-counting if they contributed more than once.)
  • the total contribution for that postal code (sum(x.amount) AS Total_Contribution)
  • and finally the total number of contributors, which I order in descending order.

contributorList1_Neo4j_part3

That’s all well-and-good but what if I wanted to know what candidate they contributed to? Simple, I just add a Candidate node and pull candidate_name like the following:

MATCH (can:Candidate)--(x:contribution)--(n:contributor)--(p:PostalCode) return distinct can.candidate_name, p.p_name AS Postal_Code, collect(DISTINCT n.contributor_name) AS List, sum(x.amount) AS Total_Contribution, count(DISTINCT n.contributor_name) AS Number_Contributors ORDER BY count(DISTINCT n.contributor_name) DESC LIMIT 500

For fun, let’s see if there were any “divided” areas. That means areas that were hotly contested by two or more candidates. I used a similar ‘match’ statement, but instead returned a list of postal codes and candidates.

MATCH (can:Candidate)--(x:contribution)--(n:contributor)--(p:PostalCode) return distinct p.p_name as Postal_Code, collect(Distinct can.candidate_name), count(Distinct can.candidate_name) AS Number_of_contributions ORDER BY count(Distinct can.candidate_name) DESC LIMIT 500

I’d love to see the lawn signs in this neighborhood šŸ™‚

contributorList2_Neo4j_part3

Related posts