Category: Dataviz

  • Animating simpleheat.js

    As a POC (proof-of-concept) for a presentation, I wanted to generate a heat map similar to the kind you see for Toronto Raptors players, or in other examples such as airline flights or overlays on physical world maps.

    Because I was presenting to a fairly mixed audience, I did not want to assume they could make the mental ‘leap’ on their own as I walked through the narrative I had built for them. In other words, for each part of the story, I wanted them to see the transition from one idea to the next. I therefore decided to animate the heatmap to make each transition as smooth as possible.

    A great lightweight heatmap library is simpleheat.js (demo), which I integrated into a wireframe in the form of a web page. I stripped my presentation down and posted it to GitHub here.

    Because this is just a concept, the data is not actually generated by the client; it was data that I created for this purpose (see my blog post on creating placeholder data). For the demo, there are actually several data sets.

    However, the starting heatmap and the final heatmap can be very different from one another. If you just hit ‘load’ on the next set of data to re-populate the heat map, it will ‘jump’ rather than transition smoothly.

    To get a smooth transition, I implemented an animation concept called inbetweening, which in this case loads all the states between the starting heatmap and the final one. Because this was just a fast proof-of-concept, there is lots of room to optimize the code for an even smoother transition and to add error checking.

    The order of operations is the following:

    1. Bring in the first json dataset and populate heatmap.
    2. Load in the second json but don’t populate the heatmap, yet. Make sure they have the same xy coordinates and that those coordinates in the second json are in the same order as the first json. (This is assumed to be generated at source. I didn’t add a try-catch to the code.)
    3. Find the coordinate whose z-value differs the most between the two datasets. This means running through the list, loading the differences into an array and performing Math.max on the array. (Note this is a different ‘maximum’ than the ‘themax’ variable in the code.)
    4. The max value from the step above is the number of incremental changes.
    5. The function looptheloop() is called as many times as needed, as determined by the steps 3 and 4 above. This function looks at the difference of the z-values for each coordinate.
    6. If there is no difference in the z-value for a given set of coordinates, do nothing and ride out the loops until the end (see ‘Same value’ section below).
    7. If there is a difference in z-value for a given pair of coordinates, it updates the starting json by moving it gradually closer and closer to the target json (see ‘Different z-value’ section, below).
    8. In the code, you will see there is an incremental update to a value called ‘themax’, but this might not be needed in your case. I used it to avoid the predominance of too much red in one of the results I was seeing.
    9. Therefore, the starting json is updated and reloaded, step-by-step. Once the starting and target json are the same, the incremental refreshing of the heatmap stops.
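
    The steps above can be sketched roughly as follows. This is a minimal sketch, not the code from the GitHub demo: the helper names (maxDifference, stepToward, inbetween) are hypothetical, each dataset is assumed to be an array of [x, y, z] triples in matching order, and z-values are assumed to be whole numbers so that steps of 1 land exactly on the target.

```javascript
// Hypothetical helpers; each dataset is an array of [x, y, z] triples
// with the same coordinates in the same order (assumed, not checked).

// Steps 3 and 4: the largest z-difference gives the number of increments.
function maxDifference(start, target) {
  const diffs = start.map((point, i) => Math.abs(target[i][2] - point[2]));
  return Math.max(...diffs);
}

// Steps 6 and 7: leave equal z-values alone, move the others one unit closer.
function stepToward(current, target) {
  return current.map(([x, y, z], i) => {
    const goal = target[i][2];
    if (z === goal) return [x, y, z];
    return [x, y, z + Math.sign(goal - z)];
  });
}

// Steps 5 and 9: build every in-between state until start matches target.
function inbetween(start, target) {
  const frames = [];
  let current = start;
  const steps = maxDifference(start, target);
  for (let i = 0; i < steps; i++) {
    current = stepToward(current, target);
    frames.push(current);
  }
  return frames;
}
```

    Each frame could then be handed to the heatmap on a timer, for example with simpleheat's data() and draw() methods, so the canvas is redrawn once per increment.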

    Same z-value for source and target?

    Different z-value? Change and re-load by increments.

    In developing this code, I had a look at some dataframe capabilities to avoid so many loops. However, because I wanted to get this out quickly, I did not finish exploring javascript dataframes. If the number of loops causes your animation to grind, I would strongly suggest looking at a dataframe.

    Want a smoother animation? Simply make the increments fractional (right now the increments are 1 at a time, but they could be 0.5 or 0.25, as simpleheat can handle it). Then call the looptheloop function more often.
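
    As a rough illustration of fractional increments, here is a small sketch. The names nudge and frameCount are hypothetical and not part of the demo code; the point is that a smaller step needs proportionally more calls, and that the last step should be clamped so a value never overshoots its target.

```javascript
// Hypothetical helper: move z toward goal by at most `step`, never past it.
function nudge(z, goal, step) {
  const gap = goal - z;
  if (Math.abs(gap) <= step) return goal;
  return z + Math.sign(gap) * step;
}

// Number of redraws needed when the largest z-difference is maxGap.
function frameCount(maxGap, step) {
  return Math.ceil(maxGap / step);
}
```

    With a largest difference of 3, a step of 1 needs 3 redraws, while a step of 0.25 needs 12, so the timer interval between redraws would need to shrink accordingly to keep the overall duration the same.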

  • Placeholder heatmap data

    I recently needed to create an interactive wireframe concept using a heatmap. This was for a presentation and I needed to walk the client through a particular story. As a result I needed just enough data to populate the heatmap but had to somehow create the data myself.

    Getting the right data wouldn’t be straightforward: the heatmap, using json, requires both x and y coordinates, plus a third value to denote the intensity of colour (I’m calling it z or z-value). Since this was a POC (proof-of-concept), I needed just enough for it to work and didn’t want to over-engineer it.

    Even though the heatmap is about 750 by 500 pixels, you don’t need 500 rows and 750 columns in Excel to get it to work. In my example I had 30 rows and 47 columns, and then I just multiplied them to get them to reach across to the full range of pixels in the heat map. This is because the points are fuzzy and spread out and cover any gaps. You’ll see this in the Excel example for download, below.

    Since I was kind of hacking this out, I simply created a grid in Excel which had x-coordinates along the top, y-coordinates on the side, and then in the middle, the z-values. In order for me to see roughly what it would look like, I used conditional formatting in Excel to show colour values based on a scale.

    Below the Excel ‘heatmap’, I concatenated values in the format of the json file, multiplying the coordinates so they would fill the whole canvas in html. Then I pasted the resulting ‘json’ into a file where the javascript would find it. Running the simpleheat.js code on the placeholder file was the final step.
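
    For anyone who would rather skip the Excel concatenation, the same scaling idea can be sketched in javascript. This is only an illustration under assumptions: gridToPoints is a hypothetical name, the grid is an array of rows of z-values, and each point is placed at the centre of its cell, which is one choice among several.

```javascript
// Hypothetical helper: grid[row][col] holds a z-value; the result is a list
// of [x, y, z] triples stretched to cover a canvas of the given size.
function gridToPoints(grid, canvasWidth, canvasHeight) {
  const rows = grid.length;
  const cols = grid[0].length;
  const xScale = canvasWidth / cols;  // pixels per column
  const yScale = canvasHeight / rows; // pixels per row
  const points = [];
  grid.forEach((row, r) => {
    row.forEach((z, c) => {
      // Place each point in the middle of its cell (an assumption).
      const x = Math.round((c + 0.5) * xScale);
      const y = Math.round((r + 0.5) * yScale);
      points.push([x, y, z]);
    });
  });
  return points;
}
```

    A grid of 30 rows and 47 columns passed in with a canvas of 750 by 500 would give 1,410 points, roughly 16 pixels apart in each direction.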

    In case the explanation isn’t enough, I’ve made the Excel file available for download here.

  • Exploring Etobicoke and Mississauga High Schools

    Since one of my children is getting close to middle school, that puts high school on the radar. As a result, we decided to narrow our search to high schools because it would probably get too complicated to also include elementary schools (in addition to all the other factors, such as proximity to transit, ambiance and so on).

    In an attempt to narrow down the number of neighbourhoods that we are considering we thought it best to look at schools. I know that there is the idea of fit, finding the best school for that child. However I thought it would be interesting to look at objective measures.

    Interestingly, I found an open data set on Ontario open data. It had not only addresses, but also each school’s level (high school or elementary school), the percentage of students identified as gifted, the percentage of grade nine students achieving the provincial standard in mathematics, and the percentage of grade 10 students who passed the OSSLT on their first attempt.

    There were also a bunch of other factors that I don’t care about, like the percentage of students whose first language is not English (the kids are in French school at the moment) and the percentage of children who live in low-income households. Interestingly, these are all things that I’ve heard the Fraser Institute factors into its ranking. So I think what I found is an older version of a typical dataset that the Fraser Institute works with.

    You can also download slightly older data from the Fraser Institute, but getting the two datasets to link is a manual process, so it’s not feasible for the GTA. Plus, the address data from the Fraser Institute is limited to the city level (and in the case of “Toronto”, that is too big to be meaningful).

    So going with the more detailed and more interesting open government set, I chose factors that related to high school, based around academic performance. Preferably a French school, but potentially immersion as a second choice. Also, the west end of Toronto and Mississauga are on our short list, so the Tableau default view takes that into account.

    I needed to use jitter for geolocation because, unfortunately, Tableau only goes down to the FSA for Canadian postal codes, so you lose precision. Also, some schools have pretty much the same address (École secondaire Toronto ouest and ESC Frère-André, for example). There is a good tutorial on the Tableau support site.

    I also saw some weird stuff in the data: the percentage of gifted students at some schools is in excess of 30%, which seems really high (not represented in the viz). Things to dig into at a later time.

  • Sankey Diagram using D3.js Part 2 of 2

    The chart below shows the flows of money to Toronto mayoral candidates in 2006. What follows is a quick explanation and a few observations. Then I follow up with a few short tips on how I got the visualization up and running.

    2006 Toronto Election Contributions
    By Region, Dollar Amounts and Candidate

    [Embedded interactive chart: https://zenbot.ca/elections.html]
    Source: City of Toronto

    Note that I started coding anything ‘Outside Toronto’ to be more specific and got part-way (you can see Kingston and Ottawa as some locations). Basically, ‘Outside Toronto’ extends to Mississauga, Oakville and the Golden Horseshoe. It was possible to get more specific, but I didn’t for this visualization. ‘Central Toronto’ seems to be not downtown, but includes Yonge/Eglinton, etc.

    Proportionally, Stephen LeDrew received a relatively large amount of corporate donations (the orange links), while David Miller received none. David Miller also received money not only from downtown, but from everywhere. Overall, the vast majority of the money comes from individuals (blue) rather than corporations (orange).

    If I were going to push the analysis further, I could get the number of donors per candidate. I would also have loved to get 2009, as it is more recent, but as I mentioned in part 1, that wasn’t available through the city of Toronto’s website. I am sure comparing 2006 to 2009 would have been very interesting, even if the candidates are completely different.

    To get the visualization working, you need not only the D3.js library (version 3 in this example, as the file name shows) but also the sankey.js plugin. Both should be included in your header:

    <script type="text/javascript" src="js/d3.v3.min.js" charset="utf-8"></script>
    <script type="text/javascript" src="js/sankey.js" charset="utf-8"></script>

    Next, I added some in-line styling:

    <style>
    .link {
      fill: none;
      stroke-opacity: 0.4;
    }
    .link:hover {
      stroke-opacity: 0.6;
    }
    svg {
      font: 12px sans-serif;
    }
    </style>

    In the main body, you need something to attach the svg chart to. In this case I picked the following:

    <h1 id="chart"></h1>

    And finally the main bulk of the code. If you run into problems, please feel free to comment, below.

    // A big thank you to Mike Bostock. Most of this code is originally his,
    // modified for the purposes of this demonstration.
    var margin = {top: 10, right: 1, bottom: 6, left: 1},
        width = 600 - margin.left - margin.right,
        height = 500 - margin.top - margin.bottom;

    var formatNumber = d3.format(",.0f"),
        format = function(d) { return "$" + formatNumber(d); },
        color = d3.scale.category20();

    // Attach the svg to the #chart element and offset it by the margins
    var svg = d3.select("#chart").append("svg")
        .attr("width", width + margin.left + margin.right)
        .attr("height", height + margin.top + margin.bottom)
      .append("g")
        .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

    var sankey = d3.sankey()
        .nodeWidth(15)
        .nodePadding(10)
        .size([width, height]);

    // This colarray is to avoid going into the JSON document to change
    // the colors of the links
    var colarray = {
      'Individual': '17,203,235',
      'Corporation': '252,189,53'
    };

    var path = sankey.link();

    // Here is where the sankey should kick in
    d3.json("js/electionJSON.json", function(election) {
      sankey
          .nodes(election.nodes)
          .links(election.links)
          .layout(32);

      var link = svg.append("g").selectAll(".link")
          .data(election.links)
        .enter().append("path")
          .attr("class", "link")
          .attr("d", path)
          .style("stroke-width", function(d) { return Math.max(1, d.dy); })
          // .style("stroke-width", "100")
          .sort(function(a, b) { return b.dy - a.dy; })
          // contribution_type must match the property name used on each link in the JSON
          .style("stroke", function(d) { return "rgb(" + colarray[d.contribution_type] + ")"; });

      // Tooltip for each link: source, target and dollar amount
      link.append("title")
          .text(function(d) { return d.source.name + " → " + d.target.name + "\n" + format(d.value); });

      var node = svg.append("g").selectAll(".node")
          .data(election.nodes)
        .enter().append("g")
          .attr("class", "node")
          .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; })
          .call(d3.behavior.drag()
            .origin(function(d) { return d; })
            .on("dragstart", function() { this.parentNode.appendChild(this); })
            .on("drag", dragmove));

      node.append("rect")
          .attr("height", function(d) { return d.dy; })
          .attr("width", sankey.nodeWidth())
          .style("fill", function(d) { return d.color = color(d.name.replace(/ .*/, "")); })
          .style("stroke", function(d) { return d3.rgb(d.color).darker(2); })
        .append("title")
          .text(function(d) { return d.name + "\n" + format(d.value); });

      // Labels sit to the left of each node, except in the left half of the
      // chart, where they are moved to the right of the node
      node.append("text")
          .attr("x", -6)
          .attr("y", function(d) { return d.dy / 2; })
          .attr("dy", ".35em")
          .attr("text-anchor", "end")
          .attr("transform", null)
          .text(function(d) { return d.name; })
        .filter(function(d) { return d.x < width / 2; })
          .attr("x", 6 + sankey.nodeWidth())
          .attr("text-anchor", "start");

      // Drag a node vertically, keep it inside the chart, and redraw the links
      function dragmove(d) {
        d3.select(this).attr("transform", "translate(" + d.x + "," + (d.y = Math.max(0, Math.min(height - d.dy, d3.event.y))) + ")");
        sankey.relayout();
        link.attr("d", path);
      }
    });

  • Sankey Diagram using D3.js Part 1 of 2

    Among other things, I’ve been itching to master some D3.js tricks, mainly because the library lets you do some pretty gorgeous stuff, and there’s a wide variety of highly customizable visualizations. Recently, I finally had a few minutes to try something out. Since my work entails working with Statistics Canada data, or anything to do with start-ups in Ontario, I figured I would go for something that has nothing to do directly with that world.

    This led me to tracking down some elections donation data from the city of Toronto’s open data repository which was the donor list from the 2006 mayoral election. The title said it included 2009 as well, which sucked me in because that’s what I really wanted to use. I was disappointed when I found out it was only 2006, but figured it was OK because either way I was just playing around.

    The first part of this two-part series will describe how to lay out the data to get it ready for a Sankey diagram. The second part will talk about how I actually got the visualization going in D3.js. The data as it is presented shows each donor, their postal code, whether they are a corporation or not and (of course) the candidate who received the donation. Sankey diagrams don’t need that level of detail, and I just wanted to show the movement of money from different parts of the GTA (and beyond) and how that money flowed to each candidate. So the first thing you do is summarize by FSA (the first three characters of the postal code), while keeping the dollar amount, candidate name and type of donor (corporation vs. individual). I don’t really care how you do it: run a pivot table, or something else, just get that dollar amount by FSA.
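
    If you would rather do the summarizing in javascript than in a pivot table, here is a minimal sketch. The function name summarizeByFsa and the field names (postalCode, candidate, donorType, amount) are hypothetical and would need to match however your own copy of the donor list is laid out.

```javascript
// Hypothetical field names; adjust them to match the actual donor list.
function summarizeByFsa(donations) {
  const totals = new Map();
  for (const d of donations) {
    // The FSA is the first three characters of the postal code.
    const fsa = d.postalCode.replace(/\s+/g, "").slice(0, 3).toUpperCase();
    const key = [fsa, d.candidate, d.donorType].join("|");
    const row = totals.get(key) ||
      { fsa: fsa, candidate: d.candidate, donorType: d.donorType, amount: 0 };
    row.amount += d.amount;
    totals.set(key, row);
  }
  // One row per FSA, candidate and donor type, with the summed dollar amount.
  return Array.from(totals.values());
}
```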

    Next, get the area names by FSA region from Wikipedia so that you can distinguish different areas in a readable manner. Now you want to present the data in well-formed JSON like this:
    {"nodes":[
    {"name":"Brockville"},
    {"name":"Central Toronto"},
    .....
    ],
    "links":[
    {"source":0, "type": "Individual", "target":26, "value":500},
    {"source":5, "type": "Individual", "target":16, "value":200},
    ....
    ]}

    A quick note here: sankey.js (the plugin to include with D3.js) is kind of picky, and the key name “value” (as above) cannot be changed.
    Don’t get caught like I did by using “amount” instead of “value”.

    Finally, every single node (both region and mayoral candidate) gets included in the “nodes” list as a “name”. The “source” and “target” of each link are then positions in that list of nodes, in order, starting at zero. In the example above, Brockville = 0. So now that you have the JSON explained, just create the JSON file and you are ready to go to part 2.
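
    The zero-based indexing can be tedious to do by hand, so here is a rough sketch of generating it. It is only an illustration: buildSankeyJson is a hypothetical name, and the input rows are assumed to already carry a readable region name, a candidate name, a donor type and a dollar value.

```javascript
// Hypothetical input shape: [{ region, candidate, type, value }, ...]
function buildSankeyJson(rows) {
  const names = [];
  const positions = new Map();

  // Return the position of a name in the node list, adding it if it is new.
  function nodeIndex(name) {
    if (!positions.has(name)) {
      positions.set(name, names.length);
      names.push(name);
    }
    return positions.get(name);
  }

  const links = rows.map(function (r) {
    return {
      source: nodeIndex(r.region),
      type: r.type,
      target: nodeIndex(r.candidate),
      value: r.value // sankey.js expects this key to be called "value"
    };
  });

  return {
    nodes: names.map(function (name) { return { name: name }; }),
    links: links
  };
}
```

    Calling JSON.stringify on the result gives text in the same shape as the example above, ready to save as the JSON file.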