Anyone interested in visual data analysis should stop by the Ask ET forum. Here are some great posts:
Archive for the 'dataviz' Category
Now that I’ve launched the site and gotten all the boring stuff (API calls, authentication, billing, etc.) out of the way, I can focus on the fun stuff again. I wrote a scatterplot function tonight that will help show how long the average deal takes, a key piece of data for calculating a sales pipeline. Here’s a quick taste of how it looks.
The only thing keeping me from pushing this live tonight is the scaling. If I use the max value, one outlying deal can throw the whole thing off. I’m going to make it scale to incorporate at least 70-90% of the data points.
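Here’s a rough sketch of one way that kind of scaling could work, not necessarily what will ship. The function name `axis_max` and the `coverage` parameter are made up for illustration: sort the values and use the smallest one that still covers the requested share of points as the top of the axis.

```python
import math


def axis_max(values, coverage=0.9):
    """Return an axis upper bound that includes at least `coverage` of the points.

    `axis_max` and `coverage` are illustrative names, not part of any library.
    """
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")

    ordered = sorted(values)
    # Smallest count k with k / n >= coverage. The tiny epsilon guards against
    # float noise such as 0.7 * 10 evaluating to 7.000000000000001.
    k = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[k - 1]


# Made-up deal lengths in days, with one extreme outlier.
deal_days = [10, 12, 15, 18, 20, 22, 25, 30, 35, 400]
upper = axis_max(deal_days, coverage=0.9)  # the 400-day deal no longer sets the scale
```

Anything above the returned bound would still need handling, for example by pinning those points to the top edge of the plot.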
I also added an average monthly Won value and average monthly new deals generated under “How are we Doing Lately.”
xkcd may have made the best infographic of the year today.
In other news, I’ve been working on a super secret project within this space. I hope to launch soon, but it’s been taking me longer than I expected. Here’s a taste:
Here’s an update to the games above .500 graph I posted earlier. It shows just how good the Phillies are this year.
Looking back it should have been obvious. In my previous post about the falling automobile fatality rate I hinted it might be due to recessions and/or advances in car technology and their introduction rate to the road. The answer turned out to be much simpler.
From 1979-2010 the number of automobile related fatalities fell from 51,093/year to 32,708. In 1982 the National Highway Traffic Safety Administration started tracking which of these deaths were alcohol related. In 1982 the percentage was 59.6%, but by 2008 it had fallen to 37.2%.
From 1982-2008, not only did all the gains in highway safety come from a reduction in alcohol related deaths, they even offset an increase in non-alcohol related deaths.
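To see how a falling alcohol share can account for more than the whole drop, here’s a small sketch of the arithmetic. The function names are made up, and the totals in the example (1,000 and 800) are round placeholders, not real NHTSA figures. Only the 59.6% and 37.2% shares come from the numbers above.

```python
def split_by_cause(total_deaths, alcohol_share):
    """Split a yearly total into (alcohol-related, non-alcohol-related) deaths."""
    alcohol = total_deaths * alcohol_share
    return alcohol, total_deaths - alcohol


def change_by_cause(start_total, start_share, end_total, end_share):
    """Return the change in deaths by cause between two years (negative = fewer)."""
    start_alcohol, start_other = split_by_cause(start_total, start_share)
    end_alcohol, end_other = split_by_cause(end_total, end_share)
    return {
        "alcohol": end_alcohol - start_alcohol,
        "other": end_other - start_other,
        "total": end_total - start_total,
    }


# Placeholder totals (NOT real data) combined with the shares quoted above.
change = change_by_cause(1000, 0.596, 800, 0.372)
# alcohol is about -298.4, other is about +98.4, total is -200
```

With these placeholder totals the alcohol-related drop is bigger than the overall drop, which can only happen if non-alcohol deaths went up.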
To avoid confusion, positive values in this chart are always “good” (e.g., unemployment rate going down is green, death rate going up is red).
In retrospect this seems like an obvious answer, but during my search for the cause of the trend I checked against population change, total miles driven, recessions, unemployment, and other factors. The thing that led me to track alcohol vs non-alcohol related fatalities was that I found a strong correlation between a drop in deaths and higher unemployment, but a weak correlation between fewer deaths and fewer miles driven. Also, the fatality rate didn’t spike back up once unemployment started to go back down. There’s a strong correlation between spikes in unemployment and non-alcohol related fatalities, but you can see the fatality rate return once employment comes back. It’s not shown in the chart above, but strangely we don’t see a large decrease in the miles driven per person when unemployment is high. I can only guess that having free time during the day leads to more opportunities to drive that are, for whatever reason, less likely to result in a fatal accident.
Alcohol related deaths, however, see fluctuating but continuous improvement with one notable exception, 1984-1985 when the number of deaths increased by 1,850 (7.4%) from the previous year. I’m not sure what happened there.
Now that we know the primary cause of the reduction in deaths, the question becomes how did we do it? My guess is that a consistent media campaign against drinking and driving has resulted in fewer drunk drivers on the road. Governments and manufacturers have put a lot of effort into making cars safer, but it doesn’t appear to have had much of an effect over the last 30 years. Future improvements in the fatality rate will be harder and harder to come by.
In 1979 there were over 51,000 automobile related deaths in the United States. Since then we have made dramatic improvements to auto safety. The data clearly shows this.
Where did this improvement come from? If we track improvement in auto deaths and control for population, we can show the year to year improvement.
The large improvement zones are ’79-82, ’88-91, and we seem to be in the middle of a monster improvement since 2006.
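For anyone who wants to reproduce the population-adjusted numbers, here’s a minimal sketch of the calculation as I’d describe it: turn each year’s deaths into a rate per 100,000 people, then take the percent change in that rate from one year to the next. The function name and the sample figures are placeholders, not the real dataset.

```python
def yearly_improvement(deaths_by_year, population_by_year):
    """Percent improvement in deaths per 100,000 people versus the prior year.

    Positive values mean the per-capita death rate fell. Assumes both dicts
    share the same keys and that the years are consecutive.
    """
    years = sorted(deaths_by_year)
    rates = {
        year: deaths_by_year[year] / population_by_year[year] * 100_000
        for year in years
    }
    improvement = {}
    for prev, curr in zip(years, years[1:]):
        improvement[curr] = (rates[prev] - rates[curr]) / rates[prev] * 100
    return improvement


# Placeholder numbers (NOT real data), chosen to keep the arithmetic easy.
deaths = {2000: 500, 2001: 450, 2002: 450}
population = {2000: 1_000_000, 2001: 1_000_000, 2002: 1_125_000}
result = yearly_improvement(deaths, population)
```

Note that in the last placeholder year the raw death count is flat, but the rate still improves because the population grew. That is the whole point of controlling for population.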
I don’t have the answer yet for what is working, but here are some interesting dates.
Year the drinking age became universally 21: 1984
First mandatory seat belt law (NY): December, 1984
Passive restraints (Airbags or Automatic seat belts) required in all cars: 1989
I think there’s a time window between something like the 1989 airbag law and when a significant number of airbag equipped vehicles make an appearance on the road. I have no data on that yet, but a delayed effect seems plausible. Another hypothesis involves neither drivers nor cars, but improvements to roads and/or law enforcement. Most major highways around me have added reflectors, signs, and rumble strips to help prevent accidents. If you have any other hypotheses, post them in the comments.
Update: A lot of great comments, thanks! There are also a bunch over at the Hacker News thread. Many point to miles driven and automobile stability systems as good data to incorporate into the model. I also changed the years of “large improvement” which were off before. They now line up pretty well with recessions, but what is interesting is that we don’t give much back once the economy ramps up again. Maybe everyone buys a newer, safer car then.
Update: Please read my follow up post on this. The answer is alcohol related accidents.
Figuring out the best way to display the names and values in a treemap has been difficult since their lengths and allowed space vary so widely. I’m getting close, though.
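One possible approach, sketched below, is to pick the richest label that fits each cell: name plus value if there’s room, a truncated name if not, and nothing for tiny cells. This is only an illustration. `fit_label`, the fixed `char_px` and `line_px` estimates, and the sample name and value are all assumptions; real text measurement depends on the font and renderer.

```python
def fit_label(name, value, width_px, height_px, char_px=7, line_px=14):
    """Return the label lines that fit in a cell, richest option first.

    char_px and line_px are rough, assumed estimates of character width and
    line height; a real renderer would measure the text instead.
    """
    max_chars = int(width_px // char_px)
    max_lines = int(height_px // line_px)
    if max_chars < 2 or max_lines < 1:
        return []  # cell is too small for any readable text

    if len(name) > max_chars:
        # Truncate and mark the cut with an ellipsis.
        name = name[: max_chars - 1] + "…"

    lines = [name]
    # Only add the value when there is a second line and it fits untruncated.
    if max_lines >= 2 and len(value) <= max_chars:
        lines.append(value)
    return lines


# Hypothetical cell contents and sizes.
wide = fit_label("California", "$1.2M", width_px=100, height_px=40)
narrow = fit_label("California", "$1.2M", width_px=50, height_px=40)
```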
Some of the better looking treemaps I’ve seen out there use small white borders and rounded corners to call attention to boundaries. What do you think? Does it improve or hinder readability?
Classic Tufte thinking would eliminate the borders, but I can’t help but think using them is “better” in this case. Treemap best practices haven’t been established yet.
Putting aside the color scheme, which treemap looks better?
I’m leaning strongly toward going monochromatic for the treemap colors unless they are tied to a data point. As for what data points I could use, some ideas: election results, population levels, avg temperature, SAT scores, etc. Here’s the green treemap: