Collected


vizdat

A collection of:

visualizing quantitative and analytical data

By:

khailey   

Visits:

1,879   


Map your Twitter friends


FlowingData 23 Feb 2012, 9:45 am CET

Map your friends

You'd think that this would've been done by now, but this simple mashup does exactly what the title says. Just connect your Twitter account and the people you follow pop up, with some simple clustering so that people don't get all smushed together in one location.

[Theron17 via Waxy]

Easy Two-Panel Line Chart in Excel


Peltier Tech Blog 23 Feb 2012, 9:00 am CET

I’ve written often about panel charts, and I’ve made some simple ones, but I don’t seem to have explained the process in enough detail for people to reproduce these simple ones with their own data. Easy Two-Panel Column Chart in Excel was the first of several quick tutorials on easy panel charts. This article shows the instructions for a two-panel line chart.

Two Panel Line Chart

Let’s use the following simple data for this exercise.

Data for two-panel chart

Values for the two data series fall within vastly different ranges, as seen in this line chart.

Starting Line Chart

Problems with Primary and Secondary Axis Line Charts

When we move the Secondary series to the secondary axis, there is an apparent relationship between the series, which changes as the axis scales are changed. It is hard to separate the data, despite the presence of the two axes. Who can figure out, let alone remember, which data should be measured against which axis? Our eyes see the series overlapping, and our brain interprets them as having overlapping data ranges.

Line Chart with Primary and Secondary Y Axes

Here the series start out far apart, one much higher than the other for several months. By the end of the year, they have switched places.

Line Chart Fixed to Unhide Primary Data

In contrast, the data in this chart starts the same, but after several months the two series suddenly diverge.

Line Chart with Color Coded Primary and Secondary Axes

But it’s the same data in both of the previous charts. Through intentional or inadvertent axis scale dishonesty, you can tell any story you want.
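The effect is easy to check with a little arithmetic. Here is a minimal sketch in Python, using made-up values rather than the data from this article: the height at which a point is drawn depends only on where its value falls between the axis minimum and maximum, so rescaling the secondary axis moves the Secondary series above or below the Primary series without changing any data.

```python
def axis_fraction(value, axis_min, axis_max):
    """Return how far up the plot area a value is drawn (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical values for illustration only, not the article's data.
primary_value = 100      # read against a primary axis of 0 to 200
secondary_value = 1000   # read against the secondary axis

primary_height = axis_fraction(primary_value, 0, 200)

# The same secondary value drawn against two different secondary axis scales.
height_a = axis_fraction(secondary_value, 0, 1250)   # appears above the primary point
height_b = axis_fraction(secondary_value, 0, 4000)   # appears below the primary point

print(primary_height, height_a, height_b)
```

With the first scale the Secondary point sits higher than the Primary point, and with the second it sits lower, which is exactly how two charts of the same data can appear to tell opposite stories.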

The Panel Chart, Step-by-Step

You can avoid the problems with two-axis charts by plotting the data in separate panels. Start your panel chart by making a line chart with the data.

Starting Column Chart

Right click the Secondary series, choose Format Series (or similar, it varies with Excel version), and select the Secondary Axis option.

Column Chart with Primary and Secondary Y Axes

We have primary and secondary Y axes, but only the primary X axis. Add the secondary X axis. In Excel 2007/2010, go to the Chart Tools > Layout tab, click on Axes, and for Secondary Horizontal Axis, select Show Left to Right Axis. In Excel 97-2003, go to the Chart menu > Chart Options, and on the Axes tab, under Secondary X Axis, choose the Category option.

Column Chart with Primary and Secondary X and Y Axes

Explanation of Required Axis Formatting

What we want is a chart divided into two panels like the chart below shows. I’ve temporarily hidden the plotted data, but you don’t need to.

The primary axis above spans 0 to 200. To plot the Primary series in the bottom panel, the primary Y axis must span 0 to 200 in the bottom panel, or 0 to 400 overall (the same amount in the top and bottom panels).

The secondary axis above spans 0 to 2000. To plot the Secondary series in the top panel, the secondary Y axis must span 0 to 2000 in the top panel, or -2000 to 2000 overall (the same amount in the top and bottom panels). Since the secondary axis crosses at zero, it forms the dividing line in the middle of the chart.

If you have nice data, or if you’re good at algebra, you can pick axis scale parameters that align the axis tick marks on the left and right sides of the chart. The temporary gridlines in this chart show how nicely aligned these tick marks can be.
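The scale arithmetic above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not something Excel runs: the function name `two_panel_scales` and the `intervals_per_panel` parameter are my own assumptions. Giving both axes the same number of major intervals is what makes the tick marks line up on the left and right.

```python
def two_panel_scales(bottom_max, top_max, intervals_per_panel=4):
    """Axis settings that put one series in the bottom panel and one in the top.

    bottom_max: upper limit needed by the series on the primary (left) axis
    top_max:    upper limit needed by the series on the secondary (right) axis
    """
    primary = {
        "min": 0,
        "max": 2 * bottom_max,   # data fills the bottom half, top half stays empty
        "major_unit": bottom_max / intervals_per_panel,
    }
    secondary = {
        "min": -top_max,         # empty bottom half
        "max": top_max,          # zero falls in the middle of the chart
        "major_unit": top_max / intervals_per_panel,
    }
    return primary, secondary

# The ranges used in this article: 0 to 200 and 0 to 2000.
primary, secondary = two_panel_scales(200, 2000)
print(primary)    # min 0, max 400
print(secondary)  # min -2000, max 2000
```

Both axes end up with eight major intervals overall, so every tick on the left has a partner on the right.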

Frame of Two Panel Column Chart

So the Primary series fits into the bottom panel. . .

Two Panel Line Chart Primary Data in Bottom Panel

. . . and the Secondary series fits into the top panel.

Two Panel Line Chart Secondary Data in Top Panel

Apply Axis Settings and Continue

Here is how the chart looks with the desired axis scales applied, and with the data unhidden.

Two Panel Line Chart, Axes Need Fixing

We will use the secondary horizontal axis as the panel separator. Right click the secondary (right) Y axis, choose Format Axis, and change the Horizontal Axis Crosses setting to Automatic (which means it crosses at zero).

Two Panel Line Chart, Y Axis Labels Need Fixing

We still need to do some cleanup. The month names in the middle of the chart are redundant, and easy to remove. Right click on the axis, choose Format Axis, and for Axis Ticks and Tick Labels, choose None.

Two Panel Line Chart, Axes Fixed

There is now an open space across the top of the chart where the secondary horizontal axis was. We can close it by applying an appropriate border color to the plot area.

Two Panel Line Chart

We need to apply some number format magic to the Y axis labels: we only want primary axis labels in the bottom, and secondary axis labels in the top, of the chart. I’ve written an article about Number Formats in Excel, but there’s room for a quick refresher class.

The secondary (right) axis is easy. We need a format like “0;;0;”. A number format has up to four elements, separated by semicolons. The first indicates what format to apply to positive numbers, the second to negative numbers, the third to zero values, and the fourth to text. A number format of “0” means simply show the numerical value without any decimal digits; “0.0” would mean show one decimal digit, “0.00” two decimal digits, etc. A missing format means don’t show anything. So this format tells Excel to format positive and zero values as whole numbers, and don’t show anything else.

The primary (left) axis is a bit trickier, but we can use simple conditions to turn formats on and off. The format we need is “[<=200]0;”. The expression in square brackets is the condition that sets the format. It says to display any number less than or equal to 200 as a whole number, and not to display anything else. These conditions override the default positive-negative-zero sequence, but we can apply at most two conditions, like “[one condition]format;[another condition]format;[all other numbers]format;[text]format”.
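To see which labels survive each format, here is a rough Python imitation of the two formats as they apply to axis tick values. It is a sketch of the logic only, not how Excel implements number formats, and the function names are invented for this illustration. The tick values assume the scales described earlier (0 to 400 on the left, -2000 to 2000 on the right).

```python
def label_secondary(value):
    """Mimic the format 0;;0; : show positive and zero values, hide negatives."""
    if value >= 0:
        return f"{value:.0f}"
    return ""

def label_primary(value, limit=200):
    """Mimic the format [<=200]0; : show values up to the limit, hide the rest."""
    if value <= limit:
        return f"{value:.0f}"
    return ""

# Tick values on each axis, assuming major units of 50 and 500.
primary_ticks = range(0, 401, 50)
secondary_ticks = range(-2000, 2001, 500)

print([label_primary(v) for v in primary_ticks])
print([label_secondary(v) for v in secondary_ticks])
```

The left axis keeps its labels only in the bottom half of the chart and the right axis only in the top half, which is the effect we are after.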

Right click each axis and choose Format Axis. Select the Number tab of the dialog, click on Custom, and enter the appropriate format into the box (without the quotes). Don’t forget to click the Add button, or Excel will discard your carefully typed format.

Two Panel Line Chart

You can apply data labels to identify the series in your chart. Here I’ve applied series name labels to point 4 of the Primary series and to point 2 of the Secondary series.

Two Panel Line Chart

We can’t resize the data labels, but I’ve removed that unsightly line wrapping in the Secondary label by shrinking the font size by 0.5 points. A better decision would have been to use free-floating text boxes for these labels, but I like to use data labels which are anchored to the data points.

Two Panel Line Chart

You can stretch the chart if you want more resolution in the Y values. (My data labels needed further shrinking. Should have used text boxes.)

Two Panel Line Chart

It would have been easy enough to switch the panels. In the next chart, the primary Y axis is scaled from -200 to 200, and the number format is “0;;0;”. The secondary Y axis is scaled from 0 to 4000, and the number format is “[<=2000]0;”.

Two Panel Line Chart

Peltier Technical Services, Inc., Copyright © 2011.   Licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.   Learn how to create Excel dashboards.

Introduction to R and Revolution R Enterprise: Slides


Revolutions 23 Feb 2012, 1:08 am CET

If you missed this morning's webinar, Revolution R Enterprise, 100% R and More, I've embedded the slides below. Interestingly, about half of today's participants were SAS users, and the remainder R users. The first section introduces open-source R, and the second describes the additional features of Revolution R Enterprise.

Unfortunately we had a technical hiccup with the recording (a dropped internet connection in the producer booth), so we can't make a replay available as we usually do with our webinars. But you can see a recording of the last time I gave this presentation on YouTube.

Revolution Analytics webinars: Revolution R Enterprise, 100% R and More

Data cannot save us from ourselves


The Excel Charts Blog 23 Feb 2012, 12:51 am CET

Clay Johnson, author of The Information Diet wrote on Twitter:

Redistricting should be done with data & open source software, not by humans.

Let me be completely honest: I don’t like golden calves and I can spot two in this single sentence: technology and data. Yes, redistricting should be done with data and technology, but the moment you add “not by humans” you ruin everything. The right sentence is:

Redistricting should be done by humans, using data and open source software.

Data-driven redistricting can prevent basic and shameless gerrymandering, but you can’t ask the computer to discover the “right” district structure because there is none.

If you want to algorithmitize everything you end up with a 1:1 map (remember Borges’ On Exactitude in Science?). And if you don’t, how do you select the “right” data? Meta-data? Meta-meta-data? Factor analysis? Cluster analysis? And what if the other people don’t agree with you? Don’t be naïve: you can’t remove the “human factor”, you’re just moving it around.

Data and open source software can add a layer of rationality and legitimacy to your decisions but you can’t ask the computer to decide for you. Be human, decide, and fight for what you believe in.

Pro tip: never use default formats.

 

______________________

Want to create better dashboards? Try the Excel Dashboard Tutorial.

Post from: Excel Charts Blog. Data cannot save us from ourselves


Global Design Trends 2011 {An Infographic}


visual data | Scoop.it 22 Feb 2012, 7:32 pm CET

From vintage-themed photographs, to vibrant vector graphics, here's an infographic detailing the top global design trends of the year.

After 8 years, 17 million images and over 200 million downloads, Shutterstock is one of the world’s leading marketplaces for visual media. We have artists and photographers from more than 100 countries, and customers in more than 150. But perhaps most significant about these milestones is that it has led to thousands of image searches each day – giving us valuable insight into design trends around the world.

From vintage-themed photographs, to vibrant vector graphics, here’s an infographic detailing what visual stories were told over the last year.

Candidate Fundraising vs. Super PAC Spending in January


Matt Stiles // The Daily Viz 22 Feb 2012, 2:11 pm CET

From Huffington Post

Reports about January’s fundraising numbers, released on February 20, have focused on two narratives: Mitt Romney’s limited fundraising and high burn rate and the role that super PACs are playing in an increasingly contested Republican primary. HuffPost decided to combine those narratives together to make a graphic of candidate and super PAC fundraising and spending in January.

Patterns of daily life in Netherlands from Above


FlowingData 22 Feb 2012, 9:36 am CET

In a similar fashion to their work in Britain from Above, CGI and animation group 422 South map the daily patterns of those in the Netherlands in VPRO's production of Netherlands from Above. It's hard to get a grasp on what exactly I'm seeing, since I know next to nothing about the Netherlands and the video narration is in Dutch, but the visuals are beautiful. Planes fly, cars drive, and patterns emerge. The technique never seems to lose its entertainment value.

Check out the short making-of video, too, which also includes folks from Stamen describing their work with the interactive portion of the feature.

[422 South | Thanks, Ben]

Another Metaphor for Visualization: Writing


eagereyes 22 Feb 2012, 6:36 am CET

Andrew Gelman recently wrote a blog posting in which he draws an interesting comparison between writing styles and graphics styles. I think he’s on to something, and the comparison can be taken a bit further to illustrate some common misunderstandings around visualization.

Gelman first describes the different ways in which people used to write, even in the context of scholarly communication. There were storytellers and people who wrote dialog, etc. He talks about this in terms of the 1700s, but I’ve seen much more recent papers that appear really peculiar through the eyes of somebody who’s used to the way we write today (e.g., some of Einstein’s papers are like that). Gelman then draws his comparison with graphical communication of data:

When it comes to data graphics, though, we’re back in the freewheeling 1700s. Maybe that’s a good thing, I don’t know. But what I do know is there’s no standard way of displaying quantitative information, nor is there any acceptance of the unique virtues of the graphical equivalent of clear prose.

A very interesting point, and one that I think can be taken quite a bit further as a useful metaphor for visualization, infographics, and related endeavors.

Expectations

Say you want to read about some facts, maybe the Thirty Years’ War that raged in Europe in the 1600s. But instead of reading up on it in a history book or an encyclopedia (or maybe Wikipedia), you happen to pick a book of poetry from the time. You’re not going to get much out of that, even though the book may be of great literary value (the war inspired many poems, plays, and other books for many years after). Why is that?

It’s not a question of information content, it’s a question of what your expectations are and how you determine what satisfies your need for information. Perhaps you’re looking for factual information, displayed in a no-nonsense and unbiased way. But perhaps you want to know how people experienced that war, how they dealt with the grueling atrocities that were committed, or how they managed to move on after it was finally over. Those things are sometimes more clearly conveyed in poetry or song, and they may be a lot more compelling that way than a dry history text. But if what you’re looking for is the factual treatment, none of that will sway you.

We need to be aware of our expectations when we judge a data visualization or infographic. Perhaps its goals are very different from our own, and it was made for people with very different expectations. It’s easy to just take this as an anything-goes attitude, but things aren’t quite that black-and-white.

Standards

The other part of Gelman’s argument is standards. The funny thing about standards, of course, is that there are many to choose from, and they also tend to evolve over time. Just like you would have a hard time getting a paper accepted that was written in the style that was common 100 years ago in journals, the way we think about how data should be presented has changed quite a bit.

Part of that is due to technology: cross-hatching and other ways of making elements of charts look distinct are no longer needed because of the availability of color and fine-grained grayscale (unfortunately, that doesn’t mean you don’t see those anymore, though). We also have a lot more techniques at our disposal today than 100 or even 50 years ago, though that doesn’t seem to have translated into much use outside of the visualization field itself.

What Gelman alludes to in his last sentence quoted above about “unique virtues of the graphical equivalent of clear prose” is, I think, Tufte’s data-ink ratio. A high data-ink ratio is equivalent to very concise, matter-of-fact writing without redundancies, embellishments, or elaborations. It’s not a big stretch to come up with an information-ink ratio, or an information-words ratio: the number of words (or amount of ink) on a page that do not carry unique information should be minimal.

That is not a contrived idea, but a common way people think about academic writing. However, such papers tend to be dry and often border on unreadable. I find a lot of writing in mathematics almost impossible to parse because the writers like to be extremely efficient and only spell out what absolutely needs to be. Writing in visualization, fortunately, tends to be a lot livelier and more geared towards the reader. Compared to a lot of applied computer science writing, psychology papers tend to read like long essays with lots of little flourishes and forays into philosophy and other areas.

Which style is the right one? Clearly, different fields have different standards. A paper from another field will appear more terse or wordy than what one is used to, but that does not necessarily make one way of writing – or creating visual depictions of data – better or another style wrong.

Style

When I talk to artists or designers, they often ask me about people in visualization that have distinctive styles. I do recognize some people’s work sometimes just by looking at the visuals, but I have a hard time describing anybody’s style as clearly distinctive and consistent. But the point is that in addition to the broader style of a field, there are individuals who can carve out a niche. Novels are a particular style of writing, but there is clearly not just one way of writing a novel, and many authors have very distinct styles. We don’t have that in visualization quite yet, and perhaps it’s not something to strive for in an academic setting.

There are products that encourage certain styles, though: Tableau’s visualizations look very different from, say, Excel. Sure, one can always torture each of the systems to look like the other, but the defaults and the kinds of settings that are easily available steer the user away from such behavior. Perhaps it’s our tools right now that have the strongest influence on our styles, whether we like it or not.

Beyond personal style, though, there is the style of the particular field. Just like there are crime novels and vampire novels, there is statistical visualization and infographics. Each of those has its own, distinct style, and each has its place. The question is not one of right or wrong, but picking the appropriate niche.

Refinement

Gelman’s point about the wide variety of writing styles in the early days of academic writing (post-Renaissance, anyway) has one more interesting parallel with visualization: development over time. The visual display of information, even in its broadest sense, is still very young and underdeveloped. There is nothing like the vast numbers of books that have laid out information using words. There is no science of how to categorize and compare visual information displays.

Consider, for a moment, that there was a time when the novel was a new idea. Before the 18th century, there were early versions of it, but the novel as we know it today did not exist. Yet the vast majority of fiction written in the last 200 years or so is novels. Shakespeare did not write novels, and the idea to write one likely didn’t occur to him. He may even have dismissed novels, had somebody shown him one, as crude and unrefined writing that lacked purpose.

Standards change, ideas come and go, some stay and become the prevailing way of doing things. It’s way too early to know how we will be looking at data in 20, 50, or 100 years. And we have to be careful when we just lump together anything visual that’s based on data and apply the same small set of standards to it. That is not to say that anything goes, but we need to make an effort to understand the goals and ideas behind something before we attempt to make a judgment.

Webinar Wednesday: Introduction to Revolution R Enterprise


Revolutions 21 Feb 2012, 10:58 pm CET

If you haven't yet had a chance to catch my regularly-scheduled webinar, "Revolution R Enterprise - 100% R and More", it's a quick 30-minute introduction to the R language and the added features of Revolution R Enterprise. It's also a chance to ask me any questions you might have about R or Revolution Analytics during the live broadcast (starts at 11AM Pacific time). Details and registration info at the link below.

Revolution Analytics webinars: Revolution R Enterprise - 100% R and More

What Is Visualization?


visual data | Scoop.it 21 Feb 2012, 9:34 pm CET

This seems like a straightforward question, but it’s proven to be a difficult one to answer. Even visualization researchers – people who think about the subject all day and every day – don’t have a clear definition of what visualization is. Is it synonymous with information graphics? Does visualization have to be computer generated? Does data have to be involved, or can it be abstract? The answers vary depending on who you ask.

To me, visualization is a medium. It’s not just an analysis tool nor just a way to prove a point more clearly through data.

Visualization is like books. There are different writing styles and categories, there are textbooks and there are novels, and they communicate ideas in different ways for varied purposes. And just like authors who use words to communicate, there are rules that you should always follow and others that are guidelines that you can bend and break...

The Uncanny Valley of Big Data


Revolutions 21 Feb 2012, 9:04 pm CET

Three articles in recent weeks have touched on an important issue related to Big Data and predictive analytics: sometimes, the results can be downright creepy. It's kind of like the "Uncanny Valley" in computer animation: the human characters in Pixar animations are cartoon-like rather than human-like because trying to make animated humans photorealistic generally results in uncomfortable reactions from the viewer. The animations might look realistic, but something in our animal brain knows something isn't quite right, and it's just ... creepy.

The same thing can happen where the rubber meets the road of Big Data and predictive analytics: when offers or suggestions are made to individuals. You've probably had an experience similar to mine: after searching the web for a hotel deal in Vegas, suddenly every ad that appeared next to the blogs and websites I regularly read was for a Vegas-related deal. Creeeepy. (And also not particularly useful: after that trip I had no particular plans to return to Vegas anytime soon, yet the ads kept coming.)

A similar tale was related in the New York Times over the weekend. In the story "How Companies Learn Your Secrets" (reg. req.), statistician Andrew Pole (working for the retailer Target) described how he'd created a predictive model to identify from shopping habits when a shopper was likely to be pregnant. When the father of a young Target shopper saw the baby-related coupons sent to his daughter, he was outraged:

“My daughter got this in the mail!” [said the father]. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

It turns out the daughter actually was pregnant at the time, unbeknownst to her father. The creepy aspect here: why should a corporation be able to know (or rather, infer) such a personal fact, when close family members do not? Intriguingly, Target solved this problem by mixing future offers to identified pregnant shoppers with unrelated coupons, say, for lawnmowers or wineglasses. By deliberately making the predictions worse, the response was better: "as long as we don’t spook her, it works", said Pole. Personally, I wish web-advertisers would do the same thing. Not only do I not care about Vegas hotels anymore, the absence of other ads precludes the serendipity of discovering other products I might actually like, but which my activity history might not suggest. In a follow-up interview, article author Charles Duhigg suggested other areas where this technique might help alleviate the "creep factor".

In a similar vein, this week's Esquire profiles Tibco's CEO Vivek Ranadivé. Among several examples of how collecting multiple streams of data can improve predictions from analytics is this anecdote about a basketball fan visiting Oakland's Oracle Arena:

At the end of the third quarter, when the computer system showed that the concession stand near his seats had too many hot dogs, it could send him a buy-one-get-one-free offer — because it also knows that he sometimes buys hot dogs at games.

The right information to the right people at the right time in the right context. (Fans creeped out by this could opt out.)

This may be another case where moving the predictions outside the uncanny valley would keep fans from being creeped out.

Finally, another New York Times article from earlier this month, "The Age of Big Data" (reg. req.), looks into the lives and impacts of some of the "rock stars" of Big Data applications. While lauding many of the benefits of analytics on Big Data, it also strikes a cautionary note at the end of the article:

Big Data has its perils, to be sure. With huge data sets and fine-grained measurement, statisticians and computer scientists note, there is increased risk of “false discoveries.” The trouble with seeking a meaningful needle in massive haystacks of data, says Trevor Hastie, a statistics professor at Stanford, is that “many bits of straw look like needles.”

This is a great point: treating analytics as a "black box process" (data in, predictions out) can lead to inappropriate predictions (more to the "zombie" side than the "angel" side of the Uncanny Valley). It takes the statistical expertise of a data scientist to ensure that such predictive analytics are creating sensible predictions ... and to help companies avoid the Uncanny Valley of Big Data.
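To make Hastie's "straw that looks like a needle" concrete, here is a minimal simulation sketch. It is not taken from either article: the function names, sample size, and threshold are all illustrative assumptions. Every candidate feature is pure random noise, yet some of them still correlate with the outcome strongly enough to pass a cutoff that corresponds, approximately, to the conventional 5% significance level for 50 samples.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def count_false_needles(n_features, n_samples=50, threshold=0.28, seed=0):
    """Count noise-only features whose |correlation| with a noise-only
    outcome exceeds the threshold. Every hit is a false discovery,
    because nothing here is related to anything else by construction.

    The default threshold of 0.28 is an assumption: it is roughly the
    two-sided 5% cutoff for a correlation computed from 50 samples.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(feature, outcome)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, "noise features ->", count_false_needles(n), "apparent needles")
```

The share of false hits stays at about one in twenty, but the absolute number grows with the size of the haystack, which is why a black-box search over thousands of variables needs some correction for multiple comparisons before any "discovery" is believed.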


‘Getstats’ campaign to improve how we handle numbers


Visualising Data 21 Feb 2012, 11:22 am CET

Just a quick mention for a great new(ish) site I was alerted to. Getstats is a 10-year campaign launched by the Royal Statistical Society (UK) to improve the public’s capabilities with numbers, particularly statistics, probability and risk, across our daily lives. The site contains frequent blog posts about contemporary stories or issues relating to statistics and is also curating key events, glossary definitions and resources. You can follow updates via Twitter @RSSgetstats.

My only complaint would be that there is far too much imagery of Wayne Rooney and/or Man Utd on the home page!



Password reuse visualizer from Mozilla


FlowingData 21 Feb 2012, 9:13 am CET

Password visualizer

When you use the same password for every online account, there could be trouble down the line if one of those sites is breached. You gotta mix it up these days. As part of their Watchdog initiative, Mozilla released an add-on to help you see how you're reusing passwords, and to hopefully keep your personal information secure.

Ever been told not to reuse the same password across different websites? With this add-on, you can visualize your passwords and the sites you use them on. By looking at this visualization, you can get a quick idea of which passwords you've been using the most, and the kinds of sites you're using them on. As you continue to change your passwords and update your password manager, the picture will improve!

Personally, I don't save any of my passwords. The risk of my computer getting stolen and some random person gaining access to my online accounts is too much for me to handle. Of course, as a result, I have to put up with the craptastic experience of trying to remember passwords with a variable number of capital letters, symbols, and digits.
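The picture the add-on draws boils down to grouping sites by the password they share. The sketch below illustrates that grouping only; it is not how Mozilla's add-on is implemented, and the function name and sample site names are made up for the example. Passwords are hashed before grouping so the result never contains them in the clear.

```python
import hashlib
from collections import defaultdict


def reuse_map(credentials):
    """Group site names by a fingerprint of the password used on each.

    `credentials` is a hypothetical {site: password} mapping. Returns a
    list of site groups that share a password, most-reused group first.
    """
    groups = defaultdict(list)
    for site, password in credentials.items():
        # Hash the password so only a fingerprint is kept while grouping.
        fingerprint = hashlib.sha256(password.encode("utf-8")).hexdigest()
        groups[fingerprint].append(site)
    # Keep only passwords that appear on more than one site.
    reused = [sorted(sites) for sites in groups.values() if len(sites) > 1]
    return sorted(reused, key=len, reverse=True)


if __name__ == "__main__":
    saved = {
        "mail.example": "hunter2",
        "shop.example": "hunter2",
        "bank.example": "correct horse battery staple",
    }
    for sites in reuse_map(saved):
        print("same password on:", ", ".join(sites))
```

Each group is one password and the sites that would all be exposed if any one of them were breached.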

[Mozilla]
