Big Data

Data Is Capital, Not Money

Capital and money might seem like the same thing, but they’re not. A lot of executives I talk to about data capital confuse the two — even MBAs! So, let’s clarify the difference between capital and money, and why it matters when it comes to data.

Take capital first. Capital, along with labor and land, is an economic factor of production in a good or service. If you don’t have enough of these basic inputs, you can’t make the thing or deliver the service you have in mind.

Greg Mankiw, professor of economics at Harvard, uses an apple-producing firm to illustrate these factors in his Principles of Economics, the gold standard for Econ 101 textbooks.


Land is pretty easy to picture. It’s the apple orchard. The same for labor. It’s all the work that goes into tending the orchard, picking the apples, packaging them for sale, and so on. But capital is a bit harder to see. The capital of an apple farm includes ladders, tractors, and warehouses used in growing, harvesting, and packaging apples for sale.

In other words, capital is any produced good which is a necessary input for creating another good or service.

Financial capital is also a produced good. It’s not a natural resource. It has to be made somehow. And you make it by selling your apples at a price above your costs. You can also increase your financial capital beyond what you can make yourself by borrowing it from someone who already has a whole lot of it, like a bank.

So, yes, a firm’s capital can include money. Money is a necessary input into most production processes. But money is different from all other kinds of capital, including capital equipment or data.

For something to be money, it must be both a store of value and a means of exchange. The Jackson in your wallet (soon Tubman) is good at being money because 1) its value tends to stay pretty stable (a twenty will buy tomorrow pretty much what it buys today), and 2) you can exchange it for things you want more than twenty bucks in your pocket.

Anything with these characteristics can be money. There’s a story, made famous among economists by Milton Friedman, about the islands of Yap, whose inhabitants used limestone discs for money. Some of the discs were huge, as big as 12 feet in diameter, and they were cut from the limestone on a nearby island. This is difficult to do, so the number of discs in circulation grew slowly, which helped existing discs keep their value.


The discs were the recognized form of currency in the community, so you could buy things with them. But since they were so big, when you paid someone, the community simply recognized the change in ownership, and the disc stayed in your front yard or wherever you dropped it when you brought it home. The discs may not have been convenient, but they were money.

Data is different. To see how, consider a specific data set. Let’s say you have web browsing data on everyone in the richest zip code in the US (which is 10104, according to Experian) for the last year. What’s the value of this data? What’s it worth?

The fact that you immediately want to define its worth in terms of dollars, euros, or renminbi is the first tell. While the data may be valuable, it is not in itself a store of value. Its worth is what the market is willing to bear. It goes up or down depending on what potential buyers are willing to pay, like a house or a Van Gogh.

In addition to being a poor store of value, the data itself is not a unit of exchange. You can’t walk into a Starbucks and buy a latte with a megabyte of your one-percenter browsing data. You can’t pay for things with it.

One objection to this last point is that online we do, in fact, pay with data. We use Google and Facebook without making traditional payments. We get these services in exchange for our data. True, but that’s barter. Which is how you trade when you don’t have money.

This distinction between data-as-capital and data-as-money matters because of the harsh competitive reality of data.

Contrary to popular belief, data is not abundant. Data consists of countless scarce, even unique observations. If the competition digitizes and datafies interactions with your customers before you do, they get that data and you don’t. They can then create algorithms and analytical services you can’t. To fix this, you’d have to go back in time. And no amount of money can make that happen.

Not 30 Posts In 30 Days — Not Even Close

[Image: GoRuck T-shirt]

Here’s to experiments: the winners (electricity), the losers (alchemy), and writing. Instead of the 30 posts I said I’d write in June, I wrote 10. These garnered 419 views from 255 visitors, or about 1.6 views per visitor.

To all of you who came and read, thank you.

They say you should learn from your mistakes. But they neglect to mention that so many people have made so many mistakes so far that the likelihood you’ve learned something new is zero. So, here are the things that are already known which I needlessly demonstrated for myself:

  1. Under promise, over deliver. Not the other way around. See T-shirt above.
  2. Writing because you have something to say is fun. Blurting junk because you have to write is not.
  3. The more you say, the more likely you are to say something stupid.

The worst part is that these aren’t just things known by someone somewhere sometime before this experiment. I knew these things. The decision research on overconfidence is clear. Anything motivated by a quota — even sex — loses its enjoyment. Pick your favorite wisdom literature and it’ll tell you to keep your mouth shut.

But some things have to be learned personally, and often re-learned, for them to stick. It’s worth remembering this as we move into the age of big data.

[Image: 1933 World’s Fair poster]

In the early twentieth century, the pace of technological advancement in science and industry was so great that the perfectibility of humankind seemed at hand. The motto of the 1933 World’s Fair summed it up:

Science Finds — Industry Applies — Man Conforms

The overall theme of the fair was “A Century of Progress”, mainly through technology. Taking place in the heart of the Great Depression and a mere fifteen years after the Great War, this was clearly the triumph of hope over experience. As we now know, progress suffered serious setbacks in the decades that followed. Believing technology can eliminate human weakness is a human weakness itself.

So, I’ll continue to write, but at a more measured pace. In the meantime, let’s all agree to avoid a 1933 World’s Fair for big data.

Data Can’t Prove Happiness

On a recent trip, I stand in line at an airport Starbucks to get a hit. In front of me is an older woman, fussily put together and a bit anxious. She turns around and asks, “Do you come to this airport often?”

This is either the worst pick-up line ever or a precursor to a question that will reveal I don’t come here often enough.

“Occasionally,” I say.

“Is there a Dunkin’ Donuts here?”

“This is Boston. There has to be.”

But, I tell her, I don’t know for sure. Sighing, she turns around and says it’s probably better to just stay here.

In this day and age there’s no reason not to have the overpriced coffee of your choice, so I get out my phone and look it up. There’s an app for that. Heck, there’s a hundred apps for that.

“Excuse me,” I say. “There is a Dunkin’ Donuts in this terminal, but it’s a bit of a hike from here.”

She looks at the phone, looks at me, and says, “Oh. You’re one of those people.” And turns back around.

She’s right. There’s a certain kind of thinking that comes along with being a data person. The data exists. If you don’t know where, there’s probably data about that. The effort to get the data and use it is probably lower than the penalty you’ll pay for not doing so.

But there is a risk. Thinking that life is a long series of optimizations can turn you into a social idiot. Sometimes people don’t want to know their options. Sometimes they don’t want the best solution. They just want comfort that what they’re doing is ok.

The key is to know one case from the other, and optimize accordingly.

The Services Conundrum

Thomas Piketty’s doorstop, Capital in the Twenty-First Century, is so massive it gave a new name to a classic index of unreadness. But it’s actually really good.

And, it includes the observation that, historically, the services sector has seen lower productivity gains than the industrialized goods sector because services tend to be less sensitive to technological advances.

This is a big deal because, according to the US Bureau of Labor Statistics, as cited in Mary Meeker’s latest internet trends presentation, services jobs represent 86% of all US jobs, up from 56% a little more than 70 years ago.

What if we’re at the beginning of a long boom in services productivity the way we were at the beginning of a long boom in industrialized goods productivity in the early 1800s?

Happy Bloomsday, Big Data!

Around the world today the literati celebrate Bloomsday by drinking deep of James Joyce’s intoxicating prose. And beer. Lots of literary beer. I remember this day every year because James Joyce taught me to love big data.

Bloomsday commemorates Joyce’s life and his masterwork Ulysses, a massive creation that captures the universal in the specifics of one day: June 16, 1904. We follow the misadventures of Leopold Bloom in a decidedly unheroic odyssey that reveals the frailties of language, the desperation of love, the doggedness of uncertainty, and the sobering realization that you have to go through all this sound and fury no matter how smart you think you are. In Ulysses, Joyce single-handedly invents post-modernism. Take that, Internet.

But the most remarkable thing about the book is the way it combines rigorous schema with riotous mess. This is the very heart of big data.

For each of the 18 chapters, Joyce assigned a title drawn from Homer’s Odyssey (like Calypso), a scene (The House), a time, an organ of the body, an art (economics), a color, a symbol (nymph), and a technique (narrative). Then, in each chapter he follows a relatively simple plot while free-associating across the history of the world.

For example, in chapter three, Proteus, Stephen Dedalus, the main character until we meet Bloom, walks along the seashore after teaching a history class at the job he loathes. So, he daydreams.

He sees two midwives walking, one carrying a bag. He imagines there’s afterbirth in the bag which makes him think of umbilical cords which makes him think of a cable stretching back through all generations which makes him think of making a telephone call to Adam and Eve, using his nickname, Kinch.

“The cords of all link back, strandentwining cable of all flesh… Hello. Kinch here. Put me on to Edenville. Aleph, alpha: nought, nought, one.”

Stephen is playing fast and loose with what’s connected to what in what context. He draws out the attributes and values of the things he sees, finding links in his over-educated mind to other things that possess those same attribute-value pairs or something equivalent. But with each link comes a new surrounding context. Stephen flashes from one otherwise isolated idea to another through unexpected connections.

This is a lot like memory, a lot like the Internet, and a lot like what happens when you pour a bunch of disparate data into Hadoop and start crunching through correlations.
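To make the attribute-value idea a little more concrete, here is a minimal Python sketch. The items and their attributes are invented for illustration (they are not Joyce’s schema or any real data set), and `shared_pairs` is a hypothetical helper name. It links any two items that have at least one attribute-value pair in common, which is roughly the jump Stephen makes from cord to cable.

```python
from itertools import combinations

# Hypothetical items described as attribute-value pairs (illustrative only).
items = {
    "umbilical cord": {"shape": "cord", "connects": "generations"},
    "telephone cable": {"shape": "cord", "carries": "voices"},
    "family tree": {"connects": "generations", "form": "diagram"},
    "seashore": {"form": "landscape"},
}

def shared_pairs(a, b):
    """Return the attribute-value pairs two items have in common."""
    return set(a.items()) & set(b.items())

# Link every pair of items that shares at least one attribute-value pair.
links = {
    (x, y): shared_pairs(items[x], items[y])
    for x, y in combinations(items, 2)
    if shared_pairs(items[x], items[y])
}

for (x, y), pairs in links.items():
    print(f"{x} <-> {y} via {sorted(pairs)}")
```

Note that the seashore and the family tree both have a "form" attribute but different values, so they stay unlinked. Real data is messier, and deciding what counts as "something equivalent" is where most of the work goes.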

And this is what I see in big data: a riot of observations that can be linked in new ways to show us connections we couldn’t see before. Some will be priceless, some worthless, some spurious and misleading. But the effort required to figure out which is which is worth it. Because we’re going to have to go through all this sound and fury anyway. Might as well try to understand it better.

Playing, The Numbers

E3, the biggest video game conference in the world, takes place this week in Los Angeles. In addition to raising questions about why violence and mayhem sell so well, it also offers insight into the datafication of play.

Take Destiny, one of — if not the — most expensive games ever produced. If you’re not familiar with this kind of thing, Destiny is a sprawling shoot-em-up that manages to combine the otherwise distinct genres of first-person shooter, multiplayer competition, and collaborative role-play. It’s immersive, visually stunning, and highly addictive.

Created by the powerhouse game studio Bungie, Destiny is a peek into a future where products tell their makers how customers use (and abuse) them.

For example, just days after the launch of an expansion module that pits three-player fireteams against each other, Bungie reported that 3,798,561 of these matches had been played. The players racked up 118,627,301 kills (you can get killed more than once per match). And 299,001 of these folks had achieved perfect scorecards, winning all nine rounds of a match and, by implication, utterly humiliating the other team.
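For a sense of scale, the tallies above can be turned into averages with a few lines of Python. This is a back-of-the-envelope sketch, not anything Bungie published: it assumes each match is two three-player fireteams, so six players, and the variable names are my own.

```python
# Tallies reported by Bungie, as quoted above.
matches = 3_798_561
kills = 118_627_301

# Assumption: each match has two three-player fireteams, so six players.
players_per_match = 6

kills_per_match = kills / matches
kills_per_player = kills_per_match / players_per_match

print(f"{kills_per_match:.1f} kills per match")
print(f"{kills_per_player:.1f} kills per player per match")
```

Under that assumption, the average works out to roughly 31 kills per match, or about five per player.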

But these raw tallies are just bragging rights. What’s more interesting is Bungie’s observation of players’ behavior, like cheating. The gamemakers could see some players bailing out as soon as they saw tough opponents on the other team. Bungie spread the word through its weekly newsletter that this welching can get you banned from matches if you keep it up. A similar warning went out earlier when Bungie saw that some players were hanging back in certain sections of the game, letting their teammates do all the work but reaping the rewards of victory anyway.

This may sound juvenile and irrelevant. However, this freeloading is a first-person-shooter version of an economic concept called, well, freeloading. Freeloading happens when you get the fruits of someone else’s investment without making the outlay yourself. When shoppers research a product at a retailer’s meticulously well-designed site but then buy from the lowest-priced discounter, the discounter is freeloading on the other retailer’s investment in design, photos, and information.

In Destiny’s case, freeloading is a particular problem because in many cases teamwork is essential to the value of the game. No teamwork, no fun. No fun, no play. No play, no return on the most expensive game ever made.

And this is why digital strategists should play video games. The best ones are complex digital worlds that provide a preview of the real world fully digitized. At least, that’s my excuse.

Writer, Interrupted


My achievements as a procrastinauteur seem to know no bounds. On June 1st, I said I would write something on big data every day for the entire month of June. I managed to keep that up for five days and then fell off the wagon for nearly twice that, going on a non-writing bender.

As penance, let me offer the fascinating work of Matthew Jockers, who is using big data to study writers who actually manage to write something. Jockers, a professor of English at the University of Nebraska-Lincoln, created an R package to analyze the connections between plot and sentiment in 50,000 works of fiction. In the process, he’s creating a new way of big reading that enlightens individual reading.