Big Data

How Much Data Did Uber Buy for $1.2 Billion?

Uber lost at least (at least!) $1.2 billion in the first six months of 2016, according to Bloomberg News. The majority of this loss came from driver subsidies. When startups lose money on purpose, it’s called an investment. But an investment in what, exactly?

This question came up in email correspondence between Justin Fox of Bloomberg and Melissa Schilling of NYU’s Stern School of Business. Prof. Schilling asserted that “[t]here are two main reasons for tech companies to lose money early to make money later, and neither of them apply to Uber.”

The first is “[u]pfront investments in fixed costs that are going to pay off with scale.” But, Schilling says, Uber’s fixed costs are low. Most of its losses are due to subsidizing drivers, a variable cost that won’t go down as Uber gains more drivers.

The second is “[s]ubsidizing a large installed base to ‘win’ the market.” But the switching costs for both riders and drivers to use another service are low. In fact, many Uber drivers in the US already also drive for direct competitor Lyft.

But there is a long-term asset Uber gets each time a driver ferries a rider from point A to point B — the data.

Uber is like a card counter at a blackjack table: it gains information from every hand it plays to improve its future bets. But unlike blackjack, where every player sees the other players’ cards, in the ride-share game whoever gets the fare first shuts everyone else out of the hand. Only that firm gets the data from that ride, building a unique stock of data capital.

So, imagine for a moment that Uber is buying data to improve its future ability to compete. What does that data cost? And is it worth it?

Uber is a private company, so information about its financials and ridership is a little thin on the ground. But we can use rough numbers from Bloomberg’s reporting and a few heroic assumptions to make some educated guesses.

We know that Uber lost at least $1.2 billion in the first six months of 2016. We also know that it provided one billion rides in roughly the same time frame (actually between Dec 24, 2015, when it delivered its billionth ride, and June 18, 2016, when it delivered its two billionth). That’s $1.20 per ride record.
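Here’s that back-of-the-envelope math as a runnable sketch, for anyone who wants to poke at the assumptions. The only inputs are the rough figures from Bloomberg’s reporting above; nothing here comes from Uber’s books.

```python
# Rough cost of Uber's data, using Bloomberg's reported figures
# (not audited financials; the loss is a floor, the ride count approximate).
loss_h1_2016 = 1.2e9   # dollars lost in the first six months of 2016 (at least)
ride_records = 1.0e9   # rides between Dec 24, 2015 and June 18, 2016

cost_per_record = loss_h1_2016 / ride_records
print(f"${cost_per_record:.2f} per ride record")  # -> $1.20 per ride record
```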

Considering that personal transportation is a $10 trillion market worldwide, a billion dollars for a billion unique, detailed records is probably a pretty good deal.



Big Data

The Law As Data Capital

[Screenshot: Ravel Law search results for “privacy”]

Harvard Law School and Ravel Law, a startup, are digitizing Harvard’s complete collection of US case law going back to 1647. That’s 43,000 volumes, totaling about 40 million pages.

This new trove of data capital costs millions to create. It will be free to search online, but analysis of the data can only be had for a fee. There’s a big lesson in data capital here: different uses of the same data can command different prices.

Try a search for “privacy” (pictured above). You get all the rulings that mention privacy and a visual guide to their relationships over time and by citation. This use is free.

Now imagine that you’re a lawyer about to argue a case pertaining to privacy in front of a specific judge. You’d like to know which of these rulings the judge has cited, how they figure into the judge’s own rulings, and how he or she compares to other judges on this issue. That analysis is a different end product from simple search results, and you’ll have to pay for it.

Because every piece of data is non-rivalrous (it can be used in many searches and analyses simultaneously), it can be freely available in one way and available for a fee in another.
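To make that concrete, here’s a toy sketch of the two-tier idea. The records and field names are invented for illustration (this is not Ravel’s schema or API); the point is just that one non-rivalrous dataset can back the free search and the paid analysis at the same time.

```python
# Toy case-law records (hypothetical; not Ravel's actual data or API).
rulings = [
    {"case": "Griswold v. Connecticut", "judge": "Douglas",
     "text": "...right to privacy...", "cites": []},
    {"case": "Roe v. Wade", "judge": "Blackmun",
     "text": "...privacy...", "cites": ["Griswold v. Connecticut"]},
]

def free_search(term):
    """Free tier: every ruling whose text mentions the term."""
    return [r["case"] for r in rulings if term in r["text"]]

def paid_judge_analysis(judge, term):
    """Paid tier: which of those rulings a given judge has cited."""
    hits = set(free_search(term))
    cited = set()
    for r in rulings:
        if r["judge"] == judge:
            cited |= set(r["cites"]) & hits
    return sorted(cited)

print(free_search("privacy"))                      # free use of the data
print(paid_judge_analysis("Blackmun", "privacy"))  # paid use of the same data
```

Same data underneath; the analysis is simply a different end product.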

Ravel’s business plan reveals yet another aspect of data capital. After eight years, the entire database will be available to anyone for any analysis. How will the company stay in business?

By that point, Ravel should have been able to create new data by observing how the case law data was searched and analyzed. If this unique stock of data capital belongs to Ravel alone, the company can create new digital services that attorneys can get only from it, maintaining a competitive advantage.


Big Data

Data Is Capital, Not Money

Capital and money might seem like the same thing, but they’re not. A lot of executives I talk to about data capital confuse the two — even MBAs! So, let’s clarify the difference between capital and money, and why it matters when it comes to data.

Take capital first. Capital, along with labor and land, is a factor of production, one of the basic economic inputs to any good or service. If you don’t have enough of these inputs, you can’t make the thing or deliver the service you have in mind.

Greg Mankiw, professor of economics at Harvard, uses an apple-producing firm to illustrate these factors in his Principles of Economics, the gold standard for Econ 101 textbooks.


Land is pretty easy to picture. It’s the apple orchard. The same for labor. It’s all the work that goes into tending the orchard, picking the apples, packaging them for sale, and so on. But capital is a bit harder to see. The capital of an apple farm includes ladders, tractors, and warehouses used in growing, harvesting, and packaging apples for sale.

In other words, capital is any produced good that is a necessary input for creating another good or service.

Financial capital is also a produced good. It’s not a natural resource; it has to be made somehow. And you make it by selling your apples at a price above your costs. You can also increase your financial capital beyond what you can make yourself by borrowing it from someone who already has a whole lot of it, like a bank.

So, yes, a firm’s capital can include money. Money is a necessary input into most production processes. But money is different from every other kind of capital, whether capital equipment or data.

In order for something to be money, it must be both a store of value and a means of exchange. The Jackson in your wallet (soon to be a Tubman) is good at being money because 1) its value tends to stay pretty stable (a twenty will buy tomorrow pretty much what it buys today), and 2) you can exchange it for things you want more than the twenty bucks in your pocket.

Anything with these characteristics can be money. There’s a story, made famous among economists by Milton Friedman, about the island of Yap, whose inhabitants used limestone discs for money. Some of the discs were huge, as big as 12 feet in diameter, and they were cut from limestone quarried on another island. This was difficult to do, so the number of discs in circulation grew slowly, which helped existing discs keep their value.


The discs were the recognized form of currency in the community, so you could buy things with them. But since they were so big, when you paid someone, the community simply recognized the change in ownership, and the disc stayed in your front yard or wherever you dropped it when you brought it home. The discs may not have been convenient, but they were money.

Data is different. To see how, consider a specific data set. Let’s say you have web browsing data on everyone in the richest zip code in the US (which is 10104, according to Experian) for the last year. What’s the value of this data? What’s it worth?

The fact that you immediately want to define its worth in terms of dollars, euros, or renminbi is the first tell. While the data may be valuable, it is not in itself a store of value. Its worth is whatever the market will bear, going up or down with what potential buyers are willing to pay, like a house or a Van Gogh.

In addition to being a poor store of value, data itself is not a means of exchange. You can’t walk into a Starbucks and buy a latte with a megabyte of your one-percenter browsing data. You can’t pay for things with it.

One objection to this last point is that online we do, in fact, pay with data. We use Google and Facebook without making traditional payments; we get these services in exchange for our data. True, but that’s barter, which is how you trade when you don’t have money.

This distinction between data-as-capital and data-as-money matters because of the harsh competitive reality of data.

Contrary to popular belief, data is not abundant. Data consists of countless scarce, even unique observations. If the competition digitizes and datafies interactions with your customers before you do, they get that data and you don’t. They can then create algorithms and analytical services you can’t. To fix this, you’d have to go back in time. And no amount of money can make that happen.

Big Data

Not 30 Posts In 30 Days — Not Even Close

[Image: GoRuck T-shirt]

Here’s to experiments — the winners (electricity), the losers (alchemy), and writing. Instead of the 30 posts I said I’d write in June, I wrote 10. These garnered 419 views from 255 visitors, or about 1.6 views per visitor.

To all of you who came and read, thank you.

They say you should learn from your mistakes. But they neglect to mention that so many people have made so many mistakes already that the likelihood you’ve learned something new is zero. So, here are the already-known things I needlessly demonstrated for myself:

  1. Under-promise, over-deliver. Not the other way around. See the T-shirt above.
  2. Writing because you have something to say is fun. Blurting junk because you have to write is not.
  3. The more you say, the more likely you are to say something stupid.

The worst part is that these aren’t just things known by someone somewhere sometime before this experiment. I knew these things. The decision research on overconfidence is clear. Anything motivated by a quota — even sex — loses its enjoyment. Pick your favorite wisdom literature and it’ll tell you to keep your mouth shut.

But some things have to be learned personally, and often re-learned, for them to stick. It’s worth remembering this as we move into the age of big data.

[Image: 1933 World’s Fair poster]

In the early twentieth century, the pace of technological advancement in science and industry was so great that the perfectibility of humankind seemed at hand. The motto of the 1933 World’s Fair summed it up:

Science Finds — Industry Applies — Man Conforms

The overall theme of the fair was “A Century of Progress,” mainly through technology. Taking place in the heart of the Great Depression and a mere fifteen years after the Great War, this was clearly the triumph of hope over experience. As we now know, progress suffered serious setbacks in the decades that followed. Believing technology can eliminate human weakness is a human weakness itself.

So, I’ll continue to write, but at a more measured pace. In the meantime, let’s all agree to avoid a 1933 World’s Fair for big data.

Big Data

Data Can’t Prove Happiness

On a recent trip, I stand in line at an airport Starbucks to get a hit. In front of me is an older woman, fussily put together and a bit anxious. She turns around and asks, “Do you come to this airport often?”

This is either the worst pick-up line ever or a precursor to a question that will reveal I don’t come here often enough.

“Occasionally,” I say.

“Is there a Dunkin’ Donuts here?”

“This is Boston. There has to be.”

But, I tell her, I don’t know for sure. Sighing, she turns around and says it’s probably better to just stay here.

In this day and age there’s no reason not to have the overpriced coffee of your choice, so I get out my phone and look it up. There’s an app for that. Heck, there’s a hundred apps for that.

“Excuse me,” I say. “There is a Dunkin’ Donuts in this terminal, but it’s a bit of a hike from here.”

She looks at the phone, looks at me, and says, “Oh. You’re one of those people.” And turns back around.

She’s right. There’s a certain kind of thinking that comes with being a data person. The data exists. If you don’t know where, there’s probably data about that. The effort required to get the data and use it is probably lower than the penalty you’ll pay for not doing so.

But there is a risk. Thinking that life is a long series of optimizations can turn you into a social idiot. Sometimes people don’t want to know their options. Sometimes they don’t want the best solution. They just want comfort that what they’re doing is ok.

The key is to know one case from the other, and optimize accordingly.

Big Data

The Services Conundrum

Thomas Piketty’s doorstop, Capital in the Twenty-First Century, is so massive it gave a new name to a classic index of unreadness. But it’s actually really good.

And it includes the observation that, historically, the services sector has seen lower productivity gains than the industrialized goods sector because services tend to be less sensitive to technological advances.

This is a big deal because, according to the US Bureau of Labor Statistics, as cited in Mary Meeker’s latest internet trends presentation, services jobs represent 86% of all US jobs, up from 56% a little more than 70 years ago.

What if we’re at the beginning of a long boom in services productivity the way we were at the beginning of a long boom in industrialized goods productivity in the early 1800s?

Big Data

Happy Bloomsday, Big Data!

Around the world today the literati celebrate Bloomsday by drinking deep of James Joyce’s intoxicating prose. And beer. Lots of literary beer. I remember this day every year because James Joyce taught me to love big data.

Bloomsday commemorates Joyce’s life and his masterwork Ulysses, a massive creation that captures the universal in the specifics of one day: June 16, 1904. We follow the misadventures of Leopold Bloom in a decidedly unheroic odyssey that reveals the frailties of language, the desperation of love, the doggedness of uncertainty, and the sobering realization that you have to go through all this sound and fury no matter how smart you think you are. In Ulysses, Joyce single-handedly invents post-modernism. Take that, Internet.

But the most remarkable thing about the book is the way it combines rigorous schema with riotous mess. This is the very heart of big data.

For each of the 18 chapters, Joyce assigned a title drawn from Homer’s Odyssey (like Calypso), a scene (The House), a time, an organ of the body, an art (economics), a color, a symbol (nymph), and a technique (narrative). Then, in each chapter he follows a relatively simple plot while free-associating across the history of the world.

For example, in chapter three, Proteus, Stephen Dedalus, the main character until we meet Bloom, walks along the seashore after teaching a history class at the job he loathes. So, he daydreams.

He sees two midwives walking, one carrying a bag. He imagines there’s afterbirth in the bag, which makes him think of umbilical cords, which makes him think of a cable stretching back through all generations, which makes him think of making a telephone call to Adam and Eve, using his nickname, Kinch.

“The cords of all link back, strandentwining cable of all flesh… Hello. Kinch here. Put me on to Edenville. Aleph, alpha: nought, nought, one.”

Stephen is playing fast and loose with what’s connected to what in what context. He draws out the attributes and values of the things he sees, finding links in his over-educated mind to other things that possess those same attribute-value pairs or something equivalent. But with each link comes a new surrounding context. Stephen flashes from one otherwise isolated idea to another through unexpected connections.

This is a lot like memory, a lot like the Internet, and a lot like what happens when you pour a bunch of disparate data into Hadoop and start crunching through correlations.
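If that sounds abstract, here’s a minimal sketch of the mechanics, with a few invented records standing in for Stephen’s impressions. Index every attribute-value pair, and any two records that share one become a link, each link carrying its own new context.

```python
from collections import defaultdict
from itertools import combinations

# Invented records standing in for Stephen's impressions (purely illustrative).
records = [
    {"id": "midwife's bag", "contains": "afterbirth", "theme": "birth"},
    {"id": "umbilical cord", "theme": "birth", "shape": "cable"},
    {"id": "telephone line", "shape": "cable", "reaches": "Edenville"},
]

# Index every (attribute, value) pair back to the records that carry it.
index = defaultdict(list)
for record in records:
    for attr, value in record.items():
        if attr != "id":
            index[(attr, value)].append(record["id"])

# Any two records sharing an attribute-value pair become a link.
for (attr, value), ids in index.items():
    for a, b in combinations(ids, 2):
        print(f"{a} <-> {b} via {attr}={value}")

# midwife's bag <-> umbilical cord via theme=birth
# umbilical cord <-> telephone line via shape=cable
```

At big-data scale this is a self-join over key-value pairs: the more disparate the records you pour in, the more unexpected the links that fall out.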

And this is what I see in big data: a riot of observations that can be linked in new ways to show us connections we couldn’t see before. Some will be priceless, some worthless, some spurious and misleading. But the effort required to figure out which is which is worth it. Because we’re going to have to go through all this sound and fury anyway. Might as well try to understand it better.