Data-science 5 minutes

What's all the fuss about data?

Data is the new oil. Everyone talks about data, and Data Scientist is even said to be the sexiest job of the 21st century. But what’s all the fuss about? What is all that data? Where does it come from? And why does it matter?

Data is the new oil

Data is at the heart of this century because the planets finally aligned:

  • huge amounts of data are generated and recorded every minute;
  • we have enough computing power and storage capacity to process it;
  • and it turned out to be the most useful resource of the century.

For instance, UPS, the world’s largest package delivery company, saved up to 25 million dollars thanks to its data analysis.

The company recorded data from sensors placed on more than 46,000 vehicles. The data included location, speed, direction, braking and drivetrain performance.

Left turns waste gas

Looking to minimize driving time, the data scientists found that UPS vehicles were wasting time and gas on left turns. They changed the itineraries to minimize the number of left turns, which saved UPS up to $25 million.
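The source does not describe how UPS actually analyzed its routes. As a purely illustrative sketch of the idea, the Python snippet below counts left turns along a route given as (x, y) waypoints on a flat plane, using the sign of the 2D cross product of consecutive segments. The function name, the planar coordinates and the two sample routes are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def count_left_turns(route: List[Point]) -> int:
    """Count left turns along a route of planar (x, y) waypoints.

    With x pointing east and y pointing north, a turn is "left" when the
    cross product of the incoming and outgoing segment vectors is positive
    (a counter-clockwise change of direction).
    """
    left_turns = 0
    # Walk over every triple of consecutive waypoints.
    for (x0, y0), (x1, y1), (x2, y2) in zip(route, route[1:], route[2:]):
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)
        if cross > 0:
            left_turns += 1
    return left_turns


# Two hypothetical routes of equal length from (0, 0) to (0, 1).
route_a = [(0, 0), (1, 0), (1, 1), (0, 1)]    # east, north, west
route_b = [(0, 0), (-1, 0), (-1, 1), (0, 1)]  # west, north, east

# Prefer the route with the fewest left turns.
best_route = min([route_a, route_b], key=count_left_turns)

print(count_left_turns(route_a))
print(count_left_turns(route_b))
```

Under these assumptions, the second route reaches the same destination over the same distance without a single left turn, which is the kind of alternative the itinerary change favors.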

How much data is big data?

Everything records data about everything. We’ll talk more about the devices that record data in the next section, but first let’s talk about the amount of data recorded each day.

Our current output of data is roughly 2.5 quintillion bytes (= 2.5 exabytes) a day. As the world steadily becomes more connected with an ever-increasing number of electronic devices, that’s only set to grow over the coming years. [1]
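To make the units concrete, here is a minimal Python sketch of the conversion. It assumes decimal (SI) prefixes, where one exabyte is 10^18 bytes, and takes the 2.5 quintillion figure from [1] at face value. The variable names are illustrative.

```python
# Daily data output quoted in the text (short-scale quintillion = 10**18).
BYTES_PER_DAY = 2_500_000_000_000_000_000

# Decimal (SI) prefixes. This is an assumption: binary prefixes
# (2**60 bytes per exbibyte) would give slightly different values.
EXABYTE = 10**18
TERABYTE = 10**12

SECONDS_PER_DAY = 24 * 60 * 60

exabytes_per_day = BYTES_PER_DAY / EXABYTE
terabytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / TERABYTE

print(f"{exabytes_per_day} exabytes per day")
print(f"about {terabytes_per_second:.1f} terabytes per second")
```

Spread evenly over a day, that is roughly 29 terabytes every second.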

But how much is 2.5 quintillion?

  • 1 quintillion is the number of words spoken by all of humanity in 28 years. [2]
  • and 1 quintillion gallons of water is roughly 1/300 of all the water on Earth.

Here is a video showing what a quintillion pennies looks like:

If the Empire State Building were filled with pennies, it would hold slightly less than 2 trillion pennies.

Here is what a quadrillion pennies would look like in comparison:

[Image: a quadrillion pennies]

And here is what a quintillion pennies would look like in comparison:

[Image: a quintillion pennies]
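The comparison can also be put in numbers. The sketch below rounds the capacity of one Empire State Building up to 2 trillion pennies (the text says "slightly less", so the rounding is an assumption) and divides. The function name is illustrative.

```python
# Approximate capacity of one Empire State Building, in pennies.
# The text says "slightly less than 2 trillion"; rounding to exactly
# 2 trillion is an assumption made to keep the arithmetic simple.
PENNIES_PER_BUILDING = 2 * 10**12

QUADRILLION = 10**15
QUINTILLION = 10**18


def buildings_needed(pennies: int) -> int:
    """Penny-filled buildings needed, rounded down, under the capacity above."""
    return pennies // PENNIES_PER_BUILDING


print(buildings_needed(QUADRILLION))
print(buildings_needed(QUINTILLION))
```

With this rounding, a quadrillion pennies fills about 500 buildings and a quintillion pennies about 500,000. Because the real capacity is slightly lower, the true counts would be slightly higher.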

“There was five exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days, and the pace is increasing.” (Eric Schmidt, former CEO of Google, 2010)

So, data is useful and there is a huge amount of data recorded each day… But where does all that data come from?

Where does data come from?

The short answer is that most data comes from you.

Do you have a credit card?

  • Credit card companies collect data on what you buy and where you shop.

Do you have a car?

  • Car companies collect data about your location, your speed, etc.

Do you have a smartphone?

  • your smartphone’s GPS tracks exactly where you are in real time;
  • your smartphone’s accelerometer tracks your speed and direction of travel;
  • your smartphone also collects data on how and when you use it;
  • social media platforms collect data about your conversations (Facebook Messenger, anyone?);
  • cell phone carriers do the same with your calls and SMS;
  • so do Skype and every major telecommunication company.

Do you use the internet?

  • your internet browser collects data on how you use it;
  • websites record your whole activity: every click, every mouse move and how long a given piece of content stays on your screen;
  • even your operating system (Windows, macOS) records data about how you use the computer.

Here’s how much data is recorded on the internet in one minute:

[Image: one minute on the internet]

And the Internet of Things promises more and more data:

  • smart TVs track what you watch and when;
  • smart watches track where you are;
  • smart alarms track when you wake up;
  • smart fridges record what you eat and how often;
  • etc.

In summary

  • an incredible amount of data is tracked each day;
  • this data is generated by sensors, smart objects, applications and websites we use;
  • and this data has an incredible economic value for companies.

References

Most information is adapted from the class “Data science in practice” at EPFL by Dr. Bruffaerts Christopher.

[1] The quote is from “How much data does the world generate every minute”.

[2] Computation from “Pointless Large Numbers Stuff”.