Unless you’ve been cloistered in a medieval monastery for the past few decades, you know that data science is an important emerging field of study. It is driven by our increasing ability to process data and the proliferation of the ways we use it. Data itself has even been referred to as our most precious commodity and the currency of the modern world. Here we look at four specific ways our use of data has advanced beyond great physical libraries like that of Alexandria.
The Modern World and Big Data
It used to be that our ability to store data was limited by how many filing cabinets we could fit in a room. Granted, that could be quite a bit, depending on the room. But once computers came around, our ability to store data grew at an exponential rate. When the first commercial hard drive came out in 1956, it was as big as multiple refrigerators. Even with this great size, it only held about 5 megabytes (MB) of data. That’s just enough for an average mp3 song.
As new hard drives came out each year, they quickly got smaller. By 1980, a hard drive was small enough to fit in a consumer computer while still holding the same 5 MB. By 1990, drives could hold 40 to 100 MB. A hard drive of that same size today can hold 15 terabytes (TB) of data, which is enough for about 3,000,000 mp3 songs of 5 MB each.
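The song count can be checked with a few lines of arithmetic. This is only a sketch: it assumes decimal units (1 MB = 1,000,000 bytes) and the 5 MB song size used above, and the function name is made up for illustration.

```python
# Decimal storage units, in bytes (assumption: 1 MB = 10**6 bytes).
MB = 10**6
TB = 10**12

SONG_SIZE = 5 * MB       # the "average mp3 song" from the text
MODERN_DRIVE = 15 * TB   # the modern drive from the text


def songs_that_fit(capacity_bytes, song_bytes=SONG_SIZE):
    """Return how many whole songs fit in the given capacity."""
    return capacity_bytes // song_bytes


print(songs_that_fit(5 * MB))        # the first commercial drive: 1
print(songs_that_fit(MODERN_DRIVE))  # a modern drive: 3000000
```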
Beyond Physical Limits
Despite these advances, we were still limited by the amount of space on our hard drives and the number of hard drives we owned. The arrival of cloud-based technology was the first time consumers found themselves unhampered by space or hardware capacity. The cloud stores data and information on both physical and virtual servers, but providers (like Amazon or Google) are in charge of maintenance. Customers then access this data via the internet, rather than needing their own physical hard drives.
Since storing data in the cloud is much cheaper than traditional storage, companies everywhere are scrambling to make the switch. In a short amount of time, the amount of data people stored reached astronomical proportions dubbed “Big Data.”
Just How Big is Big?
Big data in the modern world is what we call data sets so large they can’t be processed by normal software. Think about this. Every second, 8,000+ tweets are sent, 71,000+ people watch another video on YouTube, and 63,000+ people search for answers on Google (Internet Live Stats).
But let’s not stop at the Internet. Walmart handles over 1 million customer transactions per hour. Facebook stores over 30 petabytes of data from its users. To illustrate how vast these data storehouses are, consider that one petabyte is a million gigabytes. Just one gigabyte holds roughly a 90-minute movie or about 200 songs. Needless to say, these modern data sets are the largest to exist in the history of the human race.
Perhaps that’s because our ability to process huge amounts of information has multiplied. While it originally took more than ten years to decode the human genome, it would take our modern computer programs a mere week to accomplish the same feat. This unprecedented growth has given birth to opportunities we’ve never seen before. Modern big data analysis is akin to a combination of a time machine and a crystal ball. Data scientists can “go back in time” by analyzing past transactions and even see into the future using predictive analytics.
Visualizing Modern Data
Did you know the pie chart was invented in the same year as the fire hydrant? In 1801, a Scottish eccentric named William Playfair drew the first recorded pie chart in history. Playfair was often called a blackmailer and a scoundrel, so “data scientist” would have been the least of his dubious titles.
Despite his spotty past, no one can deny that Playfair’s pie chart comes in handy. In fact, using a pie chart is just one way to visualize data. Data visualization gives data scientists countless ways to show instead of merely tell. By creating graphics that artfully present the results of their analysis, data scientists can communicate visually instead of verbally. That means those sweaty-palm presentations in front of senior executives might be a little less stressful.
Information Systems: Making Data Make Sense
Information engineers design software that helps people manage their information or data. These days, every single business, government agency, not-for-profit, and vending machine stores information, so you can bet that information engineers have their hands full. While their skills can be applied to many different areas, the basic idea is to architect an information system that makes the most sense for the people who end up using it.
An information system is an umbrella term for the technology employees use to talk to one another. It also encompasses the many ways employees use technology to do their work. The information system works tirelessly to capture, store, send, receive, and manipulate data.
A data engineer designing an information system for a circus might include a synced calendar for each performer so that when the show must go on, it actually does (and so that everyone knows when the trapeze artists went for another “coffee break”). Data visualization is particularly useful for presenting big data metrics or results of a big data analysis. Since these data sets are so large, they can be difficult to digest without graphics.
The main components of information systems are people, the tasks the people need to do, the structure and roles of those people and tasks, and the technology that brings it all together. These systems also have definite boundaries, processors, storage, and inputs and outputs. Any particular information system might aim to support operations, manage projects, or make decisions.
Modern World Data Engineering
A data engineer’s calling is to enhance these systems until they foster natural, productive communication between users. This means keeping data organized to avoid any type of crisis and to make decision making easier. The work often focuses on design and architecture rather than machine learning or analytics. Through this process, data engineers integrate the data from various sources into more manageable bundles that are more accessible and work more smoothly. This optimizes the performance of the organization or company using the data. After the data engineer prepares big data, it goes off to data scientists.
The Search Is On
Men with headlamps burrowing deep within the earth. The sound of hammering and crashing ringing through tunnels. Precious gems and rich coal pouring out of the mine into waiting boxcars. That’s what most people pictured when they thought about mining, until a new kind of mining appeared: data mining.
While data miners don’t wear headlamps or mine iron ore, they too are searching. Instead of precious metals, the prize they find will be useful information. Like private eyes figuring out whodunnit, data miners sift through modern big data looking for answers. Their mission is to find the patterns that are hidden inside every large repository of data. To do this, they rely on powerful computers, and for the very largest data sets, on supercomputers or many machines working together.
Modern World Methods for Modern Data
In contrast to computers, humans can only process small amounts of information at a time. No human being is capable of grasping patterns running between millions of data points. It’s simply too much information for our minds to process. However, the machines we created have surpassed us. Using tools like artificial intelligence (AI) and machine learning, data miners can find patterns and anomalies in data.
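As a small illustration of what finding an anomaly can mean, the sketch below flags values that sit unusually far from the average. The login counts are invented, the function name is hypothetical, and this simple rule is only one of many techniques data miners might use.

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily login counts, invented for illustration only.
daily_logins = [10, 12, 11, 13, 12, 11, 95]
print(find_anomalies(daily_logins))  # prints [95]
```

A person can spot the odd value in seven numbers at a glance. The same rule applied by a machine works just as well on seven million.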
AI gives machines the ability to learn from experience in much the same way as people. The learning they are capable of includes everything from basic chess moves to driving cars or even making art. Machine learning is a type of AI that allows computers to automate the building of analytical models. This means it learns from previous data to find patterns and make decisions on its own. By doing so, it reduces the uncertainty and errors that come with human intervention. These tools make it possible for us to find patterns in data too big for us to handle on our own.
Most of us benefit from data mining without even realizing it. Did you know that the spam checker in your email is an automatic data mining program? First, the program figures out what spam emails share in common. Then it uses that information to sort your emails into spam or not spam. In other words, it identifies what is most likely spam based on the patterns spam emails typically follow.
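The two steps described above can be sketched in a few lines. This is a toy version, not how any particular email provider works: the training messages are invented, the function names are hypothetical, and the scoring is a simple word-frequency comparison in the style of a naive Bayes filter that assumes spam and non-spam are equally common.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical examples, for illustration only).
TRAINING = [
    ("win free money now", True),
    ("free prize claim now", True),
    ("meeting agenda for monday", False),
    ("lunch on monday", False),
]


def train(examples):
    """Step 1: count how often each word appears in spam and in non-spam."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam_label in examples:
        target = spam_words if is_spam_label else ham_words
        target.update(text.lower().split())
    return spam_words, ham_words


def spam_score(text, spam_words, ham_words):
    """Step 2: sum of log-likelihood ratios; positive means 'looks like spam'."""
    vocab = set(spam_words) | set(ham_words)
    spam_total = sum(spam_words.values()) + len(vocab)
    ham_total = sum(ham_words.values()) + len(vocab)
    score = 0.0
    for word in text.lower().split():
        # Add-one smoothing so unseen words do not break the estimate.
        p_spam = (spam_words[word] + 1) / spam_total
        p_ham = (ham_words[word] + 1) / ham_total
        score += math.log(p_spam / p_ham)
    return score


def is_spam(text, spam_words, ham_words):
    return spam_score(text, spam_words, ham_words) > 0


spam_words, ham_words = train(TRAINING)
print(is_spam("claim your free money", spam_words, ham_words))  # True
print(is_spam("agenda for lunch", spam_words, ham_words))       # False
```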
In ancient Egypt, scholars read scrolls made of papyrus, a type of plant. If you ever held a scroll, you’d notice that it shows only one page at a time, and that the way scrolls roll makes it difficult to bind them together. When books were invented, the shift was much more than cosmetic. It meant scholars could flip pages and instantly cross-reference between them, which was previously impossible using scrolls. Similarly, databases compile information together in one place, making it more convenient and easily accessible.
Before the 1960s, a programmer’s job meant being a highly social creature. The only way to test code was to slap it on a punch card and wait in line for hours. The long line for daily batch processing was a watering hole where programmers would ask veteran colleagues for advice on debugging programs. But in the mid-1960s, the database became mainstream. Now programmers had a place to store and retrieve data, so they no longer had to wait for hours just to test one line of code.
On a grander scale, switching from daily batch processing to the database created the same monumental shift that occurred between scrolls and books. For the first time, programmers could work side by side, testing bugs as they appeared instead of having to wait for the daily batch.
Using Databases in the Modern World
Databases are hosted on dedicated servers. Each database is tied to a certain database model, which sets the rules for who can access the data, how it’s organized, and how it can be changed. Programmers can “talk” to the database through the database management system (DBMS). Programmers access the DBMS to modify and customize the deepest levels of the database.
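Here is a minimal sketch of what “talking” to a database through a DBMS can look like, using SQLite from Python’s standard library. SQLite is an embedded DBMS that runs inside the program rather than on a dedicated server, so it stands in here only because it needs no setup. The table layout and function names are assumptions made for illustration.

```python
import sqlite3


def build_catalog():
    """Create an in-memory database with a small table of books."""
    conn = sqlite3.connect(":memory:")
    # The table definition sets the rules for how the data is organized.
    conn.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO books (title, year) VALUES (?, ?)",
        [("Pride and Prejudice", 1813), ("Moby-Dick", 1851), ("Dune", 1965)],
    )
    conn.commit()
    return conn


def books_before(conn, year):
    """Ask the DBMS for all titles published before the given year."""
    cursor = conn.execute(
        "SELECT title, year FROM books WHERE year < ? ORDER BY year", (year,)
    )
    return cursor.fetchall()


conn = build_catalog()
print(books_before(conn, 1900))
conn.close()
```

The program never touches the stored data directly. It sends requests written in SQL, and the DBMS decides how to carry them out.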
If you’ve ever booked a flight or searched for a book in an online library catalog, then you’ve locked eyes with the user-friendly face of the database. In fact, the Internet itself is a huge hypertext database. Any word can become a hypertext link, allowing users to explore new subjects instantly. Another familiar face is Google Maps, which is a spatial database that lets you make location-based queries like googling “Pizza near me” at two in the morning. The modern world contains mountains of data that you use every day, potentially without even realizing it.
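A location-based query like “pizza near me” boils down to measuring distances between points on the globe. The sketch below uses the haversine formula and checks every place in a short list. The place names and coordinates are invented, the function names are hypothetical, and a real spatial database would use a spatial index instead of scanning every entry.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical places: (name, latitude, longitude), invented for illustration.
PLACES = [
    ("Pizzeria A", 40.001, -75.0),
    ("Pizzeria B", 40.05, -75.0),
    ("Pizzeria C", 41.0, -75.0),
]


def near_me(lat, lon, places, radius_km):
    """Return names of places within radius_km, nearest first."""
    hits = []
    for name, plat, plon in places:
        distance = haversine_km(lat, lon, plat, plon)
        if distance <= radius_km:
            hits.append((distance, name))
    return [name for distance, name in sorted(hits)]


print(near_me(40.0, -75.0, PLACES, radius_km=10))
```

With these made-up coordinates, the first two pizzerias fall within 10 km and the third, about 111 km away, is left out.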
Data in the Future
Looking beyond modern-day events and into the world of the future, data is sure to play a huge role. Our human impulse has always been to gather and assess data. We also use data to connect with one another and make social evaluations. Roman emperors gauged their popularity based on the quantity and quality of the applause they received upon entering the theater. Today, we have shifted from literal applause to figurative “likes” and “shares.” But, on a deeper level, the vast amounts of information we are both willing to share and wanting to access have created opportunities for a wide variety of data scientists to further our connection to each other and to the world.
One such connection really takes things to the extreme. Some people, such as Elon Musk, believe that the modern world will soon be ready to use data to merge human minds with computers. Technology that links brains to computers has actually been around for a while, allowing feats such as enabling paralyzed people to control computer functions with their minds. But complete mind merging would take it a big step further. As this cycle continues, we can eagerly anticipate the next evolution of our use of data.