Building Blocks of Data Science

By Direct Knowledge

Data science has been around for a while, but today it’s really starting to grow into a vital part of industry and business. It consists of a number of skills and activities that take sets of numbers (data) and turn them into something useful. The numbers communicate the performance of a company or product, and help people identify problem areas. The skills and activities involved in reading the data in this way can be considered building blocks that organizations can use to their advantage to gain ground in the technological race.

Data Science Building Block 1: Algorithms

Where It All Began

Lines of code running across the screen faster than the eye can follow. A twisted, grotesque man spasmodically rising from a table strewn with wires. An endless line of computers working at hyper-speed in brightly-lit server rooms. What could Frankenstein’s monster possibly have in common with computer algorithms? 

In the early 1800s, a baby girl named Ada Byron was born in England; she would later become Ada Lovelace. Her father, Lord Byron, was the famous poet who suggested that Mary Shelley write a ghost story, a challenge that inspired the birth of Frankenstein. Her parents separated when Ada was only a month old, and she never saw her father again.

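After the separation, Ada’s mother detested everything about Byron, particularly his fervor for the arts. Hoping Ada would shun her father’s wild ways, she had Ada’s tutors concentrate on mathematics and logic. The training paid off: in 1843 Ada published what is widely considered the first algorithm intended to be carried out by a machine, a method for computing Bernoulli numbers on Charles Babbage’s proposed Analytical Engine. For that reason she is often called the world’s first computer programmer. 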

Algorithms and Cake

But what exactly is an algorithm? Broadly defined, an algorithm is a finite set of well-defined rules or steps, with a definite starting point and ending point, that turns an input into an output. A program is one way of writing an algorithm down so that a computer can carry it out. A recipe for a particular kind of cake is a good comparison. It lists the steps to follow, takes ingredients as its input, and produces the cake as its output.
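
The recipe idea can be made concrete with one of the oldest known algorithms, Euclid’s method for finding the greatest common divisor of two whole numbers. The Python sketch below is only an illustration and is not tied to any particular data science tool. It has an input (two numbers), one rule applied step by step, a definite start and end, and an output.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: greatest common divisor of two non-negative integers."""
    # Start: the input is two whole numbers.
    while b != 0:
        # The rule applied at every step: replace (a, b) with (b, a mod b).
        a, b = b, a % b
    # End: once b reaches zero, a holds the answer (the output).
    return a

print(gcd(48, 18))  # prints 6
```

Each pass shrinks the second number, so the loop is guaranteed to stop. That guaranteed ending point is exactly what separates an algorithm from an open-ended list of instructions.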

Programmers write algorithms that automatically sort data into groups, saving hours of manual work. Algorithms can also solve complicated math problems; in fact, the scientific calculator you used in high school relied on algorithms to calculate functions and exponents. Because an algorithm follows a fixed set of rules and steps, it can use logic to make decisions about the data it is processing. 
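
Here is a minimal Python sketch of an algorithm that sorts data into groups and uses simple logic to decide where each value belongs. The function name `group_by_size`, the group labels, and the thresholds of 100 and 1000 are hypothetical choices made for illustration.

```python
from collections import defaultdict

def group_by_size(values, small_limit=100, large_limit=1000):
    """Sort numbers into 'small', 'medium' and 'large' groups.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    groups = defaultdict(list)
    for value in values:
        # The algorithm "decides" a value's group by checking its rules in order.
        if value < small_limit:
            groups["small"].append(value)
        elif value < large_limit:
            groups["medium"].append(value)
        else:
            groups["large"].append(value)
    return dict(groups)

orders = [25, 480, 1500, 90, 999, 3200]
print(group_by_size(orders))
# {'small': [25, 90], 'medium': [480, 999], 'large': [1500, 3200]}
```

The same pattern scales from six numbers to millions of records. A computer applies the rules identically every time, which is where the saved hours come from.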

Data Science Building Block 2: Mathematics

Zeros and Ones

You’ve probably heard of the ones and zeros that make up binary code. But why those numbers? What’s their individual importance, and how can they combine to represent any number or idea? As it turns out, the concept of zero is one of the biggest leaps in human mathematics. For a long time, the idea of nothingness was familiar, but zero as a number that could be used in calculations was unknown. That is, until the 7th century.

One of the earliest records of zero as a number comes from an Indian text, the “Brahmasphutasiddhanta,” written by the mathematician Brahmagupta in 628. The text lays out rules for using zero in arithmetic, including addition, subtraction, multiplication, and even division. Previously, zero had been used only as a placeholder, but from then on it would increasingly be treated as a number in its own right. Little did Brahmagupta dream that the number he was describing would one day become half of our binary computing language. 

Binary is the most basic language of computing. It is a base-2 number system, meaning it has only two digits: 0 and 1. The two digits can be read as OFF and ON, and strings of them can represent any number, letter, or instruction. Programmers rarely write binary directly. Instead, the more complex programming languages they use for a variety of purposes are ultimately translated into binary that the computer executes.
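
To show how just two digits can express any whole number, here is a short Python sketch that converts between ordinary decimal numbers and binary. It converts to binary by repeatedly dividing by 2, and it converts back by treating each digit as a power of 2. The function names are illustrative.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to a base-2 string by repeated division by 2."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit (0 or 1)
        n //= 2
    return "".join(reversed(digits))  # remainders come out least-significant first

def from_binary(bits: str) -> int:
    """Convert a base-2 string back to an integer; each position is worth a power of 2."""
    total = 0
    for bit in bits:
        total = total * 2 + int(bit)
    return total

print(to_binary(13))        # 1101  (8 + 4 + 0 + 1)
print(from_binary("1101"))  # 13
```

In everyday Python you would use the built-ins instead. `bin(13)` returns `'0b1101'` and `int("1101", 2)` returns `13`. The hand-written versions simply make the base-2 arithmetic visible.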

Math and Data Science

Long before the rise of binary, the Brahmasphutasiddhanta had already changed the broader field of mathematics for good. Mathematics is the science of numbers that we use to solve problems in the real world. Mathematicians use deduction, a process of reasoning that derives new truths from truths already established. Mathematicians are just as interested in why something is true as they are in the fact that it is true. They create general rules so that you can solve many different equations using only one rule. 
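
A classic example of one general rule covering many equations is the quadratic formula. It solves every equation of the form ax² + bx + c = 0 (with a not equal to 0) as x = (-b ± √(b² - 4ac)) / 2a. The Python sketch below applies that single rule to three different equations. To keep it simple, it assumes we only want real-number solutions.

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list:
    """Return the real roots of a*x**2 + b*x + c = 0, smallest first.

    Only real roots are returned; an empty list means there are none.
    """
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions
    if discriminant == 0:
        return [-b / (2 * a)]          # one repeated root
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

# One rule, many equations:
print(solve_quadratic(1, -3, 2))   # x^2 - 3x + 2 = 0  -> [1.0, 2.0]
print(solve_quadratic(1, 2, 1))    # x^2 + 2x + 1 = 0  -> [-1.0]
print(solve_quadratic(1, 0, 1))    # x^2 + 1 = 0       -> []
```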

A data scientist needs to understand mathematics for applied statistics, that is, applying the results of statistical models to problems in the real world. That said, a data scientist is more likely to spend their day-to-day work coding algorithms and interpreting results than working through mathematics directly. Math is important, but algorithms rest most directly on the logic that underpins both fields.

Even though data science is applied math and logic, data scientists can reap incredible rewards from studying math itself. Data scientists who study mathematics are effectively training their minds to solve increasingly difficult problems using logic. And since algorithms run on logic, this can lead to more powerful algorithms. Studying mathematics will not only give you the confidence to tackle new problems at work, it will help you to build better analysis models by training your mind to think logically. 

Data Science Building Block 3: Statistics

Improving Processes

In the 1860s, a boy in New York City reportedly hated spelling so much that he jumped out of his elementary school’s window to get away from it. Despite his rocky start in school, Herman Hollerith grew up to become one of the first data processors in the United States. While working on the 1880 census, Hollerith realized that his colleagues at the Census Office wouldn’t finish counting it until almost ten years later, just as the next census was about to begin.

Exasperated by the inefficiency, Hollerith invented the first punched-card tabulating machine, an electromechanical device that counted data recorded as holes in cards. With this machine, Hollerith combined statistics with mechanical data processing, an early forerunner of data science. Statistics is the collecting, organizing, analyzing, interpreting, and displaying of data. These steps make the data far easier to understand than the original lists of seemingly random numbers. Patterns that become obvious in graphs or charts begin to emerge, allowing the people working with the data to say “ah-ha!” as they grasp what the numbers imply.
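
The statistics steps of collecting, organizing, analyzing, and displaying can be sketched in a few lines of Python. The census-style records below are invented for illustration. The tally step loosely mirrors how a tabulating machine advanced a counter for each matching card, though this is an analogy rather than a model of Hollerith’s actual hardware.

```python
from collections import Counter
from statistics import mean, median

# Collect: hypothetical census-style records (fields and values invented for illustration).
records = [
    {"state": "NY", "household_size": 4},
    {"state": "PA", "household_size": 2},
    {"state": "NY", "household_size": 5},
    {"state": "OH", "household_size": 3},
    {"state": "PA", "household_size": 6},
    {"state": "NY", "household_size": 4},
]

# Organize: tally how many records fall into each category.
counts_by_state = Counter(r["state"] for r in records)

# Analyze: summarize a numeric field with standard statistics.
sizes = [r["household_size"] for r in records]

print(counts_by_state.most_common())  # [('NY', 3), ('PA', 2), ('OH', 1)]
print(mean(sizes), median(sizes))     # 4 4.0

# Display: a crude text bar chart makes the pattern visible at a glance.
for state, n in counts_by_state.most_common():
    print(f"{state:<3}{'#' * n}")
```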

But using statistics to understand data by hand is slow, so Hollerith invented his machine to speed up the process. His machines finished tabulating the 1890 census two years faster than manual counting would have. They also saved the United States government five million dollars. In today’s currency, that’s roughly $129.5 million! 

Statisticians vs. Data Scientists

Hollerith’s machine represented a huge leap in processing statistical data, but statistics itself is an ancient practice. The ancient Babylonians were already using the census to collect statistical data in about 3,500 B.C. A statistician’s job is to collect data, organize it, and analyze it, much like the modern-day data scientist. In fact, data science came into being at the intersection of computer science and statistics. 

Some people wonder whether there’s any difference between data scientists and statisticians. This good-humored definition, tweeted by Josh Wills (head of data engineering at Slack), captures what a data scientist really is: 

“Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.”

In other words, a programmer who knows nothing about statistics can run an analysis on a dataset but won’t know how to interpret the results. On the flip side, a statistician who understands data analysis in theory but can’t write a line of code to save their life won’t be able to produce any practical results. A data scientist blends these two fields. They know enough statistics to devise ways of analyzing data, and they are skilled enough programmers to put their own ideas into action.

Data Science Building Block 4: Finance

Investments Based on Data

In 2013, Netflix’s original TV show “House of Cards” became explosively popular, and data science deserves much of the credit. Using predictive analytics, Netflix’s analysts correctly predicted that the show would grip the hearts of millions of fans around the world.

Netflix committed to two full seasons of the show up front. The decision was based on data about which shows and actors its subscribers gravitated toward, and that data led the company to believe (correctly) that the show would be a hit. On February 1st, 2013, Netflix’s stock languished at $23.54 a share. Exactly 365 days later, it had nearly tripled. 

The field of finance studies how and why money is invested in these types of scenarios, and the resulting outcomes. It spans sectors like private equity, the stock market, hedge funds, and investment banking. For decades, financial experts have debated the ups and downs of a plethora of investment tactics. If people weren’t so worried about their money and profits, data science might not have received so much attention over time.

Finance and Risk

One of a financial advisor’s main considerations is how much risk an investment entails. Using data science, Netflix concluded that producing a show like House of Cards carried very little risk. Armed with that information, Netflix confidently invested $100 million in production, expecting to earn back more than it spent. 

In the past, companies had to depend on a mixture of due diligence, common sense, and a series of wild guesses to figure out whether an investment was worth the risk. Now, data analysis lets those companies estimate risk in a fraction of the time. Netflix’s House of Cards was just the tip of the iceberg. It suggests that the intersection of finance and data analytics will change Wall Street as we know it. 
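
What does “estimating risk” look like in practice? One common and simple measure is volatility, the standard deviation of an investment’s returns. The Python sketch below computes it alongside the average return and the share of losing periods. The returns are invented, and this is a generic textbook calculation, not the method Netflix used. The name `summarize_risk` is hypothetical.

```python
import statistics

def summarize_risk(returns):
    """Summarize hypothetical periodic returns, where 0.02 means +2%.

    Uses the sample standard deviation (volatility) as a simple risk measure.
    """
    avg = statistics.mean(returns)
    volatility = statistics.stdev(returns)  # how widely returns swing around the average
    loss_share = sum(r < 0 for r in returns) / len(returns)  # fraction of losing periods
    return avg, volatility, loss_share

# Invented monthly returns, for illustration only.
monthly_returns = [0.02, -0.01, 0.03, 0.00, -0.02, 0.04]
avg, vol, loss_share = summarize_risk(monthly_returns)
print(f"average {avg:.3f}, volatility {vol:.3f}, losing months {loss_share:.0%}")
# average 0.010, volatility 0.024, losing months 33%
```

Real financial models layer much more on top of this, but a higher volatility relative to the average return is the basic signal that an investment is riskier.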

As a result, it’s no longer enough for data scientists in the financial sector to be merely financially literate. Understanding the sector you work in is crucial to building the models that analyze it. Today, companies search for data scientists who are just as comfortable discussing the Dow Jones as they are designing an algorithm. 

Data Science Building Block 5: People

The last but certainly not least of the building blocks is data scientists themselves. After all, people created the field and continue to make advancements in it. These people are unique characters, often with different ways of thinking in comparison to most of the population.

Decades ago, data scientists weren’t really a big deal. Information technology groups in companies handled data almost as an afterthought. But as more and more data came into play, that was no longer feasible. There weren’t even degrees for the field until recently, but now there are dozens around the country.

The mind of a data scientist is typically extremely logical and analytical. Data scientists observe the world around them almost as if it were made of data points, evaluating how those points interact. In this manner they explore the world and its problems, using logic to solve those problems and improve how things work.

Careers in Data Science

Applying these characteristics to their careers, data scientists fit nicely into jobs in computers and technology. Their analytical minds help them fully understand and use the advanced techniques and machines in these fields.

However, data scientists also have unique creativity and the ability to think outside the box. They can see new possibilities for improving current methods, thus setting trends and encouraging progress. Making these steps into the unknown also requires some risk taking. So, data scientists can’t be too timid if they want to be part of the next big thing.

These characteristics can lead to some really incredible results with a huge impact on society. In fact, the Harvard Business Review called data scientist “the sexiest job of the 21st century.” That’s because data scientists are key to turning the opportunities of big data into realities. They get at the heart of problems and see them in new ways that others can’t.

Some modern data scientists who are great examples of these characteristics include Dean Abbott, Kenneth Cukier, John Elder, Bernard Marr, and Hilary Mason. They even have Twitter accounts you can follow to get a look into their unique minds and interesting jobs. If they interest you, keep looking into their careers to see whether you might join their ranks as one of the building blocks of data science.