Data Transformation

“Data! Data! Data!” he cried impatiently. “I can’t make bricks without clay!”

– Sherlock Holmes, The Adventure of the Copper Beeches

I have an unhealthy relationship with data. We’ve spent way too many sleepless nights together, have constantly fought each other, and I’ve spent so much time and energy cleaning it up. Yet, in the end, I love data, for in data and the transformation of it I’ve found so much strategic gold and insight.

Strategic leaders are masters of data transformation. They know where to get the right data, when they need data to prove or disprove a hypothesis, how to massage insight out of data, when to question the validity of data, how to abstract an organization’s processes into data, and how to systematically get their hands on more of the right data fast. Let’s go over the ins and outs of data so that you can have a healthier relationship with it.

What is Data?

Data are values of qualitative and quantitative variables. Qualitative data are characteristic variables that you can’t represent numerically. In an organization, qualitative data, also known as categorical data, can be non-numeric data tied to products, customers, team members, strategic options, and issues. Quantitative data are numeric values of variables. Examples include the number of customers, the average sales per employee, or product costs.

There are three stages that data can evolve through, which are raw data, information, and knowledge.

Stages of Data


In the context of an organization, data is typically used for two main purposes. The first purpose is to enable the workflow of a process. The second purpose is to inform and drive decisions.

Why is data important?

Think about the two primary purposes of data: to enable the workflow of a process, and to inform and drive decisions. If you want to enable better workflow or drive better decisions, you need better data, and you need to become better at transforming that data into information and knowledge.

How do you create value from data?

We are swimming in a vast ocean of ever-increasing amounts of data. By one often-cited estimate, 90% of the world’s data was created over the past two years. Data is in systems, computers, spreadsheets, documents, databases, conversations, people’s heads, interactions, and all over the internet. Yet, data is pretty useless without context and purpose. To create value from data, you need to transform data into information and knowledge.

Stage 1 to Stage 2 – Transform data into information


Transforming data into information is a sorely ignored topic. Much like the popular saying that a person uses less than 10% of their brain, companies typically transform less than 10% of their data into information. When conducting analysis, I typically spend 20-30% of my time getting the right data, 40-60% of my time transforming data into information, and then 20-30% of my time “analyzing” the information and transforming it into knowledge. There are three major steps in transforming data into information:

Step 1 – Properly clean the data


It’s a dirty little secret that most people don’t properly clean data, which often leads to poor analysis and incorrect conclusions. Much data, especially data derived from human interactions and keystrokes, is termed “unstructured” or “semi-structured” data, meaning that it isn’t very consistent, well organized, or categorized. In these instances, you must spend time cleaning, organizing, and categorizing the data.

The following are elements of a data set that need to be cleaned up:

Consistent categorization

Any time you get a column of data that pertains to a category, such as states, countries, segments, demographics, or any other categorical flags, you need to make sure the values are consistent. For example, in a data set, California may be represented as CA, California, Ca, CAL, Calif., etc.
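As a rough illustration of consistent categorization outside of Excel, here is a minimal Python sketch. The alias table and the function name normalize_state are hypothetical, made up for this example; a real data set would need its own list of spellings.

```python
# Hypothetical alias table: each spelling we expect to see, mapped to one code.
STATE_ALIASES = {
    "ca": "CA",
    "cal": "CA",
    "calif": "CA",
    "california": "CA",
}


def normalize_state(raw):
    """Return the canonical code for a state label, or None if unrecognized."""
    # Trim spaces, ignore case, and drop a trailing period ("Calif." -> "calif").
    key = raw.strip().lower().rstrip(".")
    return STATE_ALIASES.get(key)


raw_values = ["CA", "California", "Ca", "CAL", "Calif.", " calif "]
cleaned = [normalize_state(value) for value in raw_values]
print(cleaned)
```

Returning None for an unrecognized label, rather than guessing, leaves those rows visible so you can review them by hand and extend the alias table.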

Remove unnecessary data

Managing a dataset can be unwieldy. One of the simplest ways to make a dataset easier to work with is to remove unnecessary data. Remove columns and data that aren’t necessary for the purpose of the analysis.
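A minimal Python sketch of the same idea, assuming the data set is a list of records. The column names and the KEEP list are invented for illustration.

```python
# Made-up records with a couple of columns the analysis does not need.
rows = [
    {"customer": "Acme", "state": "CA", "sales": 1200, "fax": "555-0100", "notes": ""},
    {"customer": "Birch", "state": "NY", "sales": 800, "fax": "", "notes": "call back"},
]

# Hypothetical list of the columns that matter for this analysis.
KEEP = ["customer", "state", "sales"]


def keep_columns(records, columns):
    """Return new records containing only the named columns, in that order."""
    return [{col: rec[col] for col in columns} for rec in records]


trimmed = keep_columns(rows, KEEP)
print(trimmed)
```

The function builds new records instead of deleting from the originals, so the full data set is still there if the scope of the analysis changes.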

Empty data

If you see a lot of holes or empty fields in a data set, you need to assess how that will affect your analysis. Often, you may need to omit data elements that have empty fields from the dataset to make sure the analysis isn’t misleading.
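One way to assess the holes before deciding what to omit is to separate complete records from incomplete ones and measure the share affected. The sketch below is only an illustration; the field names and the choice of required fields are assumptions.

```python
def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def split_complete(records, required):
    """Split records into those with every required field filled and the rest."""
    complete, incomplete = [], []
    for rec in records:
        if any(is_blank(rec.get(field)) for field in required):
            incomplete.append(rec)
        else:
            complete.append(rec)
    return complete, incomplete


# Made-up records; two of the four have a hole in a required field.
rows = [
    {"customer": "Acme", "sales": 1200},
    {"customer": "Birch", "sales": None},
    {"customer": " ", "sales": 300},
    {"customer": "Cedar", "sales": 950},
]

complete, incomplete = split_complete(rows, ["customer", "sales"])
share_missing = len(incomplete) / len(rows)
print(f"{share_missing:.0%} of records have empty required fields")
```

If the share is large, dropping those records may itself skew the analysis, so it is worth looking at what the incomplete records have in common first.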


Nonsensical values and outliers

Often, with numeric data, you’ll have nonsensical values created by miskeys or errors. Sort each numeric column to see the range of values within the dataset, and consider removing data elements with unrealistic values. Furthermore, some data sets have genuine outliers that you may need to remove so they don’t distort the analysis.
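Here is a small Python sketch of sorting a numeric column and flagging suspicious values. The 1.5 × IQR fence used below is a common rule of thumb that I am assuming for illustration; it is not the only way to define an outlier, and the quantities are made up.

```python
import statistics

# Made-up order quantities; 4000 and -5 stand in for miskeyed entries.
quantities = [42, 38, 45, 40, 41, 39, 4000, -5, 43, 44]

# Step 1: sort so the extremes are easy to eyeball.
ranked = sorted(quantities)

# Step 2: compute the quartiles and the 1.5 * IQR fences.
q1, _median, q3 = statistics.quantiles(ranked, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

# Step 3: flag values outside the fences for review instead of silently dropping them.
flagged = [v for v in ranked if v < low_fence or v > high_fence]
kept = [v for v in ranked if low_fence <= v <= high_fence]

print("flagged for review:", flagged)
print("kept:", kept)
```

Flagging rather than deleting keeps the decision with you: a miskeyed 4000 should go, but a real, unusually large order may be exactly the insight you are looking for.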

Text to numbers

Sometimes, numeric values are represented as text, and you need to convert them into numbers using an Excel function such as VALUE.
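Outside of Excel, the same conversion can be sketched in Python. The function to_number is hypothetical, and it assumes US-style formatting (dollar signs, commas as thousands separators, parentheses for negatives).

```python
def to_number(text):
    """Convert a text value such as "$1,250.50" to a float, or None if it can't be parsed."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    # Accounting-style negatives: "(45)" means -45.
    if cleaned.startswith("(") and cleaned.endswith(")"):
        cleaned = "-" + cleaned[1:-1]
    try:
        return float(cleaned)
    except ValueError:
        return None


samples = ["1,250.50", " $300 ", "(45)", "n/a"]
converted = [to_number(s) for s in samples]
print(converted)
```

Values that can’t be converted come back as None, so they show up as empty fields to review rather than as silent zeros.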

Text to columns

If you have a dataset that is comma-delimited or has data combined in the same column, you can use the Text to Columns feature in Excel to separate the data elements.
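For comparison, here is a minimal Python sketch of splitting delimited data using the standard csv module. The sample text and column names are made up. Note that a naive split on commas would break a quoted value like "Smith, Jane", which the csv module handles.

```python
import csv
import io

# Made-up comma-delimited text; the first name contains a comma inside quotes.
raw = 'name,city,state\n"Smith, Jane",Fresno,CA\nLee Wong,Albany,NY\n'

# csv.DictReader splits each line into columns and respects the quotes.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]["name"], "|", rows[0]["city"], "|", rows[0]["state"])

# For a single combined column, split on the first comma and trim the spaces.
combined = "Fresno, CA"
city, state = [part.strip() for part in combined.split(",", 1)]
print(city, state)
```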

Biased data

When you collect and analyze data, make sure it is unbiased. Biased means one-sided, lacking a neutral viewpoint, or not having an open mind. Bias in data can be caused by wanting a particular result or outcome. The way people create and gather data can also be biased, such as when people ask leading survey questions.

For most Excel analyses, you can use pivot tables to quickly assess and understand how the data needs to be cleaned up.
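The simplest pivot table for this purpose is a count of each distinct value in a column. As a rough Python equivalent, assuming a made-up column of state labels:

```python
from collections import Counter

# Made-up column of state labels, as they might arrive in a raw data set.
states = ["CA", "California", "CA", "Ca", "NY", "", "NY", "CA"]

# Count each distinct label, like a pivot table with the column as rows and a count as values.
counts = Counter(states)

for label, n in counts.most_common():
    print(repr(label), n)
```

A count like this surfaces the cleanup work at a glance: three spellings of California that need to be merged, and one empty field.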

Step 2 – Add metadata


One of my favorite methods to turn data into information is to add metadata: data that is appended to add new categorization to the data. A good example is when you have a data set of customers, with their total purchase values, and you create metadata organizing the customers into “high spend,” “medium spend,” and “low spend.” You should always think through what metadata you should add. Some of the metadata dimensions to consider include timing, value, organizational categorizations, demographics, and source.
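The customer spend example can be sketched in a few lines of Python. The thresholds (10,000 and 2,500) and the field names are assumptions chosen for illustration; in practice you would set the cutoffs from the distribution of your own data.

```python
def spend_tier(total, high=10_000, medium=2_500):
    """Label a customer by total purchases. The thresholds are illustrative only."""
    if total >= high:
        return "high spend"
    if total >= medium:
        return "medium spend"
    return "low spend"


# Made-up customers with their total purchase values.
customers = [
    {"name": "Acme", "total_purchases": 18_400},
    {"name": "Birch", "total_purchases": 4_100},
    {"name": "Cedar", "total_purchases": 650},
]

# Append the new metadata field to each record.
for customer in customers:
    customer["spend_tier"] = spend_tier(customer["total_purchases"])

print(customers)
```

Once the tier is a column of its own, you can group, filter, and compare customers by it like any other category.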

Step 3 – Get to know the data


A dataset to an analyst is a bit like the land is to an architect. You have to get to know the data to understand the potential value in it. Ask yourself basic questions, such as “Where did the data come from?” “What insights can be gleaned from the data?” “Are there other data sets that could be combined to produce more information?” Get to know your dataset. Spend some time using pivot tables and conducting simple analyses. Once you start transforming data into information, more and more ideas and hypotheses will pop up. That is how minds work: stimuli, new facts, new questions, new ideas, rinse and repeat. Once you get a new question or idea, write it down, and follow up to see if the data can answer your question or provide insight.
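To give a feel for the kind of simple analysis meant here, the sketch below builds a small pivot-style summary in Python: total sales by segment and region. The records, field names, and the pivot_sum helper are all made up for illustration.

```python
from collections import defaultdict

# Made-up records, already cleaned and tagged with a spend segment.
rows = [
    {"segment": "high spend", "region": "West", "sales": 12000},
    {"segment": "low spend", "region": "West", "sales": 900},
    {"segment": "high spend", "region": "East", "sales": 15000},
    {"segment": "low spend", "region": "East", "sales": 1100},
    {"segment": "low spend", "region": "West", "sales": 700},
]


def pivot_sum(records, row_key, col_key, value_key):
    """Sum value_key for every (row_key, col_key) pair, like a basic pivot table."""
    table = defaultdict(lambda: defaultdict(float))
    for rec in records:
        table[rec[row_key]][rec[col_key]] += rec[value_key]
    # Convert the nested defaultdicts to plain dicts for easier reading.
    return {row: dict(cols) for row, cols in table.items()}


pivot = pivot_sum(rows, "segment", "region", "sales")
for segment, by_region in pivot.items():
    print(segment, by_region)
```

Swapping the row and column keys takes seconds, which is the point: each quick cut of the data tends to raise the next question worth writing down.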

Stage 2 to Stage 3 – Transform information into knowledge


One of the keys to transforming information into knowledge is being clear about the knowledge you are trying to produce and the question(s) you are trying to answer. Anytime anyone comes to me with analysis, data, or information, my first question is always, “What are we trying to answer?” If there isn’t a good answer, then you’re probably in for an exercise of “boiling the ocean,” or a random walk to nowhere. Once you know what you are trying to answer, the next set of analytic tools will help you answer your questions and transform information into knowledge.


 Learn more about Joe Newsum, the author of all this free content and a McKinsey Alum. I provide a suite of coaching and training services to realize the potential in you, your team, and your business. Learn more about me and my coaching philosophy.