Summary: To take advantage of data science, an organization needs to consider the quality and accessibility of its data, and the willingness of its staff to use the results of data analysis. Most importantly, an organization must have a clear understanding of how it expects to benefit from data science.
Can data science benefit your organization? If so, is your organization ready to take advantage of it?
“Data science” has been receiving a lot of attention recently. According to LinkedIn’s 2017 U.S. Emerging Jobs Report, data scientist roles have grown over 650 percent since 2012. This article presents the steps for assessing your organization’s readiness for data science and describes what you can do to take full advantage of it.
File Cabinets Are Becoming Obsolete
A major transformation has occurred in the way organizations manage their data since the advent of affordable computing. Most of the transactions conducted by businesses, government agencies, and non-profit organizations are now “paperless”, meaning that they are recorded electronically. Business transactions are often recorded at electronic cash registers or through online purchases, and the services provided by government and non-profit organizations are often entered directly into a computer at the point of contact with the client. The move from paper to computers has made it possible to manage and analyze data in ways that were never before possible.
The Next Frontier – Data Quality
Although it is likely that your organization’s data is already digitized, this does not automatically mean that it is ready for analysis. The first step in assessing an organization’s readiness for data science is a data audit. This is a review of all the fields that might be used in data analysis, checking each for completeness and validity.
Completeness refers to the extent of missing values found within each field. Many analytic procedures have a “listwise” setting for missing values, which means a missing value in any of the fields that are selected for a procedure can cause an entire record to be excluded from the analysis. Excluding records from your analysis can seriously bias your results. A rule of thumb is that excluding more than 10% of your records can bias your results, so a data audit should flag any field in which more than 10% of its values are missing. These fields could be simply omitted from the data analysis, or the missing values could be “imputed”, i.e. replaced with valid values. Imputation methods range in complexity from replacing the missing values with the mean of the valid values to more sophisticated algorithms based on the principles of regression.
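The completeness check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a full audit tool: the records, the field names, and the helper functions are all hypothetical, missing values are assumed to be stored as None, and the 10% threshold is the rule of thumb mentioned above.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "satisfaction": 4},
    {"age": None, "income": 61000, "satisfaction": 5},
    {"age": 45, "income": 48000, "satisfaction": 3},
    {"age": 29, "income": None, "satisfaction": 4},
    {"age": 51, "income": 75000, "satisfaction": 2},
    {"age": 38, "income": 58000, "satisfaction": 5},
    {"age": None, "income": 43000, "satisfaction": 1},
    {"age": 42, "income": 67000, "satisfaction": 4},
    {"age": 27, "income": 39000, "satisfaction": 3},
    {"age": 46, "income": 71000, "satisfaction": 5},
]
FIELDS = ["age", "income", "satisfaction"]
MISSING_THRESHOLD = 0.10  # rule-of-thumb cutoff from the text


def missing_fraction(records, field):
    """Share of records in which `field` is missing (None)."""
    return sum(r[field] is None for r in records) / len(records)


def flag_incomplete_fields(records, fields, threshold=MISSING_THRESHOLD):
    """Fields in which more than `threshold` of the values are missing."""
    return [f for f in fields if missing_fraction(records, f) > threshold]


def listwise_complete(records, fields):
    """Records that survive listwise deletion: no missing value in any field."""
    return [r for r in records if all(r[f] is not None for f in fields)]


def impute_mean(records, field):
    """Copy of the records with missing values replaced by the mean of valid values."""
    fill = mean(r[field] for r in records if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]


flagged = flag_incomplete_fields(records, FIELDS)
kept = listwise_complete(records, FIELDS)
imputed = impute_mean(records, "age")
print("Fields flagged by the audit:", flagged)
print("Records kept under listwise deletion:", len(kept), "of", len(records))
```

With these made-up records, only the age field is flagged (2 of 10 values missing), and listwise deletion across all three fields would keep 7 of the 10 records. Mean imputation is the simplest option mentioned above; a regression-based method would replace the `impute_mean` step.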
Validity refers to the accuracy of the data found within each field selected for analysis. For fields that include codes representing categories, this means that these codes correctly represent categories relevant to your organization. For instance, a satisfaction code of “6” is invalid if satisfaction levels were coded as 1 through 5. For fields with continuous values, some values may be unusually large or small. Such values, known as “outliers”, are typically defined as those that are more than two or three standard deviations from the mean for a field. Invalid categorical codes and invalid outlier values should be corrected or converted to missing values, but continuous fields with valid outliers could be “coerced”. Coercion converts any values beyond a specified range, such as two standard deviations from the mean, into the values at the end of that range. This dampens the disproportionate influence of outliers on an analysis without the risk of losing records due to missing values.
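The two validity checks above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names and sample values are hypothetical, missing values are assumed to be stored as None, and the standard deviation is assumed to be the sample standard deviation of the valid values.

```python
from statistics import mean, stdev

# Assumed coding scheme from the example in the text: satisfaction is 1 through 5.
VALID_SATISFACTION_CODES = {1, 2, 3, 4, 5}


def invalid_codes_to_missing(values, valid_codes):
    """Convert any code outside `valid_codes` to a missing value (None)."""
    return [v if v in valid_codes else None for v in values]


def coerce_outliers(values, n_sd=2.0):
    """Pull values beyond n_sd standard deviations from the mean back to the
    end of that range. Missing values (None) are passed through unchanged."""
    valid = [v for v in values if v is not None]
    center = mean(valid)
    spread = stdev(valid)  # sample standard deviation; needs at least two values
    low, high = center - n_sd * spread, center + n_sd * spread
    return [v if v is None else min(max(v, low), high) for v in values]


# Hypothetical data: a "6" and a "0" that fall outside the 1-5 coding scheme...
satisfaction = [4, 6, 2, 5, 0]
cleaned_codes = invalid_codes_to_missing(satisfaction, VALID_SATISFACTION_CODES)

# ...and a purchase amount far larger than the rest.
purchases = [10, 12, 11, 13, 12, 11, 10, 12, 11, 100]
coerced = coerce_outliers(purchases, n_sd=2.0)

print(cleaned_codes)
print(coerced)
```

In this sketch the range limits are computed from the data before coercion, so a very extreme value still widens the range it is coerced into. Fixed limits based on business knowledge are a reasonable alternative where they are available.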
Are You and Your Staff Ready and Willing to Use the Results of Data Science?
Data science can be a disruptive process, and as Uber and Airbnb would argue, this can be a good thing. However, it is not uncommon for staff to resist change, and relying on data analysis rather than staff experience and intuition can represent a major change. For example, experienced sales staff may be confident that they can identify a good lead, and therefore be reluctant to use “propensity scores” that rank leads by their likelihood of responding to a sales offer. Be prepared to explain the potential value of data science to your staff well before its implementation.
In addition to ensuring that your staff are ready to take advantage of data science, you will also need to consider how your data will be accessed for analysis and how the results will be distributed. Data science requires quick and ongoing access to your data, which may raise some challenges for your IT staff. Distributing the results may also require the acquisition of business intelligence software that allows users to “slice and dice” these results, i.e. use drop-down lists to select only the results relevant to them.
Most importantly, you will need to carefully consider how data science will ultimately benefit your organization, whether through increased revenue and decreased costs or in other ways that support your organization’s mission. An increasing number of firms can guide you through this assessment process. The bottom line is that data science enables you to extract value from your organization’s electronic data, and it is well worth your time to explore how you can make this happen in your organization.