Summary: If you are mid-career and thinking about switching into data science, here are some things to consider in planning your journey.
We get lots of inquiries from readers asking for career advice, and many of them identify as mid-career professionals looking to switch into data science. If you’re in this group, you face some of the same challenges beginners do but also some that are unique to your circumstances. Here are some thoughts and observations that may be valuable.
About You
When folks self-identify as mid-career they usually cite 10 or 20 years of experience. By my way of thinking that most likely puts you in your 30s or 40s. The majority of these folks say they are involved with IT, perhaps as a programmer, analyst, admin, or similar. Some are in an area of operations that is heavily data driven. Most have a bachelor’s degree but some do not. It’s you I’m thinking about in the following notes.
As you read this you may well say ‘this doesn’t apply to me, I’m different’. I am writing for the middle of the normal distribution. If you know you’re a two- or three-sigma guy or gal, my hat’s off to you.
Why Switch At All?
Because data science is hot! We are constantly exposed to articles that call this the best job in America, and from an insider’s perspective I think they’re right. The work is interesting and challenging. We work with tools that are very rapidly evolving for the better. This means we have to be lifelong learners, and that’s a good thing.
Everyone is aware that there is a structural shortage of qualified data scientists. Companies are actively hiring and can’t find enough of us, which is reflected in the well-above-average compensation.
Presumably there is also a level of dissatisfaction with the rewards from the 10 or 20 years you’ve put in. If you’re looking for greener pastures this is a good place to look.
The Cost of Switching
OK, the shortage is working in your favor, but let’s be candid. There are some advantages to the years you may have put into IT or analytics, but you are also much more likely to be settled with a family, not to mention having 10 or 20 years of increasing earnings in your paycheck.
The single biggest impediment you are going to face is where to find the time to get trained while still earning a living. Second, since you will be competing for these starter jobs with recent grads, the starting pay may actually be less than what you’re earning now.
The best salary surveys for data scientists come from Burtch Works. Here’s the link to their 2016 study: “2016 Salaries of Data Scientists”.
Future prospects may well be brighter, but there really is no way to enter this profession from the middle short of having a Ph.D.
Can I Do This OJT? How About MOOCs?
The odds of doing this through on-the-job training (OJT) are slim to none. About 18 months ago when the hiring crisis was worse, particularly in Silicon Valley, Zynga had an in-house 18-month program to grow their own data scientists. I don’t know if this is still going on, but this is the only one I’ve ever heard of. If you work in a company with a decent-sized data science group it won’t hurt to ask, but such programs are exceedingly rare and almost always for current employees.
Employers hiring newly minted data scientists strongly prefer a degree from an accredited university. There’s no central clearinghouse for this sort of information, but from my reading and personal conversations, MOOCs may indeed provide the most motivated with valuable skills. When you’re interviewing, though, the guy or gal with the formal degree will be strongly favored, all other things being equal.
Neither is it going to be faster with MOOCs. There is no six-month program to become a competent entry-level data scientist. This is not like picking up another programming language. You need to think in terms of 18 to 24 months.
How Much Education Do You Need?
The most favored entry-level degree in this field is an MS in Data Science. It’s going to take about two years of coursework and experience to equip you at the basic level. Can you do this part time via a distance learning program? Yes, though this is more an issue of self-discipline.
If you don’t have a bachelor’s degree, are you excluded? By no means. Increasingly there are bachelor’s-level programs targeting the same skills taught at the master’s level. There is no reason why you can’t get these same skills over the same two-year period in a bachelor’s program.
Here’s the caveat for both master’s and bachelor’s programs. Make sure the curriculum is specific to data science and not aimed more broadly at computer science or at adjacent areas labeled with terms like business data analytics.
How Do You Differentiate Yourself to Get Hired?
Not infrequently I’m approached by a recent or about-to-graduate master’s student asking about internships or job openings. My response is always: Tell me in some detail about the type of problems you have worked on, how you went about preparing the data, what specific algorithms you used to model, what tools or languages you used in this problem (R, Python, SAS, SPSS, other), and what the outcome was.
The most common responses are silence (hey, I took the classes, what do you want?) or way too deep a dive into just one project. Yes, I’d like to know if you used R, but I don’t want to know anything about the specifics of the code. I want to know that you know what the important issues are in working the data, building the model, and solving the problem.
This is a little like getting money from a VC. You need an elevator pitch (maybe one minute) backed up by three or four specific examples covering all these points, and not exceeding 5 minutes each. If you want an outline it would look like this:
- What was the business problem to be solved?
- How did you get the data?
- What did you do to clean and prepare the data, including any feature engineering?
- What algorithms did you apply and why? (You can tell me here about R, Python, or other tools, but I care much more about the algorithms.)
- How did you pick the champion model?
- What was the outcome? Or, for a hypothetical case, what would the financial or business result have been if it had been implemented?
Once you start internal interviewing with other data scientists in your target company, feel free to deep dive. For now, impress me with both breadth and depth, preferably in problems that have some relevance to my business.
A good strategy here during your study, either through your school or on your own, is to reach out to local mid-size and large businesses and ask for real business problems you can work on pro bono. Three or four of these ‘reference projects’ that relate to the industry you’re most interested in pursuing will be like gold, provided you use them to illustrate that your learning has had practical application.
If you can’t find a real business problem, you may be able to build reference projects from publicly available data sets that relate to your target industry. If you are motivated by Kaggle competitions, by all means join them, but there are no average or simple problems there. Plus you may get wrapped around the axle trying to implement advanced and exotic techniques that don’t often get used in the real world.
Remember that in real data science you do not get all the time in the world to work on a problem. There is a tradeoff between the cost of your time and the value of the solution on one side, and the amount of time you can spend on the other. It is better to be able to talk about the effective and efficient use of your time in problem solving than to talk about having improved accuracy at the 3rd or 4th decimal place.
Opportunities and Markets
There are two major markets for data scientists:
- The one we read most about is the cutting-edge development of deep learning or its application in new products, primarily in Silicon Valley, LA, Seattle, Austin, and New York. (Find any recent article on the level of VC funding by city and you’ll have an accurate map of this segment.)
- The core data science market, which includes any and all major companies with a significant B2C component in their business. These include (but aren’t limited to) insurance, banking, mortgage finance, telecoms, utilities, ecommerce, government, consulting, and a bunch of others. They are located all over the US and for that matter overseas.
If you are a top one-percenter in your knowledge and drive, by all means join the elite companies that can command the graduates of the top schools. Although the most exciting advanced developments are occurring in this group, it represents fewer (probably far fewer) than 10% of all data science jobs.
The good news is that 90% of the data science opportunities that mid-career switchers are interested in are located all over the US in largish cities.
Also, current best estimates are that about 40% of US companies are actively using predictive analytics. It’s a tough number to verify but it sounds about right. What this means is that 100% of the largest companies have adopted it, and adoption gets thinner as you move down in size.
It’s the extremely rare company that has fully exploited predictive analytics, so new pockets of data science opportunity should be popping up continuously in even the largest companies.
Where to Look. Where to Go to School.
As a mid-career switcher you’re probably very concerned about where you want to live. It’s likely that there are good data science jobs pretty close to where you’d like to be.
Similarly, when folks ask me where they should go to school, I usually counter with ‘where do you want to live’. Unless you have the opportunity to go to a university with prestigious name recognition, pick the best school you can close to where you want to live. Chances are that the alumni network and local knowledge will work in your favor.
Data Scientists versus Data Engineers
In no more than the last 24 months we’ve begun to differentiate Data Engineers from Data Scientists. Not everyone uses or understands this distinction yet but they soon should.
The Data Engineering side has much more in common with classic computer science and IT operations than true data science. You can think of this divide as the data scientist starting with the raw data and moving through modeling and implementation. Data Engineers are about the infrastructure needed to support data science.
Ever since the advent of NoSQL databases and big data platforms like Hadoop, and now Spark, extending into IoT and other streaming methods, and including data lakes as an alternative to enterprise data warehouses (EDWs), there has been a growing specialty among those who know how to create and maintain these tools on which much of data science relies. This particularly extends to the quickly growing capabilities of cloud and SaaS. If you can learn to set up a Spark instance, data lake, or streaming app on AWS, Azure, or Google, that may be a very comfortable and well-compensated middle ground for you to consider.
While it would be nice for you to know all the skills of a data scientist, acquiring data engineering skills is not as time-consuming and, depending on your company, might be possible through OJT or MOOCs.
The Problem with Job Advertising
Differentiating Data Engineers and Data Scientists is the least of your worries when reading job ads. It’s still common to see an ad for a Data Analyst when what is wanted is a full-stack Data Scientist, and conversely ads for Data Scientists where the actual tasks and skills don’t exceed SQL on an EDW. It may take a while for HR departments to catch up. In the meantime, be aware of the problem and read the details of the job description.
AI, Deep Learning, Picking a Specialty
Over the last few years the demands of different industries have created specialties within data science. Curiously, the techniques remain the same but the way they are applied in each industry varies. An in-depth understanding of each industry’s business model and typical data becomes important. For example, ecommerce looks for people with a working knowledge of how to analyze web logs and build recommenders, while insurance, banking, and mortgage lending are looking at risk, cross-sell, and up-sell models.
The thing they have in common is that 90% of the work done in data science is still about predicting consumer behavior. Here your prior industry experience may be a particular benefit if you want to stay in that industry.
Separately, people often ask about AI and deep learning. Do your reading. You’ll soon understand that a handful of deep neural net architectures, specifically Convolutional Neural Nets and Recurrent Neural Nets with LSTM, underlie most of the commercial AI today. Unless you are particularly motivated, this is not entry-level material. Sure, play around with TensorFlow or Theano, but stick with a foundation of modeling about consumer behavior plus some time series modeling and that’s all you’ll need to start.
End Notes
Two last comments. Be aware that one of the ways the market is responding to the shortage of data scientists is to further automate the tools we use. This allows fewer data scientists to do the work of many.
The same folks making these more automated tools would also like the marketplace to believe that they are simple enough for non-data scientists to use, a group cleverly referred to as citizen data scientists. It’s true that some of the more rote elements of the modeling task can be automated and we’re glad to embrace that. A full understanding of the predictive analytic process is required, however, or you can build models that predict exactly the wrong thing.
Finally, your experience is a two-edged sword. If you stay in an industry with which you have experience you will have a real leg up on fresh graduates with little or no job experience. However, if you are on the long side of 10 to 20 years of experience, encountering some ageism in hiring is a real possibility.
Don’t let any of that deter you. This is a terrific profession with a great future. Make the most of what you’ve got and go for it.
Good luck.
Some other Data Science Career articles you may find valuable:
The New Rules for Becoming a Data Scientist (2016)
Data Scientist –Still the Best Job in America – Again (2016)
So You Want to be a Data Scientist (2015)
Getting a Data Science Education (2015)
How to Become a Data Scientist (2014)
About the author: Bill Vorhies is Editorial Director for Data Science Central and has practiced as a data scientist and commercial predictive modeler since 2001. He can be reached at: