June 08, 2019

Prediction Machine Eyes

Some Thoughts on Transitioning Into Data Science

I have spoken with a number of individuals over the past year who, similar to me, did a business undergrad and are interested in ‘getting into data science’. What I often get asked is my view on various programs that are offered online or by universities, and what I might recommend for someone looking to build a skill-set that would enable them to transition into a data science role.

I do not pretend to be an expert on data science in industry (I am just in school, after all). That being said, over the past couple of years I have spoken with numerous individuals within industry, both those who are relatively junior and those who have hiring authority. I also have my own academic experience to draw upon. With this in mind, I offer the following thoughts:

Robust Skill-Sets

Ask yourself what type of skill-set you want to get out of an educational program (be it from school or self-study). If you want to build a skill-set that is robust over time, I firmly believe it must include a solid mathematical component. Learning how to program in the hippest new programming language is not particularly hard, and I feel it can be picked up as needed (particularly for data science tasks, which require less programming knowledge than software engineering). Understanding math is hard, and requires a huge amount of time and mental effort. Neural nets may be yesterday’s fad in a few years. However, what I am certain of is that whatever the next hot machine learning/data analysis/quantitative ‘thing’ is, it is going to require a heavy dose of math and statistics to understand.

Paying the Price

In my time at UofT I have come to strongly believe there is no shortcut to understanding statistics or math. I have also learned that it is more or less impossible to fake understanding these things. I have heard from individuals in industry that within data science ‘imposters’ are rife (i.e. people who use buzzwords but do not understand what those buzzwords actually mean). If you are interested in developing a skill-set that is valuable for data science, understand that it requires a tremendous amount of time and effort. I think if someone lacks much of a math background from their undergraduate studies (business calculus does not count as a math background) they would be hard-pressed to make a successful transition without returning to school (i.e. a real brick-and-mortar school) to take classes.

University Programs

If you lack a math background I would suggest taking a year to fill that in. I did this as a non-degree student at UofT and what I learned in one year served as a solid foundation for my Master’s in Statistics.

Your ultimate aim should be enrolling in a 1yr-2yr Master’s program offered through a Statistics/Computer Science/Engineering department. I have a very high opinion of the Master of Science in Applied Computing (MScAC) Concentration in Data Science offered by UofT since it combines 1yr of courses with an internship. A program with the potential to get a full-time offer from the firm you intern at is pretty great in my opinion.

One thing that always comes up is the value of a Master’s in Analytics/Data Science/AI/ML from a business school. These programs have popped up in huge numbers in recent years. If you have an undergraduate degree in math/statistics/engineering, I think these programs serve as a means to build a portfolio and land a job, since the business schools are very good at getting companies to recruit their graduates. For someone without a degree in these fields I do not see much value in them, since they will teach you algorithms driven by math you will be unable to understand. If you are a mid-career individual who is just looking to learn more about the field (but do not aspire to be a full-blown data scientist) I can see some value in these programs, although the price tag seems quite steep (~$60k for some of them) for what you get, unless of course your firm is the one paying.

What Are These Algorithms Doing?!?!

I have heard from numerous individuals in industry that they use algorithms and do not understand what is going on ‘under the hood’, and wish they did! I think this speaks to the value that comes from studying math and statistics rigorously, as the concepts you learn in these classes serve as the basis of these algorithms.

Programming

You would be well advised to take an online Python class (this one is good) if you do not know much programming already. Also, familiarize yourself with pandas (specifically filtering and joining) by working on a dataset. Learn Python, and not R, since from my discussions with individuals in industry it appears Python is used more-or-less everywhere.
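To give a rough idea of what ‘filtering and joining’ means in pandas, here is a minimal sketch. The two tables, their column names, and the 100 cutoff are all made up for illustration; a real dataset will look different, but the two operations (boolean indexing and `merge`) are the ones worth getting comfortable with.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12, 13],
    "customer_id": [1, 1, 2, 4],
    "amount": [250.0, 40.0, 125.0, 300.0],
})

# Filtering: keep only the rows where a boolean condition holds.
large_orders = orders[orders["amount"] >= 100]

# Joining: attach customer attributes to each order via the shared key.
# how="left" keeps every order, even if no matching customer exists
# (the missing region then shows up as NaN).
joined = large_orders.merge(customers, on="customer_id", how="left")

print(joined)
```

Note that the order from customer 4 survives the join with a missing region, because that customer is not in the customers table. Switching to `how="inner"` would drop it instead, and noticing that kind of difference is exactly what practicing on a dataset teaches you.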

I hope this is helpful. Again, I am no expert, so make sure to consult lots of sources in figuring out what is best for you!