Background
I am a self-taught data scientist with roughly five years of experience as a data analyst and another five as a data scientist. My math skills are above average but not exceptional. I have a bachelor’s degree in mechanical engineering, and I have worked alongside several data scientists, four of whom hold PhDs while the rest hold master’s degrees. Even though I may not have the highest natural ability, I have managed to be the second most productive data scientist in my group.
Overcoming Misconceptions
Frequently, I notice members of this subreddit asking about the essential knowledge required to begin in machine learning or data science. The responses typically involve lengthy lists of courses and subjects to master, which can be quite off-putting. Having experienced both sides, I find this attitude prevalent and quite irritating. While I believe that a strong foundation in math is beneficial, the standards for what you need to succeed are often blown out of proportion. Many who advocate for these extensive prerequisites may be struggling with their own insecurities.
Back during my mechanical engineering studies, I completed multiple calculus courses, a linear algebra class, and a course in probability. However, a decade passed before I ventured into data science, and I didn’t retain much of that knowledge. This community led me to believe that I needed to be a master of numerous topics to start my career in data science.
At the beginning of my journey, I enrolled in courses covering coding, calculus, statistics, and linear algebra. While I performed decently, I often felt overwhelmed and struggled to retain information from earlier courses. Balancing a full-time job with learning seemed unmanageable. The constant message from online resources that I had to master all of this first made me hesitant to start tackling real-world problems.
What is Actually Necessary
In truth, for the majority of practical scenarios, you only need a fundamental understanding of these subjects. A specific project may call for deeper knowledge, but that need is particular to the project and can be met when it comes up.
For example, in calculus, you don’t need to perform complex integrals by hand. It is enough to grasp that a derivative is the slope of a function and that a zero derivative marks a candidate local maximum or minimum. In statistics, it’s vital to understand concepts like p-values, but you don’t need to memorize every test.
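To make the derivative idea concrete, here is a minimal sketch in plain Python. The `loss` function and the `slope` helper are made up for illustration; they are not from any particular library. It just shows the slope being negative on the way down, positive on the way up, and roughly zero at the bottom.

```python
def slope(f, x, h=1e-5):
    """Approximate the derivative of f at x with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def loss(x):
    # Hypothetical loss curve whose minimum sits at x = 3.
    return (x - 3) ** 2 + 1


left = slope(loss, 1.0)    # negative: the curve is heading downhill here
right = slope(loss, 5.0)   # positive: the curve is heading uphill here
bottom = slope(loss, 3.0)  # approximately zero: the curve is flat at the minimum

print(left, right, bottom)
```

That is most of the intuition behind how optimizers work: follow the slope downhill until it flattens out. One caveat worth knowing is that a zero slope only flags a candidate; a function like `x ** 3` is flat at `x = 0` without having a minimum or maximum there.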
In linear algebra, being able to interpret what your data and results mean matters more than solving for eigenvectors by hand. For probability, a general grasp of the basic concepts is more valuable than mastering intricate theory.
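For the p-value concept mentioned above, I find a permutation test is the easiest way to see what the number means without memorizing any named test. This is a rough sketch using only the standard library; the `permutation_p_value` function and the `control` and `variant` numbers are invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often a random relabeling of the pooled data gives a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Made-up measurements for two groups.
control = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
variant = [12.9, 13.1, 12.7, 13.0, 12.8, 13.2, 12.6, 13.3]

p = permutation_p_value(control, variant)
print(p)
```

A small value says a gap this large would rarely show up if the group labels were meaningless. That is the whole concept; the named tests are mostly faster ways of getting a similar answer under extra assumptions.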
You do not need to be an expert software developer; being able to write decent code and gradually improve it is adequate. You also do not need to implement every algorithm yourself; a broad understanding of how each one works usually suffices.
The most critical skill to develop is learning how to clearly define problems, as well as how to assess and interpret the effectiveness of your solutions. If you can accurately define your problem and assess your algorithms with suitable metrics, you can add value to your projects. Often, people seek advice on which feature engineering methods to utilize or which algorithm to choose, but the real answer is usually to experiment, evaluate, and adjust as necessary.
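The "experiment, evaluate, and adjust" loop can be sketched in a few lines. This is a toy example under stated assumptions: the data is synthetic, the two candidate models (`fit_baseline` and `fit_line`) are hypothetical stand-ins for whatever you are actually comparing, and mean absolute error is just one possible metric. In a real project, the metric should come from how you defined the problem.

```python
import random
from statistics import mean


def mean_absolute_error(actual, predicted):
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def fit_baseline(xs, ys):
    # Always predict the training average; any real model should beat this.
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    # Ordinary least squares for a single feature.
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


# Synthetic data: a noisy straight line (an assumption for illustration).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Hold out data the models never see during fitting.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

candidates = {"baseline": fit_baseline, "line": fit_line}
scores = {}
for name, fit in candidates.items():
    model = fit(train_x, train_y)
    scores[name] = mean_absolute_error(test_y, [model(x) for x in test_x])

best = min(scores, key=scores.get)
print(scores, best)
```

The point is the shape of the loop, not the models: keep a dumb baseline, score every candidate on held-out data with the same metric, and let the numbers settle the "which algorithm should I use" question.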
Despite industry claims, very few individuals possess expertise in all areas. Rather than trying to master everything beforehand, it’s essential to learn continuously on a project-by-project basis. Gaining practical experience leads to growth and unexpected improvements. Remember to focus on the fundamentals and get started; waiting to become an expert in every domain can stall your progress.
My effectiveness comes from five years spent as a data analyst. Through extensive exploration of the company’s data, I became adept at visualizing issues clearly, which often leads to solutions. My diverse assignments across multiple departments, such as marketing and technical support, have allowed me to understand the broader context of the company better than many other data scientists. I have accumulated essential insights and strategies that proved beneficial.
Guidance for Self-Taught Individuals
Having participated in the hiring process a few times, I recognize the challenges in the job market. Securing a data science position solely through online courses and side projects can be tough unless your projects are outstanding. However, I believe anyone can follow a similar trajectory to mine.
I began by learning basic SQL and visualization tools, completing several projects before landing a data analyst role at a mid-sized company (100-200 employees), where data scientists and analysts reported to the same supervisor. While the entry-level barriers may be higher now than they were a decade ago, they are not insurmountable. My advice is to look for roles that align with your own background and to highlight that connection on your resume, however small it is.
I did well as a data analyst and told my supervisor that I wanted to move into data science. He then gradually involved me in data science projects, where I contributed by running queries and building performance dashboards.
He informed me that I needed to learn coding and machine learning independently to grow into a DS role. As I sought to transition, I encountered numerous challenges due to attempting to master an overwhelming number of topics at once, as suggested by various online platforms.
Ultimately, I enrolled in DataQuest, thinking I would acquire all the necessary skills. The course won’t make you proficient on its own, but its structured approach got me coding every day, which is crucial for learning a programming language.
Once I achieved basic coding proficiency, I embarked on an independent project, an essential step in my development. The project was meaningful to me, and that sustained interest kept me committed to finishing it. I looked up answers to problems as I ran into them, which is how real-world work goes too.
After six months of training, and with a substantial background as an analyst, I persuaded my boss to assign me a data science project, which I led with guidance from an experienced data scientist. My success on this project led to my promotion to data scientist, with growing independence on subsequent assignments. We work collaboratively and frequently share our progress so that everyone improves. I have had two promotions since I began working in data science.
You might navigate this journey faster than I did; I spent considerable time being indecisive. Utilizing resources like ChatGPT can significantly accelerate your learning, but it’s essential to apply critical thinking when using them.
Summary: This is my journey, and the main lesson is that deeper knowledge is a continuous pursuit. You do not need to master every domain before starting or excelling in this field. Understanding the overarching concepts is vital, but going deeper should happen as the need arises. Learning sticks better when it is applied directly to real-world problems.
Edit: To clarify, I am not saying you shouldn’t go deeper into a topic when you need to. I consistently seek new knowledge, particularly in niche areas. My primary message is that while depth is beneficial, a foundational understanding is enough to begin or excel in data science.
Edit #2: To clarify further, I do not claim that having advanced degrees is unimportant. While having a master’s or PhD is advantageous, this advice is directed toward those who wish to break into data science while managing full-time jobs without incurring substantial debt or taking years for additional degrees.