Getting Into Data Science Today? Here’s a Simple Framework You Should Follow.
From my personal experience!
When I started as a Data Scientist in 2017, things moved slower and the barrier to entry was lower. I took my first steps into this field, driven by a love for math and the spark from Andrew Ng's Machine Learning course.
With the foundation set through this course and a few Kaggle competitions, you could land a job. But that era is over! The space is evolving rapidly:
🚀 Tooling is expanding at lightning speed.
⚔️ Competition is fierce.
🔀 Roles are blurring between DS, ML, and software engineering.
Let’s be honest: just knowing the basics isn’t enough anymore!
❌ What used to be enough (but isn’t anymore)
The following can help build your foundation, but on their own they won’t get you the job:
1. University degree in Data Science
Great for building theoretical knowledge and understanding the mathematical foundations behind models, but on its own it is no longer enough.
2. Udemy/online courses
Good starting point, especially if you didn’t study ML/DS formally, but they won’t showcase your ability to solve real-world problems.
3. Titanic or Kaggle toy datasets
Fun for learning, but far away from the messy, ambiguous problems you’ll face on the job.
Meanwhile, even “junior” roles often require 3+ years of experience 🤯
So how do you get in?
Most people don’t get a golden ticket or a shortcut into DS/ML and have to dig their own way in. I was in your shoes once: no golden ticket, no shortcut.
What matters most, and what you should invest in on top of the basics, is working on a real-world DS problem. Here’s a step-by-step framework that worked for me👇
✅ A Hands-On Path to Real-World DS Skills
This is how you build practical, demonstrable experience even before landing a job.
DATA: Pick a domain you care about
Find a dataset that’s close to real-world complexity, in a domain that matters to you or fits your current background and interests. Trust me, this will open your eyes. 👀
Example data sources I used:
Stock market data from Yahoo Finance
2007-09 Panel Survey of Consumer Finances
Unlike Kaggle datasets, these aren’t pre-cleaned or tailored for a single ML task. They’re raw, unstructured, and ambiguous, just like the data you’ll deal with on the job.
💡 You’ll learn:
The pain of collecting and structuring messy data
That roughly 80% of a data scientist’s job is cleaning & wrangling
How to ask better questions of the data and develop visualisation & storytelling skills
That you can often only work with a sample rather than the full dataset, due to computing/resource constraints
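To make the last point concrete, here is a minimal sketch of one way to draw a fixed-size random sample from data that is too large to load at once. It uses reservoir sampling (Algorithm R), which is my choice for illustration and not necessarily what you will use in practice. The function name `reservoir_sample` and its parameters are assumptions made up for this example.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k rows chosen uniformly at random from a stream.

    The stream is read once and only k rows are held in memory,
    so the full dataset never has to fit in RAM.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample


if __name__ == "__main__":
    # Hypothetical stream standing in for rows read lazily from a large file.
    stream = ({"row_id": n} for n in range(1_000_000))
    subset = reservoir_sample(stream, k=1_000)
    print(len(subset))
```

Because the rows are consumed one at a time, the same function works on a generator that reads a file line by line.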
MODEL: Solve a problem that matters to you
Pick a problem you genuinely want to solve; it should already line up with the data you collected. What business impact can your model bring?
Example problem in my case:
I wanted to explore how market crashes affect investor behaviour
Build a model to predict an investor’s risk-aversion score
Then generate a balanced portfolio for that investor
💡 You’ll learn:
Feature engineering: oh, it’s so much fun drilling down to the 10 magical features that matter to your model among the 500 available in your dataset 🥲
Model experimentation (including hyperparameter tuning): pay close attention to the assumptions and mathematical structure of the models you use, and expect the pain of keeping track of every model you experimented with.
Iterating between data collection and modeling, as in real projects, until you are satisfied with the error metric or explainability.
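Keeping track of every experiment gets easier once each run is written down somewhere. Below is a minimal sketch of a home-made experiment log that appends one JSON line per run. The helper names (`log_run`, `load_runs`, `best_run`), the file layout, and the numbers in the usage part are all assumptions for illustration; dedicated tracking tools exist and may suit you better.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def log_run(log_path: Path, model: str, params: Dict[str, Any], metric: float) -> None:
    """Append one experiment (model name, hyperparameters, error metric) to a JSONL file."""
    record = {"model": model, "params": params, "metric": metric}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_runs(log_path: Path) -> List[Dict[str, Any]]:
    """Read every logged run back into a list of dicts."""
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(log_path: Path, lower_is_better: bool = True) -> Dict[str, Any]:
    """Return the run with the best metric (lowest by default, e.g. for RMSE)."""
    runs = load_runs(log_path)
    pick = min if lower_is_better else max
    return pick(runs, key=lambda r: r["metric"])


if __name__ == "__main__":
    # Made-up runs, only to show the call pattern.
    path = Path("experiments.jsonl")
    log_run(path, "ridge", {"alpha": 1.0}, metric=0.42)
    log_run(path, "gradient_boosting", {"max_depth": 3}, metric=0.37)
    print(best_run(path)["model"])
```

An append-only file keeps the history intact, so a bad run is never silently overwritten by a later one.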
DEPLOY: Make it live, even temporarily
Use a cloud provider’s free tier to deploy your app: be it AWS, GCP or Azure.
How I did it:
I created a minimalistic app (using Flask) for my solution and built CI/CD with GitHub Actions
I deployed the app to Cloud Run on Google Cloud Platform for 3 months (the first 3 months are/were free 😉)
💡 You’ll learn:
CI/CD basics
APIs, load balancing, cloud metrics monitoring, etc.
Serving models to actual users
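Serving a model mostly comes down to validating a request, calling the model, and returning a response. Here is a minimal, framework-agnostic sketch of that logic so it runs without any dependencies; it is not my original app. The field names, the `handle_predict` function, and the scoring rule are all hypothetical, and the scoring rule is a made-up placeholder for a trained model.

```python
import json
from typing import Any, Dict, Tuple

# Hypothetical input fields; a real model would define its own.
REQUIRED_FIELDS = ("age", "equity_share")


def score_risk_aversion(features: Dict[str, float]) -> float:
    """Placeholder standing in for a trained model's predict call."""
    # Made-up linear rule, only here so the example runs end to end.
    raw = 0.5 + 0.005 * (features["age"] - 40) - 0.4 * features["equity_share"]
    return max(0.0, min(1.0, raw))  # clamp the score to [0, 1]


def handle_predict(body: str) -> Tuple[int, Dict[str, Any]]:
    """Turn a raw JSON request body into an (HTTP status, response dict) pair."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, {"error": "body must be valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    missing = [f for f in REQUIRED_FIELDS if f not in payload]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    try:
        features = {f: float(payload[f]) for f in REQUIRED_FIELDS}
    except (TypeError, ValueError):
        return 400, {"error": "all fields must be numeric"}
    return 200, {"risk_aversion": score_risk_aversion(features)}


if __name__ == "__main__":
    print(handle_predict('{"age": 35, "equity_share": 0.6}'))
    print(handle_predict('{"age": 35}'))
```

In a Flask app, a route could pass the request body to a function like this and return the resulting dict and status code. Keeping the logic outside the route also makes it easy to unit test.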
DOCUMENT: Write a clean README
Explain your steps and how to run the project locally or access it online. Build a simple architecture diagram.
💡 You’ll learn:
How to communicate your solution
How to build a high-level architecture diagram
Clear, concise documentation = real-world superpower
GITHUB: Open-source it
Add Python best practices such as the Black formatter, a few unit tests with pytest, an environment file, a requirements file, and a pre-commit hook, then make your project public. Improve it over time, and link it to your CV and LinkedIn.
💡 You’ll learn:
Git workflows
Python packaging and best practices
How to collaborate incrementally
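As an idea of what "a few unit tests" can look like, here is a small sketch: one data-cleaning helper and two tests written in the plain-assert style that pytest collects (functions whose names start with `test_`). The helper `parse_amount` and its behaviour are assumptions invented for this example, not code from my project.

```python
from typing import Optional


def parse_amount(raw: str) -> Optional[float]:
    """Convert a messy currency string such as ' $1,234.50 ' to a float.

    Returns None when the value cannot be parsed.
    """
    cleaned = raw.strip().replace("$", "").replace(",", "")
    if not cleaned:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def test_parse_amount_strips_symbols():
    assert parse_amount(" $1,234.50 ") == 1234.5


def test_parse_amount_returns_none_for_garbage():
    assert parse_amount("n/a") is None
    assert parse_amount("") is None


if __name__ == "__main__":
    # The tests are plain functions, so they can also be called directly.
    test_parse_amount_strips_symbols()
    test_parse_amount_returns_none_for_garbage()
    print("all tests passed")
```

In a real repository the tests would usually live in their own file, for example under a `tests/` folder, and run with the `pytest` command.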
REPEAT: Start from step 1 again with a brand new data source
✅ Other Tips That Helped Me Break In
1. Start from within:
Start within your own company; you don’t always have to look outside for this role. If there is a DS team, approach them: make connections, attend their regular guild sessions, connect with the team lead, show your knowledge and your hunger for the domain, and share your repository. Trust me, an opportunity could pave its way directly to you.
2. Build your online presence:
Post your learnings on LinkedIn, connect with others in the field, join conversations.
3. Keep showing up:
Keep learning and exploring, and turn your spark into momentum. Attend workshops and meetups, make connections, and keep at it. Your time will come!
✅ Want to really stand out?
Once you have a handle on the fundamentals, get some knowledge of:
A/B testing
Monitoring & alerting
Dockerizing models
Building simple E2E pipelines
Load testing & handling high request volumes
Batch vs. real-time predictions
These are the skills that turn you into a production-ready Data Scientist.
So where should you start? Visit github.com and create your first repository NOW!
“It takes time, persistence, and patience, but if you stay curious and keep showing up, you’ll get there!”
Stay tuned: I’ll be sharing one of my pet data science projects from my university days, which you can use as inspiration and customize for your own journey. 🙌
P.S. If you’re looking to get the AWS Certified Machine Learning certification, here’s a good place to start: