Are you still wondering what a data scientist actually does in the industry?
Let me shed some light on this and share a few tips I have learned that can help you succeed in your job.
Despite the colossal amount of educational content on data science, it is hard to understand what the day-to-day of a professional data scientist looks like. Let’s address this blind spot in this article.
The role of the data scientist
As a data scientist, you are essentially an “interface” between 2 teams:
- The data engineers 👷
- The business people 👩🏻‍💼
Your role is to build bridges (aka data products) that translate abstract data into high-quality business decisions.
The more you talk to both worlds, the more effective your work will be. However, each of these 2 teams speaks a “slightly different language”.
Talking to the product lead
Business stakeholders, like Product Leads, are focused on setting and hitting clear business outcomes. You need to talk to them regularly to make sure you solve the right problem for the company.
For example, your Product Lead says things like:
👩🏻‍💼: “We want to increase user retention by 5% by the end of this quarter”.
Cool. You know WHAT you need to solve.
Let’s now move on to HOW you can solve it. For that, you need relevant, high-quality data. Without high-quality data, you cannot measure retention, and hence you cannot measure your progress. Without high-quality data, you will fail.
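To make "measuring retention" concrete, here is a minimal sketch. It assumes one possible definition (the share of users active in a given month who come back the following month) and a made-up activity log. Your company may define retention differently, so treat the names and numbers below as illustrative only.

```python
from datetime import date

# Hypothetical activity log: (user_id, day the user was active).
events = [
    ("u1", date(2024, 1, 3)),
    ("u1", date(2024, 2, 10)),
    ("u2", date(2024, 1, 15)),
    ("u3", date(2024, 1, 20)),
    ("u3", date(2024, 2, 2)),
    ("u4", date(2024, 2, 5)),
]


def active_users(events, year, month):
    """Return the set of users with at least one event in the given month."""
    return {user for user, day in events if (day.year, day.month) == (year, month)}


def monthly_retention(events, year, month):
    """Share of users active in (year, month) who are also active the next month."""
    next_year, next_month = (year + 1, 1) if month == 12 else (year, month + 1)
    cohort = active_users(events, year, month)
    if not cohort:
        return None  # retention is undefined for an empty cohort
    returned = cohort & active_users(events, next_year, next_month)
    return len(returned) / len(cohort)


january_retention = monthly_retention(events, 2024, 1)
print(f"January retention: {january_retention:.0%}")
```

Notice how much this number depends on the log itself: a missing or duplicated event changes who counts as "active", which is exactly why data quality comes first.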
Talking to data engineers
Data engineers take care of the infrastructure necessary to make high-quality data accessible to you. So they are your best ally at this stage.
Back-and-forth conversations between you and the data engineer are a MUST if you want to succeed as a data scientist.
Good conversations between data engineers and scientists result in concrete actions. For example:
- let’s add Facebook third-party data to enrich user profiles, or
- remove duplicate entries in the transactions table, or
- make the data available to frontend dashboards.
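As an illustration of the second action, here is a rough sketch of removing duplicate rows, using Python's built-in sqlite3 module and an in-memory table. The table layout and the rule for what counts as a duplicate (same user, amount, and date) are assumptions for the example. In a real warehouse the data engineers would usually apply this kind of fix upstream, with their own tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (user_id TEXT, amount REAL, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("u1", 9.99, "2024-01-03"),
        ("u1", 9.99, "2024-01-03"),  # exact duplicate of the row above
        ("u2", 4.50, "2024-01-04"),
        ("u2", 4.50, "2024-01-05"),  # same user and amount, different day: kept
    ],
)

# Keep the first row of each identical group and delete the rest.
conn.execute(
    """
    DELETE FROM transactions
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM transactions
        GROUP BY user_id, amount, created_at
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT user_id, amount, created_at FROM transactions ORDER BY rowid"
).fetchall()
print(remaining)
```

Agreeing on which columns define a duplicate is precisely the kind of thing worth discussing with the data engineer before anyone deletes anything.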
Once you have high-quality data and a clear business outcome, you are ready to do your data science magic.
The usual “data science” magic
Three ways of solving business problems using data are:
1. Build a dashboard with Tableau/Power BI
Build a user retention dashboard that the Product Lead can use to break down this metric by relevant user properties (e.g. geo, age). Dashboards are a great way to keep the conversation flowing between product people and you.
I personally recommend you start with this.
2. Run a data exploration
Explore the data yourself to find the low-hanging fruit (aka quick wins). For example, you might find that certain Facebook campaigns bring low-retention users, so you ping the marketing team to stop them. Quick and easy win. I love these.
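A quick exploration like this can be as simple as grouping users by acquisition campaign and comparing retention rates. The sketch below uses made-up records and hypothetical campaign names, and assumes "retained" means the user was still active 30 days after signing up.

```python
from collections import defaultdict

# Hypothetical per-user records: acquisition campaign and whether the user
# was still active 30 days after signing up.
users = [
    {"campaign": "fb_summer_sale", "retained": False},
    {"campaign": "fb_summer_sale", "retained": False},
    {"campaign": "fb_summer_sale", "retained": True},
    {"campaign": "fb_summer_sale", "retained": False},
    {"campaign": "organic", "retained": True},
    {"campaign": "organic", "retained": True},
    {"campaign": "organic", "retained": False},
    {"campaign": "fb_lookalike", "retained": True},
    {"campaign": "fb_lookalike", "retained": True},
    {"campaign": "fb_lookalike", "retained": False},
    {"campaign": "fb_lookalike", "retained": True},
]

counts = defaultdict(lambda: {"users": 0, "retained": 0})
for user in users:
    stats = counts[user["campaign"]]
    stats["users"] += 1
    stats["retained"] += user["retained"]  # True counts as 1, False as 0

retention_by_campaign = {
    campaign: stats["retained"] / stats["users"]
    for campaign, stats in counts.items()
}

# Sort from worst to best so the weakest campaigns show up first.
for campaign, rate in sorted(retention_by_campaign.items(), key=lambda kv: kv[1]):
    print(f"{campaign:<16} {rate:.0%}")
```

With real data you would want far more users per campaign before pinging the marketing team, since a handful of users can make any campaign look bad.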
3. Train a Machine Learning model
Sometimes you need to bring out the big guns and use Machine Learning. For example, you could build a churn-prediction model, to identify customers who are likely to churn. With this info, the marketing team could send offers to these users, and keep them active.
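To give a feel for what a churn-prediction model does, here is a toy sketch: a tiny logistic regression trained with plain gradient descent, using only the Python standard library. The two features and all the data points are invented for illustration. In a real project you would more likely reach for a library such as scikit-learn and train on far more data.

```python
import math

# Hypothetical training data:
# ((days since last login, sessions last month), churned?)
training = [
    ((1, 20), 0),
    ((2, 15), 0),
    ((3, 12), 0),
    ((5, 9), 0),
    ((14, 3), 1),
    ((20, 2), 1),
    ((25, 1), 1),
    ((30, 0), 1),
]


def scale(features):
    """Bring both features to a roughly 0-1 range so gradient descent behaves."""
    days, sessions = features
    return (days / 30, sessions / 20)


def predict_churn(weights, bias, features):
    """Logistic model: probability that a user with these features churns."""
    z = bias + sum(w * x for w, x in zip(weights, scale(features)))
    return 1 / (1 + math.exp(-z))


def train(data, learning_rate=0.5, epochs=2000):
    """Fit weights with batch gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in data:
            error = predict_churn(weights, bias, features) - label
            for i, x in enumerate(scale(features)):
                grad_w[i] += error * x
            grad_b += error
        weights = [
            w - learning_rate * g / len(data) for w, g in zip(weights, grad_w)
        ]
        bias -= learning_rate * grad_b / len(data)
    return weights, bias


weights, bias = train(training)

at_risk = predict_churn(weights, bias, (21, 1))  # long inactive, few sessions
engaged = predict_churn(weights, bias, (2, 18))  # recent and active
print(f"at-risk user: {at_risk:.2f}, engaged user: {engaged:.2f}")
```

The output is a churn probability per user, which is what the marketing team needs to decide who gets an offer.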
My advice: Machine Learning is very tempting. But often, you do not really need it. Try #1 (a dashboard) and #2 (a data exploration) before resorting to ML.
How to get there: project-based learning
Every professional data scientist needs to master a few skills to implement any of the 3 solutions mentioned above. The question is then, what are these skills?
Which skills do I need to master to become a professional data scientist?
In my opinion, any data scientist should know:
- SQL: There is no data scientist without data. And to query and extract the data for your projects you need to master SQL. Without it, you will be slow and dependent on data engineers.
- Python: The main programming language in data science and ML, thanks to its vast ecosystem of open-source libraries.
- Presentation and visualization: A data scientist is an “interface” between business stakeholders and data engineers. As such, you need to talk and present information in an actionable way, focusing on its business impact.
- Machine Learning (ML): ML is about building software from data. It is used to automate and improve operations and business decisions.
- (A bit of) Cloud services: Most companies have their infrastructure in the cloud (e.g. AWS, Google Cloud, or Azure). It is important that you feel comfortable working in a cloud environment and building solutions that integrate with cloud services.
- (A bit of) Deep Learning libraries: If you want to dive deep into computer vision or natural language processing, you need to understand neural networks and how to train and fine-tune them.
Most people follow a course-based approach, where they start many courses (and complete a fraction of them). This is not what works best for me.
Instead, I suggest you learn by following a project-based approach.
→ Pick a problem you care about.
→ Find data relevant to it.
→ Build a solution (any of the 3 mentioned above) and make it publicly accessible (e.g. on GitHub).
The only way to learn data science is by solving data science problems.