In this article, I share 21 pieces of advice I’ve learned from other data scientists, as well as my own experiences over the last few years. Depending on how advanced your career is, some of these tips will appeal to you more than others. “It takes time” may not be very relevant to someone just starting.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a broad range of application domains. Data science is related to data mining, machine learning and big data.
Data science is a “concept to unify statistics, data analysis, informatics, and their related methods” in order to “understand and analyse actual phenomena” with data. It uses techniques and theories drawn from many fields within the context of mathematics, statistics, computer science, information science, and domain knowledge. However, data science is different from computer science and information science. Turing Award winner Jim Gray imagined data science as a “fourth paradigm” of science (empirical, theoretical, computational, and now data-driven) and asserted that “everything about science is changing because of the impact of information technology” and the data deluge. (Source: https://en.wikipedia.org/wiki/Data_science)
Data Scientist Tips: The simplest solution is often the best solution.
Being a data scientist doesn’t mean solving every problem with a machine learning model. If a CASE WHEN query is good enough to get the job done, just use it. Don’t build a 10-layer neural network when linear regression is enough. A simple solution has many advantages, such as faster implementation, less technical debt, and easier maintenance.
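As a minimal sketch of the “CASE WHEN is good enough” idea, the snippet below segments customers with a plain SQL rule instead of a model. The table, column names, and spend thresholds are made up for illustration, and an in-memory SQLite database stands in for whatever warehouse you actually use.

```python
import sqlite3

# Hypothetical order data: table, columns, and thresholds are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, total_spend REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 1200.0), ("ben", 300.0), ("cy", 40.0)],
)

# A plain CASE WHEN rule assigns each customer a segment, no training required.
query = """
SELECT customer,
       CASE
           WHEN total_spend >= 1000 THEN 'high'
           WHEN total_spend >= 100  THEN 'medium'
           ELSE 'low'
       END AS segment
FROM orders
ORDER BY customer
"""
segments = dict(conn.execute(query).fetchall())
conn.close()

print(segments)
```

If a rule like this already meets the business need, it is usually easier to explain, test, and maintain than a trained model.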
Data Scientist Tips: Make a conscious effort to discover and explore new libraries and packages regularly.
It’s easy to stick with the familiar, but new tools are created for a reason: they fill gaps left by existing ones. By taking the time to explore new libraries and packages, I’ve found some great tools that save me a lot of time. Here are some of them:
- Gradio is a Python package that allows you to create and deploy your web application for machine learning models in just 3 lines of code. It serves the same purpose as Streamlit and Flask, but I’ve found it much quicker and easier to deploy mockups.
- Pandas Profiling is another package that automatically performs exploratory data analysis and integrates it into reports. This is very useful when working with small datasets. The best part is that it only requires one line of code.
- Kedro is a development workflow tool that allows you to create portable ML pipelines. It applies software engineering best practices to your code to make it reproducible, modular, and well-documented.
Data Scientist Tips: Being efficient doesn’t mean rushing through critical steps.
Some steps cannot be rushed. In particular, you should take the time to deeply understand the business problem you are trying to solve and the data you are working with.
Data Scientist Tips: Metrics are more important than the model itself.
This point is related to the previous one in the sense that you need a really good understanding of the problem you are trying to solve. Machine learning is a fancy word for statistics and optimization, so in addition to understanding the problem, you need to understand which metric you want to optimize. For example, a model can report very high accuracy and still be useless for anomaly detection: when anomalies are rare, a model that labels everything as normal scores well on accuracy while catching nothing.
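The accuracy trap can be shown with a few lines of plain Python. This is a sketch on synthetic labels (10 anomalies in 1,000 samples, numbers chosen only for illustration) with a “model” that always predicts the normal class.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model flagged as positive."""
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Synthetic, heavily imbalanced labels: 1 = anomaly, 0 = normal.
y_true = [1] * 10 + [0] * 990
# A useless "model" that never predicts an anomaly.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

Accuracy comes out at 0.99 while recall on the anomaly class is 0.0, which is why a metric such as recall or precision is usually a better optimization target for this kind of problem.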
Data Scientist Tips: Your work depends on your ability to communicate it.
People fear and avoid what they don’t understand.
You must be able to explain the terminology and your modeling techniques in a way that a layperson can understand. If you’ve taken the time to create a great model, take the extra time to communicate it effectively so people can appreciate your efforts.
Data Scientist Tips: Learn the basics, especially statistics.
Data science and machine learning are essentially modern versions of statistics. Learning statistics first makes it easier to learn machine learning concepts and algorithms.
Data Scientist Tips: Know the parameters of the problem you are trying to solve.
This is best illustrated with an example.
For one of my projects, I needed to develop a model to predict whether a product should go through an RMA (return merchandise authorization). At first, I thought the input consisted of all products, which made it look like an anomaly detection problem. It wasn’t until I understood the business needs and how the model would be used that I realized that all of the model’s inputs were products with an RMA already issued (products whose customers had emailed about an issue). This greatly improved the balance of our data and saved us a lot of time.
Data Scientist Tips: Don’t underestimate the power of SQL.
SQL is a universal data language. It is arguably the most important skill to learn for any data-related profession, whether you’re a data scientist, data engineer, data analyst, or business analyst. SQL is not only important for creating pipelines, fetching data, and wrangling data; you can now build machine learning models using SQL queries. BigQuery ML lets you do just that.
Data Scientist Tips: Treat data science like a team sport.
One of the biggest benefits of being a data scientist is the level of autonomy you are given. However, this can easily go awry if you are unwilling to seek advice, help, or feedback from others.
Despite its autonomy, data science is a team sport. Advice and feedback from multiple stakeholders, including end users, domain experts, and data engineers, should be considered.
Data Scientist Tips: Don’t waste your time memorizing everything.
Trying to memorize everything is too much, and it’s a big waste of time. You’re better off practicing how to google your questions so that you get the answers you need. I also keep a Google Spreadsheet of very useful links that I refer to often. I like to include links to cheat sheets, crash courses, and commonly googled questions (e.g. a regex for email addresses).
Data Scientist Tips: Deploy fast, iterate fast, and get continuous feedback.
It is important to stay in constant communication with other stakeholders to keep them updated and to get feedback on your thought process and the assumptions you are making about your model. Otherwise, you may end up with a model that doesn’t solve the problem at hand.
I use Gradio to create a web UI for each iteration of the model when sharing it with stakeholders, especially non-coders. I find Gradio very useful for the following reasons.
- You can interactively test different inputs to your model.
- You can get feedback from end users and domain experts (who may not be programmers).
- It requires only 3 lines of code to implement and can be easily shared via a public link.
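To make the points above concrete, here is a minimal sketch of wrapping a prediction function in a Gradio interface. The `predict` function is a hypothetical stand-in for a real model, and `build_demo` is a helper name introduced here for illustration; only `gr.Interface` and `launch` come from Gradio itself.

```python
def predict(text: str) -> str:
    """Hypothetical stand-in for a real model: labels text by word count."""
    return "long" if len(text.split()) > 5 else "short"


def build_demo():
    """Wrap the prediction function in a simple text-in, text-out web UI."""
    # Imported lazily so predict() can be used and tested without Gradio installed.
    import gradio as gr

    return gr.Interface(fn=predict, inputs="text", outputs="text")
```

Calling `build_demo().launch(share=True)` starts the UI and asks Gradio for a temporary public link that you can send to stakeholders, assuming the `gradio` package is installed.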
Data Scientist Tips: View the entire project.
You are responsible not only for creating a model but also for implementing it. Gone are the days when a data scientist could hand over a messy Jupyter notebook to an engineering team for implementation. Today, the role is more like data scientist slash engineer slash product manager.
Data Scientist Tips: It’s all a sales pitch.
As a data scientist, you’re always selling, whether it’s yourself, a new idea, or a model you’ve built. As with the earlier tip on communication, you should be able to convey the business value of every idea, model, and project you undertake.
Data Scientist Tips: Create a sustainable schedule for studying consistently.
If you want to learn, learn right. You may have heard of the forgetting curve. Simply put, if you want to retain new information, you need to study data science consistently and practice what you learn.
Be honest with yourself and set a schedule that you can stick to. Consistency is key.
Data Scientist Tips: Learn how to use Git and GitHub.
Learning software engineering best practices goes a long way. Version control is one of the most important of these practices, especially since nearly every company uses it.
Data Scientist Tips: Learn by doing.
You can learn and retain more knowledge and skills not only by studying but by practicing. Just like you do your homework after learning a new concept in school, you should always apply what you learn to your projects.
Data Scientist Tips: Stay in touch with what’s going on.
As you explore new tools and libraries, it’s important to keep up with the latest in data science. This keeps your skills and tools as up-to-date as possible.
I like to do this by reading publications, watching YouTube videos, and reading company blogs such as Airbnb, Uber, Google, and Facebook.
Data Scientist Tips: Learn to use divergent and convergent thinking.
This is a very useful technique in data science because it helps you make sure you’ve exhausted your options. Divergent thinking means considering multiple solutions to a given problem, while convergent thinking means narrowing the options down to one solution. This is especially useful when doing EDA and choosing which model or algorithm to use.
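Applied to model selection, the idea might look like the sketch below: first list several candidate models (diverge), then score them all on held-out data and keep one (converge). The data points and the two candidates are invented for illustration, and mean squared error is just one possible selection criterion.

```python
def mean_baseline(train):
    """Candidate 1: always predict the mean of the training targets."""
    ys = [y for _, y in train]
    avg = sum(ys) / len(ys)
    return lambda x: avg


def linear_fit(train):
    """Candidate 2: ordinary least squares line fitted to one feature."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, data):
    """Mean squared error of a fitted model on (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Illustrative data only.
train = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]
holdout = [(5, 10.1), (6, 11.9)]

# Diverge: keep several candidate approaches on the table.
candidates = {"mean_baseline": mean_baseline, "linear_fit": linear_fit}

# Converge: score every candidate on held-out data and keep the best one.
scores = {name: mse(fit(train), holdout) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

Keeping the candidates in a dictionary makes it cheap to add another idea later without changing the selection step.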
Data Scientist Tips: Start a career document.
This is something I hadn’t heard of until my friend Udara wrote about it. It’s essentially a journal or diary for your career. Unlike resumes that are aimed at employers, career documents allow you to look back and reflect.
Data Scientist Tips: Learning how to set expectations can make a big difference to your career success.
Under-promise and over-deliver.
This is especially important for data scientists, because there is no natural limit to how much time you can spend building a model. You can create a mediocre model quickly using an automated ML library, or a near-perfect model that takes months to complete. Whatever you decide to do, manage expectations so that your stakeholders are not disappointed. Among other things, this means managing expectations about timelines and model performance.
Data Scientist Tips: Find a mentor that you respect and who can help you.
One of the greatest things that happened to me in my career was finding a mentor who was extremely knowledgeable and genuinely cared about my success. I think I learned twice as much from him as I normally would.
Thank you for reading! I hope you found at least one of these tips useful. I truly believe that this advice has been very helpful in my career. As always, I wish you the best of luck.