Jul 23, 2020

Stay Organized and Work Smart - Habits to Increase Productivity

A glimpse of the way I work

Depending on where you work and who you work with, your responsibilities and workload as a data scientist can vary drastically. Almost everyone I have spoken to in this industry has developed ways to make their everyday work a little easier.

It could be a concept checklist, an algorithm cheat sheet, or a collection of data cleaning and transformation scripts. The point is to free you to focus on the important problems your work is actually trying to solve.

Different Tasks, Same Solving Process

Build your data science process guideline

If you work in a dynamic, fast-paced environment, your work will change from time to time. Last month you worked on Sell-out Forecasting, and the month after, you worked on Market Basket Analysis. Last week your job was to clean messy data and try to make sense of it; this week you are finally building your first machine learning model.

No matter which project you are assigned to, the underlying data science process stays the same. You do not want to waste time working out what to do every time a new project kicks off. Instead, keep general documentation, with several examples, that tells you what needs to be done at each stage of a project.

Furthermore, if you work in a relatively small team and are expected to build data science solutions end-to-end by yourself, this documentation will help enormously in keeping your work on track.

An example of a process guideline I built in the past is shown below.

You should build your own version, drawing on your past work and experience. What it contains will also depend largely on the software and tools you choose.

Figure: Data Science Project from Scratch, by Adam
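As a rough illustration, a guideline like this can also be kept in code, so that it tells you which stage comes next on a new project. This is a minimal sketch. The stage names and tasks are hypothetical placeholders and do not reproduce the figure. `Stage`, `new_project_checklist`, `next_stage` and `render` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    """One stage of a project guideline and the tasks it reminds you of."""
    name: str
    tasks: List[str]
    done: bool = False


def new_project_checklist() -> List[Stage]:
    """Return a fresh copy of a (hypothetical) guideline for a new project."""
    return [
        Stage("Define the problem", ["Agree on the business question", "Choose a success metric"]),
        Stage("Collect data", ["List data sources", "Check access and refresh frequency"]),
        Stage("Clean and explore", ["Handle missing values", "Profile distributions"]),
        Stage("Model", ["Set up a baseline", "Train and tune candidate models"]),
        Stage("Evaluate and communicate", ["Validate on held-out data", "Summarize findings"]),
    ]


def next_stage(stages: List[Stage]) -> Optional[Stage]:
    """Return the first stage not yet marked done, or None once every stage is finished."""
    return next((stage for stage in stages if not stage.done), None)


def render(stages: List[Stage]) -> str:
    """Format the guideline as a plain-text checklist."""
    lines = []
    for stage in stages:
        lines.append(f"[{'x' if stage.done else ' '}] {stage.name}")
        lines.extend(f"    - {task}" for task in stage.tasks)
    return "\n".join(lines)


project = new_project_checklist()
project[0].done = True
print(next_stage(project).name)         # Collect data
print(render(project).splitlines()[0])  # [x] Define the problem
```

Building a fresh list for each project keeps one project's progress from leaking into the next.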

Same Task, Different Data

Build a library of useful functions and codes

It is common to want a similar outcome from different data. A good habit is to recycle your "old", known-good code: keep it in separate notebooks and reuse it the next time you encounter a similar question.

In a past project demo on topic modeling, I compared the results of different topic models on customer review data from different years. To build a single topic model with the package called "Gensim", I had to go through several preparation steps: extracting the text-only data, tokenizing, cleaning, lemmatizing, building a vocabulary of all words, and transforming the documents into bag-of-words representations. It is probably OK (though not good practice) to copy and rerun the same lines of code several times to get the desired output. A better way is to define a function that includes every step and save it in my library of useful code.
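Here is a dependency-free sketch of such a function. It assumes each record is a dict whose review text sits under a hypothetical `"review"` key. The stop-word list is deliberately tiny and purely illustrative. Lemmatization is passed in as a function (identity by default) because it normally needs a library such as spaCy or NLTK. All function names are illustrative, not the author's code. In an actual Gensim workflow, `gensim.corpora.Dictionary(docs)` would take the place of `build_vocabulary`, and `dictionary.doc2bow(doc)` would produce the `(token_id, count)` pairs.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Hypothetical, deliberately tiny stop-word list; a real project would use a fuller one.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "it", "to", "of", "was", "this"})


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into purely alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def clean(tokens: Iterable[str], min_len: int = 3) -> List[str]:
    """Drop stop words and tokens shorter than min_len characters."""
    return [t for t in tokens if t not in STOP_WORDS and len(t) >= min_len]


def build_vocabulary(docs: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Give every distinct token an integer id, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bow(doc: Sequence[str], vocab: Dict[str, int]) -> List[Tuple[int, int]]:
    """Turn a token list into sorted (token_id, count) pairs, ignoring unknown tokens."""
    return sorted(Counter(vocab[t] for t in doc if t in vocab).items())


def prepare_corpus(
    records: Sequence[dict],
    text_field: str = "review",
    lemmatize: Callable[[str], str] = lambda token: token,
) -> Tuple[Dict[str, int], List[List[Tuple[int, int]]]]:
    """Run every preparation step in one call so each year's data is treated identically."""
    texts = [record[text_field] for record in records]          # extract text-only data
    docs = [[lemmatize(t) for t in clean(tokenize(text))] for text in texts]
    vocab = build_vocabulary(docs)                                # vocabulary of all words
    corpus = [to_bow(doc, vocab) for doc in docs]                 # bag-of-words per document
    return vocab, corpus


reviews_2019 = [
    {"review": "The delivery was fast and the delivery box was intact."},
    {"review": "Fast shipping, great price."},
]
vocab, corpus = prepare_corpus(reviews_2019)
print(corpus)  # [[(0, 2), (1, 1), (2, 1), (3, 1)], [(1, 1), (4, 1), (5, 1), (6, 1)]]
```

Comparing another year then becomes a single call such as `prepare_corpus(reviews_2020)`, rather than another copy of every step.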

Furthermore, for each type of machine learning problem, you can keep a separate notebook that documents the important steps and working examples, saving you time when handling data.
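A forecasting notebook, for example for Sell-out Forecasting, might keep a time-ordered train/test split as one of its reusable snippets. A random split would leak future observations into training. This is a minimal sketch. It assumes each row is a dict with a sortable date field. `time_based_split`, the `"date"` key, and the sales figures are hypothetical.

```python
from datetime import date
from typing import List, Sequence, Tuple


def time_based_split(
    rows: Sequence[dict], date_key: str = "date", test_fraction: float = 0.2
) -> Tuple[List[dict], List[dict]]:
    """Sort rows by date and hold out the most recent fraction as the test set."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(rows, key=lambda row: row[date_key])
    n_test = max(1, round(len(ordered) * test_fraction))  # hold out at least one row when there is data
    return ordered[:-n_test], ordered[-n_test:]


# Hypothetical monthly sales, deliberately out of order.
sales = [
    {"date": date(2020, m, 1), "units": u}
    for m, u in [(3, 120), (1, 100), (5, 90), (2, 110), (4, 130)]
]
train, test = time_based_split(sales)
print([r["date"].month for r in train])  # [1, 2, 3, 4]
print([r["date"].month for r in test])   # [5]
```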


Different Tasks, Different Data

Actively summarize your work, make it a habit

This time, both the tasks and the data are different. Eventually, though, your research ability and problem-solving skills as a data scientist, including knowing when to ask others for help, will get you the solution you want.

Once you have completed your work, spend some time reflecting on what you did and taking notes on the problems you worked hard to solve. Keep adding new things to your "library", and review it once in a while even if you don't use everything in it. This process will help you get better at what you do and make you more reliable to your colleagues and your team.

Adam Qin

Some rights reserved

Except where otherwise noted, content on this page is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license.

OnlyAdam's

This site is intended to host my data science project demos and sometimes I share my thoughts too.