Your complete guide for data science bootcamp II

Learning when to stop learning

First of all, I apologize for taking this long to come back. With a new job and the whole COVID-19 situation, I took a small break, but I will try my best to come back here and write as often as I can.

I hope you have read my first article on Data Science Bootcamp. Last time, I gave a high-level view of a data science Bootcamp based on my own experience attending WeCloudData’s Immersive Data Science Bootcamp. That article serves as a general guideline for those who are still hesitating over whether to take a Bootcamp and wondering what life will be like once they decide to join. This time, I want to share some learning resources and tricks I used during my time at the Bootcamp and even during job hunting.

Note: My chief intention is to share with those who, like me, don’t have any programming experience but plan to go through a data science Bootcamp and get a relevant job at the end. Of course, some of these tips can also benefit those who come from more of a software background. Kindly skip this article if you don’t find it valuable.

I will break down the resources by topic, following the learning schedule I had in the Bootcamp:

(1) Learning and Project Tracking:

Daily Planning:

You are probably surprised that this is the first thing I’d like to address instead of Python or SQL. Don’t underestimate the power of keeping things planned and organized. I do understand that everyone works differently and does best with different strategies. But try to set some daily objectives, and summarize them along with the questions you had during the day in an Excel spreadsheet or, preferably, a Google Sheet, since you can access it anywhere and from any device. The deeper you go down the learning path, the more valuable this habit becomes.

Project tracking & Documentation:

Please consider using a project and process tracking tool to help you manage every one of your projects and to boost your efficiency and productivity. You really should make this a habit if you haven’t done so already. Most of the “academic projects” during the Bootcamp learning period expect a quick turnaround, and you will be working by yourself, so in those situations a tracking tool may appear less “beneficial” than it really is. But trust me, having a sense of where you are in a project lifecycle is very important, and it should be embedded into your day-to-day.

When I was still a student at the Bootcamp, most of the students liked to keep track of their project progress in a paper notebook or, “conveniently”, in a Jupyter Notebook (me being one of them). The issue with this approach is that you don’t see the whole picture. You always want to have a holistic view of your project (structure-wise and process-wise) and to know what to do next.

Later on, when students were working on more collaborative team projects, we used the classic combo: Jira and Confluence. However, since many of them had not used a project tracking tool before, we found they were rather slow to adapt and start using these tools. This is especially painful when you are trying to communicate your work to people outside of your little team: sometimes you are not sure where you are at, or which version of your work you should present.

The resources I used to help me stay on top of my work are:

Project & Progress Planning & Tracking: Airtable, Jira

Project ideation & Data Pipeline: Mind map, DAG

Documentation & Code: GitHub, Confluence, Google Docs
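To make the DAG idea a bit more concrete, here is a minimal sketch of a data pipeline written down as a DAG using only Python's standard library (`graphlib`, Python 3.9+). The step names are hypothetical and only for illustration; a real project would have its own steps, and a dedicated pipeline tool may suit it better.

```python
from graphlib import TopologicalSorter

# Each key is a pipeline step; the set holds the steps it depends on.
# All step names here are made up for illustration.
pipeline = {
    "clean_data": {"load_raw_data"},
    "build_features": {"clean_data"},
    "train_model": {"build_features"},
    "evaluate_model": {"train_model"},
    "write_report": {"evaluate_model", "clean_data"},
}

# static_order() yields every step after all of its dependencies.
# It raises graphlib.CycleError if the graph is not a valid DAG.
order = list(TopologicalSorter(pipeline).static_order())

for position, step in enumerate(order, start=1):
    print(position, step)
```

Writing the steps down this way gives you the "what to do next" view for free: the printed order starts with `load_raw_data` and ends with `write_report`.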

(2) SQL

I highly recommend this resource: SQL Tutorial, if you are fairly new to SQL. It is quite thorough, and I can confidently say that 90% of the SQL tasks I do day-to-day are covered by it (the remaining 10% are the result of complex and messy real-world data). It is still useful when you plan to step up your game and want to learn more advanced SQL features like “Window Functions”.
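If you have never seen a window function, here is a small sketch you can run without installing a database. It uses Python's built-in `sqlite3` module and assumes the bundled SQLite is version 3.25 or newer (the first release with window functions). The `sales` table, its columns, and the rows are made up for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Ana", "East", 300),
        ("Ben", "East", 500),
        ("Cy", "West", 200),
        ("Dee", "West", 700),
    ],
)

# Unlike GROUP BY, a window function keeps every row and adds
# values computed over the row's partition (here: its region).
query = """
SELECT rep,
       region,
       amount,
       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS region_rank,
       SUM(amount) OVER (PARTITION BY region) AS region_total
FROM sales
ORDER BY region, region_rank
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Each rep keeps their own row, but now carries their rank within the region and the region's total. Other vendors support the same `OVER (PARTITION BY ... ORDER BY ...)` idea, though the details can differ between dialects.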

You can always Google your SQL questions, and there’s a high chance that someone has been in your shoes and asked those questions already. One side note: always remember to include the SQL dialect you are working with in your search query, whether it is MS SQL Server, BigQuery (BQ), or another vendor-specific one.

Other than the above, I believe the SQL resources you get from your Bootcamp are comprehensive enough that you can rely on them. I am not trying to replace the Bootcamp materials; I only intend to share the resources that helped me understand and digest the concepts better while I was learning those topics.

Two more important resources worth mentioning here are HackerRank and LeetCode. These two helped me greatly when I needed to prepare for my technical interviews. You will be able to learn a lot from the discussion boards, but before you go there, please try to solve each problem yourself first.

(3) Python

Python is a huge topic: it runs from the very basic syntax to the most advanced and optimized programming techniques, data structures, and packages for comprehensive data analysis and machine learning modeling. Honestly, if you are learning programming for the first time and you are learning it through a Bootcamp, don’t force yourself to learn and understand all of these topics, or at least not at the beginning (if you are a computer prodigy, then this does not apply to you). Knowing when to stop learning the finer details (harder concepts, optimizations, etc.) is a critical skill that helps you get through the Bootcamp. You are not going to be a tier 1 Python programmer in 3 months, but you will be able to keep developing your programming skills even after you land a job (a related one, of course).

Grasping the basics is always necessary, and for that purpose I recommend taking the basic Python training course from DataCamp or GeeksforGeeks (I used the latter the most).

In terms of learning Python for data analysis, I think this is where the Bootcamp truly shines. The curriculum was well designed and very practical, and I believe that as long as you stay with the curriculum you should be fine. If you want to consolidate the learning, I recommend the book “Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython”, written by Wes McKinney, the creator of the famous Python data analysis package “Pandas”.

Again, if you have finished the Bootcamp and are currently in the job hunting phase, I would recommend spending an hour each day on LeetCode or HackerRank trying to solve some of their Python coding challenges. Based on my own experience, the likelihood of getting a coding question in your technical interview is very high, and chances are the actual question will be in more or less the same format as those you encounter on these two sites. If you ever get stuck, you can always get hints from the discussion board or simply search for the exact question online.

(4) Statistics

When I was a student in the Bootcamp, we did not get dedicated sessions just to learn statistics; rather, it was embedded in most of our classes. However, based on my observation, even the students who graduated with a math or stats degree benefited a lot from a refresher. I studied Economics as an undergraduate, and back then we had an Econometrics course that taught us a lot about statistics for science and economic theories. Three years later, the only things I can remember, if I try hard, are which friend of mine took the course with me and which semester it was. Therefore, a trip down memory lane back to all the basic statistics was super necessary.

For this purpose, I highly recommend a YouTube channel called “StatQuest with Josh Starmer”. Its videos are really easy and even fun to watch. Most importantly, the videos are organized in a sequence so you can follow along.

After you acquire the basics, I think the most effective method is to learn case by case. For example, if you are working on a dataset in which you observe outliers, and through some research you find that popular detection methods include the IQR or the Z-score, then you can look them up and learn everything behind them.
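As a rough illustration of that outlier example, here is a minimal sketch of both methods using only Python's standard library. The function names, the sample data, and the cut-offs (1.5 × IQR and a Z-score of 3) are my own choices for illustration; they are common conventions, not the only valid ones.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]

# Made-up sample: ten ordinary readings and one suspicious value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

print(iqr_outliers(data))     # [95]
print(zscore_outliers(data))  # [95]
```

One thing worth noticing when you dig into the theory: the outlier itself inflates the mean and standard deviation, so on small samples the Z-score method can miss values that the IQR method catches.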

There is always a hardcore alternative if you prefer a more in-depth, inside-out type of learning. For that, I would recommend the book “The Elements of Statistical Learning”, as thousands of other industry professionals do. Why? Because it is good.

(5) Machine Learning

Your Bootcamp course and materials should give you a great start. You can only progress further in machine learning through a ton of reading and a ton of practice working on projects. But before you type your first line of code, please make sure you can answer why you need a machine learning model to solve the problem in the first place.

I completed two or three machine learning projects throughout the Bootcamp. Honestly, looking back now, my code was not the cleanest and there was huge room for improvement, but at least my logic was correct.

With the rapid growth in technology and strong demand in this domain, there are many out-of-the-box AutoML solutions like H2O, Auto-Sklearn, and Auto-Keras available to us. It is therefore more critical to learn to tackle an ML problem the right way than to learn the actual coding piece. Truth be told, the actual machine learning code only accounts for a small fraction of any real-world machine learning system.

The resources that I found to be very useful besides those I got from the Bootcamp itself are listed below:

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (this is my favorite machine learning book of all time, and to this day I’m still learning from it. Most importantly, you will learn the right way to solve a problem from beginning to end)

Towards Data Science (It is a knowledge (not “fact”) sharing platform based on Medium where thousands of contributors exchange ideas and their understanding of machine learning. Please be super careful when following the articles posted on this site, as most of them are not reviewed by known experts and are based only on the authors’ first-hand experiences)

Analytics Vidhya (Another popular website. Don’t bother with their training programs; instead, see if you can find some great blog posts on the topics that interest you)

One last thing: I found that a lot of my machine learning knowledge, aside from doing projects, also comes from my conversations with other professionals who work in this field. I strongly believe many of my breakthroughs happened after I spoke to someone and got their opinions and feedback on my work. Please be sure to talk to others, technical or business; you never know where you will get inspiration.

This marks the end of the article, and I hope you enjoyed reading it.