Approaching (Almost) Any Machine Learning Problem Book Review

Asim Zahid
Nov 30, 2021

In this book review, I will briefly discuss a few learnings from the book, why you should read it, and how to approach it.

First and foremost: why should you put time and effort into this book when there is already so much machine learning content available?

In my view, there are a few reasons:

  1. This book is written by quadruple Kaggle grandmaster Abhishek Thakur.
  2. It takes a practical approach, using real-world data and real-world use cases.
  3. It is not a traditional textbook: it distills years of experience, from understanding the problem to deploying at scale, including winning multiple world-class competitions.

I think these are enough reasons to learn from someone.

Let’s clear a few things right off the bat.

Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.

At Kaggle, more than 800k data scientists, AI engineers, research scientists, etc. compete, collaborate, and contribute to data-related problems.

Grandmaster is the highest tier on Kaggle, and reaching it is strong evidence of your capabilities.

Now, let’s discuss how to approach the book to get the best out of it.

We have to understand that it’s not a traditional book; it expects you to have a basic understanding of machine learning. It takes a very practical approach, with impactful real-world examples. Read the book and code along with it. Practice what you learn on one of the Kaggle Tabular Playground Series (TPS) datasets or in an active competition. Find a friend or colleague (preferably a senior one) and discuss the topics and concepts. Explain each concept to someone in plain words; this technique is called Explain Like I’m 5 (ELI5).

Let’s discuss the content of the book itself.

The book starts with setting up the environment and ends with serving your model in production at scale.

It covers the techniques and algorithms you are most likely to use when solving an ML problem. It gives you a framework to structure your project, which helps with rapid experimentation in both business and competition settings. Using the framework, you can plug in a model, train it, and collect experiment statistics with minimal changes.
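As a rough illustration of the plug-and-play idea, the sketch below keeps candidate models in a dictionary and runs the same training routine for whichever name is requested. The names `MODELS` and `run_experiment` are my own placeholders rather than code from the book, and the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: add a model here and every experiment can use it by name.
MODELS = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=42),
}


def run_experiment(model_name, X, y, n_folds=5):
    """Cross-validate the named model and return summary statistics."""
    model = MODELS[model_name]
    scores = cross_val_score(model, X, y, cv=n_folds, scoring="accuracy")
    return {
        "model": model_name,
        "mean_acc": float(scores.mean()),
        "std_acc": float(scores.std()),
    }


# Synthetic data stands in for a real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

results = [run_experiment(name, X, y) for name in MODELS]
for row in results:
    print(row)
```

Swapping in a new algorithm then means adding one entry to the dictionary; the training and reporting code stays the same.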

It teaches you techniques to build effective pipelines and prevent data leakage. You will learn why cross-validation matters, what the different evaluation metrics are, and how and where to use each of them.
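To make the leakage point concrete, here is a minimal sketch using scikit-learn. It assumes a synthetic, imbalanced binary dataset; the model and metric choices are mine, not the book's. The key idea is that the scaler lives inside the pipeline, so it is fitted on the training folds only and never sees the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (roughly 90% negatives) stands in for a real dataset.
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0
)

# Preprocessing is part of the pipeline, so each fold refits the scaler
# on its own training portion. Scaling the full dataset up front would leak
# validation statistics into training.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

acc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
auc_scores = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print("accuracy per fold:", acc_scores.round(3))
print("ROC AUC per fold: ", auc_scores.round(3))
```

On skewed targets like this one, accuracy can look high even for a weak model, which is one reason to compare it against a ranking metric such as ROC AUC.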

Now for my favorite topic of the book: feature engineering. You will learn a range of powerful techniques for handling categorical and numerical features. Next, you will learn feature selection, that is, how to pick the most important features out of a dataset.
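One common way to handle a categorical feature is one-hot encoding. The sketch below is my own toy example, not taken from the book: it uses scikit-learn's `OneHotEncoder` on a made-up `color` column and shows what happens when the test data contains a category the encoder never saw during fitting.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column; the values are made up for illustration.
train_colors = np.array([["red"], ["green"], ["blue"], ["green"]])
test_colors = np.array([["blue"], ["purple"]])  # "purple" is unseen in training

# handle_unknown="ignore" encodes unseen categories as an all-zero row
# instead of raising an error at prediction time.
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(train_colors).toarray()
test_encoded = encoder.transform(test_colors).toarray()

print(encoder.categories_[0])  # learned categories, in sorted order
print(train_encoded)
print(test_encoded)
```

Fitting the encoder on the training data only, then reusing it on the test data, keeps the column layout identical between the two.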

In the hyperparameter optimization chapter, it's advised to tune parameters manually, one at a time. This practice helps you understand the math behind the algorithm, and with enough experience you may end up selecting better parameters than an automated hyperparameter optimizer would.
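A minimal sketch of one-at-a-time tuning is shown below, assuming a random forest on synthetic data. The starting values, the candidate grids, and the helper `cv_accuracy` are all my own assumptions for illustration. Each parameter is varied while the others stay fixed at their current best value.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=15, random_state=1)


def cv_accuracy(**params):
    """Mean 3-fold accuracy of a random forest with the given parameters."""
    model = RandomForestClassifier(random_state=1, **params)
    return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()


# Hypothetical starting point and candidate values.
best_params = {"n_estimators": 50, "max_depth": 3}
search_space = {"max_depth": [3, 5, 7], "n_estimators": [50, 100]}

# Tune one parameter at a time, holding the rest fixed.
for name, candidates in search_space.items():
    scores = {}
    for value in candidates:
        trial = {**best_params, name: value}
        scores[value] = cv_accuracy(**trial)
    best_params[name] = max(scores, key=scores.get)
    print(name, {k: round(v, 3) for k, v in scores.items()})

print("selected:", best_params)
```

Watching how the score moves as a single parameter changes is what builds the intuition; the loop only automates the bookkeeping.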

Afterward, the author touches briefly on computer vision and NLP problems. Both fields are vast in breadth and depth, so the book sticks to the basics and to commonly used, effective methods, illustrated with examples that include transformers.

You will also learn about bagging, boosting, ensembling, and stacking. Industry has started adopting ensembles in real-time systems where the response time can still be kept under 500 ms, since the individual models, such as different neural networks, can be deployed on separate virtual machines.
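The simplest form of ensembling is to average the predicted probabilities of several models. The sketch below assumes two arbitrary scikit-learn classifiers and synthetic data; it illustrates the general idea and is not the book's code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=7)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "forest": RandomForestClassifier(n_estimators=100, random_state=7),
}

# Collect each model's predicted probability for the positive class.
valid_probs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    valid_probs[name] = model.predict_proba(X_valid)[:, 1]
    print(name, "AUC:", round(roc_auc_score(y_valid, valid_probs[name]), 4))

# Blend by simple averaging; weighted averages and stacking build on this.
blend = np.mean(list(valid_probs.values()), axis=0)
blend_auc = roc_auc_score(y_valid, blend)
print("blend AUC:", round(blend_auc, 4))
```

Whether the blend beats its members depends on how different their errors are, so it is worth checking on a held-out set rather than assuming.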

Last but not least, suppose you have created a state-of-the-art (SOTA) model that fits your business needs. Great work. But you still have to publish it somewhere: on a virtual machine, as an open-source package on GitHub, or as a demo to show recruiters. To that end, the last chapter teaches you the power of reproducible code, how to dockerize the model, and how to serve it at scale.

The author also helps you understand which algorithms and techniques to use with which dataset and what evaluation metrics are better on different datasets.

Conclusion:

I hope you will love this book and it will tremendously increase your abilities.

Did you find this article useful? Give it a clap 👏 and share it with the community. Have some thoughts, or did I miss something? Share them with me in the comments 📝.

Connect

The author is a research scientist with a passion for building meaningful, impact-oriented products. He is a dual Kaggle Expert (Datasets & Notebooks), a former Google Developer Student Club (GDSC) Lead, and an AWS Educate Cloud Ambassador. He loves to connect with people. If you like his work, say hi to him.

Asim Zahid

I can brew up algorithms with a pinch of math, an ounce of Python and piles of data to power your business applications.