If you want to understand how the machine learning algorithms "Random Forest" and "Decision Trees" work behind the scenes, this is a good book for you. Both algorithms are commonly used in a variety of applications, including big data analysis in industry and data analysis competitions such as those on Kaggle.
This book explains how Decision Trees work and how they can be combined into a Random Forest to reduce many of the common problems with decision trees, such as overfitting the training data.
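As a rough illustration of the "combining" step, one common way to merge the trees' answers for a classification problem is a simple majority vote. The sketch below shows only that voting step, with hypothetical predictions made up for the example; it does not show how each tree is trained on a different random sample of the data, and it is not taken from the book's own scripts.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the most common class label among the trees' predictions."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from five trees for a single data point.
votes = ["spam", "ham", "spam", "spam", "ham"]
print(majority_vote(votes))  # spam
```

Because a single tree that has overfit its training data is outvoted by the others, the combined answer tends to be more stable than any one tree's.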
Equations are great for understanding every last detail of an algorithm. But to get a basic idea of how something works, in a way that will stick with you six months later, nothing beats pictures. This book contains several dozen images that show how a decision tree picks its splits, how a decision tree can overfit its data, and how multiple decision trees can be combined to form a random forest.
Most books and other material on machine learning that I have seen fall into one of two categories: either they are textbooks that explain an algorithm along the lines of "And then the algorithm optimizes this loss function", or they focus entirely on how to set up code to use the algorithm and how to tune its parameters.
This book takes a different approach. It starts with simple examples of how Decision Trees and Random Forests work, then builds on those examples step by step to cover the more complicated parts of the algorithms. The actual equations behind decision trees and random forests are explained by breaking them down and showing what each part of the equation does and how it affects the examples in question.
Some topics in machine learning don't lend themselves to equations in an Excel table. Things like error checking or complicated conditionals are hard to replicate outside of code. However, some topics work quite well in a spreadsheet. Entropy and information gain, the quantities a decision tree uses to pick its splits, can be easily calculated in a spreadsheet. The spreadsheet used to generate many of the examples in this book is available for free download, as are all of the Python scripts that ran the Random Forests and Decision Trees in this book and generated many of the plots and images.
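To give a feel for how small these calculations are, here is a minimal Python sketch of entropy and information gain. It is not one of the book's downloadable scripts; the function names and the toy labels are assumptions made for illustration, and entropy is measured in bits (log base 2).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child groups."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical toy data: 4 "yes" and 4 "no" labels, split two different ways.
parent = ["yes"] * 4 + ["no"] * 4
clean_split = [["yes"] * 4, ["no"] * 4]            # each group is pure
mixed_split = [["yes", "yes", "no", "no"]] * 2     # each group is still 50/50

print(information_gain(parent, clean_split))  # 1.0
print(information_gain(parent, mixed_split))  # 0.0
```

A split that separates the classes perfectly recovers the full 1 bit of entropy in the parent, while a split that leaves every group as mixed as before gains nothing, so a tree choosing by information gain would prefer the first split.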
If you are someone who learns by playing with the code, and editing the data or equations to see what changes, then use those resources along with the book for a deeper understanding.
The topics covered in this book are
If you want to know more about how these machine learning algorithms work, but don't need to reinvent them, this is a good book for you.