Machine Learning - Supervised Learning - Decision Tree Tutorial
Hyperparameters
- max_depth (default None): with max_depth=None the tree grows until all leaves are pure, which risks overfitting; with max_depth=1 the tree makes only a single split, which risks underfitting. It should be tuned to an optimum value (see the sketch after this list).
- criterion: the split-quality measure, "gini" (default) or "entropy".
- min_samples_split: stops further splitting. For example, with min_samples_split=100, a node containing fewer than 100 samples will not be split. A high value reduces the chance of overfitting, a low value reduces the chance of underfitting, so keep it at an optimum.
- max_features: restricts the number of features considered at each split. Mostly used for high-dimensional data.
- max_leaf_nodes: restricts the number of leaf nodes.
- min_impurity_decrease: for example, with min_impurity_decrease=0.01, a node is split only if the split yields an impurity decrease (information gain) of at least 0.01; otherwise it is not split.
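As a quick illustration, here is a minimal sketch of setting these hyperparameters and tuning max_depth with scikit-learn (whose parameter names match the ones above); the iris dataset and the grid of depth values are illustrative assumptions, not part of the tutorial.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# All of the hyperparameters discussed above, set explicitly.
tree = DecisionTreeClassifier(
    criterion="gini",            # or "entropy"
    max_depth=None,              # None grows the tree fully (overfitting risk)
    min_samples_split=100,       # don't split a node with fewer than 100 samples
    max_features=None,           # set an int/float to restrict features per split
    max_leaf_nodes=None,         # cap on the number of leaf nodes
    min_impurity_decrease=0.01,  # split only if impurity drops by at least 0.01
    random_state=42,
)
tree.fit(X_train, y_train)

# Searching for an "optimum" max_depth by cross-validation.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [1, 2, 3, 5, 10, None]},
    cv=5,
)
grid.fit(X_train, y_train)
print("best max_depth:", grid.best_params_["max_depth"])
print("test accuracy:", grid.best_estimator_.score(X_test, y_test))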
- Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
- A decision tree has two types of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
- The decisions, or tests, are performed on the basis of the features of the given dataset.
- To build the tree, we use the CART algorithm, which stands for Classification And Regression Tree; as the name suggests, it can also be applied to regression problems (a minimal sketch of both uses follows below).
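To make the "classification and regression" point concrete, here is a minimal sketch using scikit-learn's DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn implements an optimised version of CART); the toy data below is invented for the example.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification: predict a class label from two binary features (toy data).
X_cls = [[0, 0], [1, 1], [1, 0], [0, 1]]
y_cls = ["no", "yes", "yes", "no"]
clf = DecisionTreeClassifier(random_state=0).fit(X_cls, y_cls)
print(clf.predict([[1, 1]]))  # -> ['yes']

# Regression: the same tree machinery predicts a continuous value.
X_reg = [[1.0], [2.0], [3.0], [4.0]]
y_reg = [1.1, 1.9, 3.2, 3.9]
reg = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_reg, y_reg)
print(reg.predict([[2.5]]))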
Advantages of the Decision Tree
- It is simple to understand, as it follows the same process a human follows when making a decision in real life.
- It can be very useful for solving decision-related problems.
- It helps to think about all the possible outcomes for a problem.
- It requires less data cleaning than many other algorithms.
Disadvantages of the Decision Tree
- A decision tree can contain many layers, which makes it complex.
- It may suffer from overfitting, which can be mitigated using the Random Forest algorithm (see the sketch after this list).
- With more class labels, the computational complexity of the decision tree may increase.
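To illustrate the overfitting point and the Random Forest remedy, here is a minimal sketch comparing a fully grown decision tree with a random forest on the same held-out data; the synthetic dataset is an assumption made for the example.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, invented for this example.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single fully grown tree tends to memorise the training set.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("tree   train/test:", tree.score(X_train, y_train), tree.score(X_test, y_test))

# A random forest averages many randomised trees, which typically
# narrows the gap between train and test accuracy (less overfitting).
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("forest train/test:", forest.score(X_train, y_train), forest.score(X_test, y_test))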