What is a Decision Tree?
Decisions trees can be explained as a type of supervised learning algorithm in data mining. They are widely used both as a Classification and Regression model in data analysis. In supervised learning, target results are already known. The types of data present for analysis can be both numerical such as temperature, age, etc. and categorical in nature, for example, marital status, gender, etc.
How does a Decision Tree work step by step
The different steps of decision trees can be described as follows:
- Root node is the base of the decision trees
- The conversion of nodes into sub-nodes is done by the process of Splitting.
- The new nodes formed as a result of splitting are called as Decision Nodes
- After iterative splitting, when the nodes cannot be further split, they are called as leaf nodes. Leaf nodes represent the possible outcomes from the Decision Trees
- After obtaining the possible leaf nodes, the less important nodes can be removed ad this process is called as Pruning
- Branch is simply a subsection of decision trees and may contain one or more nodes
- The attributes employed for process of splitting include Information Gain, gain Ratio and Gini Index, etc.
Advantage of Decision Tree learning
The main advantage of Decision Trees is that they do not require any domain knowledge. The data processed is intuitive as decision trees mimic human mind. The decision trees can handle multidimensional data and can process complex data and queries with ease.
Issues in Decision Tree learning
In case of a dominant class however, the decision trees may provide biased results and in other cases where data is complex, it may result in overfitted trees. There are a number of algorithms which can be employed while building decision trees such as ID3, CART, etc.
Decision tree vs Discriminant
Decision trees and Discriminant are both types of classification models. When multivariate data is far from the normal data, classification trees serve better than LDA as mentioned in the paper Classification Trees as an Alternative to Linear Discriminant Analysis. the most common type of discriminant analysis. In the case of normal datasets with fewer variations, DAs may serve better. BFOS, a kind of decision tree algorithm does a better job in case of data sets that have incomplete observations instead of the need for data imputation.
Applications of decision trees include their use in the assessment of earlier revenue information in corporations. It is also used in marketing finance strategies to find a target audience. In the case of banks and loan providers, it is useful to see the likely defaulters from the lending.