Decision tree
A decision tree is a graphical representation of a procedure for classifying or evaluating an item of interest by recursively partitioning the input space into regions based on feature values, with each path from the root to a leaf representing a classification rule.[1] In machine learning, decision tree learning is a supervised method for approximating discrete-valued or continuous target functions, where the learned function is represented by a tree structure that maps observations to conclusions about the target's value.[2] These models are constructed top-down, starting from a root node that selects an optimal feature for splitting the dataset, followed by branches for each possible feature value and recursive application to subsets until terminal leaves predict outcomes.[3]

Decision trees gained prominence in machine learning through seminal algorithms like ID3 (Iterative Dichotomiser 3), introduced by J. Ross Quinlan in 1986, which uses information gain to build trees for discrete-valued classification by selecting attributes that maximize entropy reduction.[4] Shortly before, CART (Classification and Regression Trees), developed by Leo Breiman, Jerome Friedman, Richard Olshen, and Charles Stone in 1984, had broadened the approach to both classification and regression tasks, employing criteria such as Gini impurity for splits and enabling binary tree structures for continuous outputs via squared error minimization.[5] Subsequent variants, including C4.5 (an evolution of ID3 handling continuous attributes and missing values) and ensemble methods like random forests, have addressed limitations such as sensitivity to small data changes.[6]

Key advantages of decision trees include their high interpretability, as the tree structure visually mimics human decision-making processes, and their ability to handle both categorical and numerical data without requiring feature scaling or normalization.[7] They are robust to noisy data and can capture nonlinear relationships and interactions among features.[2] However, disadvantages include a tendency toward overfitting, especially with deep trees, and instability, where minor data perturbations can lead to significantly different structures; these issues are often mitigated through pruning or bagging.[8]

Decision trees find broad applications across domains, including medical diagnosis (e.g., predicting protein function or splice sites from genomic data), financial risk assessment, and business analytics for customer segmentation.[9] In bioinformatics, they classify biological sequences; in operations research, they support decision analysis under uncertainty.[10] Their simplicity and explainability make them particularly valuable in regulated fields like healthcare and finance, where model transparency is essential.[11]

Fundamentals
Definition and Purpose
A decision tree is a graphical, flowchart-like model used in decision analysis to represent sequential decision-making processes under uncertainty. It structures complex problems by depicting a series of decision points, probabilistic chance events, and their potential outcomes in a tree format, where each path from root to leaf corresponds to a possible sequence of events leading to a final result, such as a payoff, cost, or utility value. This approach originated in operations research as a tool for formalizing one-person decision problems, tracing its conceptual roots to game theory frameworks like those in von Neumann and Morgenstern's work on extensive-form games.[12][13]

The primary purpose of a decision tree is to break down intricate decisions into manageable components, enabling the systematic incorporation of chance events, associated probabilities, resource costs, and utilities to identify and select optimal strategies. By visualizing alternatives and their consequences, it facilitates the evaluation of expected values (intuitively, probability-weighted averages of outcomes) without requiring advanced mathematics, assuming only a basic grasp of likelihoods and averaging. This contrasts with non-sequential models like payoff matrices, which handle simultaneous or single-stage choices but cannot easily capture dependencies in multi-step scenarios where later decisions depend on prior results.[14][15]

In practice, decision trees distinguish between controlled elements, such as decision nodes representing choices available to the decision-maker (often depicted as squares), and uncertain elements, such as chance nodes representing probabilistic outcomes (typically shown as circles), with terminal leaves indicating final results. They have been particularly valuable in operations research applications like medical diagnosis, where tests reveal probabilistic health states, or investment choices, where market fluctuations introduce uncertainty, allowing analysts to assess strategies like whether to pursue further information before committing resources.[12][16]
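For instance, consider a hypothetical venture (the figures here are purely illustrative) offering a 0.6 chance of gaining $100,000 and a 0.4 chance of losing $20,000. Its expected monetary value is

\text{EMV} = 0.6 \times 100{,}000 + 0.4 \times (-20{,}000) = 52{,}000,

so under the expected-value criterion the venture would be preferred to any certain alternative worth less than $52,000.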
Historical Development
The concept of decision trees traces its roots to early probability trees developed in the 17th century for representing conditional probabilities in games of chance. Bayes' theorem (1763) provided a foundation for inverse inference, often illustrated today using tree-like structures. Early applications in operations research include William Belson's 1959 paper on decision tree methods for biological classification and prediction, followed by the AID (Automatic Interaction Detection) algorithm in 1963 by J.A. Sonquist and J.N. Morgan for multivariate analysis.[17]

Decision trees were formalized in the 1960s within the emerging field of operations research and decision analysis. Ronald A. Howard coined the term "decision analysis" in his 1966 paper, introducing systematic approaches to represent decisions under uncertainty, including graphical models like influence diagrams that evolved into modern tree structures by the mid-1970s.[18] Howard Raiffa's 1968 book, Decision Analysis: Selected Readings, further established foundational techniques, emphasizing decision trees for evaluating alternatives in complex scenarios such as resource allocation.[19]

In the 1970s, decision trees gained traction in business applications, particularly in high-stakes industries like petroleum exploration, where Paul D. Newendorp's 1975 book Decision Analysis for Petroleum Exploration demonstrated their use in quantifying risks for drilling decisions and investment sequencing.[20] Concurrently, the first algorithmic classification tree, THeta Automatic Interaction Detection (THAID), was proposed by Robert C. Messenger and Lewis Mandell in 1972, enabling automated splitting of data based on probabilistic measures for predictive modeling in multivariate analysis.[21]

The 1980s marked integration into expert systems and machine learning, with J. Ross Quinlan's ID3 algorithm (1986) synthesizing trees from training examples to emulate human expertise in rule-based domains like medical diagnosis.[22] Leo Breiman and colleagues' 1984 monograph Classification and Regression Trees (CART) introduced binary splitting criteria for both classification and regression, broadening applicability beyond discrete decisions.[23]

By the 1990s, decision trees exploded in popularity within machine learning and data mining, transitioning from manual sketching to computational implementations in tools like C4.5 (an ID3 successor by Quinlan in 1993), which handled continuous attributes and pruning for generalization. This era saw trees as core components in ensemble methods, reflecting their shift from decision analysis to predictive analytics across fields.[22]

Basic Components
Decision trees in decision analysis are composed of fundamental structural elements that model sequential choices under uncertainty. The core components include nodes, branches, and terminal outcomes, forming a hierarchical structure that facilitates systematic evaluation of alternatives. These elements adhere to standard notation established in decision theory, enabling clear representation of problems involving decisions and probabilistic events.

The primary node types are decision nodes, chance nodes, and terminal nodes. Decision nodes, conventionally represented by squares, denote points where the decision-maker selects among mutually exclusive actions under their control. Chance nodes, illustrated as circles, capture uncertain events beyond the decision-maker's influence, with each outgoing path associated with a specific probability. Terminal nodes, also known as leaves, mark the endpoints of decision paths and are assigned payoff values or utilities that quantify the final outcomes, such as monetary gains, costs, or preference measures.

Branches serve as directed edges connecting nodes, directing the flow from the root toward the leaves in a chronological sequence. For decision nodes, branches are labeled with the available action options; for chance nodes, they bear probability labels that collectively sum to 1, reflecting the exhaustive and mutually exclusive nature of the possible states. This branching structure ensures that the tree models all relevant pathways without redundancy.

Decision trees are formalized as acyclic directed graphs, with the root node initiating the analysis and paths progressing unidirectionally to avoid loops, thereby maintaining a logical temporal order. At terminal nodes, payoff values provide the basis for evaluation, often expressed in expected monetary value or utility terms to align with the decision-maker's objectives. To derive optimal decisions, fold-back calculations, conceptually equivalent to backward induction, are applied, beginning at the terminal nodes and propagating expected values rearward through chance nodes (via probability-weighted averages) and decision nodes (via selection of the maximum or minimum expected value, depending on the goal). This process, originating from foundational statistical decision theory, computes the overall expected value at the root, guiding strategy selection. In practice, tree depth is constrained to prevent combinatorial explosion and maintain interpretability, as excessive levels can render the model intractable.
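A minimal sketch of the fold-back procedure follows, assuming a nested-dictionary encoding of the tree; the node layout, field names, and the two-branch investment example are illustrative choices, not a standard representation.

```python
# Minimal fold-back (backward induction) over a decision tree.
# Node encoding (illustrative): leaves carry payoffs; chance nodes hold
# (probability, subtree) pairs; decision nodes hold (label, subtree) pairs.

def fold_back(node):
    """Return the expected value of a node, maximizing at decision nodes."""
    if node["type"] == "terminal":
        return node["payoff"]
    if node["type"] == "chance":
        # Probability-weighted average over the possible outcomes.
        return sum(p * fold_back(child) for p, child in node["branches"])
    if node["type"] == "decision":
        # Choose the action with the highest expected value.
        return max(fold_back(child) for _, child in node["branches"])
    raise ValueError("unknown node type")

# Example: invest (uncertain payoff) vs. do nothing (payoff 0).
tree = {
    "type": "decision",
    "branches": [
        ("invest", {"type": "chance", "branches": [
            (0.6, {"type": "terminal", "payoff": 100.0}),
            (0.4, {"type": "terminal", "payoff": -20.0}),
        ]}),
        ("pass", {"type": "terminal", "payoff": 0.0}),
    ],
}
print(fold_back(tree))  # 0.6*100 + 0.4*(-20) = 52.0 -> invest
```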
Representation and Visualization
Nodes, Branches, and Flowcharts
Decision trees are visually represented using standard flowchart conventions to illustrate the structure of decisions and outcomes in a clear, hierarchical manner. These diagrams employ specific symbols to denote different elements: decision nodes, where choices are made, are typically depicted as squares; chance nodes, representing uncertain events, are shown as circles; and terminal nodes, indicating final outcomes, are often rendered as triangles. Arrows connect these nodes to form branches, providing a directional flow that mirrors the decision-making process.[24][25]

Layout principles for decision tree flowcharts emphasize clarity and readability, with common orientations including top-down, where the root node starts at the top and branches extend downward, or left-to-right, beginning from the left side and progressing horizontally. Related subtrees are grouped closely to maintain logical proximity, reducing visual clutter and aiding comprehension of complex models. Software tools such as Lucidchart and TreeAge facilitate rendering by automating node placement, ensuring consistent spacing and alignment while supporting export to various formats.[26][27][28][29]

These conventions trace back to the American National Standards Institute (ANSI) flowchart symbols standardized in the 1970s under ANSI X3.5-1970, which provided foundational shapes for process and decision representation later adapted for decision trees. In digital implementations, large decision trees often incorporate interactive features like zooming and panning to navigate deep structures without losing detail. Unlike UML activity diagrams, which include advanced elements such as concurrency and swimlanes for software modeling, decision tree flowcharts focus on sequential branching without these extensions, prioritizing simplicity for decision analysis.[30][31][32]

Branch labeling distinguishes between deterministic and probabilistic elements: branches from decision nodes carry labels for controlled choices, such as "Yes" or "No," while those from chance nodes include probabilities, for example, 0.3 or 70%, to quantify uncertainty. This labeling ensures the diagram conveys both deliberate actions and stochastic outcomes effectively. Note that while these conventions are standard in decision analysis, machine learning decision trees are often visualized simply as hierarchical structures with split nodes and leaf predictions, without chance nodes.[33][34]

Decision Rules and Symbols
Decision rules in decision trees consist of conditional statements that guide the progression from one node to another, typically expressed as if-then propositions based on attribute thresholds or categorical tests. For instance, a rule at an internal node might state "If revenue exceeds $500,000, proceed to the left branch; otherwise, proceed to the right," enabling systematic evaluation of alternatives by partitioning the decision space. These rules are derived either from domain expertise, where subject matter experts articulate logical conditions informed by practical knowledge, or from data-driven methods that identify optimal splits to minimize impurity or maximize separation in training datasets.[35][36]

In decision analysis contexts, such rules often incorporate utilities to account for preferences under uncertainty, aligning with the von Neumann-Morgenstern framework where expected utility functions quantify outcomes by assigning numerical values to consequences based on rational axioms of choice. A common pitfall in formulating these rules, particularly in data-derived trees, is overfitting, where overly specific conditions capture noise rather than underlying patterns, leading to poor generalization on unseen cases.[36]

Symbols standardize the representation of these rules within decision trees, facilitating clear visualization of conditions and outcomes. Decision nodes, denoting points where rules are applied, are conventionally depicted as squares, while chance nodes, which represent probabilistic branches, are shown as circles; terminal nodes, indicating final outcomes, use triangles. Conditions within rules may employ ovals to denote evaluation points, with branches as solid lines for positive or primary paths and dashed lines for negative or alternative outcomes, enhancing readability in flowchart-style diagrams. Advanced variants integrate Boolean logic for crisp if-then rules, where conjunctions and disjunctions (e.g., AND/OR gates) structure multi-attribute tests, or fuzzy rules in uncertain environments, allowing partial memberships via linguistic variables like "high revenue" rather than strict thresholds to handle imprecision.[24]

To evaluate rules, forward traversal simulates decision paths by starting at the root and applying conditions sequentially to reach a leaf, useful for predicting outcomes or testing scenarios. Conversely, backward traversal, or backward induction, optimizes rules by computing expected values from terminal nodes upward, folding back utilities to identify the best strategy at each decision point without exhaustive enumeration.[37][38]
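The sketch below illustrates forward traversal of crisp if-then rules, under an assumed dictionary encoding of internal tests and leaf outcomes; the feature names, thresholds, and node structure are hypothetical.

```python
# Forward traversal: apply if-then rules from the root until a leaf is reached.
# The node layout (feature/threshold/left/right) is an illustrative encoding,
# not a standard library structure.

def classify(node, example):
    while "outcome" not in node:                # internal node: test a condition
        if example[node["feature"]] > node["threshold"]:
            node = node["left"]                 # condition true -> primary path
        else:
            node = node["right"]                # condition false -> alternative path
    return node["outcome"]

rule_tree = {
    "feature": "revenue", "threshold": 500_000,
    "left":  {"outcome": "expand"},
    "right": {"feature": "debt_ratio", "threshold": 0.4,
              "left":  {"outcome": "hold"},
              "right": {"outcome": "invest"}},
}
print(classify(rule_tree, {"revenue": 350_000, "debt_ratio": 0.2}))  # invest
```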
Influence Diagrams
Influence diagrams provide a compact graphical representation of decision problems under uncertainty, serving as an alternative to expansive decision trees by focusing on variables, dependencies, and objectives rather than enumerating every possible outcome branch. Introduced by Ronald A. Howard and Jerry E. Matheson in 1981, they model complex scenarios involving decisions, uncertainties, and values through a directed acyclic graph that captures probabilistic relationships and informational flows without the combinatorial explosion inherent in full tree structures. This approach is particularly useful for initial problem structuring in decision analysis, where the emphasis is on identifying key influences before detailed expansion.[39]

The core components of an influence diagram include three primary node types and two categories of arcs. Chance nodes, typically depicted as ovals or circles, represent uncertain variables or random events with associated probability distributions. Decision nodes, shown as rectangles, denote choices available to the decision-maker, where the optimal policy is determined during evaluation. Value or utility nodes, often rendered as hexagons or diamonds, quantify the objectives or preferences, aggregating utilities based on preceding variables. Arcs in the diagram are directional: functional or conditional arcs connect chance nodes to indicate probabilistic dependencies, while informational arcs point to decision nodes to specify the sequence of information availability, ensuring decisions reflect known prior states. Additionally, relevance arcs link decisions and chances to value nodes, clarifying how elements contribute to the overall evaluation.[40]

To perform computations, influence diagrams are converted into equivalent decision trees by expanding chance nodes into branches corresponding to their possible states, ordered according to the informational arcs to preserve decision timing. This process unfolds the compact graph into a tree that can be solved via backward induction for optimal strategies and expected utilities, though the diagram itself supports direct evaluation algorithms like variable elimination for efficiency. The conversion highlights the scalability advantages of influence diagrams, as they maintain clarity in models with dozens of variables (far beyond what decision trees can handle without excessive visual complexity), making them ideal for preliminary modeling in fields like operations research and risk analysis.[41] Influence diagrams also integrate seamlessly with Bayesian networks, where the chance node subgraph functions as a probabilistic model, extending belief propagation techniques to include decision optimization.[42]

Software tools such as GeNIe, developed by BayesFusion, facilitate the construction, evaluation, and sensitivity analysis of influence diagrams, allowing users to build models interactively and convert them to trees or junction tree representations for solving large-scale problems. By abstracting away repetitive branches, influence diagrams significantly reduce visual clutter compared to decision trees, enabling better comprehension of structural dependencies in intricate decision environments.[43]

Construction Process
Step-by-Step Building Algorithm
The construction of a decision tree in machine learning follows a recursive, top-down partitioning algorithm that builds the tree from a training dataset by repeatedly selecting the best feature to split the data at each node. This greedy approach aims to create homogeneous subsets with respect to the target variable, continuing until stopping criteria are met, such as pure leaves (all samples same class), no remaining features, or a maximum depth to prevent overfitting. The process is data-driven, with splits determined by impurity reduction measures, and is the basis for algorithms like ID3 and CART.[44][45]

The standard steps for building the tree are as follows:
- Initialize the root node: Start with the full training dataset at the root node.
- Check stopping criteria: If all samples in the node belong to the same class, too few samples remain, no features are left, or the maximum depth is reached, make the node a leaf that predicts the majority class (for classification) or the mean value (for regression).
- Select the best splitting feature: Evaluate each candidate feature using a splitting criterion (e.g., information gain or Gini impurity) to find the one that most reduces impurity in the resulting subsets. For continuous features, test possible thresholds (e.g., midpoints between sorted values).
- Split the node: Create child nodes for each possible value of the selected feature (multi-way for categorical; binary for continuous thresholds). Partition the dataset into subsets corresponding to each child.
- Recurse on subsets: Apply the algorithm recursively to each child node with its subset of data.
- Assign predictions to leaves: At terminal leaves, assign the predicted class or value based on the training samples in that subset.
The pseudocode below summarizes this data-driven selection of splits, removing a used feature from the candidate set where reuse is not allowed and handling both categorical and continuous data. Decision trees in machine learning have evolved since the 1980s, with implementations in tools like scikit-learn for scalable construction on complex datasets.[44][45]

```
function BuildTree(node, data, features, depth):
    if stopping_criteria_met(data, features, depth):   // e.g., pure, no features, max_depth
        node = create_leaf(data)                       // majority class or mean
        return node
    best_feature, best_threshold = select_best_split(data, features)  // using gain/Gini
    node.feature = best_feature
    node.threshold = best_threshold if continuous else None
    for each subset in split_data(data, best_feature, best_threshold):
        child = create_child_node()
        node.add_child(child, subset_label)
        BuildTree(child, subset, remaining_features, depth + 1)
    return node
```
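As a concrete counterpart to the pseudocode, the following sketch fits a CART-style tree with scikit-learn; the toy dataset and parameter choices are illustrative only.

```python
# Fitting a CART-style tree with scikit-learn (toy data for illustration).
from sklearn.tree import DecisionTreeClassifier, export_text

# Tiny invented dataset: [feature_0, feature_1] -> class 0 or 1.
X = [[2.0, 1.0], [3.0, 1.5], [1.0, 4.0], [0.5, 3.5], [2.5, 0.5], [0.8, 4.2]]
y = [0, 0, 1, 1, 0, 1]

clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

print(export_text(clf, feature_names=["f0", "f1"]))  # the learned split rules
print(clf.predict([[1.0, 3.0]]))                     # e.g., [1]
```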
Node-Splitting Criteria
Node-splitting criteria are essential in decision tree construction, as they determine the attribute and split point at each internal node that best partitions the data to improve class separability. These criteria evaluate potential splits by measuring reductions in impurity, uncertainty, or statistical dependence, guiding the greedy selection of the most informative feature. Common approaches include information-theoretic measures, impurity-based indices, and statistical tests, each suited to different data characteristics and objectives.[4]

One prominent criterion is information gain, an entropy-based measure introduced in the ID3 algorithm for machine learning applications. Entropy quantifies the uncertainty in a dataset's class distribution, defined as

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i,
where S is the dataset, c is the number of classes, and p_i is the proportion of instances belonging to class i. Information gain for an attribute A is then calculated as the entropy of the parent node minus the weighted average entropy of the child nodes:
\text{Gain}(A) = H(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} H(S_v),
where S_v is the subset of S for value v of A. At each node, the algorithm selects the attribute yielding the maximum gain, favoring splits that most effectively reduce predictive uncertainty. This approach, originally developed for categorical attributes in ID3, has been extended to handle continuous attributes.[4]

The Gini index serves as an impurity reduction criterion, particularly in classification and regression trees (CART), where it measures the probability of misclassifying a randomly chosen instance if labeled according to the node's distribution. The Gini impurity for a node is
\text{Gini}(S) = 1 - \sum_{i=1}^{c} p_i^2,
with the split selected to minimize the weighted Gini impurity of the children:
\text{Gini}_{\text{split}}(A) = \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \text{Gini}(S_v).
Unlike entropy, the Gini index is computationally simpler and avoids logarithmic operations, making it efficient for large datasets; it tends to produce slightly more balanced trees. Developed for both classification and regression tasks, this criterion prioritizes splits that homogenize class labels within subsets.

For statistical validation of splits, the chi-square test assesses independence between an attribute and the target variable, as employed in the CHAID algorithm for decision analysis. The chi-square statistic compares observed and expected frequencies in a contingency table:
\chi^2 = \sum \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
where O_{ij} and E_{ij} are observed and expected counts for row i and column j. Splits are chosen if the p-value falls below a significance threshold (e.g., 0.05), indicating non-independence and thus predictive value; multi-way splits are allowed, with merging of insignificant categories. This criterion, rooted in categorical data exploration, provides a rigorous test for attribute relevance.[46]

The selection process employs a greedy strategy, evaluating all candidate attributes at each node and choosing the split with the highest criterion value to maximize immediate purity gain. For categorical attributes, splits occur directly on values, though information gain can bias toward multi-valued features due to more partitions; this is mitigated by the gain ratio, defined as
\text{Gain Ratio}(A) = \frac{\text{Gain}(A)}{\text{SplitInfo}(A)},
where SplitInfo measures the entropy of the split proportions:
\text{SplitInfo}(A) = -\sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}.
Gain ratio normalizes for attribute arity, promoting balanced trees; it was proposed alongside ID3 to address this bias. For continuous attributes, values are sorted, and potential thresholds (e.g., midpoints between consecutive instances) are tested to binarize the feature, with the optimal threshold selected based on the criterion's maximum value. This extension, refined in successors like C4.5, enables handling of mixed data types without prior discretization.[4]
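The sketch below implements these criteria directly from the formulas above; the helper names and the toy two-way split are illustrative, not part of any library API.

```python
# Direct implementation of the splitting criteria defined above.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

def gain_ratio(parent, subsets):
    n = len(parent)
    split_info = -sum(len(s) / n * log2(len(s) / n) for s in subsets)
    return information_gain(parent, subsets) / split_info

# A two-way split of ten labels into two pure subsets:
parent = ["yes"] * 5 + ["no"] * 5
subsets = [["yes"] * 5, ["no"] * 5]
print(entropy(parent))                    # 1.0
print(gini(parent))                       # 0.5
print(information_gain(parent, subsets))  # 1.0
print(gain_ratio(parent, subsets))        # 1.0 (split_info = 1.0 here)
```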
Example Construction and Analysis
A classic illustration of decision tree construction in machine learning is the "play tennis" dataset, used to predict whether a person will play tennis based on weather attributes. The dataset consists of 14 instances with features: Outlook (Sunny, Overcast, Rain), Temperature (Hot, Mild, Cool), Humidity (High, Normal), Wind (Weak, Strong), and target Play (Yes/No: 9 Yes, 5 No). This example demonstrates the ID3 algorithm using information gain.[44] The decision tree is constructed step by step (a numeric check of the key figures follows the list):
- Root node: Compute entropy of full dataset: H(\text{Play}) = -(9/14 \log_2 9/14 + 5/14 \log_2 5/14) \approx 0.940.
- Evaluate attributes: Information gain is ≈ 0.247 for Outlook, ≈ 0.151 for Humidity, ≈ 0.048 for Wind, and ≈ 0.029 for Temperature; Outlook yields the maximum gain and becomes the root split.
- Overcast branch: All 4 Yes → Leaf: Play=Yes.
- Sunny branch (5 instances, entropy ≈ 0.971): Humidity gives gain ≈ 0.971; High (3: 0 Yes, 3 No → No), Normal (2: 2 Yes, 0 No → Yes).
- Rain branch (5 instances, entropy ≈ 0.971): Wind gain ≈ 0.971; Weak (3: 3 Yes, 0 No → Yes), Strong (2: 0 Yes, 2 No → No).
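The finished tree therefore tests Outlook at the root, splits the Sunny branch on Humidity and the Rain branch on Wind, and predicts Yes directly for Overcast. As a quick numeric check of the root-level figures, the sketch below recomputes H(Play) and Outlook's information gain from the class counts alone; the entropy helper is an illustrative one-off, not a library function.

```python
# Numeric check: entropy of Play and information gain of Outlook.
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

h_root = entropy(9, 5)                                  # ~0.940
branches = {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)}
weighted = sum((p + n) / 14 * entropy(p, n) for p, n in branches.values())
print(round(h_root, 3), round(h_root - weighted, 3))    # 0.94 0.247
```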
Optimization Techniques
Pruning and Tree Depth Management
Decision tree pruning and depth management are essential techniques for controlling model complexity, mitigating overfitting, and improving generalization performance by limiting tree growth or simplifying an overfitted tree after construction.[45]

Pre-pruning methods halt tree expansion during the building process based on predefined criteria, such as a minimum number of samples required to split a node (min_samples_split) or a maximum allowable tree depth (max_depth). For instance, setting min_samples_split to 10 ensures that internal nodes have at least 10 samples before further splitting, preventing the creation of overly specific branches that capture noise in the training data.[45] Similarly, enforcing a max_depth of 5 to 10 levels is a common practice to balance expressiveness and simplicity, as deeper trees risk exponential growth in branches (up to 2^d leaves at depth d), leading to high variance and poor out-of-sample performance.[45] These parameters reduce overfitting by avoiding the fitting of spurious patterns, though they may underfit if set too restrictively.[47]

Post-pruning techniques, applied after growing a full tree, involve systematically removing subtrees that contribute little to predictive accuracy. One seminal approach is reduced error pruning (REP), which evaluates each non-leaf node bottom-up using a validation set: a subtree is replaced by a leaf node if the resulting error rate on the validation data does not exceed that of the full subtree. Introduced by Quinlan in 1987, REP prioritizes simplicity while preserving accuracy and has been widely adopted for its straightforward error-based heuristic.

A more formal post-pruning method is cost-complexity pruning, developed in the CART framework by Breiman et al. (1984), which trades off error rate against tree size via the cost-complexity measure

R_\alpha(T) = R(T) + \alpha |\tilde{T}|,

where R(T) is the tree's misclassification error on the training data, |\tilde{T}| is the number of terminal (leaf) nodes, and \alpha \geq 0 is a complexity parameter that penalizes larger trees. For each \alpha, the smallest subtree T \in \{T_0, T_1, \dots, T_m\} minimizing R_\alpha(T) is selected, where T_0 is the full tree and T_m is the root alone; \alpha is tuned via cross-validation to find the optimal balance. This sequence of subtrees allows systematic exploration of the complexity-accuracy trade-off, preventing the exponential proliferation of branches while enhancing interpretability and computational efficiency.[48]

Recent research as of 2025 has explored privacy-preserving pruning strategies that reduce exposure of sensitive information during tree simplification, and advanced algorithms analyzing the computational complexity of optimal pruning operations, such as subtree replacement and raising.[49][50]
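A brief sketch of cost-complexity pruning using scikit-learn's built-in support follows; the synthetic dataset and the use of held-out accuracy to pick \alpha are illustrative simplifications (CART as described above tunes \alpha by cross-validation).

```python
# Cost-complexity pruning with scikit-learn (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Candidate alpha values from the pruning path of the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_tr, y_tr)

# Refit one tree per alpha and keep the one with the best held-out accuracy.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_tr, y_tr)
     for a in path.ccp_alphas),
    key=lambda t: t.score(X_te, y_te),
)
print(best.get_n_leaves(), best.score(X_te, y_te))
```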
Advanced Splitting Functions
Advanced splitting functions in decision trees extend basic criteria to address limitations such as bias toward attributes with many values, complex decision boundaries, and data irregularities like missing values or class imbalances. These methods enhance tree quality by selecting more robust splits, leading to improved generalization and predictive performance.[51]

One prominent advanced criterion is the gain ratio, which normalizes information gain to penalize splits that result in highly uneven partitions, thereby reducing bias toward attributes with numerous outcomes. Introduced in the C4.5 algorithm, the gain ratio is computed as the information gain divided by the split information value. The split information measures the entropy of the partition proportions and is given by

\text{SplitInfo}(A) = -\sum_{i=1}^{c} \frac{|S_i|}{|S|} \log_2 \left( \frac{|S_i|}{|S|} \right),

where |S| is the total number of samples, c is the number of subsets, and |S_i| is the size of the i-th subset. Thus, the gain ratio for an attribute A is

\text{GainRatio}(A) = \frac{\text{InformationGain}(A)}{\text{SplitInfo}(A)}.

This approach favors splits that provide substantial purity gains relative to their partitioning complexity, as detailed in Quinlan's foundational work on C4.5.[52][53]

Multivariate splits allow nodes to partition data using linear combinations of multiple attributes, enabling oblique decision boundaries that can capture more nuanced relationships than single-attribute tests. These splits are particularly useful in datasets where interactions between features are critical, such as in high-dimensional spaces, and can result in shallower trees with comparable or superior accuracy to univariate methods. The selection of coefficients for the linear combination often involves optimization techniques like sequential feature selection to maximize a purity measure. Seminal research by Brodley and Utgoff demonstrated that multivariate decision trees can reduce tree size while maintaining performance across various domains.[51]

To handle missing values without discarding samples, surrogate splits employ backup attributes that closely mimic the ranking or partitioning behavior of the primary split variable. When a value is missing for the best splitter, the algorithm selects the surrogate that best preserves the class distribution ordering, allowing the tree to route instances effectively. This technique, originating from the CART framework, ensures robustness in real-world datasets with incomplete information, where ignoring missing data could otherwise degrade model utility.

For imbalanced datasets or scenarios with unequal misclassification costs, weighted splits and cost-sensitive learning adjust the splitting criteria to prioritize minority classes or high-cost errors. Weighted splits incorporate class frequencies or user-defined weights into the impurity calculation, effectively resampling during tree construction to balance influence. Cost-sensitive variants extend this by minimizing an expected cost matrix in the gain computation, ensuring splits account for asymmetric penalties.
These methods are vital in applications like fraud detection, where false negatives carry disproportionate consequences, and have been shown to enhance minority class recall without severely impacting overall accuracy.[54][55] The gain ratio criterion in C4.5 has been widely adopted and contributes to better handling of noisy data by mitigating attribute bias, often yielding more stable trees in empirical evaluations. In finance, advanced splitting functions like these support risk assessment models, such as credit scoring, by enabling precise partitioning of heterogeneous financial indicators to predict default probabilities.[52][56]
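A short sketch of cost-sensitive weighting via scikit-learn's class_weight parameter follows; the imbalance ratio and the 10:1 cost assumption are invented for illustration.

```python
# Cost-sensitive splits via class weights in scikit-learn.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Imbalanced synthetic data: ~5% positives (e.g., a rare fraud class).
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

# Penalize missing the minority class 10x more than the majority class;
# the weights enter the impurity computation at every candidate split.
clf = DecisionTreeClassifier(class_weight={0: 1, 1: 10}, random_state=0)
clf.fit(X, y)

# 'balanced' infers weights inversely proportional to class frequencies.
clf_bal = DecisionTreeClassifier(class_weight="balanced", random_state=0).fit(X, y)
```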
Other Refinement Methods
Ensemble methods represent a class of refinement techniques that combine multiple decision trees to enhance overall performance, reducing variance and bias compared to individual trees.

Bagging, or bootstrap aggregating, involves training multiple instances of the same decision tree algorithm on different bootstrap samples of the training data and aggregating their predictions, typically by majority vote for classification or averaging for regression. This approach was introduced by Breiman in 1996 and is particularly effective for unstable learners like decision trees, leading to improved generalization.[57] Boosting is another ensemble strategy that builds trees sequentially, with each subsequent tree focusing on correcting the errors of the previous ones by assigning higher weights to misclassified instances. A seminal example is AdaBoost, developed by Freund and Schapire in 1996, which adaptively boosts weak classifiers into a strong one through iterative reweighting.[58] Boosting methods often yield higher accuracy than bagging but can be more sensitive to outliers and noise.

Random Forests, proposed by Breiman in 2001, extend bagging by introducing additional randomness in the tree construction process, such as selecting a random subset of features at each split. These ensembles typically combine hundreds of trees, resulting in substantial accuracy improvements over single decision trees (often in the range of 10-20% on benchmark datasets) while maintaining computational efficiency with a per-tree complexity of O(n log n), where n is the number of samples.[59] Random Forests have been widely adopted in competitive machine learning environments, including Kaggle competitions, due to their robustness and out-of-the-box performance.[60]

Hybrid models integrate decision trees with other paradigms to leverage complementary strengths, such as combining trees with neural networks for deeper representational power. For instance, Deep Neural Decision Forests replace traditional splitting functions with soft, differentiable alternatives implemented via neural networks, enabling end-to-end training and improved scalability on large datasets.[61] Another hybrid approach involves rule extraction from decision trees, where tree paths are converted into interpretable if-then rules, as facilitated by algorithms like C4.5, to bridge the gap between tree-based predictions and symbolic reasoning systems. Recent developments as of 2025 include hybrid ensembles integrating decision trees with transformers or other advanced models for improved performance in specific domains like academic prediction and air quality forecasting.[62][63]

These refinement methods are typically applied post-optimization, integrating ensembles or hybrids after initial tree construction to address challenges like non-independent and identically distributed (non-i.i.d.) data, thereby enhancing robustness in real-world scenarios where assumptions of standard training data do not hold.
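The sketch below compares a single tree against the bagging, boosting, and random forest ensembles discussed above, using scikit-learn; the synthetic dataset and estimator counts are illustrative choices, and the relative scores will vary with the data.

```python
# Single tree vs. the ensemble methods described above (scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_informative=10, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                                 random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold accuracy
    print(f"{name}: {scores.mean():.3f}")
```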
Evaluation and Assessment
Performance Metrics
Decision trees are evaluated using a variety of quantitative metrics that assess their predictive quality, complexity, and decision-making effectiveness, depending on whether the context is machine learning classification/regression or decision analysis under uncertainty. In machine learning applications, core performance metrics include accuracy, precision, recall, and the F1-score, which are derived from the confusion matrix, a tabular representation of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for binary classification tasks.[64][65]

For regression tasks, common metrics include mean squared error (MSE), which measures the average squared difference between predicted and actual values:

\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

and mean absolute error (MAE):

\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|

These quantify prediction accuracy, with lower values indicating better performance.[5]

Accuracy measures the overall correctness of predictions as the proportion of correctly classified instances:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
This metric is straightforward but can be misleading for imbalanced datasets, where precision (TP / (TP + FP)) and recall (TP / (TP + FN)) provide better insights into positive class performance, particularly when false positives or false negatives carry different costs.[64][65] The F1-score, as the harmonic mean of precision and recall, balances these for imbalanced classes:
F1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}

Tree-specific metrics focus on internal structure and quality. Leaf purity quantifies the homogeneity of classifications at terminal nodes, often expressed as the percentage of instances correctly assigned to the majority class within a leaf, with higher purity indicating better separation.[64] Tree size, measured by the number of nodes or leaves, serves as a proxy for model complexity, where smaller trees reduce overfitting risk while maintaining predictive power.[64] During construction, information gain acts as a build-time metric to evaluate splits by quantifying the reduction in entropy (or impurity) after partitioning data on a feature.[4]

In decision analysis contexts, performance emphasizes economic outcomes under uncertainty. Expected monetary value (EMV) at decision nodes is computed via backward induction, representing the weighted average payoff across chance events, with variance of EMV measuring outcome uncertainty and risk.[66] Regret assesses suboptimality as the difference between the maximum achievable payoff and the selected decision's payoff for each possible state of nature; the minimax regret criterion minimizes this maximum regret to guide conservative choices.[67][66]
| Metric | Description | Use Case |
|---|---|---|
| Confusion Matrix | 2x2 table for binary outcomes (TP, TN, FP, FN) | Foundation for accuracy, precision, recall in classification |
| Leaf Purity | % majority class in leaf nodes | Assesses split quality and homogeneity |
| Tree Size | # nodes or leaves | Gauges interpretability and overfitting risk |
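The classification metrics above can be computed directly with scikit-learn, as in the sketch below; the label vectors are invented for illustration.

```python
# Computing the classification metrics above with scikit-learn.
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # invented ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # invented tree predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)                      # 4 4 1 1
print(accuracy_score(y_true, y_pred))      # (TP+TN)/total = 0.8
print(precision_score(y_true, y_pred))     # TP/(TP+FP) = 0.8
print(recall_score(y_true, y_pred))        # TP/(TP+FN) = 0.8
print(f1_score(y_true, y_pred))            # harmonic mean = 0.8
```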