C4.5 algorithm
The C4.5 algorithm is a supervised machine learning method for constructing decision trees to classify data instances based on their attributes, developed by J. Ross Quinlan as an extension of his earlier ID3 algorithm and detailed in his 1993 book.[1] It employs a divide-and-conquer strategy to recursively partition training data, selecting attributes that maximize the information gain ratio—a normalized measure of entropy reduction—to minimize bias toward attributes with many possible values.[2] Key enhancements over ID3 include robust handling of continuous attributes through threshold-based splits (e.g., testing if a numeric value exceeds a computed cutoff like 75 for humidity), probabilistic distribution of cases with missing values across branches, and post-pruning techniques to simplify trees by replacing subtrees with leaves when estimated error rates improve.[1][2]

Implemented in C for Unix systems, C4.5 generates both tree-based classifiers and production rules, making it suitable for knowledge acquisition in expert systems, and has been widely adopted for its balance of accuracy and interpretability in domains like medical diagnosis and credit scoring. The algorithm's source code, approximately 9,000 lines long, was distributed with the book and later evolved into the commercial C5.0 system, influencing subsequent developments in decision tree algorithms.[1]

Background
Overview of Decision Trees
Decision trees are hierarchical, tree-structured models used in supervised machine learning for tasks such as classification and regression, where decisions are made by recursively applying tests on input attributes to partition the data into subsets.[3][4] These models mimic human decision-making by traversing from the root to a leaf, with each path representing a sequence of attribute-based choices that lead to a prediction.[3] The basic components of a decision tree include the root node, which encompasses the entire training dataset; internal nodes, each associated with a decision rule or test on a specific attribute; branches, which denote the possible outcomes of that test; and leaf nodes, which assign the final output, such as a class label in classification or a continuous value in regression.[3][4] The structure forms a flowchart-like representation, enabling straightforward visualization of the model's logic.[5]

The induction process for building a decision tree involves recursive partitioning of the training data, starting at the root and selecting attributes that divide the data into increasingly pure subsets, with the goal of minimizing impurity at each node.[3][4] Node purity measures how well a subset is concentrated in a single class (for classification) or around a mean value (for regression); a foundational metric for impurity is entropy, calculated as

H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i,

where S is the dataset at a node, c is the number of classes, and p_i is the proportion of instances in class i.[3] This process halts when criteria like pure nodes or resource limits are reached, resulting in a complete tree.[3]

Decision trees provide advantages such as high interpretability, as their structure allows easy tracing of decision paths, and the ability to handle mixed data types—categorical and numerical—without assuming linearity or normality in the data.[5][6] However, they suffer from limitations including a tendency to overfit training data, producing overly complex trees that perform poorly on unseen data, and instability, where minor data perturbations can yield vastly different trees.[5][6] Additionally, they can exhibit bias toward attributes with numerous distinct values, favoring splits on such features even if not optimal.[3]
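To make the impurity measure concrete, the following Python sketch (an illustrative addition, not part of the original C distribution) computes entropy from a node's class counts; evaluated on the 9 "Yes"/5 "No" root of the play-tennis data used later in this article, it reproduces the 0.940-bit figure.

```python
import math

def entropy(class_counts):
    """H(S) = -sum(p_i * log2(p_i)) over the classes present at a node."""
    total = sum(class_counts)
    return -sum((n / total) * math.log2(n / total)
                for n in class_counts if n > 0)

# Root of the play-tennis dataset: 9 "Yes" and 5 "No" instances.
print(f"{entropy([9, 5]):.3f} bits")  # 0.940 bits
```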
Development and Relation to ID3
The C4.5 algorithm was developed by J. Ross Quinlan in 1993 as an extension of his earlier ID3 algorithm, introduced in 1986.[7][3] This advancement was detailed in Quinlan's book C4.5: Programs for Machine Learning, which provided a comprehensive implementation and evaluation framework for decision tree induction.[7] ID3, while pioneering in using entropy-based information gain to select attributes for splitting, had core limitations that restricted its applicability to real-world data.[3] Specifically, it could not natively handle continuous attributes without preprocessing into discrete bins, lacked mechanisms for dealing with missing values, offered no built-in pruning to combat overfitting, and relied on information gain—a measure prone to bias toward attributes with numerous distinct values, as these tend to produce finer partitions regardless of their predictive power.[7] The information gain formula in ID3, defined as

IG(A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v),

where H(S) is the entropy of the training set S and S_v is the subset of S for which attribute A takes value v, exemplifies this bias by rewarding splits that maximize impurity reduction through sheer multiplicity of outcomes.[3][7]

To overcome these shortcomings, C4.5 introduced key enhancements, including gain ratio as a normalized alternative to information gain for more equitable attribute selection, thresholding methods to binarize continuous attributes during tree construction, probabilistic fractional allocation for distributing instances with missing values across branches, and error-based post-pruning techniques to simplify trees and improve generalization.[7] Quinlan's primary motivation in developing C4.5 was to make decision tree learning more robust for practical machine learning tasks, particularly on diverse, noisy datasets like those in the UCI Machine Learning Repository, where ID3's constraints often led to suboptimal performance.[7]

Algorithm Mechanics
Attribute Selection with Gain Ratio
The information gain criterion used in the ID3 algorithm favors attributes with a large number of possible outcomes, as such attributes tend to produce more evenly distributed partitions and thus higher apparent gain, even if the splits are not particularly informative about the class labels. This bias can lead to fragmented decision trees that overfit the training data by selecting attributes that create many small subsets rather than those that provide meaningful class separation.[8]

To address this issue, C4.5 introduces the gain ratio, which normalizes the information gain by the split information, a measure of the entropy inherent in the partition sizes produced by the attribute. The split information for an attribute A that partitions a dataset S of size |S| into k subsets S_i of sizes |S_i| is defined as:

\text{SplitInfo}(A) = -\sum_{i=1}^{k} \frac{|S_i|}{|S|} \log_2 \left( \frac{|S_i|}{|S|} \right)

This value grows when an attribute splits the data into many branches of comparable size and shrinks when most instances fall into a single branch, penalizing attributes that fragment the data into numerous small subsets. The gain ratio for attribute A is then:

\text{GainRatio}(A) = \frac{\text{IG}(A)}{\text{SplitInfo}(A)}

where \text{IG}(A) is the information gain. Since \text{SplitInfo}(A) = 0 only occurs for degenerate splits where all instances fall into one subset (implying \text{IG}(A) = 0), such cases are naturally excluded as they provide no useful separation.[8] To further mitigate bias toward splits with low split information (which could inflate the ratio even for modest gains), C4.5 evaluates all possible tests and selects the one with the maximum gain ratio among those whose information gain is at least the average gain across all candidate tests. This ensures that only informative splits are considered, with the threshold effectively being the mean \text{IG} rather than a fixed value like zero, though it can be tuned in implementations.[8]

A concrete example illustrates the computation using the standard weather dataset for predicting whether to play tennis (14 instances, 9 "Yes," 5 "No"; root entropy = 0.940 bits). Consider the discrete attribute "Outlook" (values: Sunny=5, Overcast=4, Rain=5):

- Entropy after split: Sunny ([2 Yes, 3 No]) = 0.971 bits; Overcast ([4 Yes, 0 No]) = 0 bits; Rain ([3 Yes, 2 No]) = 0.971 bits.
- Weighted entropy = (5/14) \times 0.971 + (4/14) \times 0 + (5/14) \times 0.971 = 0.693 bits.
- \text{IG}(\text{Outlook}) = 0.940 - 0.693 = 0.247 bits.
- \text{SplitInfo}(\text{Outlook}) = -(5/14)\log_2(5/14) - (4/14)\log_2(4/14) - (5/14)\log_2(5/14) = 1.577 bits.
- \text{GainRatio}(\text{Outlook}) = 0.247 / 1.577 = 0.156.

The corresponding values for all four candidate attributes are:
| Attribute | IG (bits) | SplitInfo (bits) | Gain Ratio |
|---|---|---|---|
| Outlook | 0.247 | 1.577 | 0.156 |
| Temperature | 0.029 | 1.557 | 0.019 |
| Humidity | 0.152 | 1.000 | 0.152 |
| Windy | 0.048 | 0.985 | 0.049 |
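As a cross-check of the table above, the following Python sketch (illustrative only; the 14-row layout is the standard play-tennis dataset, assumed here from the aggregate counts quoted in the text) computes information gain, split information, and gain ratio for each attribute, reproducing the four rows of the table to three decimal places.

```python
import math
from collections import Counter, defaultdict

# Standard 14-instance play-tennis data (assumed layout; the article only
# quotes per-attribute counts, which this table reproduces).
DATA = [
    # (Outlook,   Temperature, Humidity, Windy,    Play)
    ("sunny",    "hot",  "high",   "weak",   "no"),
    ("sunny",    "hot",  "high",   "strong", "no"),
    ("overcast", "hot",  "high",   "weak",   "yes"),
    ("rain",     "mild", "high",   "weak",   "yes"),
    ("rain",     "cool", "normal", "weak",   "yes"),
    ("rain",     "cool", "normal", "strong", "no"),
    ("overcast", "cool", "normal", "strong", "yes"),
    ("sunny",    "mild", "high",   "weak",   "no"),
    ("sunny",    "cool", "normal", "weak",   "yes"),
    ("rain",     "mild", "normal", "weak",   "yes"),
    ("sunny",    "mild", "normal", "strong", "yes"),
    ("overcast", "mild", "high",   "strong", "yes"),
    ("overcast", "hot",  "normal", "weak",   "yes"),
    ("rain",     "mild", "high",   "strong", "no"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Windy"]

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def partitions(rows, attr_index):
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row)
    return groups.values()

def gain_ratio(rows, attr_index):
    base = entropy([r[-1] for r in rows])
    total = len(rows)
    gain, split_info = base, 0.0
    for subset in partitions(rows, attr_index):
        w = len(subset) / total
        gain -= w * entropy([r[-1] for r in subset])
        split_info -= w * math.log2(w)
    return gain, split_info, (gain / split_info if split_info > 0 else 0.0)

for i, name in enumerate(ATTRS):
    ig, si, gr = gain_ratio(DATA, i)
    print(f"{name:12s} IG={ig:.3f}  SplitInfo={si:.3f}  GainRatio={gr:.3f}")
```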
Handling Continuous and Discrete Attributes
C4.5 distinguishes between discrete (categorical) attributes and continuous (numeric) attributes in its tree construction process. For discrete attributes, the algorithm creates multi-way branches, with one branch corresponding to each possible category value, allowing the data to be partitioned into subsets based on nominal distinctions. In contrast, continuous attributes cannot be directly branched in this manner due to their infinite potential values; instead, C4.5 binarizes them by selecting a single threshold that converts the attribute into a binary test, enabling integration with the discrete handling mechanism.[10][11]

To identify the optimal threshold for a continuous attribute A, C4.5 sorts the unique values of A from the current dataset D, resulting in an ordered list v_1 < v_2 < \dots < v_n. It then considers potential thresholds t_i = (v_i + v_{i+1})/2 for each pair of consecutive values where v_i \neq v_{i+1}, evaluating the information gain ratio for the binary split induced by each t_i. The threshold yielding the highest gain ratio is selected as the split point, with gain ratio serving as the attribute selection metric applied after binarization.[10][11]

Once the threshold t is chosen, the split rule is binary: the dataset D is partitioned into two subsets, D_1 containing instances where A \leq t and D_2 containing instances where A > t. These subsets are then treated as new datasets for recursive application of the tree-building process, effectively incorporating the continuous attribute as a binary decision node in the tree. This approach ensures that continuous attributes contribute to the decision tree structure in a manner consistent with discrete ones, though limited to two branches per split.[10][11]

Consider a simple illustrative example with a dataset of 6 instances, where the continuous attribute is age and the target class is whether a product is purchased: ages 20, 25, and 30 correspond to "yes" and ages 35, 40, and 45 to "no". The unique sorted ages are 20, 25, 30, 35, 40, 45, so the candidate thresholds are the midpoints 22.5, 27.5, 32.5, 37.5, and 42.5. For each threshold, C4.5 computes the gain ratio by assessing the class distribution in the resulting subsets (e.g., for t = 32.5, one subset has ages ≤ 32.5 with classes {yes, yes, yes} and the other ages > 32.5 with {no, no, no}). The threshold 32.5 yields the highest gain ratio in this case due to perfect class separation, creating branches for age ≤ 32.5 (predicting yes) and age > 32.5 (predicting no). This example demonstrates how threshold evaluation leads to an effective binary split that maximizes the gain ratio.[10][11]

C4.5 addresses edge cases in continuous attribute handling to prevent invalid splits. If all values of the attribute are identical, no distinct consecutive values exist for midpoint thresholds, rendering the attribute unsuitable for splitting and excluding it from consideration. Similarly, if the dataset has fewer than two distinct values or if all potential splits result in zero gain (e.g., no change in class distribution across subsets), the algorithm halts splitting on that attribute to avoid degenerate trees. These checks ensure computational efficiency and logical tree growth.[10][11]
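The threshold search can be sketched in a few lines of Python. This is illustrative only: the age/label pairing below is inferred from the perfect separation the text describes at t = 32.5, and C4.5's additional screening of candidates by average gain is omitted.

```python
import math

def entropy(labels):
    total = len(labels)
    probs = [labels.count(c) / total for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def best_threshold(values, labels):
    """Evaluate midpoints between consecutive distinct values and return the
    threshold with the highest gain ratio for the induced binary split."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    n = len(pairs)
    best = (None, -1.0)
    for i in range(n - 1):
        if pairs[i][0] == pairs[i + 1][0]:
            continue  # identical consecutive values yield no candidate midpoint
        t = (pairs[i][0] + pairs[i + 1][0]) / 2
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        w_left, w_right = len(left) / n, len(right) / n
        gain = base - w_left * entropy(left) - w_right * entropy(right)
        split_info = -w_left * math.log2(w_left) - w_right * math.log2(w_right)
        if split_info > 0 and gain / split_info > best[1]:
            best = (t, gain / split_info)
    return best

# Age example from the text; the yes/no assignment is inferred from the
# perfect separation described at t = 32.5.
ages   = [20, 25, 30, 35, 40, 45]
labels = ["yes", "yes", "yes", "no", "no", "no"]
print(best_threshold(ages, labels))  # (32.5, 1.0): perfect separation
```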
Managing Missing Values

C4.5 employs a probabilistic allocation strategy to manage missing values during both tree construction and classification, enabling the use of incomplete data without resorting to complete-case deletion. This approach fractionally distributes the weight of an instance with a missing attribute value across all possible branches of that attribute, proportional to the frequencies observed in the training data with known values.[7] As described by Quinlan, each training instance starts with a weight of 1; for a missing value on attribute A, the weight is split such that the fraction allocated to each outcome of A equals the proportion of training instances exhibiting that outcome among those with non-missing values for A.[11] This weighted distribution preserves the total instance weight while contributing to entropy and split information calculations for gain ratio evaluation.[12]

During attribute selection, the gain ratio for a candidate attribute incorporates these fractional weights from missing instances, effectively treating them as partial contributors to each potential child node rather than excluding them entirely. For instance, consider a training set of 100 instances where 20 have missing values on attribute A, which has two discrete outcomes: "yes" observed in 50 of the 80 known cases (62.5%) and "no" in the remaining 30 (37.5%). The 20 missing instances contribute a weight of 12.5 to the "yes" branch and 7.5 to the "no" branch. This allocation ensures the gain ratio computation uses the full dataset's information, avoiding underestimation of splits due to data loss.[11] By integrating missing instances proportionally, C4.5 maintains more accurate estimates of class distributions and information gain, particularly beneficial when missingness rates are moderate, such as 20%.[13]

In the classification phase, a test instance with a missing value at an internal node defined by attribute A is propagated down all branches from that node, with each path receiving a weight equal to the branch's probability from the training data. The final prediction aggregates the class probabilities at the leaves reached via these paths, using a weighted average based on the branch weights—typically resulting in a probabilistic output rather than a hard assignment.[7] This mirrors the training-time distribution, ensuring consistency between model building and application. For the earlier example, a test instance missing A would receive 62.5% weight toward the "yes" subtree and 37.5% toward the "no" subtree, with leaf predictions combined accordingly.[12]

Compared to deletion methods, which discard instances with any missing values and can reduce effective sample size by up to 20% or more in sparse datasets, C4.5's probabilistic method retains full data volume for splitting decisions, leading to more reliable gain ratio estimates and potentially lower variance in tree structure.[13] Empirical evaluations confirm this advantage, as deletion often biases toward attributes with fewer missing values, whereas fractional weighting promotes equitable consideration across features.[11]
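A minimal Python sketch of the fractional-weight allocation is shown below (illustrative only; the instance representation is an assumption, not C4.5's internal data structure). Splitting the 100-instance example on attribute A reproduces the branch weights of 62.5 and 37.5.

```python
from collections import defaultdict

def split_with_missing(instances, attr):
    """Distribute each instance's weight across the branches of `attr`.

    `instances` is a list of (values_dict, weight) pairs; a value of None
    marks the attribute as missing. Known values go to their own branch;
    missing values are spread in proportion to the observed branch weights,
    mirroring the fractional-weight scheme described above.
    """
    branch_weight = defaultdict(float)
    known_total = 0.0
    for values, w in instances:
        v = values.get(attr)
        if v is not None:
            branch_weight[v] += w
            known_total += w

    branches = defaultdict(list)
    for values, w in instances:
        v = values.get(attr)
        if v is not None:
            branches[v].append((values, w))
        else:  # fractional allocation across all observed outcomes
            for outcome, bw in branch_weight.items():
                branches[outcome].append((values, w * bw / known_total))
    return branches

# 100 instances: 50 known "yes", 30 known "no", 20 missing on attribute A.
data = ([({"A": "yes"}, 1.0)] * 50 +
        [({"A": "no"}, 1.0)] * 30 +
        [({"A": None}, 1.0)] * 20)
for outcome, rows in split_with_missing(data, "A").items():
    print(outcome, round(sum(w for _, w in rows), 1))
# yes 62.5   (50 known + 20 * 0.625)
# no  37.5   (30 known + 20 * 0.375)
```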
Tree Construction Pseudocode

The tree construction in C4.5 follows a recursive divide-and-conquer strategy to induce a decision tree from a training dataset S, where each instance consists of attribute values and a class label. The process begins at the root node representing the entire dataset and proceeds by selecting the best attribute for splitting at each non-leaf node, partitioning S into subsets based on attribute values, and recursing on those subsets until stopping criteria are met. This builds an unpruned tree that captures the training data's structure, with attribute selection guided by gain ratio to prioritize splits that maximize class discrimination relative to split information.

The core procedure integrates handling for discrete and continuous attributes: for discrete attributes, branches correspond to each possible value; for continuous attributes, a threshold is chosen to binarize the attribute into "less than or equal to threshold" and "greater than threshold" branches. Missing values are managed during selection and splitting by probabilistically distributing instances across branches based on observed frequencies in the dataset, ensuring the tree remains robust to incomplete data. The recursion terminates based on predefined rules to prevent trivial or overly specific nodes.

Pseudocode
The following pseudocode outlines the recursive tree induction function in C4.5, adapted from the original implementation details:

    function TreeInduce(S: training set, Attributes: set of available attributes) returns Node:
        if |S| = 0:                                    // empty set
            return new Leaf("failure")                 // or default class
        if all instances in S have the same class C:
            return new Leaf(C)
        if Attributes is empty or |S| < mincases:      // default mincases = 2
            return new Leaf(majority class in S)
        // Select best attribute using gain ratio
        A ← SelectBestAttribute(S, Attributes)         // via gain ratio, handling continuous/missing as needed
        // Create decision node
        Node ← new DecisionNode(A)
        // Partition S based on A (discrete values or continuous threshold; distribute missing)
        for each possible outcome v of test A on S:
            Sv ← {instances in S satisfying outcome v}
            Child ← TreeInduce(Sv, Attributes - {A})
            add branch from Node to Child labeled v
        return Node

This structure ensures the tree grows depth-first, with each call reducing the dataset size and attribute set. The SelectBestAttribute function invokes gain ratio computation (referenced briefly from attribute selection mechanics), while partitioning incorporates continuous thresholding and missing value allocation (as detailed in handling procedures).[14]
Stopping rules include: (1) an empty subset, yielding a failure leaf; (2) all instances sharing the same class, forming a pure leaf node; (3) no remaining attributes or fewer than the minimum number of cases per node (typically 2), resulting in a leaf with the predominant class in the subset. These criteria balance tree completeness against overfitting risks during induction.
To illustrate, consider a toy dataset adapted from the classic "play tennis" problem with 14 instances, attributes (Outlook: sunny/overcast/rain; Temperature: hot/mild/cool; Humidity: high/normal; Wind: weak/strong), and class (Play: yes/no). At the root, gain ratio selects Outlook (highest value ≈0.156). Subsets form: sunny (5 instances, 2 yes/3 no), overcast (4 yes/0 no → pure leaf "yes"), rain (5 instances, 3 yes/2 no). Recursing on the sunny subset selects Humidity, which now has the highest gain ratio within that subset, splitting into high (0 yes/3 no → leaf "no") and normal (3 yes/0 no → leaf "yes"). On rain, Wind is selected by the same criterion, yielding weak (3 yes/0 no → "yes") and strong (0 yes/2 no → "no"). The overcast leaf stops immediately due to purity. This constructs a tree of depth 2 with 5 leaves, accurately classifying all training instances before pruning.
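The following self-contained Python sketch ties the pseudocode to the walkthrough: it induces a tree on the play-tennis data by maximizing gain ratio and prints the resulting structure. The row layout is the standard dataset assumed from the article's counts; Temperature, the average-gain screening, continuous thresholds, and missing-value weighting are omitted for brevity.

```python
import math
from collections import Counter, defaultdict

# Play-tennis data (assumed layout); columns: Outlook, Humidity, Wind, Play.
# Temperature is omitted for brevity; it is not selected in the tree described above.
DATA = [
    ("sunny", "high", "weak", "no"),  ("sunny", "high", "strong", "no"),
    ("overcast", "high", "weak", "yes"), ("rain", "high", "weak", "yes"),
    ("rain", "normal", "weak", "yes"), ("rain", "normal", "strong", "no"),
    ("overcast", "normal", "strong", "yes"), ("sunny", "high", "weak", "no"),
    ("sunny", "normal", "weak", "yes"), ("rain", "normal", "weak", "yes"),
    ("sunny", "normal", "strong", "yes"), ("overcast", "high", "strong", "yes"),
    ("overcast", "normal", "weak", "yes"), ("rain", "high", "strong", "no"),
]
ATTRS = {"Outlook": 0, "Humidity": 1, "Wind": 2}
MINCASES = 2  # default minimum number of cases per split

def entropy(rows):
    total = len(rows)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(r[-1] for r in rows).values())

def split(rows, idx):
    groups = defaultdict(list)
    for r in rows:
        groups[r[idx]].append(r)
    return groups

def gain_ratio(rows, idx):
    base, total = entropy(rows), len(rows)
    gain, split_info = base, 0.0
    for subset in split(rows, idx).values():
        w = len(subset) / total
        gain -= w * entropy(subset)
        split_info -= w * math.log2(w)
    return gain / split_info if split_info > 0 else 0.0

def tree_induce(rows, attrs, depth=0):
    classes = Counter(r[-1] for r in rows)
    majority = classes.most_common(1)[0][0]
    # Stopping rules: pure node, no attributes left, or too few cases.
    if len(classes) == 1 or not attrs or len(rows) < MINCASES:
        print("  " * depth + f"-> leaf: {majority}")
        return
    best = max(attrs, key=lambda a: gain_ratio(rows, attrs[a]))
    print("  " * depth + f"split on {best}")
    remaining = {a: i for a, i in attrs.items() if a != best}
    for value, subset in split(rows, attrs[best]).items():
        print("  " * depth + f"  {best} = {value}:")
        tree_induce(subset, remaining, depth + 2)

tree_induce(DATA, ATTRS)
```

Running the sketch splits on Outlook at the root, on Humidity under sunny, and on Wind under rain, matching the five-leaf tree described above.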
Pruning and Post-Processing
Error-Based Pruning
C4.5's error-based pruning is a post-processing step applied to the fully grown decision tree to mitigate overfitting by simplifying its structure while minimizing estimated generalization error. The process operates bottom-up, commencing from the terminal nodes and ascending toward the root. At each internal node, the algorithm evaluates whether replacing the entire subtree rooted there with a single leaf—assigned the majority class label among the cases reaching that node—would decrease the predicted error. If so, the replacement is made, effectively collapsing the subtree into a leaf. This iterative simplification continues until no further reductions in predicted error are possible.[15]

Central to this method is a pessimistic estimation of error rates, which adjusts the observed training errors upward to account for the bias toward lower errors on the training data itself. For a leaf node that covers N training cases with E observed errors, the estimated error rate is the upper confidence limit U_{\text{CF}}(E, N) derived from the binomial distribution at a user-specified confidence factor CF (default 25%). This limit incorporates a continuity correction of 0.5 to improve the approximation, effectively treating the error count as E + 0.5 in the binomial confidence interval calculation. The predicted number of errors for the leaf is then N \times U_{\text{CF}}(E, N). For a subtree, the total predicted error is the sum of the predicted errors across all its descendant leaves, weighted by the number of cases they cover.[15][16]

The pruning decision hinges on comparing this pessimistic upper bound for the subtree's error against the predicted error if replaced by a leaf. Specifically, pruning occurs if the upper bound of errors in the subtree is greater than or equal to the predicted error for the majority-class leaf. This conservative criterion ensures that only simplifications likely to improve performance on unseen data are adopted, as the upper bound simulates a worst-case scenario informed by statistical confidence.[15] C4.5 performs this error-based pruning using the training data directly, with the pessimistic adjustment serving to debias estimates without requiring a held-out validation set; however, alternative methods like reduced-error pruning may employ a separate pruning set (typically 25% of the data) for direct validation, though this is not the default in C4.5.[17][18]

To illustrate, consider an unpruned subtree with three leaves: the first covering 6 cases with 0 errors (U_{25\%}(0,6) \approx 0.206), the second covering 9 cases with 0 errors (U_{25\%}(0,9) \approx 0.143), and the third covering 1 case with 0 errors (U_{25\%}(0,1) = 0.750). The predicted errors for the subtree sum to 6 \times 0.206 + 9 \times 0.143 + 1 \times 0.750 \approx 3.273. Replacing the subtree with a leaf covering all 16 cases, where the majority class incurs 1 observed error, yields a predicted error of 16 \times U_{25\%}(1,16) \approx 2.512. Since 2.512 < 3.273, the subtree is pruned to the leaf.[15]
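For concreteness, here is a Python sketch of the pessimistic estimate and the pruning comparison. It is modeled on the error-estimate routine distributed with the C4.5 release (often cited as AddErrs), using an exact binomial limit when E = 0 and a normal approximation with a 0.5 continuity correction otherwise; the z value for CF = 25% and the approximation itself are assumptions, so results for nonzero error counts match the worked example only approximately.

```python
import math

Z25 = 0.6925  # approximate normal deviate for a one-sided 25% confidence level

def added_errors(n, e, z=Z25, cf=0.25):
    """Pessimistic extra errors for a leaf with n cases and e observed errors.

    Sketch of the AddErrs-style estimate: exact binomial limit when e = 0,
    otherwise a normal approximation with a 0.5 continuity correction.
    Values may differ slightly from published tables.
    """
    if e < 1e-6:
        return n * (1.0 - cf ** (1.0 / n))
    if e + 0.5 >= n:
        return 0.67 * (n - e)
    ec = e + 0.5
    pr = (ec + z * z / 2 +
          z * math.sqrt(z * z / 4 + ec * (1 - ec / n))) / (n + z * z)
    return n * pr - e

def predicted_errors(n, e):
    return e + added_errors(n, e)

# Worked example: subtree with three leaves vs. a single replacement leaf.
subtree = sum(predicted_errors(n, e) for n, e in [(6, 0), (9, 0), (1, 0)])
leaf = predicted_errors(16, 1)
print(f"subtree: {subtree:.3f}  leaf: {leaf:.3f}  prune: {leaf <= subtree}")
```

Running this gives a subtree estimate of about 3.27 errors versus roughly 2.5 for the replacement leaf, so the pruning condition holds.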
Rule Extraction from Trees

In C4.5, rule extraction begins with the pruned decision tree as input, where each path from the root to a leaf is traversed to generate a conjunctive rule: the sequence of attribute tests along the path forms the antecedent (a conjunction of conditions), and the class label at the leaf serves as the consequent.[7] This process yields one rule per leaf, providing an alternative, more linear representation of the learned classifier that can be easier to interpret and apply than the tree structure itself.[19]

Rule simplification follows to enhance comprehensibility and accuracy by removing redundant tests—such as conditions that do not alter the class probability distribution in the subtree—and pruning rules with low coverage or those that fail to improve predictive performance on the training data.[7] This step generalizes the rules, potentially merging similar paths and reducing the total number of conditions, while using pessimistic error estimates to avoid overfitting.[19] The simplified rules are then collected into a rule set, sorted by confidence (measured as classification accuracy on the training data), and used for prediction by selecting the highest-ranked rule whose antecedent matches the instance; if no rule applies, classification defaults to the majority class from the training set.[7] Overlaps between rules, which can arise after simplification, are handled through this ordering, with the first matching rule determining the prediction.[19]

For illustration, consider a simplified pruned decision tree on the classic weather dataset, where the target is to predict whether to play tennis (yes/no) based on attributes like outlook, temperature, humidity, and wind. Three extracted and simplified rules might be:

- If outlook = sunny ∧ humidity = high then play = no
- If outlook = overcast then play = yes
- If humidity = normal then play = yes
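A short Python sketch of the path-to-rule traversal is given below; the nested-dict tree encoding is purely illustrative (not C4.5's internal representation), and the subsequent simplification and ordering steps are omitted.

```python
# Pruned play-tennis tree encoded as nested dicts (an illustrative encoding):
# internal nodes map an attribute to its branches; strings are leaf labels.
TREE = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"wind": {"weak": "yes", "strong": "no"}},
    }
}

def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and emit one conjunctive rule per leaf."""
    if isinstance(node, str):                   # leaf: class label
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "true"
        return [f"IF {antecedent} THEN play = {node}"]
    rules = []
    (attribute, branches), = node.items()       # single test per internal node
    for value, child in branches.items():
        rules += extract_rules(child, conditions + ((attribute, value),))
    return rules

for rule in extract_rules(TREE):
    print(rule)
# IF outlook = sunny AND humidity = high THEN play = no
# IF outlook = sunny AND humidity = normal THEN play = yes
# IF outlook = overcast THEN play = yes
# ... (one rule per leaf; C4.5 then simplifies and orders these)
```

The example rule "If humidity = normal then play = yes" would be obtained from the second extracted path if simplification drops its outlook condition.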