
C4.5 algorithm

The C4.5 algorithm is a supervised method for constructing decision trees that classify data instances based on their attributes, developed by J. Ross Quinlan as an extension of his earlier ID3 algorithm and detailed in his 1993 book C4.5: Programs for Machine Learning. It employs a divide-and-conquer strategy to recursively partition training data, selecting attributes that maximize the gain ratio—a normalized measure of entropy reduction—to minimize bias toward attributes with many possible values. Key enhancements over ID3 include robust handling of continuous attributes through threshold-based splits (e.g., testing whether a numeric value exceeds a computed threshold such as 75 for humidity), probabilistic distribution of cases with missing values across branches, and post-pruning techniques that simplify trees by replacing subtrees with leaves when estimated error rates improve. Implemented in C for Unix systems, C4.5 generates both tree-based classifiers and production rules, making it suitable for use in expert systems, and it has been widely adopted for its balance of accuracy and interpretability in domains such as medical diagnosis and credit scoring. The algorithm's source code, approximately 9,000 lines long, was distributed with the book and later evolved into the commercial C5.0 system, influencing subsequent developments in decision tree algorithms.

Background

Overview of Decision Trees

Decision trees are hierarchical, tree-structured models used in supervised learning for tasks such as classification and regression, where decisions are made by recursively applying tests on input attributes to partition the data into subsets. These models mimic human decision-making by traversing from the root to a leaf, with each path representing a sequence of attribute-based choices that lead to a prediction. The basic components of a decision tree include the root node, which encompasses the entire training dataset; internal nodes, each associated with a decision rule or test on a specific attribute; branches, which denote the possible outcomes of that test; and leaf nodes, which assign the final output, such as a class label in classification or a continuous value in regression. The structure forms a flowchart-like representation, enabling straightforward visualization of the model's logic.
The induction process for building a decision tree involves recursive partitioning of the training data, starting at the root and selecting attributes that divide the data into increasingly pure subsets, with the goal of minimizing impurity at each node. Node purity measures how well a subset is concentrated in a single class (for classification) or around a single value (for regression); a foundational impurity measure for classification is entropy, calculated as
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i,
where S is the set of instances at a node, c is the number of classes, and p_i is the proportion of instances in class i. This process halts when criteria such as pure nodes or resource limits are reached, resulting in a complete tree.
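As a concrete illustration of this entropy formula, the following minimal Python sketch (function and variable names are illustrative, not taken from any C4.5 implementation) computes H(S) for a list of class labels; a set with 9 positive and 5 negative cases yields roughly 0.940 bits:
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())

print(entropy(["yes"] * 9 + ["no"] * 5))  # ~0.940 bits for a 9/5 class split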
Decision trees provide advantages such as high interpretability, since their structure allows easy tracing of decision paths, and the ability to handle mixed data types—categorical and numerical—without assuming linearity or a particular distribution in the data. However, they suffer from limitations including a tendency to overfit training data, producing overly complex trees that perform poorly on unseen data, and instability, where minor perturbations of the training set can yield vastly different trees. Additionally, they can exhibit bias toward attributes with numerous distinct values, favoring splits on such features even when they are not optimal.

Development and Relation to ID3

The C4.5 algorithm was developed by J. Ross Quinlan in 1993 as an extension of his earlier ID3 algorithm, introduced in 1986. This advancement was detailed in Quinlan's book C4.5: Programs for Machine Learning, which provided a comprehensive implementation and evaluation framework for decision tree induction. ID3, while pioneering in using entropy-based information gain to select attributes for splitting, had core limitations that restricted its applicability to real-world data. Specifically, it could not natively handle continuous attributes without preprocessing into bins, lacked mechanisms for dealing with missing values, offered no built-in pruning to combat overfitting, and relied on information gain—a measure prone to bias toward attributes with numerous distinct values, as these tend to produce finer partitions regardless of their predictive usefulness. The information gain formula in ID3, defined as IG(A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v), where H(S) is the entropy of the training set S and S_v is the subset of S for which attribute A takes value v, exemplifies this by rewarding splits that maximize entropy reduction through sheer multiplicity of outcomes. To overcome these shortcomings, C4.5 introduced key enhancements, including gain ratio as a normalized alternative to information gain for more equitable attribute selection, thresholding methods to binarize continuous attributes during construction, probabilistic fractional allocation for distributing instances with missing values across branches, and error-based post-pruning techniques to simplify trees and improve generalization. Quinlan's primary motivation in developing C4.5 was to make decision tree induction more robust for practical tasks, particularly on diverse, noisy datasets like those in the UCI Machine Learning Repository, where ID3's constraints often led to suboptimal performance.
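To see the bias that motivated the gain ratio, consider a hypothetical identifier attribute that takes a distinct value for every instance: each subset S_v then contains a single case with zero entropy, so the information gain equals the full entropy of S even though such a split generalizes poorly. The sketch below (reusing the entropy helper from the earlier example; the data are invented for illustration) makes this concrete:
def information_gain(rows, attr_index, labels):
    """IG(A) = H(S) - sum over values v of |S_v|/|S| * H(S_v).
    Assumes entropy() from the earlier sketch."""
    total = len(rows)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / total * entropy(part) for part in subsets.values())
    return entropy(labels) - remainder

labels = ["yes"] * 9 + ["no"] * 5
ids = [[i] for i in range(14)]           # a unique "ID" value for every instance
print(information_gain(ids, 0, labels))  # ~0.940: maximal gain, yet a useless split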

Algorithm Mechanics

Attribute Selection with Gain Ratio

The information gain criterion used in the ID3 algorithm favors attributes with a large number of possible outcomes, as such attributes tend to produce more evenly distributed partitions and thus higher apparent gain, even if the splits are not particularly informative about the class labels. This can lead to fragmented decision trees that overfit the training data by selecting attributes that create many small subsets rather than those that provide meaningful separation. To address this issue, C4.5 introduces the gain ratio, which normalizes the information gain by the split information, a measure of the entropy inherent in the partition sizes produced by the attribute. The split information for an attribute A that partitions a dataset S of size |S| into k subsets S_i of sizes |S_i| is defined as
\text{SplitInfo}(A) = -\sum_{i=1}^{k} \frac{|S_i|}{|S|} \log_2 \left( \frac{|S_i|}{|S|} \right).
This value is higher when the attribute produces many subsets or subsets of similar size, and lower when most instances fall into a single subset, so it penalizes attributes that fragment the data into numerous small partitions. The gain ratio for attribute A is then
\text{GainRatio}(A) = \frac{\text{IG}(A)}{\text{SplitInfo}(A)},
where \text{IG}(A) is the information gain. Since \text{SplitInfo}(A) = 0 occurs only for degenerate splits in which all instances fall into one subset (implying \text{IG}(A) = 0), such cases are naturally excluded, as they provide no useful separation. To further mitigate bias toward splits with low split information (which could inflate the ratio even for modest gains), C4.5 evaluates all possible tests and selects the one with the maximum gain ratio among those whose information gain is at least the average gain across all candidate tests. This ensures that only informative splits are considered, with the threshold effectively being the mean \text{IG} rather than a fixed value such as zero, though it can be tuned in implementations.
A concrete example illustrates the computation using the standard weather dataset for predicting whether to play (14 instances, 9 "Yes," 5 "No"; root entropy = 0.940 bits). Consider the attribute "Outlook" (values: sunny = 5, overcast = 4, rainy = 5):
  • Entropy after the split: H(sunny: [2 Yes, 3 No]) = 0.971 bits; H(overcast: [4 Yes, 0 No]) = 0 bits; H(rainy: [3 Yes, 2 No]) = 0.971 bits.
    Weighted entropy = (5/14) \times 0.971 + (4/14) \times 0 + (5/14) \times 0.971 = 0.693 bits.
  • \text{IG}(\text{Outlook}) = 0.940 - 0.693 = 0.247 bits.
  • \text{SplitInfo}(\text{Outlook}) = -(5/14)\log_2(5/14) - (4/14)\log_2(4/14) - (5/14)\log_2(5/14) = 1.577 bits.
  • \text{GainRatio}(\text{Outlook}) = 0.247 / 1.577 = 0.156.
Similar calculations for other attributes yield the following:
Attribute     IG (bits)   SplitInfo (bits)   Gain Ratio
Outlook       0.247       1.577              0.156
Temperature   0.029       1.557              0.019
Humidity      0.152       1.000              0.152
Windy         0.048       0.985              0.049
Outlook is selected as it has the highest gain ratio among those exceeding the average IG (0.119 bits).
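The figures in the table can be reproduced with a short script. The sketch below builds on the entropy and information_gain helpers from the earlier examples (illustrative code, not Quinlan's implementation) and returns approximately 0.156 for Outlook:
def split_info(rows, attr_index):
    """SplitInfo(A): entropy of the partition sizes induced by attribute A."""
    return entropy([row[attr_index] for row in rows])  # same formula, applied to attribute values

def gain_ratio(rows, attr_index, labels):
    si = split_info(rows, attr_index)
    return information_gain(rows, attr_index, labels) / si if si > 0 else 0.0

# Outlook column of the 14-case weather data: 5 sunny (2 yes/3 no),
# 4 overcast (4 yes/0 no), 5 rainy (3 yes/2 no).
outlook = [["sunny"]] * 5 + [["overcast"]] * 4 + [["rainy"]] * 5
play = ["yes"] * 2 + ["no"] * 3 + ["yes"] * 4 + ["yes"] * 3 + ["no"] * 2
print(gain_ratio(outlook, 0, play))  # ~0.156 (IG 0.247 / SplitInfo 1.577)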

Handling Continuous and Discrete Attributes

C4.5 distinguishes between discrete (categorical) attributes and continuous (numeric) attributes in its tree construction process. For discrete attributes, the algorithm creates multi-way branches, with one branch corresponding to each possible value, allowing the dataset to be partitioned into subsets based on nominal distinctions. In contrast, continuous attributes cannot be directly branched in this manner because of their potentially unbounded range of values; instead, C4.5 binarizes them by selecting a single threshold that converts the attribute into a binary test, enabling integration with the discrete handling mechanism. To identify the optimal threshold for a continuous attribute A, C4.5 sorts the unique values of A in the current dataset D, resulting in an ordered list v_1 < v_2 < \dots < v_n. It then considers potential thresholds t_i = (v_i + v_{i+1})/2 for each pair of consecutive values where v_i \neq v_{i+1}, evaluating the gain ratio for the binary split induced by each t_i. The threshold yielding the highest gain ratio is selected as the split point, with gain ratio serving as the attribute selection metric applied after binarization. Once the threshold t is chosen, the split rule is binary: the dataset D is partitioned into two subsets, D_1 containing instances where A \leq t and D_2 containing instances where A > t. These subsets are then treated as new datasets for recursive application of the tree-building process, effectively incorporating the continuous attribute as a binary decision in the tree. This approach ensures that continuous attributes contribute to the tree structure in a manner consistent with discrete ones, though limited to two branches per split. Consider a simple illustrative example with a dataset of 6 instances, where the continuous attribute is age and the target is whether a product is purchased (yes or no):
Instance   Age   Purchased
1          25    yes
2          30    yes
3          35    no
4          40    no
5          45    no
6          20    yes
The unique sorted ages are 20, 25, 30, 35, 40, 45. Candidate thresholds are the midpoints: 22.5, 27.5, 32.5, 37.5, 42.5. For each candidate, C4.5 computes the gain ratio by assessing the class distribution in the resulting subsets (e.g., for t = 32.5, one subset has ages ≤32.5 with classes {yes, yes, yes} and the other has ages >32.5 with {no, no, no}). The threshold 32.5 yields the highest gain ratio in this case due to perfect class separation, creating branches for age ≤32.5 (predicting yes) and age >32.5 (predicting no). This example demonstrates how threshold evaluation leads to an effective binary split that maximizes the gain ratio. C4.5 also addresses edge cases in continuous attribute handling to prevent invalid splits. If all values of the attribute are identical, no distinct consecutive values exist for midpoint thresholds, rendering the attribute unsuitable for splitting and excluding it from consideration. Similarly, if the attribute has fewer than two distinct values, or if all potential splits result in zero information gain (e.g., no change in class distribution across subsets), the algorithm halts splitting on that attribute to avoid degenerate trees. These checks ensure computational efficiency and logical tree growth.
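The threshold search on the age example can be sketched as follows (illustrative helper names; it reuses the entropy function from the earlier sketch and, as described above, ranks candidate midpoints by gain ratio):
def best_threshold(values, labels):
    """Evaluate midpoints between distinct consecutive sorted values and
    return (threshold, gain ratio) for the best binary split."""
    pairs = sorted(zip(values, labels))
    base, n = entropy(labels), len(pairs)
    best = (None, 0.0)
    for (v1, _), (v2, _) in zip(pairs, pairs[1:]):
        if v1 == v2:
            continue                                   # skip duplicate values
        t = (v1 + v2) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        ig = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        si = entropy(["L"] * len(left) + ["R"] * len(right))  # split info of branch sizes
        gr = ig / si if si > 0 else 0.0
        if gr > best[1]:
            best = (t, gr)
    return best

ages = [25, 30, 35, 40, 45, 20]
purchased = ["yes", "yes", "no", "no", "no", "yes"]
print(best_threshold(ages, purchased))  # (32.5, 1.0): perfect separation at age <= 32.5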

Managing Missing Values

C4.5 employs a probabilistic allocation strategy to manage missing values during both tree construction and classification, enabling the use of incomplete data without resorting to complete-case deletion. This approach fractionally distributes the weight of an instance with a missing attribute value across all possible branches of that attribute, proportional to the frequencies observed in the cases with known values. As described by Quinlan, each instance starts with a weight of 1; for a missing value on attribute A, the weight is split such that the fraction allocated to each outcome of A equals the proportion of instances exhibiting that outcome among those with non-missing values for A. This weighted distribution preserves the total instance weight while contributing to the entropy and split information calculations for gain ratio evaluation. During attribute selection, the gain ratio for a candidate attribute incorporates these fractional weights from incomplete instances, effectively treating them as partial contributors to each potential child node rather than excluding them entirely. For instance, consider a training set of 100 instances where 20 have missing values on attribute A, which has two outcomes: "yes," observed in 50 of the 80 known cases (62.5%), and "no," observed in the remaining 30 (37.5%). The 20 incomplete instances contribute a weight of 12.5 to the "yes" branch and 7.5 to the "no" branch. This allocation ensures the gain ratio computation uses the full dataset's information, avoiding underestimation of splits due to missing data. By integrating incomplete instances proportionally, C4.5 maintains more accurate estimates of class distributions and information gain, which is particularly beneficial when missingness rates are moderate, such as 20%.
In the classification phase, a test instance with a missing value at an internal node defined by attribute A is propagated down all branches from that node, with each path receiving a weight equal to the branch's probability from the training data. The final prediction aggregates the class probabilities at the leaves reached via these paths, using a weighted average based on the branch weights—typically resulting in a probabilistic output rather than a hard classification. This mirrors the training-time allocation, ensuring consistency between model building and application. For the earlier example, a test instance missing A would receive 62.5% weight toward the "yes" subtree and 37.5% toward the "no" subtree, with leaf predictions combined accordingly. Compared to deletion methods, which discard instances with any missing values and can reduce effective sample size by 20% or more in sparse datasets, C4.5's strategy retains the full data volume for splitting decisions, leading to more reliable gain ratio estimates and potentially lower variance in predictions. Empirical evaluations confirm this advantage, as deletion often biases selection toward attributes with fewer missing values, whereas fractional weighting promotes equitable consideration across features.
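The fractional weighting in the example above can be expressed in a few lines. The following sketch (illustrative names, not Quinlan's C code) splits the unit weight of each instance whose value of A is unknown across the branches in proportion to the observed frequencies, reproducing the 62.5/37.5 allocation:
def distribute_weights(values, weight=1.0):
    """Per-branch weight totals; unknown (None) values receive a fractional
    share proportional to the branch frequencies among known values."""
    known = [v for v in values if v is not None]
    freq = {v: known.count(v) / len(known) for v in set(known)}
    totals = dict.fromkeys(freq, 0.0)
    for v in values:
        if v is None:                      # missing: split across all branches
            for branch, p in freq.items():
                totals[branch] += weight * p
        else:
            totals[v] += weight
    return totals

# 80 known cases (50 "yes", 30 "no") plus 20 cases with A missing.
values = ["yes"] * 50 + ["no"] * 30 + [None] * 20
print(distribute_weights(values))  # yes: 62.5, no: 37.5 (12.5 and 7.5 from the missing cases)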

Tree Construction Pseudocode

The tree construction in C4.5 follows a recursive divide-and-conquer strategy to induce a decision tree from a training set S, where each instance consists of attribute values and a class label. The process begins at the root node representing the entire training set and proceeds by selecting the best attribute for splitting at each non-leaf node, partitioning S into subsets based on attribute values, and recursing on those subsets until stopping criteria are met. This builds an unpruned tree that captures the training data's structure, with attribute selection guided by gain ratio to prioritize splits that maximize information gain relative to split information. The core procedure integrates handling for discrete and continuous attributes: for discrete attributes, branches correspond to each possible value; for continuous attributes, a threshold is chosen to binarize the attribute into "less than or equal to the threshold" and "greater than the threshold" branches. Missing values are managed during selection and splitting by probabilistically distributing instances across branches based on observed frequencies in the training data, ensuring the tree remains robust to incomplete data. The recursion terminates based on predefined rules to prevent trivial or overly specific nodes.

Pseudocode

The following pseudocode outlines the recursive tree induction function in C4.5, adapted from the original implementation details:
function TreeInduce(S: training set, Attributes: set of available attributes) returns Node:
    if |S| = 0:  // empty set
        return new Leaf("failure")  // or default class
    if all instances in S have the same class C:
        return new Leaf(C)
    if Attributes is empty or |S| < mincases:  // default mincases = 2
        return new Leaf(majority class in S)
    
    // Select best attribute using gain ratio
    A ← SelectBestAttribute(S, Attributes)  // via gain ratio, handling continuous/missing as needed
    
    // Create decision node
    Node ← new DecisionNode(A)
    
    // Partition S based on A (discrete values or continuous threshold; distribute missing)
    for each possible outcome v of test A on S:
        Sv ← {instances in S satisfying outcome v}
        Child ← TreeInduce(Sv, Attributes - {A})  // remove A only if discrete; a continuous attribute may be re-tested with new thresholds
        add branch from Node to Child labeled v
    
    return Node
This structure ensures the tree grows depth-first, with each call reducing the dataset size and attribute set. The SelectBestAttribute function invokes the gain ratio computation (described under attribute selection), while partitioning incorporates continuous thresholding and missing-value allocation (as detailed in the handling procedures above). Stopping rules include: (1) an empty subset, yielding a failure leaf; (2) all instances sharing the same class, forming a pure leaf; (3) no remaining attributes or fewer than the minimum number of cases per node (typically 2), resulting in a leaf labeled with the predominant class in the subset. These criteria balance tree completeness against overfitting risks during induction.
To illustrate, consider a toy dataset adapted from the classic "play tennis" problem with 14 instances, attributes (Outlook: sunny/overcast/rainy; Temperature: hot/mild/cool; Humidity: high/normal; Wind: weak/strong), and class (Play: yes/no). At the root, gain ratio selects Outlook (highest value ≈0.156). Subsets form: sunny (5 instances, 2 yes/3 no), overcast (4 yes/0 no → pure leaf "yes"), and rainy (5 instances, 3 yes/2 no). Recursing on the sunny subset selects Humidity, splitting into high (0 yes/3 no → leaf "no") and normal (3 yes/0 no → leaf "yes"). On the rainy subset, Wind is selected, yielding weak (3 yes/0 no → leaf "yes") and strong (0 yes/2 no → leaf "no"). The overcast leaf stops immediately due to purity. This constructs a tree of depth 2 with 5 leaves, accurately classifying all training instances before pruning.
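For readers who prefer executable code, the sketch below mirrors TreeInduce for discrete attributes only, omitting continuous thresholds, missing-value weighting, and pruning; the names are illustrative rather than drawn from Quinlan's source, and it reuses the gain_ratio helper from the attribute-selection example:
from collections import Counter

def tree_induce(rows, labels, attributes, min_cases=2):
    """Recursive divide-and-conquer induction over discrete attribute indices."""
    if not rows:
        return ("leaf", "failure")                             # empty subset
    if len(set(labels)) == 1:
        return ("leaf", labels[0])                             # pure node
    if not attributes or len(rows) < min_cases:
        return ("leaf", Counter(labels).most_common(1)[0][0])  # majority class

    best = max(attributes, key=lambda a: gain_ratio(rows, a, labels))
    branches = {}
    for value in set(row[best] for row in rows):
        subset = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = tree_induce([r for r, _ in subset],
                                      [l for _, l in subset],
                                      [a for a in attributes if a != best],
                                      min_cases)
    return ("node", best, branches)
Applied to the 14-case weather data, such a routine should reproduce the depth-2, five-leaf tree described above.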

Pruning and Post-Processing

Error-Based Pruning

C4.5's error-based pruning is a post-processing step applied to the fully grown tree to mitigate overfitting by simplifying its structure while minimizing estimated error. The process operates bottom-up, commencing from the terminal nodes and ascending toward the root. At each internal node, the algorithm evaluates whether replacing the entire subtree rooted there with a single leaf—assigned the majority class label among the cases reaching that node—would decrease the predicted error. If so, the replacement is made, effectively collapsing the subtree into a leaf. This iterative simplification continues until no further reductions in predicted error are possible.
Central to this method is a pessimistic estimation of error rates, which adjusts the observed training errors upward to account for the bias toward lower errors on the training data itself. For a leaf node that covers N training cases with E observed errors, the estimated error rate is the upper confidence limit U_{\text{CF}}(E, N) derived from the binomial distribution at a user-specified confidence factor CF (default 25%). This limit incorporates a continuity correction of 0.5 to improve the approximation, effectively treating the error count as E + 0.5 in the confidence interval calculation. The predicted number of errors for the leaf is then N \times U_{\text{CF}}(E, N). For a subtree, the total predicted error is the sum of the predicted errors across all its descendant leaves, weighted by the number of cases they cover. The pruning decision hinges on comparing this pessimistic upper bound for the subtree's error against the predicted error if the subtree were replaced by a leaf: pruning occurs if the subtree's estimate is greater than or equal to the predicted error for the majority-class leaf. This conservative criterion ensures that only simplifications likely to improve performance on unseen data are adopted, as the upper bound simulates a worst-case scenario informed by statistical confidence. C4.5 performs this error-based pruning using the training data directly, with the pessimistic adjustment serving to debias estimates without requiring a held-out validation set; alternative methods such as reduced-error pruning may employ a separate pruning set (typically 25% of the data) for direct validation, though this is not the default in C4.5.
To illustrate, consider an unpruned subtree with three leaves: the first covering 6 cases with 0 errors (U_{25\%}(0,6) \approx 0.206), the second covering 9 cases with 0 errors (U_{25\%}(0,9) \approx 0.143), and the third covering 1 case with 0 errors (U_{25\%}(0,1) = 0.750). The predicted errors for the subtree sum to 6 \times 0.206 + 9 \times 0.143 + 1 \times 0.750 \approx 3.273. Replacing the subtree with a leaf covering all 16 cases, where the majority class incurs 1 observed error, yields a predicted error of 16 \times U_{25\%}(1,16) \approx 2.512. Since 2.512 < 3.273, the subtree is pruned to the leaf.
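The confidence limits in the worked example can be approximated in a few lines using the Clopper-Pearson upper bound of a binomial proportion (via SciPy's beta distribution). This is an approximation of, not a transcription of, C4.5's own routine: it reproduces the E = 0 values exactly and the E = 1 value only approximately:
from scipy.stats import beta

def pessimistic_errors(errors, n, cf=0.25):
    """Predicted error count N * U_CF(E, N), approximated with the
    Clopper-Pearson upper confidence limit at confidence factor cf."""
    if errors >= n:
        return float(n)
    return n * beta.ppf(1 - cf, errors + 1, n - errors)

# Subtree of three leaves versus a single replacement leaf (example above).
subtree = pessimistic_errors(0, 6) + pessimistic_errors(0, 9) + pessimistic_errors(0, 1)
leaf = pessimistic_errors(1, 16)
print(round(subtree, 3), round(leaf, 3))  # ~3.273 vs ~2.5, so the subtree is pruned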

Rule Extraction from Trees

In C4.5, rule extraction begins with the pruned decision tree as input, where each path from the root to a leaf is traversed to generate a conjunctive rule: the sequence of attribute tests along the path forms the antecedent (a conjunction of conditions), and the class label at the leaf serves as the consequent. This process yields one rule per leaf, providing an alternative, more linear representation of the learned classifier that can be easier to interpret and apply than the tree itself. Rule simplification follows to enhance comprehensibility and accuracy by removing redundant tests—such as conditions that do not alter the class probability distribution in the subtree—and pruning rules with low coverage or those that fail to improve predictive performance on the training data. This step generalizes the rules, potentially merging similar paths and reducing the total number of conditions, while using pessimistic error estimates to avoid overfitting. The simplified rules are then collected into a rule set, sorted by confidence (measured as classification accuracy on the training data), and used for prediction by selecting the highest-ranked rule whose antecedent matches the instance; if no rule applies, classification defaults to the majority class from the training set. Overlaps between rules, which can arise after simplification, are handled through this ordering, with the first matching rule determining the prediction.
For illustration, consider a simplified pruned tree on the classic weather dataset, where the target is to predict whether to play (yes/no) based on attributes such as Outlook, Temperature, Humidity, and Wind. Three extracted and simplified rules might be:
  • If Outlook = overcast, then Play = yes.
  • If Outlook = sunny and Humidity = high, then Play = no.
  • If Outlook = rainy and Wind = strong, then Play = no.
These rules demonstrate conjunctive antecedents leading to class predictions, with the full set ordered by accuracy and defaulting to "yes" (the majority class) for unmatched cases.
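Extracting one rule per root-to-leaf path is a straightforward traversal. The sketch below operates on the nested ("node", attribute, branches)/("leaf", class) structure used in the earlier induction sketch; the toy tree is hand-built to match the weather example and the helper name is illustrative:
def extract_rules(tree, path=()):
    """Yield (conditions, class) pairs, one per root-to-leaf path."""
    if tree[0] == "leaf":
        yield list(path), tree[1]
        return
    _, attribute, branches = tree
    for value, child in branches.items():
        yield from extract_rules(child, path + ((attribute, value),))

toy = ("node", "Outlook", {
    "overcast": ("leaf", "yes"),
    "sunny": ("node", "Humidity", {"high": ("leaf", "no"), "normal": ("leaf", "yes")}),
    "rainy": ("node", "Wind", {"strong": ("leaf", "no"), "weak": ("leaf", "yes")}),
})
for conditions, label in extract_rules(toy):
    print(" and ".join(f"{a} = {v}" for a, v in conditions), "->", label)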

Implementations and Extensions

Software Implementations

The original implementation of the C4.5 algorithm was developed by J. Ross Quinlan and released in the 1990s as a command-line tool written in C for Unix environments. This system processes input data in a simple textual format, with attributes specified in a names file and case records in a separate data file, and generates output as decision trees or production rules in human-readable text files. A prominent open-source port is J48, a reimplementation of C4.5 integrated into the Waikato Environment for Knowledge Analysis (WEKA) toolkit, which has been available since the late 1990s. J48 replicates key C4.5 features such as gain ratio for attribute selection, handling of continuous attributes via thresholding, and post-pruning controlled by a confidence factor parameter (default value of 0.25 for estimating error rates). It supports input formats including ARFF and CSV, and offers graphical visualization of decision trees alongside textual rule extraction for interpretability. In Python, the scikit-learn library's DecisionTreeClassifier provides C4.5-like functionality through its entropy criterion (approximating information gain), though full gain-ratio splitting requires custom extensions via the splitter interface; it does not natively implement all C4.5 specifics such as rule post-pruning. This class handles input from DataFrames or arrays (often loaded from CSV files), and the library includes export_graphviz for visualization. For R, the C50 package offers an approximation via its implementation of the closely related C5.0 algorithm, supporting tree- and rule-based models with boosting options, while the partykit package's ctree function uses statistical tests that can emulate C4.5-style splits on datasets in data.frame format. Commercial software integrates C4.5 as a foundational method; for instance, SAS Enterprise Miner includes a decision tree node supporting the C4.5 algorithm alongside others such as CART and CHAID, enabling interactive splitting and pruning on enterprise datasets in SAS or database formats. IBM SPSS Modeler similarly incorporates C5.0-based nodes for classification modeling, processing inputs from flat files, database connections, or Excel, with outputs including tree diagrams and rule sets. These tools often benchmark C4.5 performance on standard UCI datasets such as Iris (for multiclass classification) or Wine (for continuous attribute handling), where J48 in WEKA, for example, achieves accuracies around 95-98% on Iris with default parameters.
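As a brief illustration of the scikit-learn approximation noted above (entropy-based splitting rather than true gain ratio, and no C4.5-style rule post-processing), a tree can be trained and rendered as text as follows:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# criterion="entropy" mimics C4.5's information-based splitting; scikit-learn
# does not normalize by split information and does not extract rules.
clf = DecisionTreeClassifier(criterion="entropy", min_samples_leaf=2, random_state=0)
clf.fit(iris.data, iris.target)
print(export_text(clf, feature_names=list(iris.feature_names)))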

Evolution to C5.0 and See5

C5.0, developed by J. Ross Quinlan as a successor to the C4.5 algorithm, was released in the late 1990s, with See5 serving as its Windows implementation counterpart. These tools are now maintained and distributed by RuleQuest Research, Quinlan's company. Key enhancements in C5.0 include adaptive boosting in the style of AdaBoost, which generates multiple classifiers that vote on each case and typically reduces test error by about 25% on average. Winnowing enables attribute subset selection by estimating feature importance and pre-selecting relevant attributes, such as reducing 22 attributes to 8 in benchmark examples. Additionally, softer thresholds for continuous attribute splits use probabilistic blending of outcomes within narrow bounds around the split point, improving robustness to noisy data. Pruning in C5.0 employs an improved error-based approach with a cost-complexity model that incorporates Laplace error estimates for confidence intervals, allowing finer control over simplification. It also supports misclassification costs, enabling handling of unbalanced datasets by assigning penalties to different error types during training and prediction. For rule sets, C5.0 uses a separate-and-conquer strategy in rule learning, iteratively covering training examples with candidate rules followed by simplification cycles to remove redundant conditions, resulting in more compact and interpretable rule bases than those extracted from a single tree. Relative to C4.5, C5.0 offers markedly faster execution and lower memory usage, aided by options such as winnowing and sampling, along with higher accuracy on noisy data; representative evaluations on UCI repository datasets report error reductions on the order of 5-10%. C5.0 and See5 are available under commercial licensing from RuleQuest, with a GPL-licensed source release of the single-threaded version also available, and they integrate into tools such as R via the C50 package.
