Factorial experiment
A factorial experiment is an experimental design in statistics that simultaneously investigates the effects of two or more independent variables, known as factors, on a dependent variable or response, by testing all possible combinations of the factor levels to evaluate both individual main effects and their interactions.[1] This approach contrasts with one-factor-at-a-time methods by allowing researchers to efficiently study multiple variables and detect how factors influence each other, making it a cornerstone of modern experimental design in fields such as agriculture, engineering, psychology, and clinical trials.[2][3]
The origins of factorial experiments trace back to the work of British statistician Ronald A. Fisher in the 1920s and 1930s at the Rothamsted Experimental Station, where he developed these designs to optimize agricultural research by accounting for multiple soil and treatment variables.[4] Fisher's foundational book, The Design of Experiments (1935), formalized the factorial approach, emphasizing randomization, replication, and blocking to minimize bias and enhance inference validity.[4] This innovation dramatically improved the efficiency of experiments compared to earlier sequential testing methods, enabling broader generalizability of results.[5]
In a factorial experiment, each factor is varied at specified levels—typically two or more discrete values, such as low and high dosages in a drug trial—and the design is denoted by the number of levels per factor (e.g., a 2×3 design for two factors with 2 and 3 levels, respectively, yielding 6 treatment combinations).[6] Main effects assess the independent impact of each factor, while interactions reveal whether the effect of one factor depends on the level of another, which is crucial for understanding complex real-world phenomena.[1] Experiments often incorporate replication across runs and randomization of treatment order to control for extraneous variables and ensure statistical robustness.[3]
Factorial designs are classified as full factorial, which include all combinations for complete resolution of effects, or fractional factorial, which use a subset of combinations to reduce experimental costs while still estimating key effects, particularly useful when many factors are involved.[1] Analysis typically employs analysis of variance (ANOVA) to partition variance into components attributable to main effects and interactions, with significance tested via F-statistics.[2] These designs are highly efficient, often requiring fewer runs than separate one-factor experiments, and their orthogonality supports clear separation of effects for reliable conclusions.[1]
Fundamentals
Definition
A factorial experiment is a type of experimental design in which multiple independent variables, known as factors, are varied simultaneously across their specified levels to evaluate their individual (main) effects and combined (interaction) effects on a dependent variable, referred to as the response variable.[1][7] This approach allows researchers to systematically test all possible combinations of factor levels, providing a comprehensive assessment of how factors influence the response in isolation and together, which is essential for understanding complex systems in fields such as agriculture, engineering, and medicine.[8][2]
In a factorial experiment, each factor is an independent variable that can take on two or more discrete levels, such as low and high settings for temperature or dosage. The unique combinations of these levels across all factors are called runs or treatment combinations, and in a full factorial design, every possible run is executed to ensure complete crossing of factors.[1][9] For k factors with n_1, n_2, \dots, n_k levels respectively, the total number of runs required is the product n_1 \times n_2 \times \dots \times n_k; for instance, a two-level design with k factors yields 2^k runs.[3][10]
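As a minimal sketch in R (factor names and level counts chosen arbitrarily for illustration), the run count is simply the product of the per-factor level counts, and expand.grid enumerates the runs themselves:

levels_per_factor <- c(A = 2, B = 3, C = 2)       # hypothetical 2 x 3 x 2 design
prod(levels_per_factor)                           # 12 treatment combinations
runs <- expand.grid(A = 1:2, B = 1:3, C = 1:2)    # one row per treatment combination
nrow(runs)                                        # 12
# For k two-level factors the count is 2^k, e.g. 2^3 = 8 runs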
Unlike one-factor-at-a-time (OFAT) designs, which vary only one factor while holding others constant and thus require more runs to explore interactions, factorial experiments test multiple factors concurrently, enhancing efficiency by detecting both main effects and interactions within fewer total experiments.[1][11] This efficiency is particularly valuable in resource-limited settings, as it provides broader inferential power and reduces the risk of confounding interactions that OFAT approaches often miss.[12][13]
Factorial experiments are a form of controlled study, where researchers deliberately manipulate factor levels to observe causal relationships on the response variable, distinguishing them from observational studies that merely record associations without intervention.[14] The response variable is the measurable outcome of interest, such as yield in an agricultural trial or efficacy in a clinical test, which must be quantifiable to enable statistical analysis of effects.[1][3]
Notation
In factorial experiments, the design is commonly described using notation that specifies the number of levels for each factor. For symmetric designs where all factors have the same number of levels, the notation p^k indicates k factors each with p levels, resulting in p^k treatment combinations; for example, a 2^3 design involves three factors at two levels each, yielding eight combinations. Asymmetric designs, with varying levels across factors, are denoted by products such as 2 \times 3 \times 2, representing one factor at two levels, another at three, and a third at two, for a total of 12 combinations. This notation facilitates clear communication of the experiment's structure and scale.[15][16]
Factors are coded to simplify analysis and interpretation. For quantitative factors in two-level designs, levels are typically coded as -1 for the low setting and +1 for the high setting, centering the values around zero and scaling the range to 2 units; this coding is particularly useful for estimating effects via contrasts. Qualitative factors are often labeled with uppercase letters (e.g., A, B, C), with levels denoted by lowercase letters or descriptive terms (e.g., a_1 for the first level of A, or A_low and A_high); an alternative binary coding of 0 and 1 may be used for computational convenience in some software implementations. Treatment combinations are represented either as tuples specifying levels (e.g., (low, high, low) for three factors) or as multiplicative codes using the numeric values (e.g., -1 \times +1 \times -1); in Yates' standard order for two-level designs, combinations are listed systematically, such as (1) for all low levels, a for high A and low others, b for high B and low others, and ab for high A and B with low others.[15][2]
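These coding and labeling conventions can be generated mechanically; the R sketch below (the helper name label_one is illustrative) lists a 2^3 design in Yates' standard order with ±1 coded levels and the corresponding treatment-combination labels:

# 2^3 design in Yates' standard order: A varies fastest, then B, then C
coded <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
# Yates label: lowercase letter for each factor at its high (+1) level; "(1)" if none
label_one <- function(row) {
  lab <- paste0(tolower(names(row)[row == 1]), collapse = "")
  if (lab == "") "(1)" else lab
}
coded$label <- apply(coded, 1, label_one)
coded   # labels run (1), a, b, ab, c, ac, bc, abc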
The response variable, representing the measured outcome, is denoted as Y, with subscripts indicating the specific treatment combination and replication; for a three-factor design, this is often Y_{ijk\ell}, where i, j, and k index the levels of factors A, B, and C, respectively, and \ell denotes the replicate within that combination. Regarding degrees of freedom, the total for the experiment equals the number of runs minus 1 (i.e., total df = N - 1, where N is the total observations). For a factor with l levels, the main effect degrees of freedom is l - 1; in two-level designs, this simplifies to 1 df per main effect and per interaction.[2][17]
Historical Development
Origins
Early multi-factor experiments that foreshadowed the development of factorial designs can be traced to the mid-19th century, when John Bennet Lawes and Joseph Henry Gilbert established the Rothamsted Experimental Station in 1843 to systematically study the effects of fertilizers on crop yields. Their pioneering long-term field trials, including the Broadbalk wheat experiment starting in 1843 and the Hoosfield barley experiment in 1852, examined interactions between soil conditions and various nutrient applications, such as mineral salts with and without nitrogen, across multiple plots under continuous cropping and rotation systems. These efforts demonstrated the essential role of combined nutrients like nitrogen, phosphorus, and potassium in plant growth, establishing foundational evidence for multi-factor influences on agricultural productivity.[18][19]
This work addressed limitations in earlier classical approaches, exemplified by Justus von Liebig's 1840 mineral theory, which focused on individual essential elements (nitrogen, phosphorus, and potassium) as limiting factors without fully accounting for their interactions. Lawes and Gilbert's multi-treatment designs revealed inefficiencies in such single-factor perspectives by showing, for instance, that nitrogen responses depended on phosphorus availability and that legumes could fix atmospheric nitrogen, countering Liebig's emphasis on soil-derived ammonia.[18][20][21]
By the early 20th century, agricultural research increasingly recognized the need for systematic multi-factor trials to capture real-world complexities beyond isolated variables, driven by expanding expertise in fields like genetics and statistics at institutions such as Rothamsted. This transition set the stage for Ronald A. Fisher's formalization of factorial methods during his tenure there from 1919 onward. In papers from 1921 to 1926, including "Statistical Methods for Research Workers" (1925) and "The Arrangement of Field Experiments" (1926), Fisher championed factorial designs as superior to single-factor tests, arguing they enabled efficient detection of interactions using fewer experimental units while controlling for variability in field conditions.[22][23][24]
The term "factorial design" first appeared in print in Fisher's seminal 1935 book, The Design of Experiments, which dedicated a chapter to its principles and application in scientific inquiry.[25]
Key Advancements
Frank Yates made significant contributions to factorial experimental design in the 1930s, particularly through his development of methods for confounding and fractional factorial designs, which addressed the challenges of implementing large-scale experiments with limited resources.[26] His work on partial confounding allowed researchers to estimate main effects while sacrificing some higher-order interactions, making factorial designs more feasible for practical applications in agriculture and industry.[27] Yates' seminal 1937 publication, The Design and Analysis of Factorial Experiments, provided systematic procedures for constructing and analyzing these designs, including tables for efficient confounding patterns.[26]
In the mid-20th century, factorial designs gained widespread adoption in industrial experimentation, notably through the integration with response surface methodology (RSM) introduced by George E.P. Box and K.B. Wilson in 1951. Their approach built on factorial structures to model quadratic responses and optimize processes, such as in chemical engineering, by using sequential experimentation to approximate optimal conditions.[28] This extension enabled the exploration of curved response surfaces beyond linear main effects, facilitating improvements in manufacturing efficiency.[29]
The computational era from the 1970s to 1980s marked a pivotal advancement with the integration of computers into factorial analysis, exemplified by the development of GENSTAT software at Rothamsted Experimental Station.[30] First released in 1968, GENSTAT automated the complex calculations for unbalanced and confounded factorial designs, reducing manual errors and enabling larger datasets in agricultural and biological research.[31] This software's capabilities for variance analysis and design generation democratized access to sophisticated factorial methods, supporting their broader application in experimental sciences.
Post-2000 developments have incorporated factorial designs into high-throughput screening and advanced optimization frameworks, enhancing their utility in genomics, materials science, and engineering.[32] In high-throughput contexts, fractional factorials efficiently identify key factors among many variables, as demonstrated in stem cell viability studies where they screened additive combinations rapidly.[33] Concurrently, Bayesian approaches to factorial designs, emerging prominently in the 2010s, have introduced probabilistic modeling to handle uncertainty and interactions more robustly, particularly in time-course experiments and industrial optimization.[34] These methods, often leveraging Markov chain Monte Carlo for posterior inference, allow adaptive designs that update based on accumulating data, improving efficiency over classical frequentist analyses.[35]
Benefits and Limitations
Advantages
Factorial experiments provide significant efficiency in estimating effects by allowing researchers to assess multiple factors and their interactions using fewer experimental runs compared to conducting separate one-factor-at-a-time (OFAT) studies for each factor.[8] This approach reduces the total number of trials required while maintaining or improving the precision of estimates for main effects and interactions.[36]
A key advantage is the ability to detect interaction effects, which reveal how the influence of one factor on the response variable depends on the levels of other factors—information that OFAT designs cannot capture, as they examine factors in isolation. By jointly varying all factors, factorial designs uncover synergistic or antagonistic relationships that might otherwise go unnoticed, leading to more accurate models of the underlying process.[37]
Factorial experiments promote an economy of experimentation through the reuse of data across multiple analyses; for instance, in a full 2^k design, each experimental run contributes information to the estimation of all k main effects and higher-order interactions, maximizing the informational yield per trial.[38] This resource efficiency is particularly valuable in resource-constrained settings, such as industrial or clinical research, where minimizing runs without sacrificing insight is essential.[36]
The designs also enhance robustness by producing results that are generalizable across the tested levels of all factors simultaneously, providing a broader understanding of the system's behavior than isolated tests.[39] Furthermore, factorial setups facilitate the inclusion of replicates within the design to estimate experimental error, improving the reliability of inferences about effects and interactions.
An illustrative example is a 1980s study at SKF, the Swedish bearing manufacturer, where statistician Christer Hellstrand designed a 2^3 factorial experiment on factors including inner ring heat treatment, outer ring osculation, and cage design. The experiment revealed that increasing outer ring osculation and inner ring heat treatment together extended bearing life fivefold, saving tens of millions of dollars.[40]
Disadvantages
One primary disadvantage of full factorial experiments is the exponential growth in the number of experimental runs required, which scales as s^k for k factors each at s levels, leading to substantial resource demands even for moderate numbers of factors.[41] For instance, a design with 5 factors at 3 levels each demands 243 runs, while one with 10 binary factors requires 1024 runs, often rendering such experiments impractical without extensive time and materials.[41] This escalation in scale quickly overwhelms budgets and timelines, particularly in industrial or laboratory settings where each run involves costly procedures or equipment.[42]
While full factorial designs can estimate all effects, they often assume negligible higher-order interactions for practical interpretation (sparsity principle), but significant higher-order interactions can complicate analysis and may necessitate shifting to fractional designs for feasibility.[41] In unreplicated designs, where only one observation per treatment combination is collected to minimize runs, estimating experimental error becomes challenging without replicates, as higher-order interactions must be pooled as the error term, potentially confounding results if those interactions are not truly negligible.[41] Additionally, the overall cost and time demands intensify with increasing factors, as blocking may be required to control for extraneous variables, further complicating execution without adequate resources.[43]
To mitigate these limitations, researchers often employ fractional factorial designs or screening experiments, which reduce the number of runs—such as using a 2^{k-p} fraction to estimate main effects and low-order interactions—while preserving essential information at a fraction of the cost.[41] These approaches balance the trade-off between comprehensive interaction detection and practical constraints, allowing initial screening of vital factors before committing to fuller designs if needed.[42]
Design and Implementation
Basic Factorial Designs
Basic factorial designs, also known as full factorial designs, involve testing all possible combinations of levels for a set of factors, providing a complete examination of main effects and interactions in experimental settings.[44] These designs are particularly useful for two-level factors, resulting in 2^k experimental runs for k factors, and form the foundation of design of experiments (DOE) methodologies.[45] Developed in the early 20th century by statisticians like R. A. Fisher, they emphasize systematic construction to ensure orthogonality and efficiency in data collection.[46]
The construction of basic 2^k factorial designs proceeds recursively, starting from simpler designs and building upward. For instance, a 2^1 design consists of two runs: one at the low level and one at the high level of the single factor. To create a 2^2 design, duplicate the 2^1 runs and flip the levels of the new second factor in the duplicated set, yielding four unique combinations.[44] This process extends to higher k; for a 2^3 design, duplicate the 2^2 design and flip the levels of the third factor in the copy, producing eight runs that cover all combinations.[44] Such recursive duplication ensures the design remains balanced and allows for the estimation of all effects without aliasing in full factorials.[45]
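The duplicate-and-flip construction described above translates directly into code; the following R sketch (the function name full_factorial is illustrative) builds the ±1 design matrix for any k:

# Build a 2^k design recursively: copy the current design and append a new
# factor set to -1 in the original half and +1 in the duplicated half
full_factorial <- function(k) {
  design <- matrix(c(-1, 1), ncol = 1)   # the 2^1 design
  if (k >= 2) {
    for (j in 2:k) {
      design <- rbind(cbind(design, -1), cbind(design, +1))
    }
  }
  colnames(design) <- LETTERS[1:k]
  design
}
full_factorial(3)   # 8 runs covering every +/-1 combination of A, B, and C

A randomized run order is then just a permutation of the rows, for example d <- full_factorial(3); d[sample(nrow(d)), ].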
Replication is incorporated by repeating each treatment combination multiple times, which provides degrees of freedom for estimating experimental error and improves precision. A minimum of two replicates per condition is typically required for basic variance estimation, though more may be used depending on the desired power.[44] For example, in a 2^2 design with two replicates, the total number of runs becomes eight, allowing separation of systematic effects from random variation.[45]
Randomization plays a crucial role in basic factorial designs by assigning the run order randomly to experimental units, thereby minimizing systematic biases from uncontrolled factors like time trends or environmental changes.[46] This practice, integral to valid inference, ensures that observed differences are attributable to the factors under study rather than extraneous influences.[44]
Blocking is employed to group runs into subsets that control for known nuisance factors, such as operator shifts or equipment batches, enhancing the design's robustness without increasing the number of full factorials.[45] For instance, runs might be divided into blocks corresponding to day and night operations, with randomization applied within each block to further reduce variability.[44]
A simple example is a 2 \times 2 factorial design examining the effects of temperature (factor A: low, high) and pressure (factor B: low, high) on a chemical reaction yield. The four treatment combinations, or runs, are low temperature with low pressure, low temperature with high pressure, high temperature with low pressure, and high temperature with high pressure. These runs would be randomized in order and potentially replicated or blocked as needed.[44][45]
Advanced and Fractional Designs
Fractional factorial designs extend basic two-level factorial experiments to larger numbers of factors by running only a fraction of the full 2^k combinations, denoted 2^{k-p}, where the design uses a 1/2^p fraction of the full set of runs, allowing estimation of main effects and low-order interactions with reduced experimental effort. For instance, a 2^{4-1} design requires only 8 runs to study 4 factors, a half-replicate of the full 2^4 factorial (the same number of runs as a 2^3 full factorial), constructed by defining one factor as the product of the others (e.g., D = ABC).
The confounding structure in these designs arises from the defining relation, which specifies aliases between effects; in the 2^{4-1} design with generator I = ABCD, two-factor interactions are aliased pairwise, such as AB = CD, meaning the estimate for AB includes any effect of CD. Resolution classifies these designs by the length of the shortest word in the defining relation, with higher resolution minimizing aliasing: Resolution IV designs ensure main effects are unaliased with two-factor interactions (although two-factor interactions remain aliased with one another), while Resolution V designs additionally leave two-factor interactions unaliased among themselves, making them preferable when interactions must be estimated.
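This construction is easy to verify numerically; the R sketch below (column names are illustrative) builds the 2^{4-1} half-fraction from a full 2^3 in A, B, C, defines D = ABC, and confirms the AB = CD alias:

# Half-fraction 2^(4-1) with generator D = ABC (defining relation I = ABCD)
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- transform(base, D = A * B * C)
# Because D = ABC, the AB and CD interaction columns coincide (AB aliased with CD)
all(design$A * design$B == design$C * design$D)   # TRUE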
For screening many factors at two levels, Plackett-Burman designs provide efficient non-regular alternatives, approximating a 2^k structure but using N runs where N is a multiple of 4 (e.g., 12 runs for up to 11 factors), focusing on main effects while assuming higher-order interactions are negligible. They are particularly useful in early-stage experimentation to identify dominant factors with minimal resources.
Modern extensions include Taguchi methods, which employ orthogonal arrays—special fractional factorials—to enhance design robustness against noise, optimizing both mean response and variability for quality engineering applications. Post-1990s advancements in computer-generated designs use algorithmic optimization criteria, such as D-optimality (maximizing determinant of the information matrix for parameter estimation), to create custom fractional factorials tailored to specific models and constraints beyond classical catalogs.
These advanced designs address gaps in high-dimensional settings, such as biopharmaceutical production, where fractional factorials optimize cell culture processes in mammalian cells like Chinese hamster ovary (CHO) cells to enhance recombinant protein yield amid numerous variables.[47] In AI experiments during the 2020s, they facilitate hyperparameter tuning and model robustness testing in machine learning pipelines with vast factor spaces.[48]
Effects and Interactions
Main Effects and Contrasts
In factorial experiments, a main effect represents the average difference in the response variable associated with a change in the level of a single factor, while averaging over the levels of all other factors. For a two-level factor, this is typically computed as the difference between the average response at the high level and the average response at the low level. This decomposition allows researchers to isolate the independent contribution of each factor to the overall variation in the response.[49][50]
Main effects are often estimated using contrasts, which are linear combinations of the response values designed to isolate specific effects. A contrast C takes the general form C = \sum c_i y_i, where y_i are the observed responses, and c_i are coefficients that sum to zero, typically balanced with equal numbers of positive and negative values (e.g., +1 and -1 for two-level designs). These contrasts provide an orthogonal basis for partitioning the total variation in the data, enabling precise estimation of effects without overlap in balanced designs.[51][52]
For a two-level factorial design with k factors, the main effect of factor A, denoted Effect_A, is estimated as
\text{Effect}_A = \frac{1}{2^{k-1}} \left( \sum_{\text{high A}} y_i - \sum_{\text{low A}} y_i \right),
where the sums are over the responses at the high and low levels of A, respectively, and each level appears in 2^{k-1} treatment combinations. This formula arises from the contrast coefficients, which assign +1 to high levels and -1 to low levels of A, with the scaling factor ensuring the estimate reflects the average effect per unit change. In replicated designs, averages replace the sums accordingly.[51][53]
Each main effect for a factor with l levels has l - 1 degrees of freedom, corresponding to the number of independent contrasts needed to describe the variation across levels. For two-level factors, this simplifies to 1 degree of freedom per main effect.[54][55]
In balanced factorial designs, the contrasts for different main effects (and interactions) are orthogonal, meaning their coefficient vectors satisfy \sum_i c_{ij} c_{im} = 0 for distinct effects j and m, where i indexes the runs. This orthogonality ensures that estimates of different effects are uncorrelated, facilitating efficient partitioning of variance and simplifying statistical analysis.[56][57]
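A small numeric sketch in R (a 2^3 design with made-up response values) shows both the contrast computation of a main effect and the zero cross-products that express orthogonality:

d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))   # 2^3 in standard order
y <- c(10, 14, 12, 18, 11, 15, 13, 20)                       # illustrative responses
# Main effect of A: contrast coefficients are the +/-1 codes, scaled by 2^(k-1)
effect_A <- sum(d$A * y) / 2^(3 - 1)
effect_A                                                     # 5.25 for these values
# Orthogonality: coefficient vectors of distinct effects have zero cross-products
sum(d$A * d$B)                                               # 0
sum(d$A * d$B * d$C)                                         # 0 (A vs. the BC contrast)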
Interaction Effects
In factorial experiments, interaction effects capture the non-additive joint influence of multiple factors on the response variable, where the magnitude or direction of one factor's effect varies depending on the level of another factor.[58][59] This departs from the assumption of additivity in main effects, highlighting synergies or antagonisms among factors that cannot be predicted from individual effects alone.[60]
For a two-way interaction AB in a 2×2 factorial design, the effect is quantified using the contrast formula:
AB = \frac{1}{2} \left[ (\bar{y}_{A\text{ high},B\text{ high}} + \bar{y}_{A\text{ low},B\text{ low}}) - (\bar{y}_{A\text{ high},B\text{ low}} + \bar{y}_{A\text{ low},B\text{ high}}) \right],
where \bar{y} represents the mean response at the indicated factor levels. This measure indicates how the difference in responses between levels of A changes across levels of B (or vice versa).
Higher-order interactions, such as the three-way interaction ABC, extend this concept by measuring deviations from the additivity of main effects and two-way interactions; specifically, they occur when a two-way interaction (e.g., AB) differs across levels of the third factor C.[61] In general, for any interaction among a subset of factors in a 2^k design with coded levels \pm 1, the effect is obtained by multiplying each response by the product of the coded levels of those factors and dividing the sum over all runs by 2^{k-1}.[62] Main effects serve as the foundational building blocks upon which these interactions are assessed.[63]
A conceptual example appears in agricultural studies of crop yield, where temperature and humidity interact such that high temperatures diminish yields more under low humidity (dry conditions) than under high humidity, due to exacerbated stress on plant physiology like pollen viability and silk emergence in maize.[64] Interaction plots visualize this by plotting mean yields for temperature levels at each humidity level, revealing non-parallel lines that confirm the dependency; parallel lines would indicate no interaction.[65]
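A brief R sketch with entirely made-up yield values for a 2 × 2 temperature-by-humidity layout illustrates both the interaction contrast and the interaction plot described here:

# Hypothetical mean yields for a 2 x 2 temperature x humidity example
yield <- data.frame(
  temp     = factor(rep(c("low", "high"), each = 2), levels = c("low", "high")),
  humidity = factor(rep(c("low", "high"), times = 2), levels = c("low", "high")),
  y        = c(8.0, 8.5, 5.0, 7.8)
)
m <- with(yield, tapply(y, list(temp, humidity), mean))
# Interaction contrast: (high,high + low,low) minus (high,low + low,high), halved
AB <- ((m["high", "high"] + m["low", "low"]) - (m["high", "low"] + m["low", "high"])) / 2
AB    # 1.15: high temperature hurts yield less when humidity is high
with(yield, interaction.plot(humidity, temp, y, ylab = "Mean yield"))   # non-parallel lines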
Analysis Methods
Statistical Procedures
The statistical analysis of factorial experiments is grounded in a linear model framework, expressed as Y_{ijk\dots} = \mu + \sum \tau_p + \sum \gamma_{pq} + \sum \delta_{pqr} + \dots + \epsilon, where Y is the response, \mu is the grand mean, \tau_p denotes main effects for factor p, \gamma_{pq} represents two-way interactions, higher-order terms like \delta_{pqr} capture further interactions, and \epsilon is the random error. Factors are typically coded as orthogonal variables, such as -1 and +1 for two-level designs, which simplifies estimation of effects and ensures interpretability in terms of deviations from the mean. This full model allows for the decomposition of variance attributable to each component, assuming additivity unless interactions are present.
Analysis of variance (ANOVA) serves as the cornerstone for inference in factorial designs, partitioning the total sum of squares (SST) into sums of squares for main effects (e.g., SSA, SSB), interaction effects (e.g., SSAB), and residual error (SSE), such that SST = SSA + SSB + SSAB + \dots + SSE. Degrees of freedom are similarly allocated, and significance is assessed via F-tests, computing F = \frac{MS_{effect}}{MS_E}, where mean square for an effect is its sum of squares divided by its degrees of freedom, and MS_E is the error mean square. These tests evaluate whether effects or interactions explain a significant portion of the variability beyond chance, with p-values derived from the F-distribution under the null hypothesis of no effect. ANOVA assumes balanced designs for optimal power but can accommodate imbalances through Type II or III sums of squares.
As an alternative to traditional ANOVA, multiple linear regression can fit the factorial model directly, specified as Y = \beta_0 + \sum \beta_p X_p + \sum \beta_{pq} X_p X_q + \sum \beta_{pqr} X_p X_q X_r + \dots + \epsilon, with predictors X coded as \pm 1 to yield coefficients \beta that correspond to half the effect sizes for main effects and interactions. This regression approach yields identical F-tests and estimates to ANOVA for balanced designs but offers flexibility for unbalanced data, continuous covariates, or hypothesis testing via t-tests on coefficients. It is particularly advantageous in software environments that emphasize regression-based workflows.[36]
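The correspondence between regression coefficients and factorial effects can be seen in a few lines of R; the sketch below uses a 2^2 design with made-up responses (all values illustrative):

d <- expand.grid(A = c(-1, 1), B = c(-1, 1))   # coded 2^2 design
d$y <- c(20, 30, 25, 45)                       # illustrative responses
fit <- lm(y ~ A * B, data = d)                 # A and B enter as numeric +/-1 codes
coef(fit)                                      # intercept 30, A 7.5, B 5, A:B 2.5
2 * coef(fit)[c("A", "B", "A:B")]              # factorial effects: 15, 10, 5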
These procedures rest on key assumptions: the errors \epsilon are normally distributed with mean zero, observations are independent, and variances are homoscedastic across all factor combinations. Diagnostics typically involve plotting residuals—such as Q-Q plots to check normality, scatterplots of residuals versus fitted values for homoscedasticity, and tests like Levene's for variance equality or Durbin-Watson for autocorrelation—to identify violations, which may require data transformations (e.g., log) or non-parametric alternatives if severe.[66]
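These checks are typically a few lines of code; the R sketch below uses a small simulated, replicated 2^2 example (all names and values are illustrative) to show common residual diagnostics:

set.seed(1)
d2 <- expand.grid(A = c(-1, 1), B = c(-1, 1), rep = 1:3)   # replicated 2^2 design
# Simulate a response with main effects, an interaction, and random noise
d2$y <- 50 + 7.5 * d2$A + 5 * d2$B + 2.5 * d2$A * d2$B + rnorm(nrow(d2), sd = 2)
fit2 <- lm(y ~ A * B, data = d2)
res <- residuals(fit2)
qqnorm(res); qqline(res)                                   # normality of residuals
plot(fitted(fit2), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)                                     # look for unequal spread
bartlett.test(y ~ interaction(A, B), data = d2)            # equal-variance check across cells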
Contemporary implementations leverage statistical software for efficient computation; R's aov() function fits full factorial ANOVA models, handling crossed factors and providing summary tables with F-statistics, while Python's statsmodels library offers anova_lm() for regression-based ANOVA on factorial data, supporting both balanced and unbalanced cases. Post-2020 developments have incorporated machine learning extensions, such as Gaussian process regression within factorial frameworks, to address non-linear effects and improve predictions in complex experimental settings.[67][68][69]
Practical Analysis Example
A practical example of analyzing a 2^3 full factorial design involves a filtration process in a chemical pilot plant, where the goal is to maximize the filtration rate (measured in gallons per hour). The three factors are A: pressure (low: 10 psig, high: 15 psig), B: concentration of formaldehyde (low: 2%, high: 4%), and C: temperature (low: 24°C, high: 35°C), with the stirring rate held constant at its low level (15 rpm) to focus on these variables. The design requires 8 experimental runs, which yielded the following response data:[70]
| Run | A (Pressure) | B (Concentration) | C (Temperature) | Filtration Rate (gal/h) |
|---|---|---|---|---|
| 1 | - | - | - | 45 |
| 2 | - | - | + | 71 |
| 3 | + | - | - | 48 |
| 4 | + | - | + | 65 |
| 5 | - | + | - | 68 |
| 6 | - | + | + | 60 |
| 7 | + | + | - | 80 |
| 8 | + | + | + | 65 |
The Yates algorithm provides a manual method to compute the effects by iteratively summing and differencing the responses in a tabular format. Starting with the responses in Yates' standard order, each new column is built by writing the sums of consecutive pairs in its top half and their differences (second minus first) in its bottom half; after k = 3 passes, the final column is divided by 2^{k-1} = 4 to obtain the effects. Applying this yields the main effects: A = 3.5, B = 11, C = 5; two-factor interactions: AB = 5, AC = -4, BC = -16.5; and three-factor interaction: ABC = 0.5. These values indicate the change in filtration rate for moving each factor or interaction from low to high, averaged over the other factors. For unreplicated designs, while ANOVA can be performed by treating the highest-order interaction as error, it is common to use normal probability plots of effects to visually identify significant ones, as formal tests have limited power with only 1 error df.
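The tabular procedure can be reproduced with a short R loop; in the sketch below the responses are first rearranged into Yates' standard order, the sum-and-difference step is applied k = 3 times, and the effects quoted above are recovered (variable names are illustrative):

# Responses rearranged into Yates' standard order: (1), a, b, ab, c, ac, bc, abc
y <- c(45, 48, 68, 80, 71, 65, 60, 65)
k <- 3
col <- y
for (pass in 1:k) {
  odd  <- col[seq(1, length(col), by = 2)]
  even <- col[seq(2, length(col), by = 2)]
  col  <- c(odd + even, even - odd)   # sums in the top half, differences below
}
effects <- c(col[1] / 2^k, col[-1] / 2^(k - 1))
names(effects) <- c("Mean", "A", "B", "AB", "C", "AC", "BC", "ABC")
round(effects, 2)           # Mean 62.75, A 3.5, B 11, AB 5, C 5, AC -4, BC -16.5, ABC 0.5
round(col[-1]^2 / 2^k, 2)   # squared contrasts / 2^k give the sums of squares used below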
Analysis of variance (ANOVA) quantifies the significance of these effects, assuming the three-factor interaction (ABC) serves as the estimate of experimental error due to its small magnitude in unreplicated designs. The sums of squares (SS) for each effect are calculated as (contrast)^2 / 8, where the contrast is the difference in sums for the + and - levels of the effect column. With 7 degrees of freedom (df) total, the model assigns 1 df per effect, leaving 1 df for error. The ANOVA table is as follows (p-values approximate using F(1,1) distribution):
| Source | SS | df | MS | F | p-value |
|---|---|---|---|---|---|
| A | 24.5 | 1 | 24.5 | 49.0 | 0.090 |
| B | 242.0 | 1 | 242.0 | 484.0 | 0.030 |
| C | 50.0 | 1 | 50.0 | 100.0 | 0.064 |
| AB | 50.0 | 1 | 50.0 | 100.0 | 0.064 |
| AC | 32.0 | 1 | 32.0 | 64.0 | 0.080 |
| BC | 544.5 | 1 | 544.5 | 1089.0 | 0.019 |
| Error (ABC) | 0.5 | 1 | 0.5 | - | - |
| Total | 943.5 | 7 | - | - | - |
Thus, factor B (concentration) and the BC interaction are significant at \alpha = 0.05, as their F-statistics exceed the critical value (approximately 161.4 for F_{1,1}). Other effects are not significant at this level but may be considered if pooling terms or using graphical methods.
Interpreting these results, the main effect plots show increasing trends for A, B, and C, with average responses at high levels of 64.5, 68.25, and 65.25 gal/h, respectively, versus 61, 57.25, and 60.25 gal/h at low levels. However, the interaction plots reveal non-parallel lines, confirming the significance of BC: for example, the BC plot shows that high concentration boosts the rate markedly at low temperature (a difference of 27.5 gal/h) but actually reduces it slightly at high temperature (a difference of -5.5 gal/h), indicating antagonism. A Pareto chart of absolute effect sizes orders them as |BC| = 16.5 (dominant), |B| = 11, |C| = 5, |AB| = 5, |AC| = 4, |A| = 3.5, and |ABC| = 0.5, emphasizing that the concentration-temperature interaction drives most variation and should be prioritized for process optimization, along with the main effect of concentration. The best observed settings for maximizing filtration rate are high pressure (A+), high concentration (B+), and low temperature (C-), achieving 80 gal/h in run 7.[70]
In modern practice, statistical software like R facilitates this analysis, but for the full model on unreplicated data, aov provides effect estimates without p-values due to zero residual df. The following code snippet computes the model (effects can be extracted manually or via other packages like DoE.base for normal plots):
# Responses and factor levels in the order of the eight runs tabulated above
# (level 1 = low, level 2 = high)
data <- data.frame(
  Rate = c(45, 71, 48, 65, 68, 60, 80, 65),
  Pressure = factor(c(1, 1, 2, 2, 1, 1, 2, 2)),
  Concentration = factor(c(1, 1, 1, 1, 2, 2, 2, 2)),
  Temperature = factor(c(1, 2, 1, 2, 1, 2, 1, 2))
)
# Full factorial model: all main effects and all interactions
model <- aov(Rate ~ Pressure * Concentration * Temperature, data = data)
summary(model)
This aligns with manual calculations for effects but requires additional steps for significance assessment in unreplicated cases.[70]
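To reproduce the ANOVA table above, one common workaround (a sketch, assuming the data frame from the snippet above) is to drop the three-factor interaction so that its single degree of freedom serves as the error term:

# Main effects and two-way interactions only; the omitted ABC term
# (SS = 0.5 on 1 df) becomes the residual, matching the table above
reduced <- aov(Rate ~ (Pressure + Concentration + Temperature)^2, data = data)
summary(reduced)   # F-tests for A, B, C, AB, AC, BC against MS_E = 0.5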