
Multiple choice

Multiple choice, also known as multiple-choice questions (MCQs) or selected-response items, is an assessment format consisting of a stem—typically a question or incomplete statement—followed by a set of options, including one correct answer and several distractors designed to resemble plausible alternatives. This structure allows respondents to select the most accurate response from the provided choices, often limited to a single selection unless multiple responses are explicitly permitted. The format is prevalent in educational settings, standardized exams such as the SAT or GRE, professional certifications, and surveys due to its scalability for large groups and automated scoring capabilities. The format originated in the early 20th century with Frederick J. Kelly's Kansas Silent Reading Test (1914–1915), which addressed limitations of subjective essay-based assessments through objective scoring to facilitate mass testing. Its adoption accelerated during World War I with the U.S. Army Alpha and Beta tests, which screened over 1.7 million recruits and marked a shift toward standardized testing. By the mid-20th century, it had become integral to large-scale assessments, including the Scholastic Aptitude Test introduced in 1926. In contemporary education and beyond, multiple choice excels in objectively measuring factual recall, application, and analysis across diverse topics, while supporting rapid grading and immediate feedback, which enhances its utility for both summative and formative assessments. However, critics highlight drawbacks, including the potential for random guessing to inflate scores—mitigated somewhat by negative marking in some designs—and a tendency to prioritize lower-level recall over critical thinking or creativity. Despite these limitations, the format remains a cornerstone of assessment strategies, often integrated with open-ended questions to balance breadth and depth in evaluating learner outcomes.

Definition and Basics

Terminology

A multiple-choice question (MCQ) is an assessment format in which test-takers select one or more correct responses from a predefined set of options. This structure allows for objective evaluation by limiting responses to provided alternatives, distinguishing it from open-ended questions. Key components of a multiple-choice question include the stem, which presents the core query or problem; the key, representing the correct answer(s); and distractors, which are plausible but incorrect options designed to challenge the respondent. The stem typically poses a direct question or incomplete statement, while distractors mimic common misconceptions to test deeper understanding. Multiple-choice questions are categorized into single-select and multiple-select formats. In single-select questions, respondents choose exactly one correct option from the list. In contrast, multiple-select questions permit the selection of two or more correct options, often requiring identification of all applicable answers. The term "multiple-choice" combines "multiple," indicating more than one, and "choice," referring to selection, emphasizing the array of options available; it first appeared in print in the context of educational testing. The abbreviation MCQ stands for "multiple-choice question" and has become standard in academic and professional literature.

Core Components

A multiple-choice item typically consists of a stem that presents the question or problem, followed by a set of 3 to 5 response options, including one correct answer known as the key and the remaining as distractors. This structure ensures the item tests specific knowledge or skills efficiently by requiring selection from a limited set of alternatives. Guidelines recommend limiting options to 3 to 5 per item, as fewer than 3—such as only 2—effectively reduces the format to a true/false question, diminishing its ability to discriminate among nuanced understandings. Conversely, more than 5 options increase item-writing effort and test administration complexity without proportionally improving validity or reliability, based on meta-analyses of item performance. The stem must clearly pose a complete problem, incorporating all necessary context while avoiding extraneous details that could confuse respondents. Response options should be homogeneous in length, grammatical structure, and style to prevent unintended cues, such as identifying the correct answer by its uniqueness. Essential prerequisites include ensuring all options are plausible, drawing from common misconceptions to challenge knowledgeable respondents without obvious errors, and mutually exclusive, avoiding overlaps that could imply multiple correct choices. These elements—the stem, key, and distractors outlined in the terminology section—form the foundational layout for effective multiple-choice design.
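The structural guidelines above can be expressed as a small data model. The following Python sketch is illustrative only—the class name, fields, and sample item are assumptions, not drawn from any cited source—and simply enforces the 3-to-5-option range and the requirement that options be distinct:

```python
from dataclasses import dataclass, field

@dataclass
class MultipleChoiceItem:
    """Minimal model of a single-select multiple-choice item."""
    stem: str                  # the question or problem statement
    key: str                   # the single correct answer
    distractors: list[str] = field(default_factory=list)  # plausible incorrect options

    def __post_init__(self):
        options = [self.key] + self.distractors
        # Enforce the common guideline of 3 to 5 total options.
        if not 3 <= len(options) <= 5:
            raise ValueError("an item should offer between 3 and 5 options")
        # Options must be distinct so no overlap implies multiple correct choices.
        if len(set(options)) != len(options):
            raise ValueError("options must be mutually exclusive")

    @property
    def options(self) -> list[str]:
        return [self.key] + self.distractors


# Example: a three-option item that satisfies the guidelines.
item = MultipleChoiceItem(
    stem="Which gas do plants primarily absorb during photosynthesis?",
    key="Carbon dioxide",
    distractors=["Oxygen", "Nitrogen"],
)
print(item.options)
```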

Historical Development

Origins

The multiple-choice format originated in educational testing as a means to efficiently assess large groups of students amid the expansion of public schooling in the early 20th century. In 1914, Frederick J. Kelly, then a professor at Kansas State Normal School (now Emporia State University), developed the first known multiple-choice test for the Kansas Silent Reading Test, which was published the following year. This innovation addressed the limitations of subjective grading by providing objective, scorable responses, allowing for standardized evaluation of reading skills in a growing student population. Kelly's approach marked a shift toward scalable assessment methods suitable for mass testing. The format gained further traction during World War I, when the need to classify over 1.7 million U.S. Army recruits rapidly highlighted the inefficiencies of traditional essay-based and individual testing, which were too time-consuming for wartime demands. In response, psychologists led by Robert Yerkes, including Lewis Terman, developed the Army Alpha and Beta tests in 1917–1918; the Alpha version, administered to literate recruits, consisted primarily of true-false and multiple-choice items across eight subscales to measure verbal and numerical abilities for personnel assignment. Terman, a key contributor, adapted intelligence testing principles from his earlier 1916 Stanford-Binet revision to support these group-administered formats, emphasizing objective scoring to enable quick, large-scale evaluations without reliance on subjective interpretation. These military applications demonstrated the practicality of multiple choice for high-stakes, high-volume assessment, influencing postwar educational practices. By the mid-1920s, multiple-choice testing achieved widespread adoption in civilian contexts through the efforts of the College Entrance Examination Board. In 1926, the Board introduced the Scholastic Aptitude Test (SAT), its first primarily multiple-choice exam, administered to over 8,000 high school students to gauge general intellectual aptitude for college admissions. This marked a pivotal expansion, as the format's efficiency facilitated standardized admissions amid rising postsecondary enrollment, building directly on the objective principles refined in the earlier Army Alpha and Beta tests.

Modern Evolution

Following World War II, multiple-choice testing expanded significantly in standardized assessments to accommodate growing educational demands. The Graduate Record Examination (GRE), originally launched in the 1930s, evolved in the 1950s to support returning veterans applying to graduate programs, with increased administration and integration of multiple-choice formats to evaluate aptitude efficiently across large cohorts. Internationally, the UK's 11-plus exam, introduced in 1944 under the Education Act to select students for secondary schooling, was refined in the post-war decades to include standardized components such as arithmetic, English comprehension, and intelligence tests, aiming to reduce subjectivity in placements. Technological advancements in the late 20th century shifted multiple-choice testing from paper-based to digital formats, enhancing adaptability and scalability. A key milestone was the Graduate Management Admission Test (GMAT)'s transition to computer-adaptive testing (CAT) in 1997, where question difficulty adjusted in real time based on responses, replacing fixed paper exams and improving precision in ability measurement. This CAT approach, building on earlier computerized pilots, became widespread in professional and academic assessments by the 2000s, allowing for shorter tests while maintaining reliability. Statistical methodologies also advanced, with item response theory (IRT) incorporated into multiple-choice test design from the 1960s onward to calibrate item difficulty and discrimination more rigorously than classical test theory. IRT models the probability of a correct response as a function of latent ability, enabling equitable scoring across diverse test-takers; a foundational two-parameter logistic model, developed by Birnbaum, is given by P(\theta) = \frac{1}{1 + e^{-a(\theta - b)}}, where a represents item discrimination, b is item difficulty, and \theta is the examinee's latent ability. This framework gained prominence in standardized tests such as the GMAT, supporting adaptive algorithms and bias detection. By the 2020s, artificial intelligence further transformed multiple-choice assessments through automated question generation and hyper-personalized adaptation. Platforms like Khan Academy employed generative AI to optimize scoring via synthetic response data and facilitate explanatory dialogues post-question, boosting student understanding by up to 36% in geometry tasks as of 2025. Similarly, Duolingo integrated AI-driven adaptive algorithms for language tests, dynamically adjusting multiple-choice items like "Read and Select" based on performance to tailor difficulty in real time. These innovations, up to 2025, emphasized efficiency and engagement in educational platforms.
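As a concrete illustration of Birnbaum's two-parameter logistic model given above, the short Python sketch below evaluates the response probability at a few ability levels; the parameter values (a = 1.2, b = 0.0) are illustrative assumptions rather than calibrated item statistics:

```python
import math

def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic (2PL) model: probability that an examinee
    with ability theta answers correctly, given item discrimination a
    and item difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Probability of a correct response rises with ability for a fixed item.
for theta in (-2.0, 0.0, 2.0):
    print(f"theta = {theta:+.1f} -> P(correct) = {p_correct_2pl(theta, a=1.2, b=0.0):.2f}")
```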

Design and Format

Question Construction

Effective multiple-choice question construction requires careful attention to the stem and response options to promote clarity, fairness, and accurate measurement of learning. The stem, which poses the problem or question, and the response options, including the correct answer (key) and incorrect alternatives (distractors), form the core components of these items. For stem design, authors should use complete, self-contained sentences that clearly state the problem without relying on the options for full understanding, allowing test-takers to answer by covering the choices. Stems must be concise, avoiding irrelevant details, vague terms like "nearly all," or unnecessary negatives unless essential to the content, as these can introduce confusion or reward test-wise strategies. Positive phrasing enhances readability and keeps the focus on the intended cognitive skill, such as application rather than mere recall. Distractor creation involves developing plausible alternatives that reflect common misconceptions or partial knowledge, ensuring they are homogeneous in length, grammar, and detail to avoid unintended cues. Effective distractors should be attractive to uninformed test-takers but clearly incorrect upon analysis, without using extremes like "always" or "never" that could make them implausible. Options such as "all of the above" or "none of the above" should be avoided unless justified by the content, as they can undermine the assessment's diagnostic value and encourage guessing. To maintain balance across a test, the position of the correct answer should be randomized, with no predictable patterns (e.g., avoiding clustering keys in the first or last position), ensuring equitable difficulty and preventing exploitation by test-wise examinees. A test blueprint can guide this by aligning questions to learning objectives and varying cognitive demands, typically limiting options to three or four for optimal discrimination without increasing random guessing. Common pitfalls in construction include the use of absolute words like "always" or "never" in options, which can make distractors too obvious; overlapping choices that blur distinctions; or grammatical inconsistencies between the stem and options that inadvertently signal the key. Unintended clues from stem phrasing, such as double negatives or cultural biases, can compromise fairness, while lengthy or convoluted stems increase cognitive load unnecessarily. Peer review and pilot testing help identify these issues, ensuring questions are unambiguous and equitable.
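As a minimal sketch of the key-randomization advice above (the function name and sample content are hypothetical, not from any cited item bank), options can be shuffled at assembly time so the key's position carries no pattern:

```python
import random

def shuffle_options(key: str, distractors: list[str], rng: random.Random) -> tuple[list[str], int]:
    """Return the options in random order along with the index of the key,
    so that the correct answer's position follows no predictable pattern."""
    options = [key] + list(distractors)
    rng.shuffle(options)
    return options, options.index(key)

rng = random.Random(2024)  # fixed seed only so this example is reproducible
options, key_index = shuffle_options(
    key="1926",
    distractors=["1905", "1914", "1946"],
    rng=rng,
)
print(options, "-> key is option", "ABCD"[key_index])
```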

Response Options

Response options in multiple-choice questions, also known as alternatives, consist of one correct answer and several distractors designed to challenge test-takers while providing diagnostic information. Effective options enhance the item's validity by discriminating between knowledgeable and less knowledgeable respondents without introducing unintended cues or biases. To prevent test-takers from identifying the correct answer based on superficial characteristics, all options should be similar in length, grammatical structure, and style, employing parallel construction where feasible. For instance, if the correct answer is a complete sentence, distractors should follow the same format rather than mixing sentence fragments or varying verbosity. This approach minimizes the "longest option" cue, where respondents might favor more detailed choices assuming they convey greater accuracy. Distractors must be plausible to serve their purpose, attracting respondents who partially understand the material or hold common misconceptions, thereby revealing instructional gaps. They are best derived from actual student errors identified through pilot testing, think-aloud protocols, or expert consultations on typical pitfalls, rather than arbitrary inventions. For example, in a mathematics item, a distractor might reflect a frequent computational error observed in preliminary trials. This grounding in real responses ensures distractors function effectively without appearing obviously incorrect. Special options such as "none of the above" or "all of the above" should be used sparingly, primarily when they align with the learning objectives and do not encourage guessing over demonstration of knowledge. These can be appropriate for assessing comprehensive understanding, like verifying whether multiple statements are collectively true or false, but they risk inflating chance scores—for example, "all of the above" as the correct answer increases the effective guessing probability if distractors fail to attract errors. Studies recommend avoiding them in high-stakes assessments to prioritize content mastery over strategic elimination. Research indicates that three options—one correct and two distractors—represent the optimal quantity for most multiple-choice items, balancing reliability, effort, and cognitive demands on test-takers. A meta-analysis of over 80 years of studies found that additional options beyond three often yield nonfunctional distractors that few select, failing to improve measurement while complicating item creation and increasing extraneous load. This configuration maintains discrimination power equivalent to four or five options but reduces the time needed for validation and response.

Variations and Examples

Standard Formats

Standard multiple-choice questions typically feature a clear stem presenting the problem or query, followed by four response options labeled A through D, one of which is the correct key while the others are distractors designed to challenge test-takers plausibly. A classic single-select example is: "What is the capital of France? A) Berlin B) Madrid C) Paris D) Rome." Here, Paris serves as the key, while the distractors—capitals of nearby European countries—are relevant and appealing to those with partial knowledge, as they share geographic and cultural similarities that could mislead without direct recall of the fact. In the standard four-option template, the stem must pose a single, well-defined problem to ensure clarity and focus, avoiding extraneous details that could confuse respondents. The key should be placed neutrally across options to prevent patterns, with correct answers distributed roughly evenly (about 25% each for A, B, C, and D) across a set of questions; item writers sometimes lean toward positions B or C to mimic a natural-feeling distribution, but never so consistently that the placement becomes predictable. Distractors must remain relevant to the content, drawing from common misconceptions or related facts to test understanding effectively rather than mere trivia. These formats appear frequently in quizzes across subjects, promoting quick assessment of foundational knowledge. For instance, in mathematics: "What is the result of 2 + 2? A) 3 B) 4 C) 5 D) 6," where B is the key and the distractors represent off-by-one errors common in basic arithmetic. In history: "In what year did World War II begin? A) 1914 B) 1939 C) 1941 D) 1945," with B as the key (marking Germany's invasion of Poland) and distractors tied to related events such as World War I's start, the United States' entry into the war, and the war's end. Such examples, using the stem, key, and distractors as defined in core terminology, illustrate straightforward application without added complexity.
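To check the rough 25%-per-position balance described above, one can tally key positions across an assembled question set; the answer key below is hypothetical and serves only to show the calculation:

```python
from collections import Counter

def key_position_shares(answer_keys: list[str]) -> dict[str, float]:
    """Return the share of questions whose correct answer sits at each
    position (A-D), to spot clustering in an assembled test form."""
    counts = Counter(answer_keys)
    return {pos: counts[pos] / len(answer_keys) for pos in "ABCD"}

# Hypothetical answer key for a 20-question quiz.
keys = list("BCADBDACBDCABDACBDCA")
for position, share in key_position_shares(keys).items():
    print(f"{position}: {share:.0%}")
```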

Specialized Types

Multiple-select questions, also known as "select all that apply" formats, require test-takers to identify and choose all correct options from a list, rather than selecting a single answer. This variant is particularly useful in assessments aiming to evaluate comprehensive knowledge, such as nursing licensure exams where candidates must recognize multiple symptoms or interventions. For instance, a question might ask, "Which of the following are fruits? A) Apple B) Carrot C) Banana D) Mango," expecting selections of A, C, and D. However, partial scoring in these questions can introduce risks, as incorrect selections may penalize otherwise accurate responses, leading to lower overall scores compared to single-select formats (e.g., average scores of 63.7% for multiple-answer vs. 76.5% for single-answer questions). Ranking or ordering questions adapt the multiple-choice structure by asking test-takers to arrange a set of options in a specified sequence, such as by priority, chronology, or relevance, thereby assessing relational understanding. These are common in subjects like history, where a prompt might require sequencing events—for example, arranging key milestones in chronological order from earliest to latest. Methodologies for analyzing responses in such questions often involve statistical tests such as the likelihood ratio or Wald tests to rank option popularity or validity, ensuring reliable evaluation in large-scale surveys or exams. This format enhances discrimination between response qualities but requires clear instructions to avoid ambiguity in partial credit assignment. Matching questions function as a multiple-choice variant when the response options are limited and presented in a paired format, where test-takers connect elements from one column (e.g., terms or concepts) to corresponding items in another (e.g., definitions or examples). This setup is efficient for testing associations without the redundancy of separate multiple-choice items, reducing local dependence issues where overlapping choices influence guessing probabilities. An example involves pairing historical figures with their achievements, such as matching each of five figures to one accomplishment drawn from a shared list of eight options. Extended matching formats expand this by including multiple vignettes with a shared pool of options, offering higher reliability (coefficient alpha of 0.90) than traditional multiple choice in distinguishing proficient students. Hotspot or image-based questions represent a digital evolution of multiple choice, where test-takers interact with visuals by clicking or marking specific areas (hotspots) to indicate answers, ideal for spatial or visual assessments such as anatomy or radiology. In an anatomy exam, for example, users might click regions of a diagram to identify muscle groups. These questions improve knowledge retention by engaging visual processing, as evidenced in an immunology workshop where hotspot exercises contributed to higher post-assessment performance compared to pre-workshop results. They are particularly effective in computer-based testing, allowing precise scoring of targeted selections without textual options.

Benefits and Limitations

Advantages

Multiple-choice tests provide efficient grading processes, particularly when automated, which substantially reduces the time and resources needed compared to subjective formats like essays that demand extensive human review. This efficiency allows educators to assess large numbers of students promptly, enabling faster and more frequent assessments without overwhelming administrative burdens. For instance, machine-scoring capabilities inherent to multiple-choice formats approximate the speed and consistency of other objective evaluations, minimizing logistical challenges in high-volume testing scenarios. Additionally, as of 2025, AI tools can generate multiple-choice questions rapidly, further reducing preparation time while maintaining quality. A core strength lies in their objectivity, as multiple-choice items eliminate subjective interpretation by scorers, thereby reducing the rater bias that can occur in open-ended responses. This feature makes them particularly suitable for large-scale standardized testing, where consistent application of criteria across diverse examinee groups is essential to ensure fairness and comparability in evaluation. Research highlights how this objectivity supports reliable measurement of knowledge without the variability introduced by individual grader preferences or fatigue. Multiple-choice formats excel in content coverage, permitting the sampling of a wide array of topics within constrained time limits, which enhances the comprehensiveness of evaluations. By including numerous items—such as up to 100 questions in a two-hour session—they sample broader domains than formats limited by depth per question, thereby improving the validity of inferences about overall proficiency. This capability is especially valuable in curricula requiring verification of extensive factual recall or conceptual understanding across multiple standards. In terms of reliability, well-designed multiple-choice tests demonstrate high internal consistency and test-retest stability, often yielding coefficients exceeding 0.8, which indicates strong measurement precision. Such reliability ensures that scores reflect true ability rather than random error, supporting dependable use in both formative and summative contexts. Educational studies confirm these metrics for professionally constructed items, underscoring their robustness for repeated administrations.
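The internal-consistency figures cited above are typically computed with Cronbach's alpha (equivalent to KR-20 for dichotomously scored items); the sketch below shows the calculation on a tiny, made-up 0/1 response matrix, purely to illustrate the formula:

```python
def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach's alpha for a score matrix (rows = examinees, columns = items;
    entries are 1 for a correct response and 0 for an incorrect one)."""
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]

    def sample_variance(xs: list[float]) -> float:
        mean = sum(xs) / len(xs)
        return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

    item_variances = [sample_variance([row[i] for row in scores]) for i in range(n_items)]
    total_variance = sample_variance(totals)
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Made-up responses: 5 examinees answering 4 items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")  # 0.80 for this toy data
```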

Disadvantages

One significant drawback of multiple-choice tests is the possibility of guessing, which can inflate scores without genuine knowledge. In a standard four-option format, the probability of selecting the correct answer by random chance is 25%, potentially leading to unreliable assessments of true ability. Penalty scoring systems, which deduct points for incorrect answers (e.g., -0.25 points per wrong response), are commonly used to mitigate this by setting the expected value of random guessing to zero or negative, thereby discouraging uninformed attempts. However, such penalties do not fully eliminate guessing and can disadvantage risk-averse test-takers, including women and high-ability students, who skip more questions to avoid losses, resulting in lower overall scores and reduced representation in top percentiles (e.g., a 60.1% male overrepresentation in the top 5% under penalty conditions). Multiple-choice formats often encourage surface-level learning and rote memorization over deeper conceptual understanding, aligning primarily with lower levels of Bloom's taxonomy such as remembering and understanding. This limitation arises because questions typically reward recognition of familiar information rather than synthesis or application, fostering passive study habits like cramming. A 2012 study in introductory science courses compared multiple-choice-only exams to mixed formats (including constructed-response questions) and found that the former led to significantly lower engagement in critical-thinking strategies (e.g., 3.20 vs. 3.87 active behaviors per student) and poorer performance on higher-order multiple-choice items (59.54% vs. 64.4% accuracy), indicating an obstacle to developing higher-level thinking skills. Cultural and linguistic biases in multiple-choice questions can further undermine fairness, particularly through distractors or stems that embed subtle cues favoring certain socioeconomic or ethnic backgrounds. For example, high-frequency words in easier SAT verbal items (e.g., related to "regatta" or "oarsman") often carried cultural connotations that disadvantaged African American students compared to matched-ability white peers, while rarer, school-taught vocabulary in harder items did not show this gap—a pattern identified in analyses from the 1980s and 1990s. These biases contributed to lawsuits and reforms, including the removal of analogy sections from the SAT in 2005, as they were criticized for relying on context-poor, culturally loaded comparisons that exacerbated score disparities. Finally, multiple-choice tests are ill-suited for evaluating complex skills like critical analysis and writing, as their objective structure prioritizes selection over original production or justification of ideas. Scholarly reviews highlight that while multiple-choice items can target higher-order thinking with careful design, they inherently limit assessment of creativity, articulation, and innovative problem-solving—domains better captured by open-ended formats. For instance, a 2020 analysis of language assessments noted that multiple-choice questions fail to probe deeper communicative abilities, such as nuanced expression or creative argumentation, often resulting in incomplete evaluations of student proficiency. With the rise of generative AI tools in 2024, multiple-choice tests have become vulnerable to automated solving, potentially undermining their reliability in detecting genuine knowledge as AI achieves high accuracy on such formats.
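The guessing arithmetic described at the start of this section is easy to make explicit: with 1 point per correct answer and a deduction per wrong answer, the expected score from blind guessing depends only on the number of options and the penalty. The sketch below shows that a penalty of 1/(n-1) points (1/3 for four options, 1/4 for five) drives that expectation to zero:

```python
def expected_guess_score(n_options: int, penalty: float) -> float:
    """Expected points per item from purely random guessing on a
    single-select item: +1 for a correct pick, -penalty for a wrong one."""
    p_correct = 1 / n_options
    return p_correct * 1 + (1 - p_correct) * (-penalty)

# No penalty leaves a 0.25 expected gain on four-option items;
# a 1/(n-1) penalty removes the expected benefit of guessing entirely.
for n_options, penalty in ((4, 0.0), (4, 1 / 3), (5, 0.25)):
    e = expected_guess_score(n_options, penalty)
    print(f"{n_options} options, penalty {penalty:.2f}: E[score] = {e:+.3f}")
```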

Usage in Assessment

Scoring Approaches

Multiple-choice questions are typically scored using one of several established methods designed to evaluate respondent accuracy while accounting for factors such as guessing and question complexity. The simplest approach is number-correct scoring, where the total score is the raw count of correctly answered items, assigning full credit (usually 1 point) for each correct response and zero for incorrect ones. This method is widely used in high-stakes educational assessments due to its straightforward computation and alignment with classical test theory, though it does not penalize guessing. To adjust for random guessing and provide a fairer measure of knowledge, formula scoring subtracts a penalty for incorrect answers based on the number of response options. The standard formula is S = R - \frac{W}{n-1}, where S is the adjusted score, R is the number of correct responses, W is the number of incorrect responses, and n is the total number of options per item (e.g., for a 4-option question, n = 4, so each wrong answer deducts \frac{1}{3} point). This approach, originally proposed to estimate true ability by assuming uniform random guessing, has been shown to reduce score inflation from guessing while maintaining reliability in undergraduate exams. Unanswered items typically receive zero points, avoiding further penalties. In multiple-select formats, where respondents choose more than one correct option from a set, partial credit scoring allows nuanced evaluation by rewarding correct selections and penalizing errors proportionally. A common method awards +1 point for each correct choice selected and -0.25 points for each incorrect choice, scaled to the total possible score for the item (e.g., for 4 correct options out of 6, full credit requires all correct selections without extras). This rights-minus-wrongs variant promotes careful selection and has demonstrated improved validity in assessments by distinguishing partial knowledge from complete errors, though it requires clear rubrics to ensure fairness. For computerized adaptive testing (CAT), scoring employs item response theory (IRT) to dynamically adjust item difficulty in real time, estimating the respondent's ability \theta (typically on a latent trait scale) after each response. Item scores contribute to updating \theta via likelihood-based estimation, where the probability of a correct response is modeled as P(X_i=1|\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}} in the 2-parameter logistic model (with a_i as discrimination and b_i as difficulty), and subsequent items are selected to be maximally informative at the current \theta estimate. This method enhances efficiency in high-stakes exams such as the GRE by reducing test length while achieving comparable reliability to fixed-form tests.
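A minimal sketch of two of the scoring rules described above—formula scoring for single-select items and rights-minus-wrongs partial credit for multiple-select items—is shown below; the exact rubric (for example, the 0.25-point deduction and the zero floor) varies by program, so treat these as illustrative defaults rather than a standard implementation:

```python
def formula_score(correct: int, wrong: int, n_options: int) -> float:
    """Formula scoring S = R - W/(n-1): removes the expected gain from
    blind guessing on n-option items. Omitted items contribute nothing."""
    return correct - wrong / (n_options - 1)

def partial_credit(selected: set[str], keys: set[str], per_wrong: float = 0.25) -> float:
    """Rights-minus-wrongs partial credit for one multiple-select item:
    +1 per correct option chosen, -per_wrong per incorrect option chosen,
    floored at zero and reported as a fraction of the available keys."""
    raw = len(selected & keys) - per_wrong * len(selected - keys)
    return max(raw, 0.0) / len(keys)

# 60 right and 30 wrong on four-option items: 60 - 30/3 = 50 points.
print(formula_score(correct=60, wrong=30, n_options=4))
# Two of three keys found plus one wrong pick: (2 - 0.25) / 3, about 0.58 of full credit.
print(partial_credit({"A", "C", "E"}, keys={"A", "C", "D"}))
```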

Answer Revision Strategies

A persistent myth among test-takers advises against changing answers on multiple-choice tests, suggesting that initial instincts are usually correct and revisions lead to errors. This belief, often termed the "first instinct fallacy," has been empirically debunked through numerous studies demonstrating that answer changes typically result in net score improvements. For instance, in a seminal analysis of objective test items, Mueller and Shwedel found that 58% of changes involved switching from wrong to right answers, compared to only 20% from right to wrong, yielding a positive net gain for most participants. Empirical evidence from broader syntheses reinforces this pattern. A meta-analysis of 61 studies spanning decades revealed that answer-changing behavior is prevalent among students and generally enhances performance, with no consistent negative effects tied to demographic or test factors. Similarly, a 2007 investigation of medical students showed that changes from wrong to right occurred in 48% of cases, leading to an average score increase of 2.5%, while a related study indicated net gains in approximately 20-30% of overall changes across similar examinations. These findings highlight that while not every change succeeds, the majority contribute positively when driven by reasoned doubt rather than impulse. Test-takers' decisions to revise answers are influenced by several psychological and situational factors. Low confidence in an initial selection often prompts changes, as less-prepared students tend to second-guess more frequently, though this can yield benefits if revisions stem from reflection rather than anxiety. Time constraints play a role, with revisions more common toward the exam's end when initial answers have been reconsidered under pressure. Additionally, recognizing patterns across questions—such as recurring themes or clues in later items—can justify returning to flagged responses for informed adjustments. Effective revision strategies emphasize deliberate review over hasty alterations. Experts recommend flagging uncertain questions during the first pass and revisiting them only if subsequent items provide clarifying information, thereby avoiding random swaps that dilute accuracy. This approach, encapsulated as "change that answer when in doubt," aligns with research showing superior results from targeted revisions and has been shown to boost performance by encouraging metacognitive monitoring.

Applications and Impact

Educational Testing

Multiple-choice questions form a core component of K-12 educational assessments in the United States, particularly in state-mandated evaluations. The National Assessment of Educational Progress (NAEP), often called the Nation's Report Card, has utilized multiple-choice formats since its inception in 1969 to gauge student proficiency in subjects like reading, mathematics, and science across grades 4, 8, and 12. Similarly, tests aligned with the Common Core State Standards, such as those developed by the Partnership for Assessment of Readiness for College and Careers (PARCC) and Smarter Balanced, incorporate multiple-choice items alongside other formats to measure standards in English language arts and mathematics for grades 3 through 8 and high school. These assessments aim to provide consistent benchmarks for student achievement and school accountability, with millions of students participating annually. In college admissions, multiple-choice-based exams like the SAT and ACT play a pivotal role. The SAT, administered by the College Board, was taken by over 1.97 million high school seniors in the class of 2024, which marked its transition to a fully digital format to enhance accessibility and efficiency. The ACT, meanwhile, saw approximately 1.37 million test-takers from the class of 2024, with both exams relying heavily on multiple-choice questions to evaluate college readiness in areas such as reading, mathematics, and science reasoning. Studies indicate that these scores correlate moderately with first-year college grade point average (GPA), typically in the range of r = 0.3 to 0.5, underscoring their predictive value while highlighting limitations when used in isolation. Globally, multiple-choice questions are integral to high-stakes educational testing in various systems. In India, the Joint Entrance Examination (JEE) Main, a gateway to engineering programs including those at the Indian Institutes of Technology (IITs), attracts about 1.5 million candidates annually and features a format dominated by multiple-choice questions in physics, chemistry, and mathematics. China's gaokao, reinstated in 1977 following educational reforms, serves as the primary college entrance examination for around 13 million students each year and includes substantial multiple-choice sections in mandatory subjects like Chinese, mathematics, and English, alongside electives. Despite their widespread use, multiple-choice assessments in educational testing face critiques regarding equity, particularly in the 2020s amid the shift to digital formats and lingering pandemic effects. Access gaps have widened for underserved students, with disparities in technology availability exacerbating achievement differences between socioeconomic groups, as evidenced by lower participation and performance rates among low-income and minority populations during the digital SAT's rollout. These issues highlight ongoing debates about how such tests may perpetuate inequities rather than solely measuring merit.

Professional and Research Contexts

In professional certifications, multiple-choice questions (MCQs) form a core component of high-stakes assessments designed to evaluate competency for licensure in fields like medicine and accounting. The United States Medical Licensing Examination (USMLE) Step 1, for instance, consists of 280 MCQs administered over seven one-hour blocks, assessing foundational biomedical knowledge for medical licensure. In 2024, the first-time pass rate for U.S. MD seniors on this exam was 91%, reflecting its rigorous standards and role in ensuring practitioner readiness. Similarly, the Certified Public Accountant (CPA) exam's core sections—Auditing and Attestation (AUD), Financial Accounting and Reporting (FAR), and Taxation and Regulation (REG)—include 78 MCQs for AUD, 50 MCQs for FAR, and 72 MCQs for REG, with MCQs comprising 50% of each section score and testing practical application of professional standards. These formats allow for efficient evaluation of broad knowledge domains while maintaining objectivity in credentialing processes. In opinion research, MCQs, particularly those using Likert scales, enable structured collection of public opinion through surveys, facilitating quantifiable insights into consumer and societal trends. The Gallup organization has employed such formats since the 1930s, with polls often featuring closed-ended questions like rating scales (e.g., "strongly agree" to "strongly disagree") to gauge attitudes on topics such as economic confidence or presidential approval. For example, Gallup's ongoing presidential job approval surveys use Likert-style response options to track nuanced public sentiment, allowing for statistical analysis of shifts over time and informing policy and business decisions. This approach ensures high response rates and comparability across large samples, making MCQs indispensable for reliable polling. Within psychometrics, MCQs support the validation and development of psychological assessment tools by providing scalable, standardized items for measuring traits and disorders. The Minnesota Multiphasic Personality Inventory (MMPI-2), a seminal instrument for clinical assessment, comprises 567 true/false items that yield scores on 10 clinical scales and validity measures, aiding in the identification of psychopathology. Developed through empirical keying—where items are selected based on their ability to discriminate between clinical and comparison groups—the MMPI's format has been refined over decades to enhance reliability and cultural adaptability, influencing scale construction in personality research. As of 2025, advancements in AI-driven proctoring are transforming remote professional certifications by enhancing security in online formats. These systems use machine learning to monitor test-taker behavior, verify identity through facial recognition, and flag environmental anomalies, significantly mitigating cheating risks in distributed testing environments. This trend supports broader access to credentials while upholding integrity, with adoption rising amid hybrid work models.
