C syntax
C syntax encompasses the formal rules and conventions that govern the structure, formatting, and interpretation of source code in the C programming language, enabling the definition of variables, functions, control flows, and data manipulations through a concise, low-level paradigm.[1] Developed by Dennis M. Ritchie at Bell Labs between 1969 and 1973 as a system implementation language for the Unix operating system, C's syntax evolved from earlier languages like BCPL and B, incorporating typeless roots while introducing structured types and pointer arithmetic to support efficient hardware interaction on platforms such as the PDP-11.[2]
At its core, a C program consists of a sequence of tokens—including keywords (e.g., int, if, while), identifiers, constants, operators, and punctuation—separated by whitespace, with execution typically beginning in a main function that returns an integer status.[1] Statements end with semicolons, and compound statements are delimited by curly braces {}, fostering a block-based structure that enhances readability and scoping without mandating indentation.[2] The preprocessor handles directives like #include for header files and #define for macros, allowing conditional compilation and portability across environments before the compiler processes the code.[1]
C's type system forms a foundational aspect of its syntax, distinguishing basic types such as int for integers, char for characters, and float and double for floating-point numbers, alongside derived types like arrays, pointers, structures, unions, and enumerations.[1] Declarations mirror the syntax of expressions for intuitiveness; for instance, int *pi declares a pointer to an integer, reflecting how *pi would dereference to yield an int value.[2] Pointers enable direct memory addressing and arithmetic (e.g., ptr + i advances by i elements of the pointed-to type), while arrays decay to pointers in most contexts, so that a[i] is equivalent to *(a + i), which underpins C's flexible but error-prone memory model.[1]
Control structures in C syntax provide mechanisms for decision-making and repetition, including conditional statements like if-else and switch, which support labeled cases and fall-through behavior, as well as loops via for, while, and do-while, often enhanced by break and continue for early exits.[1] Operators span arithmetic (+, -), relational (<, ==), logical (&&, || with short-circuit evaluation), bitwise (&, |, ^), and assignment forms (=, +=), with a defined precedence hierarchy that binds operations predictably, such as multiplication preceding addition.[2] Type qualifiers like const and volatile, along with storage classes (auto, static, extern, register), further refine variable semantics, influencing visibility, mutability, and optimization.[1]
Standardized by ANSI in 1989 (as ANSI X3.159-1989) and later by ISO (e.g., ISO/IEC 9899:1999 for C99), C syntax emphasizes portability and efficiency, though it permits undefined behaviors like signed integer overflow to grant compiler flexibility.[2] Its minimalist design, including single-character operators like = for assignment (contrasting BCPL's :=) and /* */ for comments, has profoundly influenced successor languages like C++, Java, and Go, while GNU extensions in implementations like GCC add non-standard features such as nested functions.[1]
Program Structure
Main Function
In the C programming language, the main function serves as the designated entry point for program execution in a hosted environment, where the implementation invokes it after initializing objects with static storage duration.[3] The function must be defined with a return type of int and either no parameters or two specific parameters for handling command-line arguments, as detailed in the next section.[3]
The standard syntax for main is int main(void) for programs without command-line arguments or int main(int argc, char *argv[]) for those that accept them, where argc represents the argument count and argv an array of argument strings; equivalent forms, such as using char **argv, are also permitted. The implementation itself declares no prototype for main.[3] In a freestanding environment, such as an embedded system without an operating system, the name and type of the function called at startup are implementation-defined, so a main function is not required.[3]
Historically, in K&R C as described in the 1978 first edition of The C Programming Language, the main function lacked an explicit return type, defaulting to int, and used an old-style definition like main(argc, argv) int argc; char **argv; { ... } without prototypes.[4] The ANSI C standard (ANSI X3.159-1989, adopted by ISO as ISO/IEC 9899:1990 and commonly called C89 or C90) introduced function prototypes for all functions, including main, to enhance portability and type safety, and marked the old-style syntax as obsolescent; the implicit int return type was removed later, in C99.[5]
Upon termination, a return statement in main provides the program's exit status to the environment, with return 0; conventionally indicating successful execution, while non-zero values signal errors; if main executes without a return statement, the status defaults to 0 in C99 and later standards.[3]
```c
#include <stdio.h>

int main(void) {
    printf("Hello, World!\n");
    return 0; // Success
}
```
This example illustrates the minimal int main(void) form, suitable for simple programs.[3]
Command-Line Arguments
In C, command-line arguments are accessed through parameters to the main function, which serves as the program's entry point. The standard form is int main(int argc, char *argv[]) or equivalently int main(int argc, char **argv), where argc is a nonnegative integer representing the total number of arguments passed to the program, including the program's name as the first argument. The parameter argv is an array of pointers to null-terminated character strings, with argv[0] pointing to the program's name (or to an empty string if the name is unavailable from the host environment), argv[1] through argv[argc-1] pointing to the additional arguments, and argv[argc] guaranteed to be a null pointer (NULL) to bound the array.[6]
An alternative wide-character form, int main(int argc, wchar_t *argv[]) or int main(int argc, wchar_t **argv), is supported as an implementation-defined extension in some environments, allowing Unicode or wide-string arguments where multibyte characters are insufficient.[7]
A common extension, available on many hosted systems but not mandated by any edition of the standard (including C23, ISO/IEC 9899:2024), adds a third parameter for environment variables: int main(int argc, char **argv, char **envp), where envp is an array of pointers to null-terminated strings of the form "name=value", terminated by a null pointer; this form provides direct access to the execution environment without relying on library functions like getenv.[6]
Lexical Elements
Identifiers
In the C programming language, identifiers serve as names for variables, functions, types, labels, and other program entities. According to the ISO/IEC 9899 standard, an identifier consists of a nonempty sequence of characters drawn from the basic source character set, beginning with an identifier nondigit, which is either a universal character name designating a letter, an underscore (_), or a letter from a to z or A to Z; subsequent characters may be identifier nondigits or decimal digits (0 to 9).[8] Identifiers are case-sensitive, so variable and Variable are treated as distinct.[9]
The length of identifiers is subject to translation limits specified in the standard to ensure portability across implementations. Since C99 (ISO/IEC 9899:1999), conforming implementations must treat at least the first 31 characters of external identifiers (those with external linkage, such as global variables or functions) as significant, while additional characters beyond this may be ignored or treated differently by the implementation.[9] For internal identifiers (all other identifiers, including local variables and names with internal linkage) and macro names, at least the first 63 characters must be significant.[9] In C23 (ISO/IEC 9899:2024), the significant length for external identifiers has been increased to at least 63 characters to align with modern practices and reduce legacy constraints.[8]
Certain identifier forms are reserved for the implementation or future use to avoid conflicts with compiler-specific extensions or library functions. For example, any identifier beginning with an underscore followed by an uppercase letter (e.g., _Foo) or beginning with two underscores (e.g., __bar) is reserved for the implementation in all contexts.[9] Similarly, all other identifiers beginning with an underscore are reserved for use as identifiers with file scope.[9] User-defined identifiers must not match any keywords, which are predefined reserved words.[8]
C23 introduces enhanced support for international characters in identifiers, aligned with Unicode Standard Annex #31 for identifier validity. Universal character names (e.g., \u03B1 for the Greek letter alpha) can appear in place of nondigits, provided they designate characters with the XID_Start property for the first position or XID_Continue for subsequent positions, and the identifier is in Normalization Form C.[8] This allows identifiers like π_value (which can also be spelled with \u03C0) and refines the universal character name support that identifiers have had since C99.[8]
For illustration, the following are valid C identifiers:
myVariable
_count
π_factor /* Valid in C23 with universal character name */
Invalid examples include those starting with a digit (e.g., 1abc) or containing disallowed characters (e.g., my-var).[8]
Keywords
In the C programming language, keywords are predefined reserved words that have special meanings and cannot be used as identifiers for variables, functions, or other user-defined names. These keywords form the core vocabulary of the language, directing the compiler on control flow, data types, storage classes, and other constructs. The set of keywords has evolved with each revision of the ISO C standard, reflecting additions for new features like type qualifiers, atomic operations, and alignment specifications.[10]
The original C89 standard (ISO/IEC 9899:1990) defined 32 keywords, which include fundamental type specifiers such as int, char, float, double, and control structures like if, else, for, while, do, switch, case, break, and continue. Storage class specifiers like auto, register, static, extern, and qualifiers such as const, volatile, signed, and unsigned were also included, along with sizeof for obtaining the size of a type or object (normally at compile time), return for function exits, and goto for unconditional jumps. Aggregate types were supported via struct, union, enum, and typedef for custom type aliases. The void keyword denoted absence of type, while default handled fallback cases in switches.[10]
Subsequent standards expanded this set. C99 (ISO/IEC 9899:1999) added five keywords: inline for function inlining hints, restrict for pointer aliasing restrictions to optimize memory access, _Bool for boolean types (typically via <stdbool.h> macro bool), _Complex for complex number support, and _Imaginary for imaginary types (though implementation-dependent). C11 (ISO/IEC 9899:2011) introduced seven more: _Alignas and _Alignof for type alignment control, _Atomic for thread-safe atomic operations, _Generic for type-generic macros, _Noreturn to indicate non-returning functions, _Static_assert for compile-time assertions, and _Thread_local for thread-specific storage. These underscore-prefixed keywords were intended to avoid conflicts with user identifiers.[10]
C23 (ISO/IEC 9899:2024) further modernized the language by adding keywords like alignas, alignof, bool, constexpr (for constant expressions), false, nullptr (null pointer constant), static_assert, thread_local, true, typeof (for naming the type of an expression), and typeof_unqual (unqualified type), while deprecating the C11 underscore variants in favor of these cleaner, standard forms. It also introduced bit-precise integers via _BitInt and decimal floating-point types with _Decimal32, _Decimal64, and _Decimal128. Preprocessor extensions like elifdef and elifndef were added, but these are directives rather than core keywords. The total now exceeds 50; the set of keywords is fixed by the language and does not depend on which headers are included.[10]
Keywords are reserved in every translation unit and scope, so they cannot be reused as identifiers; attempting to do so results in a compilation error. C has no context-sensitive keywords: a word such as typedef is a keyword wherever it appears as a token. Keywords are case-sensitive and must be spelled exactly as listed, so Int and WHILE are ordinary identifiers. This reservation ensures portability and consistency across C compilers conforming to the standards.[10]
| Standard | Keywords Introduced |
|---|---|
| C89 | auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while |
| C99 | inline, restrict, _Bool, _Complex, _Imaginary |
| C11 | _Alignas, _Alignof, _Atomic, _Generic, _Noreturn, _Static_assert, _Thread_local |
| C23 | alignas, alignof, bool, constexpr, false, nullptr, static_assert, thread_local, true, typeof, typeof_unqual, _BitInt, _Decimal32, _Decimal64, _Decimal128 (plus deprecations of prior underscore forms) |
[10]
Comments
In the C programming language, comments serve to annotate source code with explanatory text that is ignored by the compiler, facilitating documentation and code maintenance without affecting program execution. There are two primary forms of comments: block comments, enclosed by the delimiters /* and */, and line comments, introduced in the C99 standard and beginning with //. Block comments can span multiple lines and are suitable for longer explanations, while line comments apply only to the remainder of the current line.[11]
The syntax for a block comment is /* followed optionally by comment text and terminated by */, where the comment text may be any sequence of characters, including newlines, that does not contain */. For example:
```c
/* This is a block comment spanning
   multiple lines. It explains the purpose
   of the following function. */
int main(void) {
    return 0;
}
```
Block comments do not nest: an inner /* is treated as ordinary comment text, and the comment ends at the first subsequent */. In /* outer /* inner */ */, the comment therefore ends at the inner */, and the trailing */ is left in the code, where it usually causes a syntax error. (Section 6.4.9) This non-nesting rule has been consistent since the C89 standard and remains unchanged in subsequent revisions, including C23.[12]
Line comments begin with // followed by optional comment text and terminate at the next newline character (or the end of the file if no newline follows). They were standardized in C99 to align with common extensions in compilers like GCC and to simplify single-line annotations. For example:
```c
// This is a line comment
int x = 5; // Inline comment after statement
```
Line comments cannot span lines and do not nest, as their scope is strictly until the newline; any // within a block comment is treated as literal text. (Section 6.4.9)
Comments are processed during translation phase 3 of the C compilation process, where each comment is replaced by a single space character, ensuring they do not influence tokenization or syntax while preserving spacing between adjacent tokens. This occurs after physical source file mapping and line splicing but before directive processing, meaning comments are already gone when directives such as #include or #define are handled. For instance, #define FOO /* comment */ BAR defines FOO with the replacement list BAR, because the comment has been replaced by a space before the directive is processed. (Section 5.1.1.2)
Preprocessor Directives
The C preprocessor processes source code before compilation, handling directives that control macro expansion, file inclusion, and conditional compilation. Preprocessing directives begin with the # character, which must be the first non-whitespace character on its line, followed by the directive name and optional parameters, and terminate with a newline. Whitespace may precede the #, but no characters other than whitespace are allowed between the # and the directive name. These directives are not part of the C language proper but modify the translation unit prior to parsing.[8]
Key directives include #include, which inserts the contents of another file into the current source; for example, #include <stdio.h> for standard headers using angle brackets or #include "local.h" for user files using quotes. The #define directive creates macros: object-like macros replace a simple identifier with a token sequence, such as #define PI 3.14159, while function-like macros accept parameters and perform substitution with argument replacement, like #define MAX(a, b) ((a) > (b) ? (a) : (b)). The #undef directive removes a previously defined macro, as in #undef PI. Conditional directives enable selective compilation: #ifdef and #ifndef test for the presence or absence of a macro definition, paired with #endif to delimit blocks; more flexible control uses #if for constant expressions, with #elif and #else for alternatives, all ending in #endif. The #error directive halts compilation with a specified message, e.g., #error "Unsupported platform". The #pragma directive conveys implementation-defined instructions to the compiler, such as #pragma once for header guards in some systems. Finally, #line adjusts the reported line number and optional filename for diagnostics, like #line 100 "source.c".[8]
Macro definitions via #define support both simple token replacement in object-like forms and parameterized expansion in function-like forms, where arguments are substituted into the replacement list. Because substitution works on tokens and ignores operator precedence, the parameters and the whole replacement list are conventionally wrapped in parentheses, as in the MAX example. A macro can be undefined at any point after its definition using #undef, which allows the name to be defined again without a redefinition conflict. These mechanisms allow for code reuse and portability without altering the core syntax.[8]
Conditional compilation relies on integer constant expressions evaluated during preprocessing, incorporating the defined unary operator—which yields 1 if the operand macro is defined (ignoring its value) and 0 otherwise—and predefined macros such as __LINE__ (current line number as decimal integer), __FILE__ (current source filename as string literal), __DATE__ (compilation date as "Mmm dd yyyy" string), and __TIME__ (compilation time as "hh:mm:ss" string). Expressions in #if or #elif must be constant, using only integer literals, character constants, and these operators, without side effects.[8]
Since C99 (ISO/IEC 9899:1999), the _Pragma operator extends pragma functionality by allowing stringified directives for dynamic or macro-based insertion, as in _Pragma("STDC FENV_ACCESS ON"), enabling pragmas within macros or expressions where #pragma cannot appear directly.[13]
Constants and Literals
Integer Constants
Integer constants in the C programming language represent fixed integer values directly in source code, serving as literals in expressions and initializers. They are essential for embedding numerical data without runtime computation, and their syntax ensures portability across implementations by adhering to strict rules for notation and type inference. The form of an integer constant is specified in the ISO/IEC 9899 standard, which defines the lexical elements of C programs.[8]
The basic syntax for integer constants includes decimal, octal, and hexadecimal notations. Integer constants represent non-negative values; negative literals are expressions involving the unary minus operator applied to a positive constant. A decimal constant consists of a sequence of decimal digits (0-9), starting with a non-zero digit unless it is zero itself, such as 123 or 0. An octal constant begins with a zero followed by octal digits (0-7), for example, 0123 (equivalent to decimal 83). Hexadecimal constants start with 0x or 0X followed by hexadecimal digits (0-9, a-f, or A-F), like 0x123 (decimal 291). These notations allow representation in familiar bases, with octal and hexadecimal useful for low-level bit manipulation.[9][8]
Suffixes modify the type and signedness of integer constants, enabling specification of unsigned or extended-range variants. The suffixes u or U denote unsigned integers, l or L indicate long integers, and ll or LL specify long long integers; combinations are permitted, such as 123ULL for an unsigned long long constant. The unsigned suffix may come before or after the size suffix, and each suffix may be written in either case, except that the two letters of ll must match (lL and Ll are invalid). Without a suffix, the type is the first in a list that can represent the value: for a decimal constant the list is int, long int, long long int, so an unsuffixed decimal constant never takes an unsigned type; for an octal or hexadecimal constant the list is int, unsigned int, long int, unsigned long int, long long int, unsigned long long int. A suffix restricts the list to types of at least the requested width and, for u or U, to unsigned types.[9][8]
In C23 (ISO/IEC 9899:2024), binary notation is introduced for integer constants, starting with 0b or 0B followed by binary digits (0-1), such as 0b101 (decimal 5), facilitating direct bit-pattern representation. Additionally, C23 adds digit separators using a single apostrophe (') between digits for improved readability, which are ignored during value evaluation; examples include 1'000 (decimal 1000) or 0b1010'1100. These features apply to all bases and suffixes, with type determination following the same rules as in prior standards. C23 also introduces suffixes for bit-precise integers: wb or WB for signed _BitInt(N) and uwb or UWB for unsigned _BitInt(N), where N is the smallest width that can represent the constant's value, including the sign bit for the signed form (e.g., 6uwb is unsigned _BitInt(3), while 6wb is _BitInt(4)). The wb suffix can be combined with u or U but not with l or ll, and it works with digit separators.[8][14]
Floating-Point Constants
Floating-point constants in C represent real numbers and are used to initialize variables of the primitive floating-point types: float, double, and long double.[15] They consist of a significand, an optional exponent, and an optional suffix, with the significand formed by an optional integer part, a decimal point, and an optional fractional part.[15] At least one digit must appear before or after the decimal point, and a constant written without a decimal point must have an exponent to distinguish it from an integer constant.[16]
The decimal form of a floating-point constant includes a sequence of decimal digits with a decimal point, such as 3.14 or 0.001, and may include an exponent part introduced by e or E followed by an optional sign and decimal digits, for example, 3.14e-10 or 1.23E4.[15] Without a suffix, a floating-point constant has type double; the suffixes f or F specify type float, while l or L specify long double.[15] For instance:
```c
double pi = 3.14159;       // type double
float radius = 5.0f;       // type float
long double angle = 90.0L; // type long double
```
Since C99, hexadecimal floating-point constants are supported, prefixed by 0x or 0X, followed by hexadecimal digits for the significand (with an optional radix point), and a mandatory binary exponent introduced by p or P followed by an optionally signed decimal integer, such as 0x1.2p3, which equals (1 + 2/16) × 2^3 = 9.0 in decimal.[15] These constants also default to double unless suffixed with f/F or l/L. C23 additionally introduces suffixes for decimal floating-point types (df/DF for _Decimal32, dd/DD for _Decimal64, dl/DL for _Decimal128); for example, 1.23df has type _Decimal32. These decimal suffixes apply only to decimal notation, not to hexadecimal forms, and support decimal floating-point types for precise representation in applications like finance.[15][17]
In C23, single quotes (') can be used as digit separators within the significand or exponent parts of floating-point constants to improve readability, and these separators are ignored by the compiler; for example, 1'234.5e-3 is equivalent to 1234.5e-3.[15] This feature applies to both decimal and hexadecimal forms but cannot appear adjacent to the decimal point or exponent indicator.[15]
Character Constants
In the C programming language, a character constant is a literal representation of a single character or escape sequence enclosed in single quotation marks, serving as an integer value suitable for assignments, comparisons, or arithmetic operations.[8] Ordinary character constants, such as 'A' or '\n', have type int, with the value corresponding to the numeric code of the character in the execution character set.[9] These constants typically represent one character from the basic execution character set, though multicharacter constants like 'ab' are permitted but yield implementation-defined values of type int.[9]
Wide character constants, available since C89, use the L prefix, as in L'A', and have type wchar_t to support extended character sets beyond the basic character set.[9] Later standards add Unicode-aware prefixes: C11 introduces u'c' for UTF-16 of type char16_t and U'c' for UTF-32 of type char32_t, and C23 adds u8'c' for UTF-8 of type char8_t, with each such constant restricted to a single code unit.[8] Unlike string literals, which use double quotes and form arrays of char, character constants are strictly single-quoted and evaluate to scalar integer types.[8]
The syntax for a character constant follows the form encoding-prefix_opt ' c-char-sequence ', where the optional encoding-prefix is one of u8, u, U, or L, and c-char-sequence consists of one or more c-char elements.[8] A c-char is any member of the source or execution character set except single quote ('), backslash (\), or newline, or it may be an escape sequence.[9] Escape sequences allow embedding control characters, literals, or numeric values, starting with a backslash followed by a specifier.
The following table lists the standard escape sequences supported in character constants:
| Category | Sequence | Description | Example Value (ASCII) |
|---|---|---|---|
| Simple escapes | \a | Alert (bell) | 7 |
| | \b | Backspace | 8 |
| | \f | Form feed | 12 |
| | \n | Newline | 10 |
| | \r | Carriage return | 13 |
| | \t | Horizontal tab | 9 |
| | \v | Vertical tab | 11 |
| | \\ | Backslash | 92 |
| | \' | Single quote | 39 |
| | \" | Double quote | 34 |
| | \? | Question mark | 63 |
| | \0 | Null character | 0 |
| Octal escape | \ooo | Octal value (1-3 digits, 0-377) | '\123' = 83 |
| Hexadecimal escape | \xhh... | Hexadecimal value (1+ digits, 0-FF...) | '\x1A' = 26 |
| Universal name | \uXXXX | Unicode code point (4 hex digits) | '\u00A9' = © |
| | \UXXXXXXXX | Unicode code point (8 hex digits) | '\U000000A9' = © |
An octal escape consumes at most three octal digits, whereas a hexadecimal escape continues until the first character that is not a hexadecimal digit; the resulting value must be representable in the corresponding character type (unsigned char for an unprefixed constant), otherwise the program violates a constraint.[9] Universal character names, added in C99, facilitate portable representation of Unicode characters outside the basic set.[9] For instance, the constant '\n' represents a newline with value 10 in ASCII-based implementations, while L'\n' does the same in wide form.[8]
String Literals
A string literal in C is a sequence of characters enclosed in double quotes, optionally preceded by an encoding prefix, and it represents an initializer for an array of characters. The basic syntax is "s-char-sequence", where each s-char is a source character excluding double quote, backslash, or newline, or an escape sequence. For example, "hello" denotes a sequence of five characters followed by a null terminator.[18]
Adjacent string literals, separated only by whitespace or comments, are automatically concatenated during translation phase 6, forming a single string literal without any intervening operator. This allows splitting long strings for readability, such as "hel""lo" which becomes equivalent to "hello". Concatenation applies to both narrow and wide string literals; combining literals that carry different encoding prefixes, such as u8 and L, was at best implementation-defined in earlier standards and is a constraint violation in C23.[18]
C supports various encoding prefixes for string literals, several of them introduced in C11 (ISO/IEC 9899:2011). A plain string literal like "hello" has type char[N] (here char[6]), an array of narrow characters in the execution character set. The L prefix, available since C89, denotes a wide string literal of type wchar_t[], suitable for wide-character encodings. C11 adds u8 for UTF-8 encoded strings of type char[] (char8_t[] as of C23), u for UTF-16 strings of type char16_t[], and U for UTF-32 strings of type char32_t[]. The element types are not const-qualified, but attempting to modify the array produced by a string literal is undefined behavior, so such arrays are best accessed through pointers to const.[18]
Every string literal is implicitly null-terminated with a byte or code of value zero (\0) appended in translation phase 7, making the effective length one greater than the number of characters specified. The sizeof operator on a string literal expression yields the total size including this null terminator; for instance, sizeof "hello" returns 6. This null termination enables standard library functions like strlen to determine the string's length.[18]
Escape sequences in string literals follow the same rules as in character constants, allowing representation of non-printable or special characters such as \n for newline, \t for tab, octal escapes like \101 for 'A', hexadecimal \x41 for 'A', or universal character names like \u0041. These are converted in translation phase 5 to the corresponding members of the execution character set.[18]
Type System
Primitive Types
In the C programming language, primitive types represent the basic, built-in data types that form the foundation for variable declarations and expressions, without relying on user-defined structures or compositions. These types include integer, floating-point, boolean, and void variants, each with specific syntactic forms for declaration. Their exact sizes and behaviors are implementation-defined to allow flexibility across different hardware architectures, though the ISO/IEC 9899 family of standards imposes minimum requirements to ensure portability.[8]
Integer types in C encompass both signed and unsigned variants, providing mechanisms for representing whole numbers with or without sign. The basic signed integer types are char, short, int, long, and long long (the latter introduced in C99), declared using keywords such as signed int or simply int (where signed is implicit for most types except char). Unsigned variants are specified with the unsigned keyword, e.g., unsigned char or unsigned long long, allowing for larger positive ranges at the expense of negative values. The size of these types is platform-dependent: char is at least 8 bits, short and int at least 16 bits, long at least 32 bits, and long long at least 64 bits, as defined in the standard's limits for integer types.[9][8]
C23 introduces bit-precise integer types using the _BitInt(N) specifier, where N is the exact number of bits (up to the implementation-defined maximum BITINT_MAXWIDTH, which is at least the width of unsigned long long), allowing signed or unsigned integers of arbitrary width, e.g., _BitInt(16) x;. These provide precise control over bit sizes beyond traditional types.[19]
For applications requiring precise bit widths regardless of the implementation, the <stdint.h> header, introduced in C99, provides exact-width integer types such as int8_t, int16_t, int32_t, and int64_t for signed integers and corresponding uintN_t types for unsigned ones. The exact-width types are optional and exist only where the implementation has a matching type, whereas least-width variants like int_least16_t are always available and guarantee at least the specified width.[9]
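As a hedged illustration of why fixed widths matter, the sketch below packs two 16-bit halves into one 32-bit word. The helper names join_halves and high_half are hypothetical, and the code assumes the implementation provides uint16_t and uint32_t.

```c
#include <stdint.h>

/* Hypothetical helper: combine two 16-bit halves into one 32-bit word. */
uint32_t join_halves(uint16_t high, uint16_t low)
{
    /* Cast first so the shift is performed in 32 bits, not in promoted int. */
    return ((uint32_t)high << 16) | low;
}

/* Hypothetical helper: recover the upper 16 bits of a 32-bit word. */
uint16_t high_half(uint32_t word)
{
    return (uint16_t)(word >> 16);
}
```

With plain int and short the same code would depend on the platform's widths; with the <stdint.h> names the layout is the same wherever the types exist.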
Floating-point types hold approximate representations of real numbers, using IEEE 754 (IEC 60559) formats in most implementations. The primitive types are float (typically 32-bit single precision), double (typically 64-bit double precision), and long double (extended precision, implementation-defined but often 80 or 128 bits). Declarations follow simple syntax, such as float pi; or long double value;. The standard does not fix bit widths; it sets minimum range and precision requirements, which <float.h> reports through macros such as FLT_DIG and DBL_MAX.[8][9]
C23 adds the optional decimal floating-point types _Decimal32, _Decimal64, and _Decimal128, which use the 32-, 64-, and 128-bit IEC 60559 decimal formats for base-10 arithmetic. They are useful in financial applications because decimal fractions such as 0.1 are represented exactly, e.g., _Decimal32 price;.[20]
The boolean type, introduced in C99 as _Bool, holds the values 0 (false) and 1 (true); any nonzero scalar converted to _Bool becomes 1. It is declared as _Bool flag;, and in arithmetic it is promoted to int. The <stdbool.h> header provides the macros bool (expanding to _Bool), true (1), and false (0) to simplify usage; in C23, bool, true, and false are keywords and the header is no longer needed.[9]
C23 adds the char8_t type for UTF-8 code units, declared as char8_t c;. It is defined in <uchar.h> as a typedef for unsigned char.[8]
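The conversion and promotion rules for the boolean type can be sketched as follows. The function names to_flag and count_true are hypothetical, and the example uses <stdbool.h> so that it also compiles before C23.

```c
#include <stdbool.h>

/* Conversion to bool maps every nonzero value to 1 (true). */
bool to_flag(double value)
{
    return value;   /* 0.5 becomes true, whereas (int)0.5 would be 0 */
}

/* In arithmetic, bool operands are promoted to int. */
int count_true(bool a, bool b, bool c)
{
    return a + b + c;
}
```

For example, to_flag(0.5) yields true and count_true(true, false, true) yields 2.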
The void type specifies the absence of a value or type, primarily used in function declarations to indicate no return value, e.g., void function(void); where the parameter list void explicitly denotes no arguments, or in pointer types like void* for generic addressing (though the latter derives from void). C23 adds nullptr as a null pointer constant of type nullptr_t, usable in pointer contexts, e.g., int *p = nullptr;.[8][21] Type qualifiers such as const or volatile may be applied to primitive types to modify their behavior, e.g., const int x;.[9]
C23 also allows auto to request type inference in object definitions: when a declaration has no other type specifier, the type is deduced from the initializer, e.g., auto x = 42; declares x as an int. This use coexists with the older role of auto as a storage-class specifier. C23 additionally standardizes the typeof and typeof_unqual operators, which yield the type of an expression or type name for use in declarations.[22]
Derived Types
In the C programming language, derived types are constructed from other types—typically primitive types—using specific syntactic constructs that allow for the formation of pointers, arrays, functions, aggregates (structures and unions), and enumerations, as defined in the ISO/IEC 9899 family of standards.[23] These derived types enable the representation of complex data relationships and behaviors while maintaining the language's focus on low-level control.[23] Incomplete types, which lack full size or layout information, also arise from certain derivations and are useful in forward declarations.[23] Later standards, such as C11, introduce additional type-related features like atomic types for concurrency support.[23]
Pointer types derive from a function type or an object type, describing an object whose value provides a reference to an entity of the referenced type.[23] A pointer declarator is formed by placing an asterisk (*), optionally followed by type qualifiers, before the declared identifier.[23] For example:
c
int *p; /* p is a pointer to int */
A pointer type is always a complete object type, even when the referenced type is incomplete.[23]
Array types derive from an element type, describing a contiguously allocated nonempty set of objects of that type.[23] The syntax places the array size in square brackets after the identifier; a constant size must be an integer constant expression greater than zero, while a non-constant size declares a variable-length array (C99).[23] For example:
c
int a[10]; /* a is an array of 10 ints */
The element type must always be complete, but the array type itself is incomplete when the size is omitted, as in the external declaration extern int a[];.[23]
Function types derive from a return type and describe a function with specified parameters, characterized by the return type and the types and number of parameters.[23] The basic syntax places the parameter list in parentheses after the identifier, with the return type preceding.[23] Pointers to functions follow a similar pattern but require parentheses around the identifier and asterisk.[23] Examples include:
c
int f(int x); /* f is a function returning int, taking int parameter */
int (*pf)(int x); /* pf is a pointer to a function returning int, taking int parameter */
The parameter list may end with an ellipsis (...) for variadic functions. An empty list () leaves the parameters unspecified before C23; in C23 it declares a function that takes no arguments.[23]
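To show the function-pointer declarator in use, here is a minimal sketch in which a pointer to a function is passed as an argument. The names add, sub, and apply are illustrative only.

```c
/* Two functions with the same type: int (int, int). */
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

/* op is a pointer to a function taking two ints and returning int. */
int apply(int (*op)(int, int), int x, int y)
{
    return op(x, y);   /* equivalent to (*op)(x, y) */
}
```

A call such as apply(add, 2, 3) passes the function name, which converts implicitly to a pointer to the function.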
Aggregate types encompass structures and unions, both derived from their member types to describe collections of objects.[23] A structure type specifies sequentially allocated members, while a union type specifies overlapping members sharing the same memory.[23] The syntax uses the struct or union keyword, optionally with a tag, followed by a brace-enclosed list of member declarations.[23] For example:
c
struct point {
int x;
int y;
}; /* structure with two int members */
union data {
int i;
float f;
}; /* union with int or float member */
Members can have diverse types, and aggregates form incomplete types if declared without member lists.[23]
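The difference between sequential and overlapping layout can be checked with offsetof, as in the following sketch. The helper functions are hypothetical and exist only to expose the layout facts.

```c
#include <stddef.h>

struct point { int x; int y; };
union data { int i; float f; };

/* Structure members are laid out in order, so y starts after x. */
size_t y_offset(void) { return offsetof(struct point, y); }

/* Every union member starts at offset 0, so i and f overlap. */
size_t f_offset(void) { return offsetof(union data, f); }

struct point make_point(int x, int y)
{
    struct point p = { x, y };
    return p;
}

/* Structures are passed and returned by value. */
struct point translate(struct point p, int dx, int dy)
{
    p.x += dx;
    p.y += dy;
    return p;
}
```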
Enumerated types consist of a set of named integer constants and form a distinct type that is compatible with an implementation-defined integer type; the enumeration constants themselves have type int.[23] The syntax uses the enum keyword, optionally with a tag, followed by a brace-enclosed comma-separated list of enumerators, which may include constant expressions.[23] For example:
c
enum color {
RED,
GREEN = 2,
BLUE
}; /* enumerated type with values 0, 2, 3 */
Each enumerator represents an integer constant, and the type is incomplete until the full list is provided.[23]
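Enumerators are commonly used as case labels. The following sketch reuses the enum color declaration from above; the function color_name is a hypothetical helper.

```c
enum color { RED, GREEN = 2, BLUE };   /* values 0, 2, 3 */

/* Map an enumerator to a printable name. */
const char *color_name(enum color c)
{
    switch (c) {
    case RED:   return "red";
    case GREEN: return "green";
    case BLUE:  return "blue";
    }
    return "unknown";   /* reached for values such as 1 that name no enumerator */
}
```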
Incomplete types lack the information needed to determine their size: void, arrays of unknown size, and structures or unions declared without a member list.[23] For instance, void can never be completed (although void * is itself a complete pointer type), and struct s; declares an incomplete structure type that can be completed later.[23] These are useful for mutual references or external linkages without full definitions.[23]
Introduced in C11, atomic types derive from a base type using the _Atomic specifier, providing thread-safe access through atomic operations without data races.[23] The syntax prefixes _Atomic to the type name.[23] For example:
c
_Atomic int counter; /* atomic integer type */
The representation may differ from that of the non-atomic type, and atomics are an optional feature: an implementation that lacks them defines the macro __STDC_NO_ATOMICS__.[23]
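A minimal sketch of an atomic counter follows, assuming the implementation provides <stdatomic.h>. The names hits, record_hit, and current_hits are hypothetical; the example is exercised from one thread only, but the same calls remain free of data races when several threads use them.

```c
#include <stdatomic.h>

static _Atomic int hits;   /* static storage duration, so it starts at 0 */

/* Atomically add 1 and return the new count. */
int record_hit(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&hits, 1) + 1;
}

int current_hits(void)
{
    return atomic_load(&hits);
}
```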
Type Qualifiers
Type qualifiers in the C programming language are keywords that modify the semantics of a type without altering its base category, affecting how objects of that type are accessed, modified, or optimized by the compiler.[24] They appear among the declaration specifiers or after the * in a pointer declarator, and since C99 a repeated qualifier is treated as a single instance.[24] The qualifiers const and volatile date from C89, restrict was added in C99, and _Atomic was added in C11 to support atomic operations in multithreaded contexts.[25][24]
The const qualifier indicates that an object cannot be modified through the qualified lvalue after initialization: a direct assignment is rejected by the compiler, and modifying an object defined as const through a non-const lvalue (for example, after a cast) is undefined behavior.[25] For a pointer to const, the qualifier restricts only that access path; it does not prevent modification of a non-const object through another, non-const alias.[25] For example:
c
const int x = 10; // x cannot be modified after initialization
const int *p; // Pointer to a const int; *p cannot be modified
Qualifiers like const are orthogonal to storage classes, which govern linkage and lifetime rather than access properties.[24]
The volatile qualifier specifies that an object may be modified by external means outside the program's control, such as hardware or interrupts, requiring the compiler to perform every access explicitly without optimization like caching or reordering.[25] What constitutes an access to a volatile-qualified object is implementation-defined but includes reads, writes, lvalue-to-rvalue conversions, and operations via pointers to such objects.[25] This is particularly useful for accessing hardware registers or shared memory in embedded systems.[25] An example is:
c
volatile int status; // Status register that may change externally
The restrict qualifier, available since C99, applies only to pointers and indicates that the referenced object will be accessed only through that pointer (and pointers derived from it) within the scope of the pointer's declaration, enabling the compiler to assume no aliasing for optimization purposes.[25] If the object is accessed via another pointer or means during that scope, the behavior is undefined.[25] It is commonly used in function parameters to indicate non-overlapping memory regions, as in:
c
void copy(int *restrict dest, const int *restrict src, size_t n);
This allows the compiler to generate more efficient code by avoiding redundant loads or stores.[25]
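One possible body for the copy prototype above is sketched here; it is an illustration, not a reference implementation. The driver copy_demo is hypothetical and uses two distinct arrays, which satisfies the no-overlap promise that restrict makes.

```c
#include <stddef.h>

/* dest and src must not overlap; the caller is responsible for that. */
void copy(int *restrict dest, const int *restrict src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dest[i] = src[i];
}

/* Hypothetical driver: copy between two separate arrays and sum the result. */
int copy_demo(void)
{
    const int src[3] = { 1, 2, 3 };
    int dest[3] = { 0, 0, 0 };

    copy(dest, src, 3);
    return dest[0] + dest[1] + dest[2];
}
```

Calling copy with overlapping regions, such as copy(a + 1, a, n), would break the restrict contract and has undefined behavior.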
Introduced in C11, the _Atomic qualifier ensures that operations on the object are performed atomically, preventing data races in concurrent executions by treating accesses as indivisible units.[24] It can be applied to arithmetic types, pointers, structures, and unions but not to arrays or functions directly.[24] For instance:
c
_Atomic int counter; // Atomic integer for thread-safe updates
Type qualifiers can be combined to express multiple properties, such as const volatile int *p, which denotes a pointer to an integer that is constant (non-modifiable) but volatile (subject to external changes).[25] In pointer declarations, qualifiers can specify properties of the pointer itself or the pointed-to type, as in int * const p (constant pointer to modifiable int) versus const int *p (pointer to constant int).[24] These combinations allow precise control over mutability and optimization in complex data structures.[24]
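The two pointer forms can be contrasted in function parameters, as in this sketch. The names sum, fill, and qualifier_demo are hypothetical.

```c
#include <stddef.h>

/* values points to const int: the function may read but not write *values. */
int sum(const int *values, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; ++i)
        total += values[i];
    return total;
}

/* dest is a const pointer to modifiable int: the elements may be written,
   but dest itself cannot be made to point elsewhere. */
void fill(int *const dest, size_t n, int value)
{
    for (size_t i = 0; i < n; ++i)
        dest[i] = value;
}

int qualifier_demo(void)
{
    int buf[4];

    fill(buf, 4, 5);
    return sum(buf, 4);   /* int * converts implicitly to const int * */
}
```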
Storage Classes
In C, storage-class specifiers determine the storage duration (lifetime) and linkage (visibility across translation units) of declared identifiers, influencing how objects and functions are allocated and accessed.[26] These specifiers are part of the declaration syntax and include auto, register, static, extern, typedef, and _Thread_local (the latter introduced in the C11 standard).[26] At most one storage-class specifier is permitted per declaration, except that _Thread_local may combine with static or extern.[26] Storage classes can combine with type qualifiers such as const or volatile to further specify behavior, for example, static const int x = 5;.[26]
The auto specifier indicates automatic storage duration, where the object's lifetime begins upon entry to the block in which it is declared and ends upon exit; this is the default for variables declared at block scope without any explicit storage-class specifier.[26] Such objects have no linkage and must be explicitly initialized each time the block is entered if a specific value is required, as their contents are otherwise indeterminate.[26] For example:
void function() {
auto int counter = 0; // Automatic storage, reinitialized on each call
counter++;
}
The register specifier requests automatic storage duration with no linkage, suggesting to the implementation that the object be stored in a CPU register for optimal access speed; however, the implementation may ignore this request, and the address of an object declared register cannot be taken using the & operator.[26] In modern compilers, this hint is often disregarded due to sophisticated register allocation algorithms that outperform manual suggestions.[27]
The static specifier provides static storage duration, ensuring the object's lifetime spans the entire program execution, with initialization occurring only once before main begins.[26] At file scope, static confers internal linkage, limiting visibility to the current translation unit; at block scope, it confers no linkage but retains the value across multiple block invocations.[26] Uninitialized static objects are zero-initialized.[26] Example at block scope:
void function() {
static int counter = 0; // Retains value across calls
counter++;
}
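The contrast between static and automatic storage duration becomes visible when the functions return their counters. The following sketch uses the hypothetical names next_id and fresh_count.

```c
/* Static storage duration: counter is initialized once and keeps its value. */
int next_id(void)
{
    static int counter = 0;
    return ++counter;
}

/* Automatic storage duration: counter is created anew on every call. */
int fresh_count(void)
{
    int counter = 0;
    return ++counter;
}
```

Successive calls to next_id return 1, 2, 3, and so on, while fresh_count returns 1 every time.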
The extern specifier declares an identifier without defining it, allowing reference to an object or function defined elsewhere, typically in another translation unit. The identifier has external linkage unless a prior visible declaration gave it internal linkage, and the object it names has static storage duration. An extern declaration at block scope refers to the same object and may not have an initializer.[26] A declaration of an object at file scope without an initializer and without extern is a tentative definition, which the compiler treats as an external definition with zero initialization if no other definition appears in the translation unit.[26] Multiple tentative definitions for the same identifier in a translation unit are permitted, resolving to a single zero-initialized definition unless overridden.[26] For instance:
// file1.c
extern int global_var; // Declaration with external linkage
// file2.c
int global_var = 10; // Definition
The typedef specifier creates an alias for an existing type without allocating storage, reserving no memory and imparting no linkage or duration; it is used solely for type naming and cannot combine with other storage-class specifiers.[26] Example:
typedef unsigned int uint; // uint is now an alias for unsigned int
Introduced in C11, _Thread_local specifies thread storage duration, where each thread executing the program has its own distinct instance of the object, with lifetime tied to the thread's existence.[26] At block scope it must be combined with static or extern; at file scope it may appear alone, in which case the object has external linkage.[26] Each thread's instance is initialized when the thread starts.[26] Unlike type qualifiers, which affect mutability and optimization (e.g., const prevents modification), storage classes primarily control visibility and persistence.[26]
Declarations
Variable Declarations
In the C programming language, variable declarations introduce identifiers for objects such as scalars, arrays, or pointers, specifying their type and optionally initializing them. The basic syntax consists of a declaration specifier (typically a type name like int or float) followed by one or more declarators, terminated by a semicolon. For example, a simple declaration is int x;, which declares a variable x of type int without initialization. Multiple variables of the same type can be declared in a single statement using commas, as in int a, b, c;.[18]
Initialization can be included in the declaration using the equals sign, such as int x = 0;, where the initializer follows the declarator. This form allocates storage and sets an initial value, with rules varying by storage duration (e.g., automatic variables may have indeterminate values if uninitialized). Aggregate types like arrays support brace-enclosed lists, for instance int arr[3] = {1, 2, 3};, though full initialization details are covered elsewhere. Declarations without initializers rely on default behaviors based on scope and storage class.[18][28]
Variables declared within a block (delimited by curly braces {}) have block scope, meaning they are visible only from the point of declaration to the end of that block. File scope applies to declarations outside any function, making the identifier visible throughout the translation unit. Function prototype scope is limited to parameter names in function declarations and does not apply to variables proper. These scopes determine visibility, lifetime, and linkage options.[18]
Starting with C99, declarations can be intermixed with statements within a block, allowing code like int i = 0; printf("%d\n", i); int j = i + 1;, which enhances flexibility over earlier standards requiring all declarations at the block's beginning. This feature supports more natural coding styles while maintaining compatibility.[25]
At file scope, an object declaration without an initializer, and either without a storage-class specifier or with static, constitutes a tentative definition. For example, int x; acts as a tentative definition, which becomes a definition initialized to zero if the translation unit contains no other definition of x. Multiple tentative definitions are permitted if they are compatible, facilitating modular code organization. The extern keyword, as in extern int x;, declares an existing variable without defining it, useful for separate compilation units.[18]
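The tentative-definition rules can be seen in a single translation unit, as in this sketch. The identifiers shared, hidden, and read_shared are hypothetical.

```c
int shared;          /* tentative definition */
int shared;          /* second tentative definition of the same object */
extern int shared;   /* plain declaration; defines nothing */

static int hidden;   /* tentative definition with internal linkage */

/* No initializer appears anywhere, so both objects are
   defined once and zero-initialized. */
int read_shared(void)
{
    return shared + hidden;
}
```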
Type Declarations
In C, type declarations provide mechanisms to define new names for existing types or to specify aggregate types, enhancing code readability and modularity without instantiating objects. The primary constructs include typedef for creating aliases, struct for structure types, and enum for enumeration types, as detailed in the ISO/IEC 9899:2011 standard (C11). These declarations are essential for defining derived types such as aggregates, which build upon primitive types to form more complex data representations.[23]
The typedef specifier introduces a synonym for an existing type within its scope, without creating a new type. Its syntax is typedef followed by a type specifier and declarator, ending with an identifier for the alias, such as typedef int counter;. This alias can then be used interchangeably with the original type in declarations.[23] The typedef can also alias more complex types, including pointers and arrays; for example, typedef int *intptr; defines intptr as a synonym for a pointer to int, and typedef char string[100]; defines string as an array of 100 characters.[23] Such aliases simplify declarations of intricate types, particularly in function parameters or return types; if the aliased type is a variable-length array, its size expression is evaluated when the typedef declaration is reached.[23]
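The aliases named above behave exactly like the types they stand for, which the following sketch checks. The helper functions string_capacity, bump, and bumped are hypothetical.

```c
#include <stddef.h>

typedef int counter;        /* alias for int */
typedef int *intptr;        /* alias for pointer to int */
typedef char string[100];   /* alias for array of 100 char */

/* sizeof applied to an object of the alias type reports the full array size. */
size_t string_capacity(void)
{
    string s;
    return sizeof s;
}

/* intptr is interchangeable with int *. */
void bump(intptr p)
{
    ++*p;
}

counter bumped(counter start)
{
    bump(&start);   /* &start has type int *, which is intptr */
    return start;
}
```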
Structure declarations define a type consisting of a sequence of named members, potentially of different types, allocated contiguously in memory. The complete syntax is struct [tag] { struct-declaration-list };, where the optional tag serves as an identifier for the type, and the member declarations follow standard type declaration rules. For instance:
c
struct point {
int x;
int y;
};
This defines a complete structure type struct point with two integer members.[23] An incomplete (opaque) structure declaration omits the member list, using only struct tag;, which declares the type without specifying its content and is useful for forward references or abstracting implementation details, such as in header files. The type becomes complete only upon a subsequent full definition in the same scope.[23]
Enumeration declarations specify a type comprising a set of named integer constants. The syntax is enum [tag] { enumerator-list };, where each enumerator is an identifier optionally assigned a constant integer expression, defaulting to incremental values starting from 0 if unassigned. An example is:
c
enum color {
red,
green = 5,
blue
};
Here, red is 0, green is 5, and blue is 6, with all enumerators having underlying type int.[23] The enumeration type is complete after the closing brace, and the tag, if present, identifies the type.
Introduced in C11, anonymous structures and unions allow unnamed aggregates as members within another structure or union, enabling direct access to their members by name in the enclosing scope. The syntax omits the tag and declarator, such as struct { int a; float b; }; embedded within a parent structure, treating the members as if directly declared in the parent. This feature, a special case of structure and union specifiers, facilitates compact representations of variant data without intermediate naming.[23] Similarly, anonymous unions like union { int i; char c; }; permit overlapping storage for members, accessible directly.[23]
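A common use of an anonymous union is a small tagged variant. The sketch below assumes a C11 compiler; the type struct value and the functions make_int, make_float, and as_int are hypothetical.

```c
struct value {
    int is_float;   /* hypothetical tag recording which member is active */
    union {         /* anonymous union: i and f are accessed as v.i and v.f */
        int i;
        float f;
    };
};

struct value make_int(int i)
{
    struct value v;
    v.is_float = 0;
    v.i = i;
    return v;
}

struct value make_float(float f)
{
    struct value v;
    v.is_float = 1;
    v.f = f;
    return v;
}

/* Read the active member, truncating a float toward zero. */
int as_int(struct value v)
{
    return v.is_float ? (int)v.f : v.i;
}
```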
Function Declarations
In C, a function declaration, also known as a function prototype, introduces the name and type signature of a function without providing its implementation, allowing the compiler to perform type checking and argument promotion during separate compilation.[29] The general syntax is return-type function-name(parameter-list);, where the return type specifies the type of value returned by the function, the function name is an identifier, and the parameter list declares the types and names of arguments.[9] For instance, int add(int a, int b); declares a function named add that takes two integers as parameters and returns an integer.[30]
Before the ANSI C standard (C89), K&R-style function definitions listed parameter names without types after the function name, followed by separate parameter declarations.[9] An example is int old_add(a, b) int a; int b; { return a + b; }. This form carries no prototype information, so mismatched argument types go undiagnosed and cause undefined behavior; it was marked obsolescent in later standards.[30] Since C23, such old-style definitions are no longer supported in standard-compliant code.[29]
Functions with a variable number of arguments, known as variadic functions, are declared using an ellipsis (...) after the fixed parameters.[9] The syntax is return-type function-name(type param1, ..., type paramN, ...);, as in int printf(const char *format, ...);, which allows the function to accept any number of additional arguments beyond the format string.[30] This feature is defined in section 6.7.5.3 of the C99 standard and requires at least one named parameter before the ellipsis; C23 lifts that requirement.[9]
Starting with C99, the inline keyword can be used as a hint to the compiler to inline the function, though it does not guarantee inlining and applies only to function declarations or definitions.[9] The syntax is inline return-type function-name(parameter-list);, for example, inline double square(double x);, which suggests optimizing the function by embedding its code at call sites to reduce overhead.[30] An empty parameter list () declares a function that takes no arguments only since C23, where it is equivalent to (void); in earlier standards it leaves the parameters unspecified, so (void) is the portable spelling.[29]
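A variadic function reads its extra arguments through the <stdarg.h> macros. In this minimal sketch, the hypothetical function sum_ints relies on its first parameter to say how many int arguments follow, because the language itself provides no way to count them.

```c
#include <stdarg.h>

/* Sum `count` int arguments passed after the fixed parameter. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);            /* start after the last named parameter */
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);   /* fetch the next argument as int */
    va_end(args);

    return total;
}
```

For example, sum_ints(3, 1, 2, 3) evaluates to 6. Passing fewer arguments than count claims, or arguments of another type, is undefined behavior.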
Expressions and Operators
Operator Precedence and Associativity
In the C programming language, operator precedence determines how operands are grouped with operators in an expression containing multiple operators, with higher-precedence operators binding more tightly than lower ones; it does not by itself fix the order in which subexpressions are evaluated. This grouping is crucial for unambiguous parsing, as the syntax grammar in the standard defines precedence levels implicitly through production rules. Associativity determines how operators of the same precedence are grouped when they appear sequentially, either from left to right (e.g., for additive operators) or right to left (e.g., for assignment operators). These rules ensure portable behavior across compliant implementations, as outlined in the ISO/IEC 9899 standard.[29]
The following table summarizes the precedence levels and associativity for C operators, ordered from highest precedence (level 1) to lowest (level 15). Operators within the same level share equal precedence and are grouped according to their associativity. This structure derives directly from the grammar in Annex A of the standard, where postfix operators bind tightest and the comma operator loosest.[29][31]
| Precedence | Operator(s) | Description | Associativity |
|---|---|---|---|
| 1 | ++ -- ( ) [] . -> (type){list} | Postfix increment/decrement, function call, array subscripting, structure/union member access, compound literal (C99) | Left-to-right |
| 2 | ++ -- + - ! ~ (type) * & sizeof _Alignof | Prefix increment/decrement, unary plus/minus, logical NOT, bitwise NOT, cast, indirection, address-of, sizeof, _Alignof (C11) | Right-to-left |
| 3 | * / % | Multiplication, division, remainder | Left-to-right |
| 4 | + - | Addition, subtraction | Left-to-right |
| 5 | << >> | Left/right shift | Left-to-right |
| 6 | < <= > >= | Relational less than, less than or equal, greater than, greater than or equal | Left-to-right |
| 7 | == != | Equality/inequality | Left-to-right |
| 8 | & | Bitwise AND | Left-to-right |
| 9 | ^ | Bitwise exclusive OR | Left-to-right |
| 10 | `\|` | Bitwise inclusive OR | Left-to-right |
| 11 | && | Logical AND | Left-to-right |
| 12 | `\|\|` | Logical OR | Left-to-right |
| 13 | ?: | Ternary conditional | Right-to-left |
| 14 | = += -= *= /= %= <<= >>= &= ^= `\|=` | Simple and compound assignment | Right-to-left |
| 15 | , | Comma operator | Left-to-right |
Associativity affects grouping only for operators of equal precedence; for instance, the expression a + b + c is parsed as (a + b) + c due to left-to-right associativity of the + operator, while a = b = c is parsed as a = (b = c) owing to right-to-left associativity of assignment. The ternary operator ?: also follows right-to-left associativity, as in a ? b : c ? d : e becoming a ? b : (c ? d : e). These conventions prevent ambiguity in chains of operators and align with the standard's expression grammar.[29]
Parentheses explicitly override both precedence and associativity, allowing programmers to enforce a desired grouping; for example, a * (b + c) evaluates the addition before multiplication despite * having higher precedence than +. Without parentheses, the default rules apply strictly.[29]
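A frequent pitfall is that == binds tighter than &, so an unparenthesized bit test does not group as intended. The sketch below spells out the grouping explicitly; all function names are hypothetical.

```c
/* `x & 1u == 0` groups as x & (1u == 0), that is x & 0, which is always 0. */
int low_bit_clear_as_parsed(unsigned x)
{
    return (int)(x & (1u == 0));
}

/* Parentheses give the intended test: is the lowest bit clear? */
int low_bit_clear(unsigned x)
{
    return (x & 1u) == 0;
}

/* Left-to-right associativity: a - b - c groups as (a - b) - c. */
int chain_sub(int a, int b, int c)
{
    return a - b - c;
}

/* Right-to-left associativity: a ? 1 : c ? 2 : 3 groups as a ? 1 : (c ? 2 : 3). */
int pick(int a, int c)
{
    return a ? 1 : c ? 2 : 3;
}
```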
Expressions with unsequenced side effects, such as modifications and value computations on the same scalar object without an intervening sequence point (e.g., i++ + i), result in undefined behavior, as the standard does not specify the order of evaluation between operators or subexpressions. This can lead to unpredictable results across compilers, emphasizing the need for careful expression design to avoid such cases.[29]
Arithmetic Operators
Arithmetic operators in C provide the fundamental means for performing mathematical computations on integer and floating-point operands. These include the binary operators for addition (+), subtraction (-), multiplication (*), division (/), and the modulo or remainder operation (%), as well as unary forms of addition and negation. Before applying these operators, operands undergo usual arithmetic conversions, which promote integer types to a common type (typically int or larger) and convert integers to floating-point when mixed with floating-point operands, ensuring compatible operations.[32]
The binary + operator computes the sum of its operands, yielding a result of the converted type, while the binary - operator computes their difference. The * operator produces the product of the operands. For the / operator, division behavior varies by type: integer division truncates the quotient toward zero (e.g., 5 / 2 evaluates to 2), as standardized in C99 and later revisions, and integer division by zero is undefined behavior.[32] Floating-point division follows IEEE 754 semantics where applicable, producing a correctly rounded quotient; on such implementations, division by zero yields an infinity or NaN.[32]
c
int quotient = 5 / 2; // quotient == 2
double fp_quotient = 5.0 / 2.0; // fp_quotient == 2.5
The % operator yields the remainder from dividing the first operand (dividend) by the second (divisor), with the result having the same sign as the dividend and satisfying the equation (a / b) * b + (a % b) == a after conversions. In C99 and later, due to toward-zero integer division, negative dividends produce negative remainders; for instance, -5 % 3 evaluates to -2, since -5 / 3 == -1 and (-1) * 3 + (-2) == -5.[32] Modulo by zero invokes undefined behavior, and the operation is undefined if the quotient cannot be represented (e.g., INT_MIN % -1).[32]
c
int remainder = -5 % 3; // remainder == -2 (C99+)
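Because % takes the sign of the dividend, code that needs a result with the sign of the divisor (floored modulo) has to adjust it. The following is one possible sketch; floor_mod and identity_holds are hypothetical helpers, and both assume b is nonzero and that a / b does not overflow.

```c
/* Floored modulo: the result has the same sign as b (or is zero). */
int floor_mod(int a, int b)
{
    int r = a % b;
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;
    return r;
}

/* Check the identity (a / b) * b + (a % b) == a. */
int identity_holds(int a, int b)
{
    return (a / b) * b + (a % b) == a;
}
```

For example, -5 % 3 is -2, while floor_mod(-5, 3) is 1.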
The unary + operator applies the integer promotions to its operand without altering its value, so it is effectively a no-op in arithmetic contexts. The unary - operator negates the promoted operand; negating the minimum representable signed integer (e.g., -INT_MIN) is undefined behavior on two's-complement implementations because the result exceeds the type's range.[32]
Overflow in arithmetic operations, where the result exceeds the representable range, triggers undefined behavior for signed integers, potentially leading to wraparound, signal traps, or surprising optimizations.[32] For unsigned integers, arithmetic is well-defined and modular, wrapping around modulo $2^n$ (where n is the type's bit width); for example, UINT_MAX + 1 equals 0.[32]
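Since signed overflow is undefined, a check has to happen before the addition rather than after it. The sketch below shows one common way to do so using the <limits.h> bounds; the function names are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/* Report whether a + b would leave the range of int, without computing it. */
bool add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow when b > 0 */
    return a < INT_MIN - b;       /* INT_MIN - b cannot overflow when b <= 0 */
}

/* Unsigned arithmetic wraps: UINT_MAX + 1 is 0. */
unsigned wrap_increment(unsigned u)
{
    return u + 1u;
}
```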
The precedence and associativity of arithmetic operators influence evaluation order in multi-operator expressions, with *, /, and % binding tighter than + and - (all left-to-right associative).[32]
Relational and Logical Operators
In C, relational operators compare two scalar operands and produce an integer result of 1 if the relation holds and 0 otherwise. These operators include the equality operators == (equal to) and != (not equal to), as well as the ordering operators < (less than), > (greater than), <= (less than or equal to), and >= (greater than or equal to).[26] Prior to comparison, the operands undergo usual arithmetic conversions if they are of arithmetic types, ensuring compatibility such as promoting integers to matching widths or converting to common real types for floating-point values.[26] For example, the expression 5 > 3 evaluates to 1, while 2 < 1 evaluates to 0.
c
int a = 10, b = 5;
if (a == b) { /* false, does not execute */ }
if (a > b) { /* true, executes */ }
When applied to pointers, the ordering operators (<, >, <=, >=) are restricted: the comparison is defined only if both pointers point to the same object or to elements (or one past the end) of the same array object.[26] In such cases, the result reflects the relative positions of the elements. For instance, within an array int arr[3] = {1, 2, 3};, &arr[0] < &arr[2] yields 1, but ordering pointers into different arrays results in undefined behavior. The equality operators == and != are less restrictive and may compare any two pointers of compatible type, including null pointers and function pointers.[26]
Logical operators perform boolean operations on scalar operands, yielding an integer result of 1 if the logical condition is true and 0 if false. These include the binary logical AND &&, binary logical OR ||, and unary logical NOT !.[26] The && operator returns 1 only if both operands compare unequal to 0, while || returns 1 if at least one operand compares unequal to 0; the ! operator inverts the truth value of its operand, returning 1 if it is 0 and 0 otherwise.[26] Pointers may serve as operands, where a null pointer evaluates as false (0) and a non-null pointer as true (nonzero).[26]
A defining feature of the binary logical operators && and || is short-circuit evaluation, which guarantees left-to-right ordering and skips the second operand if the first determines the overall result: the second operand is not evaluated if the first is 0 for && or nonzero for ||.[26] This behavior introduces a sequence point after the first operand, ensuring side effects from the first are complete before any potential evaluation of the second.[26] For example:
c
int x = 0, y = 0, z = 0;
if (x != 0 && (y = 5) > 0) { /* second operand skipped, y stays 0 */ }
if (x != 0 || (z = 10) < 0) { /* first operand is 0, so the second is evaluated: z becomes 10 */ }
The ! operator has no short-circuit aspect as it is unary and always evaluates its operand fully.[26]
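A common use of short-circuit evaluation is guarding a dereference with a null check. The following sketch uses hypothetical helpers (touch, is_positive, count_evaluated) and a counter to make visible which right-hand operands are actually evaluated.

```c
#include <stddef.h>

static int calls = 0;   /* counts how often touch() actually runs */

static int touch(void)
{
    ++calls;
    return 1;
}

/* Safe even when p is NULL: *p is evaluated only if p != NULL holds. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}

/* Returns how many of the three touch() calls were really evaluated. */
int count_evaluated(void)
{
    calls = 0;
    (void)(0 && touch());   /* skipped: left operand is 0 */
    (void)(1 || touch());   /* skipped: left operand is nonzero */
    (void)(1 && touch());   /* evaluated: result depends on the right operand */
    return calls;
}
```

Under these assumptions only the last call runs, so count_evaluated returns 1.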
Bitwise Operators
Bitwise operators in the C programming language enable manipulation of individual bits within integer operands, facilitating low-level operations such as masking, setting, toggling, and shifting bits. These operators include the bitwise AND (&), bitwise inclusive OR (|), bitwise exclusive OR (^), unary bitwise NOT (~), left shift (<<), and right shift (>>). All bitwise operators require operands of integer type, and prior to evaluation, operands undergo integer promotions, resulting in types of at least int, or unsigned int if int cannot represent all values of the original type; the result of the operation retains the promoted type.[9]
The bitwise AND operator (&) performs a logical AND on corresponding bits of its two operands, yielding 1 in each bit position only if both operands have 1 in that position. Similarly, the bitwise inclusive OR operator (|) sets a bit to 1 if at least one operand has a 1 in that position, while the bitwise exclusive OR operator (^) sets a bit to 1 only if the corresponding bits in the operands differ. For these binary operators, the usual arithmetic conversions are applied to the operands before the operation. A common application of the AND operator is masking, as in extracting the lower 8 bits of an integer: unsigned int x = 0x12345678; unsigned int masked = x & 0xFF;, which results in 0x78.[9]
The unary bitwise NOT operator (~) inverts all bits of its integer operand, equivalent to one's complement negation. For unsigned types, the result is the maximum value of the type minus the operand value. An example is ~0x0F, which produces all 1s except the lower four bits (implementation-dependent for signed types).[9]
Shift operators perform bit shifting on integer operands after integer promotions are applied to each operand separately; the result has the type of the promoted left operand. The left shift operator (<<) shifts the bits of the left operand left by the number of positions specified by the right operand, filling vacated bits with zeros; for unsigned operands this is multiplication by 2 raised to the shift amount, reduced modulo one more than the maximum value of the type, and for non-negative signed operands it is the same multiplication provided the result is representable (otherwise the behavior is undefined). Left-shifting a negative value is undefined. The right shift operator (>>) shifts bits right, filling vacated bits with zeros for unsigned types or non-negative signed values; for negative signed integers, the result is implementation-defined, often involving sign extension to preserve the sign bit. Both shift operators require the right operand to be non-negative and less than the bit width of the promoted left operand; a shift amount that is negative or greater than or equal to that width invokes undefined behavior. For instance, 1 << 3 yields 8, and 0x80 >> 1 yields 0x40.[9]
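The masking, setting, clearing, and toggling operations described above can be sketched as small helpers. The function names are hypothetical, and each assumes the bit index n is smaller than the width of unsigned int, since larger shift counts are undefined.

```c
/* Hypothetical bit helpers; n must be less than the bit width of unsigned int. */
unsigned set_bit(unsigned v, unsigned n)    { return v | (1u << n); }
unsigned clear_bit(unsigned v, unsigned n)  { return v & ~(1u << n); }
unsigned toggle_bit(unsigned v, unsigned n) { return v ^ (1u << n); }

/* Returns 1 if bit n of v is set, 0 otherwise. */
int test_bit(unsigned v, unsigned n)        { return (int)((v >> n) & 1u); }

/* Keeps only the lowest 8 bits, as in the masking example above. */
unsigned low_byte(unsigned v)               { return v & 0xFFu; }
```

Using unsigned operands and the 1u literal avoids the signed-shift pitfalls noted above.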
Assignment Operators
Assignment operators in C provide a mechanism for storing computed values into modifiable lvalues, forming a fundamental part of expression evaluation and control flow. Defined in the ISO/IEC 9899 standard, these operators include the simple assignment operator = and compound variants that integrate arithmetic, bitwise, or shift operations with assignment. They ensure type compatibility through conversions and enforce sequencing rules to manage side effects reliably.[24]
The syntax for assignment expressions follows the grammar: an assignment-expression is either a conditional-expression or a unary-expression followed by an assignment-operator and another assignment-expression. The assignment operators are: =, *=, /=, %=, +=, -=, <<=, >>=, &=, ^=, and |=. In general form, they appear as E1 op= E2, where E1 designates the target lvalue and E2 provides the source value or operand. These operators are right-associative, so chains like a = b = 5 evaluate as a = (b = 5), assigning 5 to b first, then that result to a.[24][29]
For the simple assignment E1 = E2, the value of E2, after conversion to the type of E1, replaces the prior value stored in the object designated by E1. The assignment expression itself yields the value of E1 after the assignment as its result, with the type E1 would have after lvalue conversion, though this result is not itself an lvalue. The right operand is converted to the type of the left operand as if by assignment: arithmetic values are converted to the target arithmetic type, while pointer assignments require compatible pointed-to types, with the left side carrying at least the qualifiers of the right. The left operand E1 must be a modifiable lvalue, which excludes const-qualified objects, arrays, incomplete types, and structures or unions with const-qualified members; using any of these is a constraint violation that requires a diagnostic from the compiler.[24][29]
Compound assignment operators extend this by applying a binary operation before storing the result. Specifically, E1 op= E2 is semantically equivalent to E1 = E1 op E2 (or a temporary holding E1's value if needed for type reasons), ensuring E1 is evaluated only once to avoid multiple side effects. For instance, += adds the converted value of E2 to E1, while <<= shifts E1 left by the value of E2 bits. Type rules mirror the underlying operators, with the right operand converted as in simple assignment. An example is:
c
int x = 10;
x += 5; // Equivalent to x = x + 5; now x is 15
This updates x in place, with the expression value being 15.[24][29]
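Two properties described above, single evaluation of the left operand in a compound assignment and right associativity of chained assignments, can be shown with a short sketch. The function names bump_first and chain_assign are hypothetical.

```c
/* Adds 10 to a[0] and returns the index value after the update. */
int bump_first(int *a)
{
    int i = 0;
    a[i++] += 10;   /* i++ runs once, although a[i++] is both read and written */
    return i;
}

/* Right associativity: parsed as *a = (*b = v). */
int chain_assign(int *a, int *b, int v)
{
    return *a = *b = v;
}
```

If a[i++] += 10 were expanded textually to a[i++] = a[i++] + 10, the index would be incremented twice, which is why the standard specifies that the left operand is evaluated only once.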
The side effect of updating the stored value is sequenced after the value computations of both operands, but the evaluations of the operands themselves are unsequenced relative to each other. If an operand has its own side effect on the object being assigned, as in i = i++ + 1, that side effect is unsequenced relative to the assignment and the behavior is undefined. In practice:
c
int a = 1;
a = a + 1; // Safe: old value of a used in computation before update
This ensures predictable behavior, as the addition uses a's initial value before the result is stored. Assignment to a volatile-qualified lvalue is permitted; each such store counts as an observable side effect that the implementation may not optimize away or reorder with respect to other volatile accesses.[24][29]
Other Operators
In addition to the primary categories of operators, C provides several miscellaneous operators that handle addressing, sizing, conditional selection, sequencing, and member access. These include the unary address-of operator (&), indirection operator (*), sizeof operator, ternary conditional operator (?:), comma operator (,), and the member access operators (. and ->). The address-of and indirection operators are essential for working with pointers, allowing retrieval of memory addresses and access to values at those addresses, respectively.[33]
The address-of operator (&) takes an lvalue operand and returns a pointer to its effective type. Its syntax is & followed by the operand, such as &var where var is a variable. The result cannot be an lvalue and is never null for objects or functions. For example:
c
int x = 10;
int *p = &x; // p holds the address of x
This operator is defined in the C standard as yielding the address of the operand, with restrictions against applying it to bit-fields or non-lvalue expressions.[34]
The indirection operator (*) dereferences a pointer operand, yielding an lvalue of the pointed-to type if the pointer is valid. Its syntax is * followed by a pointer expression, such as *ptr. The operand must be a pointer to a complete object type or an incomplete type (except void). For example:
c
int x = 10;
int *p = &x;
int y = *p; // y is 10, dereferencing p
Undefined behavior occurs if the pointer does not point to a valid object. This operator is specified in the C standard to access the value indirectly through the pointer.[34]
The sizeof operator determines the size in bytes of a type or of the type of an expression, yielding an unsigned integer of type size_t. It has two forms: sizeof unary-expression and sizeof ( type-name ). The operand is not evaluated unless its type is a variable-length array (VLA, introduced in C99), in which case the operand is evaluated and the size is computed at run time. For example:
c
size_t s1 = sizeof(int); // size of int type
int arr[5];
size_t s2 = sizeof arr; // size of the array (typically 5 * sizeof(int))
The result is an integer constant expression except when the operand is a VLA. For a structure or union type, the size includes any internal and trailing padding required for alignment.[34][35]
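A frequent application of sizeof is computing the number of elements in an array. The sketch below uses a hypothetical ARRAY_LEN macro and two hypothetical functions; the macro works only on true arrays, not on pointers, because an array passed to a function has already been adjusted to a pointer.

```c
#include <stddef.h>

/* Element count of a true array (not a pointer); ARRAY_LEN is a hypothetical name. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

size_t five_ints_len(void)
{
    int arr[5] = {0};
    return ARRAY_LEN(arr);    /* total bytes divided by bytes per element */
}

int operand_not_evaluated(void)
{
    int i = 0;
    size_t s = sizeof(i++);   /* i++ is not evaluated: the operand is not a VLA */
    (void)s;
    return i;                 /* i is still 0 */
}
```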
The conditional operator (also known as the ternary operator) selects one of two expressions based on a condition, with syntax logical-OR-expression ? expression : conditional-expression. It associates right-to-left, and only the selected expression is evaluated. The first operand must have scalar type and is compared against 0 (a nonzero value selects the second operand), with a sequence point after its evaluation; the second and third operands together determine the result type. For example:
c
int a = 5, b = 3;
int max = (a > b) ? a : b; // max is 5
The result is not an lvalue, and arithmetic conversions are applied to the selected value. The C standard specifies this as the only ternary operator, with rules for type compatibility between the second and third operands.[34]
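Right-to-left association lets conditional expressions be chained without parentheses. A minimal sketch, using a hypothetical clamp function that assumes lo <= hi:

```c
/* Restricts v to the range [lo, hi]; assumes lo <= hi. */
int clamp(int v, int lo, int hi)
{
    /* Parsed as: v < lo ? lo : (v > hi ? hi : v) */
    return v < lo ? lo : v > hi ? hi : v;
}
```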
The comma operator (,) sequences evaluations, evaluating its left operand (discarding the value) before the right, with a sequence point in between, and yields the right operand's value (as converted to the expression's type). It has left-to-right associativity and syntax expression , assignment-expression. Both operands must be valid expressions, and the result is not an lvalue. For example:
c
int a = 1, b = 2;
int result = (a += 1, b * 2); // evaluates a += 1 (discards), then b * 2; result is 4
This guarantees order of evaluation and is useful in for loops or macros, but overuse can reduce readability. The C standard defines it as producing the value of the right operand after full evaluation of the left.[34]
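The for-loop use mentioned above might look like the following sketch, in which a hypothetical reverse function advances two indices in a single iteration expression. Note that the comma in the loop's declaration separates declarators and is not the comma operator; only ++i, --j uses the operator.

```c
#include <stddef.h>

/* Reverses a[0..n-1] in place. */
void reverse(int *a, size_t n)
{
    if (n == 0) {
        return;                       /* avoids n - 1 wrapping around for size_t */
    }
    for (size_t i = 0, j = n - 1; i < j; ++i, --j) {   /* ++i, --j: comma operator */
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```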
Member access operators provide syntax for accessing members of structures and unions. The dot operator (.) accesses a member of a struct or union lvalue, with syntax struct-or-union-expression . identifier. The arrow operator (->) accesses a member through a pointer to struct or union, with syntax pointer-to-struct-or-union-expression -> identifier, equivalent to (*pointer).identifier. For example:
c
struct Point { int x, y; };
struct Point p = {1, 2};
int val = p.x; // dot access
struct Point *ptr = &p;
int val2 = ptr->y; // arrow access
These postfix operators bind more tightly than all unary and binary operators. The . operator yields an lvalue when its left operand is an lvalue, whereas -> always yields an lvalue; in both cases the result is modifiable only if the member and the object accessed are not const-qualified.[34]
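Because a member access through a pointer yields an lvalue, it can appear on the left of an assignment. The sketch below repeats the struct Point layout from the example above and adds a hypothetical translate function showing that p->x and (*p).x are interchangeable.

```c
struct Point { int x, y; };

/* Moves the point by (dx, dy); both access forms modify the caller's object. */
void translate(struct Point *p, int dx, int dy)
{
    p->x += dx;       /* arrow form */
    (*p).y += dy;     /* equivalent dereference-then-dot form */
}
```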
Statements
Expression Statements
In the C programming language, an expression statement consists of an expression followed by a semicolon, which causes the expression to be evaluated solely for its side effects while discarding any resulting value. This construct allows developers to perform operations such as assignments, function calls, or modifications to variables as standalone units of execution within a program. According to the C standard, the syntax is defined as an optional expression terminated by a semicolon, where the expression is fully evaluated, its side effects are applied, and the value (if any) is ignored.[36]
A common use of expression statements is for assignments, such as a = b;, which assigns the value of b to a and discards the resulting value of the assignment expression itself. Increment and decrement operations also frequently appear as expression statements, for example ++i; to pre-increment i or i++; to post-increment it, both of which modify the variable and discard the temporary value produced. These statements are essential in contexts like loops or sequential code where side effects drive program behavior without needing to capture return values. The C standard specifies that side effects from the expression must be completed before the next sequence point, ensuring predictable ordering in execution.[36]
Expression statements can include operations without side effects, such as sizeof(a);, which computes the size of a and discards the result; this is permitted, although compilers commonly warn that the statement has no effect. To indicate explicitly that a value is being discarded, particularly for function calls whose return values might otherwise trigger compiler warnings, programmers can use a void cast, as in (void)func();, which gives the expression type void. This is a widespread convention rather than a requirement of the standard: ignoring a return value is well defined, and the cast only documents intent. As with any expression, the statement must satisfy the usual type rules and constraints.[36]
Compound Statements
In C, a compound statement, also known as a block, groups zero or more declarations and statements into a single syntactic unit enclosed by matching braces, allowing multiple lines of code to be treated as one statement wherever a single statement is expected.[8] This construct is essential for structuring code in functions, control structures, and other contexts, promoting modularity and readability by enabling the sequential execution of its contents.[8]
The syntax of a compound statement is defined as { block-item-list_opt }, where block-item-list consists of one or more block-items, and each block-item is either a declaration, an unlabeled statement, or a label.[8] Braces are required to delimit the block, and since C99, declarations may be intermixed with statements within the block rather than restricted to the beginning.[8] For example:
c
{
    int x = 1;          // Declaration
    printf("%d\n", x);  // Statement
    x++;                // Another statement
}
This example demonstrates a compound statement containing a declaration followed by statements, all executed in order upon entering the block.[8]
A compound statement introduces a block scope, in which any identifiers declared within it have block scope and are visible only from their point of declaration until the end of the block, including any nested blocks.[8] Objects declared in a block have automatic storage duration, with their initializers evaluated and the objects initialized in declaration order each time control enters the block.[8] Variable declarations within a block are thus local to that scope and do not affect outer scopes unless shadowed by redeclaration.[8]
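The shadowing behavior described above can be checked with a short sketch. The function name shadow_demo is hypothetical, and some compilers warn about the deliberately shadowed declaration.

```c
int shadow_demo(void)
{
    int x = 1;
    {
        int x = 2;   /* new object; hides the outer x inside this block */
        x += 10;     /* modifies the inner x only */
    }
    return x;        /* the outer x is unchanged, so this returns 1 */
}
```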
An empty compound statement, consisting solely of { } with no declarations or statements, is valid and equivalent to a null statement, performing no action but serving as a placeholder where a statement is required.[8] For instance, { } can be used to explicitly indicate an empty body in certain constructs without altering program flow.[8]
Compound statements support nesting, where a block can contain another compound statement, creating a hierarchical structure of scopes in which inner blocks can access (but potentially shadow) identifiers from enclosing scopes.[8] This nesting allows for fine-grained control over variable lifetime and visibility, with no explicit limit beyond the implementation's translation limits; since C99, an implementation must accept at least 127 nesting levels of blocks.[8] An example of nesting is:
c
{
    int outer = 10;
    {
        int inner = outer + 1;   // Accesses outer
        printf("%d\n", inner);   // Outputs 11
    }   // inner's lifetime ends here
    // outer still accessible
}
In C23, the core syntax and semantics of compound statements remain consistent with prior standards, though labels may now appear before declarations within blocks for improved flexibility.[8]
Selection Statements
Selection statements in the C programming language provide mechanisms for conditional execution, allowing control flow to branch based on the evaluation of expressions. These statements enable a program to select and execute different blocks of code depending on specific conditions, facilitating decision-making without repetition of logic. The primary selection statements are the if statement and the switch statement, each designed for different scenarios: the if statement for general boolean conditions, and the switch statement for multi-way branching on integral values.[18]
The if statement evaluates a controlling expression and executes an associated statement if the result is true. Its syntax is if ( expression ) statement for the basic form, or if ( expression ) statement else statement when including an alternative branch. The controlling expression must have a scalar type, such as arithmetic or pointer, and is contextually converted to determine its truth value: a non-zero result is considered true, while zero is false. Upon evaluation, if the expression is true, the first statement (which may be a compound statement enclosed in braces) is executed; otherwise, if the else clause is present, the second statement is executed instead. A sequence point occurs after the expression evaluation, ensuring side effects are complete before control transfer. In nested if constructs, an else associates with the nearest preceding if.[18]
c
if (x > 0) {
    printf("Positive\n");
} else {
    printf("Non-positive\n");
}
This example demonstrates conditional output based on the sign of x, with the else providing the alternative path.[18]
The switch statement supports efficient multi-way selection on an integral controlling expression, transferring control to labeled statements within its body. Its syntax is switch ( expression ) statement, where the expression undergoes integer promotions to yield an integer type value, and the statement is typically a compound statement containing case and optional default labels. The case labels take the form case constant-expression :, where constant-expression is an integer constant expression unique within the switch; control transfers to the statement following the matching case. If no match occurs, execution proceeds to the default : label if present; otherwise, the entire body is skipped. Labels must appear inside the compound statement, and multiple case labels may precede a single statement to group values.[18]
A key feature of the switch statement is its fall-through behavior: execution continues sequentially from the matched label to subsequent statements unless interrupted by a jump statement like break, allowing shared code across cases without explicit duplication. Since C99, implementations must support at least 1023 case labels in a switch statement (257 in C89).[18]
c
switch (grade) {
case 'A':
case 'B':
    printf("Good\n");
    break;
case 'C':
    printf("Average\n");
    break;
default:
    printf("Below average\n");
}
In this example, grades 'A' and 'B' share the "Good" output because both labels precede the same statement, and break prevents execution from continuing into the next case. The default label handles unmatched values. Jump statements can be used within a switch to alter flow: break exits the switch statement entirely, whereas continue is valid only if the switch is nested inside a loop, in which case it applies to that enclosing loop.[18]
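Grouped labels and deliberate fall-through can be contrasted in a minimal sketch. Both function names are hypothetical: days_in_month groups several case labels on one statement, while countdown_sum relies on execution flowing from one case into the next.

```c
/* Returns the number of days in a month (1-12), or 0 for an invalid month. */
int days_in_month(int month, int leap_year)
{
    switch (month) {
    case 4: case 6: case 9: case 11:   /* grouped labels share one statement */
        return 30;
    case 2:
        return leap_year ? 29 : 28;
    case 1: case 3: case 5: case 7: case 8: case 10: case 12:
        return 31;
    default:
        return 0;                      /* not a valid month number */
    }
}

/* Sums start + (start - 1) + ... + 1 for start in 1..3, else returns 0. */
int countdown_sum(int start)
{
    int sum = 0;
    switch (start) {
    case 3: sum += 3;   /* falls through */
    case 2: sum += 2;   /* falls through */
    case 1: sum += 1;
        break;
    default:
        break;
    }
    return sum;
}
```

Marking each intentional fall-through with a comment is a common convention, since an omitted break is otherwise indistinguishable from a mistake.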
In the C23 standard (ISO/IEC 9899:2024), enhancements to labeled statements improve flexibility within switch bodies. Labels, including case and default, may now appear before declarations and at the end of a compound statement, without needing a following null statement. Additionally, the [[fallthrough]] attribute may be written as the standalone declaration [[fallthrough]]; immediately before a case or default label to annotate intentional fall-through, aiding code clarity and analysis without altering semantics. These changes resolve longstanding restrictions on label placement and maintain backward compatibility while supporting more structured code in complex switches.[8]
Iteration Statements
Iteration statements in C provide mechanisms for repeating the execution of a block of code based on a condition, enabling efficient handling of repetitive tasks without code duplication. These statements include the while loop, do-while loop, and for loop, each offering variations in when the controlling expression is evaluated and how initialization and iteration updates are handled. They establish a block scope for any variables declared within their clauses or bodies, as specified since C99.[25]
The while statement executes its associated statement repeatedly as long as the controlling expression evaluates to a nonzero value. Its syntax is while ( expression ) statement, where the expression is tested before each iteration, potentially resulting in zero executions if the condition is false initially. The statement, often a compound statement enclosed in braces, forms the loop body. Within the body, the break statement terminates the loop immediately, transferring control to the statement following the while construct, while the continue statement skips the remainder of the current iteration and proceeds to re-evaluate the expression.[25]
In contrast, the do-while statement ensures the loop body executes at least once, with the controlling expression evaluated after each iteration. Its syntax is do statement while ( expression );, differing from while by postponing the test, which is useful when the body must run unconditionally first. The break and continue statements behave similarly here: break exits the loop, and continue jumps to the expression evaluation for the next iteration. This post-test evaluation distinguishes do-while from while, where the pre-test might skip the body entirely.[25]
The for statement combines initialization, condition testing, and iteration updating into a compact form, equivalent to the sequence init-clause; while (cond-expression) { loop-statement iteration-expression; }, except that a continue in the body still executes iteration-expression. Its syntax is for ( init-clause ; cond-expression ; iteration-expression ) statement, where the init-clause (an expression or, since C99, a declaration) runs once before the loop, cond-expression is checked before each iteration (omitting it is equivalent to a nonzero constant, creating an infinite loop), and iteration-expression executes after the body. Variables declared in init-clause are scoped to the entire for statement. The break statement terminates the for loop, and continue advances to the iteration-expression before re-testing the condition. For example:
c
int i;
for (i = 0; i < 5; ++i) {
    printf("%d\n", i);   // Outputs 0 to 4
}
This structure facilitates counted loops, such as array traversals.[25]
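The interaction of break and continue with a for loop, and the guaranteed first pass of do-while, can be sketched as follows. Both function names are hypothetical.

```c
#include <stddef.h>

/* Sums the odd values of a[0..n-1], stopping at the first negative value. */
int sum_odd_until_negative(const int *a, size_t n)
{
    int sum = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] < 0) {
            break;      /* leave the loop entirely */
        }
        if (a[i] % 2 == 0) {
            continue;   /* skip even values; ++i still runs */
        }
        sum += a[i];
    }
    return sum;
}

/* Counts decimal digits; do-while guarantees one pass, so 0 has one digit. */
int count_digits(unsigned n)
{
    int digits = 0;
    do {
        ++digits;
        n /= 10;
    } while (n != 0);
    return digits;
}
```

A plain while loop in count_digits would report zero digits for the input 0, which is why the post-test form fits here.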
Infinite loops occur when the controlling condition never becomes false, often intentionally for event-driven programs. Common idioms include for (;;) (omitting all clauses) and while (1), where 1 is a nonzero constant ensuring perpetual execution unless the loop is left by break, return, or goto. Since C11, an implementation may assume that a loop terminates only when its controlling expression is not a constant expression and its body performs no input/output, volatile accesses, or synchronization or atomic operations; loops with a constant controlling expression, such as these idioms, are exempt from that assumption.[25]
Jump Statements
Jump statements in the C programming language enable unconditional transfers of control, altering the sequential flow of execution to specific points within a function or to exit enclosing constructs. Defined in section 6.8.6 of the ISO/IEC 9899:1999 standard (C99), these statements include goto, continue, break, and return, each serving distinct purposes in program control while adhering to strict scoping and type constraints to ensure portability and reliability.[25]
The goto statement facilitates direct jumps to a labeled statement within the same function, providing a mechanism for unstructured control flow when structured alternatives are insufficient. Its syntax is goto identifier;, where identifier refers to a label declared as identifier: statement earlier or later in the function body.[25] Labels must be unique within their function and apply only to the immediately following statement, including compound statements.[25] Upon execution, control transfers immediately to the labeled statement, bypassing all code in between.[25] However, goto is restricted: it shall not jump from outside the scope of an identifier with a variably modified type (such as an array whose size is determined at runtime) into that scope.[25] Unlike C++, which prohibits jumps that bypass the initialization of an automatic variable, C permits jumps past declarations of automatic variables that are not variably modified, although an object whose initializer was skipped has an indeterminate value.[25]
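For example, the following code uses goto to skip the remaining work and jump to shared cleanup code:
#include <stdlib.h>

/* process is a hypothetical function wrapping the original fragment */
void process(int condition, void *resources) {
    if (condition) {
        goto cleanup;
    }
    // some code
cleanup:
    free(resources);
}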
The continue statement, applicable only within iteration statements like for, while, or do-while, transfers control to the end of the current iteration, effectively skipping the remaining body of that loop cycle and proceeding to the next.[25] Its syntax is simply continue;.[25] This allows selective omission of loop body parts without fully exiting the loop.[25]
The break statement terminates the innermost enclosing switch statement or iteration statement, transferring control to the code immediately following that construct.[25] With syntax break;, it is constrained to appear only within such contexts.[25] This provides early exit capabilities, enhancing control in repetitive or selective structures.[25]
Finally, the return statement ends execution of the current function and returns control to the caller.[25] Its syntax is return expression_opt ;, where the optional expression supplies the function's result; in a function whose return type is void, no expression is permitted.[25] The expression, if present, is converted to the function's return type as if by assignment, so a narrowing conversion may change the value.[25] Since C99, a return statement without an expression in a function that returns a value is a constraint violation; if control reaches the closing brace of such a function (other than main) and the caller uses the result, the behavior is undefined.[25]
An illustrative return usage in a function:
int compute(int x) {
    if (x < 0) {
        return -1;        // Early return with value
    }
    int result = x * 2;   // placeholder computation (illustrative only)
    return result;
}
Functions
Function Definition
In C, a function definition provides the complete implementation of a function, specifying its return type, identifier, parameters (if any), and executable body. The syntax follows the form declaration-specifiers declarator declaration-list_opt compound-statement, where the declaration specifiers include the return type (such as int or void), the declarator names the function and lists parameters, and the optional declaration list provides types for parameters in older K&R-style definitions (obsolescent since C89 and removed in C23).[9] For example, a simple function might be defined as int add(int a, int b) { return a + b; }, where int is the return type, add is the identifier, (int a, int b) declares the parameters, and the braces enclose the body.[8] If no parameters are needed, the parameter list is written (void) to indicate this explicitly, preventing ambiguity with older identifier-list forms; in C23 an empty list () has the same meaning.[9]
The parameter list consists of zero or more parameter declarations, each specifying a type and identifier, such as parameter-type-list in the form (type1 name1, type2 name2, ...). Parameters have automatic storage duration, block scope limited to the function, and are treated as lvalues upon entry to the function body; array or function parameter types are adjusted to pointers as per type compatibility rules. The function body is a compound statement { block-item-list_opt }, which may contain a sequence of declarations and statements executed sequentially when the function is called. Storage-class specifiers like static or extern are permitted for the function itself, but parameters cannot include most storage-class specifiers except register in legacy contexts.[8] Function definitions must appear at file scope and cannot be nested within other functions.[9]
Recursion is permitted in C, allowing a function to invoke itself directly or indirectly, provided sufficient stack space is available at runtime; there are no language-level restrictions on recursive calls. For instance, a recursive factorial function can be defined as unsigned long factorial(unsigned int n) { return n == 0 ? 1 : n * factorial(n - 1); }.[8] If a prior function declaration (prototype) exists, the definition's return type, parameter types, and variadic nature must match it exactly for compatibility.[9]
Prior to C99, variable declarations within a function body were required to appear at the beginning of each block before any statements. Starting with the C99 standard (ISO/IEC 9899:1999), declarations can be intermixed with statements anywhere within a block, improving flexibility while maintaining block scope and automatic storage duration for local variables unless otherwise specified. This change applies to all compound statements, including function bodies, and has been retained in subsequent standards like C11 and C23.[8]
Parameter Passing
In C, parameters are passed to functions by value, meaning that the function receives copies of the arguments provided by the caller, rather than references to the original data. This mechanism ensures that modifications to the parameters within the function do not affect the caller's variables, promoting function isolation and predictability. For instance, consider the function definition int square(int x) { return x * x; }; when called as int result = square(5);, the value 5 is copied into the parameter x, and any change to x inside the function would only impact the local copy.[29][1]
To achieve behavior similar to pass-by-reference, where the function can modify the caller's data, pointers are used by passing the address of a variable. The function declares a parameter as a pointer (e.g., void increment(int *p) { (*p)++; }), and the caller provides the address using the address-of operator (e.g., int val = 10; increment(&val);), allowing the function to dereference and alter the original value. This idiom is essential for tasks like swapping variables or updating multiple outputs from a single function call, as direct pass-by-reference is not supported in C.[29][1]
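The contrast between passing by value and passing an address can be sketched with two hypothetical functions: try_modify changes only its local copy, while swap_ints reaches the caller's objects through pointers.

```c
/* Receives a copy: the assignment is invisible to the caller. */
void try_modify(int x)
{
    x = 99;
    (void)x;   /* silences unused-value warnings; no other effect */
}

/* Receives addresses: dereferencing modifies the caller's variables. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```

A caller would write swap_ints(&first, &second); to exchange two int variables, whereas try_modify(first); leaves first unchanged.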
C also supports variadic functions, which accept a variable number of arguments beyond a fixed set of parameters, declared using an ellipsis (...) in the parameter list (e.g., int sum(int count, ...)). These additional arguments are accessed within the function using macros from the <stdarg.h> header, such as va_start to initialize a va_list, va_arg to retrieve each argument by type, and va_end to clean up. Standard library functions like printf exemplify this, enabling flexible formatting with an indeterminate number of inputs.[29][1]
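A minimal sketch of the macros in use follows. The function sum_ints is hypothetical; it assumes the caller passes exactly count additional arguments, all of type int, since a variadic function has no way to verify this itself.

```c
#include <stdarg.h>

/* Sums count int arguments; the caller must pass exactly count ints. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);             /* count is the last named parameter */
    for (int i = 0; i < count; ++i) {
        total += va_arg(ap, int);    /* fetch the next argument as an int */
    }
    va_end(ap);                      /* required before returning */
    return total;
}
```

Passing fewer arguments than count, or arguments of a different type, results in undefined behavior, which is why printf-style functions rely on a format string to describe what follows.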
Unlike some languages, C does not support default parameter values, named arguments, or function overloading; each function must have a unique name, and all arguments must be provided explicitly in the call. As a special case, arrays passed as arguments are treated as pointers to their first element, allowing efficient access without copying the entire array.[29][1]
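The array special case can be illustrated with a short sketch (sum_array and zero_first are hypothetical names). Because the parameter is really a pointer, the callee cannot recover the element count with sizeof and must receive it separately, and writes through the parameter reach the caller's array.

```c
#include <stddef.h>

/* `values` is adjusted to `const int *`, so the length travels
   in a separate parameter. */
long sum_array(const int values[], size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Writes through the pointer: the caller's first element is modified. */
void zero_first(int values[]) {
    values[0] = 0;
}
```

A caller that still has the true array in scope can compute the count as sizeof data / sizeof data[0] before passing it.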
Function Calls
In C, a function call is a postfix expression consisting of a function designator followed by parentheses enclosing an optional comma-separated list of argument expressions.[26] The function designator is typically an identifier naming the function or an expression yielding a pointer to a function, such as the dereferenced form (*ptr)(args) where ptr is a function pointer.[26] If the function has a prototype in scope, the number and types of arguments must correspond to the parameters, with arguments undergoing implicit conversions to match the parameter types; otherwise, default argument promotions apply.[26]
The order of evaluation of the function designator and the argument expressions is unspecified, but a sequence point occurs after all of them are evaluated and before the actual function execution begins, ensuring that all side effects from the evaluations are complete.[26] This means that if arguments have side effects, such as increments or assignments, the relative ordering of those effects is not defined; a program whose result depends on that ordering is not portable, and modifying the same object in two arguments, as in f(i++, i++), is undefined behavior.[26] For example, in a call like f(a++, b++), both increments complete before the function body starts, but whether a or b is incremented first is unspecified.[26]
Before C99, calling an undeclared function caused the compiler to declare it implicitly as a function returning int with unspecified parameters. C99 removed implicit function declarations from the language, so a conforming compiler must issue a diagnostic for such a call, although many compilers continue to accept it as an extension and only warn.[25] In C99 and later, a function should therefore be declared, preferably with a prototype, before it is called; this lets the compiler check the arguments and avoids undefined behavior from mismatched types.[25][26]
When the called function returns void, the function call expression itself has type void and produces no value, so it cannot be used in a context expecting a value, such as an assignment; such calls are typically used solely for their side effects.[26] For non-void functions, the call yields the returned value, converted as necessary to the expected type of the expression.[26]
c
#include <stdio.h>

void greet(int count) {
    printf("Hello, called %d times.\n", count);
}

int main(void) {
    int x = 5;
    greet(x++); // Passes 5; x becomes 6 before the body of greet runs
    return 0;
}
This example demonstrates a call to a function returning void: the argument expression x++ yields the original value of x (5), which is passed to the int parameter count without any conversion, and the side effect (the increment of x to 6) completes before the function body begins executing.[26]
Function Pointers
In C, a function pointer is a variable that stores the memory address of a function, enabling indirect invocation and dynamic function selection at runtime. This construct allows functions to be passed as arguments to other functions, returned from functions, or stored in data structures, facilitating flexible programming patterns such as callbacks and plugin systems. Unlike pointers to objects, function pointers adhere to specific declaration and usage syntax defined in the C standard.[37]
The declaration of a function pointer follows the form return_type (*pointer_name)(parameter_list);, where the parentheses around *pointer_name ensure the asterisk binds to the identifier; without them, int *fp(int); would instead declare a function named fp that returns a pointer to int. For instance, int (*fp)(int); declares fp as a pointer to a function that takes an int parameter and returns an int. This syntax is specified in the pointer declarator rules of the ISO C standard.[37] (ISO/IEC 9899:2023 draft, section 6.7.6.1)
Assignment to a function pointer involves setting it to the address of a compatible function, using the address-of operator & either explicitly or implicitly, as the standard performs an automatic conversion from a function designator to a function pointer. Thus, given int func(int x) { return x * 2; }, both fp = &func; and fp = func; assign the address of func to fp, provided the function types match. Compatibility requires compatible return types and the same number of parameters with compatible types, per the type compatibility rules in the standard.[37] (ISO/IEC 9899:2023 draft, section 6.3.2.1)
To invoke a function through a pointer, apply the function call operator () to the pointer, optionally dereferencing it with * first. The dereference is not required because the call operator takes a pointer to a function as its operand, and the function designator produced by *fp is itself converted straight back to a pointer. For example, int result = (*fp)(5); and int result = fp(5); both call the function pointed to by fp with argument 5, yielding the return value. This indirect call syntax supports a form of runtime dispatch in C without inheritance.[37] (ISO/IEC 9899:2023 draft, section 6.5.2.2)
Function pointers can also form arrays, declared as return_type (*array_name[size])(parameter_list);, creating an array of function pointers. An example is int (*operations[3])(int, int);, which declares an array of three pointers to functions taking two int arguments and returning an int, useful for dispatching among multiple handlers. Elements are assigned and accessed like any array of pointers, with indexing starting at 0.[37] (ISO/IEC 9899:2023 draft, section 6.7.6.2)
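A dispatch table of this kind might look like the following sketch. The functions add, sub, and mul and the wrapper apply are illustrative names; the range check and the choice to return 0 for an unknown operation are assumptions made for the example.

```c
static int add(int a, int b) { return a + b; }
static int sub(int a, int b) { return a - b; }
static int mul(int a, int b) { return a * b; }

/* Array of three pointers to functions taking (int, int) and returning int.
   Each function name converts implicitly to a pointer. */
static int (*const operations[3])(int, int) = { add, sub, mul };

/* Selects a handler by index and calls it through the pointer.
   Out-of-range indices return 0 (an arbitrary choice for this sketch). */
int apply(int op, int a, int b) {
    if (op < 0 || op >= 3) {
        return 0;
    }
    return operations[op](a, b);
}
```

Here apply(2, 6, 3) indexes the third entry and multiplies its arguments.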
Advanced Features
Arrays
In C, arrays provide a means to store a fixed number of elements of the same type in contiguous memory locations. The declaration syntax for an array specifies the element type, the array name, and the number of elements, using the form type name[size];, where size is an integer constant expression evaluated at compile time.[9] This form allocates space for exactly size elements, each of the specified type. For example, int arr[5]; declares an array of five integers. The size must be a positive integer constant expression, and the array type is incomplete until the size is specified.[9]
Starting with the C99 standard, variable-length arrays (VLAs) extend this syntax to allow the size to be determined at runtime, using type name[expression]; where expression is an integer expression evaluated during execution.[9] VLAs have automatic storage duration and are typically allocated on the stack; support for declaring them is optional in C11, C17, and C23, although C23 makes variably modified types (such as pointers to VLAs and VLA-typed parameters) mandatory.[38] For instance, int n = 10; int vla[n]; creates an array whose size depends on the value of n at runtime. Unlike fixed-size arrays, VLAs cannot be given a brace-enclosed list of initial values at declaration.[9]
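The following sketch shows both uses under the assumption that the compiler supports VLAs (the function names are illustrative). The first function declares a local VLA; the second takes a variably modified parameter, for which the dimension parameters must appear earlier in the list than the array that uses them.

```c
#include <stddef.h>

/* Local VLA: the length of `squares` is fixed when the declaration is reached. */
long sum_of_squares(int n) {
    if (n <= 0) {
        return 0;               /* a VLA must have a positive length */
    }
    int squares[n];
    for (int i = 0; i < n; i++) {
        squares[i] = (i + 1) * (i + 1);
    }
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += squares[i];
    }
    return total;
}

/* Variably modified parameter: `m` is a pointer to rows of `cols` ints. */
long sum_matrix(size_t rows, size_t cols, int m[rows][cols]) {
    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r][c];
        }
    }
    return total;
}
```

Because a VLA's size is unchecked, large or untrusted values of n can exhaust the stack; heap allocation is the usual alternative in that case.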
Elements of an array are accessed using subscript notation array-name[index], where index is an expression of any integer type.[29] The index starts from 0 for the first element. Accessing an index outside the bounds [0, size-1] results in undefined behavior, as C performs no bounds checking.[29] This notation is equivalent to *(array-name + index), illustrating the close relationship between arrays and pointers, where the array name decays to a pointer to its first element in most contexts.
Multidimensional arrays are declared by specifying multiple size dimensions in succession, such as type name[size1][size2]; for a two-dimensional array.[9] For example, int matrix[3][4]; declares an array of three arrays, each containing four integers, stored in row-major order in contiguous memory. Access uses nested subscripts like matrix[i][j], which is equivalent to *(*(matrix + i) + j): matrix + i points to row i, the inner dereference yields that row (which decays to a pointer to its first element), and adding j selects the element within it.[29] Higher dimensions follow similarly, with each additional pair of brackets defining another level.[9]
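The equivalence between nested subscripts and pointer arithmetic can be written out directly. In this sketch element_at is a hypothetical helper; its parameter type int (*)[4] is what an int [3][4] array decays to when passed to a function.

```c
/* Returns matrix[i][j], spelled out as pointer arithmetic.
   matrix + i   : pointer to row i (steps over 4 ints at a time)
   *(matrix + i): row i, which decays to a pointer to its first int
   ... + j      : pointer to element j of that row */
int element_at(int (*matrix)[4], int i, int j) {
    return *(*(matrix + i) + j);
}
```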
Arrays can be initialized at declaration using a brace-enclosed list of values, as in type name[] = {value1, value2, ...};, where the size may be omitted and inferred from the number of initializers. For partial initialization, such as int arr[5] = {1, 2};, the provided values fill the initial elements, and the remaining elements are implicitly zero-initialized.[9] This zero-filling applies to all remaining elements in the array, ensuring predictable default values without explicit assignment. Multidimensional arrays use nested braces for initialization, like int matrix[2][3] = {{1, 2}, {3}};, where partial lists zero-fill the unspecified elements.[9]
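The three forms described above are collected in the sketch below. The object names and the ARRAY_LEN macro are illustrative; the macro is a common idiom rather than a standard facility and works only on true arrays, not on pointers.

```c
/* Size inferred from the initializer list: four elements. */
static const int primes[] = {2, 3, 5, 7};

/* Explicit size with a partial list: elements 2 through 4 are zero. */
static const int padded[5] = {1, 2};

/* Nested braces: row 0 is {1, 2, 0} and row 1 is {3, 0, 0}. */
static const int matrix[2][3] = {{1, 2}, {3}};

/* Element count of an array object. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))
```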
Pointers
In the C programming language, a pointer is an object that stores the memory address of another value located in computer memory, allowing indirect access to that value.[24] Pointers facilitate dynamic memory management, array traversal, and passing data by reference, but they require careful handling to avoid undefined behavior such as dereferencing invalid addresses.[24]
Pointers are declared using the syntax type *identifier;, where type specifies the type of the object to which the pointer can point, and the asterisk * indicates a pointer declarator.[24] This declaration creates a pointer variable capable of holding the address of an object of the specified type, with optional type qualifiers like const or volatile to restrict modifications.[24] For example, int *p; declares a pointer p to an integer.[24] A generic pointer type, void *, can hold the address of any object type without specifying the pointed-to type, though it cannot be directly dereferenced and requires casting to a specific type for access.[24]
The address of an lvalue object or function can be obtained using the unary address-of operator &, which yields a pointer to the operand's type.[24] For instance, if int x = 5;, then &x produces an int * pointing to x.[24] A null pointer, which does not point to any valid object or function, is represented by a null pointer constant such as the integer constant expression 0 or (void *)0; the standard library header <stddef.h> defines the macro NULL as a null pointer constant suitable for any pointer type.[24] Any two null pointers compare equal, and a null pointer compares unequal to any pointer to an object or function.[24]
To access the value at the address stored in a pointer, the unary dereference operator * is applied, yielding an lvalue of the pointed-to type, provided the pointer is valid and points to an accessible object.[24] Dereferencing a null pointer or an invalid pointer results in undefined behavior.[24] For example, if p points to an int, then *p retrieves or modifies that integer.[24]
Pointer arithmetic operations, such as addition (+), subtraction (-), increment (++), and decrement (--), are defined only for pointers to qualified or unqualified versions of compatible complete object types, typically elements of the same array.[24] Adding an integer n to a pointer p yields a pointer n elements further along the array; in terms of addresses, the result lies n * sizeof(*p) bytes beyond p. The result must point within the array or one past its last element.[24] Subtracting two pointers to elements of the same array yields the difference in their positions as a ptrdiff_t value, representing the number of elements between them.[24] Operations outside an array or on pointers to different objects invoke undefined behavior.[24] Array expressions in most contexts decay to pointers to their first element, enabling uniform pointer-based access.[24]
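A linear search written with pointers exercises each of these rules. The sketch below uses a hypothetical function index_of: it forms a one-past-the-end pointer, steps through the elements with ++, and converts the match position back to an index by pointer subtraction.

```c
#include <stddef.h>

/* Returns the index of the first element equal to `key`, or -1 if absent. */
ptrdiff_t index_of(const int *first, size_t count, int key) {
    const int *end = first + count;      /* one past the last element: valid
                                            to form, not to dereference */
    for (const int *p = first; p != end; ++p) {
        if (*p == key) {
            return p - first;            /* difference measured in elements */
        }
    }
    return -1;
}
```

Viewed as byte addresses, consecutive int pointers differ by sizeof(int), which is the scaling that p + n applies implicitly.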
Equality operators (== and !=) can compare pointers to compatible types, a pointer with a pointer to void, or a pointer with a null pointer constant; two pointers compare equal if they point to the same object or the same function, or if both are null pointers.[24] Relational operators (<, <=, >, >=) are well-defined only when both pointers point into the same array object (including one past its last element) or to members of the same structure or union object; relational comparison of pointers to unrelated objects is undefined behavior.[24]
Introduced in C11, the _Alignof operator returns the alignment requirement of a type as an integer constant of type size_t; every valid alignment is a nonnegative integral power of two.[24] The syntax is _Alignof (type-name), applicable to complete object types but not to function types or incomplete types like void.[24] For example, _Alignof (int) yields the alignment, in bytes, required of every object of type int.[24] The header <stdalign.h> provides the spelling alignof, which C23 adopts as a keyword. This operator aids in ensuring proper memory layout for pointers and data structures.[24]
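As a brief sketch, the operator can be applied to a structure type to observe how the strictest member determines the alignment of the whole. The type struct sample and both helper functions are illustrative; the actual values are implementation-specific.

```c
#include <stddef.h>

struct sample {
    char tag;
    double value;
};

/* Alignment required for objects of type struct sample. */
size_t sample_alignment(void) {
    return _Alignof(struct sample);
}

/* Returns nonzero if `a` is a power of two, as every valid alignment is. */
int is_power_of_two(size_t a) {
    return a != 0 && (a & (a - 1)) == 0;
}
```

On typical implementations the structure's alignment equals that of double, and its size is rounded up to a multiple of that alignment so that arrays of the structure keep every element aligned.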
Structures and Unions
In C, structures (struct) are user-defined aggregate types that group variables, known as members, of potentially different types into a single composite object, allowing for the representation of complex data entities such as points in space or employee records. The syntax for declaring a structure type involves the struct keyword, an optional identifier (tag) for the structure name, and a block of member declarations enclosed in curly braces, terminated by a semicolon. For example:
c
struct point {
    int x;
    int y;
};
This declares a structure type named point with two integer members. Members must be of complete object types, excluding functions or incomplete arrays, and no two members in the same structure can share the same name. The size of a structure is at least the sum of its members' sizes, with possible padding bytes inserted between members for alignment purposes, which is implementation-defined.[18]
Access to structure members occurs via the dot operator (.) for lvalue expressions designating the structure object itself, or the arrow operator (->) when accessing through a pointer to the structure, equivalent to (*pointer).member. For instance, if struct point p = {1, 2}; and struct point *ptr = &p;, then p.x or ptr->x retrieves the value of the x member. Structures differ from arrays in that they support heterogeneous member types with named access, whereas arrays hold homogeneous elements accessed by index. Each member has its own distinct address within the structure, stored contiguously in memory order as declared.[18]
Unions (union) extend the structure concept by allowing multiple members to occupy the same memory location, with the union's size determined by the largest member's size plus any necessary padding, ensuring all members start at the same offset. The declaration syntax mirrors that of structures but uses the union keyword:
c
union data {
    int i;
    float f;
};
In a union, only one member holds a meaningful value at any time; storing to one member and then reading another reinterprets the stored bytes according to the type of the member read, which may produce an unexpected value or, for some types, a trap representation. Unlike structures, where all members maintain separate storage and can be used simultaneously, unions enable memory-efficient variant types, typically paired with a tag that records which member is current, such as the alternative node kinds in a parser. Access to union members follows the same . and -> operators as structures.[18]
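A tagged (discriminated) union can be sketched as a structure that pairs the union with an enumeration naming the current member. All identifiers below are illustrative; the point is that readers consult the tag and touch only the member it names.

```c
enum kind { KIND_INT, KIND_FLOAT };

struct value {
    enum kind kind;          /* records which union member is current */
    union {
        int i;
        float f;
    } as;
};

/* Reads only the member that the tag says was stored last. */
double value_as_double(const struct value *v) {
    switch (v->kind) {
    case KIND_INT:
        return v->as.i;
    case KIND_FLOAT:
        return v->as.f;
    }
    return 0.0;              /* unreachable for valid tags */
}
```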
Bit-fields provide a mechanism within structures or unions to allocate a specified number of bits for integer members, promoting compact storage for flags or small integers. A bit-field declarator appends a colon followed by a non-negative constant expression indicating the width (in bits), applicable only to types like _Bool, signed int, unsigned int, or other implementation-defined integer types, with the width not exceeding the type's bit precision. For example:
c
struct flags {
    unsigned int valid:1;
    unsigned int error:3;
};
Adjacent bit-fields of the same type may pack into a single storage unit without padding between them, though the exact allocation (endianness or padding) is implementation-defined; a zero-width bit-field can force alignment separation. Bit-fields cannot have their addresses taken with the & operator, as they may not correspond to complete addressable units, but they are accessed like regular members using . or ->.[18]
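The sketch below reuses the struct flags layout from the example and adds a hypothetical constructor, make_flags. It relies on the rule that a value assigned to an unsigned bit-field is reduced modulo 2 raised to the field width, so a value too large for the field wraps rather than overflowing into its neighbor.

```c
struct flags {
    unsigned int valid : 1;   /* holds 0 or 1 */
    unsigned int error : 3;   /* holds 0 through 7 */
};

/* Builds a flags value; out-of-range inputs are reduced modulo
   2^width by the assignment to each unsigned bit-field. */
struct flags make_flags(unsigned int valid, unsigned int error) {
    struct flags f = {0};
    f.valid = valid;
    f.error = error;
    return f;
}
```

For example, storing 9 (binary 1001) in the 3-bit error field keeps only the low three bits, giving 1.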
Introduced in C11, anonymous structures and unions allow nesting within a parent structure or union without a tag or name, promoting cleaner code by directly incorporating members into the parent's namespace. The syntax omits the identifier:
c
struct container {
    int id;
    struct {
        char name[20];
    }; // Anonymous structure
};
Here, the anonymous structure's members, like name, are accessed directly as members of the enclosing structure: for an object c of type struct container, the expression c.name designates the array. The rule applies recursively when anonymous structures and unions are nested inside one another. The feature is not part of earlier standards such as C99, although some compilers offered it as an extension. Initialization of structures and unions as aggregates follows designated initializer rules, but detailed syntax is covered elsewhere.[18]
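Member access through an anonymous structure can be sketched as follows, assuming a C11 or later compiler. The helper set_name is hypothetical; it copies at most 19 characters so that the 20-byte array always remains null-terminated.

```c
#include <string.h>

struct container {
    int id;
    struct {
        char name[20];
    };                        /* anonymous: `name` joins the parent's members */
};

/* Copies `name` into the container, truncating to fit and
   always leaving a terminating null character. */
void set_name(struct container *c, const char *name) {
    strncpy(c->name, name, sizeof c->name - 1);
    c->name[sizeof c->name - 1] = '\0';
}
```

Note that c->name is written without naming any intermediate member, exactly as if name had been declared directly in struct container.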
Initialization
In C, initialization provides initial values to objects at the point of declaration, using an initializer clause following the declarator. For scalar types such as integers or pointers, the syntax is straightforward: the variable is declared and given a value using the equals sign, as in int x = 42;. This form is supported across all C standards, including C89. Objects with static storage duration require a constant expression as initializer and are initialized before program startup, whereas objects with automatic storage duration may use any expression, which is evaluated each time the declaration is reached during execution.[9]
For aggregate types like arrays or structures, initializers use brace-enclosed lists of values. If the list supplies fewer values than the aggregate has elements or members, the remainder are initialized as if they had static storage duration, that is, to zero (or null for pointers), regardless of the storage duration of the aggregate itself. For example, int arr[3] = {1, 2}; initializes the first two elements to 1 and 2, respectively, and the third to 0. An initializer of {0} therefore zero-initializes an entire aggregate under every C standard, which is particularly useful for ensuring predictable default states; the empty brace list {} has the same effect but is standard only since C23, having previously been a common compiler extension. This brace notation predates standardization and was formalized in ISO/IEC 9899:1990.[9]
Introduced in C99 (ISO/IEC 9899:1999), designated initializers allow explicit assignment to specific elements or members by index or name, enabling sparse or out-of-order initialization; elements that are not mentioned default to zero. The syntax uses square brackets for array indices or dot notation for structure members within the brace list, such as int a[5] = { [2] = 10, [0] = 1 };, which sets a[0] to 1, a[2] to 10, and all other elements to 0. This feature enhances readability and flexibility, especially for large aggregates, and is backward-compatible with prior standards when no designators are used.[9]
Also from C99, compound literals create unnamed objects from a parenthesized type name followed by a brace-enclosed initializer list, like (int[]){1, 2, 3}. The expression is an lvalue designating the unnamed object; for an array type it decays, like any array, to a pointer to the first element. The object has automatic storage duration if the literal appears within a block and static storage duration if it appears at file scope. Compound literals support designated initializers and are useful for passing initialized arrays or structures to functions without declaring named variables.[9]
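The sketch below defines two hypothetical functions, total and manhattan, whose arguments are convenient to supply as compound literals: an array literal decays to a pointer at the call site, and a structure literal is passed by value.

```c
#include <stddef.h>

struct point { int x, y; };

/* Sums `count` ints starting at `values`. */
long total(const int *values, size_t count) {
    long sum = 0;
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum;
}

/* Distance from the origin along the axes: |x| + |y|. */
int manhattan(struct point p) {
    int ax = p.x < 0 ? -p.x : p.x;
    int ay = p.y < 0 ? -p.y : p.y;
    return ax + ay;
}
```

A caller might then write total((int[]){1, 2, 3}, 3) or manhattan((struct point){ .x = -3, .y = 4 }) without introducing any named temporaries.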
For character arrays, string literal initialization automatically appends a null terminator, as in char s[] = "hello";, which allocates space for six characters (including '\0') and copies the string contents. This implicit sizing and null termination simplifies string handling and is defined in the core grammar for initializers. Designated initializers and compound literals can also apply to character arrays for partial or dynamic string setups.[9]
Structures support aggregate initialization with brace lists matching member order, and designated initializers permit naming specific members, such as struct point { int x, y; } p = { .y = 5, .x = 3 };.[9]
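Designated initializers for arrays and structures are shown together in the following sketch; the object names are illustrative. Any element or member without a designator or positional value is zero-initialized.

```c
struct point { int x, y; };

/* Out-of-order array designators: {1, 0, 10, 0, 0}. */
static const int sparse[5] = { [2] = 10, [0] = 1 };

/* Members may be named in any order. */
static const struct point corner = { .y = 5, .x = 3 };

/* Omitted members are zero: y is 0 here. */
static const struct point on_x_axis = { .x = 7 };
```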
Attributes
In C, attributes provide hints to the compiler about the properties of declarations, enabling optimizations, warnings, and better code generation without altering the core language semantics. These are primarily extensions in GNU C, but some have been standardized in later revisions of the ISO C standard. Attributes can be applied to functions, variables, types, and other entities to convey information such as non-returning behavior or deprecation status.
The GNU extension uses the __attribute__ keyword, usually placed immediately after the declaration it modifies. The syntax is declarator __attribute__((attribute-list)), where attribute-list is a comma-separated sequence of attributes enclosed in double parentheses. For example, to mark a function as non-returning: void exit(int status) __attribute__((noreturn));. This placement binds the attribute to the specific declarator, and it can also appear before the declarator in some contexts for clarity. An alternative GNU syntax, compatible with C23, uses [[gnu::attribute]] to specify vendor-specific attributes.
Common GNU attributes include aligned(n), which specifies a minimum alignment in bytes (a power of two) for the entity's storage, useful for performance on architectures with specific alignment requirements; for instance, int x __attribute__((aligned(16))); ensures 16-byte alignment. The packed attribute, typically used on structures or unions, instructs the compiler to eliminate padding bytes between members for tight packing, as in struct s { char a; int b; } __attribute__((packed));. The unused attribute suppresses warnings for entities that might not be referenced, applied like void f(void) __attribute__((unused));. Additionally, deprecated marks an entity as discouraged for future use, triggering a diagnostic upon usage, optionally with a message: void old_func(void) __attribute__((deprecated("Use new_func instead")));. These are all GNU extensions, though some like deprecated align with emerging standards.
The C11 standard introduced the _Noreturn function specifier (a keyword) to indicate that a function does not return control to its caller, allowing the compiler to optimize away unreachable code following calls to such functions; it is used as _Noreturn void abort(void);, and the header <stdnoreturn.h> provides the convenience macro noreturn, which expands to _Noreturn. In C23, _Noreturn is deprecated in favor of the standard attribute [[noreturn]], which applies similarly to function declarations and specifies non-returning behavior without using a keyword. C23 also standardizes [[deprecated("reason")]] for marking entities as obsolete, providing a portable way to warn users. These standard attributes use the [[attr]] syntax and are typically placed at the start of a declaration.
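A minimal sketch of a non-returning function follows, written with the C11 keyword so that it compiles under both C11 and C23 (where the keyword is deprecated but still accepted). The names fatal and checked_divide are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Never returns: reports the message and terminates the program. */
_Noreturn void fatal(const char *message) {
    fprintf(stderr, "fatal: %s\n", message);
    exit(EXIT_FAILURE);
}

/* Because fatal() is known not to return, the compiler can see that
   the division is reached only when den is nonzero. */
int checked_divide(int num, int den) {
    if (den == 0) {
        fatal("division by zero");
    }
    return num / den;
}
```

Under C23 the declaration could instead begin with [[noreturn]], leaving the rest of the code unchanged.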
While GNU attributes enhance portability across GCC-compatible environments, their use is limited to compilers supporting the extensions, primarily GCC and Clang, which implement most GNU attributes for compatibility. Standard attributes like [[noreturn]] and [[deprecated]] offer better portability across conforming C23 implementations.
Dynamic Memory Allocation
Dynamic memory allocation in C allows programs to request memory from the heap at runtime, providing flexibility for data structures whose size is not known at compile time. This is achieved through functions declared in the <stdlib.h> header, which manage allocation and deallocation of memory blocks. These functions return pointers to the allocated memory, which must be of compatible type when used.[26]
The malloc function allocates a block of size bytes of uninitialized storage and returns a pointer to the beginning of the block, or a null pointer if the request cannot be satisfied. Its declaration is void *malloc(size_t size);. The allocated memory is suitably aligned for any kind of object, and the pointer returned can be converted to any pointer type. Allocation of zero bytes may result in a null pointer or a unique pointer to a zero-sized object. Programmers typically use malloc with sizeof to allocate space for a specific type, such as int *ptr = malloc(sizeof(int) * n);, and should check if the returned pointer is null to handle allocation failure.[26]
c
#include <stdlib.h>

int main() {
    int n = 10;
    int *arr = malloc(n * sizeof(int));
    if (arr == NULL) {
        // Handle allocation failure
        return 1;
    }
    // Use arr[0] to arr[n-1] as an array
    free(arr);
    return 0;
}
The calloc function allocates space for an array of nmemb elements, each of size bytes, and initializes all bits to zero before returning a pointer to the allocated memory, or null on failure. Its declaration is void *calloc(size_t nmemb, size_t size);. This differs from malloc by providing zero-initialized memory, making it suitable for arrays that require default initialization. If nmemb or size is zero, the result is implementation-defined in the same way as for a zero-size malloc request.[26]
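As a short sketch, a zeroed array of counters can be obtained with a single call; make_counters is an illustrative name, and the caller remains responsible for checking the result and eventually passing it to free.

```c
#include <stdlib.h>

/* Returns `count` ints, all initially zero, or NULL on failure.
   The caller owns the block and must release it with free(). */
int *make_counters(size_t count) {
    return calloc(count, sizeof(int));
}
```

All-bits-zero is the value 0 for every integer type, so the counters can be incremented immediately without an explicit initialization loop.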
The realloc function changes the size of the memory block pointed to by ptr to size bytes. If ptr is null, it behaves like malloc(size). If size is zero and ptr is not null, the behavior has varied between standards: C89 specified that the block is freed, C99 through C17 left the result implementation-defined, and C23 makes the call undefined behavior, so portable code should call free(ptr) instead. Otherwise, it returns a pointer to a block of the new size whose contents match those of the old block up to the lesser of the old and new sizes; the block may have moved, in which case the old block is deallocated. On failure it returns null and the original block remains valid. Its declaration is void *realloc(void *ptr, size_t size);. As with other allocation functions, the returned pointer must be checked for null; the result should be stored in a temporary first so that the original pointer is not lost if the call fails, and after a successful call the old pointer value must no longer be used.[26]
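The temporary-pointer idiom can be sketched as a small helper. The name grow_ints and its return convention (0 on success, -1 on failure) are assumptions of this example; it rejects a zero count and guards against overflow in the size computation before calling realloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resizes *buf to hold `new_count` ints.
   Returns 0 on success; on failure returns -1 and leaves *buf valid. */
int grow_ints(int **buf, size_t new_count) {
    if (new_count == 0 || new_count > SIZE_MAX / sizeof **buf) {
        return -1;                       /* avoid zero-size and overflow */
    }
    int *tmp = realloc(*buf, new_count * sizeof **buf);
    if (tmp == NULL) {
        return -1;                       /* original block is untouched */
    }
    *buf = tmp;                          /* commit only after success */
    return 0;
}
```

Writing buf = realloc(buf, n) directly would overwrite the only pointer to the block with null on failure, leaking the memory.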
The free function causes the space pointed to by ptr to be deallocated, making it available for future allocations; if ptr is null, no action occurs. Its declaration is void free(void *ptr);. Only pointers returned by malloc, calloc, or realloc (or null) should be passed to free. Memory allocated dynamically can be treated as arrays when the pointer is assigned via malloc or calloc with a multiple of sizeof(type), allowing access via array subscript notation, such as arr[i], provided i is within bounds.[26]
Certain operations on dynamically allocated memory lead to undefined behavior, including using a pointer to freed memory (use after free), deallocating the same memory block more than once (double free), passing to free or realloc a pointer not obtained from malloc, calloc, or realloc, or accessing memory beyond the allocated bounds. These behaviors are not specified by the standard and may result in program crashes, data corruption, or other unpredictable outcomes.[26]