getaddrinfo
getaddrinfo is a function in the POSIX networking API for C that translates a hostname (or numeric address) and a service name (or port number) into one or more socket addresses that can be used to create network connections or bind to local interfaces.[1] It returns a linked list of addrinfo structures, each containing protocol-specific address information, socket type details, and flags, enabling protocol-independent programming across IPv4 and IPv6.[2] Defined in the POSIX.1g standard and detailed in RFC 3493, the function supports hints for specifying address families (e.g., AF_INET for IPv4 or AF_INET6 for IPv6), socket types (e.g., SOCK_STREAM for TCP), and other preferences like passive mode for servers.[1][3]
Introduced as part of efforts to extend the Berkeley sockets API for IPv6 support, getaddrinfo addresses limitations in earlier functions like gethostbyname, which were IPv4-centric and lacked flexibility for dual-stack environments.[3] It facilitates the IPv6 transition by allowing applications to request IPv4-mapped IPv6 addresses via the AI_V4MAPPED flag, ensuring compatibility without separate code paths for each protocol family.[2] The function's output must be freed using freeaddrinfo to manage memory, and errors are reported through numeric codes like EAI_NONAME for unresolved hosts.[1] Alongside its inverse, getnameinfo, getaddrinfo forms a core component of modern network programming interfaces, promoting portability across Unix-like systems and Windows via Winsock.[3]
Overview
Purpose and Functionality
getaddrinfo is a POSIX-standard function defined in the <netdb.h> header, designed to translate hostnames, service names, or numeric addresses into a set of socket address structures suitable for network programming.[4] This function enables applications to obtain address information in a protocol-independent manner, supporting both symbolic names (like domain names) and numeric inputs (such as IP addresses).[2]
Key benefits of getaddrinfo include its simultaneous support for IPv4 and IPv6 address families, allowing developers to write code that adapts to dual-stack environments without modification.[5] It also provides protocol-independent resolution, handling various socket types and protocols while distinguishing between numeric and symbolic inputs to avoid unnecessary DNS lookups.[4] Additionally, unlike earlier functions, getaddrinfo is thread-safe, ensuring reliable operation in multithreaded applications without reliance on global state.[4]
At a high level, the workflow involves providing input hints—such as address family, socket type, protocol, and resolution flags—along with the target hostname or service, resulting in a linked list of struct addrinfo entries that encapsulate the resolved addresses and associated metadata (detailed further in the Data Structures section).[2] This output list allows applications to iterate over multiple possible addresses for a given name, facilitating flexible connection establishment.
In comparison to legacy functions like gethostbyname() and getservbyname(), which resolve IPv4 host names and service names respectively through separate calls, getaddrinfo combines their capabilities into a unified, extensible interface.[4] gethostbyname() in particular is deprecated in modern standards due to its lack of IPv6 support, potential thread unsafety, and dependence on shared global data structures, making getaddrinfo the preferred choice for contemporary network applications.[6][5]
History and Standards
The getaddrinfo function was introduced in RFC 2553, published in March 1999, as part of the Basic Socket Interface Extensions for IPv6, aiming to provide a protocol-independent API for name-to-address resolution that supports both IPv4 and IPv6 while addressing the shortcomings of earlier functions like gethostbyname. This earlier API was limited to IPv4 addresses and relied on non-reentrant static buffers, making it unsuitable for multithreaded applications and IPv6 environments. The function's design emphasized thread safety through linked lists of addrinfo structures and flexibility in specifying address families, socket types, and protocols.
Subsequent updates refined its behavior: RFC 3493 in February 2003 obsoleted RFC 2553, incorporating clarifications and extensions for broader IPv6 support, including better handling of socket options and error reporting.[3] Later, RFC 6724 in September 2012 updated default address selection rules, influencing how getaddrinfo prioritizes IPv6 and IPv4 addresses in dual-stack implementations to optimize connectivity.[7] More recently, as of August 2025, an IETF draft (draft-ietf-6man-rfc6724-update) proposes updates to the default address selection rules from RFC 6724, potentially affecting getaddrinfo behavior in future implementations.[8] These evolutions ensured compatibility with emerging network standards while maintaining backward compatibility where possible.
Adoption into formal standards came with POSIX.1-2001, which incorporated getaddrinfo into the base specifications for portable operating systems, standardizing its interface across Unix-like systems. Support for Internationalized Domain Names (IDN) is not part of the POSIX interface; the GNU C Library offers it as an extension through flags such as AI_IDN and AI_CANONIDN, which convert non-ASCII hostnames during resolution.
Platform implementations vary in availability and feature support. In BSD systems, getaddrinfo became available in the late 1990s, for example, integrated into FreeBSD 2.2.5-RELEASE (1998) as part of the WIDE Hydrangea IPv6 protocol stack, aligning with early IPv6 experimentation.[9] On Windows, it is exposed through Winsock 2 (Ws2_32.dll); Microsoft documents native availability from Windows XP onward, with compatibility headers covering older releases. Linux and Android implementations, based on glibc and Bionic respectively, generally conform to POSIX but differ in flag support; for instance, Android's Bionic libc has historically returned EAI_BADFLAGS for certain non-standard flags like AI_IDN until recent versions.[10]
By the early 2000s, gethostbyname and related legacy functions were marked as obsolete in POSIX standards, with getaddrinfo positioned as the recommended replacement for new applications to ensure IPv6 readiness and thread safety. This deprecation timeline reflects broader shifts toward protocol-agnostic networking APIs in modern standards.
Data Structures
struct addrinfo
The addrinfo structure, defined in the <netdb.h> header, serves as the primary data type for representing network addresses and related metadata in a protocol-independent manner, facilitating the resolution of hostnames and service names into usable socket addresses.[3] It encapsulates essential details such as address family, socket type, protocol, and the actual binary address, allowing applications to create sockets without direct dependence on specific address formats like IPv4 or IPv6. This structure is integral to modern network programming APIs, enabling flexible handling of multiple address alternatives for a given hostname.[3]
The structure consists of the following fields:
ai_flags: An integer bitmask of flags, such as AI_PASSIVE for passive socket binding, AI_CANONNAME to request the canonical hostname, or AI_NUMERICHOST to enforce numeric-only input. The flags are meaningful as input in the hints structure; portable code should not rely on the value of ai_flags in returned entries.[3]
ai_family: An integer specifying the address family, such as AF_INET for IPv4, AF_INET6 for IPv6, or AF_UNSPEC to indicate any supported family.[3]
ai_socktype: An integer denoting the socket type, for example SOCK_STREAM for TCP-like connections or SOCK_DGRAM for UDP-like datagrams; a value of 0 indicates that any socket type is acceptable.[3]
ai_protocol: An integer representing the protocol, such as IPPROTO_TCP or IPPROTO_UDP; a value of 0 allows any protocol compatible with the socket type.[3]
ai_addrlen: A socklen_t value indicating the length in bytes of the socket address stored in ai_addr, ensuring safe handling during socket operations like bind() or connect().[3]
ai_canonname: A pointer to a null-terminated character string containing the canonical (preferred) hostname for the resolved address, which is particularly useful for applications needing the official name rather than aliases.[3]
ai_addr: A pointer to a sockaddr structure (or a derived type like sockaddr_in or sockaddr_in6) holding the binary representation of the resolved network address.[3]
ai_next: A pointer to the next addrinfo structure in a linked list, or NULL if this is the final element.[3]
In practice, getaddrinfo() populates a linked list of these structures to provide multiple resolution options, such as both IPv4 (A records) and IPv6 (AAAA records) addresses for a hostname, or various socket types if unspecified.[3] The list allows applications to iterate through alternatives, selecting the most suitable based on preferences like address family or protocol, with the chain terminated by a NULL ai_next pointer. For instance, ai_canonname offers the preferred hostname for logging or display, while ai_addr and ai_addrlen directly support socket creation and binding, ensuring compatibility across network configurations.[3]
Memory for the addrinfo linked list, including dynamically allocated fields like ai_canonname and ai_addr, is allocated by getaddrinfo() and must be explicitly deallocated using freeaddrinfo() to prevent leaks, as the function returns a pointer to the head of the list for sequential freeing.[3] This management approach accommodates variable-length data and multiple entries without requiring manual allocation by the caller.[3]
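The traversal and cleanup described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names entry_to_text and list_entries are hypothetical, and the sketch assumes numeric input (AI_NUMERICHOST) so that no DNS lookup is involved.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: format the address held in one addrinfo entry.
 * Returns buf on success, or NULL for an unsupported family or short buffer. */
static const char *entry_to_text(const struct addrinfo *ai, char *buf, socklen_t len)
{
    if (ai->ai_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)ai->ai_addr;
        return inet_ntop(AF_INET, &sin->sin_addr, buf, len);
    }
    if (ai->ai_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ai->ai_addr;
        return inet_ntop(AF_INET6, &sin6->sin6_addr, buf, len);
    }
    return NULL;
}

/* Hypothetical helper: resolve a numeric address, print every entry, and
 * copy the text of the first entry into out. Returns the number of entries
 * printed, or -1 if getaddrinfo() fails. */
static int list_entries(const char *numeric_host, char *out, socklen_t outlen)
{
    struct addrinfo hints, *res = NULL;
    int count = 0;

    if (outlen > 0)
        out[0] = '\0';

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* numeric input only, no DNS lookup */

    if (getaddrinfo(numeric_host, NULL, &hints, &res) != 0)
        return -1;

    /* Follow ai_next until the NULL terminator. */
    for (const struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        char text[INET6_ADDRSTRLEN];
        if (entry_to_text(p, text, sizeof text) == NULL)
            continue;
        printf("family=%d addrlen=%u address=%s\n",
               p->ai_family, (unsigned)p->ai_addrlen, text);
        if (count == 0)
            snprintf(out, outlen, "%s", text);
        count++;
    }

    freeaddrinfo(res);   /* releases every entry, including ai_addr buffers */
    return count;
}
```

A caller might pass a buffer of INET6_ADDRSTRLEN bytes, which is large enough for either address family.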
ai_flags and Hints Structure
The getaddrinfo() function accepts an optional hints parameter, which is a pointer to a struct addrinfo structure used to constrain the set of returned socket addresses. This hints structure allows the caller to specify preferences for address family, socket type, protocol, and resolution behavior, thereby tailoring the resolution process to specific needs such as client connections or server bindings. Only the fields ai_flags, ai_family, ai_socktype, and ai_protocol in the hints structure are examined; all other members, including ai_addr and ai_next, must be zero or null pointers and are otherwise ignored by the function. If the hints pointer is null, getaddrinfo() defaults to an unspecified address family (AF_UNSPEC), zero socket type and protocol, and zero flags, resulting in the broadest possible set of addresses.[1]
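As a hedged sketch of how hints constrain the result set, the hypothetical helper results_honor_hints below zero-initializes the structure, sets only the examined fields, and checks that every returned entry matches the requested family and socket type. The port string "443" and the use of numeric-only flags are assumptions made to keep the example independent of DNS.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if every entry returned for numeric_host
 * matches the requested family and socket type, 0 if any entry does not,
 * and -1 if getaddrinfo() fails. */
static int results_honor_hints(const char *numeric_host, int family, int socktype)
{
    struct addrinfo hints, *res = NULL;
    int ok = 1;

    memset(&hints, 0, sizeof hints);   /* unused members must be zero or NULL */
    hints.ai_family = family;
    hints.ai_socktype = socktype;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    if (getaddrinfo(numeric_host, "443", &hints, &res) != 0)
        return -1;

    for (const struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        if (p->ai_family != family || p->ai_socktype != socktype)
            ok = 0;
    }

    freeaddrinfo(res);
    return ok;
}
```

Requesting AF_INET6 for an IPv4 literal without AI_V4MAPPED is expected to fail, which the helper reports as -1.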
The ai_flags field within the hints structure is a bitwise-inclusive OR of zero or more constants that modify the resolution semantics. These flags control aspects such as whether to return wildcard addresses for binding, request canonical names, enforce numeric interpretations, or handle IPv6 preferences for IPv4 compatibility. The standard flags defined in POSIX and RFC 3493 include:
- AI_PASSIVE: Indicates that the returned socket addresses should be suitable for use with bind() to accept incoming connections. If the nodename is null and this flag is set, the function sets the IPv4 address to INADDR_ANY (0.0.0.0) or the IPv6 address to in6addr_any (::), representing a wildcard that listens on all interfaces. Without this flag, a loopback address (127.0.0.1 for IPv4 or ::1 for IPv6) is used instead when nodename is null.[1][2]
- AI_CANONNAME: Requests that the function attempt to return the canonical name corresponding to the nodename in the first element of the linked list of results, if the nodename is non-null. This provides the official, fully qualified domain name for the host, useful for logging or display purposes.[1][2]
- AI_NUMERICHOST: Specifies that the nodename must be a numeric string representing a network host address (e.g., "192.0.2.1" or "::1"), and no name resolution is performed; if the string does not parse as numeric, an error is returned. This flag prevents unintended DNS lookups for strings that might resemble hostnames.[1][2]
- AI_NUMERICSERV: Requires the servname to be a numeric port number string (e.g., "80"), bypassing service name resolution via /etc/services or equivalent; an error occurs if it cannot be parsed as numeric. This is particularly useful in environments where service databases are unavailable or unreliable.[1][2]
IPv6-specific flags extend the functionality for dual-stack environments:
- AI_V4MAPPED: When the address family is AF_INET6, this flag allows the function to return IPv4-mapped IPv6 addresses (e.g., ::ffff:192.0.2.1) if no native IPv6 addresses are available for the nodename, enabling IPv6 sockets to handle IPv4 traffic transparently. Such addresses are stored in a sockaddr_in6 structure, so ai_addrlen is sizeof(struct sockaddr_in6); the embedded address itself occupies 16 bytes.[2]
- AI_ALL: Used in conjunction with AI_V4MAPPED, this ensures that both native IPv6 addresses and IPv4-mapped addresses are returned, even if native IPv6 addresses exist; it prompts queries for both AAAA (IPv6) and A (IPv4) records in DNS. Without AI_ALL, IPv4-mapped addresses are returned only when no native IPv6 addresses are found.[2]
- AI_ADDRCONFIG: Restricts returned addresses based on the local system's configured interfaces: IPv4 addresses are included only if at least one IPv4 address is configured (excluding loopback), and IPv6 addresses only if an IPv6 address is configured. This avoids generating addresses for unsupported protocol stacks.[2]
These flags directly influence the output linked list by filtering or modifying the socket address structures returned. For instance, AI_PASSIVE ensures server-ready addresses with wildcard values, while AI_ADDRCONFIG may eliminate entire address families if not locally supported, promoting efficient resource use in heterogeneous networks. Platform-specific extensions, such as additional flags in GNU/Linux implementations, may provide further customization but are not part of the core POSIX or RFC standards.[1][2]
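The effect of AI_PASSIVE on a null nodename can be observed with a small sketch. The helper name null_node_ipv4 and the port "8080" are hypothetical choices for illustration; the sketch restricts itself to IPv4 so the result can be compared against INADDR_ANY and INADDR_LOOPBACK.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve a NULL node for IPv4 with the given flags and
 * store the resulting address, in host byte order, in *addr_out.
 * Returns 0 on success or the getaddrinfo() error code. */
static int null_node_ipv4(int flags, uint32_t *addr_out)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = flags | AI_NUMERICSERV;   /* port given as digits */

    rc = getaddrinfo(NULL, "8080", &hints, &res);
    if (rc != 0)
        return rc;

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    *addr_out = ntohl(sin->sin_addr.s_addr);

    freeaddrinfo(res);
    return 0;
}
```

Calling the helper with AI_PASSIVE should yield the wildcard address suitable for bind(), while calling it with no flags should yield the loopback address suitable for connect().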
Core Functions
getaddrinfo()
The getaddrinfo() function provides a protocol-independent way to resolve hostnames or service names into socket addresses, supporting both IPv4 and IPv6. Its prototype is defined as:
c
int getaddrinfo(const char *node, const char *service, const struct addrinfo *hints, struct addrinfo **res);
This function is specified in RFC 3493 and adopted in the POSIX standard.[3][11]
The node parameter is a pointer to a null-terminated string representing a hostname or a numeric IP address string (IPv4 dotted-decimal or IPv6 hexadecimal), or NULL to indicate any local address. The service parameter is a pointer to a null-terminated string specifying a service name (e.g., "http") or a decimal port number, or NULL to indicate port zero. The hints parameter points to a struct addrinfo (detailed in the Data Structures section) that supplies optional constraints such as address family (ai_family), socket type (ai_socktype), protocol (ai_protocol), and flags (ai_flags) to filter results; if NULL, default values are used. The res parameter is a pointer to a pointer to struct addrinfo, which upon success receives the head of a linked list of results.[3][11]
The resolution process begins by examining the node parameter: if it is a valid numeric address string, it is directly parsed into a socket address structure without performing name resolution, using functions like inet_pton() for validation. Otherwise, if node is a hostname, the function invokes the system's name resolution mechanisms, typically starting with local host files such as /etc/hosts and falling back to DNS queries via resolver libraries (e.g., as configured in /etc/nsswitch.conf on UNIX-like systems). For the service parameter, if it is a numeric string, it is converted directly to a port number; otherwise, it is resolved using service databases like /etc/services or equivalent. The process respects the constraints in hints, such as limiting to specific address families (e.g., AF_INET for IPv4 or AF_INET6 for IPv6) or socket types, and may generate results across multiple families if ai_family is AF_UNSPEC.[3][11][10]
On success, getaddrinfo() returns 0 and populates *res with a linked list of struct addrinfo entries, each containing a resolved socket address (ai_addr), its length (ai_addrlen), and other relevant fields like canonical name if requested via the AI_CANONNAME flag in hints. The order of the list is implementation-defined; many systems sort it according to default address selection rules such as those in RFC 6724, which prioritize destination addresses based on criteria such as scope, prefix length, and label matching to optimize connectivity (e.g., preferring global IPv6 addresses over link-local ones). The caller must traverse the list via the ai_next pointers to access all results.[3][11][7]
The function is designed to be thread-safe and reentrant, avoiding the static buffers used in legacy functions like gethostbyname(), which makes it suitable for multithreaded applications without requiring external synchronization.[3][11]
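To illustrate how the service parameter ends up in the returned socket address, the following sketch converts a numeric service string into a port number. The helper name numeric_service_port is hypothetical, and AI_NUMERICSERV is set on the assumption that the caller wants to avoid any lookup in the service database.

```c
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: let getaddrinfo() parse a numeric service string and
 * return the port in host byte order, or -1 if the call fails. */
static int numeric_service_port(const char *service)
{
    struct addrinfo hints, *res = NULL;
    int port;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICSERV;   /* reject names such as "http" */

    /* A NULL node without AI_PASSIVE selects the loopback address. */
    if (getaddrinfo(NULL, service, &hints, &res) != 0)
        return -1;

    /* The port is stored in network byte order inside the sockaddr. */
    port = ntohs(((const struct sockaddr_in *)res->ai_addr)->sin_port);

    freeaddrinfo(res);
    return port;
}
```

Dropping AI_NUMERICSERV would allow symbolic names to be resolved through the system's service database instead.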
freeaddrinfo()
The freeaddrinfo() function is a standard library routine in POSIX-compliant systems used to deallocate memory associated with address information structures obtained from name resolution operations.[12] Its prototype is declared as void freeaddrinfo(struct addrinfo *ai);, where the argument ai points to the head of a linked list of addrinfo structures.[2]
The primary purpose of freeaddrinfo() is to recursively free the dynamically allocated memory for the entire linked list of addrinfo structures returned by getaddrinfo(), including any embedded allocations such as the ai_addr socket address buffers and the ai_canonname strings.[13] This function handles the traversal of the list via the ai_next pointers, ensuring that all associated storage is released in a single call, which prevents memory leaks in applications that perform address resolution.[12] It also supports the freeing of arbitrary sublists within the original list, allowing partial deallocation if only portions of the results are processed.[14]
Best practices for using freeaddrinfo() emphasize calling it once the application has finished processing the address information returned by a successful getaddrinfo() call, including on early-exit paths where only part of the list was examined.[12] If getaddrinfo() fails (returns non-zero), *res is not set, and applications should not call freeaddrinfo() on it; invoking freeaddrinfo(NULL) has unspecified behavior per the standards, though many implementations treat it as a no-op.[11][15] This rule is crucial for resource management in long-running programs, as failing to free the structures can lead to gradual memory exhaustion, particularly in scenarios involving repeated resolutions.[2]
As a void function, freeaddrinfo() produces no return value, simplifying its integration into code paths where error checking is not applicable post-deallocation.[12] It is required to be thread-safe, enabling safe concurrent calls from multiple threads without additional synchronization, though applications must still ensure that the addrinfo list is not accessed by other threads during or after freeing.[2] In single-threaded contexts, it remains fully safe and efficient.
Historically, freeaddrinfo() was introduced alongside getaddrinfo() in RFC 2553 (1999) to provide a streamlined memory management mechanism, addressing the complexities of manual deallocation in legacy socket APIs like gethostbyname(), which often relied on static buffers or required explicit freeing of fragmented structures.[16] This pairing was refined in RFC 3493 (2003), which obsoleted the earlier specification and incorporated it into POSIX standards to support IPv6 transitions and protocol-independent programming.[3]
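One common pattern, sketched here under stated assumptions, is to copy the address that will be used into caller-owned storage and then free the list immediately. The helper name first_address is hypothetical, and numeric-only flags are assumed so the example does not depend on DNS.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the first resolved address into caller-owned
 * storage so that it remains valid after the list has been freed.
 * Returns 0 on success or a getaddrinfo() error code. */
static int first_address(const char *numeric_host, const char *port,
                         struct sockaddr_storage *out, socklen_t *outlen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    rc = getaddrinfo(numeric_host, port, &hints, &res);
    if (rc != 0)
        return rc;               /* no list was produced: nothing to free */

    if (res->ai_addrlen > sizeof *out) {
        freeaddrinfo(res);
        return EAI_FAIL;
    }

    memset(out, 0, sizeof *out);
    memcpy(out, res->ai_addr, res->ai_addrlen);
    *outlen = res->ai_addrlen;

    freeaddrinfo(res);           /* a single call releases the whole list */
    return 0;
}
```

Because the copy lives in a struct sockaddr_storage owned by the caller, it can be passed to connect() or bind() long after the addrinfo list is gone.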
Reverse Resolution
getnameinfo()
The getnameinfo() function performs reverse resolution, converting a socket address into human-readable hostname and service name strings in a protocol-independent manner. It serves as the inverse of getaddrinfo(), taking an address as input rather than a name, and is commonly used for logging, display purposes, or applications needing to present addresses in textual form.[17][3]
The function prototype is defined as follows:
c
#include <sys/socket.h>
#include <netdb.h>
int getnameinfo(const struct sockaddr *restrict sa, socklen_t salen,
                char *restrict host, socklen_t hostlen,
                char *restrict serv, socklen_t servlen,
                int flags);
Here, sa points to the input socket address structure (such as sockaddr_in for IPv4 or sockaddr_in6 for IPv6), often obtained from a struct addrinfo via its ai_addr member, while salen specifies the length of that structure. The host and serv parameters are output buffers for the null-terminated hostname and service name strings, respectively, with hostlen and servlen indicating their maximum sizes (many systems define NI_MAXHOST and NI_MAXSERV, 1025 and 32 in glibc, as sizes large enough for any hostname or service name). The flags argument controls behavior, including options like NI_NOFQDN to return only the local hostname without the domain for local hosts, NI_NUMERICHOST to force numeric output of the address (e.g., "192.0.2.1" instead of a name), NI_NAMEREQD to fail if no hostname is available, NI_NUMERICSERV to return the port number as a string (e.g., "80") rather than a service name like "http", and NI_DGRAM to indicate a datagram-oriented service. If either buffer is a null pointer or its length is zero, the corresponding output is not produced.[17][3]
In operation, getnameinfo() first attempts to resolve the address using local mechanisms such as the host database (e.g., /etc/hosts on Unix-like systems), falling back to numeric representation if no name is found; for remote addresses, it performs a reverse DNS lookup via the system's resolver. It supports both IPv4 and IPv6 addresses, including IPv4-mapped IPv6 addresses, and is designed to be thread-safe without relying on global state. Unlike getaddrinfo(), which resolves names to address structures for connection setup, getnameinfo() outputs strings directly from addresses, making it suitable for post-resolution tasks without intermediate storage of linked lists. The function returns 0 on success or a non-zero error code (e.g., EAI_NONAME) on failure.[17][3]
This function is standardized in RFC 3493, which provides IPv6 extensions to the basic socket interface and aligns with POSIX.1 specifications, including IEEE Std 1003.1-2008 and later editions from The Open Group. It obsoletes earlier definitions in RFC 2553 and ensures compatibility across IPv4 and IPv6 environments.[3][17]
Error Handling
Return Codes
The getaddrinfo() and getnameinfo() functions return 0 to indicate successful completion, with the value corresponding to the EAI_SUCCESS constant defined as 0 in the <netdb.h> header.[18][1] On failure, these functions return a non-zero value representing one of the EAI_* error constants, also defined in <netdb.h>, which are suitable for use in preprocessor directives and can be converted to human-readable strings via gai_strerror().[18][19] The freeaddrinfo() function, by contrast, returns no value and does not indicate errors, as it simply frees the dynamically allocated memory from a prior getaddrinfo() call.
To check for success in application code, developers typically test the return value directly, such as int res = getaddrinfo(...); if (res != 0) { /* handle error */ }, applying this pattern to both getaddrinfo() and getnameinfo().[1] Among the EAI_* constants, EAI_SYSTEM is notable for signaling an underlying system error, in which case the global errno variable (on POSIX systems) provides the specific cause.[18][1] Other common constants include EAI_AGAIN for temporary failures, EAI_BADFLAGS for invalid flag arguments, EAI_FAIL for non-recoverable errors, EAI_FAMILY for unsupported address families, EAI_MEMORY for allocation failures, EAI_NONAME for unresolved names, EAI_SERVICE for unrecognized services, EAI_SOCKTYPE for invalid socket types, and EAI_OVERFLOW for an output buffer that is too small.[18]
In POSIX-compliant environments, these EAI_* constants ensure portable error handling across Unix-like systems.[18] On Windows, getaddrinfo() reports failures as Windows Sockets (WSA) error codes, and the Winsock headers define the EAI_* names in terms of them, for example EAI_NONAME as WSAHOST_NOT_FOUND and EAI_BADFLAGS as WSAEINVAL; details are obtained with WSAGetLastError() rather than errno.[20] This difference introduces portability challenges: the numeric values differ from those on POSIX systems, and system-specific errors that POSIX reports through EAI_SYSTEM and errno are instead retrieved through GetLastError(), such as those from underlying network operations.[20][21]
Common Errors and Diagnostics
The getaddrinfo and getnameinfo functions return specific error codes prefixed with EAI_ to indicate failure conditions during name resolution or address translation. These codes provide diagnostic information for troubleshooting issues in network programming.[3]
One common error is EAI_NONAME, which occurs when the specified node name or service name cannot be resolved, often due to an unknown hostname, misspelling, or DNS misconfiguration; developers should verify the input strings for accuracy and check DNS availability or local resolver settings to resolve this.[3][1] Similarly, EAI_AGAIN signals a temporary failure, such as a DNS server timeout or network congestion, and applications are advised to implement retry logic with exponential backoff to handle transient issues effectively.[3][10]
EAI_FAIL indicates a non-recoverable failure in the name resolution process, typically from a permanent DNS error like server misconfiguration; in such cases, verifying the system's resolver configuration files, such as /etc/resolv.conf on Unix-like systems, can help identify and correct the underlying problem.[3][1] The EAI_FAMILY error arises when the requested address family in the hints structure is unsupported by the system, often mitigated by setting the AI_ADDRCONFIG flag, which restricts results to address families actively configured on the local interfaces.[3][10] For EAI_SERVICE, the issue stems from an invalid or unrecognized service name or port number for the given socket type; validation against standard service databases like /etc/services on POSIX systems ensures compatibility.[3][1]
To diagnose these errors, the gai_strerror function converts EAI_* codes into human-readable strings, facilitating logging and user feedback in applications.[3] When getaddrinfo returns a non-zero value, the res output is not a valid list and must not be inspected or passed to freeaddrinfo; only a successful call yields addresses to iterate over and later free.[10][1]
Platform-specific behaviors include EAI_MEMORY on Linux systems under low RAM conditions, where memory allocation for the addrinfo list fails, necessitating checks on system resources before invocation.[10] On Windows, errors may map to underlying Winsock codes like WSA_NOT_ENOUGH_MEMORY for buffer-related allocation issues, requiring attention to input buffer sizes in constrained environments.[20]
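The retry advice for EAI_AGAIN can be sketched as a small wrapper. This is an illustrative policy rather than a prescribed one: the names is_transient and resolve_with_retry, the 100 ms starting delay, and the choice to retry only on EAI_AGAIN are all assumptions.

```c
#include <netdb.h>
#include <stdio.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical policy: only EAI_AGAIN is treated as worth retrying. */
static int is_transient(int gai_error)
{
    return gai_error == EAI_AGAIN;
}

/* Hypothetical wrapper: retry transient failures with a doubling delay.
 * Returns 0 on success or the last getaddrinfo() error code. */
static int resolve_with_retry(const char *node, const char *service,
                              const struct addrinfo *hints,
                              struct addrinfo **res, int max_attempts)
{
    long delay_ms = 100;   /* illustrative starting delay */
    int rc = EAI_FAIL;

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        rc = getaddrinfo(node, service, hints, res);
        if (rc == 0 || !is_transient(rc))
            break;         /* success, or an error that retrying will not fix */

        fprintf(stderr, "attempt %d failed: %s\n", attempt, gai_strerror(rc));
        if (attempt < max_attempts) {
            struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
            nanosleep(&ts, NULL);
            delay_ms *= 2; /* exponential backoff */
        }
    }
    return rc;
}
```

On success the caller owns *res and is responsible for passing it to freeaddrinfo(); on failure *res should be left untouched.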
Practical Usage
Basic Example
The getaddrinfo function provides a straightforward way to resolve a hostname and service name into socket addresses suitable for network communication. In this basic example, we demonstrate its use in a client program that resolves the hostname "example.com" to its IPv4 addresses and establishes a connection to port 80 (the default for the "http" service). The code assumes a POSIX-compliant environment and focuses on IPv4 for simplicity, omitting comprehensive error handling to highlight the core workflow.
c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <string.h>

int main(void) {
    struct addrinfo *res;

    // NULL hints: any address family, socket type, and protocol
    int status = getaddrinfo("example.com", "http", NULL, &res);
    if (status != 0) {
        fprintf(stderr, "getaddrinfo error: %s\n", gai_strerror(status));
        exit(EXIT_FAILURE);
    }

    for (struct addrinfo *p = res; p != NULL; p = p->ai_next) {
        // Keep only IPv4 stream (TCP) entries
        if (p->ai_family != AF_INET || p->ai_socktype != SOCK_STREAM) {
            continue;
        }

        int sockfd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (sockfd == -1) {
            continue;
        }

        // Convert address for display
        char ipstr[INET_ADDRSTRLEN];
        struct sockaddr_in *addr_in = (struct sockaddr_in *)p->ai_addr;
        inet_ntop(AF_INET, &addr_in->sin_addr, ipstr, sizeof(ipstr));
        printf("Attempting connection to %s:%d\n", ipstr, ntohs(addr_in->sin_port));

        if (connect(sockfd, p->ai_addr, p->ai_addrlen) == 0) {
            printf("Connected successfully to %s\n", ipstr);
            close(sockfd);
            break;
        }
        close(sockfd); // Connection failed; try the next address
    }

    freeaddrinfo(res); // Release the whole result list
    return 0;
}
This code first includes the required headers for socket operations, address resolution, and address printing. The getaddrinfo call with NULL hints uses default parameters, typically returning both IPv4 and IPv6 addresses if available, with one entry per matching socket type and protocol. It populates a linked list of struct addrinfo entries, where each ai_addr member holds a sockaddr structure ready for use in socket functions. The loop iterates over this list and filters for IPv4 stream sockets. For each match it creates a socket from the entry's family, socket type, and protocol, displays the resolved IP address via inet_ntop, and attempts a connection. Success on the first viable address simulates a typical client setup, while the iteration supports fallback if multiple addresses exist (e.g., for load balancing across a hostname's IPs like 93.184.216.34 and others for example.com).[10]
To compile, use cc basic_example.c -o basic_example on Linux systems, as getaddrinfo is part of the standard C library. On some Unix systems like Solaris, additional linking is required: cc basic_example.c -o basic_example -lnsl -lsocket. This example assumes the system resolver can reach DNS for "example.com"; in practice, it might return one or more IPv4 addresses, demonstrating how getaddrinfo enables protocol-independent resolution without manual parsing of addresses or ports.
IPv6 and Dual-Stack Considerations
In dual-stack environments, where both IPv4 and IPv6 are supported, the getaddrinfo() function facilitates protocol-independent address resolution by allowing applications to request addresses from multiple families simultaneously. By setting the ai_family field in the struct addrinfo hints to AF_UNSPEC, the function returns a list of socket addresses encompassing both IPv4 (AF_INET) and IPv6 (AF_INET6) results, enabling applications to attempt connections or bindings across protocols without specifying a preference upfront.[2][10] To enhance compatibility, an application that requests AF_INET6 can include AI_V4MAPPED in ai_flags, which instructs getaddrinfo() to return IPv4-mapped IPv6 addresses (e.g., ::ffff:192.0.2.1) when native IPv6 addresses are unavailable, allowing IPv6 sockets to handle IPv4 traffic transparently.[2] Combining AI_ALL with AI_V4MAPPED ensures the function returns both native IPv6 addresses and the IPv4-mapped forms of any IPv4 addresses, providing a comprehensive list for robust dual-stack operation.[2][10]
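The mapping behavior can be checked with a short sketch. It assumes an implementation that applies AI_V4MAPPED to numeric IPv4 input when AF_INET6 is requested, as glibc is understood to do; other libraries may differ. The helper name first_is_v4mapped is hypothetical.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: request AF_INET6 results with AI_V4MAPPED and report
 * whether the first entry is an IPv4-mapped IPv6 address.
 * Returns 1 or 0, or -1 if getaddrinfo() fails. */
static int first_is_v4mapped(const char *numeric_host)
{
    struct addrinfo hints, *res = NULL;
    int mapped = 0;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;        /* AI_V4MAPPED only applies to AF_INET6 */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_V4MAPPED | AI_NUMERICHOST;

    if (getaddrinfo(numeric_host, NULL, &hints, &res) != 0)
        return -1;

    if (res->ai_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)res->ai_addr;
        mapped = IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) ? 1 : 0;
    }

    freeaddrinfo(res);
    return mapped;
}
```

An application that opens a single AF_INET6 socket with IPV6_V6ONLY disabled could use addresses obtained this way to reach IPv4 peers through the same socket.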
A practical code example for dual-stack resolution involves initializing hints with AF_UNSPEC and appropriate flags, then iterating over the returned addresses to bind a listener socket. The following C snippet demonstrates setting up a TCP server that listens on both IPv4 and IPv6 for a given service (e.g., "http"):
c
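#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>

#define MAX_LISTENERS 8

int main(void) {
    struct addrinfo hints, *res, *ressave;
    int listeners[MAX_LISTENERS];
    int nlisteners = 0, sockfd, error, on = 1;
    const char *host = NULL; // NULL with AI_PASSIVE yields the wildcard addresses

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;     // Request both IPv4 and IPv6 results
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;     // AI_V4MAPPED and AI_ALL apply only to AF_INET6 requests

    error = getaddrinfo(host, "http", &hints, &res);
    if (error) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
        return 1;
    }
    ressave = res;

    for (res = ressave; res && nlisteners < MAX_LISTENERS; res = res->ai_next) {
        sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (sockfd < 0) continue;
        // Keep the IPv6 socket IPv6-only so the separate IPv4 bind does not conflict
        if (res->ai_family == AF_INET6)
            setsockopt(sockfd, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
        if (bind(sockfd, res->ai_addr, res->ai_addrlen) == 0 &&
            listen(sockfd, 5) == 0) {
            printf("Listening on family %d\n", res->ai_family);
            listeners[nlisteners++] = sockfd; // Keep open for accept()
        } else {
            close(sockfd); // Release sockets that could not be bound or listened on
        }
    }
    freeaddrinfo(ressave);

    // Handle connections here (e.g., poll() over listeners[] and accept())

    while (nlisteners > 0) close(listeners[--nlisteners]);
    return 0;
}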
This approach creates multiple sockets as needed, binding to wildcard addresses (e.g., 0.0.0.0 for IPv4 and :: for IPv6), and supports fallback across families.[22][10]
The order of addresses returned by getaddrinfo() follows the default destination address selection algorithm outlined in RFC 6724, which prioritizes IPv6 over IPv4 in dual-stack scenarios to encourage native protocol adoption.[23] This sorting applies a policy table with precedence values—IPv6 global unicast addresses (prefix ::/0) receive a precedence of 40, while IPv4-mapped addresses (prefix ::ffff:0:0/96) get 35—combined with scope considerations (e.g., preferring link-local or global scopes that match the source address).[24] The algorithm uses up to 10 rules, including avoiding deprecated addresses and favoring home addresses, to produce an ordered list that minimizes connection failures and round-trip times.[25]
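The policy-table lookup described above can be sketched as a longest-prefix match. The table below reproduces the default precedence values from RFC 6724, Section 2.1 (labels are omitted); the function names are hypothetical, and a real implementation such as a C library's getaddrinfo() applies the full rule set rather than precedence alone.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>

struct policy_entry {
    const char *prefix;   /* IPv6 prefix in text form */
    int prefix_len;       /* prefix length in bits */
    int precedence;       /* higher values are preferred */
};

/* Default policy table from RFC 6724, Section 2.1 (labels omitted). */
static const struct policy_entry default_policy[] = {
    { "::1",        128, 50 },
    { "::",           0, 40 },
    { "::ffff:0:0",  96, 35 },
    { "2002::",      16, 30 },
    { "2001::",      32,  5 },
    { "fc00::",       7,  3 },
    { "::",          96,  1 },
    { "fec0::",      10,  1 },
    { "3ffe::",      16,  1 },
};

/* Return 1 if the first `bits` bits of a and b are equal. */
int prefix_matches(const struct in6_addr *a, const struct in6_addr *b, int bits)
{
    int full = bits / 8, rem = bits % 8;
    unsigned char mask;

    if (memcmp(a->s6_addr, b->s6_addr, (size_t)full) != 0)
        return 0;
    if (rem == 0)
        return 1;
    mask = (unsigned char)(0xff << (8 - rem));
    return (a->s6_addr[full] & mask) == (b->s6_addr[full] & mask);
}

/* Hypothetical helper: precedence of an IPv6 address given as text,
 * chosen by longest matching prefix; -1 if the text is not valid IPv6. */
int rfc6724_precedence(const char *ipv6_text)
{
    struct in6_addr addr, prefix;
    int best_len = -1, best_prec = -1;
    size_t i;

    if (inet_pton(AF_INET6, ipv6_text, &addr) != 1)
        return -1;
    for (i = 0; i < sizeof(default_policy) / sizeof(default_policy[0]); i++) {
        const struct policy_entry *e = &default_policy[i];
        if (inet_pton(AF_INET6, e->prefix, &prefix) != 1)
            continue;
        if (e->prefix_len > best_len &&
            prefix_matches(&addr, &prefix, e->prefix_len)) {
            best_len = e->prefix_len;
            best_prec = e->precedence;
        }
    }
    return best_prec;
}
```

In this sketch IPv4 destinations are expected in IPv4-mapped form (for example ::ffff:192.0.2.1), which is how RFC 6724 represents them in the table; such an address scores 35, below the 40 assigned to a native address like 2001:db8::1.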
Dual-stack implementations must navigate several pitfalls to ensure reliable operation. For instance, the IPv6 loopback address ::1 is distinct from the IPv4 loopback 127.0.0.1, so a service binding exclusively to ::1 remains inaccessible to IPv4-only clients, and vice versa, potentially leading to connectivity issues in mixed environments.[26] Similarly, when handling literal IPv6 addresses in URLs (e.g., [2001:db8::1]), applications must parse the bracketed form and pass only the address between the brackets to getaddrinfo(). The square brackets are URL syntax that separates the address from the port, not part of the address itself, and a node string that still contains them typically fails to resolve (for example with EAI_NONAME).[27][10]
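Because getaddrinfo() expects the bare literal rather than the URL form, a caller usually removes the brackets first. The sketch below shows one way to do that; host_without_brackets and resolve_literal are hypothetical helper names, and the 256-byte buffer is an arbitrary illustrative bound.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: copy the host part of a URL authority into out,
 * removing the square brackets around an IPv6 literal if present.
 * Returns 0 on success, -1 if the input is malformed or out is too small. */
int host_without_brackets(const char *host, char *out, size_t outlen)
{
    size_t len = strlen(host);

    if (len > 0 && host[0] == '[') {
        if (len < 2 || host[len - 1] != ']')
            return -1;            /* opening bracket without a closing one */
        host += 1;                /* skip '[' */
        len -= 2;                 /* drop '[' and ']' */
    }
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, host, len);
    out[len] = '\0';
    return 0;
}

/* Hypothetical helper: resolve a literal address and numeric port
 * without any DNS or services lookup. The caller frees *res with
 * freeaddrinfo() when the return value is 0. */
int resolve_literal(const char *url_host, const char *port,
                    struct addrinfo **res)
{
    char node[256];
    struct addrinfo hints;

    if (host_without_brackets(url_host, node, sizeof(node)) != 0)
        return EAI_NONAME;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;
    return getaddrinfo(node, port, &hints, res);
}
```

Calling resolve_literal("[2001:db8::1]", "443", &res) should then yield a single AF_INET6 entry, whereas passing the bracketed string straight to getaddrinfo() would be expected to fail.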
To test dual-stack behavior, applications should be built and run on systems with IPv6 enabled (e.g., via kernel configuration) and verified using tools like getent ahosts or direct calls to getaddrinfo() with AF_UNSPEC. Expected output includes both address families for dual-stack hosts, with IPv6 addresses appearing first per the default policy; discrepancies may indicate a customized or misconfigured policy table, which glibc reads from /etc/gai.conf.[24][10]
In real-world deployments, web servers such as those implementing HTTP listeners leverage getaddrinfo() to resolve hostnames and bind sockets across protocols, creating separate listeners for IPv4 and IPv6 wildcard addresses to handle incoming connections seamlessly. This strategy, often limited to a small number of sockets (e.g., two for dual-stack), ensures backward compatibility while preferring IPv6 where available, as recommended for protocol-independent servers.[22]
Security Aspects
Potential Vulnerabilities
The getaddrinfo function, by relying on the underlying DNS resolver for hostname-to-address translation, is susceptible to DNS cache poisoning and spoofing attacks, where an attacker injects false records into the resolver's cache, resulting in incorrect ai_addr structures that redirect connections to malicious endpoints, such as in man-in-the-middle scenarios.[28] These attacks exploit the transparency of DNS queries issued by getaddrinfo, allowing forged responses to persist in client-side caches and affect subsequent resolutions without authentication in non-DNSSEC environments.[29]
Implementations of getaddrinfo have faced buffer overflow vulnerabilities, particularly in the GNU C Library (glibc), where crafted DNS responses during dual A/AAAA queries can trigger stack-based overflows in the send_dg and send_vc functions of libresolv, leading to denial-of-service crashes or potential arbitrary code execution when using AF_UNSPEC or AF_INET6 address families.[30][31] Similarly, the musl libc implementation contains a remote stack-based buffer overflow in DNS response parsing, exploitable when querying a malicious nameserver configured in resolv.conf, affecting getaddrinfo and related functions.[32]
Denial-of-service risks arise from malformed node inputs or oversized DNS responses processed by getaddrinfo, which can cause prolonged timeouts or resource exhaustion, especially in recursive resolution scenarios where the function issues multiple queries.[33] In glibc, specific instances include stack overflows from crafted inputs that overflow internal buffers, halting applications, and file descriptor leaks during partial DNS writes.
The getnameinfo function, often used alongside getaddrinfo for reverse resolution, may truncate output or return an error (EAI_OVERFLOW) if the provided hostlen or servlen parameters are insufficient for the resolved names. This design avoids buffer overflows unlike legacy APIs such as gethostbyname. However, attacker-controlled nameservers can exploit vulnerabilities in the underlying resolver implementations, such as early versions of musl libc, during reverse DNS queries.[32]
Misuse of the numeric-input flags can introduce risks if unvalidated user input is passed. AI_NUMERICHOST, and its counterpart AI_NUMERICSERV for services, bypass name lookups for numeric strings but do not validate those strings on the application's behalf. Invalid or oversized numeric port strings in glibc may be parsed incorrectly, for example by truncating values exceeding 65535 to their lower 16 bits, potentially enabling unintended connections or failures in dual-stack environments.[34]
Historically, prior to the widespread adoption of DNSSEC as specified in RFC 4033, getaddrinfo resolutions lacked cryptographic validation, allowing spoofed canonical names (ai_canonname) to be trusted without verification and undermining the integrity of returned host information in unsecured resolvers.
Mitigation Strategies
To mitigate potential security risks in getaddrinfo implementations, developers should prioritize robust input validation for the node and service parameters. These strings must be sanitized to remove or reject malicious characters, such as control sequences or excessively long inputs that could lead to buffer overflows or injection attacks, with lengths limited to reasonable bounds like 255 characters for hostnames per RFC 1035 standards. When dealing with trusted IP addresses, the AI_NUMERICHOST flag in the hints structure should be used to bypass DNS resolution entirely, treating the node as a literal numeric address and avoiding unnecessary network queries.[10]
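A minimal sketch of such pre-validation follows. The helper names parse_port and hostname_is_plausible are hypothetical, the 255-character bound is taken from the text above, and the accepted character set (letters, digits, hyphen, dot) is a deliberately conservative assumption that some deployments may need to widen.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_HOSTNAME_LEN 255   /* bound suggested in the text above */

/* Hypothetical helper: parse a decimal port string.
 * Returns the port (0..65535) or -1 if the string is not a clean number. */
int parse_port(const char *s)
{
    char *end;
    long value;

    if (s == NULL || !isdigit((unsigned char)s[0]))
        return -1;             /* rejects empty strings, signs, whitespace */
    errno = 0;
    value = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || value > 65535)
        return -1;             /* overflow, trailing junk, or out of range */
    return (int)value;
}

/* Hypothetical helper: conservative hostname check performed before
 * the string is handed to getaddrinfo(). Returns 1 if acceptable. */
int hostname_is_plausible(const char *s)
{
    size_t i, len;

    if (s == NULL)
        return 0;
    len = strlen(s);
    if (len == 0 || len > MAX_HOSTNAME_LEN)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '-' && c != '.')
            return 0;          /* control characters, spaces, etc. */
    }
    return 1;
}
```

Rejecting a port such as "65536" before the call avoids depending on how a particular C library handles out-of-range service strings.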
Configuring timeouts for DNS resolvers is essential to prevent denial-of-service (DoS) conditions from prolonged or hanging lookups. The RES_OPTIONS environment variable or options in /etc/resolv.conf can set the timeout parameter (e.g., timeout:2 for a 2-second limit per query attempt) and the number of retries (e.g., attempts:2), capping query durations to mitigate resource exhaustion.[35] Library functions like res_init() can reinitialize these settings dynamically if needed.
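For a process that cannot change /etc/resolv.conf, one option is to set RES_OPTIONS programmatically. The sketch below assumes a resolver that honors this variable (as glibc does) and assumes it runs before the process performs its first lookup; the function name limit_resolver_timeouts is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: cap each DNS query attempt at 2 seconds and
 * allow at most 2 attempts per server, using the values from the text.
 * Returns 0 on success, -1 on failure (as setenv does). */
int limit_resolver_timeouts(void)
{
    /* The final argument 1 overwrites any inherited RES_OPTIONS value. */
    return setenv("RES_OPTIONS", "timeout:2 attempts:2", 1);
}
```

Whether the new limits apply to a resolver that has already been initialized depends on the implementation, so calling this early in main() is the safer arrangement.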
Enabling DNSSEC validation enhances the integrity of the canonical names and addresses returned by getaddrinfo. The glibc stub resolver does not validate signatures itself; in /etc/resolv.conf, the trust-ad option instructs it to preserve the Authentic Data (AD) bit in DNS responses from a validating upstream server, so the option should be set only when that server and the path to it are trusted.[35] This is particularly relevant when the AI_CANONNAME flag is set in ai_flags to request canonical names, since the validating resolver can then reject spoofed records before they reach the application.
For related reverse resolution via getnameinfo(), proper buffer sizing prevents truncation in output handling. Buffers for hostnames and service names should be allocated using the NI_MAXHOST (1025 bytes) and NI_MAXSERV (32 bytes) constants defined in <netdb.h>, providing sufficient space for fully qualified domain names and port strings without risking truncation or overruns.[36]
Integrating with modern secure resolvers offers hardened lookup mechanisms for getaddrinfo. Unbound, a validating recursive DNS resolver, can be configured as a local stub to perform DNSSEC-aware queries, replacing or augmenting the default glibc resolver for enhanced privacy and integrity.[37] Similarly, systemd-resolved provides a local caching resolver with built-in DNSSEC support and DNS over TLS (DoT), configurable via resolved.conf to enforce secure transport for NSS-integrated calls like getaddrinfo.[38]
Auditing resolved results is a key post-resolution practice for ongoing security. Applications should log the resolved IP addresses and associated metadata (e.g., via syslog or auditd rules targeting NSS calls) and implement verification logic to cross-check them against whitelists of expected peer addresses, detecting anomalies such as unexpected canonical names or IP mismatches.[39]
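The cross-check against expected peer addresses can be sketched as a comparison of binary addresses, which avoids false mismatches caused by different textual spellings of the same IPv6 address. The function name addr_is_allowed and the list format (an array of literal address strings) are assumptions made for illustration.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: return 1 if the resolved address sa equals one
 * of the n literal addresses in allow, 0 otherwise. Entries that do
 * not parse in the family of sa are skipped. */
int addr_is_allowed(const struct sockaddr *sa,
                    const char *const *allow, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (sa->sa_family == AF_INET) {
            const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
            struct in_addr want;
            if (inet_pton(AF_INET, allow[i], &want) == 1 &&
                memcmp(&want, &sin->sin_addr, sizeof(want)) == 0)
                return 1;
        } else if (sa->sa_family == AF_INET6) {
            const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
            struct in6_addr want;
            if (inet_pton(AF_INET6, allow[i], &want) == 1 &&
                memcmp(&want, &sin6->sin6_addr, sizeof(want)) == 0)
                return 1;
        }
    }
    return 0;
}
```

An application could call this on each ai_addr returned by getaddrinfo() and log or skip any entry that is not on the list before connecting.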