Amazon Redshift
Amazon Redshift is a fully managed, petabyte-scale cloud data warehouse service developed by Amazon Web Services (AWS) that enables organizations to store and analyze large volumes of structured and semi-structured data using standard SQL queries and existing business intelligence tools.[1] Launched in limited preview on November 28, 2012, it has evolved into a cornerstone of AWS analytics offerings, supporting workloads from terabytes to exabytes with high performance and scalability.[2]
At its core, Amazon Redshift employs a columnar storage architecture optimized for analytical queries, which compresses data to reduce storage costs and minimizes I/O operations for faster execution.[3] This design, combined with massively parallel processing across compute nodes, allows it to handle complex queries on vast datasets in seconds or minutes, often outperforming traditional on-premises data warehouses.[1] Users can deploy Redshift in two primary modes: provisioned clusters, where administrators manually configure compute and storage resources, or serverless, which automatically scales capacity based on demand using AI-driven optimization, with no charges when idle.[3]
Key benefits include strong price-performance, with options to pay only for the compute and storage used, and minimal administrative overhead for tasks like hardware provisioning, patching, or backups, as AWS manages the underlying infrastructure.[1] Security features encompass encryption at rest and in transit with AES-256, integration with AWS Identity and Access Management (IAM), VPC controls, and granular access policies via AWS Lake Formation, enabling compliance with standards like HIPAA and PCI DSS.[3]
For analytics, it supports near real-time ingestion from sources like Amazon Kinesis and Amazon MSK, zero-ETL integrations with databases such as Amazon Aurora, and advanced capabilities like materialized views for caching frequent queries and short query acceleration for low-latency responses.[3] Amazon Redshift extends beyond traditional warehousing with built-in machine learning through Redshift ML, allowing users to train and deploy models directly in SQL without moving data, and generative AI integrations via Amazon Bedrock and Amazon Q for tasks like natural language processing and SQL query generation on warehouse data.[3][4]
It connects to data lakes in Amazon S3 for federated querying of open formats like Parquet and ORC, and supports writing to Apache Iceberg tables (as of November 2025), enabling hybrid lakehouse architectures.[3][5] It also integrates with visualization tools such as Amazon QuickSight, Tableau, and Power BI for end-to-end analytics workflows.[3] These features have made it a popular choice for enterprises running business intelligence, reporting, and data science applications at scale.[1]
Overview
Definition and Purpose
Amazon Redshift is a fully managed, petabyte-scale data warehouse service provided by Amazon Web Services (AWS), designed to handle large-scale analytics workloads in the cloud.[1] It is built on an extended version of PostgreSQL, incorporating optimizations and extensions specifically for analytical processing and high-performance querying.[6] This architecture allows users to store and query vast amounts of data using standard SQL, integrating with existing business intelligence (BI) tools for reporting and visualization.[7]
The core purpose of Amazon Redshift is to enable fast and cost-effective analysis of structured and semi-structured data, empowering organizations to perform business intelligence, generate reports, and support data-driven decision-making at enterprise scale.[7][8] By processing complex queries against petabytes of data, it facilitates insights into historical performance, trends, and operational metrics without the overhead of managing infrastructure.[3] Tens of thousands of customers rely on it daily to analyze exabytes of data for these purposes.[9]
Amazon Redshift evolved from traditional on-premises data warehousing toward a cloud-native service, addressing key limitations of legacy systems such as high costs, rigid scalability, and slow performance in handling massive datasets. A defining aspect of this evolution is its separation of storage and compute, which provides flexibility to scale resources independently based on workload demands.[10][11] This design leverages foundational technologies like columnar storage for efficient compression and massively parallel processing (MPP) for distributed query execution.[12][8]
Key Characteristics
Amazon Redshift employs a columnar data storage format, which stores data by columns rather than rows, enabling efficient compression and faster query performance for analytical workloads that involve aggregations and scans across large datasets.[3] This design reduces I/O requirements by allowing queries to access only the relevant columns, minimizing data transfer and storage footprint through techniques like zone maps and advanced encodings such as AZ64 for numeric and temporal data.[3]
The service utilizes a massively parallel processing (MPP) architecture, distributing query execution across multiple nodes to handle large-scale data processing efficiently and support high concurrency for numerous users and queries simultaneously.[3] This parallelization ensures scalable performance as data volumes grow, with the ability to add transient capacity dynamically without disrupting operations.[3]
Amazon Redshift maintains compatibility with PostgreSQL, supporting standard SQL queries and advanced extensions such as window functions for complex analytics like ranking and cumulative calculations.[6][13] As a fully managed service, it automates provisioning, patching, backups, and recovery, relieving users from infrastructure management tasks.[1]
To sustain optimal performance, Amazon Redshift incorporates machine learning-based query optimization through automatic table optimization, which analyzes query patterns to select and apply suitable sort and distribution keys.[14] Additionally, it features automatic vacuuming, including background VACUUM DELETE operations to reclaim space from deleted rows and maintain data efficiency without manual intervention.[15] These capabilities support petabyte-scale data warehousing for enterprise analytics.[1]
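Because Redshift accepts PostgreSQL-flavored SQL, analytical constructs such as window functions can be used directly. The following sketch is illustrative only (the sales table and its columns are hypothetical) and shows ranking plus a cumulative total computed per region:

```sql
-- Hypothetical table: rank products by revenue within each region and
-- compute a running revenue total using standard window functions.
SELECT
    region,
    product_id,
    revenue,
    RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS revenue_rank,
    SUM(revenue) OVER (
        PARTITION BY region
        ORDER BY revenue DESC
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM sales;
```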
History
Launch and Early Development
Amazon Redshift was announced on November 28, 2012, during the inaugural AWS re:Invent conference in Las Vegas, where it was introduced as a limited preview service designed to deliver petabyte-scale data warehousing in the cloud.[2] The service aimed to provide developers and businesses with a fully managed, relational data warehouse that supported standard SQL queries and integrated with existing business intelligence tools, such as those from Jaspersoft and MicroStrategy.[16] This launch positioned Redshift as an alternative to traditional on-premises data warehouses from vendors like Teradata and Oracle, which often faced challenges with high costs and limited scalability for massive datasets.[17]
Following a beta testing phase in late 2012 with select customers, Redshift achieved general availability on February 15, 2013, enabling broader access for AWS users worldwide.[18] At launch, it supported clusters scaling up to approximately 1 PB of compressed user data through configurable node types, starting from hundreds of gigabytes for smaller workloads.[19] The service's early motivations stemmed from the need for efficient analytics on large data volumes, particularly in scenarios requiring rapid query performance at a fraction of the cost of legacy systems (under $1,000 per terabyte per year).[2]
Key initial features included a massively parallel processing (MPP) architecture for distributing queries across nodes, direct integration with Amazon S3 for efficient data loading via the COPY command, and compatibility with standard ODBC and JDBC drivers for connectivity to BI applications.[16] Built on an enhanced version of PostgreSQL, Redshift offered SQL compatibility while optimizing for columnar storage and compression to handle petabyte-scale operations. The beta period generated rapid feedback and adoption; by early 2013, hundreds of organizations across sectors like e-commerce, gaming, and advertising, including Kongregate and Photobox, had deployed it for analytics workloads.[18]
Major Milestones and Updates
In 2014, Amazon Redshift introduced dense compute (DC1) nodes, featuring solid-state drives (SSDs) that provided a high ratio of CPU, memory, and I/O performance to storage, enabling up to twice the performance of previous dense storage nodes at the same price.[20] By 2017, the service enhanced integration with Amazon S3 through the launch of Amazon Redshift Spectrum, allowing users to query exabyte-scale datasets directly in S3 data lakes using standard SQL without loading data into Redshift clusters.[21]
In 2019, Amazon Redshift introduced the separation of storage and compute resources with RA3 nodes and Redshift Managed Storage, permitting independent scaling of compute capacity and storage while achieving 99.999999999% durability for stored data.[22] The year 2021 marked the public preview launch of Amazon Redshift Serverless, an on-demand option that automatically scales compute and concurrency for variable workloads without requiring infrastructure provisioning.[23] Celebrating its 10-year anniversary in 2022, Amazon Redshift highlighted advancements in zero-ETL integrations and in data sharing across AWS accounts and regions, supporting collaborative analytics without traditional extract, transform, and load processes.[23]
From 2024 to 2025, Amazon Redshift rolled out over 100 features and enhancements, including AI-powered query optimization via Amazon Q for generating SQL from natural language, expansions in federated querying through zero-ETL connections to databases like Amazon Aurora and enterprise applications such as Salesforce, and improvements delivering up to 3x better price-performance compared to prior generations.[24] In 2025, key updates included the general availability of Multidimensional Data Layouts in September for dynamic data sorting to accelerate analytical queries, support for writing to Apache Iceberg tables in November to enhance open table format integration, and expanded SUPER data type capabilities for handling semi-structured data.[25][5][26] Another update in this period was the general availability of multi-data warehouse writes through data sharing, facilitating cross-account and cross-region analytics workflows.[27]
Throughout its evolution, Amazon Redshift has undergone continuous reinvention driven by customer feedback, with an emphasis on machine learning integrations like Redshift ML for in-database model training and sustainability efforts through optimized resource efficiency to reduce environmental impact.[23]
Architecture
Cluster Components
An Amazon Redshift cluster serves as the foundational unit of a data warehouse, consisting of hardware and software components designed for scalable data processing. It includes a leader node for coordination and one or more compute nodes for storage and computation, enabling massively parallel processing (MPP) to distribute workloads across nodes for efficient query handling. Clusters can be configured as single-node setups for development and testing or multi-node configurations for production environments, with support for automatic failover to maintain availability in case of node failures.[28]
The leader node is a single, dedicated component responsible for managing client connections, parsing and analyzing incoming SQL queries, developing query execution plans, and coordinating activities across the compute nodes. It maintains metadata about the cluster's tables, views, and user permissions, and compiles and distributes code to relevant compute nodes only when queries reference data stored on those nodes. The leader node operates using a PostgreSQL-compatible engine, specifically based on PostgreSQL 8.0.2, which allows compatibility with standard SQL clients and tools with minimal modifications. In single-node clusters, the leader node also performs compute functions, combining both roles into one instance.[29][6]
Compute nodes form the scalable backbone of the cluster, handling the actual storage of data and execution of query portions in parallel. Each compute node has its own dedicated CPU, memory, and storage, divided into slices for further parallelism in data processing. As of 2025, clusters support up to 128 compute nodes, depending on the node type and instance size, allowing for petabyte-scale data warehousing. Multi-node clusters mirror data across compute nodes to enable automatic recovery and failover, minimizing downtime during hardware issues.[30][29]
Amazon Redshift offers several compute node types optimized for different workloads, balancing compute power, memory, and storage needs:
- DC2 nodes (dense compute-optimized): Designed for compute-intensive workloads with high-performance SSD storage; available in sizes like dc2.large (15 GB memory, 2 vCPUs) up to dc2.8xlarge (244 GB memory, 32 vCPUs), supporting up to 128 nodes in larger configurations.[28]
- RA3 nodes (Redshift Managed Storage): Enable independent scaling of compute and storage, with data persisted in Amazon S3 and a high-performance SSD tier-1 cache on each node for frequently accessed data; for example, ra3.16xlarge provides 384 GB memory, 48 vCPUs, and up to 128 TB of managed storage per node, supporting clusters of 2 to 128 nodes for elastic expansion to petabytes.[29][30]
Data Distribution and Storage
Amazon Redshift organizes data across compute nodes using distribution styles to ensure even workload distribution and minimize data movement during queries. Data is stored in a columnar format on each node's slices, where slices represent logical partitions of the node's storage. This approach enables parallel processing and efficient scans by reading only relevant columns.[32][33]
Distribution Styles
Redshift supports four primary distribution styles for tables: KEY, EVEN, ALL, and AUTO. The KEY style distributes rows based on the values in a specified distribution column, using a hash function to collocate matching values across slices, which optimizes joins and aggregations on that column by reducing data redistribution.[34] This style is ideal for large tables frequently joined on the distribution key, such as fact tables distributed by a foreign key matching dimension tables.[34]
In contrast, the EVEN style distributes rows in a round-robin fashion across all slices without regard to column values, providing uniform data placement and preventing skew, but it may require data movement for joins.[34] It suits tables not involved in joins or where no clear distribution key exists.[34] The ALL style replicates the entire table on every node, ensuring no data movement for joins involving this table, but it increases storage requirements proportionally to the number of nodes and slows inserts or updates.[34] This is best for small dimension tables, typically under 1 GB, that are rarely updated.[34]
For flexibility, the AUTO style lets Redshift automatically select and adjust the optimal style based on table size and query patterns: ALL for small tables, KEY for medium tables with suitable keys, and EVEN for very large tables.[34] Adjustments occur in the background with minimal query disruption, and users can monitor changes via system views like SVL_AUTO_WORKER_ACTION.[34]
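The distribution style is declared at table creation time. A minimal sketch, using hypothetical table and column names, of how KEY, ALL, and AUTO might be applied:

```sql
-- KEY distribution: hash rows on customer_id so fact rows land on the same
-- slices as matching rows in tables distributed on the same key.
CREATE TABLE sales_fact (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(12,2)
)
DISTSTYLE KEY
DISTKEY (customer_id);

-- ALL distribution: replicate a small, rarely updated dimension on every node
-- so joins against it never require data movement.
CREATE TABLE region_dim (
    region_id   INT,
    region_name VARCHAR(64)
)
DISTSTYLE ALL;

-- AUTO distribution (the default): let Redshift choose and adjust the style
-- as the table grows and query patterns change.
CREATE TABLE web_events (
    event_id   BIGINT,
    event_time TIMESTAMP,
    payload    SUPER
)
DISTSTYLE AUTO;
```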
Sort Keys
Sort keys define the order in which data is physically stored within columns on disk, enabling query optimization through data skipping and improved compression. Redshift offers compound, interleaved, and AUTO sort keys. A compound sort key sorts data by the first column entirely, then by subsequent columns within those groups, making it efficient for queries filtering on a prefix of the key columns, such as range scans on the leading column followed by equality on trailing ones.[35] This boosts performance for merge joins, GROUP BY, and window functions using the sort order, while also enhancing compression ratios.[35] However, queries relying solely on later columns may not benefit and could perform worse due to skipped prefix optimization.[35]
An interleaved sort key assigns equal weight to each column in the sort key, up to eight columns, allowing effective sorting for queries on any subset or combination of those columns without prioritizing order.[36] It uses an internal compression scheme for zone map values to better distinguish among similar data, making it suitable for diverse query patterns, such as filters on non-prefix columns like regions or categories.[36] Compared to compound keys, interleaved sorts handle varied predicates more evenly but require longer VACUUM REINDEX operations and may underperform with monotonically increasing data like timestamps.[36] The AUTO sort key lets Redshift automatically choose and maintain the best type based on workload, recommending compound or interleaved as needed.[37]
Sort keys integrate with zone maps, which store minimum and maximum values for each 1 MB data block in a column. During query execution, the engine skips irrelevant blocks if the query predicate falls outside the zone map range, potentially avoiding up to 98% of data scans for range-restricted queries on sorted data, such as a one-month filter over five years of timestamps.[37]
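How a sort key is declared depends on the chosen type. A brief, hypothetical sketch contrasting compound and interleaved sort keys:

```sql
-- Compound sort key: best when filters use the leading column(s),
-- e.g. date-range scans optionally narrowed by store_id.
CREATE TABLE daily_sales (
    sale_date DATE,
    store_id  INT,
    amount    DECIMAL(12,2)
)
COMPOUND SORTKEY (sale_date, store_id);

-- Interleaved sort key: gives each column equal weight, suiting workloads
-- that filter on either column independently.
CREATE TABLE orders (
    order_date  DATE,
    region      VARCHAR(32),
    customer_id BIGINT
)
INTERLEAVED SORTKEY (order_date, region);
```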
Columnar Storage and Compression
Redshift employs columnar storage, where each column is stored separately on disk in 1 MB blocks, allowing queries to read only required columns and reducing I/O compared to row-based systems.[33] This format supports automatic compression on a per-column basis, applied during data loading or via ANALYZE COMPRESSION, to minimize storage footprint without user specification.[38] Common encodings include LZO for high ratios on long text strings like descriptions and Zstandard (ZSTD) for versatile compression across numeric and temporal data types, often achieving up to 4:1 ratios depending on data patterns.[38] For example, byte-dictionary encoding can yield over 20:1 compression on low-cardinality columns, but overall averages hover around 4:1 for mixed workloads.[39][38] Compression is managed automatically with ENCODE AUTO, which selects encodings such as DELTA for integers or RUNLENGTH for sorted booleans to balance storage savings and query speed.[38]
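Column encodings can be left to ENCODE AUTO or declared explicitly, and candidate encodings for existing data can be reported with ANALYZE COMPRESSION. A hypothetical example:

```sql
-- Explicit per-column encodings; omitting them leaves the default ENCODE AUTO
-- behavior, in which Redshift selects and adjusts encodings itself.
CREATE TABLE page_views (
    view_id   BIGINT        ENCODE AZ64,
    view_time TIMESTAMP     ENCODE AZ64,
    url       VARCHAR(2048) ENCODE ZSTD,
    country   CHAR(2)       ENCODE BYTEDICT
);

-- Report suggested encodings for an existing table based on a sample of its data.
ANALYZE COMPRESSION page_views;
```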
Integration with External Storage
For RA3 node types, Redshift uses Redshift Managed Storage (RMS), which decouples compute from storage by combining local high-performance SSDs with Amazon S3 for durable, scalable persistence.[28] Data is automatically tiered: hot data stays on SSDs for fast access, while colder data is offloaded to S3, with intelligent prefetching based on access patterns to maintain performance.[12] This allows independent scaling of storage up to 16 PB per cluster without resizing compute, billed separately at a consistent rate regardless of tier.[28] RMS supports features like automatic fine-grained eviction and workload-optimized data management, ensuring clusters handle growing datasets efficiently.[12]
Query Execution Engine
Amazon Redshift's query execution process begins on the leader node, where a submitted SQL query undergoes parsing to validate its syntax and structure, followed by optimization using a cost-based query planner. This planner generates an execution plan that accounts for the MPP architecture, data distribution strategies, join orders, aggregation methods, and columnar storage to minimize computational overhead and data movement. The resulting plan specifies how data will be processed across the cluster, ensuring efficient resource utilization for analytical workloads.[40][41]
The execution engine, residing on the leader node, then translates the optimized plan into low-level compiled code, breaking it into discrete steps, segments, and streams tailored for parallel processing. This code is distributed to the compute nodes, where each node's slices execute portions of the query concurrently in a massively parallel processing (MPP) manner, enabling high-throughput handling of complex operations on petabyte-scale data. Amazon Redshift's engine is built on an extended version of PostgreSQL 8.0.2, incorporating custom MPP extensions such as join redistribution, which dynamically co-locates data across nodes to facilitate efficient joins, and broadcast mechanisms for aggregations, allowing the engine to scale without data replication.[6][42][43]
To enhance performance, the engine employs compile-time optimizations via the MPP-aware cost-based planner and runtime techniques like predicate pushdown, which applies filtering conditions early in the execution pipeline to reduce scanned data volume, and hash joins, which build in-memory hash tables for rapid data matching across distributed slices. These optimizations, combined with the columnar storage format that enables selective column scans and compression-aware planning, accelerate query runtime by limiting unnecessary I/O and network transfers.[44][45][46]
As of 2025, Amazon Redshift integrates AI/ML-driven capabilities, such as automated materialized views that use machine learning to identify and maintain precomputed result sets for frequent query patterns, paired with automatic query rewriting that transparently substitutes these views into executing queries for substantial speedups, for instance reducing execution time from seconds to milliseconds in join-heavy workloads. The service also supports advanced SQL constructs, including common table expressions (CTEs) for modular query composition, correlated and uncorrelated subqueries for nested logic, and federated queries that enable direct access to external data sources like Amazon S3 or other databases without data movement.[47][48][49]
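The plan the leader node produces can be inspected with the EXPLAIN command, whose output labels MPP-specific steps such as how rows are redistributed or broadcast between nodes. A hypothetical example (table names are illustrative):

```sql
-- Show the execution plan for a join; attributes such as DS_DIST_NONE or
-- DS_BCAST_INNER in the output indicate the data movement Redshift plans.
EXPLAIN
SELECT c.region, SUM(s.amount) AS total_amount
FROM sales_fact s
JOIN customer_dim c ON s.customer_id = c.customer_id
GROUP BY c.region;
```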
Features
Data Loading and Management
Amazon Redshift provides efficient mechanisms for ingesting large volumes of data into its clusters, primarily through the COPY command, which enables parallel bulk loading from sources such as Amazon S3, Amazon DynamoDB, Amazon EMR, and remote hosts via SSH.[50][51] The command supports various input formats, including CSV, JSON, AVRO, Parquet, and ORC, allowing users to load structured and semi-structured data while specifying options for error handling, such as the MAXERROR parameter to tolerate a certain number of invalid rows.[50][52] During loading, data is distributed across cluster nodes according to the table's distribution style to balance workloads.[51]
For exporting data, the UNLOAD command facilitates the parallel unloading of query results from Redshift tables to Amazon S3 in formats like text, JSON, or Parquet, with support for server-side encryption and options to control file partitioning for subsequent processing or reloading.[53][54] This enables data movement to external storage for backup, sharing, or integration with other AWS services. Redshift Spectrum extends data access by allowing queries against exabytes of data in Amazon S3 without requiring ingestion into the cluster, using external tables defined over S3 data in formats such as Parquet, ORC, JSON, and text.[55] These external tables integrate directly with native Redshift tables in SQL queries, enabling federated analysis across on-cluster and off-cluster data lakes.
Data management in Redshift includes automatic compression during the COPY process, where the service analyzes sample data and applies optimal encodings like RAW, LZO, or ZSTD to columns via the ENCODE AUTO setting, reducing storage costs by up to 4x for certain workloads.[56][38] To maintain query performance, the ANALYZE command samples table rows to update column statistics, which the query optimizer uses for efficient execution plans; Amazon Redshift automatically runs lightweight ANALYZE operations after significant data changes, but explicit runs are recommended for large loads.[57][58]
Zero-ETL integrations, introduced in 2022, enable near real-time data replication from sources like Amazon Aurora MySQL or PostgreSQL into Redshift without traditional extract, transform, and load pipelines, supporting continuous synchronization for analytics.[59][60] As of 2025, enhanced streaming ingestion capabilities allow low-latency, high-throughput loading directly from Amazon Managed Streaming for Apache Kafka (MSK) or Amazon Kinesis Data Streams into materialized views, accommodating formats like JSON, Avro, and Protobuf for real-time analytics.[61][62][63]
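In practice, loading and unloading are expressed as single SQL statements. A sketch with placeholder bucket names and IAM role ARN:

```sql
-- Bulk-load Parquet files from S3 in parallel; the bucket, prefix, and
-- role ARN below are placeholders.
COPY sales_fact
FROM 's3://example-bucket/sales/2025/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
FORMAT AS PARQUET;

-- Unload a query result back to S3 as Parquet for downstream processing.
UNLOAD ('SELECT * FROM sales_fact WHERE sale_date >= ''2025-01-01''')
TO 's3://example-bucket/exports/sales_2025_'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
FORMAT AS PARQUET;
```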
Security and Compliance
Amazon Redshift provides security features to protect data warehouses from unauthorized access and ensure data integrity. These include identity and access management, encryption mechanisms, network isolation, comprehensive auditing, and adherence to industry compliance standards. By leveraging AWS-native services, Redshift enables organizations to implement defense-in-depth strategies tailored to their regulatory requirements.[64]
Authentication in Amazon Redshift is managed through integration with AWS Identity and Access Management (IAM), which controls access to clusters and resources using roles and policies. Database users can be created with multi-factor authentication (MFA) for added security, and federated access is supported via identity providers such as Active Directory Federation Services (ADFS), allowing single sign-on (SSO) without storing credentials in the database. This federated approach uses SAML 2.0 assertions to map external identities to database roles, enabling access for enterprise users. Additionally, AWS IAM Identity Center facilitates centralized identity management across AWS services.[65][66][64]
Data encryption in Redshift protects information both at rest and in transit. At rest, clusters are encrypted by default using AES-256, with keys managed by AWS Key Management Service (KMS); customers can use their own KMS keys for greater control over encryption policies and key rotation. In transit, data between clients and clusters is secured via SSL/TLS connections, and hardware-accelerated SSL protects data movement within the AWS network, such as during loads from Amazon S3. These options allow for server-side or client-side encryption configurations to meet specific security needs.[67][68][64]
Network security features isolate Redshift clusters within customer-defined environments. Clusters can be deployed in an Amazon Virtual Private Cloud (VPC) for private networking, using security groups as virtual firewalls to control inbound and outbound traffic based on IP addresses and ports. Publicly accessible endpoints are available but not recommended for production; instead, private endpoints and VPC peering provide secure connectivity without exposing clusters to the public internet. Enhanced VPC support includes integration with AWS PrivateLink for private access to the query editor.[69][64]
Auditing capabilities in Redshift enable detailed tracking of activities for compliance and incident response. AWS CloudTrail logs all API calls to clusters, capturing management actions like create, delete, and modify operations, with logs delivered to Amazon S3 for analysis. Database-level auditing includes connection logs (authentication attempts and sessions), user logs (changes to user privileges), and user activity logs (SQL queries), which can be streamed to Amazon CloudWatch for real-time monitoring or exported to S3 for long-term storage. These logs support integration with Amazon GuardDuty for threat detection.[70][71][64]
Redshift complies with major industry standards, including SOC 1, SOC 2, and SOC 3 for financial reporting, audit, and security controls; PCI DSS for payment card data protection; HIPAA for handling protected health information; and GDPR for data privacy in the European Union. As of 2025, Redshift supports data residency requirements by allowing clusters to be provisioned in multiple AWS Regions worldwide, ensuring data remains within specified geographic boundaries to meet local regulations. Compliance reports are available through AWS Artifact for validation.[72][73][74]
Advanced Analytics Capabilities
Amazon Redshift provides several built-in features that enable sophisticated data analysis beyond standard SQL querying, leveraging its query execution engine to support complex workloads efficiently.[40] These capabilities allow users to perform advanced computations, integrate machine learning, and query heterogeneous data sources within the data warehouse environment.
Materialized views in Amazon Redshift store pre-computed results of complex queries, significantly accelerating the execution of frequent and predictable analytical operations. By caching these results, materialized views reduce the need to recompute resource-intensive joins, aggregations, or transformations each time, improving query performance for dashboards and reports. Users can create materialized views based on Redshift tables or external data via Amazon Redshift Spectrum, and can refresh them manually with the REFRESH MATERIALIZED VIEW command or enable automated maintenance that uses machine learning to detect workload patterns and refresh accordingly.[75][76][77][47]
Concurrency scaling enhances Redshift's ability to handle variable workloads by automatically provisioning up to 10 additional clusters during peak demand, ensuring consistent performance for thousands of concurrent queries without manual intervention. This feature is particularly useful for bursty analytics scenarios, such as end-of-month reporting, where it adds temporary capacity and removes it once the spike subsides, with eligible clusters receiving up to one hour of free usage credits per day.[78][79]
For approximate analytics, Redshift includes built-in HyperLogLog functions for efficient estimation of distinct value counts (cardinality) in large datasets, enabling scalable analysis of high-cardinality data without full scans.[80][81] Additionally, Amazon Redshift ML allows SQL users to create, train, and deploy models directly using familiar SQL commands, with integration to Amazon SageMaker for advanced model training on external compute resources while keeping inference within the Redshift cluster for low-latency predictions.[82][83]
The federated query engine in Redshift facilitates joining internal data with external sources, such as Amazon RDS databases, Amazon S3 data lakes, or remote relational databases like MySQL, without data movement. This capability supports hybrid analytics by allowing SQL queries to span operational and analytical stores, optimizing for scenarios like real-time reporting across silos.[49][84]
As of 2025, Redshift incorporates generative AI features through Amazon Q, enabling natural language querying in the Redshift Query Editor v2 to generate SQL statements from conversational prompts, which accelerates development and broadens access for non-expert users while maintaining security through role-based permissions.[85][86][87]
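Two of these capabilities can be illustrated briefly in SQL. The statements below are sketches with hypothetical table, column, bucket, and role names, not production definitions:

```sql
-- Materialized view that pre-computes a common aggregation; AUTO REFRESH
-- asks Redshift to keep it up to date in the background.
CREATE MATERIALIZED VIEW mv_daily_revenue
AUTO REFRESH YES
AS
SELECT sale_date, region, SUM(amount) AS revenue
FROM sales_fact
GROUP BY sale_date, region;

-- Redshift ML: train a model from a SQL query; training runs in SageMaker
-- and the resulting prediction function becomes callable from SQL.
CREATE MODEL churn_model
FROM (SELECT age, tenure_months, monthly_spend, churned FROM customer_history)
TARGET churned
FUNCTION predict_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftMLRole'
SETTINGS (S3_BUCKET 'example-ml-bucket');
```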
Performance and Scaling
Optimization Techniques
Amazon Redshift provides several optimization techniques to improve query performance and resource efficiency by addressing data maintenance, workload prioritization, table design, and monitoring. Users can manually intervene to tune clusters, ensuring faster execution times and better utilization of compute resources. These methods focus on post-load maintenance, query routing, and schema choices to minimize bottlenecks.
To maintain table efficiency after data manipulation language (DML) operations like inserts, updates, or deletes, users should regularly execute the VACUUM command, which reclaims space from deleted rows and re-sorts data to optimize storage and query speed. Amazon Redshift automatically performs VACUUM DELETE operations in the background based on the volume of deleted rows, but manual VACUUM is recommended after significant DML activity to further compact tables and restore sort order. Complementing this, the ANALYZE command updates table statistics, such as column value distributions, enabling the query optimizer to generate more efficient execution plans. Running ANALYZE routinely at the end of load or update cycles ensures accurate cardinality estimates, reducing unnecessary data scans.[88][58]
Workload management (WLM) allows users to configure query queues that prioritize different workloads, preventing long-running queries from blocking short ones and improving overall throughput. In manual WLM, administrators define multiple queues with allocated memory percentages and assign user groups or query types, such as short analytical queries versus complex ETL jobs, to specific queues based on rules like query text patterns or user roles. Automatic WLM, powered by machine learning, dynamically adjusts concurrency and memory allocation across up to eight queues to maximize resource use without manual configuration. This setup is particularly effective for mixed workloads, where short queries complete faster by avoiding contention with resource-intensive operations.[89][90]
Selecting appropriate distribution and sort keys during table creation is a core best practice for minimizing data movement and I/O during joins and scans. Distribution keys determine how rows are spread across cluster nodes; choosing a key on frequently joined columns co-locates related data, reducing network traffic, while EVEN distribution suits tables without clear join patterns to balance load. Sort keys, often compound for time-series data or interleaved for multi-dimensional queries, enable zone maps that skip irrelevant blocks during reads, significantly speeding up range filters. As of 2024, improved autonomics algorithms provide smarter recommendations for distribution and sort keys to automate these optimizations. To avoid data skew, where uneven distribution leads to hotspots on specific nodes, users should analyze row counts per slice using system views and select keys with high cardinality and uniform values, such as customer IDs for sales tables. Compression techniques in storage further enhance these benefits by reducing disk I/O for sorted data.[91][92][34][24]
Temporary tables offer a lightweight way to stage intermediate results in ETL pipelines, improving performance by avoiding persistent storage overhead. Created with the TEMPORARY keyword, these tables can inherit or specify distribution and sort keys to align with parent tables, ensuring efficient joins without full data redistribution. They are session-scoped, automatically dropped at session end, and support encodings for compression, making them ideal for complex transformations where materializing subsets reduces main query complexity.[93]
Short query acceleration (SQA) targets small, ad-hoc queries for sub-second responses by routing them to a dedicated, concurrency-aware space separate from the main WLM queues. Enabled by default with a configurable maximum runtime (1-20 seconds), SQA uses machine learning to predict and prioritize queries likely to finish quickly, bypassing queues for long-running jobs and maintaining high throughput in interactive scenarios. This feature is especially valuable for business intelligence dashboards, where user responsiveness directly impacts productivity.[94]
Effective monitoring relies on system tables and views to track key metrics, guiding further optimizations. The STL_QUERY_METRICS table logs per-segment details like CPU time, rows processed, and I/O for completed queries, helping identify bottlenecks such as high CPU skew indicating uneven workloads. For cluster-wide insights, Amazon CloudWatch metrics provide CPU utilization percentages and query throughput (queries per second), alerting on thresholds like sustained 80% utilization that may signal under-provisioning. Querying system tables and views, such as STV_WLM_QUERY_STATE for queue wait times, enables proactive tuning of WLM configurations or keys based on real-time diagnostics.[95][96][97]
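The maintenance commands and system tables described above are invoked directly in SQL. A small illustrative sketch (the table name is hypothetical):

```sql
-- Reclaim space and restore sort order after heavy DML, then refresh
-- the optimizer statistics for the same table.
VACUUM FULL sales_fact;
ANALYZE sales_fact;

-- Inspect recent statements with the highest CPU time using the
-- query-metrics system log.
SELECT query, segment, cpu_time, rows
FROM stl_query_metrics
ORDER BY cpu_time DESC
LIMIT 10;
```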
Scaling Options
Amazon Redshift provides multiple scaling options to accommodate growing data volumes and increasing query concurrency, allowing users to adjust compute resources dynamically without significant disruptions. These options include vertical and horizontal scaling for provisioned clusters, concurrency scaling for handling peak loads, and a serverless deployment model that automates resource management. By leveraging these mechanisms, Redshift maintains efficient performance as workloads evolve.[98][78][99]
Vertical scaling in Redshift involves resizing the cluster by changing the node type to increase compute and memory capacity per node, such as upgrading from dc2.large to dc2.8xlarge nodes. This process uses elastic resize, which typically completes in minutes with minimal downtime; the cluster remains available for read-only queries during data redistribution, and full availability is restored shortly after. Elastic resize supports changes within the same node family or to RA3 nodes, enabling up to a 2x increase in node count for DC2 clusters or 4x for certain RA3 configurations, making it suitable for steady workload growth. In October 2024, AWS introduced ra3.large instances for improved price-performance in smaller-scale deployments.[98][100][101]
Horizontal scaling is available primarily in RA3 clusters, where users can add or remove compute nodes independently of storage, thanks to managed storage that decouples compute from data capacity. This elastic approach allows quick adjustments to the number of nodes, up to 128 per cluster, to handle larger datasets or higher throughput, with resize operations completing in about 10 minutes and limited downtime similar to vertical scaling. By scaling compute horizontally, RA3 clusters provide flexibility for variable workloads while optimizing costs through independent storage scaling.[98]
For provisioned clusters, concurrency scaling automatically adds temporary compute capacity during peak periods to support thousands of concurrent users and queries with consistently fast performance. This feature offloads eligible queries to up to 10 additional scaling clusters, enabling burst capacity that can increase overall throughput significantly, for example handling write operations like ETL jobs without impacting the main cluster. As of 2024, concurrency scaling has been expanded to support write queries involving complex data types such as SUPER, GEOMETRY, and GEOGRAPHY. Users configure it via workload management queues, and charges apply only for active scaling usage, making it ideal for intermittent high-demand scenarios.[78][79][102]
Amazon Redshift Serverless offers an automated scaling model where resources are provisioned and scaled elastically based on workload demands, eliminating the need for manual cluster management. It uses Redshift Processing Units (RPUs) to measure and adjust compute capacity dynamically, scaling up for complex queries and down during idle periods. In 2024, AI-driven scaling and optimization enhancements were introduced, providing up to 10x better price-performance for variable workloads by learning patterns and adjusting resources proactively. Billing is based on RPU-hours consumed, with no charges when the endpoint is idle, providing a pay-per-use approach for unpredictable or sporadic analytics needs. AWS Graviton processors in Serverless offer up to 30% better price-performance as of 2024.[99][103][104][24]
Provisioned Redshift clusters support pause and resume operations to control costs during low-activity periods, suspending compute billing while retaining data in storage. Pausing a cluster takes effect within minutes via the console, CLI, or API, and resuming restores full functionality shortly after, allowing users to schedule these actions for non-production hours without data loss or reconfiguration. This capability, available since 2020, complements other scaling options by enabling on-demand resource suspension in provisioned environments.[105][106]
Integrations
With AWS Services
Amazon Redshift integrates with Amazon Simple Storage Service (S3) for efficient data ingestion, leveraging the COPY command to load data in parallel from multiple files stored in S3 buckets, utilizing Redshift's massively parallel processing architecture.[107] Additionally, Amazon Redshift Spectrum extends this integration by enabling direct SQL queries on exabyte-scale data in S3 without the need to load it into Redshift tables, allowing users to analyze data lakes alongside warehouse data. As of November 17, 2025, Redshift supports writing to Apache Iceberg tables in S3, enabling updates and inserts to open-format data lakes for enhanced lakehouse workflows.[5][55]
AWS Glue facilitates ETL processes by connecting to Redshift databases, moving data through S3 for maximum throughput via the COPY and UNLOAD commands, and supporting visual ETL job authoring in Glue Studio for data preparation and transformation.[108] For database migrations, AWS Database Migration Service (DMS) uses S3 as an intermediary to transfer data from sources like Oracle, PostgreSQL, or other databases into Redshift, enabling continuous replication and schema conversion.[109]
In analytics pipelines, Redshift connects with Amazon QuickSight for interactive visualization and dashboarding, supporting secure access through IAM roles, trusted identity propagation, or database credentials to query live data directly. This integration allows QuickSight users to build reports and perform ad-hoc analysis on Redshift datasets. Complementing this, Amazon Athena provides federated querying capabilities via its Redshift connector, enabling ad-hoc SQL queries on Redshift data from S3-based environments or shared catalogs in the AWS Glue Data Catalog, which is useful for exploring shared data without dedicated clusters.[110]
For machine learning workflows, Redshift offers direct integration with Amazon SageMaker through Redshift ML, allowing users to train models on Redshift data using SageMaker algorithms and perform in-database inference via SQL, including access to SageMaker endpoints for remote predictions.[82] Similarly, integration with Amazon Bedrock, announced in October 2024, enables the creation of external models using Bedrock's large language models (LLMs) for generative AI applications, where users can invoke LLMs directly from SQL queries on Redshift data to support tasks like text generation or summarization.[111]
A notable integration is zero-ETL with Amazon Aurora and Amazon RDS, introduced in 2022, which automates near real-time data replication from these operational databases to Redshift without traditional ETL processes, supporting MySQL and PostgreSQL engines for timely analytics and ML on transactional data.[112] Redshift Data Sharing facilitates secure, live data access across AWS accounts and regions without data copying, using datashares to enable collaboration on transactionally consistent datasets while maintaining governance and encryption.[113]
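As a sketch of the Spectrum integration described above (database, bucket, and role names are placeholders), an external schema and table can be defined over data already in S3 and then queried alongside local tables:

```sql
-- External schema backed by the AWS Glue Data Catalog.
CREATE EXTERNAL SCHEMA spectrum_lake
FROM DATA CATALOG
DATABASE 'example_lake_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- External table over Parquet files in S3; no data is loaded into the cluster.
CREATE EXTERNAL TABLE spectrum_lake.clickstream (
    event_time TIMESTAMP,
    user_id    BIGINT,
    url        VARCHAR(2048)
)
STORED AS PARQUET
LOCATION 's3://example-bucket/clickstream/';

-- External and local tables can be combined in a single query.
SELECT url, COUNT(*) AS views
FROM spectrum_lake.clickstream
GROUP BY url
ORDER BY views DESC
LIMIT 20;
```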
With External Tools
Amazon Redshift supports integration with a variety of business intelligence (BI) tools through its ODBC and JDBC drivers, which enable direct querying and visualization of data stored in Redshift clusters.[114][115] For instance, Tableau connects to Redshift using these drivers to create interactive dashboards and perform data analysis.[116] Similarly, Power BI leverages ODBC/JDBC connectivity to import Redshift data for reporting and insights.[117] Looker also integrates via JDBC, allowing users to build semantic models and explore Redshift datasets within its platform.
For extract, transform, load (ETL) and extract, load, transform (ELT) workflows, Redshift is compatible with popular open-source and third-party tools that facilitate data pipeline orchestration and transformations. Apache Airflow, often managed via Amazon Managed Workflows for Apache Airflow (MWAA), can schedule and automate data loading into Redshift.[118] dbt (data build tool) supports ELT processes directly on Redshift, enabling modular SQL transformations within the warehouse.[118] Matillion, a cloud-native ETL platform, provides drag-and-drop orchestration for loading and transforming data into Redshift environments.[119]
Redshift's compatibility extends to programming languages and SQL clients, allowing developers to interact with clusters programmatically. In Python, the psycopg2 library or the official Amazon Redshift Python connector facilitates connections for querying and data manipulation.[120][121] R users can connect via packages like RJDBC or RPostgres to analyze Redshift data statistically.[122] SQL clients such as DBeaver support Redshift through a built-in driver, offering a graphical interface for schema management and query execution.[123]
A key enabler of these integrations is Redshift's adherence to the PostgreSQL wire protocol, which allows tools designed for PostgreSQL to connect without custom modifications.[124] This includes pgAdmin, a popular open-source administration tool, which can manage Redshift connections for database exploration and maintenance.[125] Federated query capabilities support access to on-premises databases using AWS PrivateLink for secure, private connectivity over the AWS network.[126] Redshift's SQL compatibility underpins these external tool interactions, providing a familiar interface for standard querying.[6]
Pricing
Cost Models
Amazon Redshift offers several pricing models designed to accommodate different workload patterns and budget requirements, primarily through provisioned clusters, reserved instances, and a serverless option.[31] On-demand pricing for provisioned clusters bills users per hour of compute usage, with rates varying by node type and region; for example, in US East (N. Virginia), the dc2.large node starts at $0.25 per hour, while higher-end ra3.16xlarge nodes cost up to $13.04 per hour as of November 2025.[31] Reserved instances provide cost savings for committed usage, offering 1-year or 3-year terms with options for no upfront, partial upfront, or all upfront payments, potentially reducing costs by up to 75% compared to on-demand rates depending on the commitment and node type.[31]
The serverless model shifts billing to a pay-per-query basis, charging for Redshift Processing Units (RPUs) consumed, with rates at $0.36 per RPU-hour, billed per second with a 60-second minimum.[31] Serverless Reservations provide up to 24% savings for 1-year commitments as of April 2025.[127] This approach eliminates the need for cluster provisioning and automatically scales compute resources, making it suitable for unpredictable or bursty workloads.[31]
Additional costs beyond compute include managed storage at $0.024 per GB-month for Redshift Managed Storage (RMS), which separates storage from compute to enable independent scaling.[31] Data transfer out from Redshift to the internet follows standard AWS rates, starting at $0.09 per GB for the first 10 TB per month in US East (N. Virginia).[31] Queries via Amazon Redshift Spectrum, which access data in Amazon S3, incur charges of $5 per TB of data scanned, regardless of compression.[31] Concurrency scaling, which allows clusters to handle spikes in concurrent queries by adding temporary capacity, is included at no extra cost for the first hour per day per cluster; additional usage beyond that is billed at the on-demand rate of the provisioned cluster's node type, per second with no minimum.[31] These models integrate with Redshift's scaling options, such as elastic resize, to influence overall compute consumption and thus costs.[31]
| Pricing Component | Key Rate (US East, N. Virginia) | Billing Unit |
|---|---|---|
| On-Demand (e.g., ra3.4xlarge) | $3.26 per node-hour | Per hour |
| Reserved Instances | Up to 75% discount on on-demand | 1- or 3-year term |
| Serverless | $0.36 per RPU-hour | Per second (60s min) |
| Managed Storage | $0.024 per GB-month | Per month |
| Spectrum Queries | $5 per TB scanned | Per query |
| Concurrency Scaling (excess) | On-demand node rate | Per second after 1 free hour/day |
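As an illustrative calculation using the on-demand rate in the table (actual bills depend on region, hours of operation, and storage consumed), a provisioned cluster of four ra3.4xlarge nodes running continuously for a 730-hour month would incur roughly 4 × $3.26 × 730 ≈ $9,519 in compute charges, before managed storage, Spectrum scans, or excess concurrency-scaling usage.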