Amazon Athena

Understand Athena as serverless SQL over S3 data, including workgroups, Glue Data Catalog, partitioning, file formats, cost controls, and SAA-C03 signals.

Foundation · 5 min read · Updated 2026-06-03 · Cloud, Certification, Data, Cost, Operations

Serverless Query · SQL · Data Lake · Workgroup · Glue Data Catalog · Partitioning · Columnar Format · Query Result Location

After this, you will understand

Athena makes data lake querying feel concrete: keep data in S3, describe it in a catalog, and query it with SQL when needed.

Plain version

Amazon Athena is a serverless query service that runs SQL against data stored in S3 and other supported data sources.

Decision pressure

Learners often build or keep a data warehouse running just to serve occasional ad hoc S3 queries, or forget that file format and partitioning determine how much each Athena query costs.

Exam-ready model

Use Athena for serverless, pay-per-query SQL over S3 data, especially logs, exports, data lake files, and ad hoc analysis.

Think before reading: Why can Athena be cheaper than Redshift for occasional queries?

Athena does not require a running warehouse; it charges by data scanned, so occasional well-partitioned S3 queries can be cost-effective.

Concepts Covered

Serverless SQL querying
S3 data lakes
Glue Data Catalog
Workgroups
Query result locations
Partitioning
Parquet and ORC columnar formats
Compression
Athena versus Redshift, Glue, and QuickSight
SAA-C03 query-cost traps

1. Plain-English Mental Model

Amazon Athena is serverless SQL for data stored in S3 and other supported sources.

The simple model is:

files in S3 + table metadata in catalog -> Athena SQL query -> result in S3

You do not provision database servers. You point Athena at data, usually described through the AWS Glue Data Catalog, then run SQL queries. Athena reads the underlying files and writes query results to an S3 location.
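The "table metadata in catalog" part of the model is usually created with a DDL statement that points at an S3 prefix. The sketch below builds such a statement as a string so its shape is visible. The database, table, column, and bucket names are hypothetical, and the helper `build_create_table_ddl` is illustrative only; nothing here calls AWS.

```python
def build_create_table_ddl(database, table, columns, partition_keys, location):
    """Return a CREATE EXTERNAL TABLE statement for Parquet files under an S3 prefix."""
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns)
    parts = ", ".join(f"`{name}` {typ}" for name, typ in partition_keys)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {cols}\n"
        f")\n"
        f"PARTITIONED BY ({parts})\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Hypothetical names, chosen only for illustration.
ddl = build_create_table_ddl(
    database="logs_db",
    table="app_logs",
    columns=[("request_id", "string"), ("status", "int")],
    partition_keys=[("dt", "string")],
    location="s3://example-data-lake/app_logs/",
)
print(ddl)
```

The statement only records where the files are and how to read them. The data stays in S3, which is why dropping an external table does not delete the underlying objects.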

Athena is excellent for ad hoc analysis, log queries, data lake exploration, and situations where keeping a warehouse running would be unnecessary.

It is not the same as Redshift. Redshift is a data warehouse that loads and manages its own data. Athena is a serverless query engine that reads data where it already sits, typically in S3.

2. Why This Service Exists

S3 often becomes the place where data lands first.

CloudTrail logs, ALB logs, exports, event files, data lake tables, CSVs, Parquet files, and archived datasets all live in S3. Teams need to query that data without loading everything into a warehouse first.

Athena exists to make S3 queryable with SQL.

For SAA-C03, Athena appears in questions about serverless querying, SQL over S3 logs, ad hoc analysis, no infrastructure management, Glue Data Catalog, partitioned data, columnar formats, and paying based on data scanned.

The service boundary: Glue catalogs and transforms data. Athena queries data. Redshift warehouses data. QuickSight visualizes data.

3. The Naive Approach And Where It Breaks

The naive pattern is loading every dataset into a database:

S3 logs -> ETL -> database -> query once

This can be wasteful if the data is queried only occasionally. Querying the files in place with a serverless engine avoids both the load step and the idle database.

Another naive pattern is running Athena against raw unpartitioned CSV files. It works, but it may scan much more data than necessary and cost more.

Another mistake is expecting Athena to be a low-latency transactional database. Athena is for analytical queries over datasets, not per-request application reads.

Athena works best when S3 data is organized, partitioned, compressed, and stored in efficient formats.

4. Core Primitives

A database and table definition describe data structure in a catalog.

The AWS Glue Data Catalog is commonly used as Athena's metastore.

A workgroup controls query settings, result locations, encryption, limits, and cost governance.

Query results are stored in S3.

Partitions let queries skip irrelevant S3 prefixes, such as folders for dates or Regions the query does not ask about.

Columnar formats such as Parquet and ORC reduce scanned data when queries read only selected columns.

Compression reduces storage and scan volume.

Federated queries can connect to additional data sources through connectors, but S3 data lake querying is the core SAA-C03 mental model.
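The partition primitive above can be made concrete with a toy model. The sketch below assumes Hive-style `name=value` folder names in the object keys and made-up object sizes. It only imitates the idea of pruning in plain Python; Athena itself decides what to read from the partition metadata in the catalog.

```python
def partition_values(key):
    """Extract Hive-style partition values (name=value path segments) from an S3 key."""
    values = {}
    for segment in key.split("/")[:-1]:  # skip the file name itself
        if "=" in segment:
            name, _, value = segment.partition("=")
            values[name] = value
    return values


def prune(objects, **wanted):
    """Keep only (key, size) pairs whose partition values match every filter."""
    return [
        (key, size)
        for key, size in objects
        if all(partition_values(key).get(n) == v for n, v in wanted.items())
    ]


# Hypothetical keys and sizes in bytes.
objects = [
    ("logs/region=us-east-1/dt=2024-01-01/part-0.parquet", 400),
    ("logs/region=us-east-1/dt=2024-01-02/part-0.parquet", 500),
    ("logs/region=eu-west-1/dt=2024-01-01/part-0.parquet", 300),
]

selected = prune(objects, region="us-east-1", dt="2024-01-02")
scanned = sum(size for _, size in selected)
total = sum(size for _, size in objects)
print(f"scanned {scanned} of {total} bytes")  # scanned 500 of 1200 bytes
```

A query that filters on `region` and `dt` touches one prefix instead of three. A query with no partition filter would read all 1200 bytes.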

5. Architecture Use Cases

Use Athena to query CloudTrail, VPC Flow Logs, ALB logs, and application logs stored in S3.

Use Athena for data lake exploration:

S3 data lake -> Glue Data Catalog -> Athena SQL -> analyst query

Use Athena to validate data before loading curated datasets into Redshift.

Use Athena with QuickSight for dashboards over S3-based datasets when performance and concurrency fit.

Use Glue crawlers or jobs to create catalog tables and transform raw data into optimized formats.

Use workgroups to separate teams, enforce result locations, and control query spending.
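As a rough sketch of the workgroup use case, the function below assembles the kind of settings dictionary that boto3's Athena `create_work_group` call accepts as its `Configuration` argument. The bucket name and the 1 GB cutoff are assumptions made for illustration, and the code only builds the dictionary; it does not call AWS.

```python
def workgroup_configuration(result_location, max_bytes_per_query):
    """Build workgroup settings that pin the result location and cap scanned bytes."""
    return {
        "ResultConfiguration": {
            "OutputLocation": result_location,
            "EncryptionConfiguration": {"EncryptionOption": "SSE_S3"},
        },
        # Override any client-side settings with the workgroup's own.
        "EnforceWorkGroupConfiguration": True,
        "PublishCloudWatchMetricsEnabled": True,
        # Queries that would scan more than this many bytes are cancelled.
        "BytesScannedCutoffPerQuery": max_bytes_per_query,
    }


# Hypothetical bucket and limit for one team.
config = workgroup_configuration(
    result_location="s3://example-athena-results/analytics-team/",
    max_bytes_per_query=1_000_000_000,
)
print(config["BytesScannedCutoffPerQuery"])
```

Giving each team its own workgroup with its own result prefix and cutoff keeps one team's runaway query from affecting another team's budget or output location.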

7. Security Model

Athena security depends on IAM, S3 permissions, Glue Data Catalog permissions, workgroup settings, and KMS keys.

Users need permission to run Athena queries, read the underlying S3 data, access catalog metadata, and write query results to S3.

Query result buckets can contain sensitive data. Protect them with encryption, lifecycle rules, and access controls.

Lake Formation can provide fine-grained permissions for data lake access in more advanced architectures.

Do not grant broad S3 read access to every analyst if only specific datasets are approved.
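The permission areas described above (run queries, read approved data, read catalog metadata, write results) could be expressed roughly as the policy document below. This is a simplified sketch, not a complete production policy: the bucket names and prefixes are hypothetical, and the `"*"` resources on the Athena and Glue statements would normally be narrowed to specific workgroup and catalog ARNs.

```python
import json


def analyst_policy(data_bucket, data_prefix, results_bucket, results_prefix):
    """Build an IAM policy document scoped to one dataset and one results prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # Run queries and fetch their status and results.
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                ],
                "Resource": "*",  # assumption: narrow to a workgroup ARN in practice
            },
            {   # Read table and partition metadata from the Glue Data Catalog.
                "Effect": "Allow",
                "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
                "Resource": "*",  # assumption: narrow to specific catalog resources
            },
            {   # List the two buckets involved.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [
                    f"arn:aws:s3:::{data_bucket}",
                    f"arn:aws:s3:::{results_bucket}",
                ],
            },
            {   # Read only the approved dataset prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{data_bucket}/{data_prefix}*",
            },
            {   # Write and read back query results.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{results_bucket}/{results_prefix}*",
            },
        ],
    }


policy = analyst_policy(
    "example-data-lake", "app_logs/", "example-athena-results", "analytics-team/"
)
print(json.dumps(policy, indent=2))
```

Scoping the read statement to `app_logs/` rather than the whole bucket is what keeps an analyst limited to the approved dataset.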

CloudTrail can audit Athena API usage.

8. Reliability And Resilience

Athena is serverless, so there is no cluster to patch or fail over.

Reliability depends on S3 availability, catalog availability, correct permissions, and query design.

If the underlying S3 paths change, partitions are stale, or schemas drift, queries can fail or return incomplete results.

Keep data pipelines idempotent. Catalog updates and partition registration should be part of the ingestion process.
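One way to keep partition registration idempotent is to have the ingestion step issue an `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement each time it writes a new prefix. The sketch below only builds that statement. The table name, the `dt` partition key, and the bucket are hypothetical.

```python
def add_partition_statement(table, dt, base_location):
    """Build a DDL statement that registers one date partition if it is missing."""
    location = base_location.rstrip("/") + f"/dt={dt}/"
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (dt = '{dt}') "
        f"LOCATION '{location}'"
    )


# Hypothetical table and bucket.
stmt = add_partition_statement(
    "app_logs", "2024-01-02", "s3://example-data-lake/app_logs/"
)
print(stmt)
```

Because of `IF NOT EXISTS`, rerunning the same ingestion job after a retry does not fail on an already registered partition.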

Use lifecycle policies carefully on query result buckets so old temporary results do not accumulate indefinitely.

9. Performance And Scaling

Athena performance is shaped by data layout.

Partitioning reduces the number of files scanned. Columnar formats reduce the columns read. Compression reduces the bytes read. Avoid many tiny files, because listing and opening each file adds overhead that can hurt query performance.

Athena is good for ad hoc analysis but not always for high-concurrency dashboard workloads or subsecond interactive applications.

If queries become frequent, predictable, and performance-sensitive, Redshift or a specialized analytics store may fit better.

Query only the columns and partitions needed.

10. Cost Model

Athena charges primarily by the amount of data each query scans. S3 storage and requests, Glue Data Catalog usage, and stored query results are billed separately by those services.

A query over raw uncompressed CSV can cost much more than the same query over partitioned Parquet.
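The arithmetic behind that comparison can be sketched as follows. The rate of 5 USD per TB scanned and the 10 MB per-query minimum are assumptions based on commonly published Athena pricing and vary by Region, so check the current price list. The dataset sizes, and the guess that a partitioned Parquet layout scans about 2% of the raw bytes, are invented purely for illustration.

```python
PRICE_PER_TB_USD = 5.0          # assumption: typical published rate, varies by Region
BYTES_PER_TB = 10**12           # simplification for readable numbers
MIN_BYTES_PER_QUERY = 10**7     # assumption: 10 MB minimum charge per query


def query_cost_usd(bytes_scanned):
    """Estimate the scan charge for one query under the assumptions above."""
    billable = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable / BYTES_PER_TB * PRICE_PER_TB_USD


raw_csv_bytes = 10**12               # hypothetical: full scan of 1 TB of raw CSV
parquet_bytes = raw_csv_bytes // 50  # hypothetical: 2% read after pruning

csv_cost = query_cost_usd(raw_csv_bytes)
parquet_cost = query_cost_usd(parquet_bytes)
print(f"raw CSV: ${csv_cost:.2f}, partitioned Parquet: ${parquet_cost:.2f}")
```

Under these assumptions the same question costs 5.00 USD against the raw CSV and 0.10 USD against the optimized layout, and the gap multiplies with every repeat of the query.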

Workgroups can cap the data scanned per query and enforce settings such as the result location and encryption.

Lifecycle rules should clean query result locations.
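A cleanup rule for a results prefix might look like the configuration below, which follows the shape that boto3's S3 `put_bucket_lifecycle_configuration` call accepts as its `LifecycleConfiguration` argument. The prefix, rule ID, and 30-day retention are assumptions for illustration, and the code only builds the dictionary.

```python
def results_lifecycle(prefix, days):
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    return {
        "Rules": [
            {
                "ID": "expire-athena-query-results",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


# Hypothetical prefix and retention period.
lifecycle = results_lifecycle("analytics-team/", 30)
print(lifecycle["Rules"][0]["Expiration"])
```

Scoping the rule to the results prefix matters when the bucket also holds data that should not expire.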

Athena can be very cost-effective for occasional queries, but careless scans over huge datasets get expensive quickly.

12. SAA-C03 Exam Signals

"Serverless SQL query over data in S3" points to Athena.

"Ad hoc query CloudTrail or ALB logs in S3" points to Athena.

"No infrastructure to manage for querying S3 data" points to Athena.

"Glue Data Catalog table metadata" often appears with Athena.

"Reduce Athena cost" points to partitioning, compression, columnar formats, and limiting scanned data.

"Data warehouse for repeated BI analytics" points to Redshift.

"ETL job or crawler" points to Glue.

13. Common Exam Traps

Do not choose Redshift when the requirement is simple serverless querying over S3.

Do not choose Athena for low-latency application transactions.

Do not ignore scanned data cost.

Do not query raw unpartitioned logs when partitioning is available.

Do not forget query results are written to S3 and need permissions.

Do not confuse Glue crawlers with Athena queries.

Review Amazon S3, AWS Glue, Amazon Redshift, Amazon QuickSight, and S3 Lifecycle And Storage Classes.
