
AWS DataSync

Understand AWS DataSync for online file and object transfer, including agents, locations, tasks, S3, EFS, FSx, NFS, SMB, HDFS, object storage, security, scaling, and SAA-C03 traps.

Foundation | 7 min read | Updated 2026-06-03
Topics: Cloud, Certification, Networking, Reliability, Operations
Tags: AWS DataSync, DataSync Agent, Location, Task, NFS, SMB, Object Storage, Amazon S3, Amazon EFS, Amazon FSx

After this, you will understand

How to draw a clean boundary between online storage migration (DataSync), managed file transfer endpoints (Transfer Family), database migration (DMS), and physical device transfer (Snow Family).

Plain version

AWS DataSync copies file or object data online between storage systems and AWS storage services using managed transfer tasks.

Decision pressure

Learners mistakenly pick DataSync for database migration, for partner SFTP endpoints, or for disconnected sites where an online network transfer is not realistic.

Exam-ready model

Use DataSync when datasets must move online between NFS, SMB, HDFS, object storage, S3, EFS, or FSx with automation, validation, and high throughput.

Think before reading: What is the shortest exam distinction between DataSync and Transfer Family?

DataSync is for managed bulk or recurring storage movement; Transfer Family is for users and partners connecting with file transfer protocols such as SFTP.


Study path

Read these in order

Start with the mechanics, then move into the patterns that explain why the system is shaped this way.

  1. AWS Transfer Family
  2. AWS Snow Family

Concepts Covered

  • AWS DataSync
  • Online file and object transfer
  • Agents, locations, and tasks
  • NFS, SMB, HDFS, and object storage sources
  • Amazon S3, Amazon EFS, and Amazon FSx targets
  • Incremental and scheduled transfer
  • Data integrity validation
  • VPC endpoints and IAM roles
  • DataSync versus DMS, Transfer Family, Storage Gateway, and Snow Family

1. Plain-English Mental Model

AWS DataSync is managed online storage copying.

The simple model is:

source storage location -> DataSync task -> destination storage location

The source might be an on-premises NFS share, SMB share, HDFS cluster, object store, another cloud storage service, S3 bucket, EFS file system, or FSx file system. The destination might be S3, EFS, FSx, or another supported storage location.

DataSync is not an end-user file portal. It is not primarily for partners logging in with SFTP. It is not a database migration tool. It is a service for moving file and object datasets over the network with automation, performance, encryption, integrity checks, and operational visibility.

If DMS is "move database data," DataSync is "move storage data."

2. Why This Service Exists

Copying large file systems sounds simple until it becomes production work.

A team may need to move a 40 TB NFS share into S3. Another team may need to sync research files to FSx for Lustre before analytics jobs. A media company may need recurring transfers into AWS for processing. A data center may need to archive cold files to S3 Glacier storage classes.

The naive answer is often rsync, custom scripts, or manual copy jobs. Those can work at small scale, but they create problems: fragile retries, weak monitoring, slow single-threaded transfer, inconsistent metadata handling, no managed scheduling, poor error reporting, and manual validation.

DataSync exists to make storage movement a managed service instead of a pile of scripts.

For SAA-C03, DataSync appears when the question asks for online transfer of file or object data to, from, or between AWS storage services.

3. The Naive Approach And Where It Breaks

The naive pattern is:

cron job -> copy command -> hope the dataset arrived

This breaks when the dataset is large, when files change during transfer, when retries are needed, when permissions and metadata matter, or when the business asks for repeatable scheduled transfers.

Another mistake is treating all migration as database migration. If the source is an NFS share, SMB share, HDFS cluster, object storage bucket, EFS file system, or FSx file system, DMS is not the right mental model.

Another mistake is using Snow Family when the site has enough network bandwidth and needs recurring transfer. Physical devices help when network transfer is impossible, too slow, or operationally impractical. DataSync is the online transfer service.

4. Core Primitives

A DataSync agent is deployed where DataSync needs access to storage that AWS cannot directly reach, such as on-premises NFS, SMB, HDFS, or object storage. The agent connects the local storage environment to the DataSync service.

A location represents a source or destination. Examples include an NFS location, SMB location, S3 location, EFS location, or FSx location.

A task defines the transfer between source and destination. It includes options such as schedule, filters, verification, bandwidth limits, and metadata behavior.

A task execution is a single run of a task. Each execution has its own status and results.

IAM roles allow DataSync to access AWS storage locations such as S3.

CloudWatch metrics, logs, and task status support operations and troubleshooting.
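The primitives above map onto API calls roughly as follows. This is a minimal sketch, assuming the boto3 DataSync client methods create_location_nfs, create_location_s3, create_task, and start_task_execution. The helper functions, the task name, and every hostname or ARN passed in are hypothetical placeholders. The client is passed as an argument so that it can be a real boto3.client("datasync") or a test double.

```python
def create_nfs_to_s3_task(client, agent_arn, nfs_host, nfs_path, bucket_arn, role_arn):
    """Create a source location, a destination location, and a task linking them.

    `client` is expected to behave like boto3.client("datasync").
    All values come from the caller; none of them are AWS defaults.
    """
    # Source location: on-premises NFS export, reached through the agent.
    source = client.create_location_nfs(
        ServerHostname=nfs_host,
        Subdirectory=nfs_path,
        OnPremConfig={"AgentArns": [agent_arn]},
    )
    # Destination location: S3 bucket, accessed through an IAM role.
    destination = client.create_location_s3(
        S3BucketArn=bucket_arn,
        S3Config={"BucketAccessRoleArn": role_arn},
    )
    # Task: ties source to destination and carries the transfer options.
    task = client.create_task(
        SourceLocationArn=source["LocationArn"],
        DestinationLocationArn=destination["LocationArn"],
        Name="nfs-to-s3-example",  # hypothetical task name
        Options={
            "VerifyMode": "ONLY_FILES_TRANSFERRED",  # verify what was copied
            "TransferMode": "CHANGED",               # incremental: copy only differences
        },
    )
    return task["TaskArn"]


def run_task(client, task_arn):
    """Start one task execution (one run of the task) and return its ARN."""
    response = client.start_task_execution(TaskArn=task_arn)
    return response["TaskExecutionArn"]
```

Keeping location creation, task creation, and task execution as separate steps mirrors the service model: locations and tasks are long-lived definitions, while each execution is a single run with its own status.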

5. Architecture Use Cases

Use DataSync to migrate on-premises NFS or SMB file shares to Amazon S3:

on-premises NFS or SMB -> DataSync agent -> S3 bucket

Use DataSync to move datasets into Amazon EFS when Linux applications in AWS need a managed shared file system.

Use DataSync to move Windows file data into Amazon FSx for Windows File Server.

Use DataSync for recurring ingestion into AWS. For example, a lab uploads daily instrument output to S3 where analytics jobs process it.

Use DataSync for disaster recovery or standby file system seeding when data must be copied regularly to EFS or FSx.

Use DataSync to archive cold on-premises data to S3 storage classes when the requirement is to free local capacity and keep durable cloud storage.

7. Security Model

DataSync security includes the agent, storage credentials, IAM roles, encryption, network paths, and storage policies.

For AWS storage, DataSync uses IAM roles and resource policies where applicable. The role should allow only the required bucket, file system, or storage path access.
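A least-privilege role for an S3 location can be sketched as two documents: a trust policy that lets the DataSync service assume the role, and a permissions policy scoped to one bucket. The function name, the bucket name, and the optional prefix scoping are illustrative assumptions; the action list follows the pattern commonly shown for DataSync S3 access and should be checked against current AWS documentation before use.

```python
import json


def datasync_s3_role_documents(bucket_name, prefix=""):
    """Return (trust_policy, permissions_policy) dicts for a DataSync S3 role.

    Assumes the standard "aws" partition. `prefix` optionally narrows
    object-level access to one key prefix (an illustrative choice).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    object_arn = f"{bucket_arn}/{prefix}*"

    # Who may assume the role: only the DataSync service.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "datasync.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }

    # What the role may do: bucket-level listing plus object-level read/write.
    permissions_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                ],
                "Resource": bucket_arn,
            },
            {
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:GetObject",
                    "s3:GetObjectTagging",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                    "s3:PutObjectTagging",
                ],
                "Resource": object_arn,
            },
        ],
    }
    return trust_policy, permissions_policy


trust, permissions = datasync_s3_role_documents("example-bucket", "incoming/")
print(json.dumps(permissions, indent=2))
```

Scoping the resource to a single bucket (and optionally a prefix) is what keeps the role from becoming a general-purpose S3 credential.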

Data is encrypted in transit. AWS storage services provide encryption at rest options such as S3 server-side encryption and KMS keys.

VPC endpoints can keep traffic to supported AWS services on private network paths where required.

On-premises agents need controlled network access. Treat the agent like production infrastructure: monitor it, patch it through supported mechanisms, and restrict who can configure tasks.

If copying sensitive data into S3, pair DataSync with bucket policies, Block Public Access, KMS, lifecycle policy, access logging, and least-privilege IAM.

8. Reliability And Resilience

DataSync improves transfer reliability through managed retries, task status, verification options, and operational metrics.

It does not make the source storage reliable. If the source file system is inconsistent, offline, or overloaded, the transfer can still fail or lag.

For recurring syncs, monitor task success, bytes transferred, files skipped, errors, and duration. A scheduled task that silently falls behind can create false confidence.
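One way to avoid that false confidence is to check each execution result instead of assuming the schedule worked. The sketch below inspects a dictionary shaped like a DescribeTaskExecution response. The function name and the duration threshold are hypothetical, and treating Result.TotalDuration as milliseconds is an assumption to verify against the API reference.

```python
def summarize_execution(execution, max_duration_seconds):
    """Return a list of problems found in one task execution (empty if healthy).

    `execution` is a dict shaped like a DescribeTaskExecution response.
    """
    problems = []
    status = execution.get("Status")
    result = execution.get("Result", {})

    if status == "ERROR":
        problems.append(f"failed: {result.get('ErrorCode', 'unknown error')}")
    elif status != "SUCCESS":
        problems.append(f"not finished: {status}")

    # A run that reports success but moved fewer files than estimated
    # deserves a closer look.
    estimated = execution.get("EstimatedFilesToTransfer", 0)
    transferred = execution.get("FilesTransferred", 0)
    if status == "SUCCESS" and transferred < estimated:
        problems.append(f"only {transferred} of {estimated} files transferred")

    # Assumption: TotalDuration is reported in milliseconds.
    duration_seconds = result.get("TotalDuration", 0) / 1000
    if duration_seconds > max_duration_seconds:
        problems.append(f"ran {duration_seconds:.0f}s, over the {max_duration_seconds}s budget")

    return problems
```

A scheduled job could call this after every run and raise an alarm whenever the returned list is not empty.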

DataSync can validate transferred data, which matters when migration success must mean "arrived intact," not just "copied without obvious error."

For cutover, plan freeze windows or incremental final syncs if applications continue writing to the source during migration.

9. Performance And Scaling

DataSync is designed for high-speed transfer. It uses a purpose-built transfer protocol with parallel, multithreaded transfers and built-in automation, which typically outperforms manual copy tools for many workloads.

Performance depends on network bandwidth, latency, source storage performance, destination service limits, file count, average file size, metadata operations, agent resources, and task configuration.

Lots of tiny files can behave differently from fewer large files because metadata operations dominate. Bandwidth alone does not determine throughput.
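A toy model makes the point concrete. It is not how DataSync computes anything; the per-file overhead and parallelism values are invented for illustration, and real behavior depends on the storage systems involved.

```python
def estimate_transfer_seconds(total_bytes, file_count, link_bits_per_second,
                              per_file_overhead_seconds, parallel_streams):
    """Toy estimate: time to move the bytes plus time spent on per-file work.

    per_file_overhead_seconds and parallel_streams are hypothetical knobs
    standing in for metadata operations and transfer parallelism.
    """
    data_seconds = total_bytes * 8 / link_bits_per_second
    metadata_seconds = file_count * per_file_overhead_seconds / parallel_streams
    return data_seconds + metadata_seconds


ONE_TB = 10**12
GIGABIT = 10**9

# Same 1 TB over the same 1 Gbps link, with different file counts.
large_files = estimate_transfer_seconds(ONE_TB, 1_000, GIGABIT, 0.01, 16)
tiny_files = estimate_transfer_seconds(ONE_TB, 100_000_000, GIGABIT, 0.01, 16)

print(f"1,000 files of 1 GB:      {large_files / 3600:.1f} hours")
print(f"100 million 10 KB files:  {tiny_files / 3600:.1f} hours")
```

Under these assumed numbers the byte-transfer time is identical in both cases, so the entire difference comes from the per-file term.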

Use bandwidth limits when transfer must not starve production traffic. Use scheduling when business hours and maintenance windows matter.
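As a rough sketch, both controls are task settings: a bandwidth cap is expressed in bytes per second (the BytesPerSecond task option), and a schedule is a schedule expression on the task. The helper name, the 200 Mbps cap, and the 02:00 UTC schedule below are illustrative assumptions.

```python
def throttled_schedule_settings(limit_mbps, schedule_expression):
    """Build the Options and Schedule fragments for a DataSync task.

    limit_mbps is megabits per second; the task option expects bytes per second.
    """
    bytes_per_second = int(limit_mbps * 1_000_000 / 8)
    return {
        "Options": {"BytesPerSecond": bytes_per_second},
        "Schedule": {"ScheduleExpression": schedule_expression},
    }


# Cap at 200 Mbps and run daily at 02:00 UTC (both values are examples).
settings = throttled_schedule_settings(200, "cron(0 2 * * ? *)")
print(settings)
```

The resulting dictionaries could be passed as the Options and Schedule arguments when creating or updating a task.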

Direct Connect or VPN can provide private or more predictable connectivity, but DataSync is the transfer service. Connectivity and transfer orchestration are separate layers.

10. Cost Model

DataSync charges a per-gigabyte fee for the data it copies. Total cost also includes destination storage, storage requests, data transfer charges where applicable, CloudWatch logs, and supporting network connectivity.

Archiving to low-cost S3 storage classes can save money, but lifecycle, retrieval fees, minimum storage duration, and restore time must match requirements.

Recurring syncs cost more over time than a one-time migration. However, they can replace manual operations and reduce the risk of failed scripts or stale data.

If the network is too slow for the dataset size, DataSync runtime and operational delay may be worse than physical transfer. That is where Snow Family historically appears in exam-style questions.
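A back-of-the-envelope calculation shows when the network becomes the deciding factor. The sketch assumes decimal units (1 TB = 10^12 bytes) and a sustained utilization fraction chosen by the caller; the function name and the 80% figure are illustrative.

```python
def days_to_transfer(dataset_terabytes, link_gbps, utilization):
    """Estimate days needed to move a dataset over a link.

    utilization is the fraction of the link the transfer can actually use.
    """
    bits = dataset_terabytes * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


# A 40 TB share over 1 Gbps versus 100 Mbps, both at 80% utilization.
print(f"1 Gbps:   {days_to_transfer(40, 1.0, 0.8):.1f} days")
print(f"100 Mbps: {days_to_transfer(40, 0.1, 0.8):.1f} days")
```

If the estimate exceeds the migration deadline, or the site cannot spare that bandwidth, the question is probably steering toward physical transfer.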

12. SAA-C03 Exam Signals

"Move on-premises NFS or SMB data to S3" points to DataSync.

"Transfer file data to EFS or FSx" points to DataSync.

"Online, scheduled, incremental data transfer" points to DataSync.

"High-speed file or object transfer with validation" points to DataSync.

"Migrate relational databases with CDC" points to DMS, not DataSync.

"Partners need SFTP access to upload files into S3" points to Transfer Family, not DataSync.

"Network transfer would take too long or the site is disconnected" can point to Snow Family or current physical transfer options, not DataSync.
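The signals above can be condensed into a small decision helper. This is a study aid that encodes only the distinctions listed in this section, not an official selection rule; the function and parameter names are invented for illustration.

```python
def suggest_service(moves_database, needs_protocol_endpoint, network_transfer_feasible):
    """Map three exam signals to the service they usually point to."""
    if moves_database:
        # Relational data, CDC, schema-aware migration.
        return "AWS DMS"
    if needs_protocol_endpoint:
        # Users or partners connecting over SFTP and similar protocols.
        return "AWS Transfer Family"
    if not network_transfer_feasible:
        # Disconnected site, or the network transfer would take too long.
        return "AWS Snow Family"
    # Online file or object transfer between storage systems.
    return "AWS DataSync"


print(suggest_service(False, False, True))
print(suggest_service(False, True, True))
```

The order of the checks matters: the kind of data and the kind of access are settled first, and network feasibility only decides between online and physical transfer.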

13. Common Exam Traps

Do not choose DataSync for database replication.

Do not choose Transfer Family for a managed bulk migration task unless the requirement is protocol-based end-user transfer.

Do not assume DataSync removes the need for storage permissions. S3 bucket policy, IAM role, KMS key policy, and file permissions still matter.

Do not ignore file metadata and application consistency. A copied file system is useful only if the application can safely use it.

Do not choose Storage Gateway when the question asks to migrate data and stop using the on-premises storage. Storage Gateway is for ongoing hybrid access, where on-premises applications keep using local file, volume, or tape protocols backed by AWS storage.

Review Amazon S3, Amazon EFS, and Amazon FSx before choosing DataSync targets.

Next, study AWS Transfer Family to separate managed storage movement from managed file transfer endpoints.
