Amazon S3 (Simple Storage Service)

  • S3 allows you to store objects (files) in buckets (directories)
  • Bucket names are globally unique (across all AWS accounts)
  • Buckets are defined at the region level
  • Max object size is 5TB (5,000 GB); bucket size is unlimited
  • If uploading more than 5GB, use "multi-part upload"
  • An S3 bucket cannot be mounted

Bucket Limit

Buckets have a soft limit of 100 & a hard limit of 1,000 per account, but there is no limit on the number of objects.

Amazon S3: Security

User Based

  • IAM Policies - control which API calls are allowed or denied for specific users

Resource Based

  • Bucket Policies - bucket-wide rules set from the S3 console - allows cross-account access (most common)

  • Object Access Control Lists (ACL) - finer grained (can be disabled)

  • Bucket Access Control Lists (ACL) - less common (can be disabled)

Access Control Lists are legacy and should rarely be used.

S3 Bucket Policies

A bucket policy is a type of resource policy. Simple use cases for bucket policies: granting anonymous access to the bucket, or granting access to other AWS accounts.

Note

Identity policies control what an identity can access & can only be attached to identities in the current AWS account. Resource policies control who can access the resource and can reference other AWS accounts for ALLOW/DENY. Resource policies can be used to open a bucket to the public by referencing anonymous principals in the policy.

  • JSON based policies
  • Effect: Allow/Deny
  • Actions: Set of API to allow or deny
  • Principal: The account or user to apply the policy to
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToAccount1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}

Only resource policies have the `Principal` field in the policy statement.
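
As a minimal sketch (the bucket name and policy file are hypothetical), a policy like the one above can be attached with the AWS CLI:

# Attach a bucket policy stored in policy.json
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json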

Amazon S3: Replication

There are 2 types of (asynchronous) replication in S3: CRR (Cross-Region Replication) & SRR (Same-Region Replication).

Use cases:

  • SRR: Log aggregation, prod & test account sync, resilience with strict sovereignty requirements
  • CRR: Global resilience improvements, latency reduction

  • To enable replication, versioning must be enabled on both the source & destination buckets.

  • Buckets to be replicated can be in different AWS accounts.
  • S3 must have appropriate IAM permissions (a replication role) to read from the source & write to the destination bucket.
  • The storage class is maintained from the source bucket during replication by default
  • Ownership: defaults to the source account (in the case of replication across accounts)
  • RTC (Replication Time Control) can be enabled, which provides a 15-minute replication SLA.
  • Once replication is enabled, only new objects are replicated. (To replicate existing objects, S3 Batch Replication can be used.)
  • By default replication is one-way. Bidirectional replication is an additional setting.
  • The source bucket owner needs permissions to the objects
  • System events such as lifecycle changes & objects using the Glacier & Glacier Deep Archive classes are not replicated.
  • Deletes (delete markers) are not replicated by default. This is an additional setting.

Replication Chaining

There is no chaining of replication. If bucket 1 has replication to bucket 2 and bucket 2 has replication to bucket 3, then objects created in bucket 1 are not replicated to bucket 3.

There is an option to enable delete marker replication. This will also replicate delete markers across buckets. Only delete markers are replicated, not actual deletes.
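
As a sketch (the account ID, role, and bucket names are hypothetical), replication including delete marker replication can be configured with the AWS CLI. replication.json (versioning must already be enabled on both buckets):

{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "replicate-everything",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Enabled" },
      "Destination": { "Bucket": "arn:aws:s3:::destination-bucket" }
    }
  ]
}

aws s3api put-bucket-replication --bucket source-bucket --replication-configuration file://replication.json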

Amazon S3: Durability & Availability

  • High durability (99.999999999% or 11 9's). On average, for 10 million objects stored, you can expect to lose 1 object once every 10,000 years. Durability is the same for all storage classes.

  • Availability (S3 Standard): 99.99% availability = up to 53 minutes of downtime per year

Amazon S3: Storage Classes

S3 Standard - General Purpose

  • 99.99 % availability (Stored across 3 AZs)
  • Used for frequently accessed data (milliseconds: first byte latency)
  • Low latency & high throughput
  • Can sustain 2 concurrent facility failures

S3 Infrequent Access

  • Used for data that is less frequently accessed, but requires rapid access when needed
  • Lower cost than S3 Standard, but there is a charge for data retrieval

Amazon S3 Standard-Infrequent Access (S3 Standard IA)

  • 99.99% availability (Stored across 3 AZs)
  • milliseconds: first byte latency
  • New cost component: Retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
  • Use cases: Disaster recovery, backups
  • Should not be used for lots of small files, temporary data, constantly accessed data

Amazon S3 One Zone-Infrequent Access (S3 One Zone IA)

  • Single AZ; data lost if AZ is destroyed
  • Cost component: Retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
  • 99.5% availability
  • Use cases: Storing secondary backup copies of on-premises data, or data that can be recreated
  • Should be used for long-lived, non-critical & replaceable data where access is infrequent

Amazon S3 Glacier

  • Low cost object storage meant for archiving/backup
  • Pricing: Pay for Storage + Object retrieval cost

There are 3 Classes of Storage within Glacier

Glacier Instant Retrieval

  • Millisecond retrieval, great for data accessed once a quarter
  • Minimum storage duration is 90 days

Glacier Flexible Retrieval

  • Can't be made public
  • Retrieval Options: Expedited (1-5 mins), Standard (3-5 hours), Bulk (5-12 hours). Faster = more expensive.
  • First byte latency of minutes/hours
  • Restore jobs restore data temporarily to Standard IA
  • Minimum storage duration is 90 days & minimum billable size of 40KB
  • Use case: Archival data or data accessed yearly
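
As a sketch (the bucket and key are hypothetical), a restore job for a Glacier Flexible Retrieval object can be started with the AWS CLI:

# Temporarily restore an archived object for 7 days using the Standard tier
aws s3api restore-object --bucket my-bucket --key archive/logs-2020.tar \
  --restore-request '{"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}}'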

Glacier Deep Archive

  • Can't be made public
  • Retrieval Options: Standard (12 hours), Bulk (48 hours)
  • Restore jobs restore data temporarily to Standard IA
  • Lowest Cost
  • Minimum Storage Duration of 180 days & 40KB minimum billable size
  • Use case: Secondary long term backup or storage for data when mandated by legal/regulatory requirements

S3 Intelligent-Tiering

  • Small monthly monitoring & auto-tiering fee
  • Moves objects automatically between access tiers based on usage
  • There are no charges for retrieval

Tiers:

  • Frequent Access (automatic) : Default
  • Infrequent Access (automatic): objects not accessed for 30 days
  • Archive Instant Access (automatic) : Objects not accessed for 90 days
  • Archive Access Tier (optional): Configurable from 90 days to 700+ days
  • Deep Archive Access Tier (optional): Configurable from 180 days to 700+ days
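
The optional archive tiers are opted into per bucket. A minimal sketch with the AWS CLI (the bucket name and configuration ID are hypothetical):

# Enable the optional Archive & Deep Archive access tiers
aws s3api put-bucket-intelligent-tiering-configuration --bucket my-bucket \
  --id archive-tiers --intelligent-tiering-configuration '{
    "Id": "archive-tiers",
    "Status": "Enabled",
    "Tierings": [
      {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
      {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
    ]
  }'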

Lifecycle Configuration

If an object is uploaded to the Standard tier, there is a minimum storage duration of 30 days before it can be transitioned into another tier. Additionally, a single rule cannot transition to Standard-IA or One Zone-IA & then to the Glacier classes within 30 days, so the minimum before the Glacier transition in the same rule is 60 days.
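
A minimal lifecycle rule respecting these minimums might look like this (the bucket name and prefix are hypothetical); note the Glacier transition sits 60 days after upload. lifecycle.json:

{
  "Rules": [
    {
      "ID": "tier-down-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 30, "StorageClass": "STANDARD_IA" },
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json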

Amazon S3: Encryption

Warning

Buckets are not encrypted. Objects in buckets are encrypted.

Method                 | Key Management | Encryption Processing | Extras
-----------------------|----------------|-----------------------|---------------------------------------------
Client-Side Encryption | You            | You                   | S3 never sees plaintext
SSE-C                  | You            | S3                    | Key sent in headers over HTTPS on every request
SSE-S3                 | S3 (AES-256)   | S3                    | No key rotation control & no role separation
SSE-KMS                | S3 & KMS       | S3                    | Key control & role separation

SSE-C

  • Server-side encryption using keys fully managed by the customer outside AWS
  • Amazon S3 does not store the encryption key you provide
  • HTTPS must be used
  • The encryption key must be provided in HTTP headers for every request
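
A sketch of an SSE-C upload & download with the AWS CLI (the bucket, key, and 256-bit key file are hypothetical); the same key must be supplied on every request:

# Upload with a customer-provided key (sent in headers over HTTPS)
aws s3 cp secret.txt s3://my-bucket/secret.txt --sse-c AES256 --sse-c-key fileb://sse-c.key
# Download requires the exact same key again
aws s3 cp s3://my-bucket/secret.txt secret-copy.txt --sse-c AES256 --sse-c-key fileb://sse-c.key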

SSE / SSE-S3

  • Encryption using keys handled, managed, & owned by AWS
  • Object is encrypted server side
  • Encryption type is AES-256
  • Must set header "x-amz-server-side-encryption": "AES256"
  • Enabled by default for new buckets & new objects
  • Limitation: does not support role separation

SSE-KMS

  • Encryption using keys handled & managed by AWS KMS
  • Advantages: control over the key & the ability to audit key usage using CloudTrail
  • Object is encrypted server side
  • Must set header "x-amz-server-side-encryption": "aws:kms"

Limitation:

  • While using SSE-KMS you may be impacted by KMS limits
  • When you upload, S3 calls the GenerateDataKey KMS API
  • When you download, S3 calls the Decrypt KMS API
  • Each of these API calls counts towards the KMS requests-per-second quota (5,500, 10,000, or 30,000 req/s depending on the region). This can be increased using the Service Quotas console.
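
A sketch of an SSE-KMS upload with the AWS CLI (the bucket and key alias are hypothetical); this is the kind of call that triggers GenerateDataKey behind the scenes:

aws s3 cp report.pdf s3://my-bucket/report.pdf --sse aws:kms --sse-kms-key-id alias/my-key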

Client-Side Encryption

  • Use client libraries such as Amazon S3 Client-Side Encryption Library
  • Clients must encrypt/decrypt data themselves before sending data to / after retrieving data from S3
  • Customer manages the keys & encryption cycle

Encryption in transit (SSL/TLS)

Encryption in flight is also called SSL/TLS

  • Amazon S3 exposes 2 endpoints:
  • HTTP endpoint
  • HTTPS endpoint

  • HTTPS is recommended & is mandatory for SSE-C

  • Most clients use HTTPS by default

Note

To force encryption in transit, use a bucket policy like the one below:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::your-bucket-name/*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "false"
        }
      }
    }
  ]
}

S3: Bucket Keys

  • For each object written to an S3 bucket, KMS is called to generate a DEK (Data Encryption Key) for that object.
  • KMS calls have limits & incur cost, so to avoid repeated API calls to KMS, Bucket Keys can be used. (Not supported for DSSE-KMS encryption.)
  • When enabled, a time-limited bucket key is generated, which offloads work from KMS to S3, minimizing KMS API calls
  • CloudTrail KMS events will then show the bucket ARN instead of the object ARN
  • Fewer KMS logs
  • Works with replication; the object encryption is maintained
  • If replicating plaintext objects to a bucket using bucket keys, the object is encrypted at the destination (can result in ETag changes)
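
A sketch of enabling a Bucket Key together with default SSE-KMS encryption (the bucket name and key alias are hypothetical):

aws s3api put-bucket-encryption --bucket my-bucket --server-side-encryption-configuration '{
  "Rules": [{
    "ApplyServerSideEncryptionByDefault": { "SSEAlgorithm": "aws:kms", "KMSMasterKeyID": "alias/my-key" },
    "BucketKeyEnabled": true
  }]
}'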

S3: CORS

  • Cross-Origin Resource Sharing (CORS)
  • Origin = scheme(protocol) + host (domain) + port. Eg: https://maheshrijal.com [Protocol - https, port - 443, host - maheshrijal.com]
  • CORS allows a web browser to make requests to other origins while visiting the main origin.
  • Cross-origin requests must be allowed explicitly with the CORS header [Access-Control-Allow-Origin].
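
A sketch of a CORS rule allowing GET requests from one origin (the bucket name is hypothetical; the origin reuses the example above):

aws s3api put-bucket-cors --bucket my-bucket --cors-configuration '{
  "CORSRules": [{
    "AllowedOrigins": ["https://maheshrijal.com"],
    "AllowedMethods": ["GET"],
    "AllowedHeaders": ["*"],
    "MaxAgeSeconds": 3000
  }]
}'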

MFA Delete

  • MFA Delete forces users to generate a code on an MFA device before performing important operations.
  • To use MFA Delete, versioning must be enabled on the bucket.
  • Only the bucket owner (root account) can enable/disable MFA Delete, through the AWS CLI.
  • Once enabled, MFA is required to permanently delete an object version & to suspend versioning on the bucket.
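
A sketch of enabling MFA Delete as the root user (the bucket name, MFA device ARN, and code are hypothetical):

aws s3api put-bucket-versioning --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"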

Access Logs

This option is called Server Access Logging in bucket properties.

  • Requests made to S3, from any account, authorized or denied, will be logged into another S3 bucket.
  • The target logging bucket must be in the same AWS region.
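
A sketch of enabling server access logging (the bucket names and prefix are hypothetical); note the target bucket differs from the monitored bucket:

aws s3api put-bucket-logging --bucket my-bucket --bucket-logging-status '{
  "LoggingEnabled": { "TargetBucket": "my-log-bucket", "TargetPrefix": "access-logs/" }
}'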

Warning

Never set your logging bucket to be the monitored bucket. This creates a logging loop & your bucket will grow exponentially.

Pre-Signed URL

  • Pre-signed URLs can be created with the S3 console or the AWS CLI
  • Users given a pre-signed URL inherit the permissions of the user that generated the URL for GET/PUT (download/upload)
  • You can create a pre-signed URL for an object you don't have access to (but the object will not be accessible through the URL)
  • When using a pre-signed URL, the permissions match the identity which generated it (access denied could mean the generating identity never had access, or does not have access now)
  • Don't generate pre-signed URLs with a role, because the URL stops working when the role's temporary credentials expire.
  • URL Expiration:
    • S3 console: 1 minute - 720 minutes
    • AWS CLI: configure expiration with the --expires-in parameter in seconds. Default 3600 secs, max 168 hours
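
A sketch of generating a pre-signed URL with the AWS CLI (the bucket and key are hypothetical):

# URL valid for 24 hours (86,400 seconds)
aws s3 presign s3://my-bucket/report.pdf --expires-in 86400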

Tip

S3 Select and Glacier Select allow you to use SQL-like statements to retrieve partial objects from S3 and Glacier.
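
A sketch of an S3 Select query against a CSV object (the bucket, key, and columns are hypothetical):

aws s3api select-object-content --bucket my-bucket --key people.csv \
  --expression "SELECT s.name FROM S3Object s WHERE s.city = 'Kathmandu'" \
  --expression-type SQL \
  --input-serialization '{"CSV": {"FileHeaderInfo": "USE"}}' \
  --output-serialization '{"CSV": {}}' \
  output.csv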

Vault & Object Locks

Glacier Vault Lock

  • Adopt a WORM (Write Once Read Many) model for an S3 Glacier vault
  • To enable: create a vault lock policy & then lock the policy against future edits (it can no longer be changed or deleted)
  • Helpful for compliance & data retention

S3 Object Versioning

  • Once enabled, versioning cannot be disabled, only suspended.
  • When versioning is not enabled (or is suspended), the version ID of an object is null.
  • When an object in a versioned bucket is deleted, S3 adds a delete marker to the object.
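
A sketch of enabling versioning and inspecting versions & delete markers (the bucket and key are hypothetical):

aws s3api put-bucket-versioning --bucket my-bucket --versioning-configuration Status=Enabled
# After deleting an object, the delete marker appears in the DeleteMarkers list
aws s3api list-object-versions --bucket my-bucket --prefix notes.txt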

S3 Object Lock

Versioning must be enabled before enabling object lock.

  • Object Lock can be enabled on new buckets (contact AWS Support for existing buckets)
  • Provides a WORM (Write Once Read Many) model - no delete/overwrite
  • Requires versioning - individual versions are locked
  • Blocks deletion of an object version for a specified amount of time.
  • Retention Period: protects the object for a fixed period (days or years); it can be extended.
  • Legal Hold: protects the object indefinitely, independent of the retention period. Legal holds can be freely placed & removed using the s3:PutObjectLegalHold IAM permission (see the sketch after this list).
  • An object version can have both a retention period & a legal hold, only one, or none
  • A bucket can have default Object Lock settings, or settings can be customized per object
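
A sketch of setting a retention period and a legal hold on an object version (the bucket, key, and date are hypothetical):

# Governance-mode retention until a fixed date
aws s3api put-object-retention --bucket my-bucket --key report.pdf \
  --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'
# Independent legal hold, toggled ON/OFF
aws s3api put-object-legal-hold --bucket my-bucket --key report.pdf \
  --legal-hold '{"Status": "ON"}'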

S3 Object Lock - Compliance

  • Object versions can't be overwritten or deleted by any user, including the root user.
  • Object retention modes can't be changed, and retention periods can't be shortened.

S3 Object Lock - Governance

  • Most users can't overwrite, delete, or alter an object version or its lock
  • Some users have special permissions to change the retention or delete the object.
  • Can be bypassed using the s3:BypassGovernanceRetention permission & the x-amz-bypass-governance-retention:true header (console default) in the API request

S3 Object Lock - Legal Hold

  • Set on an object version - ON or OFF
  • No deletes or changes until removed.
  • The s3:PutObjectLegalHold permission is required to add or remove it

S3: Performance Optimization

Multipart Upload

  • Minimum data size of 100 MB to use multipart upload; required for uploads over 5GB
  • Upload is split into a max of 10,000 parts, each 5MB-5GB (the last part is the leftover & can be smaller than 5MB)
  • Each individual part can fail & be restarted in isolation
  • Much better transfer rates
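
High-level commands such as aws s3 cp perform multipart uploads automatically; the low-level flow looks roughly like this (the bucket, key, and part files are hypothetical):

# 1. Start the upload - returns an UploadId
aws s3api create-multipart-upload --bucket my-bucket --key big.bin
# 2. Upload each part (5MB-5GB, except the last), noting each returned ETag
aws s3api upload-part --bucket my-bucket --key big.bin \
  --part-number 1 --body part-1.bin --upload-id "<UploadId>"
# 3. Complete with the list of {ETag, PartNumber} pairs
aws s3api complete-multipart-upload --bucket my-bucket --key big.bin \
  --upload-id "<UploadId>" --multipart-upload file://parts.json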

Accelerated Transfer

  • Global transfer to S3 might not always take the most direct path
  • Use the AWS Edge Locations to speed up transfer
  • Bucket name cannot contain periods & it must be DNS compatible to use Transfer Acceleration
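
A sketch of enabling Transfer Acceleration and routing CLI traffic through the accelerate endpoint (the bucket name is hypothetical):

aws s3api put-bucket-accelerate-configuration --bucket my-bucket \
  --accelerate-configuration Status=Enabled
# Make subsequent aws s3 commands use the accelerate endpoint
aws configure set default.s3.use_accelerate_endpoint true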

Access Points

Access Points simplify security management for large S3 buckets. We can create many access points with different policies & each with different network access controls.

  • Each access point has its own DNS name or endpoint address (internet origin or VPC origin)
  • Each access point has a policy (similar to bucket policy) to manage security at scale.
  • Access point can be created via the console or aws s3control create-access-point --name <> --account-id <> --bucket <>

S3 Access Point - VPC Origin

  • We can define the access point to be accessible only from within the VPC
  • You must create a VPC endpoint to access the Access Point (Gateway or Interface endpoint)
  • The VPC Endpoint Policy must allow access to the target bucket & Access Point

S3 Object Lambda

  • Use AWS Lambda functions to change the object before it is retrieved by the caller application.
  • Only 1 S3 bucket is needed, on top of which we create an S3 Access Point & an S3 Object Lambda Access Point.

Use Cases:

  • Redact PII for analytics or non-production environments.
  • Converting across data formats, such as converting XML to JSON
  • Resizing & watermarking images on the fly using caller-specific details
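
A rough sketch of creating an Object Lambda Access Point with the AWS CLI (the account ID, names, region, and function ARN are all hypothetical; the exact configuration shape should be checked against the s3control documentation):

aws s3control create-access-point-for-object-lambda --account-id 123456789012 \
  --name redact-pii-ap --configuration '{
    "SupportingAccessPoint": "arn:aws:s3:us-east-1:123456789012:accesspoint/my-access-point",
    "TransformationConfigurations": [{
      "Actions": ["GetObject"],
      "ContentTransformation": {
        "AwsLambda": { "FunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:redact-pii" }
      }
    }]
  }'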