Amazon S3 (Simple Storage Service)¶
- S3 allows you to store objects (files) in buckets (directories)
- Bucket names are globally unique (across all AWS accounts)
- Buckets are defined at the region level
- Max object size is 5TB (5000 GB); there is no limit on total bucket size
- If uploading more than 5GB, use "multi-part upload"
- An S3 bucket cannot be mounted
Bucket Limit
Buckets have a soft limit of 100 & a hard limit of 1,000 per account, but there is no limit on the number of objects.
Amazon S3: Security¶
User Based
- IAM Policies - which API calls are allowed or denied for specific users
Resource Based
- Bucket Policies - bucket-wide rules from the S3 console - allows cross-account access (most common)
- Object Access Control Lists (ACLs) - finer grained (can be disabled)
- Bucket Access Control Lists (ACLs) - less common (can be disabled)
Access Control Lists are legacy and should rarely be used.
S3 Bucket Policies¶
A bucket policy is a type of resource policy. Simple use cases for a bucket policy: grant anonymous access to the bucket, or grant access to other AWS accounts.
Note
Identity policies control what an identity can access & can only be attached to identities in the current AWS account. Resource policies control who can access the resource and can reference other AWS accounts for ALLOW/DENY. Resource policies can be used to open a bucket to the public by referencing anonymous principals in the policy.
- JSON based policies
- Effect: Allow/Deny
- Actions: Set of APIs to allow or deny
- Principal: The account or user the policy applies to
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToAccount1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
Only resource policies include a `Principal` element in the policy statement.
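A policy like the one above can be attached from the CLI as well. A minimal sketch, assuming the statement is saved locally as `policy.json` and the bucket is named `my-bucket`:

```bash
# Attach (or overwrite) the bucket policy; the caller needs s3:PutBucketPolicy
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json

# Verify what is currently attached
aws s3api get-bucket-policy --bucket my-bucket
```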
Amazon S3: Replication¶
There are 2 types of (asynchronous) replication in S3: CRR (Cross-Region Replication) & SRR (Same-Region Replication).
Use cases:
- SRR: Log aggregation, prod & test account sync, resilience with strict data sovereignty requirements
- CRR: Global resilience improvements, latency reduction

- To enable replication, versioning must be enabled on both the source & destination buckets.
- Buckets that are to be replicated can be in different AWS accounts.
- S3 needs an IAM role with permissions to read from the source bucket & write to the destination bucket.
- The storage class is maintained from the source bucket by default while replicating.
- Ownership: defaults to the source account (in the case of replication across accounts).
- RTC (Replication Time Control) can be enabled, which provides a 15-minute replication SLA.
- Once replication is enabled, only new objects are replicated. (To replicate existing objects, S3 Batch Replication can be used.)
- By default replication is one-way; bidirectional replication is an additional setting.
- The source bucket owner needs permissions on the objects.
- System events such as lifecycle transitions, and objects stored in Glacier & Glacier Deep Archive, are not replicated.
- Deletes (delete markers) are not replicated by default; this is an additional setting.
Replication Chaining
There is no chaining of replication. If bucket 1 replicates to bucket 2 and bucket 2 replicates to bucket 3, objects created in bucket 1 are not replicated to bucket 3.
There is an option to enable delete marker replication, which replicates delete markers across buckets. Only delete markers are replicated, not actual (permanent) deletes. See the sketch below.
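Replication itself is configured as an IAM role plus a set of rules on the source bucket. A minimal sketch, assuming a hypothetical role `s3-replication-role` and placeholder bucket names:

```bash
# replication.json - one rule replicating everything to the destination bucket
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "ReplicateAll",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::destination-bucket" }
    }
  ]
}
EOF

# Versioning must already be enabled on both buckets
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json
```

Setting `DeleteMarkerReplication` to `Enabled` is the additional setting that replicates delete markers, as described above.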
Amazon S3: Durability & Availability¶
- High durability (99.999999999%, or 11 9's). On average, for 10 million objects stored, you can expect to lose 1 object once every 10,000 years. Durability is the same for all storage classes.
- Availability: S3 Standard offers 99.99% availability = 53 minutes of downtime per year.
Amazon S3: Storage Classes¶
S3 Standard - General Purpose¶
- 99.99% availability (stored across 3 AZs)
- Used for frequently accessed data (milliseconds first-byte latency)
- Low latency & high throughput
- Can sustain 2 concurrent facility failures
S3 Infrequent Access¶
- Used for data that is less frequently accessed, but requires rapid access when needed
- Lower cost than S3 standard, but chargeable for data retrieval
Amazon S3 Standard-Infrequent Access (S3 Standard IA)
- 99.99% availability (Stored across 3 AZs)
- milliseconds: first byte latency
- New cost components: retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
- Use cases: Disaster recovery, backups
- Should not be used for lots of small files, temporary data, constantly accessed data
Amazon S3 One Zone-Infrequent Access (S3 One Zone IA)
- Single AZ; data lost if AZ is destroyed
- Cost component: Retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
- 99.5 % availability
- Use cases: Storing secondary backup copies of on-premises data, or data that can be recreated
- Should be used for long-lived data which is non-critical & replaceable & where access is infrequent
Amazon S3 Glacier¶
- Low cost object storage meant for archiving/backup
- Pricing: Pay for Storage + Object retrieval cost
There are 3 Classes of Storage within Glacier
Glacier Instant Retrieval
- Millisecond retrieval, great for data accessed once a quarter
- Minimum storage duration is 90 days
Glacier Flexible Retrieval
- Can't be made public
- Retrieval options: Expedited (1-5 mins), Standard (3-5 hours), Bulk (5-12 hours). Faster = more expensive.
- First byte latency of minutes/hours
- Restore jobs restore data temporarily to Standard IA
- Minimum storage duration is 90 days & minimum billable size of 40KB
- Use case: Archival data, or data accessed yearly
Glacier Deep Archive
- Can't be made public
- Retrieval Options: Standard (12 hours), Bulk (48 hours)
- Restore jobs restore data temporarily to Standard IA
- Lowest Cost
- Minimum Storage Duration of 180 days & 40KB minimum billable size
- Use case: Secondary long term backup or storage for data when mandated by legal/regulatory requirements
S3 Intelligent-Tiering¶
- Small monthly monitoring & auto-tiering fee
- Moves objects automatically between access tiers based on usage
- There are no charges for retrieval
Tiers:
- Frequent Access (automatic) : Default
- Infrequent Access (automatic): objects not accessed for 30 days
- Archive Instant Access (automatic) : Objects not accessed for 90 days
- Archive Access Tier (optional): Configurable from 90 days to 700+ days
- Deep Archive Access Tier (optional): Configurable from 180 days to 700+ days
lifecycle configuration
If an object is uploaded to the Standard tier, there is a minimum storage duration of 30 days before it can be transitioned into another tier. Additionally, a single rule cannot transition to Standard-IA or One Zone-IA & then to Glacier classes within 30 days - the minimum is 60 days.
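A lifecycle rule that respects those minimums might look like the following sketch (bucket name, rule ID, and prefix are placeholders):

```bash
# lifecycle.json - Standard -> Standard-IA at 60 days -> Glacier at 120 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 60,  "StorageClass": "STANDARD_IA" },
        { "Days": 120, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```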
Amazon S3: Encryption¶
Warning
Buckets are not encrypted. Objects in buckets are encrypted.
| Method | Key Management | Encryption Processing | Extras |
|---|---|---|---|
| Client-Side Encryption | You | You | S3 never sees plaintext |
| SSE-C | You | S3 | |
| SSE-S3 | S3 (AES-256) | S3 | No key rotation control & no role separation |
| SSE-KMS | S3 & KMS | S3 | Key control & role separation |
SSE-C¶
- Server-side encryption using keys fully managed by the customer outside AWS
- Amazon S3 does not store the encryption key you provide
- HTTPS must be used
- The encryption key must be provided in HTTP headers with every request (see the sketch below)
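With the AWS CLI the key is supplied on every request via `--sse-c-key`. A minimal sketch, assuming a locally generated 256-bit key in `key.bin` and a placeholder bucket name:

```bash
# Generate a 256-bit key that only you hold
openssl rand -out key.bin 32

# Upload: S3 encrypts with your key, then discards the key
aws s3 cp file.txt s3://my-bucket/file.txt --sse-c AES256 --sse-c-key fileb://key.bin

# Download: the same key must be supplied again
aws s3 cp s3://my-bucket/file.txt file.txt --sse-c AES256 --sse-c-key fileb://key.bin
```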
SSE / SSE-S3¶
- Encryption using keys handled, managed, & owned by AWS
- Object is encrypted server-side
- Encryption type is AES-256
- Must set header "x-amz-server-side-encryption": "AES256"
- Enabled by default for new buckets & new objects
- Limitation: does not support role separation
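The high-level CLI sets that header for you. A minimal sketch (bucket name is a placeholder):

```bash
# --sse AES256 adds the x-amz-server-side-encryption: AES256 header
aws s3 cp file.txt s3://my-bucket/file.txt --sse AES256
```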
SSE-KMS¶
- Encryption using keys handled & managed by AWS KMS
- Advantages: user control over the key & the ability to audit key usage using CloudTrail
- Object is encrypted server-side
- Must set header "x-amz-server-side-encryption": "aws:kms"
Limitations:
- While using SSE-KMS you may be impacted by KMS limits
- When you upload, S3 calls the GenerateDataKey KMS API
- When you download, S3 calls the Decrypt KMS API
- Each of these API calls counts towards the KMS requests-per-second quota (5,500, 10,000, or 30,000 req/s depending on region). This can be increased using the Service Quotas console.
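A minimal sketch of an SSE-KMS upload with the CLI (bucket name and key alias are placeholders):

```bash
# --sse aws:kms adds the x-amz-server-side-encryption: aws:kms header
aws s3 cp file.txt s3://my-bucket/file.txt \
  --sse aws:kms \
  --sse-kms-key-id alias/my-s3-key
```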
Client-Side Encryption¶
- Use client libraries such as Amazon S3 Client-Side Encryption Library
- Clients must encrypt/decrypt data themselves before sending to / after retrieving from S3
- Customer manages the keys & encryption cycle
Encryption in transit (SSL/TLS)¶
Encryption in flight is also called SSL/TLS.
- Amazon S3 exposes 2 endpoints:
    - HTTP endpoint
    - HTTPS endpoint
- HTTPS is recommended & is mandatory for SSE-C
- Most clients use HTTPS by default
Note
Force encryption in transit with a bucket policy that denies requests made over plain HTTP (using the aws:SecureTransport condition key), as in the sketch below.
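A common shape for such a policy (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
```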
S3: Bucket Keys
- For each object written to an S3 bucket, KMS is called to generate a DEK (Data Encryption Key) for that object.
- KMS calls have limits & incur cost, so to avoid repeated API calls to KMS, Bucket Keys can be used. (Not supported for DSSE-KMS encryption.)
- When enabled, a time-limited bucket key is generated, which offloads work from KMS to S3, minimizing API calls.
- When enabled, CloudTrail KMS events show the bucket ARN instead of the object ARN.
- Fewer KMS logs
- Works with replication; the object encryption is maintained
- If replicating plaintext objects into a bucket using bucket keys, the objects are encrypted at the destination (can result in ETag changes)
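Bucket keys are switched on through the bucket's default encryption settings. A minimal sketch (bucket name and KMS key ARN are placeholders):

```bash
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
```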
S3: CORS¶
- Cross-Origin Resource Sharing (CORS)
- Origin = scheme (protocol) + host (domain) + port. E.g. https://maheshrjl.com [protocol: https, port: 443, host: maheshrjl.com]
- CORS allows web browsers to make requests to other origins while visiting the main origin.
- CORS must be allowed explicitly with the CORS header Access-Control-Allow-Origin.
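On S3, CORS is configured per bucket as a set of JSON rules. A minimal sketch allowing GETs from the origin above (bucket name is a placeholder):

```bash
aws s3api put-bucket-cors \
  --bucket my-bucket \
  --cors-configuration '{
    "CORSRules": [
      {
        "AllowedOrigins": ["https://maheshrjl.com"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000
      }
    ]
  }'
```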
MFA Delete¶
- MFA Delete forces users to generate a code on an MFA device before doing important operations.
- To use MFA Delete, versioning must be enabled on the bucket.
- Only the bucket owner (root account) can enable/disable MFA Delete, through the AWS CLI.
- Once enabled, MFA will be required for permanently deleting an object version & for suspending versioning on the bucket.
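Enabling it requires the root user's credentials plus the MFA device serial and a current code. A minimal sketch (account ID, device name, and code are placeholders):

```bash
# The --mfa value is "<mfa-device-arn> <current-code>"
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"
```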
Access Logs¶
This option is called Server Access Logging in bucket properties.
- Requests made to S3, from any account, authorized or denied, will be logged into another S3 bucket.
- The target logging bucket must be in the same AWS region.
Warning
Never set your logging bucket to be the monitored bucket. This creates a logging loop & your bucket will grow exponentially.
Pre-Signed URL¶
- Pre-signed URLs can be created with the S3 console or AWS CLI
- Users given a pre-signed URL inherit the permissions of the user that generated the URL, for GET/PUT (download/upload)
- You can create a pre-signed URL for an object you don't have access to (but the object will not be accessible through the URL)
- When using a pre-signed URL, the permissions match the identity which generated it (access denied could mean the generating identity never had access, or does not have access now)
- Don't generate pre-signed URLs with a role, because the URL stops working when the temporary credentials expire
- URL expiration:
    - S3 console: 1 minute - 720 minutes
    - AWS CLI: configure expiration with the --expires-in parameter, in seconds (default 3600 secs, max 168 hours) - see the sketch below
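Generating one from the CLI is a single command. A minimal sketch (bucket and key are placeholders):

```bash
# URL valid for 5 minutes, signed with the caller's current credentials
aws s3 presign s3://my-bucket/file.txt --expires-in 300
```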
Tip
S3 Select and Glacier Select allow you to use a SQL-like statement to retrieve partial objects from S3 and Glacier.
Vault & Object Locks¶
Glacier Vault Lock¶
- Adopt a WORM (Write Once Read Many) model for S3 Glacier vaults
- To enable: create a Vault Lock policy & then lock the policy against future edits (it can no longer be changed or deleted)
- Helpful for compliance & data retention
S3 Object Versioning¶
- Once enabled, versioning cannot be disabled, only suspended.
- When versioning is disabled, the version ID of an object is set to null.
- When an object with versioning is deleted, S3 will add a delete marker to the object.
S3 Object Lock¶
Versioning must be enabled before enabling Object Lock.
- Object Lock can be enabled on new buckets (contact AWS Support for existing buckets)
- WORM (Write Once Read Many) model - no delete/overwrite
- Requires versioning - individual versions are locked
- Blocks an object version's deletion for a specified amount of time
- Retention Period: protects the object for a fixed period (days or years); it can be extended
- Legal Hold: protects the object indefinitely, independent of the retention period. Legal Holds can be freely placed & removed using the s3:PutObjectLegalHold IAM permission.
- An object version can have both a Retention Period & a Legal Hold, only one, or none
- A bucket can have default Object Lock settings, or they can be customized per object
S3 Object Lock - Compliance
- Object versions can't be overwritten or deleted by any user, including the root user.
- Object retention modes can't be changed, and retention periods can't be shortened.
S3 Object Lock - Governance
- Most users can't overwrite, delete, or alter an object version or its lock
- Some users can be given special permissions to change the retention or delete the object
- Can be bypassed using the s3:BypassGovernanceRetention permission & the x-amz-bypass-governance-retention:true header (console default)
S3 Object Lock - Legal Hold
- Set on an object version - ON or OFF
- No deletes or changes until removed
- The s3:PutObjectLegalHold permission is required to add or remove it (see the sketch below)
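Both protections are applied per object from the CLI. A minimal sketch (bucket, key, and retain-until date are placeholders):

```bash
# Governance-mode retention until a fixed date
aws s3api put-object-retention \
  --bucket my-bucket --key file.txt \
  --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'

# Independent legal hold; requires s3:PutObjectLegalHold
aws s3api put-object-legal-hold \
  --bucket my-bucket --key file.txt \
  --legal-hold Status=ON
```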
S3: Performance Optimization¶
Multipart Upload
- Recommended for objects over 100 MB (mandatory for objects over 5 GB)
- Upload is split into a maximum of 10,000 parts, each ranging from 5MB-5GB (the last part is the leftover & can be smaller than 5MB)
- Each individual part can fail in isolation & be restarted in isolation
- Much better transfer rates (see the sketch below)
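The high-level `aws s3 cp` performs multipart uploads automatically, but the low-level API makes the three phases visible. A minimal sketch (bucket, key, part file, and the returned UploadId/ETag are placeholders):

```bash
# 1. Start the upload and note the returned UploadId
aws s3api create-multipart-upload --bucket my-bucket --key big-file.bin

# 2. Upload parts (repeat per part; each part can be retried in isolation)
aws s3api upload-part --bucket my-bucket --key big-file.bin \
  --part-number 1 --body part1.bin --upload-id <UploadId>

# 3. Stitch the parts together into the final object
aws s3api complete-multipart-upload --bucket my-bucket --key big-file.bin \
  --upload-id <UploadId> \
  --multipart-upload '{"Parts": [{"PartNumber": 1, "ETag": "<etag-from-step-2>"}]}'
```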
Accelerated Transfer
- Global transfer to S3 might not always take the most direct path
- Use the AWS Edge Locations to speed up transfer
- The bucket name cannot contain periods & must be DNS-compatible to use Transfer Acceleration
Access Points¶
Access Points simplify security management for large S3 buckets. We can create many access points, each with a different policy & different network access controls.
- Each access point has its own DNS name or endpoint address (Internet origin or VPC origin)
- Each access point has a policy (similar to bucket policy) to manage security at scale.
- Access points can be created via the console or:
```bash
aws s3control create-access-point --name <> --account-id <> --bucket <>
```
S3 Access Point - VPC Origin
- We can define the access point to be accessible only from within the VPC
- You must create a VPC endpoint to access the access point (Gateway or Interface endpoint)
- The VPC endpoint policy must allow access to both the target bucket & the access point
S3 Object Lambda¶
- Use AWS Lambda functions to change the object before it is retrieved by the caller application.
- Only one S3 bucket is needed, on top of which we create an S3 Access Point & an S3 Object Lambda Access Point.
Use Cases:
- Redact PII for analytics or non-production environments.
- Converting across data formats, such as converting XML to JSON
- Resizing & watermarking images on the fly using caller-specific details