Amazon S3 (Simple Storage Service)¶
- S3 allows you to store objects (files) in buckets (directories)
- Bucket names are globally unique (across all AWS accounts)
- Buckets are defined at the region level
- Max object size is 5TB (5000 GB); there is no limit on total bucket size
- If uploading more than 5GB, use "multi-part upload"
- An S3 bucket cannot be mounted
Bucket Limit
Buckets have a soft limit of 100 & a hard limit of 1,000 per account, but there is no limit on the number of objects.
Amazon S3: Security¶
User Based
- IAM Policies - which API calls are allowed or denied for specific users
Resource Based
- Bucket Policies - bucket-wide rules from the S3 console - allows cross-account access (most common)
- Object Access Control Lists (ACLs) - finer grained (can be disabled)
- Bucket Access Control Lists (ACLs) - less common (can be disabled)
Access Control Lists are legacy and should rarely be used.
S3 Bucket Policies¶
A bucket policy is a type of resource policy. Simple use cases for a bucket policy: grant anonymous access to the bucket, or grant access to other AWS accounts.
Note
Identity policies control what an identity can access & can only be attached to identities in the current AWS account. Resource policies control who can access the resource and can reference other AWS accounts for ALLOW/DENY. Resource policies can be used to open a bucket to the public by referencing anonymous principals in the policy.
- JSON based policies
- Effect: Allow/Deny
- Actions: Set of APIs to allow or deny
- Principal: The account or user the policy applies to
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAccessToAccount1",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```
Only resource policies include a `Principal` element in the policy statement.
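A policy like the one above can be attached from the CLI as well. A minimal sketch, assuming the statement is saved locally as `policy.json` and the bucket is named `my-bucket`:

```bash
# Attach (or overwrite) the bucket policy; the caller needs s3:PutBucketPolicy
aws s3api put-bucket-policy --bucket my-bucket --policy file://policy.json

# Verify what is currently attached
aws s3api get-bucket-policy --bucket my-bucket
```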
Amazon S3: Replication¶
There are 2 types of (asynchronous) replication in S3: CRR (Cross-Region Replication) & SRR (Same-Region Replication).
Use cases:
- SRR: Log aggregation, prod & test account sync, resilience with strict data sovereignty requirements
- CRR: Global resilience improvements, latency reduction

- To enable replication, versioning must be enabled on both the source & destination buckets.
- Buckets that are to be replicated can be in different AWS accounts.
- S3 needs an IAM role with permissions to read from the source bucket & write to the destination bucket.
- The storage class is maintained from the source bucket by default while replicating.
- Ownership: defaults to the source account (in the case of replication across accounts).
- RTC (Replication Time Control) can be enabled, which provides a 15-minute replication SLA.
- Once replication is enabled, only new objects are replicated. (To replicate existing objects, S3 Batch Replication can be used.)
- By default replication is one-way; bidirectional replication is an additional setting.
- The source bucket owner needs permissions on the objects.
- System events such as lifecycle transitions, and objects stored in Glacier & Glacier Deep Archive, are not replicated.
- Deletes (delete markers) are not replicated by default; this is an additional setting.
Replication Chaining
There is no chaining of replication. If bucket 1 replicates to bucket 2 and bucket 2 replicates to bucket 3, objects created in bucket 1 are not replicated to bucket 3.
There is an option to enable delete marker replication, which replicates delete markers across buckets. Only delete markers are replicated, not actual (permanent) deletes. See the sketch below.
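Replication itself is configured as an IAM role plus a set of rules on the source bucket. A minimal sketch, assuming a hypothetical role `s3-replication-role` and placeholder bucket names:

```bash
# replication.json - one rule replicating everything to the destination bucket
cat > replication.json <<'EOF'
{
  "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
  "Rules": [
    {
      "ID": "ReplicateAll",
      "Status": "Enabled",
      "Priority": 1,
      "Filter": {},
      "DeleteMarkerReplication": { "Status": "Disabled" },
      "Destination": { "Bucket": "arn:aws:s3:::destination-bucket" }
    }
  ]
}
EOF

# Versioning must already be enabled on both buckets
aws s3api put-bucket-replication \
  --bucket source-bucket \
  --replication-configuration file://replication.json
```

Setting `DeleteMarkerReplication` to `Enabled` is the additional setting that replicates delete markers, as described above.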
Amazon S3: Durability & Availability¶
- High durability (99.999999999%, or 11 9's). On average, for 10 million objects stored, you can expect to lose 1 object once every 10,000 years. Durability is the same for all storage classes.
- Availability: S3 Standard offers 99.99% availability = 53 minutes of downtime per year.
Amazon S3: Storage Classes¶
S3 Standard - General Purpose¶
- 99.99% availability (stored across 3 AZs)
- Used for frequently accessed data (milliseconds first-byte latency)
- Low latency & high throughput
- Can sustain 2 concurrent facility failures
S3 Infrequent Access¶
- Used for data that is less frequently accessed, but requires rapid access when needed
- Lower cost than S3 standard, but chargeable for data retrieval
Amazon S3 Standard-Infrequent Access (S3 Standard IA)
- 99.99% availability (Stored across 3 AZs)
- milliseconds: first byte latency
- New cost components: retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
- Use cases: Disaster recovery, backups
- Should not be used for lots of small files, temporary data, constantly accessed data
Amazon S3 One Zone-Infrequent Access (S3 One Zone IA)
- Single AZ; data lost if AZ is destroyed
- Cost component: Retrieval fee, minimum duration charge (30 days), minimum size charge (128KB)
- 99.5 % availability
- Use cases: Storing secondary backup copies of on-premises data, or data that can be recreated
- Should be used for long-lived data which is non-critical & replaceable & where access is infrequent
Amazon S3 Glacier¶
- Low cost object storage meant for archiving/backup
- Pricing: Pay for Storage + Object retrieval cost
There are 3 Classes of Storage within Glacier
Glacier Instant Retrieval
- Millisecond retrieval, great for data accessed once a quarter
- Minimum storage duration is 90 days
Glacier Flexible Retrieval
- Can't be made public
- Retrieval options: Expedited (1-5 mins), Standard (3-5 hours), Bulk (5-12 hours). Faster = more expensive.
- First byte latency of minutes/hours
- Restore jobs restore data temporarily to Standard IA
- Minimum storage duration is 90 days & minimum billable size of 40KB
- Use case: Archival data, or data accessed yearly
Glacier Deep Archive
- Can't be made public
- Retrieval Options: Standard (12 hours), Bulk (48 hours)
- Restore jobs restore data temporarily to Standard IA
- Lowest Cost
- Minimum Storage Duration of 180 days & 40KB minimum billable size
- Use case: Secondary long term backup or storage for data when mandated by legal/regulatory requirements
S3 Intelligent-Tiering¶
- Small monthly monitoring & auto-tiering fee
- Moves objects automatically between access tiers based on usage
- There are no charges for retrieval
Tiers:
- Frequent Access (automatic) : Default
- Infrequent Access (automatic): objects not accessed for 30 days
- Archive Instant Access (automatic) : Objects not accessed for 90 days
- Archive Access Tier (optional): Configurable from 90 days to 700+ days
- Deep Archive Access Tier (optional): Configurable from 180 days to 700+ days
lifecycle configuration
If an object is uploaded to the Standard tier, there is a minimum storage duration of 30 days before it can be transitioned into another tier. Additionally, a single rule cannot transition to Standard-IA or One Zone-IA & then to Glacier classes within 30 days - the minimum is 60 days.
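A lifecycle rule that respects those minimums might look like the following sketch (bucket name, rule ID, and prefix are placeholders):

```bash
# lifecycle.json - Standard -> Standard-IA at 60 days -> Glacier at 120 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 60,  "StorageClass": "STANDARD_IA" },
        { "Days": 120, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration \
  --bucket my-bucket \
  --lifecycle-configuration file://lifecycle.json
```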
Amazon S3: Encryption¶
Warning
Buckets are not encrypted. Objects in buckets are encrypted.
| Method | Key Management | Encryption Processing | Extras |
|---|---|---|---|
| Client-Side Encryption | You | You | S3 never sees plaintext |
| SSE-C | You | S3 | |
| SSE-S3 | S3 (AES-256) | S3 | No key rotation control & no role separation |
| SSE-KMS | S3 & KMS | S3 | Key control & role separation |
SSE-C¶
- Server-side encryption using keys fully managed by the customer outside AWS
- Amazon S3 does not store the encryption key you provide
- HTTPS must be used
- The encryption key must be provided in HTTP headers with every request (see the sketch below)
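With the AWS CLI the key is supplied on every request via `--sse-c-key`. A minimal sketch, assuming a locally generated 256-bit key in `key.bin` and a placeholder bucket name:

```bash
# Generate a 256-bit key that only you hold
openssl rand -out key.bin 32

# Upload: S3 encrypts with your key, then discards the key
aws s3 cp file.txt s3://my-bucket/file.txt --sse-c AES256 --sse-c-key fileb://key.bin

# Download: the same key must be supplied again
aws s3 cp s3://my-bucket/file.txt file.txt --sse-c AES256 --sse-c-key fileb://key.bin
```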
SSE / SSE-S3¶
- Encryption using keys handled, managed, & owned by AWS
- Object is encrypted server-side
- Encryption type is AES-256
- Must set header "x-amz-server-side-encryption": "AES256"
- Enabled by default for new buckets & new objects
- Limitation: does not support role separation
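The high-level CLI sets that header for you. A minimal sketch (bucket name is a placeholder):

```bash
# --sse AES256 adds the x-amz-server-side-encryption: AES256 header
aws s3 cp file.txt s3://my-bucket/file.txt --sse AES256
```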
SSE-KMS¶
- Encryption using keys handled & managed by AWS KMS
- Advantages: user control over the key & the ability to audit key usage using CloudTrail
- Object is encrypted server-side
- Must set header "x-amz-server-side-encryption": "aws:kms"
Limitations:
- While using SSE-KMS you may be impacted by KMS limits
- When you upload, S3 calls the GenerateDataKey KMS API
- When you download, S3 calls the Decrypt KMS API
- Each of these API calls counts towards the KMS requests-per-second quota (5,500, 10,000, or 30,000 req/s depending on region). This can be increased using the Service Quotas console.
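A minimal sketch of an SSE-KMS upload with the CLI (bucket name and key alias are placeholders):

```bash
# --sse aws:kms adds the x-amz-server-side-encryption: aws:kms header
aws s3 cp file.txt s3://my-bucket/file.txt \
  --sse aws:kms \
  --sse-kms-key-id alias/my-s3-key
```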
Client-Side Encryption¶
- Use client libraries such as Amazon S3 Client-Side Encryption Library
- Clients must encrypt/decrypt data themselves before sending to / after retrieving from S3
- Customer manages the keys & encryption cycle
Encryption in transit (SSL/TLS)¶
Encryption in flight is also called SSL/TLS.
- Amazon S3 exposes 2 endpoints:
    - HTTP endpoint
    - HTTPS endpoint
- HTTPS is recommended & is mandatory for SSE-C
- Most clients use HTTPS by default
Note
Force encryption in transit with a bucket policy that denies requests made over plain HTTP (using the aws:SecureTransport condition key), as in the sketch below.
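A common shape for such a policy (bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-bucket",
        "arn:aws:s3:::my-bucket/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
```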
S3: Bucket Keys
- For each object written to an S3 bucket, KMS is called to generate a DEK (Data Encryption Key) for that object.
- KMS calls have limits & incur cost, so to avoid repeated API calls to KMS, Bucket Keys can be used. (Not supported for DSSE-KMS encryption.)
- When enabled, a time-limited bucket key is generated, which offloads work from KMS to S3, minimizing API calls.
- When enabled, CloudTrail KMS events show the bucket ARN instead of the object ARN.
- Fewer KMS logs
- Works with replication; the object encryption is maintained
- If replicating plaintext objects into a bucket using bucket keys, the objects are encrypted at the destination (can result in ETag changes)
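Bucket keys are switched on through the bucket's default encryption settings. A minimal sketch (bucket name and KMS key ARN are placeholders):

```bash
aws s3api put-bucket-encryption \
  --bucket my-bucket \
  --server-side-encryption-configuration '{
    "Rules": [
      {
        "ApplyServerSideEncryptionByDefault": {
          "SSEAlgorithm": "aws:kms",
          "KMSMasterKeyID": "arn:aws:kms:us-east-1:123456789012:key/EXAMPLE-KEY-ID"
        },
        "BucketKeyEnabled": true
      }
    ]
  }'
```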
S3: CORS¶
- Cross-Origin Resource Sharing (CORS)
- Origin = scheme (protocol) + host (domain) + port. E.g. https://maheshrjl.com [protocol: https, port: 443, host: maheshrjl.com]
- CORS allows web browsers to make requests to other origins while visiting the main origin.
- CORS must be allowed explicitly with the CORS header Access-Control-Allow-Origin.
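On S3, CORS is configured per bucket as a set of JSON rules. A minimal sketch allowing GETs from the origin above (bucket name is a placeholder):

```bash
aws s3api put-bucket-cors \
  --bucket my-bucket \
  --cors-configuration '{
    "CORSRules": [
      {
        "AllowedOrigins": ["https://maheshrjl.com"],
        "AllowedMethods": ["GET"],
        "AllowedHeaders": ["*"],
        "MaxAgeSeconds": 3000
      }
    ]
  }'
```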
MFA Delete¶
- MFA Delete forces users to generate a code on an MFA device before doing important operations.
- To use MFA Delete, versioning must be enabled on the bucket.
- Only the bucket owner (root account) can enable/disable MFA Delete, through the AWS CLI.
- Once enabled, MFA will be required for permanently deleting an object version & for suspending versioning on the bucket.
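Enabling it requires the root user's credentials plus the MFA device serial and a current code. A minimal sketch (account ID, device name, and code are placeholders):

```bash
# The --mfa value is "<mfa-device-arn> <current-code>"
aws s3api put-bucket-versioning \
  --bucket my-bucket \
  --versioning-configuration Status=Enabled,MFADelete=Enabled \
  --mfa "arn:aws:iam::123456789012:mfa/root-account-mfa-device 123456"
```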
Access Logs¶
This option is called Server Access Logging in bucket properties.
- Requests made to S3, from any account, authorized or denied, will be logged into another S3 bucket.
- The target logging bucket must be in the same AWS region.
Warning
Never set your logging bucket to be the monitored bucket. This creates a logging loop & your bucket will grow exponentially.
Pre-Signed URL¶
- Pre-signed URLs can be created with the S3 console or AWS CLI
- Users given a pre-signed URL inherit the permissions of the user that generated the URL, for GET/PUT (download/upload)
- You can create a pre-signed URL for an object you don't have access to (but the object will not be accessible through the URL)
- When using a pre-signed URL, the permissions match the identity which generated it (access denied could mean the generating identity never had access, or does not have access now)
- Don't generate pre-signed URLs with a role, because the URL stops working when the temporary credentials expire
- URL expiration:
    - S3 console: 1 minute - 720 minutes
    - AWS CLI: configure expiration with the --expires-in parameter, in seconds (default 3600 secs, max 168 hours) - see the sketch below
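Generating one from the CLI is a single command. A minimal sketch (bucket and key are placeholders):

```bash
# URL valid for 5 minutes, signed with the caller's current credentials
aws s3 presign s3://my-bucket/file.txt --expires-in 300
```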
Tip
S3 Select and Glacier Select allow you to use a SQL-like statement to retrieve partial objects from S3 and Glacier.
Vault & Object Locks¶
Glacier Vault Lock¶
- Adopt a WORM (Write Once Read Many) model for S3 Glacier vaults
- To enable: create a Vault Lock policy & then lock the policy against future edits (it can no longer be changed or deleted)
- Helpful for compliance & data retention
S3 Object Versioning¶
- Once enabled, versioning cannot be disabled, only suspended.
- When versioning is disabled, the version ID of an object is set to null.
- When an object with versioning is deleted, S3 will add a delete marker to the object.
S3 Object Lock¶
Versioning must be enabled before enabling Object Lock.
- Object Lock can be enabled on new buckets (contact AWS Support for existing buckets)
- WORM (Write Once Read Many) model - no delete/overwrite
- Requires versioning - individual versions are locked
- Blocks an object version's deletion for a specified amount of time
- Retention Period: protects the object for a fixed period (days or years); it can be extended
- Legal Hold: protects the object indefinitely, independent of the retention period. Legal Holds can be freely placed & removed using the s3:PutObjectLegalHold IAM permission.
- An object version can have both a Retention Period & a Legal Hold, only one, or none
- A bucket can have default Object Lock settings, or they can be customized per object
S3 Object Lock - Compliance
- Object versions can't be overwritten or deleted by any user, including the root user.
- Object retention modes can't be changed, and retention periods can't be shortened.
S3 Object Lock - Governance
- Most users can't overwrite, delete, or alter an object version or its lock
- Some users can be given special permissions to change the retention or delete the object
- Can be bypassed using the s3:BypassGovernanceRetention permission & the x-amz-bypass-governance-retention:true header (console default)
S3 Object Lock - Legal Hold
- Set on an object version - ON or OFF
- No deletes or changes until removed
- The s3:PutObjectLegalHold permission is required to add or remove it (see the sketch below)
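Both protections are applied per object from the CLI. A minimal sketch (bucket, key, and retain-until date are placeholders):

```bash
# Governance-mode retention until a fixed date
aws s3api put-object-retention \
  --bucket my-bucket --key file.txt \
  --retention '{"Mode": "GOVERNANCE", "RetainUntilDate": "2026-01-01T00:00:00Z"}'

# Independent legal hold; requires s3:PutObjectLegalHold
aws s3api put-object-legal-hold \
  --bucket my-bucket --key file.txt \
  --legal-hold Status=ON
```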
S3: Performance Optimization¶
Multipart Upload
- Recommended for objects over 100 MB (mandatory for objects over 5 GB)
- Upload is split into a maximum of 10,000 parts, each ranging from 5MB-5GB (the last part is the leftover & can be smaller than 5MB)
- Each individual part can fail in isolation & be restarted in isolation
- Much better transfer rates (see the sketch below)
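The high-level `aws s3 cp` performs multipart uploads automatically, but the low-level API makes the three phases visible. A minimal sketch (bucket, key, part file, and the returned UploadId/ETag are placeholders):

```bash
# 1. Start the upload and note the returned UploadId
aws s3api create-multipart-upload --bucket my-bucket --key big-file.bin

# 2. Upload parts (repeat per part; each part can be retried in isolation)
aws s3api upload-part --bucket my-bucket --key big-file.bin \
  --part-number 1 --body part1.bin --upload-id <UploadId>

# 3. Stitch the parts together into the final object
aws s3api complete-multipart-upload --bucket my-bucket --key big-file.bin \
  --upload-id <UploadId> \
  --multipart-upload '{"Parts": [{"PartNumber": 1, "ETag": "<etag-from-step-2>"}]}'
```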
Accelerated Transfer
- Global transfer to S3 might not always take the most direct path
- Use the AWS Edge Locations to speed up transfer
- The bucket name cannot contain periods & must be DNS-compatible to use Transfer Acceleration
Access Points¶
Access Points simplify security management for large S3 buckets. We can create many access points, each with a different policy & different network access controls.
- Each access point has its own DNS name or endpoint address (Internet origin or VPC origin)
- Each access point has a policy (similar to bucket policy) to manage security at scale.
- Access points can be created via the console or:
```bash
aws s3control create-access-point --name <> --account-id <> --bucket <>
```
S3 Access Point - VPC Origin
- We can define the access point to be accessible only from within the VPC
- You must create a VPC endpoint to access the access point (Gateway or Interface endpoint)
- The VPC endpoint policy must allow access to both the target bucket & the access point
S3 Object Lambda¶
- Use AWS Lambda functions to change the object before it is retrieved by the caller application.
- Only one S3 bucket is needed, on top of which we create an S3 Access Point & an S3 Object Lambda Access Point.
Use Cases:
- Redact PII for analytics or non-production environments.
- Converting across data formats, such as converting XML to JSON
- Resizing & watermarking images on the fly using caller-specific details