These notes were written while preparing for my AWS exam and are collected from different sources and AWS documentation. Primarily, they’re notes for me, but you might find them useful too.
Since AWS is changing quickly, it’s possible that some of these notes may be out of date, so please take that into consideration if you are reading them.
Please let me know in the comments if you have any updates which you’d like me to add.
S3 (Simple Storage Service)
- Objects reside in containers called buckets.
- Bucket names should be globally unique. Each bucket is created in a specific region and data does not leave that region unless explicitly copied to another region.
- Buckets cannot be nested (no sub-buckets inside buckets).
- Server access logs can be enabled for buckets to log requesters, objects, actions, and responses.
- Bucket/object naming convention: bucket names can contain lowercase letters, numbers, dots, and hyphens.
- Limit of 100 buckets per account by default
- An object can be 0 bytes to 5 TB, and you can store an unlimited number of objects in a bucket.
- The bucket owner can deny access to, or delete, any objects in the bucket, regardless of who owns them.
- Bucket names are limited to 63 characters.
- S3 durability is 99.999999999% (eleven 9s).
- S3 availability is 99.99%.
- Data is replicated across multiple devices in multiple facilities (Availability Zones) within a region.
- You can access S3 using API, SDK, AWS CLI or AWS console.
- Amazon offers two consistency models:
- Read-after-write consistency: for PUTs of new objects in a bucket. Your data is available to you right after the write. Caveat: if you make a HEAD or GET request for the key before creating the object (for example, to find out whether the name exists), AWS provides only eventual consistency for that object.
- Eventual consistency: for PUTs to an existing object (overwrites) and DELETEs of an object. Once you overwrite or delete an object, a read may return stale data. Note that a read returns either the stale data or the new data, never an inconsistent mix of the two.
- At the time of this post, AWS does not support object locking. If you issue two PUTs to the same key simultaneously, the one with the newer timestamp wins.
S3 Storage Class:
- S3 Standard (default): for general purpose storage
- Intelligent-Tiering: designed to optimize costs by automatically moving data to the most cost-effective storage class based on usage patterns.
- S3 Standard-IA & OneZone-IA: for long-lived but less frequently accessed data. Standard-IA stores data redundantly across multiple AZs, similar to S3 Standard. OneZone-IA stores data in only one AZ at a lower price than Standard-IA. They both have the same durability, but OneZone-IA has lower availability.
- Glacier: for long-term archive with same durability and resiliency of Standard S3.
- RRS (Reduced Redundancy Storage): designed for non-critical, reproducible data with less durability (99.99%).
- GLACIER objects are not available for real-time access. You must first restore GLACIER data before you can read it (STANDARD, RRS, STANDARD_IA, and ONEZONE_IA objects are available for anytime access).
| Storage Class | Useful for | Durability | Availability | AZs | Min Days | Min Size | Considerations |
|---|---|---|---|---|---|---|---|
| Standard-IA | Infrequently accessed | 99.999999999% | 99.99% | >=3 | 30 | 128 KB | Per-GB retrieval fee |
| Intelligent-Tiering | Unknown access pattern | 99.999999999% | 99.99% | >=3 | 30 | None | Per-object automation fee |
| OneZone-IA | Infrequently accessed, non-critical data | 99.999999999% | 99.5% | 1 | 30 | 128 KB | Per-GB retrieval fee |
| Glacier | Archiving | 99.999999999% | 99.99% | >=3 | 90 | None | Per-GB retrieval fee, access after restore |
| RRS | Frequently accessed, non-critical | 99.99% | 99.99% | >=3 | None | None | None |
Setting the Storage Class for objects
- When creating a new object, you can specify its storage class (with the REST API, use the x-amz-storage-class HTTP header).
- Change the storage class of an existing object by using copy API (copy object in the same bucket with the same key)
- In a versioning enabled bucket, you cannot change the storage class of a specific version of an object
- Adding a lifecycle config to a bucket.
- When setting up a cross-region replication (CRR) config, you can set the storage class for replicated objects.
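The two most common paths above (setting the class at upload time, and changing it with a copy onto the same key) can be sketched as the request parameters boto3’s `put_object` and `copy_object` accept. The bucket and key names here are made up for illustration, and no AWS call is made:

```python
# Request parameters for uploading a new object directly into STANDARD_IA.
# (These would be passed to s3.put_object; the names are hypothetical.)
put_request = {
    "Bucket": "my-example-bucket",        # hypothetical bucket
    "Key": "reports/2019-q1.csv",
    "Body": b"col1,col2\n1,2\n",
    "StorageClass": "STANDARD_IA",        # maps to the x-amz-storage-class header
}

# Changing the class of an existing object: copy it onto itself with a new class.
copy_request = {
    "Bucket": "my-example-bucket",
    "Key": "reports/2019-q1.csv",         # same key -> in-place class change
    "CopySource": {"Bucket": "my-example-bucket", "Key": "reports/2019-q1.csv"},
    "StorageClass": "GLACIER",
    "MetadataDirective": "COPY",          # keep the existing object metadata
}
```

The self-copy trick works because S3 treats the copy as a new write, which is why it cannot target a specific version in a versioning-enabled bucket.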
Bucket Policy
- Bucket policies provide access control to buckets and objects based on S3 operations, requesters, resources, and aspects of the request such as the source IP address.
- Permissions attached to a bucket apply to all of the objects in that bucket.
- Unlike access control lists, which can grant permissions only on individual objects, bucket policies can either grant or deny permissions across all or a subset of objects within a bucket.
- Wildcards can be used in Amazon Resource Names (ARNs) and other values to control access to groups of objects within a bucket.
- Only the bucket owner is allowed to associate a policy with a bucket.
- Note that an S3 bucket policy includes a “Principal” element, which lists the principals (users, accounts, services, or other entities) that the policy controls access for.
- Bucket policies are limited to 20 KB in size.
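A minimal sketch of what such a policy looks like, combining a resource wildcard with a request-based condition. The bucket name and CIDR range are made up:

```python
import json

# A minimal bucket policy: grant public read on one prefix, but only from a
# given source IP range. Bucket name and IP range are hypothetical.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadFromOffice",
            "Effect": "Allow",
            "Principal": "*",                  # who the statement applies to
            "Action": "s3:GetObject",          # the S3 operation being controlled
            "Resource": "arn:aws:s3:::my-example-bucket/public/*",  # wildcard over a prefix
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }
    ],
}

document = json.dumps(policy)  # the JSON body you would attach to the bucket
assert len(document.encode("utf-8")) < 20 * 1024  # policies are limited to 20 KB
```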
Access Control List (ACL)
- AWS recommends using S3 bucket policies or IAM policies for access control. S3 ACLs are a legacy access control mechanism that predates IAM.
- An S3 ACL is a sub-resource attached to every S3 bucket and object. It defines which AWS accounts or groups are granted access and the type of access.
- Allows you to control individual objects in the buckets.
- You can grant basic read/write permissions to other AWS accounts, but not to users in your account.
- You cannot grant conditional permissions, nor can you deny permissions.
Versioning
- Enabling and suspending versioning is done at the bucket level.
- The versioning state applies to all of the objects in that bucket
- By default, versioning is disabled. Regardless of whether you have enabled versioning, each object in your bucket has a version ID, which is null when versioning is disabled.
- When you delete an object, all the versions remain in the bucket and AWS inserts a delete marker for the deleted object.
- A simple DELETE makes the delete marker the current version, so a GET for the object returns 404 Not Found. You can still request a noncurrent version by specifying its version ID in the GET request.
- If you specify a version ID in your DELETE request, AWS removes that version permanently instead of inserting a delete marker (only the bucket owner can permanently delete a version).
- Once versioning is enabled on a bucket, it can never return to an unversioned state; however, you can suspend versioning on the bucket.
- Objects that existed before versioning was enabled keep a version ID of null. AWS assigns version IDs only to objects created afterwards, not to the existing objects.
- Suspending versioning on an existing bucket does not change the current objects in the bucket; it changes only how AWS handles objects in future requests.
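The versioning operations above can be sketched as the request payloads boto3 would take (`put_bucket_versioning` and `delete_object`); the bucket, key, and version ID are hypothetical placeholders:

```python
# Enable (or later suspend) versioning -- a bucket-level setting.
enable_versioning = {
    "Bucket": "my-example-bucket",                      # hypothetical bucket
    "VersioningConfiguration": {"Status": "Enabled"},   # or "Suspended"
}

# A simple DELETE (no VersionId) inserts a delete marker; all versions stay
# in the bucket and GET for the key now returns 404.
simple_delete = {"Bucket": "my-example-bucket", "Key": "notes.txt"}

# A DELETE with a VersionId permanently removes that one version instead.
permanent_delete = {
    "Bucket": "my-example-bucket",
    "Key": "notes.txt",
    "VersionId": "example-version-id",                  # placeholder value
}
```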
Life Cycle Configuration
- Transition Actions: define when an object transitions to another storage tier.
- Expiration Actions: Define when objects expire. (AWS deletes expired objects)
- AWS supports transitioning from STANDARD to STANDARD_IA, INTELLIGENT_TIERING, ONEZONE_IA, and GLACIER.
- AWS supports transitioning from STANDARD_IA to INTELLIGENT_TIERING, ONEZONE_IA, and GLACIER.
- AWS supports transitioning from INTELLIGENT_TIERING to ONEZONE_IA and GLACIER.
- AWS supports transitioning from ONEZONE_IA to GLACIER.
- You can’t transition from any storage class to STANDARD or RRS.
- You can’t transition from INTELLIGENT_TIERING to STANDARD_IA.
- You can’t transition from GLACIER to any other storage class.
- From STANDARD or STANDARD_IA, AWS does not transition objects smaller than 128 KB to INTELLIGENT_TIERING because it’s not cost-effective.
- From STANDARD, AWS does not transition objects smaller than 128 KB to STANDARD_IA or ONEZONE_IA because it’s not cost-effective.
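A lifecycle configuration that follows the allowed transition order above might look like the following sketch (the shape matches what boto3’s `put_bucket_lifecycle_configuration` expects; the rule ID and prefix are made up):

```python
# Lifecycle sketch: STANDARD -> STANDARD_IA after 30 days -> GLACIER after
# 90 days -> expire (delete) after 365 days, for objects under "logs/".
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",                          # hypothetical rule name
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [                               # transition actions
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},                   # expiration action
        }
    ]
}

# Transitions must move "downward" in time -- never back to STANDARD or RRS.
days = [t["Days"] for t in lifecycle["Rules"][0]["Transitions"]]
assert days == sorted(days)
```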
S3 Data Protection
- You can protect data in transit (as it travels to and from AWS) and at rest (while it is on disk in AWS data centers).
- You can protect in-transit data by using SSL/TLS or by using client-side encryption.
- For protecting data at rest you have these two options
- Server-side encryption: request AWS to encrypt your data just before storing it to disk and decrypt it when you request the data.
- Client-side encryption: You encrypt your data and then upload the encrypted data to S3. You need to manage the encryption process, keys, and related tools.
- You have multiple options for server-side encryption:
- SSE-S3 (Server-side encryption with AWS S3 managed keys): Each object is encrypted with a unique key. For additional security, AWS encrypts the key itself with a master key that regularly rotates.
- SSE-KMS (Server-side encryption with AWS KMS-managed keys): similar to SSE-S3, but more expensive because it offers the following additional advantages:
- There are separate permissions for the use of an envelope key (the key that protects your data’s encryption key).
- SSE-KMS provides you with an audit trail of when your key was used and by whom.
- You have the option to create and manage your encryption keys or use a default key which is unique to you.
- SSE-C ( Server-side encryption with Customer-provided keys): You manage the encryption keys and AWS manages the encryption.
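The three server-side options differ only in which encryption parameters the upload request carries. A sketch of the relevant `put_object` parameters (the KMS key alias and the key value are placeholders, not real credentials):

```python
# SSE-S3: S3-managed keys; you only ask for AES256.
sse_s3 = {"ServerSideEncryption": "AES256"}

# SSE-KMS: KMS-managed key, with separate permissions and an audit trail.
sse_kms = {
    "ServerSideEncryption": "aws:kms",
    "SSEKMSKeyId": "alias/my-example-key",   # hypothetical alias; omit to use the default key
}

# SSE-C: you supply the key with every request; AWS performs the encryption
# but does not store your key.
sse_c = {
    "SSECustomerAlgorithm": "AES256",
    "SSECustomerKey": "base64-encoded-256-bit-key",  # placeholder, not a real key
}
```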
Static Website hosting on S3
- You must create a bucket named after the website hostname and upload your content to it. Make the bucket public, enable static website hosting on it, and indicate the index and error documents. You can optionally add redirection rules for the website as well.
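The hosting settings described above can be sketched as a website configuration (the shape of boto3’s `put_bucket_website` WebsiteConfiguration; the document names and prefixes are made up):

```python
# Static-website configuration sketch for a public bucket.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},   # served for directory-style requests
    "ErrorDocument": {"Key": "error.html"},      # served on 4xx errors
    "RoutingRules": [                            # optional redirection rules
        {
            "Condition": {"KeyPrefixEquals": "old/"},
            "Redirect": {"ReplaceKeyPrefixWith": "new/"},
        }
    ],
}
```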
Cross-Region Replication (CRR)
- Cross-region replication is a feature that replicates all new objects in a source bucket to another bucket in another region.
- Any metadata and ACL associated with the object are also part of the replication.
- To enable cross-region replication, versioning must be turned on for both source and destination buckets.
- You must use an IAM policy to give S3 permission to replicate objects on your behalf.
- Cross-region replication is mostly used to reduce latency by keeping objects closer to a set of users. It is also used to meet compliance requirements when a company must keep backups a certain distance from the original data.
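Putting the requirements above together, a replication configuration sketch might look like this (the shape of boto3’s `put_bucket_replication`; the role ARN, account ID, and bucket names are made up, and versioning must already be enabled on both buckets):

```python
# Cross-region replication configuration sketch.
replication = {
    "Role": "arn:aws:iam::111122223333:role/s3-replication-role",  # IAM role S3 assumes to replicate
    "Rules": [
        {
            "ID": "replicate-all",          # hypothetical rule name
            "Status": "Enabled",
            "Prefix": "",                   # empty prefix = replicate all new objects
            "Destination": {
                "Bucket": "arn:aws:s3:::my-backup-bucket",  # bucket in another region
                "StorageClass": "STANDARD_IA",              # optional class override for replicas
            },
        }
    ],
}
```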