AWS Solutions Architect: S3 and Glacier

Written on Sun 19 April 2020. Posted in Nuggets | Richard Walker

S3 Service Architecture

Amazon Simple Storage Service (S3) is where you, your applications and other AWS services can keep data. It provides an inexpensive and reliable storage solution. Typical uses include:

  • Maintaining backup archives, log files and images
  • Running analytics on big data at rest
  • Hosting static websites

S3 provides object storage. The volumes attached to EC2 instances (such as EBS volumes) provide block storage instead.

S3 objects are organized into buckets. By default, an AWS account is limited to 100 buckets. A bucket and its contents exist within a single AWS region, but bucket names must be globally unique.

Prefixes and Delimiters

S3 stores the objects in a bucket in a flat namespace. However, you can use prefixes and delimiters in object keys to give a bucket the appearance of a folder structure. For example, the key logs/2020/04/app.log uses / as the delimiter.
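A delimiter makes flat keys look like folders. The sketch below mimics that behaviour in plain Python. It follows the way a listing request with a Prefix and Delimiter rolls keys up into "common prefixes". The function name list_with_delimiter and the sample keys are hypothetical. The function is not an AWS API.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Emulate an S3 listing: return (objects, common_prefixes) under a prefix.

    Keys are plain strings in a flat namespace. "Folders" only appear because
    everything up to the next delimiter after the prefix is rolled up.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        cut = rest.find(delimiter) if delimiter else -1
        if cut == -1:
            objects.append(key)  # no further delimiter: a "file" at this level
        else:
            # roll everything up to and including the delimiter into one prefix
            common_prefixes.add(prefix + rest[: cut + len(delimiter)])
    return objects, sorted(common_prefixes)


keys = ["index.html", "logs/2020/04/app.log", "logs/2020/05/app.log", "images/cat.png"]
print(list_with_delimiter(keys))                      # → (['index.html'], ['images/', 'logs/'])
print(list_with_delimiter(keys, prefix="logs/2020/"))  # → ([], ['logs/2020/04/', 'logs/2020/05/'])
```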

Working with Large Objects

Individual objects can be no larger than 5 TB, and a single upload can be no larger than 5 GB. AWS recommends a feature called Multipart Upload for any object larger than 100 MB.
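A multipart upload splits one object into numbered parts. The sketch below is a part planner that shows the idea. It encodes these limits, which come from AWS's multipart documentation rather than from the book:

  • Each part is 5 MiB to 5 GiB, and only the last part may be smaller.
  • An upload can have at most 10,000 parts.

plan_parts is a hypothetical helper, and the 100 MiB default part size is an arbitrary choice. In practice, SDK transfer helpers usually do this splitting for you.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

# Multipart limits as documented by AWS (worth re-checking against current docs).
# Each part is 5 MiB to 5 GiB (only the last part may be smaller), an upload has
# at most 10,000 parts, and the object is at most 5 TB (taken here as 5 TiB).
MIN_PART = 5 * MiB
MAX_PART = 5 * GiB
MAX_PARTS = 10_000
MAX_OBJECT = 5 * TiB


def plan_parts(object_size, preferred_part=100 * MiB):
    """Split an object into (part_number, offset, length) tuples for upload."""
    if not 0 < object_size <= MAX_OBJECT:
        raise ValueError("object size must be between 1 byte and 5 TiB")
    part = max(preferred_part, MIN_PART)
    # Grow the part size if the preferred one would need more than 10,000 parts.
    part = max(part, -(-object_size // MAX_PARTS))  # ceiling division
    part = min(part, MAX_PART)
    plan = []
    offset, number = 0, 1
    while offset < object_size:
        length = min(part, object_size - offset)
        plan.append((number, offset, length))
        offset += length
        number += 1
    return plan


plan = plan_parts(1 * GiB)
print(len(plan), plan[-1])  # → 11 (11, 1048576000, 25165824)
```

A 1 GiB object with 100 MiB parts gives ten full parts and a final 24 MiB part.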

Encryption

Unless there is a specific reason not to, data stored on S3 should always be encrypted. Encryption keys protect data while it is at rest within S3. You can use either server-side or client-side encryption.

Server-side
  • Server-side encryption with Amazon S3-Managed Keys (SSE-S3)
  • Server-side encryption with AWS KMS-Managed Keys (SSE-KMS)
  • Server-side encryption with Customer-Provided Keys (SSE-C)
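On the wire, each server-side option is selected with request headers. The header names below are those of the S3 REST API. The helper sse_headers is a hypothetical illustration, not an SDK function. The SSE-C case shows why that mode is different: you send the key yourself, base64-encoded, along with an MD5 digest of it. S3 does not keep the key.

```python
import base64
import hashlib
import os


def sse_headers(mode, kms_key_id=None, customer_key=None):
    """Return the HTTP request headers that select a server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to use the AWS-managed KMS key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            # S3 uses this digest to check that the key arrived intact
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    raise ValueError(f"unknown mode: {mode}")


key = os.urandom(32)  # with SSE-C you must keep this key yourself
for mode in ("SSE-S3", "SSE-KMS", "SSE-C"):
    print(mode, sorted(sse_headers(mode, customer_key=key)))
```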
Client-side

It's possible to encrypt data before it is transferred to S3. For example, you can use an AWS KMS-Managed Customer Master Key (CMK).

Logging

Logging of S3 events to log files is disabled by default, because logging can generate a lot of activity. When logging is enabled, log entries include:

  • Account and IP address of requesters
  • Source bucket name
  • Action requested (GET, PUT etc.)
  • Time of request
  • Response status
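The sketch below reads one server access log record and picks out the fields listed above. It assumes the documented space-separated S3 server access log layout and maps only the first ten columns. The sample line, the IDs in it and the function name are made up.

```python
import re

# Order of the first columns of an S3 server access log record
# (later columns are omitted in this sketch).
FIELDS = ["bucket_owner", "bucket", "time", "remote_ip", "requester",
          "request_id", "operation", "key", "request_uri", "http_status"]

# A token is a [bracketed time], a "quoted string", or a run of non-spaces.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def parse_access_log_line(line):
    """Map the leading tokens of one log record to named fields."""
    record = dict(zip(FIELDS, TOKEN.findall(line)))
    record["time"] = record["time"].strip("[]")
    record["request_uri"] = record["request_uri"].strip('"')
    return record


# Hypothetical sample record; every ID and name here is invented.
sample = ('79a59df900b949e5 example-bucket [19/Apr/2020:10:15:32 +0000] 192.0.2.3 '
          'arn:aws:iam::111122223333:user/alice 3E57427F3EXAMPLE REST.GET.OBJECT '
          'images/cat.png "GET /example-bucket/images/cat.png HTTP/1.1" 200 - 2662992')
rec = parse_access_log_line(sample)
print(rec["remote_ip"], rec["operation"], rec["http_status"])
# → 192.0.2.3 REST.GET.OBJECT 200
```

The fields map onto the list above as follows:

  • requester and remote_ip: account and IP address of the requester
  • bucket: source bucket name
  • operation: action requested
  • time: time of request
  • http_status: response status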

S3 Durability and Availability

S3 offers different storage classes for objects. Choosing one is a trade-off between three factors: how critical the data is, how quickly you need to access it, and cost.

Durability

S3 measures durability as a percentage. Most S3 storage classes, as well as Glacier, are designed for 99.999999999 percent ("eleven nines") durability. These high rates are largely due to data being replicated across at least three Availability Zones.
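AWS itself explains eleven nines as an annual, per-object chance of loss. Read that way, the figure works out as follows for a large bucket. The object count is an arbitrary example.

```python
from fractions import Fraction

# 99.999999999 % written exactly, to avoid floating-point rounding.
durability = Fraction(99_999_999_999, 100_000_000_000)
annual_loss_rate = 1 - durability          # 1 in 10**11 per object per year

objects_stored = 10_000_000                # arbitrary example bucket size
expected_losses_per_year = objects_stored * annual_loss_rate
years_per_lost_object = 1 / expected_losses_per_year
print(years_per_lost_object)  # → 10000
```

In other words, a bucket of ten million objects would lose one object every 10,000 years on average.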

Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) and Reduced Redundancy Storage (RRS) are not quite so resilient. One Zone-IA, for example, keeps data in a single Availability Zone.

Availability

Object availability is also measured as a percentage. The S3 Standard class is designed for 99.99% availability, so data should be ready whenever you need it for 99.99% of a year. There is almost no chance your data will be lost, even if you sometimes don't have instant access to it.
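The code below converts 99.99 percent into the unavailability it allows per year. It is plain arithmetic on a 365-day year, not an AWS formula.

```python
from decimal import Decimal

MINUTES_PER_YEAR = Decimal(365 * 24 * 60)  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(availability_percent: str) -> Decimal:
    """Minutes per year an object may be unavailable at a given availability."""
    unavailable_fraction = (Decimal(100) - Decimal(availability_percent)) / 100
    return MINUTES_PER_YEAR * unavailable_fraction


print(f"{downtime_minutes_per_year('99.99'):.2f} minutes")  # → 52.56 minutes
```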

Eventually Consistent Data

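S3 replicates data across locations, so there might be brief delays while updates propagate across the system (typically two seconds or less). Since December 2020, however, S3 has provided strong read-after-write consistency for object reads, writes and listings.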

S3 Object Life-cycle

It's often important to keep previous versions of objects, and later to retire or delete them to keep a lid on storage costs.

Versioning

If versioning is enabled at the bucket level, older overwritten copies of objects will be saved and remain accessible indefinitely.
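The toy, in-memory model below shows what versioning changes. VersionedBucket and its methods are hypothetical, not the S3 API. The delete behaviour is simplified: a delete marker hides the current object, but older versions are kept.

```python
import itertools


class VersionedBucket:
    """Toy model of a bucket with versioning enabled (not the real S3 API)."""

    def __init__(self):
        self._versions = {}            # key -> list of (version_id, body or None)
        self._ids = itertools.count(1)

    def put(self, key, body):
        """Store a new version instead of overwriting the old one."""
        version_id = f"v{next(self._ids)}"
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker (modelled as body None)."""
        return self.put(key, None)

    def get(self, key, version_id=None):
        history = self._versions.get(key, [])
        if version_id is None:
            if not history or history[-1][1] is None:
                raise KeyError(key)  # no object, or latest version is a delete marker
            return history[-1][1]
        for vid, body in history:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.csv", b"draft")
bucket.put("report.csv", b"final")
print(bucket.get("report.csv"))         # → b'final'
print(bucket.get("report.csv", first))  # → b'draft'
bucket.delete("report.csv")
print(bucket.get("report.csv", first))  # older version still there → b'draft'
```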

Lifecycle Management

To avoid historical file bloat, you can configure lifecycle rules for a bucket. These rules automatically move objects to another storage class, or delete them, after a set number of days.
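A lifecycle rule is plain configuration. The sketch below builds one in the shape boto3's put_bucket_lifecycle_configuration expects, passed as LifecycleConfiguration=... together with Bucket=.... It also includes a hypothetical helper that shows which class an object of a given age would end up in. Three things are illustrative assumptions:

  • The prefix, day counts and rule ID are examples.
  • Objects are assumed to start in STANDARD.
  • Real transitions run asynchronously rather than at an exact moment.

```python
import json

# Hypothetical rule: objects under "logs/" move to Standard-IA after 30 days,
# to Glacier after 90 days, and are deleted after 365 days.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_at(age_days, rule):
    """Class an object of this age would be in under the rule (None = expired)."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"  # assumption: objects are uploaded as STANDARD
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 200, 400):
    print(age, storage_class_at(age, rule))
# → 10 STANDARD, 45 STANDARD_IA, 200 GLACIER, 400 None (one per line)
print(json.dumps(lifecycle_configuration, indent=2))
```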

Accessing S3 Objects

You'll naturally need to access S3-hosted objects, and also to restrict who else can access them.

Access Control

By default, S3 buckets and objects are accessible only from your own account. You can open up access using:

  • Access control list (ACL) rules
  • Finer-grained S3 bucket policies
  • Identity and Access Management (IAM) policies

Amazon recommends applying S3 bucket policies or IAM policies instead of ACLs.
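The example below illustrates the recommended policy-based approach. It builds a bucket policy that grants one IAM role read-only access to the objects in a bucket. The bucket name, account ID and role are hypothetical placeholders. You would pass the resulting JSON string to boto3's put_bucket_policy(Bucket=..., Policy=...) or to aws s3api put-bucket-policy.

```python
import json

BUCKET = "example-bucket"                                # hypothetical bucket
READER = "arn:aws:iam::111122223333:role/report-reader"  # hypothetical IAM role

bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowReadOnlyObjectAccess",
            "Effect": "Allow",
            "Principal": {"AWS": READER},
            "Action": "s3:GetObject",
            # Object-level actions apply to object ARNs (bucket ARN + "/*").
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
        }
    ],
}

policy_json = json.dumps(bucket_policy, indent=2)
print(policy_json)
```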

Pre-signed URLs

A pre-signed URL provides temporary access to an otherwise private object. It specifies a period of time after which the URL becomes invalid.
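Under the hood, a pre-signed URL is an ordinary object URL that carries a Signature Version 4 signature and an expiry in its query string. The sketch below follows AWS's documented SigV4 query-string signing steps using only the standard library. In practice you would call boto3's generate_presigned_url("get_object", Params={...}, ExpiresIn=...) instead. The function name is hypothetical. The bucket, key and credentials are placeholders, and the keys are AWS's documentation examples.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from urllib.parse import quote


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def presign_get_url(bucket, key, region, access_key, secret_key,
                    expires_in=3600, now=None):
    """Build a SigV4 query-string-signed GET URL for an S3 object."""
    if not 1 <= expires_in <= 604800:  # SigV4 presigned URLs last at most 7 days
        raise ValueError("expires_in must be between 1 second and 7 days")
    now = now or datetime.now(timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    scope = f"{datestamp}/{region}/s3/aws4_request"
    host = f"{bucket}.s3.{region}.amazonaws.com"  # virtual-hosted-style endpoint

    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires_in),
        "X-Amz-SignedHeaders": "host",
    }
    # Query parameters sorted by name, with names and values URI-encoded.
    query = "&".join(f"{quote(k, safe='')}={quote(v, safe='')}"
                     for k, v in sorted(params.items()))
    path = "/" + quote(key, safe="/")  # "/" stays literal in the object key path

    canonical_request = "\n".join([
        "GET", path, query,
        f"host:{host}\n",    # canonical headers, each ending in a newline
        "host",              # signed headers
        "UNSIGNED-PAYLOAD",  # presigned URLs do not sign the body
    ])
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])
    # Derive the signing key: an HMAC chain over date, region, service, terminator.
    signing_key = ("AWS4" + secret_key).encode("utf-8")
    for part in (datestamp, region, "s3", "aws4_request"):
        signing_key = _hmac_sha256(signing_key, part)
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"),
                         hashlib.sha256).hexdigest()
    return f"https://{host}{path}?{query}&X-Amz-Signature={signature}"


url = presign_get_url(
    "example-bucket", "reports/2020.pdf", "us-east-1",
    access_key="AKIAIOSFODNN7EXAMPLE",                      # AWS doc placeholder
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",  # AWS doc placeholder
    expires_in=3600,
    now=datetime(2020, 4, 19, 12, 0, tzinfo=timezone.utc),
)
print(url)
```

Anyone holding the URL can fetch the object until X-Amz-Date plus X-Amz-Expires has passed. Changing any signed part of the URL invalidates the signature.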

Static Website Hosting

S3 buckets can be used to host HTML files for entire static websites.
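Hosting a site mostly comes down to a small website configuration on the bucket. The dictionary below is in the shape boto3's put_bucket_website accepts as WebsiteConfiguration. The document names are assumptions. resolve_key is a simplified, hypothetical model of how the website endpoint appends the index document to folder-style paths; it ignores redirects and the error document.

```python
website_configuration = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "error.html"},
}


def resolve_key(request_path, config=website_configuration):
    """Map a website request path to the object key that would be served (simplified)."""
    key = request_path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += config["IndexDocument"]["Suffix"]
    return key


for path in ("/", "/blog/", "/css/site.css"):
    print(path, "->", resolve_key(path))
# → / -> index.html
# → /blog/ -> blog/index.html
# → /css/site.css -> css/site.css
```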

S3 and Glacier Select

AWS provides a different way to access data stored in either S3 or Glacier. Select lets you apply SQL-like queries to stored objects, so only the data you need is retrieved.
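Here is a sketch of a Select request. The parameter dictionary matches what boto3's select_object_content accepts; with a client you would call s3.select_object_content(**select_request) and read the returned event stream. The bucket, key, CSV contents and query are hypothetical. emulate_select is only a local stand-in written in plain Python to show which rows the query would return. It is not S3's query engine.

```python
import csv
import io

# Hypothetical CSV object contents.
csv_object = "name,size_gb\nbackup-jan,120\nbackup-feb,80\nbackup-mar,150\n"

select_request = {
    "Bucket": "example-bucket",      # hypothetical
    "Key": "inventory/backups.csv",  # hypothetical
    "Expression": "SELECT s.name FROM S3Object s WHERE CAST(s.size_gb AS INT) > 100",
    "ExpressionType": "SQL",
    "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
    "OutputSerialization": {"CSV": {}},
}


def emulate_select(text):
    """Local stand-in for the query above: only matching names come back."""
    rows = csv.DictReader(io.StringIO(text))
    return [row["name"] for row in rows if int(row["size_gb"]) > 100]


print(emulate_select(csv_object))  # → ['backup-jan', 'backup-mar']
```

The saving comes from the filtering happening on the S3 side, so only the matching rows cross the network.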

Amazon Glacier

Glacier supports archives as large as 40 TB. Its archives are encrypted by default and are given machine-generated IDs. Retrieving objects from an existing Glacier archive can take several hours. Glacier provides an inexpensive long-term storage solution for data that seldom needs to be accessed.

Storage pricing

Example:

Class          Amount    Rate/GB/Month    Cost/Month
Standard       20 GB     $0.023           $0.46
One Zone-IA    65 GB     $0.01            $0.65
Glacier        520 GB    $0.004           $2.08
Total                                     $3.19
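The table's arithmetic can be checked in a few lines. Decimal avoids floating-point rounding on currency. The rates are the example figures above rather than current AWS prices, and request and retrieval charges are ignored.

```python
from decimal import Decimal

# (storage class, GB stored, USD per GB per month), from the example table
usage = [
    ("Standard", Decimal("20"), Decimal("0.023")),
    ("One Zone-IA", Decimal("65"), Decimal("0.01")),
    ("Glacier", Decimal("520"), Decimal("0.004")),
]


def monthly_cost(rows):
    """Return per-class monthly costs and their total, in USD."""
    costs = {name: gb * rate for name, gb, rate in rows}
    return costs, sum(costs.values())


costs, total = monthly_cost(usage)
for name, cost in costs.items():
    print(f"{name:<12} ${cost:.2f}")
print(f"{'Total':<12} ${total:.2f}")
```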

Other Storage-Related Services

  • Amazon Elastic File System (EFS)
  • AWS Storage Gateway
  • AWS Snowball

Summary

S3 provides reliable and highly available object-level storage. Objects are stored in buckets in a flat namespace, but prefixes can make them appear as if they were structured like a normal file system.

It's recommended to encrypt data stored on S3.

There are multiple storage classes within S3 with varying degrees of data replication that enable you to balance durability, availability and cost.

Lifecycle management lets you automate the transition of your data between classes and, finally, its deletion.

You can control access using S3 bucket policies and/or IAM policies.

Costs can be reduced by leveraging the SQL-like Select feature.

Static HTML websites can be hosted directly on S3.

Amazon Glacier stores data archives in vaults. Archives can take hours to retrieve but are cheap to store.

Disclaimer

Information on this page was obtained from source: AWS Certified Solutions Architect Second Edition ISBN 978-1-119-50421-4

Notes taken are kept brief and for personal reference. I urge and highly recommend anyone using this page as a source of information to purchase the source material for the complete information. The original book is fantastic and includes exercises, practice questions, verbose explanations and extra learning resources.
