In business, never, ever, ever ignore costs. Cloud is just one aspect of business and so, like rents, salaries, taxes and so on, it can’t be ignored either.

Cloud costs are tidal in nature, and that's perhaps why they go unnoticed for so long.

S3 storage is a classic. Let’s say S3 costs £2,500 this month.

No probs lol

Next month £2,750.

All good bro

Month after, £2,979.

Not an issue

And yet, by the end of the year the organisation will be forking out over £5k a month in S3 costs with no decrease in sight. Scale this up to your business and you will suddenly find what started off as a cheap option has become decidedly more expensive. Any cost-benefit analysis, if one was ever done, is out the window.

What do I do about it?

There are a few things you can do to get started with the S3 rationalisation process.

Understand the 7 S3 storage classes and their costs

The first step is to understand S3. It’s worth taking plenty of time to understand what each storage class does, what it costs to use and how data transfers into/out of it affect costs. There are 7 storage classes in S3 and they each have fairly clear-cut yes/no use cases.

  • S3 Standard
  • S3 Standard-Infrequent Access
  • S3 One Zone-Infrequent Access
  • S3 Intelligent-Tiering
  • S3 Glacier Instant Retrieval
  • S3 Glacier Flexible Retrieval
  • S3 Glacier Deep Archive

Get an idea of the cost of storage for each tier for the amount of data stored as well as the cost of moving it in/out in each case.
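To make that concrete, here’s a rough comparison using indicative us-east-1 list prices at the time of writing (do check the current pricing page for your region): S3 Standard runs at roughly $0.023 per GB-month, Glacier Deep Archive at roughly $0.00099 per GB-month. For 100TB (102,400GB), that’s about $2,355 a month in Standard versus about $101 a month in Deep Archive. Same data, over twenty times cheaper, provided you can live with retrievals taking around 12 hours.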

Review your buckets

Most companies actively use only about 5%-10% of their data. The rest sits untouched and can be moved to long-term storage. To get that data moved out, start by tagging the buckets so you have some idea of who owns what. This helps build a clearer picture of what’s important and relevant in the organisation.

How to tag your buckets

Pretty straightforward.

Create a list of buckets. Get these by running:

aws s3api list-buckets --query "Buckets[].Name"

Contact department heads with the bucket list to ask them which buckets they own. Tag the buckets with the owner’s name, department, cost code and email address. Naturally, customise to suit. Where there is no owner per se, tag it with the group that is ultimately responsible, e.g. DevOps or Support.
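From the command line, tagging looks like this. A minimal sketch, reusing the example bucket from later in this post with made-up tag values. Note that put-bucket-tagging replaces any existing tags on the bucket, so include them all:

# Replaces the bucket's entire tag set with the values below
aws s3api put-bucket-tagging --bucket cloudguyinbroadstone --tagging '{
    "TagSet": [
        {"Key": "Owner", "Value": "jane.smith"},
        {"Key": "Department", "Value": "Finance"},
        {"Key": "CostCode", "Value": "FIN-1234"},
        {"Key": "Email", "Value": "jane.smith@example.com"}
    ]
}'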

Use the Tag Editor (under AWS Resource Groups) as required to find assets in your account and tag them.

Now your buckets are tagged.

Get a usage report

Right, next up is to get an idea of how fresh (and stale) the data is with a Storage Class Analysis for your bucket. You can set up SCA in the S3 console; the steps are documented in the S3 User Guide.

  • Go to the S3 Buckets console
  • Click the bucket of interest. Click the Metrics tab.
  • In the Storage Class Analysis section, click Create analytics configuration
  • Give the configuration a title/name and set the rule scope, so, either the whole bucket or a subfolder.
  • Now, choose to export as a CSV and then pick a destination bucket.
  • Click Create configuration and leave that to cook for 24-48 hours tops. It’s usually back long before that.

Create the usage report using the AWS CLI

Sometimes it’s easier to do things from the command line. Change the bucket name (cloudguyinbroadstone) twice in the configuration sample below. This will create a Storage Class Analysis report called bucket-state-report. The full reference is under put-bucket-analytics-configuration in the AWS CLI documentation.

aws s3api put-bucket-analytics-configuration --bucket cloudguyinbroadstone --id bucket-state-report --analytics-configuration '{
    "Id": "bucket-state-report",
    "StorageClassAnalysis": {
        "DataExport": {
            "OutputSchemaVersion": "V_1",
            "Destination": {
                "S3BucketDestination": {
                    "Format": "CSV",
                    "Bucket": "arn:aws:s3:::cloudguyinbroadstone"
                }
            }
        }
    }
}'
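To confirm the configuration registered, you can read it back with:

aws s3api list-bucket-analytics-configurations --bucket cloudguyinbroadstone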

The S3 Intelligent-Tiering move

Once you’ve had a look through the report above, you can begin to see the age of the data and how fresh or stale it really is, folder by folder, and get a practical understanding of what’s really in use and what can be archived.

To really save money, it’s time to move everything to S3 Intelligent-Tiering using a Lifecycle Configuration. Normally, in production, I would of course code all this to iterate across all buckets in the account and move them to S3 Intelligent-Tiering according to the tags that were set up in the previous steps (a sketch of that loop appears after the read-back command below). You can use Terraform or Ansible, depending on your preferences. I prefer Ansible personally, but that’s just a matter of taste.

aws s3api put-bucket-lifecycle-configuration --bucket cloudguyinbroadstone --lifecycle-configuration '
{
    "Rules": [
        {
            "ID": "move-to-s3-intelligent-tiering",
            "Filter": {},
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 7,
                    "StorageClass": "INTELLIGENT_TIERING"
                }
            ]
        }
    ]
}'

Now you can read those values back using the following command:

aws s3api get-bucket-lifecycle-configuration --bucket cloudguyinbroadstone

This will begin transitioning objects in the bucket to Intelligent-Tiering once they are 7 days old.
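And here’s the iteration sketch promised earlier. It assumes the Rules document above has been saved to a local file called lifecycle.json (a name picked purely for illustration) and applies it to every bucket in the account; in production you’d filter on the ownership tags first:

# Apply the same lifecycle rule to every bucket in the account.
# Assumes lifecycle.json holds the Rules document shown above.
for bucket in $(aws s3api list-buckets --query "Buckets[].Name" --output text); do
    aws s3api put-bucket-lifecycle-configuration \
        --bucket "$bucket" \
        --lifecycle-configuration file://lifecycle.json
done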

Some considerations

Though generally speaking Intelligent-Tiering is a superbly effective way to save money on AWS, there are some things I would like to draw your attention to.

  • Charges: If you move to Intelligent-Tiering, there is a very small monitoring and automation charge, billed per 1,000 objects tracked, for watching each object’s access patterns. In my case that came to around $25 for about 750TB, and it can of course be estimated in advance with a spreadsheet and some time (see the quick arithmetic after this list). I did think this charge was going to be huge, but it is trivial by comparison with the cost savings.

  • Cost Complexity: While Intelligent-Tiering aims to optimize costs, the billing structure can be complex due to the multiple tiers and transition costs associated with moving objects between them. You might need to carefully monitor your usage to ensure you’re actually saving money.

  • Minimum Storage Duration: Objects only move to the infrequent access tier after 30 consecutive days without access (AWS removed Intelligent-Tiering’s old 30-day minimum storage duration charge in late 2021), so short-lived objects never leave the frequent access tier and see no savings.

  • Small Objects: Objects smaller than 128KB are not monitored or auto-tiered at all; they simply sit in the frequent access tier. If you have a large number of very small objects, the cost savings might be minimal.

  • Monitoring and Management: While Intelligent-Tiering handles tier transitions automatically, you need to monitor and manage your usage patterns to ensure that the tiering strategy is actually saving you money. You might need to fine-tune your monitoring tools and practices.

  • Latency: The frequent and infrequent access tiers both offer the same millisecond access, so there’s no latency penalty for the default tiers. Only the optional Archive Access tiers add latency, because a retrieval process is involved in moving an object back to the frequent access tier when requested.

  • Data Retrieval Costs: Unlike Standard-IA and One Zone-IA, Intelligent-Tiering charges no retrieval fees; you pay the monitoring charge instead. Bear in mind, though, that touching an object in the infrequent access tier promotes it back to the frequent access tier, where it pays the higher storage rate again.

  • No Cold Storage Savings: Intelligent-Tiering doesn’t offer the same level of cost savings as the Glacier or Glacier Deep Archive storage classes, which are designed for long-term archival storage.

  • Limited Use Cases: Intelligent-Tiering is most effective when you have objects with varying access patterns. If your access patterns are consistently in the frequent access range, you might not see significant benefits from tiering.
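On the monitoring charge, here’s the quick arithmetic mentioned above, using the published us-east-1 rate of $0.0025 per 1,000 monitored objects per month (check the pricing page for your region): 10,000,000 objects × $0.0025 / 1,000 = $25 a month. So a 750TB estate costing about $25 a month to monitor implies around 10 million monitored objects, an average object size of roughly 75MB. If your average object is much smaller, the object count, and therefore the charge, climbs accordingly.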

Move to Glacier

Now you’ve saved a bunch of cash (after 90 days without access, when your unused files finally move down to the Archive Instant Access tier), you can begin to investigate using Lifecycle policies to move your old data into Glacier storage. I won’t go into it here, but think of it as a tape backup archive library where you can dump all your un-accessed files.

Configure a Lifecycle policy to move stuff to long-term storage by updating your current configuration.

The example below shows how I would update the bucket Lifecycle policy to move the files to Glacier Deep Archive after 2 years, or 730 days.

aws s3api put-bucket-lifecycle-configuration --bucket cloudguyinbroadstone --lifecycle-configuration '
{
    "Rules": [
        {
            "ID": "move-to-s3-intelligent-tiering",
            "Filter": {},
            "Status": "Enabled",
            "Transitions": [
                {
                    "Days": 7,
                    "StorageClass": "INTELLIGENT_TIERING"
                },
                {
                    "Days": 730,
                    "StorageClass": "DEEP_ARCHIVE"
                }
            ]
        }
    ]
}'

Summary

Saving money is good. Saving lots of money is really good, and it’s really possible: move everything to S3 Intelligent-Tiering and let AWS do the leg work.

See how much you are spending now, make a note of it and download the data from the AWS S3 Storage Lens dashboard. Get buy-in from the business and then automate all of it using your favourite tools.

Finally, after 90 days, see how your costs have dropped. The last time I did this, I reckon we were saving around $10k a month from where we would have been if we had done nothing.