Allowing users to upload media content is an incredibly powerful way of offering a customizable user experience: profile pictures, images, or other assets that will be embedded in your product. When you're building a knowledge base for teams, you might want to let your customers embed various kinds of documents. When you're building a social network, you might want to let users upload profile pictures and share images throughout your application.
All of these features share a core piece of business logic: you need to manage assets uploaded by users. Serving all of this traffic yourself would be a lot to handle, for uploads and downloads alike. You also need to store the files somewhere, ideally distributed across multiple locations, so all users can access them quickly.
A much better solution is to control the exact specifications of how media files should be stored on third-party infrastructure, as well as when and how this data is accessed. With pre-signed URLs and POST policies, S3 offers just that. While other providers, such as Azure, have similar systems in place, we'll focus on AWS in this post.
🥁 Before and After
In the classic model, we'd receive uploads directly on our own infrastructure, whether to our API service or some other endpoint that manages incoming upload requests and forwards them to our storage provider. While this method gives you full control over how you'd like to receive, process, and store assets, you're also responsible for providing a stable service, for making assets accessible through CDN layers, and for handling all the traffic associated with storing and retrieving content.
The approach based on pre-signed URLs, which I'll explain in this post, moves the heavy lifting to AWS. S3 has been used by thousands of companies for more than a decade now; it's fast, it's stable, and it's cheap. S3 can be integrated with CloudFront to make your assets available from anywhere on the globe once they are cached by an edge node. Access speed is dramatically increased, while you merely have to manage the assets in your system.
The best part: you don't lose control with this approach either. Using upload policies, you can make sure only the file you expect to receive is uploaded, after which you can delegate the transfer to S3. No upload or download traffic will ever pass through your infrastructure.
🔏 Pre-signed URLs, POST Policies
Pre-signed URLs allow you to access and store objects in S3, but they're limited to the set of resources granted at the time the URL was created. This allows you to create policies instructing AWS to process only the requests you allow, for example, accessing only the asset you want a user to access, or allowing them to upload the specific file you're expecting, and nothing else.
A pre-signed URL can be used multiple times and is bound to the permissions of the entity that created it, so if the IAM user or role creating the pre-signed URL is not allowed to access a resource, the URL will also fail to resolve that resource.
Pre-signed URLs also expire after a specified amount of time, although the number of times a URL can be used before then is not capped.
As all of the objects stored in your buckets should be private by default (we do not want to leak assets, do we?), using pre-signed URLs is a great way to grant temporary access.
It is important to note that there are different variations of pre-signed URLs: for retrieving objects, you specify a bucket, object key, and expiry to create a temporary URL that allows anyone with access to the URL to fetch that specific object.
For uploading assets, however, you may either create a signed URL for PUT requests, which is limited to specifying the object key, or use POST requests (usually for browser-based uploads). As some parameters used for uploads, such as SSECustomerKey, ACL, Expires, ContentLength, or Tagging, are not supported with regular pre-signed URLs, we'll be using POST operations to upload our assets from now on; this is more commonplace on the web anyway.
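To give you an idea of the PUT variant before we move on, here's a minimal sketch using the AWS SDK for JavaScript (the same SDK we'll use throughout this post); the bucket and key names are placeholders:

import { S3 } from 'aws-sdk';

const s3 = new S3();

// Temporary URL allowing a PUT upload of this exact object key,
// valid for 15 minutes
const putUrl = await s3.getSignedUrlPromise('putObject', {
  Bucket: 'our-sample-bucket',
  Key: 'sample-image',
  Expires: 900
});

Anyone holding putUrl can now send a plain PUT request with the file as the request body, no further credentials required.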
👔 Browser-Based POST Requests
With POST requests, users can upload files directly to S3 without them ever passing through your infrastructure. You provide the entity uploading the asset with the necessary credentials, which are similar to pre-signed URLs in that they consist of a signed policy specifying the resources that may be uploaded.
Note that you are of course not limited to uploading files from a browser environment: with POST requests, you can use nearly any system that supports multipart form POST requests.
Uploading files via POST is not much different from using PUT requests; as a matter of fact, under the hood you perform the same action. The only difference is the payload: in your POST request, you supply your file in a multipart form body, and the HTTP headers you know from PUT operations are supplied as form fields instead.
An example POST request could look like the following:
POST / HTTP/1.1
Host: destinationBucket.s3.amazonaws.com
User-Agent: browser_data
Accept: file_types
Accept-Language: Regions
Accept-Encoding: encoding
Accept-Charset: character_set
Keep-Alive: 300
Connection: keep-alive
Content-Type: multipart/form-data; boundary=9431149156168
Content-Length: length

--9431149156168
Content-Disposition: form-data; name="key"

object_key
--9431149156168
Content-Disposition: form-data; name="acl"

acl
--9431149156168
Content-Disposition: form-data; name="tagging"

<Tagging><TagSet><Tag><Key>Tag Name</Key><Value>Tag Value</Value></Tag></TagSet></Tagging>
--9431149156168
Content-Disposition: form-data; name="success_action_redirect"

success_redirect
--9431149156168
Content-Disposition: form-data; name="Content-Type"

content_type
--9431149156168
Content-Disposition: form-data; name="x-amz-meta-uuid"

uuid
--9431149156168
Content-Disposition: form-data; name="x-amz-meta-tag"

metadata
--9431149156168
Content-Disposition: form-data; name="AWSAccessKeyId"

access-key-id
--9431149156168
Content-Disposition: form-data; name="Policy"

encoded_policy
--9431149156168
Content-Disposition: form-data; name="Signature"

signature
--9431149156168
Content-Disposition: form-data; name="file"; filename="MyFilename.jpg"
Content-Type: image/jpeg

file_content
--9431149156168
Content-Disposition: form-data; name="submit"

Upload to Amazon S3
--9431149156168--
Here you can see that the request starts with all the fields required for the upload, such as the object key, the content type, optional tags, and of course your POST policy, its signature, and some other metadata. It is important to note that the file is supplied as the last form field, as everything after the file is ignored. Make sure you always specify all other fields before the actual file, otherwise you might be in for a surprise.
Known form fields include AWSAccessKeyId, acl, Cache-Control, Content-Type, Content-Disposition, Content-Encoding, Expires, file, key, policy, success_action_redirect, redirect, success_action_status, tagging, x-amz-storage-class, x-amz-meta-*, x-amz-security-token, x-amz-website-redirect-location.
You can also configure server-side encryption, but we will skip this for now.
👮 Creating a POST Policy
We just learned that a policy is used to enforce guidelines for uploading, for example, which key should be used, how large an asset may be, or which content type it uses. You may even include an MD5 checksum for verifying the integrity of uploads, which helps restrict uploads to exactly the file you're expecting.
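As a quick sketch of the checksum part (assuming the file contents are already available as a Buffer called fileBuffer when you create the policy; whether you send the digest as a Content-MD5 header on a PUT request or embed it in your policy depends on the upload variant you choose):

import { createHash } from 'crypto';

// fileBuffer is a placeholder for your file's contents (a Buffer);
// S3 expects the base64-encoded MD5 digest of those contents
const contentMd5 = createHash('md5').update(fileBuffer).digest('base64');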
Enough theory, let's check out how we can create a policy for uploading an image! We'll use the AWS SDK for JavaScript, but the procedure should be very similar in other languages, as long as an SDK is available.
import { S3 } from 'aws-sdk';

const s3 = new S3();

// Create a signed POST policy for uploading a private PNG of at most 2 MB
const post = s3.createPresignedPost({
  Bucket: 'our-sample-bucket',
  Expires: 3600,
  Fields: {
    key: 'sample-image',
    'Content-Type': 'image/png',
    acl: 'private'
  },
  Conditions: [['content-length-range', 0, 1000 * 1000 * 2]]
});
In the example above, we're generating a signed POST policy to upload a PNG image between 0 and 2 MB to the our-sample-bucket S3 bucket, expiring after one hour. The uploaded object will be private and can thus only be accessed by the resource owner or through a pre-signed URL generated by them. We also enforce that the key matches sample-image.
The SDK exposes a couple of parameters: Bucket and Expires are straightforward, while Fields contains the form fields that should be included in the POST upload. Every field present in Fields is added as an explicit condition, so the key, Content-Type, and acl have to be present exactly as they are defined here.
The conditions are supplied as an array of condition statements, each of which is used to validate the upload request. We specified the content length to be within a certain range; we could even have chosen the same upper and lower bound to require a fixed object size (helpful if you know the size upfront).
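For illustration, the policy condition language supports more than just ranges: exact matches are written as objects, and prefix matches as starts-with expressions. A sketch, using a hypothetical user-uploads/ key prefix:

const conditions = [
  ['content-length-range', 0, 1000 * 1000 * 2], // size between 0 and 2 MB
  ['starts-with', '$key', 'user-uploads/'],     // key must begin with this prefix
  { acl: 'private' }                            // acl must match exactly
];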
It also helps to understand that every form field specified in the final request must be present in the list of conditions of your policy, with the exception of x-amz-signature, file, policy, and fields using the x-ignore- prefix. This is important, as it forces requests to do exactly what you want them to do: anything not present in, or not matching, the policy is not allowed to be uploaded.
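For reference, the object returned by createPresignedPost contains the upload URL and the form fields to include. With our policy, it looks roughly like this (values abbreviated; the exact set of signature fields depends on the signature version and credentials in use):

{
  url: 'https://s3.amazonaws.com/our-sample-bucket',
  fields: {
    key: 'sample-image',
    'Content-Type': 'image/png',
    acl: 'private',
    Policy: 'eyJleHBpcmF0aW9uIjo...' // the base64-encoded policy document
    // ...plus signature fields such as X-Amz-Signature
  }
}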
Now that we have the policy, we can add its fields to our request form body and send the upload off.
If you're uploading objects from the browser, make sure your bucket's CORS policy is configured to allow POST operations from your origin first.
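A minimal sketch of such a CORS configuration, set programmatically with the SDK (the origin https://app.example.com is a placeholder for your own):

await s3.putBucketCors({
  Bucket: 'our-sample-bucket',
  CORSConfiguration: {
    CORSRules: [
      {
        AllowedOrigins: ['https://app.example.com'],
        AllowedMethods: ['POST'],
        AllowedHeaders: ['*'],
        MaxAgeSeconds: 3000
      }
    ]
  }
}).promise();

With CORS out of the way, we can build the multipart body and send the request: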
import FormData from 'form-data';
import fetch from 'node-fetch';

// Include all fields from the policy in the form body
const body = new FormData();
for (const [k, v] of Object.entries(post.fields)) {
  if (typeof v !== 'string') {
    continue;
  }
  body.append(k, v);
}

// Append the sample file last, so all fields before it are evaluated
body.append('file', sampleAsset(), 'test.png');

// Send the multipart request
const res = await fetch(post.url, {
  method: 'POST',
  body
});
And that's it. The fields object includes all the details needed to create a successful request (such as the key, policy, signature, content type, and everything else we defined in Fields before). Note how we added the file as the last field in our form body to make sure all fields are evaluated by S3.
Once the body is complete, there's nothing left to do but send the POST request; no additional headers are needed. This approach works even in older, limited browsers, as well as in any other environment you can send POST requests from.
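Since we set neither success_action_status nor a redirect, S3 signals a successful upload with an empty 204 response; a quick sanity check on the res object from above might look like this:

// S3 answers a successful POST upload with 204 No Content by default
if (res.status !== 204) {
  throw new Error(`Upload failed with status ${res.status}`);
}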
✅ Accessing Assets with Pre-signed URLs
We uploaded the asset to our S3 bucket successfully, and we'd like to retrieve it now. For this, we'll create a pre-signed URL that grants temporary access to the private object:
import { S3 } from 'aws-sdk';
import fetch from 'node-fetch';

const s3 = new S3();

const getParams = {
  Expires: 3600,
  Bucket: 'our-sample-bucket',
  Key: 'sample-image'
};

// Generate a pre-signed URL for the getObject operation
const signedUrl = await s3.getSignedUrlPromise('getObject', getParams);

// Fetch the asset using the temporary URL
const image = await fetch(signedUrl);
In getSignedUrlPromise, we specify the operation that should be allowed, in our case getObject, as well as the parameters passed to the operation. Additionally, we pass Expires to limit access to one hour.
The operation will return a pre-signed URL with query parameters set to access the specified object in our bucket. Once we've obtained the URL, we can run a fetch and see if our asset is returned as expected.
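For illustration, a SigV4-signed URL looks roughly like this (placeholder values; the exact query parameters depend on the signature version your SDK uses):

https://our-sample-bucket.s3.amazonaws.com/sample-image?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=20200101T000000Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=...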
With just a couple of steps, we're ready to build a simple service returning pre-signed URLs to store and retrieve uploaded assets. We can be sure the expected asset is uploaded by adding restrictions to the POST policy conditions, and we can upload files from any platform that supports multipart form POST requests. Additional tasks could include resizing assets, although you could even resize and optimize them before uploading. You could also add hooks to your system, or consider the proxied approach after all, as it might make more sense in a scenario where you process assets heavily.
Thankfully, everything works the way I needed it to for a side project, and I'm still incredibly happy that I could offload the entire management and storage part, focusing on providing a seamless upload experience instead, without compromising on control over large file sizes, malicious actors, and other issues that arise with file uploads and data transfer.
If you run into any issues or have a question, don't hesitate to reach out on Twitter or drop me a mail!