This week Amazon announced support for dynamic content in their CDN solution Amazon CloudFront. The announce coincided with my efforts to migrate more pieces of Difio's website to CloudFront.
In this article I will not talk about hosting static files on CDN. This is easy and I've already written about it here. I will show how to cache AJAX(JSONP actually) responses and serve them directly from Amazon CloudFront.
Background
For those of you who may not be familiar (are there any) CDN stands for Content Delivery Network. In short this employs numerous servers with identical content. The requests from the browser are served from the location which gives best performance for the user. This is used by all major websites to speed-up static content like images, video, CSS and JavaScript files.
AJAX means Asynchronous JavaScript and XML. This is what Google uses to create dynamic user interface which doesn't require to reload the page.
Architecture
Difio has two web interfaces. The primary one is a static HTML website which employs JavaScript for the dynamic areas. It is hosted on the dif.io domain. The other one is powered by Django and provides the same interface plus the applications dashboard and several API functions which don't have a visible user interface. This is under the *.rhcloud.com domain b/c it is hosted on OpenShift.
The present state of the website is the result of rapid development using conventional methods - HTML templates and server-side processing. This is migrating to modern web technology like static HTML and JavaScript while the server side will remain pure API service.
For this migration to happen I need the HTML pages at dif.io to execute JavaScript and load information which comes from the rhcloud.com domain. Unfortunately this is not easily doable with AJAX because of the Same origin policy in browsers.
I'm using the Dojo Toolkit JavaScript framework which has a solution. It's called JSONP. Here's how it works:
dif.io ------ JSONP request --> abc.rhcloud.com --v
^ |
| |
JavaScript processing |
| |
+------------------ JSONP response ------------+
This is pretty standard configuration for a web service.
Going to the clouds
The way Dojo implements JSONP is through the dojo.io.script module. It works by appending a query string parameter of the form ?callback=funcName which the server uses to generate the JSONP response. This callback name is dynamically generated by Dojo based on the order in which your call to dojo.io.script is executed.
Until recently Amazon CloudFront ignored all query string parameters when requesting the content from the origin server. Without the query string it was not possible to generate the JSONP response. Luckily Amazon resolved the issue only one day after I asked about it on their forums.
Now Amazon CloudFront will use the URL path and the query string parameters to identify the objects in cache. To enable this edit the CloudFront distribution behavior(s) and set Forward Query Strings to Yes.
When a visitor of the website requests the data Amazon CloudFront will use exactly the same url path and query strings to fetch the content from the origin server. All that I had to do is switch the domain of the JSONP service to point to the cloudfront.net domain. It became like this:
| Everything on this side is handled by Amazon.
| No code required!
|
dif.io ------ JSONP request --> xyz.cloudfront.net -- JSONP request if cache miss --> abc.rhcloud.com --v
^ | ^ |
| | | |
JavaScript processing | +---------- JSONP response --------------------------+
| |
+---- cached JSONP response ---+
As you can see the website structure and code didn't change at all. All that changed was a single domain name.
Controlling the cache
Amazon CloudFront will keep the contents in cache based on the origin headers if present or the manual configuration from the AWS Console. To work around frequent requests to the origin server it is considered best practice to set the Expires header to a value far in the future, like 1 year. However if the content changes you need some way to tell CloudFront about it. The most commonly used method is through using different URLs to access the same content. This will cause CloudFront to cache the content under the new location while keeping the old content until it expires.
Dojo makes this very easy:
require(["dojo/io/script"],
function(script) {
script.get({
url: "https://xyz.cloudfront.net/api/json/updates/1234",
callbackParamName: "callback",
content: {t: timeStamp},
load: function(jsonData) {
....
},
The content property allows additional key/value pairs to be sent in the query string. The timeStamp parameter serves only to control Amazon CloudFront cache. It's not processed server side.
On the server-side we have:
response['Cache-Control'] = 'max-age=31536000'
response['Expires'] = (datetime.now()+timedelta(seconds=31536000)).strftime('%a, %d %b %Y %H:%M:%S GMT')
Benefits
There were two immediate benefits:
- Reduced page load time. Combined with serving static files from CDN this greatly improves the user experience;
- Reduced server load. Content is requested only once if it is missing from the cache and then served from CloudFront. The server isn't so busy serving content so it can be used to do more computations or simply reduce the bill.
The presented method works well for Difio because of two things:
- The content which Difio serves usually doesn't change at all once made public. In rare occasions, for example an error has been published, we have to regenerate new content and publish it under the same URL.
- Before content is made public it is inspected for errors and this also preseeds the cache.
Comments !