Engineering

What Powers Instagram: AWS and Open Source

The guys over at Instagram wrote a nice article about their technology stack. They are using a lot of the Amazon AWS services, from S3 to Ec2 and ELB. Instagram is written in Django. PostgreSQL is powering their databases together with Redis for their lists and sessions. When they started sharing their PostgreSQL cluster they switched from PostSQL Geo to Apache SOLR for their geo API. Gearman is working with 200 workers to get data asynchronously through the system and into their streaming data clients.

We thought it would be fun to give a sense of all the systems that power Instagram, at a high-level; you can look forward to more in-depth descriptions of some of these systems in the future. This is how our system has evolved in the just-over-1-year that we’ve been live, and while there are parts we’re always re-working, this is a glimpse of how a startup with a small engineering team can scale to our 14 million+ users in a little over a year.

via Instagram Engineering • What Powers Instagram: Hundreds of Instances, Dozens of Technologies.

No surprises, but it looks like a solid stack. I’ve been using most of the tools myself in earlier projects. I was a bit surprised about the size of their instances. I would’ve used more but smaller instances, but apparently the IO is not the bottleneck. Food for thought..

Storm: The Hadoop of Realtime Processing

Backtype has built a powerful system to analyze realtime social data. They help you with insights about your social influence on Twitter and YCombinator’s Hacker News by analyzing tweets.The following graph shows the (not so impressive) stats for my blog:

Screen shot 2011 06 01 at 20.06.39 Storm: The Hadoop of Realtime Processing

Because Backtype is processing all tweets for URLs to calculate your influence, they have to process a massive amount of data from the Twitter Firehose. The Firehose can go as fast as 7000 tweets per minute during New Years Eve in Tokyo. That’s a massive 117 tweets per second!

The big problem with realtime is that you can not or not easily process it in batches, because the data keeps coming. When you batch this amount of data you have to be able to process the data faster than realtime or create an always growing backlog. During the World Cup finals last year when The Netherlands was playing against Spain, we (Mobypicture’s MobyNow) had a small flaw in our code processing the tweets with #ned and #wk2010. During roughly 90 minutes we had built a backlog of 18 hours worth of processing. Because people kept using #ned and #wk2010 it was almost impossible to remove the backlog, we had to go many times faster than realtime to remove it. While displaying realtime tweets you don’t want to be more then a couple of seconds behind. With batched processing this process of removing the backlog is a fight you are fighting every time you process a batch.

Backlog also recognized that batched processing wasn’t the way to go for their analytics. So they recently developed a new system for doing realtime processing called Storm to replace their old system of queues and workers: Read the rest of ‘Storm: The Hadoop of Realtime Processing’ »

The four categories of NoSQL databases

nosql-logo

Very interesting read on the Monitis Blog about picking the right NoSQL tool. They dive into what it is, what’s possibly wrong with RDBMS, describe the different categories of NoSQL and the pros and cons of the different types.

Most people just see one big pile of NoSQL databases, while there are quite some differences. You couldn’t use a Key-Value store when you need a Graph database for example, while Relational database systems are all quite compatible.

Montis describes the following categories:

1. Key-values Stores

The main idea here is using a hash table where there is a unique key and a pointer to a particular item of data. The Key/value model is the simplest and easiest to implement. But it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages.

Examples: Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB, Amazon SimpleDBRiak Read the rest of ‘The four categories of NoSQL databases’ »

Route 53 and Elastic Load Balancing integration

Another batch of great new AWS features, this time for Route 53 (DNS) and ELB. One in particular is very interesting. Route 53 and Elastic Load Balancing integration. Werner Vogel explains why this is a big deal:

Due to restrictions in DNS a root domain “zone apex” cannot be mapped to another domain name using a CNAME record. This has caused customers who wanted to have the root of their domain e.g. allthingsdistributed.com point to same location where for example www.allthingsdistributed.com is pointing to jump through complex redirect hoops. Through a better integration between Elastic Load Balancing and Route 53 we can now offer the ability to map the zone apex to ELB without all the redirection muck. ELB and Route 53 work together closely to ensure that if the address of the load balancer changes this is quickly reflected in Route 53.

That’s why Mobypicture is running on www.mobypicture.com instead of mobypicture.com and we had to run our own loadbalancers and scaling metrics for our moby.to url shortener. From now on we can completely switch to ELB, including all other benefits.

More information can found on the Elastic Load Balancing and the Route 53 detail pages or on Werner’s blogpost “New Route 53 and ELB features: IPv6, Zone Apex, WRR and more

Custom Metrics for Amazon CloudWatch

You can now store your business and application metrics in Amazon CloudWatch. You can view graphs, set alarms, and initiate automated actions based on these metrics, just as you can for the metrics that CloudWatch already stores for your AWS resources.

On first sight I was not really impressed. It does not sound really useful, until you dig deeper into the possibilities. For example adding page latency to CloudWatch gives you far better insight WHY your latency rises compared to free server resources.

Here’s an example of the kinds of insights that can be gained when you use custom metrics. The following chart shows the amount of free memory (the jagged line) and the application’s page latency:

cw custom memory latency 11 Custom Metrics for Amazon CloudWatch

As you can see, page latency increases as free memory decreases (looks like there’s some garbage collection going on as well).

Read the rest of ‘Custom Metrics for Amazon CloudWatch’ »