HadronZoo: Bespoke Software Developers
Low Latency (General Discussion)

What is Low Latency?

HadronZoo specializes in the development of low latency server programs, but what does that actually mean? What is latency, and what qualifies a program to be classed as low latency?

The latency of a system is the time taken to respond to requests. It is usually given as an average value, since response times depend on the nature of the request and on events such as context switching by the operating system. An 'average request' begs a definition: for a web server this might be stated as the time taken to serve a 5Kb page. Latency is sometimes quoted as the percentage of requests that will be responded to within a given time. In other, more stringent environments, latency will be quoted as a maximum. In the client/server scenario there are two distinct measures of latency: that observed by the client, which includes the network transmission times, and that which is due purely to the server. Over the Internet, transmission times are often the largest latency factor, but server latency matters because it is strongly related to another measure - throughput. It stands to reason that the less time it takes for a server to respond to a request, the more requests it can handle in any given time.
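These different ways of quoting latency - average, percentile and maximum - can all be read off a sample of measured response times. A minimal sketch, using invented sample values and the simple nearest-rank method:

```python
def percentile(samples, pct):
    """Return the latency value below which pct percent of samples fall."""
    ordered = sorted(samples)
    # Nearest-rank index for the requested percentile.
    rank = max(0, int(len(ordered) * pct / 100.0) - 1)
    return ordered[rank]

# Response times in milliseconds for ten hypothetical requests.
times_ms = [4, 5, 5, 6, 6, 7, 8, 9, 12, 40]

print(sum(times_ms) / len(times_ms))  # average: 10.2 ms
print(percentile(times_ms, 90))       # 90th percentile: 12 ms
print(max(times_ms))                  # maximum: 40 ms
```

Note how one slow outlier drags the average well above the typical request, which is why the more stringent environments prefer the percentile or maximum figures.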

Latency and throughput figures are notoriously optimistic, with real performance in the field falling well short. Notwithstanding this, a server program such as a web server can be said to be 'low latency' if the principal design objective is speed. Low latency programs employ particular techniques, without which they would run many times slower. It is common to run programs in multiple threads to take advantage of multiple CPU cores, but low latency server programs will typically go much further and use thread specialization, lock free queues, atomic locking, epoll, caching regimes and memory intensive non-SQL proprietary database technology. And low latency is not just about programming techniques. A low latency website, for example, is not just a website served by a low latency web server. There are further measures within the site itself, particularly regarding bandwidth usage. Succinct page content will inevitably be faster than verbose content. AJAX (Asynchronous JavaScript and XML) reduces bandwidth consumption by limiting transmission to only the part of a page that has changed. Compression will save on bandwidth and, for fixed content that can be cached, won't add to processing time.
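To illustrate one of these techniques: epoll is the Linux readiness-notification facility that lets a single thread watch many connections and wake only when one of them has data, rather than dedicating a blocked thread to each. A minimal sketch using Python's standard select module and a socket pair standing in for a client connection (Linux only; a real server would register listening and client sockets in a loop):

```python
import select
import socket

# A connected pair of sockets standing in for a client and a server side.
client, server_side = socket.socketpair()

# One epoll instance can watch thousands of descriptors.
poller = select.epoll()
poller.register(server_side.fileno(), select.EPOLLIN)

client.send(b"GET /page")

# Wake only when a registered descriptor becomes readable.
for fd, event in poller.poll(timeout=1.0):
    if event & select.EPOLLIN:
        data = server_side.recv(4096)
        print(data)

poller.close()
client.close()
server_side.close()
```

The point of the design is that idle connections cost almost nothing: the thread sleeps in a single poll call instead of being parked, one per connection, by the scheduler.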

What are the benefits of low latency measures?

The benefits depend on the operating environment. Bankers, for example, are concerned about both forms of latency but particularly transmission times. These are critical because buy and sell orders with long transmission times are beaten by those with short transmission times, and the information needed to trigger the orders is out of date. That might sound ridiculous given we are only talking in terms of a small fraction of a second, but it isn't. In the world of automated trading everything is sped up by a factor of many thousands, so trying to make markets in Hong Kong out of a Wall Street office is a bit like trying to bid for a house by post. The guy who bids by phone or Internet is going to get the bargain and you are going to miss it. Far from being 'masters of the universe', bankers are its biggest slaves, as transmission times are largely a function of the speed of light. Note that banks continue to cram themselves into financial centers while everyone else is looking at the Internet as a means to serve their market from their chosen idyll. And bankers must have the very fastest servers money can buy. It takes a lot of computing power to churn through high speed data feeds to spot juicy trade opportunities. Low latency for bankers can never be low enough. But bankers are not the only players. With less emphasis on transmission times, the weathermen, the astronomers and search engines like Google all need to get the most out of whatever extreme computing power they can lay their hands on.

But these are players with big problems and, for the most part, fat wallets to solve them with. What about more humble endeavors? Something like a gaming server, for instance. Gaming servers tend to be relatively straightforward in terms of functionality. It is mostly a case of keeping track of user states and, of course, gremlins of one form or another. User states would be what 'room' the user is in, with how many gold coins and what weapons - that type of thing. The problem though is the number of concurrent users the servers can support, and this is critical for two reasons. The obvious reason is that more users means more revenue. The less obvious reason is data distribution. If everyone who wants to play can be accommodated by a single server there is no data distribution issue. But if this is not the case, the game has to be split up in some way and spread across multiple servers. This might be by room, so each server is now handling one or a few rooms rather than all the rooms in the game. Sounds simple, but now when a user moves from one room to another their data has to follow them, so more work is being done overall. And the capacity of each server is reduced because it has to cope with greater statistical variance of user numbers in the rooms it manages. The alternative, to split by user, avoids migration of user data but replaces it with migration of room data! It should be pretty clear why gaming servers use low latency, high throughput techniques. These are the key to avoiding the multi-server scenario or, failing that, to better coping with it.
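The split-by-room scheme amounts to a routing function from room to server, plus a migration step whenever a user crosses a room boundary. A minimal sketch, with invented server names and a stable hash so every process agrees on which server owns which room:

```python
import zlib

SERVERS = ["game-1", "game-2", "game-3"]  # hypothetical server names

def server_for_room(room_id):
    # Stable hash (crc32) so the room -> server mapping is the same everywhere.
    return SERVERS[zlib.crc32(room_id.encode()) % len(SERVERS)]

def move_user(user, old_room, new_room):
    src, dst = server_for_room(old_room), server_for_room(new_room)
    if src != dst:
        # The user's state must follow them between servers - the extra
        # work, and the extra latency, that sharding introduces.
        return f"migrate {user}: {src} -> {dst}"
    return f"{user} stays on {src}"
```

The sketch also shows why capacity suffers: each server must be provisioned for the peaks of its own few rooms, not the smoother average across all rooms.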

What about me? All I want is a website

So far we have picked relatively esoteric and extreme examples to illustrate the benefits of low latency techniques. So let's consider the more widely applicable practice of website hosting. As you might expect, low latency/high throughput techniques offer greater benefits to the more demanding websites. If your site only gets a few hits a day, server efficiency is irrelevant. And to be clear, with transmission times on the Internet generally eclipsing the latency of even the most inefficient web servers, having a low latency website is not something that will be readily apparent to your customers. If your site has a reasonable level of traffic though, with peaks approaching 500 requests per second, you could see a considerable benefit.

Let's throw a few numbers in the air. Many servers sit behind pipes with a stated rate of 100 Mbit/sec but are doing well if they ever see as much as a third of this bandwidth. A fifth is a safer figure to work on. Assuming an average response of say 5Kb per page, they can burst to something like 1,000 requests per second. Half that could be sustained in the dead of night, and half that again in the Internet 'rush hour'. But that's the limit imposed by the pipe, and if all you are doing is serving out fixed content, pretty much any web server program running on an entry level server can match that level of throughput. You can always buy a 1 Gig pipe of course, but this is where it starts to matter what you are serving and how. The functionality does not have to be that complicated, and you don't have to be that bad at either PHP or SQL, before you find a faster pipe does not bring much benefit.
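The arithmetic behind those figures is simple enough to write down, taking the 5Kb average response as 5,000 bytes and dividing the usable fraction of the pipe by the response size:

```python
PIPE_BITS = 100_000_000   # stated 100 Mbit/sec pipe
PAGE_BYTES = 5 * 1000     # 5Kb average response, taken as 5,000 bytes

def requests_per_sec(fraction):
    # Usable bytes per second at the given fraction of the stated rate,
    # divided by the size of one response.
    return PIPE_BITS * fraction / 8 / PAGE_BYTES

print(round(requests_per_sec(1 / 3)))  # optimistic burst: 833 req/sec
print(round(requests_per_sec(1 / 5)))  # safer working figure: 500 req/sec
```

These are pipe limits only; they say nothing about whether the server program itself can generate responses that fast, which is where the rest of the argument picks up.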

Most websites serve less than 500 pages per day, and most of the traffic is to webcrawling robots rather than humans sliding their mouse around. If the site content is particularly compelling, the average human visitor might pull perhaps 10 or 15 pages, including peripheral pages such as images and stylesheets. Even at 100 pages per second you can assuage the desires of 20,000 visitors an hour. If you have that level of traffic you almost certainly have options, including sitting back and living off advertising revenue without doing any business at all. And yet it is surprisingly easy to become traffic challenged. If you do anything clever with real time data feeds, or use your site to support a mobile phone app that frequently pulls updates from the server, you can quickly find yourself needing a lot more than 1,000 pages per second, particularly if the service offered lends itself to sharp usage peaks at particular times of the day.