High traffic websites

I often work with sites that have a large user base and high pageview counts, which means one has to find techniques that help the web servers handle the load. As is most often the case, usage is concentrated in specific peaks during the day. But how does one handle that load in the most efficient way? Most commonly you have hardware load balancers and either virtual or physical servers taking the traffic in. But that is expensive, and a smaller site with high usage might not have the funds to buy into expensive load balancers and hardware. So how does one get the most out of an infrastructure?

The first step is to use any existing setup as a template for the new one, since the best data will come from the old environment: how high the load is on the servers, how many peaks per day the servers receive, and so on. As a guideline, take the highest peak and count 30 minutes before and after it to get your highest pageview window. This is your low tier, performance-wise: during this time you will either have to make some sacrifice and accept longer load times, or use more capacity to handle it. The normal tier covers roughly 8 hours of the day (longer if you run a non-work-related site). The high tier is when your site is not under load and therefore at its fastest.

I have seen many combinations to this effect, and the most efficient (unless you use something like AWS or a similar cloud service) is a combination of two web servers: Nginx and Apache.

Apache has been the workhorse of the web for more than a decade, which in my opinion makes it the best choice as a web server, and to load balance between several Apache servers I often use Nginx. With Apache in the backend handling the more CPU-heavy PHP, Nginx sits in front, caching and serving content to the clients with compression and speed. The whole idea is to let Nginx serve the static content and give Apache only the dynamic requests, either locally or on a separate server. To find out more, check the archives, where you can find scripts and configurations that will help you in this process.
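To make the idea concrete, here is a minimal sketch of what such a split could look like in an Nginx configuration. The backend addresses, domain, paths and file-extension list are only placeholders I have picked for the example, not a finished setup; adapt them to your own environment.

    # Sketch only: addresses, ports and paths below are assumptions
    upstream apache_backend {
        server 10.0.0.11:8080;   # Apache node 1 (placeholder address)
        server 10.0.0.12:8080;   # Apache node 2 (placeholder address)
    }

    server {
        listen 80;
        server_name example.com;    # placeholder domain
        root /var/www/example;      # placeholder docroot

        gzip on;                    # compress responses at the edge

        # Static content is served straight from Nginx and cached by the browser
        location ~* \.(css|js|png|jpg|gif|ico|svg|woff2?)$ {
            expires 7d;
            access_log off;
        }

        # Dynamic requests (the PHP pages) are proxied to the Apache pool
        location / {
            proxy_pass http://apache_backend;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }

With this kind of split, static files never touch Apache at all, so the PHP workers only spend CPU time on requests that actually need them.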

Keeping up

In daily dealings with complex environments one starts to think about the advantages and disadvantages of the bleeding-edge versus old-stable philosophies. In general, the community of system administrators is a more conservative bunch, whilst developers tend to be more of the “let us see what sticks” type. That is not to say there is no commingling; the concept of DevOps is a widespread and with-the-times movement.


It does not solve any problems by itself though, which many might assume it will. The barrier between developers and administrators will still be there, but now we have dug some tunnels below it, namely automation and containerization. With automation comes the easy setup of platforms; with containerization, the direct dependencies on the host operating system go away.

In the age-old discussion between moving forward and keeping things stable, this middle ground is perhaps our best option. Changes are on the horizon though: the developer will have to know a bit about infrastructure, and on the other side of the wall at least some broader automation skills will be required, or perhaps even an understanding of programming.

My little guess for the future is that the role of an administrator will shift towards platform knowledge, mostly because Amazon, Microsoft and Google, to name just the biggest, have made hardware a cheap commodity. That is not to say that everything will be in the cloud, but perhaps the classical hosting providers will be.

I have been working a bit with Microsoft Windows Server 2016 Core edition and must say that I am pleasantly surprised by how easy it is to manage, once you figure out that WinRM (Windows Remote Management) uses both HTTP (5985/TCP) and HTTPS (5986/TCP), but you cannot have the HTTPS listener without the HTTP one, it seems.

But to go back to my point: managing servers directly by hand will be more or less gone if the current trends are any indication. IT staff will work more with scripting, automation and ensuring that the myriad of machines they run keep running, than ever logging in and performing tasks on any single machine.