You’ve probably read in our recent blog articles that we’ve launched Presslabs Dashboard, a new cloud-native web application that runs in your own Google Cloud account. Our engineering team has already invested more than 20,000 hours in developing it, diving into Kubernetes and cloud-native solutions. It felt natural to bring the results and perks of this work to our managed WordPress hosting solution as well. And this is what we did.
#Our old infrastructure and the start of the Kubernetes era
Since we started out in the managed WordPress hosting world back in 2009, we’ve always strived to offer the best services to our customers and to dig into new technologies and opportunities. We’ve managed to develop a solid infrastructure with three main layers, separated logically and physically: the backend, the frontend and a customer control panel that provides monitoring and tooling for DevOps.
The backend layer was a fleet of bare-metal servers managed and deployed using Ansible. We also had a backup system in place based on ZFS, with the help of Z3, our open-source tool.
Our internal services and the control plane for the managed hosting services were also deployed using Ansible on bare-metal servers, but they were the first to be migrated to Google Kubernetes Engine.
The frontend has been left untouched. It consists of two layers: our Presslabs cache nodes and a unified delivery network of frontend servers, or edge nodes as we call them. Since 2014 we have been serving content to readers from servers that are as close to them as possible through our geographically distributed network of edge nodes.
Deploying and maintaining your own bare-metal cluster is no easy feat, and it required at least two dedicated engineers to face all the challenges that come with these tasks. Moreover, deploys and server upgrades involved planned downtime. This is why in 2016 we started watching container orchestration tools, experimenting with both Kubernetes and Docker Swarm. Kubernetes proved to be the more stable solution back then, and in September 2017 we started working on the migration of our internal services to Kubernetes.
In the meantime, we also started working on Stack, our open-source WordPress infrastructure based on Kubernetes, which was born out of the idea of creating a standard in WordPress hosting. We also created an opinionated version of Stack called Presslabs Dashboard, which we’ve launched as an app in the Google Cloud Kubernetes Marketplace.
#The final step: switching WordPress backends to Kubernetes
We reached a point where we had to maintain two projects with different needs: the managed WordPress hosting one and the Stack/Dashboard one. Maintaining two codebases is challenging for any team, and our old systems required continuous updates to all the services we were using. At the same time, moving all our infrastructure to Kubernetes was not a trivial endeavor either. However, considering we’ve spent the last few years digging into Kubernetes, it seemed like the natural step forward to also bring our hosting infrastructure into the Kubernetes era. Right now, system upgrades are handled automatically by Google’s service.
What did this change involve? We’ve basically rewritten the backend layer of our infrastructure and started using Google Kubernetes Engine clusters instead of our old bare-metal servers. Our new infrastructure uses Stack as a starting point, which our engineering team adapted to our specific use-cases and integrated with the Presslabs API and our user control plane. On our specifications page we’ve created a detailed schema to illustrate the new components and how they interact.
The Ozone controller is the mastermind of each Kubernetes Engine Cluster and manages Ozone Tasks and WordPress Pods. Ozone Tasks are operations such as synchronization requests, snapshots and cache flushes, requested by our clients via the Presslabs API.
The Ozone controller also creates WordPress entities, while the WordPress Operator creates the corresponding WordPress Pods. A WordPress Pod runs containers based on Stack runtimes and our custom PHP code for WordPress sites, which includes a custom object cache, Carbon Cache glue code, etc. For storing media files we are using Kubernetes Persistent Volumes.
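Controllers and operators in Kubernetes generally work by comparing a desired state with the observed state and acting on the difference. The snippet below is a minimal sketch of that idea only, not the actual Ozone or WordPress Operator code; `SiteSpec`, `reconcile` and the action strings are hypothetical names made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSpec:
    """Hypothetical desired state for one WordPress site."""
    name: str
    replicas: int


def reconcile(desired: list[SiteSpec], running: dict[str, int]) -> list[str]:
    """Return the actions needed to move `running` towards `desired`.

    `running` maps a site name to the number of pods currently running.
    """
    actions = []
    wanted = {site.name: site.replicas for site in desired}
    for name, replicas in sorted(wanted.items()):
        current = running.get(name, 0)
        if current < replicas:
            actions.append(f"create {replicas - current} pod(s) for {name}")
        elif current > replicas:
            actions.append(f"delete {current - replicas} pod(s) for {name}")
    # Pods whose site is no longer in the desired state are removed.
    for name in sorted(set(running) - set(wanted)):
        actions.append(f"delete {running[name]} pod(s) for {name}")
    return actions
```

A real operator would run this comparison in a loop against the Kubernetes API and apply the changes instead of returning strings.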
Databases are grouped in MySQL clusters that are managed by the MySQL Operator, which we’ve written specifically for this purpose and open-sourced.
Backups are done using Velero, an open-source tool to safely back up and restore data, perform disaster recovery, and migrate Kubernetes cluster resources and persistent volumes. Backups run daily, are managed by Google and are kept for 30 days.
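To make the 30-day retention concrete, here is a small sketch of the arithmetic. It is not how Velero is implemented (Velero expresses retention as a time-to-live on each backup); `expired_backups` and `RETENTION_DAYS` are hypothetical names, and we assume a backup that is exactly 30 days old is still kept.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention period stated in the article


def expired_backups(backup_dates: list[date], today: date) -> list[date]:
    """Return the backup dates that fall outside the retention window."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # Assumption: a backup taken exactly on the cutoff day is still retained.
    return sorted(d for d in backup_dates if d < cutoff)
```

With one backup per day, this keeps roughly a month of restore points available at any time.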
The frontend part of our infrastructure remained the same, with our geographically distributed network of edge nodes and cache nodes. Traffic comes from our cache nodes to an NGINX Ingress controller, via the Google Cloud Load Balancer.
We collect system and WordPress-related metrics using Prometheus and we visualize them using Grafana. We’ve put in place various alerts in Grafana to notify us in case a site is down or certain usage limits are reached. We also collect PHP logs using Fluent Bit, process them in Papertrail and display them to our clients in our custom control plane. As an additional monitoring tool, we are also using Google Cloud’s operations suite, formerly known as Stackdriver.
Custom add-ons like ElasticPress and Thumbor are also managed by the Ozone controller.
#The benefits of our new infrastructure
#Backend scaling and dynamic requests
With the new version of our infrastructure on Kubernetes, we can easily handle websites with intensive dynamic requests, such as WooCommerce stores, sites with user-generated content, sites that require logged-in users and sites performing uncached operations in general. It also allows scaling both vertically (larger machines) and horizontally (more machines), up and down, allowing a seamless experience even for the most demanding sites.
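As an illustration of horizontal scaling, the sketch below applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric). The article does not state which autoscaling settings are used in production, so the function name, the replica bounds and the example numbers are assumptions, and the real autoscaler also applies a tolerance and stabilization that are left out here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count suggested by the proportional rule, clamped to bounds."""
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 3 pods running at 90% average CPU with a 60% target would be scaled out to 5 pods, while 4 pods at 30% with the same target would be scaled in to 2.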
On our old infrastructure, we relied heavily on the frontend layer, where the content of the websites was cached, to deliver the sites as fast as possible. However, in the case of websites with dynamic requests, caching is not always an option and this is where our new Kubernetes infrastructure comes to the rescue.
#Easier feature development
From the beginning, we took the security of our infrastructure very seriously, and to this day we’ve had zero hacks. Our security measures have included not allowing the execution of arbitrary PHP code, always securing the wp-admin of our hosted sites and not letting the wp-config.php file be edited directly. Now users can edit files such as wp-config.php, php.ini or NGINX settings by creating their own custom files in their git repository.
Another security measure was that we didn’t allow direct database access or wp-cli access, as we thought it would be much safer for our experienced engineers to run database queries that could seriously impact a site. This proved a wise choice at the time, and both we and our customers were satisfied with this approach.
However, as the WordPress world evolved, features such as database access or wp-cli became paramount for skilled WordPress developers and teams. Unfortunately, on our old infrastructure, such features proved difficult to implement because of all the access required, but the switch to Kubernetes opened new possibilities for us. Now we can implement such features with relative ease and give granular access to teams and developers.
For now, we only offer database and wp-cli access to our enterprise customers, but we are working full speed to implement these features for all our customers.
#Solid features and easier maintenance
Kubernetes is slowly becoming a standard in container orchestration, a highly stable product maintained by the industry’s tech giants. It goes without saying that incorporating these technologies into our infrastructure has brought us more stability and the ability to develop solid features. In rewriting our infrastructure to accommodate Kubernetes, we’ve cleaned up our code and got rid of what was not needed anymore because it is now done in a more stable way through Kubernetes mechanisms.
The time needed to maintain our servers was also significantly reduced, which allowed our team to focus more on developing new features. As mentioned before, on bare-metal servers we had to install and configure new servers by hand, as well as update and maintain them. With Google Kubernetes Engine, we simply ask Google to spin up new machines for us with just a few clicks, or we use the built-in machine auto-scaler. Needless to say, this change was welcomed by our whole engineering team.
#Integrate features such as ElasticSearch and Thumbor
Two of the most pressing and troublesome issues for large sites are search queries and image processing. Heavy search queries can significantly affect the performance of a site, and a common solution is to use ElasticPress / ElasticSearch to reduce the page generation time.
Another lifesaver for sites that are using a lot of images, e.g. news sites, is Thumbor. Thumbor is a smart imaging service that allows real-time image processing such as cropping, resizing and flipping of the images.
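To give an idea of how such on-the-fly processing is requested, the sketch below builds a Thumbor request path for a resize with an optional horizontal flip. It uses Thumbor’s unsigned `unsafe` URL form, where a negative width requests a horizontal flip. This is an illustration only: the function name and the example image address are made up, and a production setup would normally use signed URLs instead of `unsafe`.

```python
def thumbor_path(image_url: str, width: int, height: int,
                 flip_horizontal: bool = False, smart: bool = False) -> str:
    """Build an unsigned Thumbor request path for a resize (and optional flip)."""
    # Thumbor encodes a horizontal flip as a negative width.
    size = f"{'-' if flip_horizontal else ''}{width}x{height}"
    parts = ["unsafe", size]
    if smart:
        parts.append("smart")  # let Thumbor pick the focal point when cropping
    parts.append(image_url)
    return "/" + "/".join(parts)
```

Calling `thumbor_path("example.com/photo.jpg", 300, 200)` returns `/unsafe/300x200/example.com/photo.jpg`, which is appended to the address of the Thumbor server.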
For now, the ElasticSearch and Thumbor integrations are available only on our enterprise plans, but we are planning to make them available for all our plans as add-ons as soon as possible, so stay tuned.
#Some problems we’ve encountered
The switch to Kubernetes hasn’t been smooth sailing all the time and it still comes with its own challenges.
#Kubernetes is always changing
We are using low-level Kubernetes APIs, and they are always changing and improving, at a much faster rate than most technical products. This is a good thing per se, but it means that we need to keep up with the speed at which Kubernetes evolves and constantly update our code, and the way we write code for Kubernetes, accordingly.
#Problems with the YARPP plugin
While migrating all our clients to the new infrastructure, we ran into some issues with the sites that had the YARPP plugin installed. We changed our MySQL storage engine from MyISAM to InnoDB, and this change affected the customers that were using the YARPP plugin for related posts.
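For readers unfamiliar with what a storage engine change looks like, the sketch below generates the standard MySQL `ALTER TABLE ... ENGINE=InnoDB` statement for every table that is still on another engine. It is not the migration procedure we actually used; the function name and the table names in the usage note are illustrative assumptions.

```python
def engine_conversion_sql(tables: dict[str, str], target: str = "InnoDB") -> list[str]:
    """Return ALTER TABLE statements for tables not yet on the target engine.

    `tables` maps a table name to its current storage engine.
    """
    statements = []
    for name, engine in sorted(tables.items()):
        if engine.lower() != target.lower():
            # Backticks quote the identifier; names are assumed to be trusted.
            statements.append(f"ALTER TABLE `{name}` ENGINE={target};")
    return statements
```

Given `{"wp_posts": "MyISAM", "wp_options": "InnoDB"}`, only `wp_posts` gets a statement. Plugins that depend on engine-specific behavior need to be checked separately after such a conversion.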
We are proud to say that, to our knowledge, we are the first managed WordPress platform fully based on Kubernetes, and we are working full speed to develop new and exciting features and offer our clients a seamless, highly scalable hosting experience. If you want to test our new platform, you can create a free account or leave us a message. We’d be glad to help you out!