During April we had continued issues specific to our EU servers. As outlined before, the growth on our Amsterdam network in particular has been impressive. With this growth came a few unexpected growing pains, and we have been actively rolling out solutions for them. A few were outlined in previous posts. We believe we have created a tool that can really help with the everyday needs of front-end developers and designers, so they do not have to think about hosting, moving sites and all the hassles that go along with proper life-cycle management of a website. To achieve this, consistent availability of production clouds is of utmost importance. Availability is one of the primary reasons we chose to build this platform on SoftLayer's hardware infrastructure. We are hard at work making sure we meet and eventually exceed our availability SLA and make downtime a non-issue.
Success of Lab Accounts
The response to the free lab account plan has been tremendous, with over 5,000 MODX’ers now having a free ‘sandbox’ to play with. We are thrilled to give users a chance to test the tool, particularly the ability to snapshot and move sites, for free. Resources are divided differently between Lab Accounts and paid accounts, and systems are in place to make sure plenty of resources are available to production accounts. Beyond per-cloud resources, the underlying technology and infrastructure that power Lab Accounts are the same as for paid accounts. The popularity of Lab Accounts has put more strain on our technology than we anticipated, and we are making adjustments and improvements daily to optimize it.
Current measures to improve availability
In response to the recent issues we have immediately increased focus and resources (both personnel and hardware allocation) to drastically reduce, if not eliminate, these issues. Reports of errors have decreased even as activity has increased. We are continuing to focus on making sure any lingering issues are eliminated.
Over the next few days we will be further separating and speeding up the recycle processes for individual Cloud instances. These improvements will result in less time needed for changes to Clouds, injection of Snapshots, and restores, amongst other things. We will post again once these changes are complete.
We are currently working out the best solution for system-wide status monitoring. While we do monitor each platform internally, we are developing a more sophisticated system. This will include a page on the public website showing recent cloud status updates. We are testing the new system at the moment and hope to have a public-facing page available soon.
Investigating per cloud monitoring
After a few improvements were put in place during the last week of April, the strain on the platform was much alleviated. Even after these improvements, certain clouds continued to have issues despite no system-wide issues being logged. This was partially due to some of those clouds impeding the system by not employing efficient caching. While stability and system-wide monitoring are top priorities, we are researching options for per-cloud monitoring that will proactively alert support to issues affecting any single production cloud. These plans are in their infancy. Updates will be posted on our blog when solutions are found. The more we can automate this, the better for everyone.
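To give a rough idea of what per-cloud monitoring could look like, here is a minimal sketch in Python. It is an illustration only, not our actual system: the class name, the `probe` function and the failure threshold are all assumptions made for the example. The idea is to count failed health checks for each cloud separately, so that a single struggling cloud can raise an alert even when nothing is wrong system wide.

```python
from typing import Callable, Dict, Iterable, List


class PerCloudMonitor:
    """Tracks consecutive failed probes for each cloud separately (hypothetical)."""

    def __init__(self, probe: Callable[[str], bool], threshold: int = 3) -> None:
        self.probe = probe          # assumed health check: cloud id -> True if healthy
        self.threshold = threshold  # assumed number of consecutive failures before alerting
        self.failures: Dict[str, int] = {}

    def run_once(self, cloud_ids: Iterable[str]) -> List[str]:
        """Probe every cloud once and return the ids that need an alert."""
        alerts: List[str] = []
        for cloud_id in cloud_ids:
            try:
                healthy = self.probe(cloud_id)
            except Exception:
                healthy = False  # a probe that errors out counts as a failure
            if healthy:
                self.failures[cloud_id] = 0  # recovery resets the counter
                continue
            self.failures[cloud_id] = self.failures.get(cloud_id, 0) + 1
            # Alert exactly once, when the threshold is first reached.
            if self.failures[cloud_id] == self.threshold:
                alerts.append(cloud_id)
        return alerts
```

In this sketch, `run_once` would be called on a schedule with the list of production cloud ids, and any ids it returns would be passed on to support. Requiring several consecutive failures is one way to avoid alerting on a single slow response.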
Hiring dedicated infrastructure member
SoftLayer, a top-tier hosting and cloud services provider, handles the hardware and its availability for our platform. In addition to their expertise, we are actively seeking a specialized, dedicated infrastructure staff member to make our system more efficient and to identify potential scaling issues before they happen. We are very serious about making this not only an efficient tool for your workflow, but also a very solid platform for the long term. A person fully dedicated to this goal will go a long way toward making this a reality.
If you are still having issues
Please file a support ticket through the Cloud Dashboard or email firstname.lastname@example.org as soon as possible if you are still having issues. We do monitor social media channels, but tickets filed with support are monitored more closely. Any issues, especially those involving production sites and downtime, get top priority and are addressed as soon as possible. We appreciate your help on this front. Raising tickets with support really helps us identify and resolve issues faster.
We are working as hard as possible to ensure that downtime becomes a thing of the past. If you are still concerned or have other questions, please do not hesitate to file a support ticket, and we will answer any questions and address any issues you might have.
Our goal is for hosting and the process of hosting to recede into the background and become something you don't even think about. We will not stop working on improving the cloud platform until we reach this goal, letting you focus on doing what you love.