Architecting the cloud

How to design a well-architected cloud computing system

Jani Iivari
7 min read · Aug 11, 2019

Introduction

Is “architecting” a verb? Maybe the title should be “To architect the cloud.” Anyway, this post is about how to design a cloud computing system from a systems or solutions architecture viewpoint: how to build a safe, flexible system that scales, and how difficult that can be. Cloud architecture, and public cloud architecture in particular, comes with some special considerations.

When we start designing a system we usually start from the value proposition, which is great: we don’t want to build systems just for the sake of building systems. There needs to be value in what we do, otherwise no one will pay for it. You could say the business value is the tip of an iceberg, or a house standing on a firm architectural foundation.

There is certainly also business value in solid architecture. It costs money when your system fails on Black Friday due to unexpected load, or when someone hacks into your system and steals your data. I’m always amazed by the news about Black Friday problems. You don’t want your web shop to be offline on the busiest day of the year.

Coming back to the iceberg analogy: the visible part of an iceberg is said to be only about a tenth of its whole mass. Saying that non-functional requirements are the invisible part of the iceberg may be an overstatement, but at the very least they provide a solid foundation for the business logic. Modern architecture relies heavily on Platform-as-a-Service offerings, which makes the iceberg analogy even more fitting: only a small part of the service is visible above the surface, and the rest is operated by the cloud provider. That doesn’t mean we don’t need to care about it. There are decisions to be made to build a great architecture.

[Image: Black Friday in web shops]

Great architecture

Fortunately, there is good material on what to take into account when building a great, or well-architected, system. AWS has the “Well-Architected Framework” to assist with architectural best practices for designing and operating reliable, secure, efficient and cost-effective systems in the cloud. Microsoft has the “Pillars of a great Azure Architecture” for ensuring a solution is designed to be scalable, resilient, efficient and secure. Both have the same goal: to help us take advantage of best practices in the cloud. While these guides provide a great foundation, they both acknowledge that there is no one-size-fits-all approach and that designing an architecture involves trade-offs. Interesting, let’s dig a little deeper…

Four or five pillars of architecture

AWS has five architecture pillars and Azure has four. The pillars don’t match exactly one-to-one, but the basic concepts are the same. Security is the only one whose name matches fully as well. No wonder: security is probably a major concern in organizations thinking of moving to the cloud. All the others are basically the same; the difference is in how much weight each vendor gives a topic at the theme level.

Security

Cloud security is slightly different from on-premises security, not from a requirements perspective but from a responsibility and threat perspective. Both vendors use a defence-in-depth approach with layered structures and a shared responsibility model. For example, the physical security of the infrastructure is the cloud provider’s responsibility, so we can focus on the other layers. There is no single security system; security is viewed holistically, from the technology, people and process perspectives.

Security responsibilities differ depending on the type of service: IaaS, PaaS or SaaS. Some security protections are built into all service types, but how much is left to you is something to consider really carefully: on IaaS you are responsible for everything except the physical layers, on PaaS or SaaS for less. With IaaS you have a larger attack surface and are a bit more open to vulnerabilities due to old OS versions and lax enforcement of access policies. It wouldn’t be the first time someone left SSH or RDP open to the whole internet. In a cloud architecture, I like to think that a modularized approach using PaaS works best. Also take advantage of the dedicated security services in both clouds: Azure Security Center or Azure Sentinel, and AWS Security Hub or Amazon GuardDuty.
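To make the “SSH or RDP open to the internet” problem concrete, here is a minimal sketch of the kind of check you could run against your firewall rules. The `InboundRule` type and the rule names are hypothetical and provider-neutral, not any cloud vendor’s API; in practice you would populate them from whatever your provider’s tooling exports.

```python
from dataclasses import dataclass
from ipaddress import ip_network

# Default ports of the two management services mentioned above.
MANAGEMENT_PORTS = {22: "SSH", 3389: "RDP"}


@dataclass(frozen=True)
class InboundRule:
    """Hypothetical, provider-neutral view of one inbound firewall rule."""
    name: str
    source_cidr: str  # e.g. "0.0.0.0/0" or "203.0.113.0/24"
    port_from: int
    port_to: int


def is_open_to_internet(cidr: str) -> bool:
    """True when the CIDR covers every address (prefix length 0)."""
    return ip_network(cidr, strict=False).prefixlen == 0


def find_exposed_management_ports(rules):
    """Return (rule name, service) pairs for SSH/RDP reachable from anywhere."""
    findings = []
    for rule in rules:
        if not is_open_to_internet(rule.source_cidr):
            continue
        for port, service in MANAGEMENT_PORTS.items():
            if rule.port_from <= port <= rule.port_to:
                findings.append((rule.name, service))
    return findings


# Illustrative rules only; the names and address ranges are made up.
rules = [
    InboundRule("allow-https", "0.0.0.0/0", 443, 443),
    InboundRule("allow-ssh-any", "0.0.0.0/0", 22, 22),
    InboundRule("allow-rdp-office", "203.0.113.0/24", 3389, 3389),
    InboundRule("allow-all-v6", "::/0", 0, 65535),
]

for rule_name, service in find_exposed_management_ports(rules):
    print(f"{rule_name}: {service} is reachable from the whole internet")
```

A check like this only covers one narrow mistake; the managed security services named above look at far more than open ports.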

[Image: Defence-in-depth]

Performance Efficiency and Scalability

Performance is the ability to use computing resources efficiently, matching the available resources to the demand. Performance optimization requires a holistic approach: you need to architect the entire solution, from networking and storage to computing resources. I love the AWS principle “Democratize advanced technologies”. It’s exactly where the cloud ecosystems are at their best: use Spark clusters, messaging layers, NoSQL, load balancers, containers and so on. These all used to be extremely difficult to build, but in the day of cloud they are a few Terraform configs away (which is really, really easy). One aspect that is often forgotten is monitoring: you need to monitor performance holistically too, not just the CPU of a VM. Identify bottlenecks; improve and experiment constantly.

Scalability and performance go hand in hand. Scalability is in most cases seen as acquiring computing resources dynamically. Autoscaling allocates resources to match the demand, either by scaling up or by scaling out. Scaling up increases the capacity of a single instance, which only works up to a certain point; scaling out adds new instances to a service. In theory there is no limit to how much you can scale out. Scaling out is usually cheaper, but requires some type of load distribution. Go for an architecture that is able to scale out.
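As a rough illustration of a scale-out decision, the sketch below picks an instance count so that the average load per instance moves toward a target. The function name, the parameters and the default limits are assumptions made for this example; real autoscalers add cooldown periods and smoothing on top of a rule like this.

```python
import math


def desired_instances(current: int, load_per_instance: float,
                      target_per_instance: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Pick an instance count that brings per-instance load near the target.

    All names and defaults here are illustrative assumptions.
    """
    if current < 1 or target_per_instance <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Total demand currently spread across the running instances.
    total_load = current * load_per_instance
    # Round up so the per-instance load ends at or below the target.
    wanted = math.ceil(total_load / target_per_instance)
    # Clamp to the configured floor and ceiling.
    return max(min_instances, min(max_instances, wanted))


# Four instances at 90% CPU with a 60% target: scale out.
print(desired_instances(current=4, load_per_instance=90, target_per_instance=60))
# Four instances at 15% CPU with a 60% target: scale in.
print(desired_instances(current=4, load_per_instance=15, target_per_instance=60))
```

The `max_instances` ceiling is worth keeping even though scaling out is unlimited in theory, because it also caps what a traffic spike can cost you.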

Availability, Recoverability and Reliability

You really want a system that is reliable, not only by scaling up or out under heavy usage but also when failures in hardware or in operations occur. Individual servers will fail, network adapters will fail and someone will eventually make a mistake. Accepting that these incidents and temporary conditions will happen, and designing for them, is definitely one part of a great architecture.
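One common way to design for temporary conditions is to retry a failed call with exponential backoff. The following is a minimal sketch of that idea; `TransientError`, `call_with_retries` and the default delays are hypothetical names and values chosen for the example, not part of any cloud SDK.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retries(operation, max_attempts=5, base_delay=0.5,
                      max_delay=8.0, sleep=time.sleep):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # Out of attempts: let the caller handle the failure.
            # The delay ceiling doubles each attempt, capped at max_delay;
            # a random delay below it keeps clients from retrying in lockstep.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Simulated dependency that fails twice and then recovers.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("temporary network glitch")
    return "ok"


print(call_with_retries(flaky_operation, base_delay=0.01))
```

Only errors that are genuinely temporary should be retried; anything else is raised immediately so that real bugs are not hidden behind retries.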

There are many reasons to use automation and this is one of them: infrastructure-as-code helps you bring down the recovery time objective (RTO) in a major disaster. Frequent backups limit the loss of data. How much loss is acceptable is the recovery point objective (RPO), which you need to define even if your customer doesn’t want to accept that some data might be lost.
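The relationship between backup frequency and the recovery point objective can be sketched with a little arithmetic. The example assumes periodic backups that capture the state at the moment they start and only become usable once they finish; the function names and the durations are made up for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Longest span of data that can be lost with periodic backups.

    If a failure hits just before a running backup completes, the newest
    usable backup started one interval plus one backup duration ago.
    """
    return backup_interval + backup_duration


def meets_rpo(backup_interval: timedelta, rpo: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True when the worst-case loss stays within the recovery point objective."""
    return worst_case_data_loss(backup_interval, backup_duration) <= rpo


def estimated_recovery_time(step_durations) -> timedelta:
    """Sum of the recovery steps, to compare against the recovery time objective."""
    return sum(step_durations, timedelta(0))


rpo = timedelta(hours=1)
ten_minutes = timedelta(minutes=10)
print(meets_rpo(timedelta(hours=1), rpo, ten_minutes))     # hourly backups
print(meets_rpo(timedelta(minutes=30), rpo, ten_minutes))  # half-hourly backups

# Hypothetical recovery steps: redeploy infrastructure, restore data, verify.
steps = [timedelta(minutes=20), timedelta(minutes=45), timedelta(minutes=10)]
print(estimated_recovery_time(steps) <= timedelta(hours=2))
```

With these assumed numbers, hourly backups that take ten minutes miss a one-hour RPO, which is the kind of gap that is much cheaper to find on paper than during an incident.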

A key to managing failure is frequent, automated testing of failure scenarios. With proper tools, testing can be done in a system exactly like the production system, so you can track RTO and RPO as KPIs and even improve them with a better design.

Operational Excellence, Efficiency and Cost Optimization

This is a big one: how to develop, operate and monitor systems to deliver business value and to continually improve processes. You could add cost optimization to this area, as Microsoft does in its efficiency and operations pillar. It all starts with a holistic architecture that extends to continuous services and operations. Ideally the entire process takes into account that our DevOps team will be responsible for the entire solution life cycle. Being cost-effective extends even further, to your overall architecture. Do DevOps teams and microservices dictate that every team runs its own services, or should we use shared services or shared clusters? One example is an Azure SQL elastic pool: creating a pool could decrease the cost of a SQL cluster significantly.

In this area it’s quite obvious that you should automate as much as possible. Automate build, deployment, resource administration and monitoring. Know about incidents before your users see there is a problem, and remediate or inform early. With monitoring you will be able to find inefficiencies in performance and costs, and improve the system constantly. Architecting operational efficiency well makes your life so much easier and your customers and users very happy!

It’s rather easy to spend money on a cloud system. Just apply a script and wait a few minutes to get a service up and running. Azure SQL Data Warehouse at its highest tier costs $300 an hour. You probably don’t want to leave that running by mistake, or sitting idle in a development or test environment. AWS gives cost optimization its own pillar. Maybe you should consider doing so as well. Monitor costs, build automation around your services, and choose your services well.
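To put the hourly price in perspective, here is a small worked example of what a forgotten weekend costs. It uses the $300 per hour figure quoted above as its only input; the function name and the dates are illustrative, and actual pricing should be checked with the provider.

```python
from datetime import datetime

HOURLY_RATE_USD = 300  # Figure quoted in the text above, not current pricing.


def idle_cost(started: datetime, stopped: datetime,
              hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Cost of leaving a service running between two points in time."""
    if stopped < started:
        raise ValueError("stopped must not be earlier than started")
    hours = (stopped - started).total_seconds() / 3600
    return hours * hourly_rate


# Left running from Friday 17:00 until Monday 09:00, which is 64 hours.
friday_evening = datetime(2019, 8, 9, 17, 0)
monday_morning = datetime(2019, 8, 12, 9, 0)
print(idle_cost(friday_evening, monday_morning))  # 19200.0
```

One unattended weekend at that tier costs about as much as a small car, which is a good argument for a scheduled job that pauses or scales down non-production environments outside working hours.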

Conclusions

A great, or well-architected, architecture demands a lot. It’s not a static state and it requires governance. It is a process that evolves as the system evolves. By using the AWS Well-Architected Framework or Microsoft’s “Pillars of a great Azure Architecture” guide, we are able to provide well-thought-out, best-practice architecture for our cloud customers.

Jani Iivari — Azure Lead @ Siili Azure Studio

Jani Iivari

Head of Analytics, Data & Integrations - Formerly known as Azure Solutions & Data Architect - #Azure #Cloud #AI #Cognitive #Data www.linkedin.com/in/janiiivari