How long does it take to find the root cause of a failure in your system?

Hanisha Arora
Sep 19, 2023

Photo by Austin Distel on Unsplash

Five minutes? Five days? Or (dare I say it) "I have no idea, we sit until we find it"? That is usually what happens when production is in crisis: the whole team drops everything and forms a crew to fix it.

Another way to look at it:

If your answer is close to five minutes, your production system and tests very likely have great logging. If it isn't, seemingly unimportant things like logging, exception handling, and testing were probably an implementation afterthought. You need a deliberate logging strategy for both your systems and your tests. Good logging reduces the need to reach for a debugger.
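
As an illustration of the kind of logging that shortens the hunt for a root cause, here is a minimal sketch using only Python's standard `logging` module. It assumes JSON-formatted output and attaches an identifier to every record so that related lines can be grouped; the `orders` logger, the `order_id` field, and the `charge_order` function are hypothetical stand-ins, not part of any particular system.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line so log tools can filter by field."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Hypothetical context field, attached via `extra=` at the call site.
            "order_id": getattr(record, "order_id", None),
        }
        if record.exc_info:
            # Include the full traceback when logger.exception() was used.
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_logging() -> logging.Logger:
    """Set up the (hypothetical) 'orders' logger to emit JSON lines to stdout."""
    logger = logging.getLogger("orders")
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


def charge_order(logger: logging.Logger, order_id: str, amount: float) -> bool:
    """Hypothetical business operation that logs context instead of failing silently."""
    logger.info("charging order", extra={"order_id": order_id})
    try:
        if amount <= 0:
            raise ValueError(f"invalid amount {amount}")
        return True
    except ValueError:
        # logger.exception logs at ERROR level and attaches the traceback.
        logger.exception("charge failed", extra={"order_id": order_id})
        return False


if __name__ == "__main__":
    log = configure_logging()
    charge_order(log, "A-1001", 25.0)
    charge_order(log, "A-1002", -5.0)
```

Because every line is a single JSON object carrying `order_id`, a failing request can be traced by filtering on that one field instead of scrolling through free text.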

When someone asks for a monitoring and logging strategy, two questions usually come to mind: why do we need one, and where do we start?

The solution: the AWS Well-Architected Framework

At GreyB, we follow the AWS Well-Architected Framework. It asks you to consider five pillars when architecting solutions, and together they shape how agile and functional your applications are.

Each pillar comes with its own design principles and best practices, drawn from the experience of AWS Solutions Architects.

The 5 Pillars

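  1. Operational Excellence
    This pillar is about running and monitoring systems and continually improving processes and procedures. Logging and monitoring belong here.
  2. Security
    Security is one of the biggest concerns for any product. This pillar is about protecting data end to end while managing who holds which privileges.
  3. Performance Efficiency
    It’s about how your system performs as load changes, and whether it uses its computing resources efficiently.
  4. Reliability
    It’s about whether your system keeps behaving correctly, and recovers from failures, under different workloads.
  5. Cost Optimization
    Is your system delivering its business value at the lowest practical cost?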

Looking at all five pillars together helps you find what your system is missing and what it needs to run efficiently. To learn more about these pillars, see the AWS Well-Architected Framework documentation.

Originally published at https://haox.illued.space on September 19, 2023.