Friday, January 26, 2018

Today we continue our discussion on the AWS papers in software architecture which suggests five pillars:
- Operational Excellence for running and monitoring business critical systems.
- Security to  protect information, systems, and assets with risk assessments and mitigation strategies.
- Reliability to  recover from infrastructure or service disruptions
- Performance Efficiency to ensure efficiency in the usage of resources
- Cost Optimization  to help eliminate unneeded cost and keeps the system trimmed and lean.
The guidelines to achieve the above pillars include:
1. Infrastructure capacity should be estimated not guessed
2. Systems should be tested on production scale to eliminate surprises
3. Architectural experimentation should be made easier with automation
4. There should be flexibility to evolve architectures
5. Changes to the architecture should be driven by data
6. Plan for peak days and test at these loads to observe areas of improvement
We looked at the Operational Excellence, Reliability and security pillar and we reviewed the associated best practices.
One of the  trends in operational practice is to rely on tools that sets thresholds and raises alerts. This translates to incident response instead of active and strenuously polling. As part of the response, we search the logs. Most of these are interactive command line executions but each step may be time consuming due to the volume of the logs. One way to mitigate this is to run a sequential batch script that repeats the commands on smaller chunks of data. This however means we lose the aggregations unless we store intermediary data. Fortunately this was possible using files. However most log archive systems are read only and the files may not be read from. This also restricts parallelizing tasks using library such as celery because those require network access to message broker and the only access allowed is ssh. One way to overcome this is to scatter and gather data from multiple ssh sessions. This is easier to automate because the controller does not have to be local to the log server.

No comments:

Post a Comment