Infra-Buddy Deep Dive

Russell Spitler · The Startup · Oct 20, 2020


As we built out OTX and the corresponding microservice backend, it was critical for us to automate the management of the infrastructure. As discussed in previous posts in this series, the utility ‘infra-buddy’ was created to achieve the following objectives:

  1. Completely configure our infrastructure through our source code repositories and the build system.
  2. Provide self-service infrastructure building blocks for developers to use as needed.
  3. Create a common monitoring system for the infrastructure which did not require opt-in or additional configuration.
  4. Create ‘identical’ CI and production environments.
  5. Minimize duplication of the ‘code’ used to build our environment.
  6. Be able to test our infrastructure building blocks in isolation from their use in a production system.
  7. Create a fully replicable process for creating our runtime environments.

This post dives into the technical implementation and use of infra-buddy for those interested in taking advantage of the work we did!

At its core, infra-buddy is not wholly AWS-specific, but in practice that is what it has been used for, and a reasonable portion of the logic has been customized accordingly. One could imagine supporting Azure or GCP by substituting Terraform, GCP Deployment Manager, or Azure Resource Manager for CloudFormation, but the following discussion is largely AWS-specific.

The tool is built on the fundamental architecture in which an ‘application’ (see the earlier post) is a set of related microservices. In this context, each microservice is built from small reusable infrastructure components (called ‘services’), each defined as a CloudFormation template and a few config files. An example can be found at https://github.com/AlienVault-Engineering/infra-buddy-cluster; in this repo you will see the following files:

  • cloudformation.template — the CloudFormation template. It is optimized for use with this tool, making heavy use of exports to manage dependencies across infrastructure components.
  • cloudformation.parameters.json — the file containing the definition for the parameters to be passed in to the CloudFormation template.
  • defaults.json — the file that defines the default values, using the extensibility of infra-buddy to provide some context-aware values (such as incremental ordering for rule precedence, or a supported resource configuration for AWS Fargate).
  • monitors.json — the file containing the definition for the Datadog monitors (read: alarms) to be created for alerting on known performance conditions.
  • config/ directory — contains any additional configuration files required by the CloudFormation template for advanced use cases such as EC2 instance bootstrapping. For example, this is how we provide the configuration file for the local Datadog agent by default on our EC2 instances.
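Laid out as a tree, a service-type repo like the one linked above is compact; note that the file name under config/ is illustrative, not a guarantee of the repo’s contents:

```
infra-buddy-cluster/
├── cloudformation.template
├── cloudformation.parameters.json
├── defaults.json
├── monitors.json
└── config/
    └── datadog-agent.conf    # illustrative: bootstrap config for the local Datadog agent
```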

This set of files provides not only the definition of the infrastructure component but also the documentation for its use. Running ‘infra-buddy validate-template’ evaluates the configuration files and the CloudFormation template to ensure that they are well formed and adhere to known best practices. A repo as described above defines a ‘service-type’ for infra-buddy. The infra-buddy project has the following service-types built in (each with a corresponding GitHub repo like the one referenced above):

  • VPC — Foundational service which establishes the core network layout as well as jump servers, NAT gateways, and DNS records, and configures VPC flow logs.
  • Cluster — Establishes a target for containers to run in your environment, either as an ECS cluster or Fargate. Creates an application load balancer, DNS records, and a Docker registry for the subsequent deployment of container-based API services.
  • ECS Service — Creates an ECS service that acts as the default target for the load balancer, registers a URL path to receive HTTPS requests, or operates in headless mode for batch processing.
  • CloudFront Angular — Creates a CloudFront CDN for an Angular app served out of an S3 bucket and configured to route REST API calls to the application load balancer created by the Cluster stack.
  • RDS Aurora — Creates an RDS instance configured for multi-AZ deployment and backups.
  • Elasticsearch — Creates an Elasticsearch cluster.

NOTE: It is also possible to define a custom service-type for use in a single project (as opposed to these globally shared services) by adjusting the command-line parameters of infra-buddy on execution.
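Whether built in or custom, a service-type repo can be checked with the validation command mentioned above; a minimal sketch, using no flags beyond the subcommand the post names:

```sh
# From the root of a service-type repo (e.g. infra-buddy-cluster):
# checks the CloudFormation template and config files for well-formedness
# and known best practices.
infra-buddy validate-template
```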

Now, with the building blocks defined, we need to create the definition for an instantiation of a microservice. This is done automatically in our workflow through the integration with service-buddy, but it can also be done by running ‘infra-buddy generate-service’, which will create the service definition file and a README.md containing the documentation for the service. An example of the service.json for a container with a REST API can be seen below:
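A minimal sketch of what such a manifest might look like; aside from ‘DESIRED_CAPACITY’, which is named in the discussion below, the key names here are assumptions about the schema rather than documented fields:

```json
{
  "application": "usma-onboarding",
  "role": "api",
  "service-type": "ecs-service",
  "deployment-parameters": {
    "DESIRED_CAPACITY": 1
  },
  "prod-deployment-parameters": {
    "DESIRED_CAPACITY": 3
  }
}
```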

service.json

This manifest shows a REST API service for the application ‘usma-onboarding’ with a role of ‘api’. It is not using any service modifications (more on that later!), and it has defined that in dev/CI it wants to limit the number of containers running to 1, while in production it wants that bumped up to 3 (‘DESIRED_CAPACITY’). Of course, at this point you are wondering how developers figure out these magic variables and what appropriate values would be. The service has smart defaults; in addition, when the generate-service command is run, the following README is auto-generated alongside the service definition. It contains the definitions for all known parameters as well as the default value used if you choose not to override them.
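Reconstructed purely as an illustration (the rows other than DESIRED_CAPACITY are hypothetical parameters, and the defaults shown are invented), the generated README reads roughly like this:

```
# default-api-service

Parameter         Description                                Default
DESIRED_CAPACITY  Number of containers to run                1
LISTEN_PORT       Container port the load balancer targets   8080
HEALTH_CHECK      URL path used for target health checks     /health
```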

The README that is autogenerated for the ‘default-api-service’

While most developers skip the README (as they have been well conditioned to by experience), it provides a reasonably approachable self-service model once they have been reminded of it a few times.

As mentioned above, the service definition (service.json) has a section called ‘service-modifications’. After a while of building out our infrastructure, we found there were a few common resources and capabilities we needed: things like queues, temporary data stores (S3 buckets), or capabilities like auto scaling. Centralizing the definition of these resources allowed us to ensure consistency as well as consistent monitoring and configuration best practices (how many people knew about an SQS redrive queue before running into a service disruption?). To enable these service modifications, developers simply need to add them to the service.json as shown below:
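Sticking with the same hypothetical schema as the earlier sketch, the addition might look like this (the list form of ‘service-modifications’ is an assumption):

```json
{
  "application": "usma-onboarding",
  "role": "api",
  "service-type": "ecs-service",
  "service-modifications": ["autoscale", "sqs-queue"],
  "deployment-parameters": {
    "DESIRED_CAPACITY": 1
  }
}
```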

adding common service modifications to an infra-buddy service

These could then be configured like the service itself, with service-specific variables defined in the deployment parameters section of the service.json. An example of this is seen above, where the ‘DESIRED_CAPACITY’ for production is 3 instances of the service whereas for the other environments it defaults to 1. The following service modification templates are built in:

  • autoscale — automatically increase the number of containers running in a defined ecs-service.
  • cluster-cpu-autoscale — automatically increase the number of EC2 instances in a cluster based on the CPU load of the existing cluster.
  • cluster-memory-autoscale — automatically increase the number of EC2 instances in an ECS cluster based on the memory load of the existing cluster.
  • sqs-threshold — trigger the ecs-service autoscale based on the number of messages in an SQS queue.
  • sqs-queue — configure an SQS queue for use by a service (while this looks trivial from the standpoint of most of the AWS SDKs, experienced users know it is a trap and prone to stability issues; this provides proper persistence, retry settings, and redrive queues).
  • target-connection-error — automatically increase the number of containers behind a load balancer when the HTTP errors produced by that load balancer increase.

“But I also need a …”

For everything else we created a safety valve. In addition to the standard deployment resources, if absolutely necessary, developers could include CloudFormation templates alongside the service.json, which allowed one-off resources to be defined. We found that over time these one-off resources were often adopted into standard services or service modifications.
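Such a one-off template is just plain CloudFormation; a minimal sketch adding a scratch S3 bucket might look like this (the resource is purely illustrative):

```json
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Description": "One-off resources deployed alongside the service",
  "Resources": {
    "ScratchBucket": {
      "Type": "AWS::S3::Bucket"
    }
  }
}
```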

In the end, to actually deploy and create the infrastructure, an ‘artifact’ representing the current state of the service as captured in your source directory is created. This is the contents of the ‘service’ directory if you are using service-buddy to automatically create your microservices. This directory contains the following files:

service/

  • service.json (the configuration defining the microservice, as seen in the example above)
  • monitor.json (optional; the definition for alarms in your application performance monitoring system)
  • aws-resources.template (optional; an additional CloudFormation template to be run after deployment)

These artifacts represent the state of the service configuration and can be cached alongside any other build artifacts (such as a container image). To deploy, you simply run the following in the directory containing the files listed above:
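A sketch of the invocation; the deploy subcommand name and the environment flag shown here are assumptions about the CLI, not confirmed options:

```sh
# Hypothetical invocation: run from the directory containing service.json;
# infra-buddy reads the manifest and drives the CloudFormation deployment.
infra-buddy deploy-service --environment ci
```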

This then invokes the deployment in AWS, using the metadata defined in service.json to derive all of the necessary configuration. The core pattern at play here is keeping the CLI simple and storing the variable configuration in artifacts that provide traceability into the process.

Without a doubt there are more extension points than are detailed above, but I hope this gives you the lay of the land so you are not surprised by any files as you try to use the tool. It is something that will always be a work in progress, but also something that has saved us countless hours and headaches. I am eager to help anyone who wants to take a shot at using it for their own projects!
