Russell Spitler
4 min read · Sep 22, 2020


Scaling OTX

First, credit where credit is due. Open Threat Exchange would not have been possible without the hard work of many, many people. Jaime Blasco deserves all the credit for the vision of OTX, the drive to get it off the ground, and the continued inspiration for keeping it useful. The early days would never have been possible without Bill Smartt and there is no way it would have gotten to where it is today without the tireless work of Rusty Brooks and Eduardo de la Arada. Certainly there are more who have contributed substantially through the years, but without these four the world of sharing threat intelligence would be markedly different than it is today.

Today, OTX is quite possibly the largest security forum in the world, and certainly the largest open threat intelligence exchange. There are over 100,000 registered members of OTX. In the days of social networks with hundreds of millions of users that may not seem like a lot, but for a dedicated forum of security professionals it likely represents high single-digit, if not double-digit, penetration into the security community. These members contribute more than 20 million threat indicators on a daily basis, and over the last four years they have identified and shared details on more than 200,000 individual threats. These reports detail the methods, tools, and infrastructure of threats through observable indicators, packaged in the form of 'pulses' within OTX. The community has built upon these pulses: suggesting edits, flagging indicators that lead to inaccurate detections, and associating threats with other known reports. OTX has become part of the larger security community, with members across all manner of security organizations — from one-person teams to multinational companies. Most remarkably, over these years OTX has played a role in the identification of emerging threats, such as the threat activity around Covid-19 and WannaCry, acting as a point of collaboration as the community worked to understand each threat. It is during these times that it becomes apparent how much further OTX needs to go to better serve the community, but also how large a gap it has filled to date.

As we come up on six years since the launch of OTX, I have been reflecting on the path it has taken from a technical standpoint. While this journey may not be of interest to everyone, there are some key lessons I have learned over this time that may be of use to some. I had been involved in OTX since its inception, and four months after its initial launch at Black Hat 2014 I took over responsibility for the engineering and architecture. At that time the team was afraid to release our monolithic app: test coverage was poor, the production environment was fragile, and the release process amounted to a git pull. It is a state many of us find ourselves in, regardless of the smiles we present when someone extols the wonders of the modern "CI/CD" world. Six years later, the team is not much larger than when we started, but the community has grown substantially and our analytics systems run at an order of magnitude higher rate than they did at the beginning. This has been done with nominal downtime (hours at worst), and we have progressed from an average time between releases measured in months (three) to one measured in hours (~8 hours, or about 100 releases a month). The evolution has been holistic: from how we manage our code, to how we build, to how we manage our artifacts, to how we scale under load. Over the course of this series of posts I will introduce the concepts we have employed as well as the tools we have built to make it easier for us to do this at scale.
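To make the contrast concrete: a git-pull release means someone logging into a box and pulling the latest commit, while a continuous deployment pipeline builds, tests, and promotes the same artifact automatically on every merge. A minimal sketch of such a pipeline in GitLab CI syntax — the stage names, image names, and deploy script here are purely illustrative assumptions, not the actual OTX pipeline:

```yaml
# Hypothetical pipeline sketch (not the actual OTX configuration).
# Each merge builds one artifact, tests it, and promotes that exact
# artifact to production — never rebuilding between environments.
stages:
  - build
  - test
  - deploy

build:
  stage: build
  script:
    - docker build -t registry.example.com/otx-service:$CI_COMMIT_SHA .
    - docker push registry.example.com/otx-service:$CI_COMMIT_SHA

test:
  stage: test
  script:
    # Run the suite inside the same image that will ship.
    - docker run --rm registry.example.com/otx-service:$CI_COMMIT_SHA pytest

deploy:
  stage: deploy
  script:
    # Promote the artifact that passed tests; the deploy script is hypothetical.
    - ./deploy.sh production registry.example.com/otx-service:$CI_COMMIT_SHA
```

The key property is that a release becomes a promotion of a known-good artifact rather than a fresh checkout on a production host, which is what makes a cadence of roughly 100 releases a month sustainable.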

  • Deployment & Production Environment Management — the core concepts we employed to test code at scale, minimize the risk of introducing breaking changes, and manage service disruptions
  • Microservice Management — how we approached the sprawl of microservices and addressed the core challenges of simplified code reuse, "lost" code, managing build processes, and reducing developer overhead for creating new services
  • Service-Buddy Deep Dive — an introduction to the tool we created for microservice management ("service-buddy")
  • Creating a Read-Only AWS UI — insight into how we established a read-only AWS console, ensuring that all deployment environments were consistent and fully managed through artifacts checked into our VCS. In addition, I will discuss how we made our production environment one that could be composed out of 'building blocks,' allowing developers to assemble the services they need from predefined service templates, complete with testable monitoring (across environments!)
  • Infra-buddy Deep Dive — an introduction to the tool we created for managing production infrastructure ("infra-buddy")
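As a taste of the read-only console idea: the effect can be approximated by granting human users only read permissions in IAM, so that every change to infrastructure must flow through the deployment pipeline, which runs under its own role. A minimal, illustrative policy — the action list here is a hypothetical sample, not the policy we actually used:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ConsoleReadOnly",
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "ecs:Describe*",
        "ecs:List*",
        "cloudformation:Describe*",
        "cloudformation:Get*",
        "cloudformation:List*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "s3:Get*",
        "s3:List*"
      ],
      "Resource": "*"
    }
  ]
}
```

AWS also ships a managed ReadOnlyAccess policy that covers this more broadly; the point is the division of labor — humans observe through the console, while the pipeline is the only principal allowed to change infrastructure, keeping the running environment consistent with what is checked into the VCS.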

The time I have spent working on this and the subsequent projects powered by the tools we built has cemented my understanding of the future of cloud computing and the value of a true continuous deployment pipeline. Providing developers with an 'easy path' for microservices gives you an immense advantage, and the effort spent establishing that foundation pays off substantially in resilience and feature velocity. I hope these thoughts are well received and welcome any comments or questions!


Russell Spitler

Russell Spitler has spent his career in cybersecurity working as an engineer, architect, product manager, and product executive.