One of the things that distinguishes Ad Hoc is that we prefer our application teams to operate the things that they build in production, both to foster ownership of those applications and to ensure the quality of what we deliver.
Operating our own services requires both engineers on app teams and members of our product teams to be familiar with the realities of operating applications in production. To support our team in this, we recently held a two-day OpsCamp in Chicago, IL, focused on helping team members learn through sessions, activities, and simulations.
Day one consisted of five sessions, each focused on a critical area of operating highly scalable applications that end users depend on for essential government services:
- Ops Hierarchy, with Paul Smith - applying the Service Reliability Hierarchy from the excellent Google SRE Book to our work in government digital services
- Tool Time, with Chris Gansen - an overview of when you should build tools, and principles for building them, focused on learning from the Unix philosophy: small tools that do one thing (and do it well) and that can work together
- Incident Management, with James Kassemi - providing the team with a baseline idea of how to handle and prepare for incidents
- Security Architecture, with Mike Auclair - an overview of how to architect applications to maximize security, targeted at those without a security background, focusing on four tenets of building secure applications: Delegation (can we inherit security?), Risk Mitigation (can we reduce our exposed surface area?), Automation (can we leverage computers to make this easier?), and Preparation (how can our architecture set us up for success in the event of an incident?)
- Forensics and Security Incident Response, with Curtis Mejeur - what to do when it all goes to pot, including how to stay calm in a security incident, how to avoid red herrings, and how to avoid conjecture by getting confirmation before making definitive statements
Day two was an opportunity for the team to apply the skills they learned in a Gameday exercise, run by Chris Gansen. Not all team members had been involved in incident response before, so this exercise exposed them to its stress in a controlled fashion, helping build steady hands for when an incident occurs in real life. For specifics regarding the scenarios in this Gameday, check out the repo!
We think it’s important to invest in our team with events like these, and I believe the team members who attended OpsCamp came out of day two with skills and knowledge they didn’t have going into day one.
If this event sounds like something you’d like to attend, check out our open positions.