OpsDoc Template (Single)


Overview

What Is It

?

Why Do We Have It

?

Primary Contacts

?

Bug Reporting

?

Design Documents

?

Other Information

?

Build

How to build the software that provides the service: where to download it, where the source code repository is, and the steps for building it and producing a package or other distribution artifact. If it is software that you modify in any way (an open source project you contribute to, or a local project), include instructions for how a new developer gets started. Ideally the end result is a package that can be copied to other machines for installation.

Deploy

How to deploy the software. How to build a server from scratch: RAM/disk requirements, OS version and configuration, what packages to install, and so on. If this is automated with a configuration management tool such as CFEngine, Puppet, or Chef (and it should be), then say so.

Common Tasks

Step-by-step instructions for common things like provisioning (add/change/delete), common problems and their solutions, and so on.

Pager Playbook

A list of every alert your monitoring system may generate for this service and a step-by-step “what to do when…” for each of them.

Disaster Recovery

Disaster recovery plans and procedures. If a service machine died, how would you fail over to the hot/cold spare?

Service Level Agreement

The (social or real) contract you make with your customers. Typically things like Uptime Goal (how many 9s), RPO (Recovery Point Objective) and RTO (Recovery Time Objective).

SLA Tags            Setting
System-Severity     Medium
Patching            Monthly
RACI Accountable    ?
RACI Responsible    ?
RACI Informed       ?
Reboot              Yes

Contract

?

Social Contract

?

Uptime Expectation

99%, 99.9%, etc.
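
To make an uptime goal concrete, it can help to convert the percentage into a downtime budget. The following is a minimal sketch, not part of the template itself; the function name `downtime_budget_minutes` and the 30-day measurement period are assumptions chosen for illustration.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float = 30.0) -> float:
    """Return the minutes of downtime allowed in a period for an uptime goal.

    Hypothetical helper: the 30-day default period is an assumption, so
    substitute whatever window your agreement actually measures.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - uptime_percent / 100.0)


# Compare a few common targets over a 30-day period.
for target in (99.0, 99.9, 99.99):
    budget = downtime_budget_minutes(target)
    print(f"{target}% uptime allows {budget:.1f} minutes of downtime per 30 days")
```

Under these assumptions, 99% leaves roughly 432 minutes per 30 days and 99.9% leaves roughly 43 minutes, which is often an easier number to discuss with customers than a count of nines.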

Recovery Point Objective

The maximum amount of data, measured in time, that can be lost after recovery from a disaster, failure, or comparable event before the loss exceeds what is acceptable to the organization.

Recovery Time Objective

The maximum acceptable time that an application, computer, network, or system can be down after an unexpected disaster, failure, or comparable event takes place.
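
The two objectives can be checked against a real or rehearsed incident. The sketch below is illustrative only: the function `meets_objectives`, its parameters, and the timestamps are all hypothetical. It treats the gap between the last good backup and the failure as the data-loss window (compared with the RPO) and the gap between the failure and restoration as the downtime (compared with the RTO).

```python
from datetime import datetime, timedelta
from typing import Tuple


def meets_objectives(
    last_backup: datetime,
    failure: datetime,
    restored: datetime,
    rpo: timedelta,
    rto: timedelta,
) -> Tuple[bool, bool]:
    """Return (rpo_met, rto_met) for one incident. Hypothetical helper."""
    data_loss_window = failure - last_backup  # data written in this window is lost
    downtime = restored - failure             # how long the service was unavailable
    return data_loss_window <= rpo, downtime <= rto


# Made-up incident: backup at 02:00, failure at 05:30, service restored at 07:00.
rpo_ok, rto_ok = meets_objectives(
    last_backup=datetime(2024, 1, 1, 2, 0),
    failure=datetime(2024, 1, 1, 5, 30),
    restored=datetime(2024, 1, 1, 7, 0),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=1),
)
print(f"RPO met: {rpo_ok}, RTO met: {rto_ok}")
```

In this made-up incident the 3.5 hours of lost data fits within a 4-hour RPO, while the 1.5 hours of downtime exceeds a 1-hour RTO.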

In-House Information

If this is something being developed in-house, this section holds information for the team: how to set up a development environment, how to do integration testing, how to do release engineering, and other tips that developers will need. For example, one project I’m on has a page that describes the exact steps for adding a new RPC to the system.

How To Set Up a Development Environment

?

Integration Testing

?

Release Engineering

?

Other Tips and Information

?