(Note: This post is part of a series for new technical leaders who are joining a new company or taking over a new organization or team. While written for new CTOs, VPs of Engineering, and Software Managers, it could be applied to other technical positions. It is also written largely for small to mid-size teams, with a bent toward startups.)
When I attended a presentation on participating in company boards, one of the things that struck me most was that, when it came to risk, issues related to technology and IT were considered the greatest cause of stumbling blocks for boards (see Taming Information Technology Risk).
The top three issues were:
- Insufficient expertise at the board level
- Insufficient communication on the company’s IT strategy and operations
- Lack of an integrated business IT strategy picture presented by management to the board
This means that, as a new technology executive, it is imperative that you take the time to understand the technology strategy and risk, and communicate both to senior management.
I created a checklist that I used when I started at Decide to help me track and identify potential areas for risk and investigation.
Technical Risk Assessment
- Is all critical data backed up somewhere?
- Is it also backed up in a geographically separate location outside of the data center (such as in the cloud)?
- When was the last time the backups were tested/restored?
- How long does it take to restore a system?
- Is there backup DNS in place?
- Do all critical systems have a hot standby, mirror, or failover?
- Has failover been tested?
- If a service were to fail or disappear, what would be the outcome? Would the system keep running while handling fewer requests, with reduced performance, with missing functionality, etc.?
- If everything went down, do you have a “gone fishing” or maintenance page that will come up automatically?
- Is there an on-call schedule? What happens if the server goes down?
- Is there monitoring on all systems? Is there a dashboard indicating system health?
- How often does the monitoring generate alerts? (Do you need to investigate resolving or addressing the core reasons for the alerts?)
- Is there external monitoring in place for system uptime (Pingdom is a great inexpensive option)? Make sure that your monitoring is being monitored too!
- Is there source control? Where is it backed up? (This is probably worth paying a hosted service like GitHub or Bitbucket for, since source code is the core IP of the company and the cost of hosting and backups probably nets out neutral, plus you get all the upgrades and tools for free with a paid service.)
- How are updates deployed to new servers? What tools do you use, and how long does it take to make updates?
- What is the release and software update rhythm?
- What testing and automation tools exist to help with quality?
- Where does documentation for systems live? How is it organized? How often is it updated?
- What libraries and software are the products depending on?
- What licenses are being used? If using free or open source software are there any conditions surrounding commercial use?
- What dependencies exist within your system? Third-party vendors or services like CMS, payment processing, and external data providers are some examples. For each of these, understand the ongoing costs, what happens if there is an issue or failure (will the systems still function?), and whether there have been any past problems with these partnerships.
- What is the system infrastructure?
- How much does it cost to run the systems both in terms of hardware/hosting costs and system administration?
- Is foundational software up to date? What versions of the programming language, the operating system, and other core libraries are being used?
- How do features and bugs get prioritized?
- How are releases and updates tested?
- How does the team create and review technical designs?
- How are tasks estimated?
- Do things ship on time?
- Is there a bug tracking system and methodology in place? Is it effective, and is it actually used?
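For the questions above about libraries and licenses, part of the inventory can sometimes be scripted. The following is a minimal sketch, assuming the product runs on Python and that its dependencies are installed in the current environment; the function name `installed_licenses` is made up for illustration. It only reports what each package declares about itself, which is often incomplete, so treat the output as a starting point for review rather than a legal answer.

```python
from importlib import metadata


def installed_licenses():
    """Return sorted (name, version, license) tuples for installed Python packages.

    Hypothetical helper: it reads whatever each package declares in its own
    metadata, which may be missing or vague.
    """
    rows = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None:  # skip installs with unreadable metadata
            continue
        name = str(meta.get("Name") or "UNKNOWN")
        version = str(meta.get("Version") or "UNKNOWN")
        declared = str(meta.get("License-Expression") or meta.get("License") or "").strip()
        # Some packages embed the full license text; keep only the first line.
        license_name = declared.splitlines()[0] if declared else "UNKNOWN"
        rows.add((name, version, license_name))
    return sorted(rows, key=lambda r: (r[0].lower(), r[1]))


if __name__ == "__main__":
    for name, version, license_name in installed_licenses():
        print(f"{name}=={version}: {license_name}")
```

Anything reported as UNKNOWN, or under a license with conditions on commercial use, is worth a manual look.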
Going through each of these points will help you get a solid feel for the technology you are building and supporting. Hopefully it will also help you identify potential areas of risk and uncover areas for future investments and improvements.
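If a spreadsheet feels too loose for tracking the answers, the checklist can also be kept as a small data structure. This is only a sketch of one way to do it; the `Status` values, the `ChecklistItem` fields, and the `open_items` helper are all assumptions for illustration, not the format I actually used.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    UNKNOWN = "unknown"   # not investigated yet
    OK = "ok"             # investigated, no concern
    AT_RISK = "at risk"   # investigated, needs follow-up


@dataclass
class ChecklistItem:
    question: str
    status: Status = Status.UNKNOWN
    notes: str = ""


def open_items(items):
    """Return items that still need attention, at-risk ones first."""
    pending = [item for item in items if item.status is not Status.OK]
    # sorted() is stable, so items keep their checklist order within each group.
    return sorted(pending, key=lambda item: item.status is not Status.AT_RISK)


if __name__ == "__main__":
    # Example answers are invented purely to show the shape of the data.
    checklist = [
        ChecklistItem("Is all critical data backed up somewhere?", Status.OK),
        ChecklistItem("Has failover been tested?", Status.AT_RISK, "never exercised"),
        ChecklistItem("Is there an on-call schedule?"),
    ]
    for item in open_items(checklist):
        print(f"[{item.status.value}] {item.question} {item.notes}".rstrip())
```

Keeping an explicit "unknown" state is the point of the design: a question nobody has looked into yet is itself a risk worth reporting.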
If you have more to add, definitely let me know, since I am still improving and adding to this list myself!
[mountain image courtesy of the lovely and talented photographer, Vanessa Johnston]