Looking back: a year of DevOps practice
After sailing through the turbulence that followed the site's launch, I have finally found the time to sit back and take stock. A lot has changed, and the key changes are:
- The team is getting bigger
- Development is getting even more rigorous
- Higher expectations of the platform from both internal and external users
But the way we worked was still much the same: manual and reactive (in other words, ad hoc). That process was not sustainable and could not keep pace with the project.
More than two hands can handle
Since launch, the backlog had kept growing with bugs and features that we deferred “to the future”. We split the team into two: one stream focused on bigger chunks of work, which we called “Road Map”, while work small enough to fit into our bi-weekly release was called “Fast Train”. Both streams had to be promoted through the environments once their tests passed, so the number of deployments almost doubled.
On top of that, we started working with a new vendor who needed to run their code on a new platform, and they had their own branch in our version control too. Making sure we were building from the right branch and deploying to the right environment purely by hand became a battle we stood no chance of winning.
Communication is always the key to success
The site was gaining popularity, and users spotted issues quickly. Ensuring uptime was no longer sufficient, as we needed to meet a higher SLA. We had collated procedures for different scenarios, but many of them still relied on data gathered through manual investigation, e.g. checking system performance or reading the logs.
At the beginning, it was quite panic-inducing: we didn’t know what had gone wrong and usually needed someone (me, most of the time) to direct what to look into. What’s more, there were no guidelines for determining whether the components were behaving normally.
And as the team grew, we could separate job duties further. We now had discrete “application” and “operations” teams, which meant that people speaking different languages had to work together.
Rhea - means “Flow” in Greek
So I started thinking about how to let us work with our flow, like a river, and I boiled it down to three principles:
- Automation
- Communication
- Culture
Getting things done automatically was crucial to freeing people from the never-ending list of tasks and giving them breathing space to think, or to talk. It also eliminated our dependency on individuals, which saved time and reduced manual errors.
We listed the things we were doing repeatedly, and deployment stood out from the crowd, so naturally we started by automating that process. You may have something else on your list, but the key point of the exercise is to identify what’s holding you back.
Deployment automation was a huge success because it transformed the way we managed our development environments. Before the process was in place, the QA team needed Dev to prepare the build and Ops to do the deployment; now they can do both themselves.
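Part of what makes self-service deployment safe is that the "right branch to the right environment" rule is checked by a machine rather than remembered by a person. The sketch below is one minimal way to express such a guard, assuming each stream (Fast Train, Road Map, the vendor's work) has a single branch and an ordered promotion path. The branch names, environment names and the `deployment_problems` function are all hypothetical, not our actual pipeline.

```python
# Hypothetical promotion paths: each stream's branch and the ordered list of
# environments it may be promoted through. All names here are illustrative.
PROMOTION_PATHS = {
    "fast-train": ("release/fast-train", ["dev", "qa", "staging", "production"]),
    "road-map": ("develop/road-map", ["dev", "qa", "staging", "production"]),
    "vendor": ("vendor/platform", ["vendor-dev", "vendor-qa"]),
}


def deployment_problems(stream: str, branch: str, target_env: str,
                        passed_in: set[str]) -> list[str]:
    """Return the reasons a deployment should be refused (empty list = OK).

    passed_in holds the environments where this build has already passed
    its tests.
    """
    if stream not in PROMOTION_PATHS:
        return [f"unknown stream: {stream}"]
    expected_branch, environments = PROMOTION_PATHS[stream]
    problems = []
    # Rule 1: each stream is only ever built from its own branch.
    if branch != expected_branch:
        problems.append(f"{stream} must be built from {expected_branch}, not {branch}")
    # Rule 2: each stream only goes to the environments on its own path.
    if target_env not in environments:
        problems.append(f"{stream} cannot be deployed to {target_env}")
        return problems
    # Rule 3: promotion is in order; every earlier environment must have passed.
    earlier = environments[: environments.index(target_env)]
    missing = [env for env in earlier if env not in passed_in]
    if missing:
        problems.append(f"{target_env} requires passing tests in {', '.join(missing)} first")
    return problems


if __name__ == "__main__":
    print(deployment_problems("road-map", "develop/road-map", "production", {"dev", "qa"}))
    # ['production requires passing tests in staging first']
```

Returning a list of reasons rather than a yes/no answer means whoever triggers the job (QA included) sees exactly why a deployment was refused, without needing to ask Dev or Ops.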
Next was making communication more efficient. Communication is about the way we express ourselves as well as the “language” we use; more precisely, it’s about enabling people in different teams to interpret the same vocabulary. For instance, when we were told the site was slow, we usually needed round after round of discussion to understand what was meant by “slow”: was it the response time, or the time it took to receive a system email? On top of that, it took several rounds of investigation to determine who was responsible.
With the information gathered by our monitoring agents (see “You know as much as you monitor”), we were able to break the system down into components and build dashboards around them, showing the performance of each one. Both teams could then look at the same figures and know who could help investigate an issue.
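Agreeing on what "slow" means for each component, and who owns it, is what turns a dashboard into shared vocabulary. The sketch below shows the idea in a few lines, assuming the monitoring agent has already produced per-component metrics. The component names, metric names, limits and team names are hypothetical placeholders, not figures from our platform; real limits would come from your own baselines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    component: str  # a part of the system with its own dashboard panel
    metric: str     # the agreed meaning of "slow" for this component
    limit: float    # values above this are treated as abnormal
    owner: str      # the team to call when the limit is breached


# Hypothetical agreed thresholds (illustrative values only).
THRESHOLDS = [
    Threshold("web", "p95_response_ms", 800, "application"),
    Threshold("email", "delivery_delay_s", 120, "operations"),
    Threshold("database", "p95_query_ms", 200, "operations"),
]


def breaches(metrics: dict[str, dict[str, float]]) -> list[str]:
    """Return one line per component whose metric exceeds its agreed limit."""
    report = []
    for t in THRESHOLDS:
        # Missing components or metrics are skipped rather than flagged.
        value = metrics.get(t.component, {}).get(t.metric)
        if value is not None and value > t.limit:
            report.append(f"{t.component}: {t.metric}={value:g} > {t.limit:g}, ask {t.owner}")
    return report


if __name__ == "__main__":
    sample = {"web": {"p95_response_ms": 950}, "email": {"delivery_delay_s": 30}}
    print(breaches(sample))
    # ['web: p95_response_ms=950 > 800, ask application']
```

Because each threshold names both the metric and the owner, "the site is slow" becomes a specific statement (which component, which number, which team) instead of the start of another round of discussion.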
With the vocabulary sorted, I tried to promote group chat applications such as Campfire and HipChat. I was amazed by the Hubot integration both offered, and I implemented some custom commands for looking up information through the chat client. I thought they would be adopted, but it turned out not to be the right thing to do.
The team were already using Skype in the office and WhatsApp on the go. Not everyone on the team needed the Hubot integration, and they had already created different groups on each channel. They had no problem talking!
The lesson I learnt is that when people are already talking, you shouldn’t change how they do it unless something is broken. Don’t get carried away by technology; focus on the intent.
Culture is the end goal
We continue to automate more of our work and to build a richer shared vocabulary. From there, I am beginning to see something beautiful.
We are not only gaining time through automation and enabling the team to communicate effectively; we are also bringing the team together through the process. It helps people understand one another’s work, especially each other’s difficulties and needs. Decisions are now driven by data, and everyone can see the bigger picture and how their work relates to others’. All of this increases our sense of autonomy, and we are now extending this culture to the business team.
The journey does not stop here. The last thing I want to share is the manuscript I wrote two years ago, and I hope this article gives you some ideas on how to grow your own DevOps culture.