The Ops Community ⚙️

Discussion on: Taking the redesign plunge

ujjavala • Edited

Thanks @ellativity . Glad that you liked it.

To answer your question, we did the entire redesign and refactoring to get a better grip on what we had developed. As mentioned in the blog, bugs were becoming difficult to fix with the earlier design, and when we had production bugs we sometimes spent nights fixing them. It got to the point where we just couldn't go on with the code. On top of that, with the huge infrastructure, we were incurring significant losses since most of it was not even being used. The entire redesign took around 2-3 months (most of the time was to get approvals) and IMO it helped us a lot.

To measure it, we used RED metrics (rate, errors, duration) and, of course, the cost savings we got.
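For anyone unfamiliar with RED, here is a minimal sketch of how the three numbers could be computed over a window of request records. It is only an illustration: the `Request` type, the `red_metrics` function, and the choice of a nearest-rank 95th percentile for duration are assumptions for this example, not what the team actually ran.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one handled request."""
    duration_ms: float
    ok: bool


def red_metrics(requests, window_seconds):
    """Compute rate, error ratio, and p95 duration for one time window."""
    total = len(requests)
    if total == 0:
        return {"rate_per_s": 0.0, "error_ratio": 0.0, "p95_ms": 0.0}

    errors = sum(1 for r in requests if not r.ok)
    durations = sorted(r.duration_ms for r in requests)

    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    rank = max(1, math.ceil(0.95 * total))

    return {
        "rate_per_s": total / window_seconds,  # Rate
        "error_ratio": errors / total,         # Errors
        "p95_ms": durations[rank - 1],         # Duration
    }


sample = [
    Request(100.0, True),
    Request(200.0, True),
    Request(300.0, False),
    Request(400.0, True),
]
metrics = red_metrics(sample, window_seconds=2)
```

Comparing these numbers before and after a redesign gives a rough view of whether the service got faster and more reliable.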

As for surprises, I reckon we didn't have many. Yes, we had bugs in the beginning, but we could fix them easily since we now had a smaller codebase and better logs.

Ella (she/her/elle)

Thanks for responding and explaining in more detail. So, although projecting gains was important, what was more important was that the existing codebase was unmanageable, full stop? I think I understand!

"most of the time was to get approvals"

Ain't that the truth 90% of the time, tho!

ujjavala • Edited

Exactly! Since ours was a platform team, any downtime affected around 10-15 other teams (micro-services). So basically we had to stay alive so that the others could stay alive, and that is only possible when our MTTR is minimal.

Ella (she/her/elle)

Whatever it takes!