Andy Hedges' Blog

How To Care About Software Quality

One of the oldest project management concepts I can remember being told about is the project management triangle. Whilst the project management triangle is largely misleading and quite frankly dangerous if taken literally it does at least tell you one thing: there are trade-offs in any given endeavour. For the classic project management triangle the trade-offs are three things with quality overlaid in a clumsy way. Those things are: cost, scope and time.

The theory is:

  • If you have enough time and money you can deliver your scope
  • If you have enough time but not enough money then you’ll have to deliver less scope
  • If you don’t have enough time but enough money then you’ll also have to deliver less scope
  • If you don’t have enough time and money then you are doubly in trouble and likely won’t deliver much of anything

Of course this is a very naïve way of looking at things and won’t get you very far. It doesn’t take into account things like morale, motivation, team politics, shared purpose, force majeure, health, ethics, sickness, personal life, indecision, competitor disruption and all those things that make humans and business such wonderfully complex and interesting things.

The triangle, as I’ve mentioned, has quality as its background, meaning I suppose, you can sacrifice quality to claw back time, costs and/or scope. Think about this though, can you really? Perhaps you can in the short term, how about the long term? If my team and I produce shoddy code that’s hard to read, has scattered concerns and global knots and all those bad things, then how am I going to keep delivering to functionality, at pace and within reasonable costs? This seems obvious, however it is missed or ignored remarkably often in small and large organisations alike.

The point is that if you sacrifice quality for short-term delivery of scope or to keep costs down then it will cost you more or you’ll get less in the future. Moreover it will cost more, take longer and deliver less for every subsequent project on that system. Until you’ve improved the quality back to an acceptable level you’ll be paying for that quick or cheap delivery in opportunity cost. Worst of all at a certain quality point the quality deteriorates faster and faster - interest is accuring on that technical debt.

Technical Debt

Ward Cunningham coined the term technical debt to describe this and like real financial debt it isn’t always desirable to avoid it. Just like in business taking on some debt can be a good thing, it can allow you to grow your business faster than otherwise and pay it back before the interest mounts. The same is true of technical debt. However, just like financial debt, if you keep on taking out loans and not paying back any of your debt the interest mounts and eventually the insolvency officers move in and shuts you down.

In software terms the insolvency will manifest in a number of different ways, technical bankruptcy looks like:

  • Every time you introduce a new feature another feature breaks (sometime referred to as the whac-a-mole problem)
  • Many of your best people leaving because they cannot take pride in what they are building (usually you end up with a churn of short-term contractors before eventually that costs too much to justify)
  • A competitor out pacing you and taking your customers

What we have to do is look at the reasons for poor decision making and ensure we can avoid them. Why do organisations, decision makers ignore software quality and therefore long term growth to pursue short term goals?

Financial Cycles

Most companies run their finances monthly and then annually, there is nothing else that matters to some people than the end of period figures. If you’ve ever spoken to an accountant at the end of the month they are usually a jibbering wreck from burning the midnight oil and if you speak to them during end of year you can forget any chances of getting some sense out of them. Not that the accountants are to blame they are just the messengers.

This is because the books have to be closed, the results have to be calculated and everyone wants them to look as good as possible to investors, bosses, themselves or whoever. What this creates is a sprint for the line at the end of each period. What happens when you sprint for the line? You give it everything you’ve got, abandon all strategy and then collapse in a heap once the line is passed.

In software quality terms this is a disaster. It means that everyone burns the midnight oil, shortcuts are taken to get features in, testing slips and shortcuts are taken. When all is said and done a few targets might have been reached but actually the code is now a steaming heap of rubbish.

Hidden Cost

In order to really understand software quality you have to really understand software, this means understanding many complex things like: systems design, software design, at least a few programming languages, data design and so one, who understand these things? Not many people, that’s who, and these people are busy creating new software and not explaining the implication of poor quality software in a language business types will understand. Typical those people, as bright as they are, aren’t the best a translating software quality issues in to monetary figures.

It’s by the nature of the complexity of the problem that few people understand its implication and often not the people (trying) to count the costs.

Opportunity Cost

This is a huge hidden cost and as such is called out separately. Opportunity cost is the difference between the value of the selected approach and the best alternative. In simple terms if I select an approach that makes me £100 over one that would have made me £250 then I have an opportunity cost of £150.

It’s an interesting concept and it applies to software in the following way. Let’s for example say I want to add a cool new feature to a mobile app. It costs £1000 to write the code to do it one way (code change A) and £3000 to write the code a better way (code change B). The first reaction is to pick A. However code change A will increase support costs by £250 a month. Moreover it will make the cost of doing the next change £1000 more expensive.

Now the opportunity cost is £250 after the 5th month and growing every month. Moreover it might mean that the next change doesn’t get made because it’s too expensive or worst still it is done in a suboptimal way meaning that it too increases opportunity cost compounding the problem.

Transient Team

A stable team is key to keeping a good quality software product. If you know you are going to be looking after the software in 3 years’ or 5 years’ time you are going to have a lot more invested in making sure it works and that it is a pleasure to support than someone who’s going to be off at another company or client in 3 months’ time.

Don’t get me wrong, I’ve worked with some amazing and dedicated people who’ve came and gone from teams quickly for whatever reason, however, on the whole, the tension isn’t there to think long term.

Unengaged Team

Things get to a point where design decisions have been overridden so many times to meet short term goals that the team stop fighting it. When this happens it’s a sad thing, you have an unengaged team who can’t take pride in their work and aren’t delivering the value they are capable of.

Reward Schemes

Similar to financial cycles more often than not companies will award bonuses annually for meeting certain goals. Getting a certain number of features added or perhaps getting X million sign ups or shifting Y million licenses. This triggers the target fixation mentality, where anything and everything is done to get these things achieved no matter the debt and damage incurred to the quality of the software.

Additionally next year’s bonuses will be negotiated from the point of knowledge of the software at the end of this year. Highly debt laden system will have the bonus hunter negotiating for lower barrier goals. If they don’t get them they move on and someone else moves in and does the same anyway.

Some Solutions

Listen to the Experts

There are no easy solutions in this, however things that help are a voice for the people who understand the complexity of good software. Listen to you developers, listen to the designers and architects. You’ve hired them for their expertise so use it.

Help Them Explain

Accept that the technologist might not be able to put a financial figure on the costs, help them with that, come up with mechanism that works for both the technologists and the accountants, make balanced decisions based on the circumstances. Remember the best software solution might not always be the best decision for the company but it quite often is.

Manage Your Tech Debt Like Any Other

Have technical debt reduction as a metric that is rewarded equally or more than adding features.

Start to show technical debt on your P&L report, take it seriously because it is a very real thing that has brought down more companies than we’ll ever know.

Set Long Term Goals

Have bonuses rewarded over 3 years too, supplement annual bonuses with some longer term goals around software quality.

Stabilise Teams

Value a stable team over transient teams, do everything you can to keep the loyal people loyal and support them in their quest for long term software quality.

Summary

Software quality is incredibly important, it’s the long terms viability of any technology focused company. Without it you are entering a downward spiral to technical insolvency. Take it seriously, sure, don’t be frightened to add a little technical debt now and then, but do it consciously, record it and pay it back in a prudent manner.

Andy Hedges
[comment]

4 Symptoms of Dysfunction in Software Teams

Once you’ve been in the industry for a decade or so you start to get a sixth sense of when things aren’t right. However even when your sixth sense isn’t working here are some signals that should raise alarm bells.

1. It Dependencies

Every time you ask someone how something works or where some data is the explanation starts with ‘it depends’.

For example say I enquire how a user’s password is verified, I’d get an answer something like this:

“It depends, if the user registered before 2001 then you need to call the genesis logon system, after 2001 however we switched to active directory, unless the user was a customer of that company we bought in 2005, in which case you’ll need to call the ‘migration’ LDAP they put in place at the time which uses Netscape LDAP so you’ll need JNDI library we patched to work around some bugs. Also if the user is an administrator we keep the credentials in the staff database so you’ll need to do an ODBC look up for that.”

You get the idea, technical debt has built up, poor decisions haven’t been rectified and now there is there a laundry list of ‘it depends’ for every question.

2. Jane Doe

Here’s where we have a systems or component that only one person understands, the cynical might refer to this a mortgage driven development however it often happens because of apathy too. Unfortunately, the system might not be considered sexy to work on and therefore hasn’t been touched other than to keep it ticking over. The trouble is that every time there is a problem only Jane can fix it and she’s not much into do that work either. She’ll do the minimum to keep it ticking over.

The trouble with this is that the system is providing value to the organisation otherwise it could be turned off. It’s very likely if it’s providing value that at some point in the future it will need enhancing and nobody is going to be able to do and that — and then Jane quits.

3. Town Hero

This is the support equivalent of ‘Ask Doe’, it isn’t a case of if this system will fail but when and how dramatically… and it’s always dramatic. When it does go wrong there is only one plucky guy in support who can fix it: our local hero. Undoubtedly he’s not around when things go wrong but everyone knows he’s the only guy that can fix it.

When the inevitable catastrophic failure occurs calls are made to his landline, his mobile and DMs to his twitter account. He’s unreachable, what should we do? We’re doomed. When all seems lost a call is received. He’s been located, but he’s on a beach in Rio. There’s no way for him to connect to take a look. However he’s had an idea he’s VPNing into a server in Hamburg to create a SSH tunnel through the San Francisco DC and from there to the corporate firewall… he’s in! A few minutes later he’s diagnosed the potential problem. People are called to mission control, orders are given, procedures are ignored, changes are made. The hero calls into the bridge and explains the problem, it’s going to be tough and take some time but he can do it — nobody else has a clue what he’s talking about. In the small hours the system is restored, everyone pats each other on the back for pulling together and working as a team — also, more importantly, thank goodness for the hero, what would we do without him?

All is well in the world thanks to that guy, our hero — until next month, when the exact same thing happens again.

The trouble is these systems never get fixed, either because the team that that supports them aren’t qualified to do so, aren’t allowed to do so or because they are too busy, maybe too busy being heroes.

4. Carcinogenic Prototype

This system starts with a great idea, an idea that has a lot of potential. In order to qualify that potential a demo/pilot/prototype is created and as it turns out the potential is realised. The business become very keen to get this prototype into full production — they might just make those targets they thought they were going to miss if this works out. The engineers who built the prototype are pleased because their work is demonstrating value but reticent because it was just a prototype.

The engineers mention this to the group. The system wasn’t designed to go live, it doesn’t scale easily, it doesn’t have good exception handling, it doesn’t do any logging, it can’t be monitored, the list goes on. The engineers are overruled but promised to be given time to fix it up later.

Later never comes because once the system is live feature enhancements come pouring in, the system grows rapidly and sometimes starts to affect other systems too. Unfortunately due to the quality of the system it’s very hard to add features in a clean way and so the technical debt grows, it compounds.

Solution

As with most things in software engineering the technical problems are symptoms of the organisational causes. For each of the symptoms listed above there are numerous organisational reasons for them occurring, I’m going to start to blog each of these organisational issues over time. However at the heart of the problem is a lack of desire to change the status quo, to enforce a level of quality in the engineering and take ownership. Take ownership of your minimum viable quality and stick to it.

Andy Hedges
[comment]

SOA vs Microservices

The microservice term has been around for a while but for as long as it has been around there has been much arm waving around what it was. It continued as one of those nebulous concepts that meant anything and everything to everyone and no one. In this way it caused a little confusion but served as a token to reference something — but what? The closest you could get was was that the services were smaller than monolithic applications, by some measure that was vague. Of course, as is often the case some people were tacking fairly arbitrary stuff onto the term too. That’s where we were, microservices were smaller than just about the biggest thing anyone can ever create in software: monolithic applications. Monolithic applications are like the Graham Numbers of software architecture.

“Graham has recently established … a bound so vast that it holds the record for the largest number ever used in a serious mathematical proof.” — Martin Gardner (c.1977)

This wasn’t particularly useful but pretty much par for the course in the IT industry, buzzwords come and go and often don’t have concrete meaning attached to them, or have multiple conflicting meanings attached to them. However this is where things took an interesting turn, Martin Fowler, a well known name in the industry made an attempt to document what the terms could mean. Fowler takes a very thorough approach to documenting things and has a rather large following of people, so when he does document something the term:

  • tends to stick
  • have concrete meaning

The trouble was that a number of people to a great or lesser extent, including myself, believed the definition of microservices that Fowler came up with is remarkably similar to SOA as known and documented, with some other practises thrown in. Those other practises were debatably a little orthogonal to the topic at hand, that is, they were good practise but beside the point.

It’s well documented what SOA is and now it is well documented what microservices are. Compare them side by side and most reasonable people, I think, will come to roughly the same conclusion. So how did this happen?

Marketing and communication

SOA is unfortunately a hard concept to grasp initially, it takes a little time, then most people, if they persever have an epiphany and wonder what it was that they didn’t understand in the first place.

Trouble is I think that only those who really went to town on SOA the first time around had that epiphany, but why did some go to town on it and others not.

What happened with SOA is that the waters were muddied with a large amount of esoteric and/or complex language, take a look at the SOA Reference Model for example, it has some very good information in there but it is an intimidating read and I can imagine that not many have read it. I would absolutely forgive anyone for not reading it, it took me about 5 quite determined attempts…

If you contrast this document to Fowler’s blog post then you can see which is more easily digestable. Fowler is (literally) an expert in the written communication of technical concepts, perhaps the best .

Vendor spin

As anyone who’s been in the IT industry for any amount of time will know for any given buzzword there are roughly a billion vendors trying to capitalise on it. This may be as simple as saying it is SOA enabled or microservice compatible but it can take more subtle forms too.

One of these approaches is to create a related standard or pattern. It wasn’t long after SOA started getting discussed as a concept that WS-* appeared on the scene with a seamingly endless parade of complex specifications and reference implementations of said specification. Each specification was dutifully implemented by the big vendors so that they could become ‘service-enabled’.

Unfortunately these specifications were awful, some absurd, moreover these vendor implementations didn’t work very well with each other, indeed some of them needed standards to defined the standards. That is a web-service exposed by one vendor couldn’t be reliable called by a client from another vendor. This left a bad taste for the right thinking engineer and there was violent push to the opposite end of the interface spectrum: ReST, JSON and so forth but that’s another story and one we all learned from.

As I eluded to early it wasn’t just specifications that clouded the SOA picture. In order to sell software licenses it was key to have something big, expensive and most importantly critical to the client once implemented. There is no greater example of this than the ESB. However I’ve written about why you don’t need an ESB before, needless to say this also left a bad taste.

For the record, to my mind, neither ESB or WS-* are desireable in a modern SOA. Unfortunately, sometimes WS-* is the only way to communicate with legacy applications. The industry did learn from all this, I hope.

Time fades

Finally, time fades, engineers like new things, I know I do. I like to play with new technology, think about and discuss new ideas. I’m also forgetful, I forget some of the cool things I once knew. As the expression goes, everything old is new.

Summary

For the most part there isn’t really anything new in microservices over SOA. SOA has some baggage of ESB and WS-*, that is some people confuse that with SOA, unfortunately vendors did that. For the record some of us gave them a hard time for that at the time.

Do we need a new term? I don’t think so, I think it would be more useful to tie down what we mean by SOA, clearly remove the bad things and highlight new thinking. However by renaming I think we revise history, we lose the discussion, the ‘changelog’.

My humble suggestion would be that it is more useful to highlight the difference, what’s new and give those things names, think of these changes as microbuzzwords.

Andy Hedges
[comment]