Evaluating software vendors


Short story written during COVID lockdown on picking the right software vendor for software development

You are a mid-level executive at BigCorp. As it happens, the sales department needs a new CRM. You are tasked with procuring its implementation from an external software vendor.

After some searching you find a few (hot) IT companies in your area. Based on the number of testimonials you narrow the list to 3 candidates. Contact with the reps is quickly established, and after a few meetings in which you reiterate that the CRM doesn’t need blockchain, you get 3 similar proposals. How to pick the best vendor? How to ensure project success?

The easiest way would be to pick at random. What are the odds of the project going sideways anyway? You search for reports on the software project failure rate. If the rate is very low, you can just pick one vendor arbitrarily and be done with it.


Unfortunately there are no trustworthy reports on the probability of software project failure. The reports floating around (from Standish Group, KPMG, Gartner etc.) publish some statistics on project failure/success rates, but it’s not clear how to interpret the results. There is almost no information on the research methods, project evaluation criteria or rules for project categorization.

Even if the studies were conducted in a rigorous manner, it’s not clear the results have any validity. If the sample sizes are small or skewed in some direction (like sampling from client companies only), then the results can’t be generalized.

You go through one report after another, but it’s all just consultancy snake oil. How can anyone quote these reports in academic papers?

It feels like a dead end, but since there are a few more hours to kill you scroll through the reports just to look busy.

When looking at one particular report you notice a chart with a shape similar to a chart you have seen in a previous report. You open another report and there it is again. There seems to be a pattern!

The probability of project failure and the project size always increase in tandem. Interesting! And whatever the project failure probability is, it always grows faster than the project size. You suspect this holds for all software projects.

How does this apply to your case? The vendor needs to have a successfully completed project that is similar in size to the proposed CRM project!


Larger projects must put more effort into the design, requirements and testing phases (Software Engineering Economics, Barry W. Boehm).

In larger projects the overall productivity decreases while the defect rate increases (Measures for Excellence: Reliable Software On Time, Within Budget, Lawrence H. Putnam). Defects in requirements and design are the most costly (The Economics of Software Quality, Capers Jones), because they are often discovered late.

If a particular vendor is fine-tuned to smaller projects, it might not have the capability to take on (substantially) larger projects. Architecture, requirements management and integration testing could become bottlenecks.

Without prior organizational experience the vendor could be caught off-guard, lacking the manpower, process or skill to handle larger projects. The resulting inefficiency would increase the project risk.

Size could be cost, number of use cases or duration, but what does “successfully completed” mean? You scratch your head. Maybe BigCorp has some definition of its own that could be used for the CRM project?

After searching through the corporate wiki you discover that BigCorp defines a software project as successful when:

  • it’s on schedule,
  • on budget,
  • users are satisfied.

You scribble down “project size, budget + schedule”. Is that it or are there other criteria? Maybe you could add something that has to do with “users are satisfied”? You lean back in your chair, close your eyes and think about what could be missing. Your thoughts become blurry and random.

You are in a white room. There is no light source, yet the room is bright. An elderly man in a suit stands in the center. He has the expression of someone who has been interrupted mid-sentence.

You: Who are you? Where am I?

Man in suit: … defect potentials originally developed at IBM in the 70s, but a powerful metric …

You: Huh? Excuse me?

Man in suit: … it’s the sum of errors found in requirements errors, design errors, code errors, docum …

You: What are you talking about! Can you please tell me …

Man in suit: … and a bad fix is an attempt to fix a bug, about 7% errors are bugs in the fixes …

You: Excuse me sir, can you …

Man in suit: … Defect Discovery Efficiency is the number of defects you discover before you release the software. An important metric, but the next …

You: Hello sir, are you listening to me?

Man in suit: … important and that’s the Defect Removal Efficiency is the actual number of defects you remove before you ship to your first custom …

You: Sir! Can you hear me?

Man in suit: …the US average Defect Removal Efficiency is 85% and the best in class are over 99% …

You wave at the man in order to interrupt him.

Man in suit: … every person in this room and every company represented in this room should know about its defect potentials and Defect Removal Efficiency and if …

You turn around and search for a way to exit the room.

Man in suit: … don’t measure the Defect Removal Efficiency you are under the 85% …

You jolt in your chair, woken up by your colleague’s sneeze. What a bizarre dream, almost like a convenient plot device. Is Defect Removal Efficiency a real thing or just a product of your subconscious?

Browsing a few sites reveals that Defect Removal Efficiency (DRE) is a metric for measuring software quality via defects. According to Capers Jones, getting DRE into the high 90s (percent) will lower costs, shorten delivery schedules and increase user satisfaction. That’s exactly what BigCorp wants!

Oh, but how to measure this DRE? If it’s difficult to measure vendors won’t provide it. Fortunately you discover it’s very simple and can be computed from historical data in the bug tracking systems.


Suppose a vendor finds 900 bugs in pre-release testing. Then they ship the software and the customer finds 100 more bugs in the first 3 months. Adding all bugs together gives 900 + 100 = 1000 bugs, and since the vendor found 900 of those 1000 bugs, its DRE is 900 / 1000 = 90%.
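The same arithmetic can be sketched in a few lines of Python. This is only an illustration: the function and parameter names are made up for this example, and it assumes the two counts have already been pulled from the vendor’s bug tracker (pre-release bugs, and bugs reported by the customer in the first 3 months after release).

```python
def defect_removal_efficiency(found_before_release: int,
                              found_after_release: int) -> float:
    """Return DRE as a fraction between 0 and 1.

    found_before_release: bugs the vendor found before shipping.
    found_after_release:  bugs the customer found after shipping
                          (here assumed to cover the first 3 months).
    """
    if found_before_release < 0 or found_after_release < 0:
        raise ValueError("defect counts must be non-negative")
    total = found_before_release + found_after_release
    if total == 0:
        # With no recorded defects at all, the ratio is undefined.
        raise ValueError("DRE is undefined when no defects were recorded")
    return found_before_release / total


# The example from the text: 900 bugs found in testing, 100 by the customer.
dre = defect_removal_efficiency(found_before_release=900,
                                found_after_release=100)
print(f"DRE: {dre:.0%}")  # prints "DRE: 90%"
```

Refusing to return a value when no defects were recorded at all is a deliberate choice in this sketch: an empty bug tracker more likely means nobody used the software than that it was flawless.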

Although DRE is a great metric, you make a note to also ask about the number of users of the delivered software, because DRE might look high for projects that have not made it into production or are rarely used.

You write down your findings as follows:

  • did the vendor build a project of a size similar to the CRM project? How many?
  • what was the budget and schedule overrun (if any) on the last project of this size?
  • what was the DRE for this project (+ does it have users)? What is the vendor’s overall DRE?

You feel you are on a roll and start thinking about more criteria, like experience in the problem domain, when suddenly you are interrupted by a colleague. “Hey, regarding the CRM project, a vendor has already been contracted and starts work next week.”

You ask him which of the 3 vendors was chosen and on what grounds. “None,” he replies. He sees your puzzled expression and adds, “Yeah, it’s going to be implemented by the company where the CEO’s husband is a majority shareholder, you know how these things work.” You nod, smile and start packing your things because it’s 5 PM.