Answers

Swamy K.

Lead Test Automation Engineer at MediaSolv Solutions Corporation

What kind of misleading metrics have you observed?

I have seen some metrics that state something like this:

1) 100,000 test cases executed, 2,500 failed.

When you review the 100,000 test cases, you realize that there are only 40 good test cases, and the others are 2,500 copies of the same test case with a minor variation.

2) 100 bugs were raised in this release

When you review the 100 bugs, you find that 20 are pilot errors and are INVALID, 30 are duplicates, and 15 are nitpicks that development decides to WONTFIX. Only 35 are real bugs, and 20 of those, originally marked as show stoppers, got downgraded to trivial and pushed out to a far-off release.
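As a rough illustration of how the second headline number could be reported alongside its breakdown, here is a minimal Python sketch. The counts come from the example above; the category names and the variable names are assumptions made for the illustration, not fields from any particular bug tracker.

```python
from collections import Counter

# Resolution counts taken from the example above (100 bugs raised).
# The category names are illustrative, not from a specific bug tracker.
resolutions = Counter(invalid=20, duplicate=30, wontfix=15, valid=35)

total = sum(resolutions.values())
valid_share = resolutions["valid"] / total

# Of the valid bugs, 20 were downgraded from show stopper to trivial.
downgraded = 20
valid_not_downgraded = resolutions["valid"] - downgraded

# Print the breakdown next to the headline count.
print(f"bugs raised: {total}")
for name, count in resolutions.most_common():
    print(f"  {name:<10} {count:>3}  {count / total:.0%}")
print(f"valid bugs that kept their severity: {valid_not_downgraded}")
```

Reporting the resolution mix with the raw count lets management see that 35 of the 100 reports were valid and that only 15 of those kept their original severity.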

Do you have other examples that you think could have been better represented to management?

thanks,
Swam

posted March 26, 2008 in Software Development | Closed

Good Answers (22)

Jerry W.

Author, Teacher, Consultant

This was selected as Best Answer

What a terrific question, Swam. The only problem is that there are so many possible answers. Lines of code is probably the most common one, of course, and we haven't been able to get rid of that in half a century.

A more subtle answer is all the measurements that confuse effort with results. Managers who don't understand the work tend to reward the appearance of work. Myself, I like to see a programmer or tester who is well organized and plans well, so s/he can complete a full day's work in eight hours, go home, and have a life. But too many managers reward those people who come in early and stay late while appearing to be busy all day (at least when the manager is watching). Programmers and testers who are not well organized themselves cannot do the kind of precision work that's required in software.

For lots more on measurements that make sense, see my Quality Software Management, vol. 2: First-Order Measurement.

posted March 26, 2008

Pete "NetDoc" M.

Owner and Visionary for www.ScubaBoard.com

As Samuel Clemens once opined... "There are lies, damn lies and then there are statistics!"

It reminds me of the prank we pulled in High School. We circulated a petition at school asking for the total ban of Oxydihydride! It was THE most dangerous element known to man and was responsible for killing millions of people (more than the plague) as well as causing untold property damage. We stopped after we got a couple of hundred signatures. Of course you know it better as water.

People will always spin data to prove their theories. Some do it maliciously with an intent to defraud and some are naive about the whole process.

posted March 26, 2008

Ed M.

Accomplished IT Professional with Commercial Software Development and Data Center Management Experience

"There are 100 bugs and we are fixing them at a rate of 25 a week, so we'll be ready to release in 4 weeks." This ignores the incoming rate, which could also be 25 a week.
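To make the arithmetic concrete, here is a small Python sketch of the projection with and without an incoming rate. The function name `weeks_to_clear` and the third scenario (15 new bugs a week) are assumptions added for illustration; only the 100-bug backlog and the 25-a-week rates come from the example.

```python
import math
from typing import Optional


def weeks_to_clear(backlog: int, fixed_per_week: float,
                   incoming_per_week: float = 0.0) -> Optional[int]:
    """Whole weeks until the backlog reaches zero, or None if it never shrinks."""
    if backlog <= 0:
        return 0
    net = fixed_per_week - incoming_per_week  # net reduction per week
    if net <= 0:
        return None  # the backlog holds steady or grows
    return math.ceil(backlog / net)


print(weeks_to_clear(100, 25))      # 4: the optimistic claim, no new bugs
print(weeks_to_clear(100, 25, 25))  # None: incoming rate matches the fix rate
print(weeks_to_clear(100, 25, 15))  # 10: hypothetical 15 new bugs a week
```

The fix rate alone says little; the projection depends on the net rate.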

posted March 26, 2008

Larry H.

Senior Systems Engineer at Phi Theta Kappa Honor Society

Pete beat me to the Clemens quote.

Metrics are most useful when used close to their source, to adjust process.

Up the chain, they become goals rather than measurements, and people will then naturally game the system -- thus the cliche "paid by lines of code".

For example, if you are measuring closure time on bugs, you will have many little, focused bugs, with quick, possibly less reliable fixes. If you are measuring bug counts, you will have fewer bugs, but longer closure times because each bug is overloaded, or added to, to lower the count. Neither improves the process.

posted March 26, 2008

Les D.

Software Quality Assurance Lead

The first is probably not excusable, although sometimes exercise and minor variations are an appropriate part of testing. They should be grouped into larger cases for load or exercise testing, but there are areas where statistical properties are important, for example network connections. :-) If 2,500 of 100,000 items had to be retransmitted, that is probably OK; if they were lost, well, that is an interesting situation.

In the second, are the 20 pilot errors just communication failures about what is supposed to happen, or the result of some bad design such that regular users will make the same errors?
Depending on paths, 30 duplicates in a team effort should mostly have been handled by QA leads. That is high, but might be expected with many junior folks on staff. It might also reflect poor planning, where cases overlap, as in your example 1. My personal preference is to report all the nitpicks, for two reasons: first, enough typos, minor layout issues, and failures in keyboard shortcuts make a product look bad; second, sometimes a minor-appearing fault is the tip of the iceberg. My opinion is that "will not fix" or "defer to future" decisions, made by product management and/or team discussion, should not be considered failures or problems of QA. In essence, a good tester "Calls them as we(sic) sees them."

I think, supporting your point, that a simple bug count is not enough to judge a project or module, but the breakdown of final resolutions gives a lot of information and should be discussed by some combination of development, QA, and product management in a post-work review. Pilot errors point to a need for education and better cases; duplicates point to a need for control of cases and some checking before bugs are entered in the system; and a high proportion of severity reductions might be a team process issue or just a QA rating process issue.

I have mostly worked in environments with pretty good consideration of metrics. The ones that have been controversial, and that to me appear to have less value, are some time-based ones like "time to fix" and "time to regress". These can be part of tracking and motivation, but they get confused fast when other priorities are mixed in.

posted March 26, 2008

David E.

Engineering Manager at Intel

I like the standard ones used by call center managers.


year 1 - Reduced time/client
year 2 - Reduced Total # of calls to call center
year 3 - Reduced time/client

What actually happened in year 2 was that they were understaffed, so they had long hold times, and customers with easy questions either gave up or figured it out themselves by then. In years 1 and 3, they were handling the easier issues, which naturally resolve more quickly.

If you ever compare annual reviews from an IT department, you will see these trends. I don't even think they are being deceitful, because the turnover is too high. The person there in years 1 and 2 is replaced by year 3 by a guy who thinks he can do it better, when in fact he is just striving to maximize the metrics that upper management picks in a given year.

posted March 26, 2008

Keith C.

VP, R&D at ACL

Metrics can be useful but also quite dangerous. I have seen examples where performance bonuses are attached to metrics and teams can quickly shift to a myopic focus on attaining bonus without really improving the product or process the metric/bonus was initially intended to incent.

If there really are one hundred thousand test cases with minor variations in your example, I wonder what the process is to keep track of and update those while the product changes over time. A metric like that would encourage me to look deeper at what was going on, and in that way actually can serve some good as it raises questions.

I think that when the focus shifts away from people, their face to face interactions and reviewing the product frequently, to metrics and dashboards, the spirit of what the team is trying to accomplish may be diminished.

posted March 26, 2008

Sastry T.

eGovernance Specialist, Entrepreneur, Open-Source Technology Evangelist

From recent experience, I've picked two that come to mind as "not often discussed":

1. This metric is 'implied' (i.e., count is not posted as a metric) in many Project Managers' resumes - especially of offshore projects. The number varies but the theme remains the same:
"X" Projects managed; all of them successfully.
In most cases the truth is a variation of "project somehow completed": the client agreed to categorize and prioritize defects and to leave some of them out, and the final delivery came after several schedule delays. Of course, the client will continue to pay for it, since the remaining defects will be covered under the "software change control/management" clauses, at extra cost.
2. I've been seeing this next one on the rise recently, under the guise of "continuous improvement":
Number of new defects / change requests declining with time.
Closer examination reveals that if the initial quality is low enough, you can show continuous improvement for a long, really long time.

--Sas3

posted March 26, 2008

Charles C.

Software Engineer at Intel Corporation

One misleading metric I have seen is the comment to code ratio.

Of course you would see comments like:

// This comment
// is spread
// over multiple
// lines, so
// I can
// pass the
// comment to code
// ratio check

I understood the "spirit" of the initiative, but as Larry Horn mentioned, the system was quickly "gamed".
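A minimal Python sketch of why such a check is easy to game, assuming a naive checker that simply counts lines starting with `//`. The function `comment_to_code_ratio` and both C-style samples are hypothetical and are not the tool described above.

```python
def comment_to_code_ratio(source: str) -> float:
    """Naive ratio: lines starting with // divided by other non-blank lines."""
    comment_lines = 0
    code_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither
        if stripped.startswith("//"):
            comment_lines += 1
        else:
            code_lines += 1
    return comment_lines / code_lines if code_lines else 0.0


# One useful comment on one line.
honest = """// Clamp x to the range [lo, hi].
int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}"""

# The same comment split across six lines to inflate the ratio.
padded = """// Clamp
// x
// to
// the
// range
// [lo, hi].
int clamp(int x, int lo, int hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}"""

print(f"{comment_to_code_ratio(honest):.2f}")  # 0.33
print(f"{comment_to_code_ratio(padded):.2f}")  # 2.00
```

Both samples carry the same information, yet the padded one scores six times higher.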

posted March 27, 2008

Roland H.

Business Analyst

I think there were a lot of good examples, but the underlying problem in my opinion is when the measurement is used to evaluate work.

People game the system IF they know they will be judged based on those metrics.
I agree with Watts S. Humphrey, who wrote that metrics are immensely helpful but only if they are not used "against" the workers, otherwise they will simply game them, writing more LOC than necessary, putting several bugs into one, etc.

If I look at the question with this in mind, then I would say every metric can be misleading.

posted March 27, 2008

Johanna R.

Consultant, Author, Speaker--Managing Product Development

One of my favorite misleading metrics is to look at a point in time as opposed to a trend.

"We only found one bug today!" But the testers were all at a workshop and when they return tomorrow, they'll find 50 more to make up for lost time.

"We finished another feature!" But how long did it take and how many people?

Without looking at several pieces of data, and without looking at trends, a single-dimension metric is quite misleading.

I wrote a Better Software article about this, and discuss it in Manage It! Your Guide to Modern, Pragmatic Project Management.

posted March 27, 2008

Steven S.

Advisory Technical Consultant at EMC

Let me follow up Jerry Weinberg's comments about measurements that confuse effort with results with a few specific examples:

1) Time reporting

Last month, 1,266 person-hours were dedicated to the project. If it's safe enough for people to reveal what's really happening, you will hear that people reported whatever management wanted to hear. I often wonder what percentage of people fill out accurate time reports. I suspect it's a very low number.

2) How many cars are in the parking lot early in the morning or late at night

It's a quick measurement that any manager can make. And they have heard from management experts that all the fastest growing companies have this characteristic. Many managers conclude that if there are more cars in the parking lot, the faster the business will grow. They would be better off surveying the local junk yard.

3) Percent project complete

I've heard that a project is 90% complete dozens of times in my career. I admit that until I learned its uselessness, I reported projects that way. Why is it useless? The remaining 10% takes longer to complete than the first 90%. But that's not the inference people would make from the completion percentage.

Others have commented that people game measurements. I agree. That certainly happens. But why does it happen? Because people are using measurements as evidence to support their story about a project. Once the measurements are used as evidence, the gaming begins and the more the gaming, the more useless the measurement.

posted March 27, 2008

Ray M.

Energy expert, educator, award winning sculptor

A dramatic reduction of violence in Iraq compared to a year ago and yet some of the highest rates since the invasion....

The economy is strong, compared to bailing out Bear Stearns.

etc etc

posted March 27, 2008

George D.

Owner, iDIA Computing, LLC and Computer Software Consultant and Coach

One of my current favorites is code coverage of unit tests. When a certain level of coverage is mandated, tests will be written that exercise the code but don't verify that it works. I've seen test suites that had no assertions at all. If the code didn't blow up, it passed.

Anytime a metric is used as a goal instead of an indicator, it is likely to be misleading.
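The assertion-free tests described above can be sketched in a few lines of Python. Everything here is hypothetical: `apply_discount` contains a deliberate bug, and the two check functions stand in for unit tests that would reach identical line coverage.

```python
def apply_discount(price: float, percent: float) -> float:
    # Deliberate bug for illustration: adds the discount instead of subtracting it.
    return price + price * percent / 100


def check_exercises_code_only() -> bool:
    """Runs every line of apply_discount but verifies nothing, so it always passes."""
    apply_discount(100.0, 10.0)
    return True


def check_verifies_behaviour() -> bool:
    """Same coverage, but compares the result with the expected value."""
    return apply_discount(100.0, 10.0) == 90.0


print(check_exercises_code_only())  # True: full coverage, bug undetected
print(check_verifies_behaviour())   # False: same coverage, bug caught
```

A coverage report would score the two checks identically, which is why the number works as an indicator but not as a goal.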

BTW, I second the recommendation for Jerry's book, QSM vol. 2. I'm re-reading it at the moment.

posted March 27, 2008

Giovanni P.

VP Deputy Manager of Product Unit Smart Networks & Products at Italtel

I suggest that the misleading metric is always the metric without its own history. In other words, if you compare the result of a metric with the corresponding previous result of the same metric, the approach is good, because the percentage of error is surely the same in both cases. Otherwise, the result is like what you said in your two examples. Bye

posted March 27, 2008

John D.

Founder, CEO/CTO at Precog

Metrics should never be an end goal, because their interpretation is subjective, metrics can always be gamed, and attempting to maximize one metric will negatively impact others.

How much time do your developers spend fixing bugs, versus implementing new features? This is an interesting metric to look at, but what does it tell you, really?

Suppose developers spend no time fixing bugs. What does that mean? It could mean developers don't like fixing bugs, so they work on new features instead. Or it could mean that QA isn't finding bugs, because they're too busy playing foosball. Heck, maybe it means there is no QA department and that customers have no way to contact the company.

How many bugs do your customers submit per release? Again, that's an interesting metric to look at, but its interpretation is not so easy.

Suppose customers submitted 10 times more bugs this release than the release before. Does that mean this release is 10 times buggier? It could be. But maybe your user base increased by 10 times, or maybe you made it 10 times easier for customers to submit bugs by giving them a direct interface to your bug tracker.
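One way to check the "10 times buggier" reading is to normalize reports by user base. The Python sketch below uses made-up numbers (50 and 500 reports, 10,000 and 100,000 active users) and a hypothetical helper, `reports_per_thousand_users`; neither comes from a real product.

```python
def reports_per_thousand_users(reports: int, active_users: int) -> float:
    """Bug reports per 1,000 active users for one release."""
    if active_users <= 0:
        raise ValueError("active_users must be positive")
    return 1000 * reports / active_users


# Hypothetical figures: reports grew 10x, and so did the user base.
previous = reports_per_thousand_users(reports=50, active_users=10_000)
current = reports_per_thousand_users(reports=500, active_users=100_000)

print(previous, current)  # 5.0 5.0
```

Under these assumed figures the raw count rose tenfold while the per-user rate did not move. Normalizing does not settle the interpretation, but it rules out one of the candidate explanations.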

You can reduce bugs to zero by killing the QA department and not providing contact information to your customers. Presto, zero bugs overnight! How will that help your company? It won't.

Finding more bugs can mean QA is doing a better job. It could mean your customer base is expanding. It could mean you're making it easier for customers to submit bugs. All these things are *good* for the company.

Instead of focusing on metrics, it's better to focus on the efficiency of processes. If QA is finding bugs, QA needs to work with developers to find out how these bugs were injected into the code, and how developers can prevent this from happening again. That's a process improvement -- you don't need metrics for that.

Similarly, if customers are reporting bugs, maybe designers can work with QA to show them how customers are using the product, so they can cover more usage scenarios; and maybe developers can work with QA to provide them better tools for creating and automating tests. This, too, is a process improvement, requiring no metrics.

Metrics do have some value, especially when looking at trends, but the vast majority of time spent chasing metrics would be better spent improving the end-to-end process of delivering value to customers.

posted March 27, 2008

Don P.

Software Design Engineer at Texas Instruments

All of them.

It's how carefully you interpret the data that matters.

posted March 27, 2008

Jim B.

Principal Consultant, Rare Bird Enterprises "Conscious Development"

Good lord, um, "any of them?" You've heard that there are lies, damn lies, and statistics? Well metrics are at least as bad. I think I'd rather talk about what to do about misleading metrics.

A metric is only misleading if you are misled. So, start by asking: "What does this mean?" as you must have in the examples in your question. "100 bugs" can mean, as you've noted, nearly anything. " . . . only 35 are real bugs out of which 20 . . . got downgraded . . ." are two very useful metrics. Metrics are supposed to be about the artifact - the software. Yet, in your example you've learned something very interesting about part of the development process.

Two more things to ask about any metric are: "What am I supposed to conclude from this?" and "Says who?" The most common abuse of metrics is to imply something incorrect, without having to own the lie. So ask: "What am I supposed to conclude from having 100 bugs?" and "Says who?" I *always* provide sources and interpretation for any metric I present. As the one listening, I *always* discount any metric that doesn't reference sources or doesn't have some person standing there committing their credibility to telling me that this is something I need to care about.

Taking Jerry's example, "effort" vs. "results" is often an attempted scam: "We're working very hard" as a substitute for all the good we're not doing. Well, I want to know. If somebody gives me an effort metric, I want to be told: " . . . and this means we are creating lots of good results." Then I want to see the results metric.

FWIW I haven't seen a "misleading" metric yet that didn't tell me more than a straight one. If they're bothering to mislead, they know where a problem is, and they've just told you where and that they know, haven't they?

posted March 29, 2008

Wayne M.

Project Manager, PMP at Avaya Government Solutions

The problem is not with the metric but with how the metric is evaluated and the purpose it is to be used for. Far too often, "metrics" are only captured because they are easy to capture. Little or no thought is given to what needs to be accomplished.

Ideally, metrics should be used for one of two purposes. One is to verify current capabilities are being met and identify future slippages. Two is to evaluate a process change to see if it brings improvement. Expecting improvement from a measure without a defined change to be implemented is foolhardy and leads to gaming the system.

First, know what you intend to accomplish. Second, define metrics that help you evaluate what is accomplished.

posted March 29, 2008

Richard Z.

Consultant: QFD Red Belt, Six Sigma Master Black Belt, ToC Jonah

I agree with Johanna that the most misleading generic metric error is to look at a few points of any metric out of context.
As Dr. Deming tirelessly pointed out, your measurements operationally define a system. To properly interpret such measurements, you need to look at them systemically. The problem is, all systems have noise.
This is the second most misleading generic metric error: to confuse noise with signals.
Take almost any meaningful metric of real world data, plot it nicely, and then ask people to interpret it. Then put the same data on a control chart. [You will need to use two: an individual (x) chart and a moving range (mR) chart because there are two changes that can occur.] Now you can see if they are "interpreting" noise, or if there really is something happening (and how big the effect is).
I cannot tell you how many times I find authors and presenters showing nice graphs, and claiming improvement -- when the simplest statistical analysis (which is what a control chart is) shows only noise.
Unfortunately, data is not enough. You must also understand, systemically and statistically, what the data is telling you.
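For readers who have not built one, here is a minimal Python sketch of the individuals and moving range (XmR) limits mentioned above. It uses the standard XmR constants (2.66 for the individuals limits, 3.267 for the upper range limit); the weekly defect counts and the function names are hypothetical. A real control chart would apply further run rules beyond the single-point test shown here.

```python
from statistics import mean


def xmr_limits(values):
    """Control limits for an individuals (X) chart and a moving range (mR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    # Moving range: absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "x_centre": centre,
        "x_ucl": centre + 2.66 * mr_bar,  # upper limit, individuals chart
        "x_lcl": centre - 2.66 * mr_bar,  # lower limit, individuals chart
        "mr_centre": mr_bar,
        "mr_ucl": 3.267 * mr_bar,         # upper limit, moving range chart
    }


def points_outside_limits(values):
    """Points beyond the individuals limits; everything else is treated as noise."""
    limits = xmr_limits(values)
    return [v for v in values if v > limits["x_ucl"] or v < limits["x_lcl"]]


# Hypothetical weekly defect counts: they vary, but only within the limits.
weekly_defects = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15]
print(points_outside_limits(weekly_defects))  # []
```

With these made-up counts, the drop from 15 to 11 or the rise from 13 to 16 could easily be presented as improvement or regression, yet no point falls outside the limits.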

posted March 31, 2008

Bruce D.

co-president at DLS Solutions, Inc. and Software Consultant

Metrics as part of Software Quality Metrics fall into three categories: product, process, and project. With respect to software engineering quality models, the use of metrics enters at the managed level 4 of the CMM process model. Any metric should be used properly in context to verify current quality and permit predictions of future performance (projections). A key aspect of the challenges in this area is psychology: be careful how you reward and punish, because behavior will follow. Effective use of metrics will lead to a continual improvement model like Toyota's.

An excellent textbook on this subject is:

"Metrics and Models in Software Quality Engineering", 2nd Ed., Stephen Kan. ISBN 0201729156.

From experience, both management and the development team ultimately want to deliver a product on time and on budget, one that satisfies the internal quality requirements and, equally important, the customer's requirements and expectations. Disconnects in communication and a lack of leadership can create deficiencies in this area and contribute to misleading metrics.

Some misleading metrics:

(1) Product related - raw defect metrics (any). Without normalization and careful attention to relationships, these can actually reduce product quality from a customer perspective and discourage the development team. Two interesting questions: (a) are there defects contributing to the metrics that are really due to changing, poor, or new requirements? and (b) is a shotgun approach, with a "pay-per-bug" black-box test mentality, contributing to an apparently high or rising defect count? Typically the defect metric is examined at a stage where the impact of requirements has either been forgotten or become impossible to address, and when the emphasis is on processing the defects that potentially interfere with meeting external requirements. (These defects are tracked against current internal requirements that are assumed to relate to external requirements, but this is not guaranteed.)

(2) Process related - rate of defect detection and removal. This can be skewed by the problems in (1) above, and it can further harm the project through the natural consequence of a poor reward system: the pressure to show a good reduction rate leads to grabbing "low-hanging fruit" or to poor-quality code and documentation changes for bug fixes. Also consider the detection-based metrics. If true quality was jeopardized in the requirements phase and no defect was reported, the back-end rush to test quality in both destabilizes the code base and misinforms the team about the true source of the problem. This is a vicious cycle.

(3) Project related - schedule metrics. Common tools today are burn-up and burn-down charts. They are intended to show both how well development is proceeding and where requirements have changed. Failure to maintain the charts properly can misinform. It is vital to distinguish between (a) differences between actual and estimated effort, and the understanding of them, and (b) changing requirements. Externally, management needs to plan shipping and costs, while developers struggle with estimation. The realities in this area lead to practices such as padding estimates to "under commit and over deliver". All the while, development is dealing with complexity; management does not always appreciate its impact and must look for ways to speed up delivery and reduce costs. A biased focus on the estimation phase, without refining or expanding the quality of the requirement information, may limit improvements in this area. Failures to clearly show the relationship between changing requirements and end-date changes are prevalent and misinforming.

posted April 2, 2008