Coverage closure isn’t done when the number hits 100%

 DESIGN VERIFICATION  ·  ARTICLE 03 OF 06

Your functional coverage report says 100%. Your manager wants to tape out. You’re not sleeping well. Here’s why — and what to do about it.

 

Coverage closure is one of the most misunderstood milestones in design verification. Not because the concept is complicated — measure what you’ve tested, keep going until you’ve tested enough — but because the number that represents it, the percentage on a dashboard, is far easier to trust than it deserves.

I’ve seen projects tape out with 100% functional coverage and come back from the fab with silicon bugs that were, in retrospect, entirely predictable. Not because the simulation didn’t run. Because the coverage model didn’t ask the right questions.

This article is about the difference between coverage that is complete and coverage that is correct. It’s also about the sign-off criteria that are actually worth defending in a review — and the ones that give you a number without giving you confidence.

The number is not the answer

Functional coverage works like this: you define a model of the scenarios you want to exercise, run constrained-random tests until those scenarios are exercised, and report the percentage of defined scenarios that were hit. When that number reaches 100%, you declare closure.

The problem is embedded in the first step. You define a model of the scenarios you want to exercise. The coverage number tells you how much of your model you’ve covered. It says nothing — nothing at all — about whether your model captured what actually needed to be tested.

A coverage model is a claim about what matters. 100% closure means you proved that claim. It does not validate the claim itself.
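To make that concrete, here is a minimal sketch of the mechanics (names are illustrative; the point is in the comment):

// Minimal illustration: the "model" is nothing more than the bins you define.
class txn_coverage;
  bit [1:0] opcode;

  covergroup cg_opcode;
    cp_op: coverpoint opcode {
      bins rd  = {2'b00};
      bins wr  = {2'b01};
      bins rmw = {2'b10};
      // 2'b11 is simply absent from the model. The report can reach 100%
      // without it ever being generated; the number measures the model,
      // not the design.
    }
  endgroup

  function new();
    cg_opcode = new();
  endfunction

  function void sample_txn(bit [1:0] op);
    opcode = op;
    cg_opcode.sample();
  endfunction
endclass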

This is not an abstract concern. It manifests in specific, recurring ways on real projects:

        A model that covers every individual field value but never crosses the fields that interact. The address field is covered. The burst-type field is covered. The combination of maximum-length burst to a near-boundary address — the case that triggers the wrap logic — is never hit.

        A model that covers legal stimulus but omits the boundary between legal and illegal. The spec says burst length must be between 1 and 256. The coverage model has bins for 1, 2–7, 8–31, 32–255, and 256. Nobody defined what happens when the DUT receives length 0 or 257, and nobody modeled whether the DUT’s error-handling path was ever exercised.

        A model that was written once at the start of the project and never updated when the spec changed. The RTL grew three new operating modes. The coverage model still reflects the original two.

        A model where trivially satisfiable bins dominate. A 32-bit address field with auto-binning generates dozens of bins that random stimulus will hit in the first hundred transactions. These bins inflate the total bin count, making meaningful bins — the ones that require targeted stimulus — a small fraction of the reported percentage.

 

In each of these cases, 100% closure is achievable and meaningless.
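The auto-binning failure in particular has a mechanical fix. A sketch, assuming a 32-bit addr signal in scope; the bin boundaries are illustrative:

covergroup cg_addr_bins;
  // Bad: implicit auto bins. Up to auto_bin_max (default 64) trivial bins
  // over a 32-bit field, nearly all hit in the first few hundred transactions.
  cp_addr_auto: coverpoint addr;

  // Better: a few explicit bins that encode what the design cares about.
  // Overlap between bins is legal; a sample can hit more than one.
  cp_addr: coverpoint addr {
    bins first_page = {[32'h0 : 32'h0FFF]};
    bins wrap_zone  = {[32'h0FFC : 32'h0FFF]};            // near a 4KB boundary
    bins last_page  = {[32'hFFFF_F000 : 32'hFFFF_FFFF]};
    bins elsewhere  = default;  // catch-all; default bins don't count toward coverage
  }
endgroup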

Coverage completeness vs. coverage correctness

It helps to separate two distinct properties of a coverage model:

Coverage completeness is what the percentage measures. Given the bins you’ve defined, have all of them been hit? This is a well-defined, automatable question with a clean answer.

Coverage correctness is the question the percentage cannot answer. Do your bins capture the scenarios that could reveal bugs in this design? This requires judgment, knowledge of the spec, and understanding of the DUT’s failure modes.

Completeness is measurable. Correctness requires a human who knows what could go wrong.

Most coverage sign-off processes focus entirely on completeness. This is understandable — completeness is objective, trackable, and produces a number that fits on a slide. Correctness requires reviewing the coverage model itself, which is slower, harder to automate, and requires the reviewer to hold the spec and the DUT architecture in their head simultaneously.

But correctness is where the value is. A small, correct coverage model is vastly more useful than a large, complete one that missed the important scenarios.

What 100% won’t show you

Let’s look at the specific classes of scenario that functional coverage consistently fails to capture, even at 100%.

Unexercised corner cases at field interactions

Most coverage models define coverpoints for individual fields and then — if you’re disciplined — cross-coverage between pairs of fields. But real bugs often live at the intersection of three or more conditions that no individual coverpoint captures.

// This model looks thorough. It is not.
covergroup cg_write_txn;
  cp_addr: coverpoint addr {
    bins low  = {[0:32'hFFFF]};
    bins mid  = {[32'h10000:32'hFFFEFFFF]};
    bins high = {[32'hFFFF0000:32'hFFFFFFFF]};
  }
  cp_len:  coverpoint len { bins short = {[1:15]}; bins long = {[16:256]}; }
  cp_type: coverpoint txn_type { bins rd = {READ}; bins wr = {WRITE}; }

  cx_type_len: cross cp_type, cp_len;  // 4 bins: covered quickly

  // Missing: the triple cross of type x len x addr.
  // A WRITE of length 256 to addr > 0xFFFF0000 wraps the address space.
  // This combination never appears in any coverpoint or cross.
endgroup

 

The fix is not to add every possible N-way cross — that produces a combinatorial explosion and millions of bins that will never close. The fix is to read the spec and ask: which field combinations trigger distinct behaviour in this design? Write targeted crosses for those interactions only.
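For the cg_write_txn example above, a targeted cross might look like this (a sketch; the bin-select pruning keeps the cross small enough to close):

  // Inside cg_write_txn: one cross for the interaction the spec calls out,
  // pruned down to the combination that exercises the wrap logic.
  cx_wrap_risk: cross cp_type, cp_len, cp_addr {
    bins wr_long_high = binsof(cp_type.wr) && binsof(cp_len.long) &&
                        binsof(cp_addr.high);
    ignore_bins rest  = !binsof(cp_type.wr) || !binsof(cp_len.long) ||
                        !binsof(cp_addr.high);
  }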

Illegal-but-reachable states

Protocol specs define legal behaviour. They also define what the DUT should do when it receives illegal inputs: return an error, ignore the request, flag a fault. These error-handling paths are among the most bug-prone parts of any design — they’re written last, tested least, and executed rarely in normal operation.

Functional coverage of legal scenarios gives you zero visibility into whether error paths have been exercised. You need explicit coverage of illegal stimulus — and you need your reference model and scoreboard to correctly predict the DUT’s response to it.

// Coverage for the error path — too often absent
covergroup cg_error_handling;
  // Illegal burst lengths
  cp_illegal_len: coverpoint len {
    bins zero_len = {0};
    bins over_max = {[257:$]};
  }
  // Unaligned access to alignment-required addresses
  cp_misalign: coverpoint addr[1:0] {
    bins misaligned = {[1:3]};  // for 32-bit-aligned protocol
  }
  // Error response actually asserted by DUT
  cp_err_resp: coverpoint dut_err_resp {
    bins err_seen = {1};
  }
  // Cross: illegal input actually produced error response
  cx_illegal_causes_err: cross cp_illegal_len, cp_err_resp;
endgroup

 

On illegal coverage and constraints

Covering illegal scenarios requires removing or relaxing the constraints that prevent them from being generated. Be surgical: use constraint_mode() to disable specific constraints for specific test classes, and make sure your scoreboard is prepared to handle the expected error response. An illegal transaction that the scoreboard doesn’t predict will generate false failures.
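A minimal sketch of that pattern, assuming a transaction class with a named length constraint (class and constraint names are illustrative):

class bus_txn;
  rand int unsigned len;
  constraint legal_len_c { len inside {[1:256]}; }
endclass

// In the error-injection test: disable only the one constraint.
bus_txn txn = new();
txn.legal_len_c.constraint_mode(0);
if (!txn.randomize() with { len inside {0, [257:300]}; })
  $error("randomize() failed for illegal-length injection");
// Tell the scoreboard to expect the error response before sending,
// or this transaction will show up as a false failure.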


State machine completeness

FSMs are among the most coverage-amenable constructs in RTL design. Every state is enumerable. Every transition is explicit. And yet state machine coverage is frequently incomplete in practice, for two reasons.

First, auto-generated state coverage captures which states were visited, not which transitions were taken. A 10-state FSM might visit all 10 states with a simple stimulus sequence while leaving 15 of its 40 legal transitions completely untested.

Second, the interesting bugs in FSMs are often in re-entry and sequence-dependent transitions: what happens when the FSM receives an interrupt in state WAIT_ACK? What happens if IDLE is re-entered from ERROR rather than from COMPLETE? These require explicit transition coverage, not just state coverage.

// State coverage (weak): only checks which states were visited
covergroup cg_fsm_states;
  cp_fsm_state: coverpoint dut.ctrl_fsm.state;
endgroup

// Transition coverage (stronger): checks which arcs were taken
covergroup cg_fsm_arcs;
  cp_fsm_trans: coverpoint dut.ctrl_fsm.state {
    bins idle_to_active = (IDLE     => ACTIVE);
    bins active_to_wait = (ACTIVE   => WAIT_ACK);
    bins wait_to_done   = (WAIT_ACK => COMPLETE);
    bins wait_to_err    = (WAIT_ACK => ERROR);   // error path
    bins err_to_idle    = (ERROR    => IDLE);    // recovery
    bins idle_to_idle   = (IDLE     => IDLE);    // idle re-entry
    // If any of these bins are empty at closure, that arc was never tested.
  }
endgroup

 

Coverage model staleness

Specs change. RTL changes. Coverage models, once written, have a gravitational tendency to stay the same.

A coverage model written against rev 1.2 of a spec may be substantially wrong by the time the RTL is at rev 2.1. New operating modes may be uncovered. Deprecated scenarios may still be modelled, consuming simulation effort and bin count without adding value. The coverage report says 100% because the old bins close — the new behaviour is simply not in the model.

The fix is procedural, not technical: coverage model review should be a mandatory step whenever the spec or RTL architecture changes in a material way. Treat the coverage model as a living document, not an artifact you write once at project kick-off.

How to review a coverage model, not just coverage data

Most coverage reviews look at the data: which bins closed, which ones didn’t, what’s the plan to close the remaining ones. This is necessary but not sufficient. A genuine coverage review also examines the model itself.

Here is the review process I use, structured as a sequence of questions:

1. Does every covergroup have a rationale?

Each covergroup should be traceable to a specific section of the spec or a known class of DUT behavior. If you can’t articulate why a covergroup exists — what verification question it’s answering — it probably shouldn’t be in the model, or it needs to be redesigned.

// Good: rationale is embedded in the model
// Spec section 6.4.2: write transactions must handle 4KB boundary wrap.
// This covergroup verifies the wrap path is exercised.
covergroup cg_write_wrap_boundary;
  ...
endgroup

// Bad: no traceable purpose
covergroup cg_misc_fields;  // covers... stuff?
  cp_field_a: coverpoint item.field_a;  // why?
endgroup

 

2. Are the bins hard to close?

If your first regression run closes every bin, your model is too easy. A well-designed coverage model should require targeted directed tests or carefully tuned constraints to close the last 10–20%. Bins that close immediately with random stimulus are measuring that random stimulus was generated — which you already knew.

The target I use: after a standard constrained-random regression, the model should be at 60–80% closure. The remaining bins should require you to think about how to close them. If you have to think, the bins are measuring something meaningful.
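As an illustration of the difference, consider coverage of FIFO occupancy (clk, push, fifo_count, and DEPTH are assumed signals and parameters):

covergroup cg_fifo_stress @(posedge clk);
  cp_level: coverpoint fifo_count {
    bins some_data = {[1:DEPTH-1]};  // closes almost immediately under any traffic
    bins empty     = {0};
    bins full      = {DEPTH};        // needs sustained back-pressure: harder
  }
  // Hardest: a write attempted while full. Random stimulus rarely lines
  // these up; closing this bin forces you to design the stimulus.
  cp_push_on_full: coverpoint (push && fifo_count == DEPTH) {
    bins attempted = {1};
  }
endgroup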

3. Are the crosses capturing real interactions?

For every cross in the model, ask: does the RTL actually behave differently depending on this combination of values? If the answer is no — if the DUT’s behavior for WRITE+SHORT is identical to READ+SHORT in every relevant way — then this cross is generating bins without generating verification value.

Crosses are expensive in two ways: they generate large numbers of bins that require simulation effort to close, and they can make the coverage report harder to read. Every cross should be justified by a concrete scenario where the combination matters.

4. Is illegal behavior explicitly modeled?

Go through the spec’s error and exception sections. For each defined illegal condition, check whether the coverage model has an explicit coverpoint for it and whether the corresponding DUT response is also covered. If the spec says the DUT must assert an error flag within two cycles of receiving an illegal burst length, your model should cover: illegal burst length generated, error flag asserted, and the latency between them.
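For the two-cycle example, the latency itself can be a coverpoint. A sketch, assuming the monitor computes err_latency (cycles from the illegal transfer being accepted to the error flag asserting; -1 if it never asserts):

covergroup cg_err_latency;
  cp_err_latency: coverpoint err_latency {
    bins same_cycle = {0};
    bins one_cycle  = {1};
    bins two_cycles = {2};
    // illegal_bins fail the simulation when hit, rather than silently counting:
    illegal_bins too_late = {[3:$]};
    illegal_bins never    = {-1};
  }
endgroup

An SVA assertion is often the better home for the two-cycle requirement itself; the coverpoint's job is to prove the scenario was actually exercised.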

5. Does the model reflect the current spec revision?

Compare the covergroup list against the spec’s table of contents. For each major feature or operating mode in the spec, there should be corresponding coverage. Features without coverage are features without verification evidence.

Sign-off criteria worth defending

Sign-off is not a number. It’s a set of claims about what has been verified, backed by evidence. Here are the sign-off criteria I’m willing to put my name on:

1. Functional coverage ≥ 98% on a model that has been reviewed against the current spec revision, with documented rationale for each covergroup and an explicit waiver for every bin left unhit.

2. All FSM transitions exercised, verified by transition coverage — not state coverage alone.

3. All error and illegal-input paths exercised, with scoreboard confirmation that the DUT’s response matches the spec’s defined behaviour.

4. Code coverage ≥ 90% line and branch, with reviewed waivers for unreachable code (dead code identified in RTL review, not assumed by the verifier).

5. Zero unresolved UVM_ERROR or UVM_FATAL messages in the regression. Not “none that matter” — zero. Known-noisy messages should be converted to documented waivers with explicit justification.

6. Coverage model change log reviewed and current. Every spec change since the original coverage model was written is either reflected in the model or documented as out-of-scope with justification.
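The waiver mechanism in item 5 can be sketched with a uvm_report_catcher (the message ID and waiver reference here are illustrative):

class known_noise_catcher extends uvm_report_catcher;
  function new(string name = "known_noise_catcher");
    super.new(name);
  endfunction

  virtual function action_e catch();
    // Waiver: benign protocol-checker noise during reset, tracked in the
    // waiver log. Demoted to UVM_INFO so the zero-UVM_ERROR gate holds.
    if (get_severity() == UVM_ERROR && get_id() == "PCHK_RESET_GLITCH")
      set_severity(UVM_INFO);
    return THROW;
  endfunction
endclass

// Registered once in the base test:
//   uvm_report_cb::add(null, catcher_instance);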

 

Notice what’s not on this list: “100% functional coverage.” The number matters, but only in the context of everything else on the list. A project that hits 100% on a weak model and fails items 2 through 6 is not ready for tape-out. A project that is at 97% on a rigorous, spec-reviewed model with all FSM transitions covered and all error paths exercised is probably in better shape than most.

Sign-off is a claim, not a number. The number is evidence for the claim. Make sure you know what claim you’re making before you make it.

The conversation with your manager

At some point in every project, there is a conversation that goes like this: the manager wants to know if coverage is done. The number is at 100%. The honest answer is “the number is at 100%, but here’s what I’m still uncertain about.”

This is a hard conversation to have, because managers are trained to treat 100% as a final answer. The dashboard is green. Why are you still worried?

The way I’ve learned to frame it: “The coverage model is complete. I’m not fully confident the model captured everything that matters. Here are the three scenarios I want to add targeted tests for before I’m comfortable signing off.”

This reframes the conversation from “we’re done but you’re still worried” to “here is a specific, bounded amount of work remaining.” It keeps the risk visible and gives the manager something concrete to evaluate: is this additional week of testing worth the schedule cost?

Sometimes the answer is no. Sometimes the tape-out date is fixed and the risk is accepted. That is a legitimate decision, made by the right people with full information. What is not legitimate is treating 100% on a coverage dashboard as evidence that the decision doesn’t need to be made.

A practical starting point

If you want to audit your own coverage model today, start here. For each covergroup in your model, answer four questions in writing:

        What spec section or DUT behavior does this covergroup verify?

        Which bins in this covergroup require targeted stimulus to close — and have I written that stimulus?

        Does this covergroup have cross coverage for every field interaction that triggers distinct DUT behavior?

        Has this covergroup been reviewed since the last spec revision?

 

If any covergroup can’t answer all four questions, it needs work before you use it as a sign-off criterion.

Coverage closure is not done when the number hits 100%. It’s done when the model is correct, the bins are meaningful, the hard scenarios have been explicitly exercised, and the sign-off criteria reflect what was actually verified — not just what was measured.

The number is a starting point for that conversation. Not the end of it.

 

Next in this series:

Article 04 — How I debug a failing regression at 2am: my actual process
