DESIGN VERIFICATION · ARTICLE 03 OF 06
Coverage closure isn’t done when the number hits 100%
Your functional coverage report says 100%. Your manager wants to
tape out. You’re not sleeping well. Here’s why — and what to do about it.
Coverage closure is one of the
most misunderstood milestones in design verification. Not because the concept
is complicated — measure what you’ve tested, keep going until you’ve tested
enough — but because the number that represents it, the percentage on a
dashboard, is far easier to trust than it deserves.
I’ve seen projects tape out
with 100% functional coverage and come back from the fab with silicon bugs that
were, in retrospect, entirely predictable. Not because the simulation didn’t
run. Because the coverage model didn’t ask the right questions.
This article is about the
difference between coverage that is complete and coverage that is correct. It’s
also about the sign-off criteria that are actually worth defending in a review
— and the ones that give you a number without giving you confidence.
The number is not the answer
Functional coverage works like
this: you define a model of the scenarios you want to exercise, run
constrained-random tests until those scenarios are exercised, and report the
percentage of defined scenarios that were hit. When that number reaches 100%, you
declare closure.
The problem is embedded in the
first step. You define a model of the scenarios you want to exercise. The
coverage number tells you how much of your model you’ve covered. It says
nothing — nothing at all — about whether your model captured what actually needed
to be tested.
A coverage model is a claim about what matters. 100% closure
means you proved that claim. It does not validate the claim itself.
This is not an abstract
concern. It manifests in specific, recurring ways on real projects:
• A model that covers every individual field value but never crosses the fields that interact. The address field is covered. The burst-type field is covered. The combination of maximum-length burst to a near-boundary address — the case that triggers the wrap logic — is never hit.
• A model that covers legal stimulus but omits the boundary between legal and illegal. The spec says burst length must be between 1 and 256. The coverage model has bins for 1, 2–7, 8–31, 32–255, and 256. Nobody defined what happens when the DUT receives length 0 or 257, and nobody modeled whether the DUT’s error-handling path was ever exercised.
• A model that was written once at the start of the project and never updated when the spec changed. The RTL grew three new operating modes. The coverage model still reflects the original two.
• A model where trivially-satisfiable bins dominate. A 32-bit address field with auto-binning generates thousands of bins that random stimulus will hit in the first hundred transactions. These bins inflate the total bin count, making meaningful bins — the ones that require targeted stimulus — a small fraction of the reported percentage.
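To make the last failure mode concrete, here is a sketch contrasting an inflated model with a deliberate one. The bin choices are illustrative, not taken from any particular spec:

```systemverilog
// Inflated: auto-binning on a wide field creates up to option.auto_bin_max
// bins (64 by default, often raised much higher) that random stimulus
// closes almost immediately.
covergroup cg_inflated;
  cp_addr_auto: coverpoint addr;
endgroup

// Meaningful: a few bins chosen for where the design's behaviour actually
// changes. These need targeted stimulus, and that is the point.
covergroup cg_meaningful;
  cp_addr: coverpoint addr {
    bins first_page   = {[0:4095]};
    bins page_cross   = {[4088:4103]};               // straddles a 4KB boundary
    bins top_of_space = {[32'hFFFF_FFF8:32'hFFFF_FFFF]};
  }
endgroup
```

Both models can reach 100%; only the second one tells you something when it does.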
In each of these cases, 100%
closure is achievable and meaningless.
Coverage completeness vs. coverage correctness
It helps to separate two
distinct properties of a coverage model:
Coverage completeness is what
the percentage measures. Given the bins you’ve defined, have all of them been
hit? This is a well-defined, automatable question with a clean answer.
Coverage correctness is the
question the percentage cannot answer. Do your bins capture the scenarios that
could reveal bugs in this design? This requires judgment, knowledge of the
spec, and understanding of the DUT’s failure modes.
Completeness is measurable. Correctness requires a human who
knows what could go wrong.
Most coverage sign-off
processes focus entirely on completeness. This is understandable — completeness
is objective, trackable, and produces a number that fits on a slide.
Correctness requires reviewing the coverage model itself, which is slower,
harder to automate, and requires the reviewer to hold the spec and the DUT
architecture in their head simultaneously.
But correctness is where the
value is. A small, correct coverage model is vastly more useful than a large,
complete one that missed the important scenarios.
What 100% won’t show you
Let’s look at the specific
classes of scenario that functional coverage consistently fails to capture,
even at 100%.
Unexercised corner cases at field interactions
Most coverage models define
coverpoints for individual fields and then — if you’re disciplined —
cross-coverage between pairs of fields. But real bugs often live at the
intersection of three or more conditions that no individual coverpoint
captures.
// This model looks thorough. It is not.
covergroup cg_write_txn;
  cp_addr: coverpoint addr {
    bins low  = {[0:32'hFFFF]};
    bins mid  = {[32'h10000:32'hFFFEFFFF]};
    bins high = {[32'hFFFF0000:32'hFFFFFFFF]};
  }
  cp_len:  coverpoint len      { bins short = {[1:15]}; bins long = {[16:256]}; }
  cp_type: coverpoint txn_type { bins rd = {READ}; bins wr = {WRITE}; }
  cx_type_len: cross cp_type, cp_len;  // 4 bins: covered quickly
  // Missing: the triple cross of type x len x addr.
  // A WRITE of length 256 to addr > 0xFFFF0000 wraps the address space.
  // This combination never appears in any coverpoint or cross.
endgroup
The fix is not to add every
possible N-way cross — that produces a combinatorial explosion and millions of
bins that will never close. The fix is to read the spec and ask: which field
combinations trigger distinct behaviour in this design? Write targeted crosses
for those interactions only.
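For the wrap scenario in the example above, a targeted cross might look like the following sketch. The single-bin coverpoints deliberately restrict the cross to the one combination that matters:

```systemverilog
covergroup cg_wrap_interaction;
  // Each coverpoint has one bin: the condition relevant to the wrap logic.
  cp_near_top: coverpoint addr     { bins near_top  = {[32'hFFFF0000:32'hFFFFFFFF]}; }
  cp_max_len:  coverpoint len      { bins max_burst = {256}; }
  cp_wr:       coverpoint txn_type { bins wr        = {WRITE}; }
  // One cross bin total: hit only when all three conditions coincide.
  cx_wrap: cross cp_near_top, cp_max_len, cp_wr;
endgroup
```

One cross with one bin is cheap to simulate and unambiguous in the report, which an exhaustive triple cross of the full coverpoints would not be.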
Illegal-but-reachable states
Protocol specs define legal
behaviour. They also define what the DUT should do when it receives illegal
inputs: return an error, ignore the request, flag a fault. These error-handling
paths are among the most bug-prone parts of any design — they’re written last,
tested least, and executed rarely in normal operation.
Functional coverage of legal
scenarios gives you zero visibility into whether error paths have been
exercised. You need explicit coverage of illegal stimulus — and you need your
reference model and scoreboard to correctly predict the DUT’s response to it.
// Coverage for the error path — too often absent
covergroup cg_error_handling;
  // Illegal burst lengths
  cp_illegal_len: coverpoint len {
    bins zero_len = {0};
    bins over_max = {[257:$]};
  }
  // Unaligned access to alignment-required addresses
  cp_misalign: coverpoint addr[1:0] {
    bins misaligned = {[1:3]};  // for 32-bit-aligned protocol
  }
  // Error response actually asserted by DUT
  cp_err_resp: coverpoint dut_err_resp {
    bins err_seen = {1};
  }
  // Cross: illegal input actually produced error response
  cx_illegal_causes_err: cross cp_illegal_len, cp_err_resp;
endgroup
On illegal coverage and constraints
Covering illegal scenarios requires removing or relaxing the constraints that prevent them from being generated. Be surgical: use constraint_mode() to disable specific constraints for specific test classes, and make sure your scoreboard is prepared to handle the expected error response. An illegal transaction that the scoreboard doesn’t predict will generate false failures.
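As a sketch of that surgical approach — the class, constraint, and field names are illustrative, but `constraint_mode()` and `randomize() with` are standard SystemVerilog:

```systemverilog
class write_txn;
  rand bit [8:0] len;
  // Named constraint so it can be disabled selectively, per test.
  constraint c_legal_len { len inside {[1:256]}; }
endclass

class illegal_len_test;
  task run();
    write_txn txn = new();
    txn.c_legal_len.constraint_mode(0);  // disable only this constraint
    assert(txn.randomize() with { len inside {0, [257:511]}; });
    // ...drive txn; the scoreboard must predict the DUT's error response,
    // or this stimulus produces false failures instead of coverage.
  endtask
endclass
```

Disabling one named constraint leaves every other constraint in force, so the rest of the transaction stays legal and the error path is isolated.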
State machine completeness
FSMs are among the most
coverage-amenable constructs in RTL design. Every state is enumerable. Every
transition is explicit. And yet state machine coverage is frequently incomplete
in practice, for two reasons.
First, auto-generated state
coverage captures which states were visited, not which transitions were taken.
A 10-state FSM might visit all 10 states with a simple stimulus sequence while
leaving 15 of its 40 legal transitions completely untested.
Second, the interesting bugs in
FSMs are often in re-entry and sequence-dependent transitions: what happens
when the FSM receives an interrupt in state WAIT_ACK? What happens if IDLE is
re-entered from ERROR rather than from COMPLETE? These require explicit
transition coverage, not just state coverage.
// State coverage (weak): only checks which states were visited
cp_fsm_state: coverpoint dut.ctrl_fsm.state;

// Transition coverage (stronger): checks which arcs were taken
cp_fsm_trans: coverpoint dut.ctrl_fsm.state {
  bins idle_to_active = (IDLE => ACTIVE);
  bins active_to_wait = (ACTIVE => WAIT_ACK);
  bins wait_to_done   = (WAIT_ACK => COMPLETE);
  bins wait_to_err    = (WAIT_ACK => ERROR);  // error path
  bins err_to_idle    = (ERROR => IDLE);      // recovery
  bins idle_to_idle   = (IDLE => IDLE);       // idle re-entry
  // If any of these bins are empty at closure, that arc was never tested.
}
Coverage model staleness
Specs change. RTL changes.
Coverage models, once written, have a gravitational tendency to stay the same.
A coverage model written
against rev 1.2 of a spec may be substantially wrong by the time the RTL is at
rev 2.1. New operating modes may be uncovered. Deprecated scenarios may still
be modelled, consuming simulation effort and bin count without adding value.
The coverage report says 100% because the old bins close — the new behaviour is
simply not in the model.
The fix is procedural, not
technical: coverage model review should be a mandatory step whenever the spec
or RTL architecture changes in a material way. Treat the coverage model as a
living document, not an artifact you write once at project kick-off.
How to review a coverage model, not just coverage data
Most coverage reviews look at
the data: which bins closed, which ones didn’t, what’s the plan to close the
remaining ones. This is necessary but not sufficient. A genuine coverage review
also examines the model itself.
Here is the review process I
use, structured as a sequence of questions:
1. Does every covergroup have a rationale?
Each covergroup should be
traceable to a specific section of the spec or a known class of DUT behavior.
If you can’t articulate why a covergroup exists — what verification question
it’s answering — it probably shouldn’t be in the model, or it needs to be
redesigned.
// Good: rationale is embedded in the model
// Spec section 6.4.2: write transactions must handle 4KB boundary wrap.
// This covergroup verifies the wrap path is exercised.
covergroup cg_write_wrap_boundary;
  ...
endgroup

// Bad: no traceable purpose
covergroup cg_misc_fields;             // covers... stuff?
  cp_field_a: coverpoint item.field_a; // why?
endgroup
2. Are the bins hard to close?
If your first regression run
closes every bin, your model is too easy. A well-designed coverage model should
require targeted directed tests or carefully-tuned constraints to close the
last 10–20%. Bins that close immediately with random stimulus are measuring
that random stimulus was generated, which you already knew.
The target I use: after a
standard constrained-random regression, the model should be at 60–80% closure.
The remaining bins should require you to think about how to close them. If you
have to think, the bins are measuring something meaningful.
3. Are the crosses capturing real interactions?
For every cross in the model,
ask: does the RTL actually behave differently depending on this combination of
values? If the answer is no — if the DUT’s behavior for WRITE+SHORT is
identical to READ+SHORT in every relevant way — then this cross is generating
bins without generating verification value.
Crosses are expensive in two
ways: they generate large numbers of bins that require simulation effort to
close, and they can make the coverage report harder to read. Every cross should
be justified by a concrete scenario where the combination matters.
4. Is illegal behavior explicitly modeled?
Go through the spec’s error and
exception sections. For each defined illegal condition, check whether the
coverage model has an explicit coverpoint for it and whether the corresponding
DUT response is also covered. If the spec says the DUT must assert an error
flag within two cycles of receiving an illegal burst length, your model should
cover: illegal burst length generated, error flag asserted, and the latency
between them.
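Continuing that hypothetical two-cycle requirement, the latency itself can be a coverpoint. This is a sketch: `err_latency_cycles` and `err_seen` are assumed to be computed by the monitor as the cycle count from illegal request to error-flag assertion.

```systemverilog
covergroup cg_err_latency;
  cp_latency: coverpoint err_latency_cycles iff (err_seen) {
    bins same_cycle = {0};
    bins one_cycle  = {1};
    bins two_cycles = {2};
    // An illegal_bins hit fails the run, so this doubles as a check
    // on the spec's two-cycle requirement, not just a coverage bin.
    illegal_bins too_late = {[3:$]};
  }
endgroup
```
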
5. Does the model reflect the current spec revision?
Compare the covergroup list
against the spec’s table of contents. For each major feature or operating mode
in the spec, there should be corresponding coverage. Features without coverage
are features without verification evidence.
Sign-off criteria worth defending
Sign-off is not a number. It’s
a set of claims about what has been verified, backed by evidence. Here are the
sign-off criteria I’m willing to put my name on:
1. Functional coverage ≥ 98% on a model that has been reviewed against the current spec revision, with documented rationale for each covergroup and explicit waiver documentation for any bins below 100%.
2. All FSM transitions exercised, verified by transition coverage — not state coverage alone.
3. All error and illegal-input paths exercised, with scoreboard confirmation that the DUT’s response matches the spec’s defined behaviour.
4. Code coverage ≥ 90% line and branch, with reviewed waivers for unreachable code (dead code identified in RTL review, not assumed by the verifier).
5. Zero unresolved UVM_ERROR or UVM_FATAL messages in the regression. Not “none that matter” — zero. Known-noisy errors should be promoted to waivers with explicit justification.
6. Coverage model change log reviewed and current. Every spec change since the original coverage model was written is either reflected in the model or documented as out-of-scope with justification.
Notice what’s not on this list:
“100% functional coverage.” The number matters, but only in the context of
everything else on the list. A project that hits 100% on a weak model and fails
items 2 through 6 is not ready for tape-out. A project that is at 97% on a
rigorous, spec-reviewed model with all FSM transitions covered and all error
paths exercised is probably in better shape than most.
Sign-off is a claim, not a number. The number is evidence for
the claim. Make sure you know what claim you’re making before you make it.
The conversation with your manager
At some point in every project,
there is a conversation that goes like this: the manager wants to know if
coverage is done. The number is at 100%. The honest answer is “the number is at
100%, but here’s what I’m still uncertain about.”
This is a hard conversation to
have, because managers are trained to treat 100% as a final answer. The
dashboard is green. Why are you still worried?
The way I’ve learned to frame
it: “The coverage model is complete. I’m not fully confident the model captured
everything that matters. Here are the three scenarios I want to add targeted
tests for before I’m comfortable signing off.”
This reframes the conversation
from “we’re done but you’re still worried” to “here is a specific, bounded
amount of work remaining.” It keeps the risk visible and gives the manager
something concrete to evaluate: is this additional week of testing worth the
schedule cost?
Sometimes the answer is no.
Sometimes the tape-out date is fixed and the risk is accepted. That is a
legitimate decision, made by the right people with full information. What is
not legitimate is treating 100% on a coverage dashboard as evidence that the
decision doesn’t need to be made.
A practical starting point
If you want to audit your own
coverage model today, start here. For each covergroup in your model, answer
four questions in writing:
• What spec section or DUT behavior does this covergroup verify?
• Which bins in this covergroup require targeted stimulus to close — and have I written that stimulus?
• Does this covergroup have cross coverage for every field interaction that triggers distinct DUT behavior?
• Has this covergroup been reviewed since the last spec revision?
If you can’t answer all four questions for a covergroup, it needs work before
you use it as sign-off evidence.
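One lightweight way to keep those written answers next to the code is a standard comment header on every covergroup. This is a suggested convention, not a tool feature; the entries shown echo the wrap example from earlier in the article:

```systemverilog
// AUDIT: cg_write_wrap_boundary
//   Spec ref    : 6.4.2 (4KB boundary wrap)
//   Hard bins   : cx_wrap -- closed by a directed sequence (written: yes)
//   Crosses     : addr x len x type, wrap combination only
//   Last review : current spec revision
covergroup cg_write_wrap_boundary;
  ...
endgroup
```

A header like this turns the audit from an archaeology exercise into a grep.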
Coverage closure is not done
when the number hits 100%. It’s done when the model is correct, the bins are
meaningful, the hard scenarios have been explicitly exercised, and the sign-off
criteria reflect what was actually verified — not just what was measured.
The number is a starting point
for that conversation. Not the end of it.
Next in this series: Article 04 — How I debug a failing regression at 2am: my actual process