Most agencies are running ICP scoring through an LLM right now and here is what’s happening:
- The prompts look fine.
- The output looks confident.
- The potential pipeline looks healthy.
- CLOSE RATE SUCKS
If you ask your LLM of choice to score prospects against your ICP and it has no rubric, no tier thresholds, and no calibration, you aren't really scoring prospects or building a target list. You are building a model where everybody in a vertical sort of fits. That means your team is going to fill up your pipeline with prospects that don't actually match who you can serve best.
Why a Generic AI-Created ICP Scoring Prompt Fails
Ask any LLM to "score this prospect against our ICP" without giving it a rubric and you get something like this every time:
- A BS paragraph giving you surface-level information about the target, telling you how smart you are and how lucky they are that you are going to reach out.
- A score in the good-to-great range (without any distinction between good and great).
- 3 reasons why they fit that use words like “drift”, “clarity”, “compound”, and other tell-tale signs that there's no true thinking behind the score.
The output reads with authority. The reasoning - uh - well, maybe less authoritative.
There are 3 big reasons that AI-powered ICP scoring prompts produce scores that are somewhere between meaningless and useless without some serious hand-holding:
- No rubric means no consistency: Two prospects that look similar get different scores. The same prospect scored on Tuesday and Friday gets different scores. The model is averaging vibes, not measuring against a standard.
- No tier thresholds means no decision rule: If a "7" means pursue, an "8" also means pursue, and a "6" sometimes means pursue, you don't have a scoring system. You've got somebody giving you vague guidance, not helping you focus your team's attention on the best possible fits.
- No calibration means no accuracy: A basic LLM prompt with information about the prospect and one line about you has no way to receive feedback. No matter how smart you are, the first version of your rubric will be wrong - it'll be overweighted in some areas and underweighted in others. You have to build a feedback mechanism that checks the scores against your actual sales results. When you build that, you get a model that you can trust.
Tim's Take: An LLM with a generic prompt will give you good scores for average to bad prospects and also give you good scores for the prospects who should be demanding most of your attention. You need to build a system that gets better over time, one you can refine, constrain, and expand as your business evolves.
The Structure of a Reliable Scoring Prompt
Every ICP scoring prompt worth its salt has five parts - if you skip any, the output creates scores that aren't worth paying attention to:
- Role: The model needs to know what kind of analyst it's playing. "You are a revenue operations leader at a B2B agency" is more useful than the default "You are a helpful assistant." The role anchors tone, gives some directional perspective, and helps the model have a little bit of skepticism.
- Rubric: A list of weighted criteria with explicit scoring guidance for each. Industry fit, size, growth trajectory, buying readiness, and delivery fit are the five that matter for most agencies. Each criterion needs a 0 to 20 (or 0 to 10, or whatever) scale and a description of what each level looks like.
- Tier thresholds: What does a 90-100 mean? What does a 60-74 mean? Tier thresholds turn a number into a decision. Without them, every prospect ends up in the gray zone and the team keeps coming back to ask "should we pursue this one?"
- Output format: Structured output - always. Have it spit out JSON if the scoring is going to be read by another AI or tool; otherwise, use markdown for people. You have to give your LLM a really great example of what you want. Otherwise, it's going to freestyle and ramble.
- Agency context: In order to generate quality output, your LLM has got to have some information about you: the size of your team, great clients you've had in the past, things that you do well, things that you do poorly, and what kind of impact you’ve had on clients.
That's the shape. Below is the prompt.
The ICP Scoring Prompt (Drop-In)
This is the prompt we use. It works with Claude and ChatGPT. We haven't tested it on Grok or any open source models, but I'm gonna guess that it works pretty well there, too.
You are a revenue operations leader at a B2B agency. Your job is to score
prospects against our ideal customer profile (ICP) on a 0 to 100 scale.
Use the agency context and ICP definition below to inform every score,
especially the delivery fit score.
# About our agency
- Agency name: [insert]
- Years in business: [insert, e.g., 8 years]
- Team size: [insert, e.g., 22 full-time]
- What we do (one paragraph): [insert what services we deliver, what
problems we solve, and who we typically solve them for]
- Best work we've done (top 3 case studies with outcomes): [insert]
Example: "1) Acme DTC: doubled paid social ROAS in 6 months while
scaling spend 4x. 2) Globex SaaS: rebuilt their lifecycle email and
added $1.2M ARR from existing customers in 12 months. 3) ..."
- Where we excel (specific strengths): [insert, e.g., paid social for
DTC at scale, lifecycle email for B2B SaaS, creative testing infra]
- Where we don't fit (honest limits): [insert, e.g., we don't do brand
strategy work, we can't handle pure-play SEO engagements, we don't
fit clients under $2M in revenue]
- Typical engagement size: [insert, e.g., $15K-$60K per month retainer]
- Typical engagement length: [insert, e.g., 12-month minimum, 24-month
average]
- Measurable impact we've had: [insert client outcomes in concrete
numbers, e.g., "Across the last 30 retained clients we've averaged
35% YoY revenue lift, 22% margin expansion, and 18-month retention"]
# Our ICP definition
- Industry: [insert your target verticals, e.g., DTC e-commerce $5M-$50M revenue]
- Company size: [insert range, e.g., 25-200 employees]
- Growth stage: [insert, e.g., post product-market fit, scaling]
- Buying signals: [insert, e.g., new CMO in last 12 months, recent funding,
job posts for growth roles]
- Disqualifiers: [insert, e.g., in-house performance team of 5+, agency
carousel pattern, sub-$2M revenue]
# Scoring rubric (weighted criteria)
Score each criterion on the 0-20 scale described, then sum the five scores
for the total ICP score (0-100). Use the "About our agency" section above
to inform delivery fit specifically: a prospect's problem only matches
our strongest case study if it actually looks like the work we've
described above.
## 1. Industry fit (0-20)
- 20: exact target vertical, ideal sub-segment
- 15: target vertical, adjacent sub-segment
- 10: adjacent vertical with transferable patterns
- 5: outside our verticals but has the right shape
- 0: wrong vertical, no transferable patterns
## 2. Company size (0-20)
- 20: in our revenue sweet spot, optimal team size for our delivery model
- 15: at the edges of our sweet spot
- 10: smaller or larger than ideal but workable
- 5: significantly outside ideal size
- 0: too small to afford us or too large to be our buyer
## 3. Growth trajectory (0-20)
- 20: clearly scaling, well-funded, accelerating
- 15: growing steadily, healthy fundamentals
- 10: flat or modest growth
- 5: declining or restructuring
- 0: in distress, layoffs, bankruptcy risk
## 4. Buying readiness (0-20)
- 20: active mandate, new leader, recent budget event
- 15: clear intent signals, exploring options
- 10: in problem-aware state, no urgency
- 5: not currently buying, long-term watch
- 0: no signal, no urgency, no budget visibility
## 5. Delivery fit (0-20)
Score this against the "About our agency" section above, not against a
generic agency. A 20 means we've literally done this work before with a
proven outcome. A 0 means it lives in our "Where we don't fit" list.
- 20: their problem matches one of our top case studies and our core
offer; we have proven we can deliver this exact outcome
- 15: their problem maps to our offer with minor adjustment; we've done
similar work, just not identical
- 10: their problem requires us to stretch beyond our typical delivery;
we could do it but it would not be our best work
- 5: their problem is outside our stated core competence; either skip
or refer out
- 0: their problem matches our "Where we don't fit" list explicitly
# Tier thresholds
- 90-100: A-tier, pursue aggressively, founder-level outreach
- 75-89: B-tier, pursue with senior-rep outreach, prioritize in pipeline
- 60-74: C-tier, qualify further before investing time, light-touch nurture
- 40-59: D-tier, deprioritize, monitor for changes
- 0-39: F-tier, do not pursue, document why for future reference
# Output format
Return a JSON object with this exact shape:
{
"company_name": "<string>",
"scores": {
"industry_fit": <0-20>,
"company_size": <0-20>,
"growth_trajectory": <0-20>,
"buying_readiness": <0-20>,
"delivery_fit": <0-20>
},
"total_score": <0-100>,
"tier": "A|B|C|D|F",
"reasoning": "<2-3 sentence justification for the score, citing
specific evidence and explicitly tying delivery fit to which of our
case studies the prospect's problem resembles>",
"missing_information": [
"<list any criterion where you had to guess due to lack of data>"
],
"recommended_action": "<one sentence with the next move>"
}
# Prospect to score
[Paste the prospect's company name, website, recent news, LinkedIn, and
any signals you've gathered.]
You will replace the bracketed sections in "About our agency" and "Our ICP definition" with your own information.
The rubric structure stays the same. The tier thresholds stay the same. The output shape stays the same.
Alternative: Load agency context from a document. If you're using a tool that supports knowledge files (Claude Projects, custom GPTs in ChatGPT, Sanity Agent, or any orchestration tool with file access), you can replace the inline "About our agency" section with a single instruction: "Before scoring, read the agency context document at [path or URL]. Use that document to inform every score, especially delivery fit."
The document version is easier to maintain over time, because you update one source instead of every saved prompt. You still replace the bracketed sections in "Our ICP definition" with your own.
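Because the prompt asks for JSON, it can help to check each response before trusting it. The following is a minimal sketch, not part of the prompt itself: it assumes the response arrives as a raw JSON string, and the function names `tier_for` and `validate_score` are made up for illustration. It checks that every criterion is an integer from 0 to 20, that `total_score` equals the sum, and that `tier` matches the thresholds in the prompt.

```python
import json

CRITERIA = ("industry_fit", "company_size", "growth_trajectory",
            "buying_readiness", "delivery_fit")

# Tier thresholds copied from the prompt: (minimum total score, tier letter).
TIERS = ((90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F"))


def tier_for(total):
    """Map a 0-100 total to its tier letter (hypothetical helper)."""
    for minimum, letter in TIERS:
        if total >= minimum:
            return letter
    raise ValueError(f"total out of range: {total}")


def validate_score(raw_json):
    """Return a list of problems in one scoring response (empty list = OK)."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict) or not isinstance(data.get("scores"), dict):
        return ["response is missing the 'scores' object"]

    scores = data["scores"]
    problems = []
    for name in CRITERIA:
        value = scores.get(name)
        # type(...) is int also rejects booleans and floats.
        if type(value) is not int or not 0 <= value <= 20:
            problems.append(f"{name} missing or outside 0-20: {value!r}")
    if problems:
        return problems

    total = sum(scores[name] for name in CRITERIA)
    if data.get("total_score") != total:
        problems.append(f"total_score {data.get('total_score')!r} != sum {total}")
    if data.get("tier") != tier_for(total):
        problems.append(f"tier {data.get('tier')!r} should be {tier_for(total)}")
    return problems
```

A response that fails these checks is a candidate for a re-run rather than a place in the pipeline.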
Variations: Persona-Level Scoring & Account-Level Scoring
The prompt above is account-level scoring. You're scoring the company. If you sell into named buyer roles, you also need persona-level scoring, which is a separate pass.
For persona scoring, swap the rubric for:
- Role fit (0-25): is this the actual buyer, an influencer, a blocker, or an end user?
- Influence (0-25): can this person move budget or sign? Or do they need to escalate?
- Pain alignment (0-25): does this person feel the pain your offer solves, or are they two layers removed from it?
- Engagement signals (0-25): have they engaged with your content, your category, or competitors? Or are they completely cold?
Tier thresholds for persona scoring run the same shape: A (90-100), B (75-89), C (60-74), D (40-59), F (0-39).
A single account can have an A-tier persona and a C-tier account score. That happens when the right person works at the wrong company. Document both scores and pursue the persona if you have a long enough sales cycle and a smart enough nurture motion.
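To make the two-score idea concrete, here is a small sketch that totals a persona rubric and sets it next to an account score. The key names (`role_fit`, `influence`, and so on) and the `right_person_wrong_company` flag are assumptions chosen for illustration; only the four criteria, the 0-25 scales, and the tier cutoffs come from the text above.

```python
# Hypothetical key names for the four persona criteria (0-25 each).
PERSONA_CRITERIA = ("role_fit", "influence", "pain_alignment", "engagement_signals")

# Same tier cutoffs as account-level scoring.
TIERS = ((90, "A"), (75, "B"), (60, "C"), (40, "D"), (0, "F"))
TIER_ORDER = "ABCDF"  # best to worst


def tier_for(total):
    """Map a 0-100 total to its tier letter."""
    if not 0 <= total <= 100:
        raise ValueError(f"total out of range: {total}")
    for minimum, letter in TIERS:
        if total >= minimum:
            return letter


def persona_total(scores):
    """Sum the four persona criteria after checking each is within 0-25."""
    for name in PERSONA_CRITERIA:
        if not 0 <= scores[name] <= 25:
            raise ValueError(f"{name} outside 0-25: {scores[name]}")
    return sum(scores[name] for name in PERSONA_CRITERIA)


def compare(account_total, persona_scores):
    """Report both tiers and flag a strong persona at a weaker account."""
    p_total = persona_total(persona_scores)
    account_tier = tier_for(account_total)
    persona_tier = tier_for(p_total)
    return {
        "account_tier": account_tier,
        "persona_total": p_total,
        "persona_tier": persona_tier,
        "right_person_wrong_company":
            TIER_ORDER.index(persona_tier) < TIER_ORDER.index(account_tier),
    }
```

Keeping both tiers in the same record is one way to document the mismatch instead of letting one score overwrite the other.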
How to Calibrate the Prompt Against Your Actual Pipeline
The prompt will be wrong on the first run (and probably the 2nd and 3rd, too, so just put in the work). Calibration is the work that makes the scores trustworthy.
Run the prompt against your last 10-30 closed-won deals in the same ICP (or as many as you have). If the average score for your won deals is 65, your tier thresholds are off by about 20 points and you need to recalibrate down. If most won deals are scoring in the 75-89 B-tier and almost none are scoring in the 90-100 A-tier, your rubric is too harsh.
Then run it against your last 10-30 closed-lost or gone-dark deals in the same ICP. You might find an attribute that separates the deals you won from the ones you lost. If you do, you can fold it into the rubric and improve your model right off the bat.
This is not going to take you very long, and it pays dividends forever.
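One way to run that comparison is sketched below. It assumes you have collected the `scores` object from each JSON response into two lists, one for won deals and one for lost deals. The function names are hypothetical, and the output is a starting point for a conversation about the rubric, not a statistical test.

```python
from statistics import mean

CRITERIA = ("industry_fit", "company_size", "growth_trajectory",
            "buying_readiness", "delivery_fit")


def average_total(deals):
    """Average 0-100 total across a list of per-deal `scores` dicts."""
    return mean(sum(deal[name] for name in CRITERIA) for deal in deals)


def criterion_gaps(won, lost):
    """Mean won score minus mean lost score per criterion, largest gap first.

    A large positive gap suggests the criterion separates wins from losses.
    A gap near zero suggests it is not doing much filtering.
    """
    gaps = {
        name: mean(d[name] for d in won) - mean(d[name] for d in lost)
        for name in CRITERIA
    }
    return dict(sorted(gaps.items(), key=lambda item: item[1], reverse=True))
```

If `average_total` for your won deals lands well below the B-tier cutoff, that points at the thresholds. If one criterion shows almost no gap between won and lost, that points at the rubric.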
When the Prompt is the Limit and When the Rubric is the Limit
Prompt failure looks like wildly different scores for the same prospect on different runs, or missing fields in the JSON output, or the model refusing to score because of "insufficient information." Prompt failures are fixable with better instruction, better examples, and constraining the output format.
Rubric failure looks like the prompt running smoothly, producing consistent scores, and the scores still not filtering out bad fits. The numbers are reliable. They're just not measuring the right things. Rubric failures require revising the criteria, often by interviewing your top sales rep and asking what they actually look for in a prospect that doesn't show up on a website.
Most agencies blame the prompt when the rubric is the problem. The prompt is the easy fix. The rubric requires honest internal work.
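A quick way to tell the two failures apart is to score the same prospect several times and look at how far each criterion moves. The sketch below assumes you have the `scores` object from each repeated run; the function names and the 5-point tolerance are assumptions, not numbers from this article.

```python
CRITERIA = ("industry_fit", "company_size", "growth_trajectory",
            "buying_readiness", "delivery_fit")


def criterion_spread(runs):
    """Max minus min score per criterion across repeated runs of one prospect."""
    return {
        name: max(run[name] for run in runs) - min(run[name] for run in runs)
        for name in CRITERIA
    }


def unstable_criteria(runs, max_spread=5):
    """Criteria whose score moved more than `max_spread` points between runs.

    max_spread=5 is an illustrative tolerance (one rubric step), not a
    recommendation from the source.
    """
    return [name for name, spread in criterion_spread(runs).items()
            if spread > max_spread]
```

Criteria that show up here usually need tighter level descriptions or an example in the prompt. If nothing shows up and the scores still fail to filter bad fits, the rubric is the more likely culprit.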
What to Do Next
If you're running ICP scoring through an LLM today, take this prompt, plug in your agency context and ICP definition, and run it against your last 30 deals. Compare the tiers it assigns to your historical close rate by tier. You'll either confirm your scoring system is calibrated or discover where it's wrong.
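The close-rate-by-tier comparison can be as simple as the sketch below. It assumes each past deal has been reduced to a pair of the tier the prompt assigned and whether the deal closed; the function name and the data shape are hypothetical.

```python
from collections import defaultdict


def close_rate_by_tier(deals):
    """deals: iterable of (tier, won) pairs, e.g. ("B", True).

    Returns {tier: {"won": int, "total": int, "close_rate": float}},
    ordered by tier letter.
    """
    counts = defaultdict(lambda: [0, 0])  # tier -> [won, total]
    for tier, won in deals:
        counts[tier][1] += 1
        if won:
            counts[tier][0] += 1
    return {
        tier: {"won": won, "total": total, "close_rate": won / total}
        for tier, (won, total) in sorted(counts.items())
    }
```

In a calibrated system the close rate should fall as you move from A toward F. If B-tier deals close as often as A-tier deals, or C-tier deals close more often than B-tier ones, that is the place to start revising.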
If you're not running ICP scoring through an LLM yet, this is the place to start. The setup is easy and the payoff is a sales team that stops chasing the wrong-fit accounts.