I get pitched AI startups almost every week. Some founders come with code, clear metrics and a prototype you can try in 10 minutes. Others arrive with buzzwords, an impressive slide deck, and a demo that somehow never quite answers the simple question: what exactly does the product do, and how will it make money?

Over the years I’ve learned to spot a few consistent warning signs – red flags that investors and partners too often overlook when the word “AI” is involved. These aren’t about hyperbole or marketing polish; they’re about substance. If you’re an investor, operator, or simply someone trying to separate genuine innovation from often-costly hype, here are the three red flags I see most frequently, and the practical checks I use to test whether an AI pitch is real.

Red flag: vague problem definition wrapped in technical jargon

I can forgive a founder who oversells their vision. I can't forgive one who can’t clearly explain the problem they're solving in plain English. AI is being framed as a solution to everything, and the result is products that solve nothing well.

What to listen for:

  • The founder who spends the first five minutes describing model architectures (transformers, diffusion, retrieval-augmented generation) without a concise explanation of customer pain.
  • Products that list use cases as "content generation," "insights," and "automation" without specifying which customer, which workflow, and what outcome improves.

My practical checks:

  • Ask for one customer story. If they can’t describe a real customer, the workflow, and the measurable outcome (time saved, conversion uplift, cost reduction), that’s a problem.
  • Request a walkthrough of the product solving that single use case from start to finish. If the demo is conceptual or uses synthetic data rather than real examples, probe harder.

When founders can’t communicate the problem in plain language, they often rely on the glamour of AI to mask weak product-market fit. That’s where deals go south.

Red flag: proprietary claims without reproducible evidence

“Proprietary algorithm,” “patent pending,” and “we beat GPT-4 on XYZ benchmark” are phrases I hear a lot. Sometimes they’re true. Often, they’re not. The dangerous middle ground is when founders make technical claims that sound plausible but offer no way to verify them.

What to look for:

  • Bold performance claims without shared benchmarks, code, or reproducible experiments.
  • “We trained on proprietary data” used as both a moat and a reason not to show any evaluation artifacts.

My practical checks:

  • Ask for evaluation data and metrics in a reproducible form. If the model reportedly reduces error by 50%, request the test set or at least an anonymized sample and the evaluation script. Insist on clarity about dataset size, distribution and timeframe.
  • Request a short technical due-diligence call with an engineer who can run targeted queries or tests. For example: ask a text model for specific, verifiable facts that would reveal hallucination tendencies, or score its predictions against a labeled sample to check precision and recall (a minimal version of that check is sketched after this list).
  • Check for independent corroboration. Are there third-party benchmarks, customer references willing to share measured impact, or GitHub commits demonstrating active development?

Proprietary claims aren’t inherently suspicious, but they must be testable. If founders can’t or won’t let you verify, treat it as a major red flag.
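
To make “testable” concrete, here is a minimal sketch of the kind of evaluation artifact I ask founders to share or rerun with me. Everything in it is a stand-in: the labels, the ten-example sample, and the positive class are hypothetical, and a real check would run over the startup’s actual test set rather than hard-coded examples.

```python
# Minimal evaluation sketch: given ground-truth labels and model predictions
# for a shared (or anonymized) test sample, report precision, recall, and
# error rate. The labels below are hypothetical stand-ins; the point is that
# anyone in the room can rerun the same numbers on the same data.

def evaluate(y_true: list[str], y_pred: list[str], positive: str) -> dict:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return {
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "error_rate": errors / len(y_true),
    }

if __name__ == "__main__":
    # Hypothetical anonymized sample shared during diligence.
    truth       = ["fraud", "ok", "ok", "fraud", "ok", "fraud", "ok", "ok", "fraud", "ok"]
    predictions = ["fraud", "ok", "fraud", "fraud", "ok", "ok", "ok", "ok", "fraud", "ok"]
    print(evaluate(truth, predictions, positive="fraud"))
```

If a founder can hand over something this simple, plus clarity on how the test set was drawn, most “we reduce error by 50%” claims become easy to confirm or dismiss.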

Red flag: dependence on general-purpose models with no clear differentiation

Building on top of OpenAI, Anthropic, or Hugging Face models is sensible – those APIs are powerful and accelerate development. The problem arises when a startup’s “secret sauce” is merely how it strings together prompts or the UX layer, without meaningful differentiation or defensibility.

Things I see go wrong:

  • Startups that scale by piggybacking on a third-party LLM and have minimal proprietary data, fine-tuning, or unique workflows.
  • Businesses that ignore cost-model risks: if your product requires many API calls to an LLM and you can’t control that cost or pass it on to customers, unit economics break fast.

My practical checks:

  • Ask about data: Do they own or control unique data? Is there a plan to collect and label data to fine-tune models or build custom embeddings? Are there contractual guarantees about data access and usage?
  • Study the cost model. Request a sensitivity analysis showing margin compression at higher usage (a simple version of that arithmetic is sketched after this list). What happens if the upstream model’s pricing doubles, or if the provider throttles access?
  • Probe defensibility. Is the differentiation a UX pattern competitors can copy in weeks, or does it require years of domain data, regulatory approvals or integrations with customer systems?
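
As a rough illustration of the sensitivity analysis I ask for, here is a back-of-the-envelope sketch. Every number in it is an assumption invented for this example; the point is the shape of the margin curve, and a real analysis would use the startup’s own pricing, call volumes, and upstream rates.

```python
# Back-of-the-envelope margin sensitivity sketch. All figures below are
# made-up assumptions for illustration; substitute the startup's real
# pricing, usage, and upstream API costs.

def gross_margin(price_per_seat: float, calls_per_seat: float, cost_per_call: float) -> float:
    """Monthly gross margin per seat as a fraction of revenue."""
    api_cost = calls_per_seat * cost_per_call
    return (price_per_seat - api_cost) / price_per_seat

PRICE_PER_SEAT = 30.00   # monthly subscription price (assumed)
CALLS_PER_SEAT = 2_000   # LLM API calls per seat per month (assumed)
COST_PER_CALL = 0.004    # blended upstream cost per call today (assumed)

# What happens if upstream pricing doubles, then doubles again?
for multiplier in (1.0, 2.0, 4.0):
    margin = gross_margin(PRICE_PER_SEAT, CALLS_PER_SEAT, COST_PER_CALL * multiplier)
    print(f"upstream price x{multiplier:.0f}: gross margin {margin:.0%}")
```

Under these invented numbers, a doubling of upstream pricing takes gross margin from roughly 73% to 47%, and a second doubling pushes it negative. If a founder hasn’t run this arithmetic themselves, that tells you something.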

How I verify quickly: the practical checklist I use during diligence

Check | Why it matters | Quick pass/fail test
----- | -------------- | --------------------
Customer story + measurable outcome | Confirms product-market fit and real impact | Founder gives one concrete customer example and metric
Reproducible evaluation | Verifies technical claims and prevents hype | Founder shares test data or allows an engineer-led test
Data ownership and cost model | Determines defensibility and profitability | Founder outlines the data pipeline and margins under stress

Beyond that checklist, I always seek one of three validating signs: a real paying customer, code I can review (even a small repo), or a technical founder who can answer detailed questions about failure modes and mitigation. If none of those are present, I become more cautious.

Red flags investors often miss

To close, here are the practical behaviors I see investors overlook time and again:

  • Focusing only on market size and not on the specific workflow the product disrupts. Big markets are great, but they don’t matter without a repeatable way to capture a meaningful share.
  • Accepting slide-deck KPIs (DAUs, “engagement”) without auditability. Numbers matter, but so does the story of how those numbers were generated and measured.
  • Underestimating operational risk from third-party providers. Relying on a single API provider for core functionality without contract protections or fallback plans is a structural risk too many ignore.

AI investing isn’t about being the most optimistic person in the room. It’s about asking disciplined, technical, and customer-focused questions. If you take away one practical habit from this piece, let it be this: demand testability. Ask for something you can try, measure or reproduce in a short period. If the founder pushes back, ask yourself why.