Sourcing, Classification algorithms, Data products, Vertical software

Sourcing at Scale: From Classification Algorithms to a 30M+ Company Universe

How proprietary classification algorithms and a multi-dimensional search engine solved the problem that every data product on the market failed to address — finding the companies that everyone else misses.

THE LANDSCAPE

Sourcing methods and their trade-offs

Outbound sourcing is near (if perhaps not so very dear) to the heart of every investor. All of the methods below have their merits, but we noticed that once a market became too competitive, or an investor wanted to scale beyond several deals a year, most of them no longer offered a cost-effective way to support that growth.

Method	Volume	Success Rate	Cost	Time	Competition	Scalability
Personal Connections	Low	High	Low – High	High	Low	Low
Broker Relationships	Low	Medium	High	Low	High	Medium
Events or Conferences	Low	Low	High	High	Medium	Low
Social Media	Low	Low – Medium	Low	High	Low	Medium
Deal Marketplace	Medium	Medium	High	Low	High	Low
Data Products	High	Low	Medium	Medium	High	Medium
Manual Search	Medium	Low	Low	High	High	Low
Custom Lead Engine	High	Medium	Medium	Low	Low	High

THE PROBLEM

Why data-driven sourcing fails

The math looks great

Data product per year: $20K

Offshore resource per year: $10K–15K

Leads per resource (optimistic): 10K/year

With a 0.01% lead-to-close rate: 1 deal/year at $35K per deal ($20K data product plus one resource at $15K). 5 resources = 5 deals at $19K each ($95K total).
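The arithmetic above can be checked with a short sketch. It assumes a single shared data-product license and the upper end of the offshore cost range ($15K); the function name and its parameters are illustrative only.

```python
def cost_per_deal(resources, data_product_cost=20_000, resource_cost=15_000,
                  leads_per_resource=10_000, close_rate=0.0001):
    """Return (expected deals per year, cost per deal) for a sourcing team.

    Assumptions (not stated in the figures themselves): one shared
    data-product license, and the upper end of the $10K-15K resource cost.
    """
    total_cost = data_product_cost + resources * resource_cost
    deals = resources * leads_per_resource * close_rate  # 0.01% lead-to-close
    return deals, total_cost / deals


for n in (1, 5):
    deals, per_deal = cost_per_deal(n)
    print(f"{n} resource(s): {deals:.0f} deal(s)/year at ${per_deal:,.0f} per deal")
```

With these inputs the sketch reproduces the $35K and $19K figures. Passing `resource_cost=10_000` gives the lower end of the range: $30K per deal with one resource and $14K with five.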

The reality check

• All your competitors will buy the same data products

• Everyone will find the easiest 50% of companies by using industry filters, simple keyword search, or AI recommendations

• Everyone contacts the same companies and increases the probability of a brokered process

• Low-cost resources mean you fill your database with junk

• Anything proprietary you find, you expose to competitors through CRM sync/integration

That doesn't mean data-driven sourcing can't be successful — it just needs to be done in a thoughtful, proprietary, and custom way.

THE DATA PROBLEM

Why existing data products fall short

As we explored various data products, we encountered the same issues with every product on the market.

Poor classification

Most data products do not differentiate between vertical software and regular software. Of companies classified as 'Software' or 'IT', only 6–7% can be considered vertical software.

Poor global coverage

Coverage was especially poor for companies based outside North America, which we attributed to the language barrier.

Limited market research

Many products only offer limited exports. Poor industry classification exacerbates the issue — you can't pull 'Healthcare Software in Canada' without cross-referencing multiple filters.

All in all, this was not a good experience, a sentiment shared by essentially every M&A employee who is on the ground and dealing with this exercise on a daily basis.

THE SOLUTION

Proprietary classification at scale

Every company intelligence product on the market tends to be built on some combination of three major data sources: LinkedIn company profiles, government company registries, and company website data.

We leveraged all of these sources (and many others), multiple sourcing methodologies, and a set of proprietary classification algorithms to extract relevant companies from a universe of tens of millions.

Multi-dimensional search

A multi-dimensional search algorithm with weighted scoring across keywords, descriptions, and full-text content. IDF normalization, phrase specificity bonuses, and keyword expansion.
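As a rough illustration of how these pieces could fit together, the sketch below scores companies across three text fields with field weights, a smoothed IDF, a bonus for exact multi-word phrase matches, and a small synonym map for keyword expansion. It is not the platform's algorithm: the weights, the bonus value, the synonym map, and the sample companies are all assumptions made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical settings: curated keywords count most, full text least.
FIELD_WEIGHTS = {"keywords": 3.0, "description": 2.0, "full_text": 1.0}
PHRASE_BONUS = 1.5                                   # illustrative multiplier
EXPANSIONS = {"dental": ["dentist", "orthodontic"]}  # illustrative synonym map


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(companies):
    """Smoothed inverse document frequency, one 'document' per company."""
    doc_freq = Counter()
    for company in companies:
        seen = set()
        for field in FIELD_WEIGHTS:
            seen.update(tokenize(company.get(field, "")))
        doc_freq.update(seen)
    n = len(companies)
    return {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def expand(phrases):
    """Append synonyms of each query term as extra single-word queries."""
    expanded = list(phrases)
    for phrase in phrases:
        for term in tokenize(phrase):
            expanded.extend(EXPANSIONS.get(term, []))
    return expanded


def score(company, phrases, idf):
    """Sum field-weighted, IDF-scaled term matches, boosting exact phrases."""
    total = 0.0
    for phrase in phrases:
        terms = tokenize(phrase)
        for field, weight in FIELD_WEIGHTS.items():
            tokens = tokenize(company.get(field, ""))
            base = sum(idf.get(t, 0.0) for t in set(terms) if t in tokens)
            # Pad with spaces so only whole-token, contiguous phrases match.
            is_phrase = len(terms) > 1 and f" {' '.join(terms)} " in f" {' '.join(tokens)} "
            total += weight * base * (PHRASE_BONUS if is_phrase else 1.0)
    return total


def rank(companies, phrases, idf):
    queries = expand(phrases)
    scored = [(c["name"], score(c, queries, idf)) for c in companies]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


companies = [
    {"name": "Company A", "keywords": "dental practice management",
     "description": "Scheduling software for dental clinics",
     "full_text": "Online booking and billing for every dentist and hygienist"},
    {"name": "Company B", "keywords": "project management",
     "description": "Generic project management software",
     "full_text": "Used by agencies and dental offices"},
    {"name": "Company C", "keywords": "accounting",
     "description": "Bookkeeping software for small business",
     "full_text": "Invoices, payroll, and tax filing"},
]

idf = build_idf(companies)
for name, value in rank(companies, ["dental practice management"], idf):
    print(f"{name}: {value:.2f}")
```

In this toy data, Company A ranks first because the full query phrase appears in its highest-weighted field, while Company B only picks up scattered single-term matches and Company C matches nothing.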

Proprietary classification

Classification algorithms that go beyond industry codes — parsing product descriptions, feature sets, and company positioning to identify companies that generic filters miss.
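To make the idea concrete, here is a deliberately simple, rule-based sketch that ignores the industry code and looks only at a company's own text. The real classification algorithms are proprietary and not described here; the term lists, field names, and sample records below are hypothetical.

```python
import re

# Hypothetical vocabularies: terms that signal software built for one industry.
VERTICAL_TERMS = {
    "dental": {"dental", "dentist", "orthodontic"},
    "construction": {"contractor", "jobsite", "subcontractor"},
    "legal": {"law", "attorney", "paralegal"},
}
SOFTWARE_TERMS = {"software", "platform", "saas", "app"}
TEXT_FIELDS = ("description", "features", "positioning")  # assumed field names


def classify(company):
    """Return (is_vertical_software, vertical or None), ignoring industry_code."""
    text = " ".join(company.get(field, "") for field in TEXT_FIELDS)
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    if not tokens & SOFTWARE_TERMS:
        return False, None  # no sign that the company sells software
    hits = {name: len(tokens & terms) for name, terms in VERTICAL_TERMS.items()}
    best = max(hits, key=hits.get)
    return (True, best) if hits[best] > 0 else (False, None)


# Both records carry the same generic industry code.
vertical = {"industry_code": "Software",
            "description": "Cloud platform for dental clinics",
            "features": "Charting, recall reminders, orthodontic treatment plans"}
horizontal = {"industry_code": "Software",
              "description": "Project management software for any team"}

print(classify(vertical))
print(classify(horizontal))
```

The two sample records share the industry code "Software", yet only the first is flagged as vertical software, which is the distinction a filter on industry codes alone cannot make.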

Generative AI scoring

AI-powered re-ranking and validation against your specific investment mandate, considering any factor you define.

Analyst validation

Every company validated by an analyst — resulting in higher relevance than any data product on the market.

The platform does not discriminate by geography or language — almost 40% of companies are sourced from Europe, with an emerging presence in Latin America.

THE PROOF

From Kiosk to a general-purpose sourcing engine

The methodology was first proven with Kiosk — a vertical market software data platform built for investors. Notoriously tedious to identify and often misclassified by traditional data products, vertical software companies had eluded all but the most resourceful investors.

Kiosk extracted over 85,000 vertical software companies from a universe of 13M companies using proprietary classification. By 2025, the platform had expanded to cover over 200,000 verified software companies and had become a core part of the sourcing process for dozens of international investors headquartered in seven countries.

The algorithms and methodologies that powered Kiosk became the foundation for a general-purpose sourcing engine. The same classification pipeline now processes 30M+ companies and can be configured for any industry, any mandate, and any geography.

200K+ verified software companies on the platform

Industries with precise sub-classifications

Average functional keywords per company

Countries with active institutional users

CUSTOM SOLUTIONS