Polls and polling have been declared dead many times. At least that is what the zeitgeist seems to say. Many smart people think polls are usually wrong, and many academics from my social science background believe that low response rates coupled with systematic non-response make polling a fool’s errand.

Not so.

Traditional polling is most likely dead, but survey research is not. The New Quant Polling approach, for lack of a better term, is based on surveys and makes much more precise forecasts — yes, predictions into the future, not just snapshots — than some seem to believe.

Population Segments

The basic idea starts with segmenting the population. We sort people into buckets or drawers: men versus women, Democrats versus Republicans, and so on. We then assume that respondents within the same bucket are, for practical purposes, interchangeable.

Of course, not all women, and not all Democrats, think alike either. But the more granular we make the buckets, the more similar people within each bucket become. This is where the first major difference between traditional polling and New Quant Polling appears.
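As a minimal sketch of the bucket idea, the snippet below groups toy respondents by a key built from their traits; the variable names and the three-trait schema are illustrative, not the actual segmentation used in practice:

```python
from collections import defaultdict

# Assign each respondent to a bucket defined by their traits (toy schema).
def bucket_key(respondent):
    return (respondent["gender"], respondent["party"], respondent["education"])

respondents = [
    {"gender": "woman", "party": "Democrat", "education": "college"},
    {"gender": "woman", "party": "Democrat", "education": "college"},
    {"gender": "man", "party": "Republican", "education": "no college"},
]

buckets = defaultdict(list)
for r in respondents:
    buckets[bucket_key(r)].append(r)

# Respondents sharing a key land in the same bucket and are treated,
# for practical purposes, as interchangeable.
print(len(buckets))
```

Adding a variable to `bucket_key` makes the buckets more granular, which is exactly the trade-off discussed next.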

More Granularity Through Regularization

Traditional polling cannot usually afford to be very granular. It mainly re-weights survey responses to match a target population and often stops at relatively simple distinctions, such as education levels or age groups, even when raking to crossed margins. It usually does not segment the electorate into combinations like education by party by gender by race.

The reason is straightforward: the number of buckets grows exponentially with each additional variable. Once that happens, many cells become sparse or empty. A granular prediction grid usually contains far more cells than a standard survey has respondents. And with that, estimates become unstable and noisy.
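The arithmetic is easy to see in a sketch. The level counts below are illustrative, but they show how quickly a crossed grid outgrows any realistic sample:

```python
# Cell counts multiply as segmentation variables are crossed
# (hypothetical numbers of levels per variable).
levels = {
    "gender": 2, "race": 5, "age group": 6,
    "education": 4, "party": 3, "state": 50,
}

cells = 1
for name, k in levels.items():
    cells *= k

print(cells)  # 2 * 5 * 6 * 4 * 3 * 50 = 36,000 cells
```

A typical survey of one or two thousand respondents cannot hope to fill 36,000 cells, so most cells end up sparse or empty.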

New Quant Polling works differently. It models the relationship between demographic features and the outcome itself — think vote choice. That allows us to build a much richer prediction grid. Machine learning methods, when combined with regularization, make this feasible because they can pool information across buckets where traditional weighting methods treat each bucket as largely independent.

Regularization is the key. It means that buckets with little information do not drive the prediction much, while buckets with more data and clearer distinctiveness matter more. This suppresses noise and reduces variance. That is the first big win.
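The shrinkage logic can be illustrated with a deliberately simple sketch. Real implementations use multilevel regression or penalized machine learning models; the formula and the `prior_strength` value here are hypothetical, chosen only to show how little-data buckets get pulled toward the overall rate:

```python
# Toy partial pooling: shrink each bucket's support rate toward the
# overall rate, with the pull depending on the bucket's sample size.
overall_rate = 0.50    # grand mean support across all respondents (assumed)
prior_strength = 20    # hypothetical regularization strength

def pooled_estimate(successes, n):
    # Buckets with little data stay near the overall rate;
    # buckets with lots of data keep their own signal.
    return (successes + prior_strength * overall_rate) / (n + prior_strength)

print(pooled_estimate(2, 3))       # raw rate 0.67, pulled back toward 0.50
print(pooled_estimate(600, 1000))  # raw rate 0.60, barely moved
```

The sparse bucket's noisy 2-out-of-3 estimate is heavily discounted, while the well-populated bucket's estimate is left almost untouched; that is the variance reduction described above.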

Why We Do Not Want Representative Surveys

This is the part many people find counterintuitive: we do not want a representative survey.

Many people, including people in the industry, immediately ask about toplines: How many Democrats were in the sample? How many African Americans? The concern is understandable. If a sample overrepresents a group with strong preferences, then naïve summaries will be biased.

But that way of thinking belongs to an older polling logic.

No survey is representative of more than one target population. And because surveys are expensive, we almost never want to describe just one population. We want to say something about many populations at once: states, congressional districts, counties, cultural regions, demographic subgroups, and combinations of all of the above.

That means the real goal is not a descriptively representative raw sample. The goal is a sample with enough people in enough buckets to estimate the relationships we care about.

So yes, we want to oversample smaller groups and undersample larger ones. We want the raw sample to be unrepresentative. We use quotas to do that, and those quotas should not simply mimic one target population. In my own work, I use the Penrose method, but the deeper point is broader than any one formula: the sample should be designed to provide information, not to look superficially representative.
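One reading of the Penrose square-root idea is quotas proportional to the square root of group size rather than to group size itself. The groups and populations below are invented for illustration; the point is only the shape of the allocation:

```python
import math

# Hypothetical group populations (in millions) and a total interview budget.
populations = {"group A": 100.0, "group B": 25.0, "group C": 4.0}
total_interviews = 1000

# Square-root allocation: quotas proportional to sqrt(population),
# which oversamples small groups relative to their population share.
sqrts = {g: math.sqrt(p) for g, p in populations.items()}
total = sum(sqrts.values())
quotas = {g: round(total_interviews * v / total) for g, v in sqrts.items()}

print(quotas)  # {'group A': 588, 'group B': 294, 'group C': 118}
```

Under proportional sampling, group C would get about 31 interviews out of 1,000; under the square-root allocation it gets 118, enough for the model to learn something about its buckets.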

That is what allows the model to learn how demographic and geographic features relate to outcomes such as vote choice.

To be fair, we also look at toplines a lot. It is a sensible way to examine whether our quota system works, so if that is the intention of those asking for toplines, fair play.

How We Get Representative Predictions


If the raw survey is not representative in the traditional sense, how do we get representative estimates?

By shifting the burden from the sample to the model.

In New Quant Polling, we estimate the probability that each bucket supports a given option — for example, the probability of voting Democratic. We then aggregate from the bucket level up to the population of interest.

To do this well, we need two things. First, we need to know the composition of the population in each geographic unit, ideally using fine-grained census or administrative data. Second, we need estimates of turnout. In my view, turnout modelling is the hardest and most important problem in survey research. This is where the wheat is separated from the chaff.

Once we have population composition and turnout estimates, we can aggregate bucket-level predictions into representative forecasts for counties, states, congressional districts, or the country as a whole.
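The aggregation step itself is just a turnout-weighted average of bucket-level predictions. In the sketch below, each bucket carries its adult population in some geographic unit, an estimated turnout rate, and a modeled probability of voting Democratic; all three numbers are invented for illustration:

```python
# Aggregate bucket-level predictions into a population-level estimate.
# Each bucket: (adults in the unit, estimated turnout rate, P(vote Dem)).
buckets = [
    (40_000, 0.70, 0.62),
    (25_000, 0.55, 0.48),
    (35_000, 0.40, 0.35),
]

# Expected voters and expected Democratic votes, bucket by bucket.
voters = sum(n * t for n, t, _ in buckets)
dem_votes = sum(n * t * p for n, t, p in buckets)

print(dem_votes / voters)  # turnout-weighted Democratic share in the unit
```

Note how turnout drives the answer: the high-turnout, pro-Democratic first bucket contributes far more expected votes than its population share alone would suggest, which is why turnout modelling matters so much.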

How This Helps with Non-Response Bias

This approach also helps with non-response bias.

The standard critique is that some types of people simply do not answer surveys, and that this undermines everything. Sometimes that is true. But New Quant Polling makes that problem much less fatal because it allows us to segment the population much more finely.

It is far more believable that non-religious, low-income, non-college, urban, Gen Z white women without party registration are similar to one another across the country than it is to assume that all women, or all young people, are interchangeable. The finer the segmentation, the more plausible the assumption within each bucket becomes.

Even when overall response rates are low, with our sampling technique we still eventually reach at least some respondents from many of these groups. And once we do, the model can learn from them. Unless respondents and non-respondents within the same bucket are still systematically different in ways the model cannot capture, much of the non-response problem can be absorbed by modelling and post-stratification.