# How to Choose a Survey Sample Size That’s Representative

In my work, which often involves running surveys for clients, I am frequently asked “how do we choose a survey sample size?” and “what response rate do we need to get a statistically significant sample?” While at face value these seem like reasonable questions, they in fact conflate two distinct questions:

- How many people (and who) do we need to survey to be confident that we have a representative sample of the target population?
- How many people do we need to survey to be confident that we can identify real differences between our groups of interest?

Although related, these two questions are quite different – one is a question about the generalisability of a finding, and the other is a question of what is known as ‘statistical power.’

**Representative samples:** In statistics, a *sample* refers to a selection of units (the things being studied) which are drawn from a *population* – the entire collection of units. If it were possible, the ideal scenario for any researcher would be to obtain measurements for the entire population, thereby eliminating any concerns about representativeness. In reality, however, when conducting a survey it is virtually impossible to get access to a whole population (for a variety of logistical and practical reasons). Hence we use a sample to infer the characteristics of the population. A *representative sample* is, as it sounds, a sample which we are confident represents an unbiased estimation of the population.

**Statistical power:** Whenever we run a statistical test there is a chance that, if an effect is detected, it is simply a fluke. Likewise, if we don’t detect an effect, it’s possible that our test simply missed it. Statistical power refers to the likelihood of correctly identifying an effect where an effect exists.

In both of these cases, survey sample size plays an important role. A larger sample makes the results more generalisable to the population, and increases the likelihood of detecting an effect if it exists. Accordingly, it is possible to estimate the sample size required both to support a representative sample and to achieve sufficient power to detect an effect.
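As a rough illustration of how sample size drives power, the sketch below approximates the power of a two-sided comparison of two group means using a normal (z-test) approximation. The function name `approx_power`, the standardised effect size of 0.3 and the 5% significance level are assumptions chosen for the example, not values from any particular survey.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on group means.

    effect_size is the standardised difference between the two group means
    (difference divided by the common standard deviation). This is a normal
    approximation, so it is slightly optimistic for very small samples.
    """
    std_normal = NormalDist()
    # Critical value for a two-sided test at significance level alpha.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect really exists.
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability that the statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical scenario: a modest effect (0.3) at increasing group sizes.
for n in (20, 50, 100, 200):
    print(n, round(approx_power(0.3, n), 2))
```

Running this for a fixed effect size shows power climbing as the number of respondents per group grows, which is why the expected effect size has to be stated before a required sample size can be estimated.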

For statistical power, for example, the survey sample size required depends on the test we are using and our expected effect size. For representativeness, it is a function of population size, the margin of error we are willing to accept (the confidence interval) and the confidence level. When it comes to representativeness, however, although survey sample size is important, it is not in itself a sufficient indicator of generalisability: equally important is the composition of our sample.
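One common way to turn population size, margin of error and confidence level into a required number of responses is Cochran's formula with a finite population correction. The sketch below assumes we are estimating a proportion from a simple random sample, and that the expected proportion is 0.5 (the most conservative choice); the function name `required_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(population_size, margin_of_error=0.05,
                         confidence_level=0.95, expected_proportion=0.5):
    """Responses needed to estimate a proportion within +/- margin_of_error.

    Assumes simple random sampling. expected_proportion=0.5 is an assumption
    that gives the largest (most cautious) sample size.
    """
    # z-score matching the two-sided confidence level (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Cochran's formula for an effectively infinite population.
    n_infinite = (z ** 2) * expected_proportion * (1 - expected_proportion) \
        / margin_of_error ** 2
    # Finite population correction: small populations need fewer responses.
    n_finite = n_infinite / (1 + (n_infinite - 1) / population_size)
    return math.ceil(n_finite)


print(required_sample_size(1_000))      # 278
print(required_sample_size(1_000_000))  # 384
```

Note how the required number barely changes once the population is large. The figure says nothing about who responds, so it should be read as a minimum count rather than a guarantee of representativeness.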

**Selecting a representative sample**

When we are selecting a set of people from a population for a survey (i.e. selecting a sample) a natural question is: who should we select? There are numerous ways to pick people to participate in a study, but to ensure the resulting sample is not biased, the selection method itself must not favour any particular members of the population.

As far as sampling techniques go, ‘simple random sampling’ is the gold standard. In simple random sampling, individuals from a population are selected to be a part of a sample entirely by chance. When a population is made up of members who share fairly well-defined characteristics (e.g. white men over 50 from Sydney), simply taking a random selection of the members of the population will help achieve a representative sample. Where there are different sub-populations we are interested in, however (e.g. different business units in an organisation), sampling from a population based purely on chance may over- or under-represent particular groups.

In this case, it is common to use a technique known as ‘stratified random sampling.’ In stratified random sampling, we take a random sample from each of the subgroups (strata) within a broader population in a way that maintains the proportion each subgroup represents in that population.
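A minimal sketch of proportional stratified random sampling is shown below. The business units and head counts are made up for illustration, and the function name `stratified_sample` and the largest-remainder rounding rule are assumptions; other allocation rules are possible.

```python
import random


def stratified_sample(strata, total_sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized proportionally.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    if total_sample_size > population_size:
        raise ValueError("Sample cannot be larger than the population")

    # Ideal (fractional) number of people to draw from each stratum.
    quotas = {name: total_sample_size * len(members) / population_size
              for name, members in strata.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}

    # Hand out the places lost to rounding, largest remainder first.
    shortfall = total_sample_size - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:shortfall]:
        allocation[name] += 1

    # Within each stratum, selection is entirely by chance.
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}


# Hypothetical organisation of 100 people across three business units.
units = {
    "Sales": [f"sales_{i}" for i in range(50)],
    "Engineering": [f"eng_{i}" for i in range(30)],
    "HR": [f"hr_{i}" for i in range(20)],
}
sample = stratified_sample(units, total_sample_size=20, seed=1)
print({name: len(chosen) for name, chosen in sample.items()})
# {'Sales': 10, 'Engineering': 6, 'HR': 4}
```

Each unit makes up the same share of the sample as it does of the organisation, so no business unit is over- or under-represented by chance.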

**Achieving an adequate response rate**

Once we have determined our strata and made our random selection of participants, we want to make sure that we obtain an appropriate response rate – if only 1 person from a stratum of 10 people responds, for example, it is a bit of a stretch to call this ‘representative.’

As mentioned above, there are a number of ways to estimate statistically the survey sample size required for each of our strata. Once we have an idea of the number of responses we need from each stratum, there are also a number of techniques we can use to help ensure we achieve it.

In a recent review of research into survey response rate, Steven Rogelberg and Jeffrey Stanton provide a very helpful summary of the most effective of these techniques:

| Technique | Description |
| --- | --- |
| Prenotify participants | Prepare potential participants for the survey process by personally notifying them that they will be receiving a survey in the near future |
| Publicise the survey | Actively publicize the survey to respondents (e.g., posters, e-mails). Inform survey respondents about the purpose of the survey and how survey results will be used (e.g., action planning) |
| Design carefully | Consider the physical design of your survey: Is it pleasing to the eye? Easy to read? Uncluttered? Are questions evenly spaced? |
| Provide incentives | Provide upfront incentives to respondents, where appropriate. Inexpensive items such as pens, key chains, magnets, or certificates for free food/drink have been shown to increase response rates |
| Manage survey length | Use a theory-driven approach to survey design, which will help determine critical areas that should be addressed within the survey instrument as opposed to including too much content |
| Use reminder notes | Send reminder notes to potential respondents beginning 3 to 7 days after survey distribution |
| Provide response opportunities | Ensure that everyone is given the opportunity to participate in the survey process (e.g., provide paper surveys where employees do not have access to computers, schedule time off the phone for employees in call centres, have survey run for sufficient time so that vacation time does not impede response) |
| Monitor survey response | Monitor response rates so that HR generalists and/or the survey coordinators can identify departments with low response rates. Provide feedback and consider fostering friendly competition between units |
| Establish survey importance | An understanding of the importance of their opinions and participation will help increase the likelihood of survey completion |
| Foster survey commitment | When applicable, involve a wide range of employees (across many levels) in the survey development process |
| Provide survey feedback | Once the survey data are collected, survey feedback should be provided. Rather than influencing the present survey, this approach influences future survey efforts by positive use of the survey results |

**A caution on blindly focusing on survey sample size and response rate**

Importantly, even if the above techniques are followed and a high response rate is achieved across a representative sampling frame, response rate alone is not a sufficient indicator of the quality of the information obtained. Consider the following scenarios highlighted by Rogelberg and Stanton:

Suppose a population of 100 is surveyed, and 90 respond. Of those 90, 45 say yes to some question; the other 45 say no. There are 10 people whose views we do not know. If these nonrespondents would have responded with a yes, the true figure for the population would be 55% yes. If they would have responded with a no, the true population rate would be 45% yes… Suppose now that a population of 100 is surveyed, and 10 respond. Of those 10, half say yes to some question; the other half say no. There are 90 people whose views we do not know. If half of these nonrespondents had responded with a yes, the true figure for the population would be 50% yes—identical to what the sample results showed.
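The arithmetic behind these scenarios can be sketched as a best-case and worst-case calculation. The function below is a hypothetical helper, not something from Rogelberg and Stanton; it assumes the two extremes are that every non-respondent would have said no, or every non-respondent would have said yes.

```python
def yes_share_bounds(population, respondents, yes_responses):
    """Lowest and highest possible share of 'yes' in the whole population.

    The lower bound assumes every non-respondent would have said no;
    the upper bound assumes every non-respondent would have said yes.
    """
    nonrespondents = population - respondents
    lowest = yes_responses / population
    highest = (yes_responses + nonrespondents) / population
    return lowest, highest


# First scenario: 90 of 100 respond, 45 say yes.
print(yes_share_bounds(100, 90, 45))  # (0.45, 0.55)

# Second scenario: 10 of 100 respond, 5 say yes.
print(yes_share_bounds(100, 10, 5))   # (0.05, 0.95)
```

With a 90% response rate the true figure is pinned to a 10-point range, whereas with a 10% response rate it could in principle lie anywhere between 5% and 95%. The sample result of 50% is only accurate in the second case if the non-respondents happen to resemble the respondents.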

These scenarios highlight the importance of getting your sampling right and maximising response rates as much as possible. They also raise an interesting question about the importance of non-responders (a topic for another day).

