Standards and Guidelines for Sampling
The starting point for the Panel was a section in the Telephone report titled Development of Sampling Frames and Sampling.
The following topics were considered by the Panel:
- Modification of a general "sampling procedures" standard applicable to all sampling methodologies
- Modification of the sampling standard for probability surveys
- The circumstances under which a website visitor intercept sample qualifies as a probability sample
- Non-probability surveys: There is no online equivalent of RDD (Random Digit Dialing) telephone sampling methodology for drawing probability samples from the general public, and as a result non-probability surveys may be more prevalent in the online survey environment. Several sampling topics were considered with respect to non-probability surveys:
- Statistical treatment: reporting of a margin of sampling error, and statistical significance tests of differences
- Whether or not the Advisory Panel should provide guidance on setting sample size
- Modification of a standard pertaining to non-probability survey sampling that was in the Telephone report
- A standard for justification of use of non-probability surveys
- A standard for maximizing representativeness of non-probability survey samples
- Whether or not the Advisory Panel should provide guidance on appropriate/acceptable uses of non-probability surveys
- Attempted census surveys: The Panel considered whether or not attempted census surveys should be broken out as a separate sampling methodology for purposes of specifying standards and guidelines. Key to this decision was whether or not the statistical treatment of data from attempted census surveys is different from that appropriate for probability surveys.
General Standard for Sampling Procedures
In the Telephone report, the following standard is stated in connection with the heading "Sampling Procedures": All research firms must clearly state the target group (universe) definition for the research study and then clearly state the method used to obtain a representative cross-section sample of this target group.
The Online Panel recommended a revised version of this standard which adds the following requirements: (1) there must be explicit indication of whether or not Internet non-users are part of the target population definition for a survey, and (2) the sampling method must be stated - i.e., probability, attempted census, or non-probability.
Standards: General Sampling Procedures
All research firms must:
- Clearly state the target group (universe) definition for the research study; in the case of online surveys this includes explicit identification of whether or not Internet non-users are part of the target group definition
- Clearly state the method(s) used to obtain a sample of this target group, including whether the method was a probability survey, a non-probability survey, or an attempted census
Sampling Standard for Probability Surveys
In the Telephone report, sampling standards are given for random probability sampling. The Advisory Panel recommended the adoption of these standards for online probability surveys, with wording changes to reflect online rather than telephone methodology.
Standards: Probability Surveys
- The list or sample source must be clearly stated, including any of its limitations/exclusions in representing the universe for the target sample and the potential for bias.
- A full description of the sample design and selection procedures will be stated including:
- Sample stratification variables (if any)
- Any multi-stage sampling steps taken
- At each sampling stage, the method of attaining a random selection shall be explained, and any subsets of the universe that have been excluded or underrepresented shall be stated (e.g., Internet non-users)
Note: Whenever possible, an estimate of the percentage of the universe that has been excluded or underrepresented should be provided.
- The number of attempted recontacts and procedure for attempted recontact should be stated
- Respondent eligibility/screening criteria will be defined, including any oversampling requirements (e.g., region, gender)
- Assuming that proper probability sampling procedures have been followed, the sampling error should then be stated based upon a given sample size at a given confidence level, but research firms must take care to:
- Ensure that clients know that sampling error based upon a subset of the total sample will not be the same as that based on the total sample
- Where possible, express sampling error in terms relevant to the specific nature of the most important or typical variables in a survey
- State that there are many potential non-sampling sources of error and include reference to other possible sources of error in the study in order to not give a misleading impression of overall accuracy and precision.
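As an illustration of the sampling error points above, the following minimal sketch computes an approximate margin of sampling error for a proportion. It assumes simple random sampling, a 95% confidence level (z = 1.96), the normal approximation, and no finite population correction. The function name and the sample sizes are hypothetical and are not taken from the Telephone report.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate half-width of the confidence interval for a proportion.

    Assumes simple random sampling and the normal approximation;
    p=0.5 gives the most conservative (largest) value.
    """
    return z * math.sqrt(p * (1 - p) / n)


total_n = 1000     # hypothetical total sample size
subgroup_n = 200   # hypothetical sub-group within the total sample

print(f"Total sample:    +/- {100 * margin_of_error(total_n):.1f} points")
print(f"Sub-group:       +/- {100 * margin_of_error(subgroup_n):.1f} points")
# Sampling error expressed for a specific variable, e.g. a 10% result:
print(f"Total, p = 0.10: +/- {100 * margin_of_error(total_n, p=0.10):.1f} points")
```

Under these assumptions the total sample of 1,000 gives roughly +/- 3.1 percentage points, the sub-group of 200 roughly +/- 6.9 points, and a 10% result in the total sample roughly +/- 1.9 points. Complex designs (stratification, multi-stage selection, weighting) would require a design-specific calculation.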
Website Visitor Intercept Samples
The Advisory Panel was asked to comment on the circumstances under which a website visitor intercept sample qualifies as a probability sample.
There was consensus among Panel members both on when a website visitor intercept sample qualifies as a probability sample and on some guidelines for conducting website visitor intercept surveys.
A website visitor intercept sample qualifies as a probability sample if both of the following conditions are met:
- Over the time period the fieldwork is conducted, the number of visitors to the website can be estimated, and survey invitations are given to a random sample of these visitors.
- The population is defined as visitors to the website over the time period during which the fieldwork was conducted.
The latter point is an important one to keep in mind when planning a website visitor intercept survey, and indicates the need to give careful consideration to defining the time period for the survey. For example, it may be desirable to have an extended fieldwork period in order to broaden the target population.
For website visitor intercept studies, survey results cannot be used to generalize to populations other than the one for which the sample was designed. While the intercept sampling process itself may be consistent with probability sampling, the definition of the target population also determines whether the sample is to be treated as a probability sample or as a non-probability sample for analysis and reporting purposes. For example:
- If the survey fieldwork was done over a one month period, but the survey report defines the target population as "past-year visitors to the website", then the sample would have to be treated as a non-probability sample.
- If the survey intercepts visitors at one particular website, but the survey report defines the target population as "visitors to government websites", then the sample would have to be treated as a non-probability sample.
The above describes the criteria under which a website visitor sample qualifies as a probability sample. A website visitor intercept sample qualifies as an attempted census if in the first criterion survey invitations are given to all visitors during the fieldwork period, rather than to a random sample of visitors.
Guidelines for Conducting Website Visitor Intercept Surveys
The following are recommended best practices when conducting a website visitor intercept survey:
- Examine website visitor statistics to determine common entry points to the site. Simply placing the invite redirect on the home page may not be enough to get a good sample of visitors to the website.
- Use an appropriate methodology to maximize accessibility, e.g., a page redirect method.
- Take steps to minimize the likelihood a visitor will get invited multiple times to take the survey.
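The random selection of visitors and the avoidance of repeat invitations can be sketched as follows. This is only an illustration under stated assumptions: the names make_intercept and on_visit are hypothetical, and visitor_id stands in for whatever identifier a site can reasonably obtain (for example, a cookie value). Each visitor is considered once, on the first visit during the fieldwork period, so frequent visitors do not get extra chances of selection.

```python
import random


def make_intercept(sampling_rate, seed=None):
    """Build a per-visit decision function for a visitor intercept survey.

    sampling_rate is the probability that a visitor is invited
    (1.0 invites every visitor, i.e. an attempted census).
    """
    rng = random.Random(seed)
    decisions = {}  # visitor_id -> True (invited) or False (not selected)

    def on_visit(visitor_id):
        # A visitor is considered once, on the first visit in the fieldwork
        # period; repeat visits never trigger a second draw or invitation.
        if visitor_id in decisions:
            return False
        decisions[visitor_id] = rng.random() < sampling_rate
        return decisions[visitor_id]

    return on_visit, decisions


on_visit, decisions = make_intercept(sampling_rate=0.10, seed=42)
visits = ["v1", "v2", "v1", "v3", "v2", "v4"]  # hypothetical visit log
invited = [v for v in visits if on_visit(v)]
print(f"{len(decisions)} unique visitors seen, {len(invited)} invited")
```

The number of unique visitors recorded in decisions also gives the estimate of the visitor population that the first condition for a probability sample requires.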
Non-probability Surveys
A significant challenge for doing online surveys of the public is generating a sample from which actionable and statistically sound results can be obtained. Notably, there is no online equivalent of the RDD (Random Digit Dialing) telephone sampling methodology for drawing probability samples of the public.
Access panels operated by research suppliers as well as those developed and operated by departments/agencies within the Government of Canada are significant from a public opinion research (POR) perspective, because they can potentially be used to conduct online surveys of the public. However, these panels are often considered to be based on nonprobabilistic sampling, and there are statistical limitations that result from using a non-probability sample: accuracy is problematic, no margin of sampling error can be reported, and often no significance testing of differences among sub-groups can be reported.
Considerable work is being done by the research industry to overcome these statistical limitations, and there are promising developments and results - e.g., prediction of U.S. voting outcomes (note: this is cited as an example because this is an area where the industry has published accuracy data). The accuracy achieved in published results is impressive, particularly in that it is predicting an outcome for a population that includes Internet non-users. However, further methodological advancements and empirical validation are needed before non-probability surveys can be used with the same confidence as probability surveys in terms of accuracy and precision in describing a target population.
At the present time, the results of non-probability surveys should be used with caution:
- Where the stakes are high in terms of impact on key policy, program or budget decisions, use of a probability sample in the research design is to be strongly preferred; non-probability surveys are good for exploratory research, to help in building understanding of the range and types of public opinion on a topic, and for experimental designs to compare impact of different stimuli (e.g., different ad concepts, different web designs, etc.).
- The standards and guidelines recommended below for non-probability sampling:
- Formalize the cautions in using non-probability samples, in terms of requiring consideration of certain issues and disclosure of these considerations (e.g., Justification standard, Sampling standard, Statistical Treatment standard, Assessment of Representativeness guideline)
- Encourage attention to maximizing the potential accuracy of the results (Maximizing Representativeness standard, Assessment of Representativeness guideline)
The Advisory Panel recommends the GC monitor methodological developments in online non-probability surveys, and actively participate in the evolution of this survey methodology by doing methodological research using its own body of existing POR surveys. There are grounds for optimism that the scope of appropriate uses for non-probability surveys, provided certain methodological conditions are met, will expand in the future.
There was extensive consideration of various topics associated with non-probability surveys by the Online Advisory Panel. The topics considered can be grouped under two headings:
Standards and Guidelines
- Statistical treatment: margin of sampling error; statistical significance tests of differences, including reporting of differences among sub-groups
- AAPOR (American Association for Public Opinion Research) statement on why margin of sampling error should not be reported
- General sampling standard for non-probability surveys
- Justification of use of non-probability surveys
- Maximizing representativeness of non-probability surveys
- Guidance on setting sample sizes
- Guidance on appropriate/acceptable uses of non-probability surveys
Note: All of the above items are addressed in this section. The recommendations which affect other sections of the report - e.g., Proposal Documentation and Survey Documentation - have also been incorporated into those other sections.
The Panel recommends the following standards and guidelines related to non-probability surveys.
Standards for Non-probability Surveys
Justification of Use of Non-probability Surveys
- When a choice is made to use a non-probability sample, that choice must be justified, in both the research proposal and the research report. The justification should take into account the statistical limitations in reporting on data from a non-probability sample, and limitations in generalizing the results to the target group population.
Sampling for Non-probability Samples
- As for probability sampling, the list or sample source must be stated, including its limitations in representing the universe for the target sample.
- A full description of the regional, demographic or other classification variable controls used for balancing the sample to attempt to achieve representativeness should be provided.
- The precise quota control targets and screening criteria should also be stated, including the source of such targets (e.g., census data or other data source).
- Deviations from target achievement should be shown in the report (i.e., actual versus target).
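One possible way to present actual versus target achievement is a simple table of quota cells. The sketch below is illustrative only: the function name, the regional cells and all counts are hypothetical, not data from any survey.

```python
def quota_deviations(targets, actuals):
    """Return (cell, target, actual, difference, percent difference) rows."""
    rows = []
    for cell, target in targets.items():
        actual = actuals.get(cell, 0)  # a cell with no completes counts as 0
        diff = actual - target
        rows.append((cell, target, actual, diff, 100 * diff / target))
    return rows


# Hypothetical regional quota targets and completed questionnaires.
targets = {"Atlantic": 70, "Quebec": 230, "Ontario": 390, "West": 310}
actuals = {"Atlantic": 62, "Quebec": 241, "Ontario": 402, "West": 295}

print(f"{'Cell':<10}{'Target':>8}{'Actual':>8}{'Diff':>6}{'Diff %':>8}")
for cell, target, actual, diff, pct in quota_deviations(targets, actuals):
    print(f"{cell:<10}{target:>8}{actual:>8}{diff:>+6}{pct:>+8.1f}")
```

The source of the targets (e.g., census data) would be stated alongside such a table, as the standard requires.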
Maximizing Representativeness of Non-probability Samples
- To the extent survey results will be used to make statements about a population, steps must be taken to maximize the representativeness of the sample with respect to the target population, and these steps must be documented in the research proposal and in the survey report. (In this context, the word "representativeness" is being used broadly.) These steps could include, for example, a choice of sampling method that gives greater control over the characteristics and composition of the sample (e.g., access panel vs. "river-sampling"), use of demographic and other characteristics in constructing the sample, and use of weighting schemes.
- The survey report must discuss both the likely level of success in achieving a representative sample with respect to the key survey topic variables, and the limitations or uncertainties with respect to the level of representativeness achieved.
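As one example of a weighting scheme, the sketch below computes simple post-stratification weights so that the weighted sample matches known population shares on a single classification variable. The technique is an assumption chosen for illustration (the Panel does not prescribe a particular scheme), and the age groups, counts and shares are hypothetical.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per group so weighted group shares match population shares."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] * n / count
        for group, count in sample_counts.items()
    }


# Hypothetical completed questionnaires and population shares by age group.
sample_counts = {"18-34": 100, "35-54": 400, "55+": 500}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = poststratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(f"{group}: weight {weight:.3f}")
```

Weighting of this kind aligns the sample with the population only on the variables used; it does not turn a non-probability sample into a probability sample. A large weight, such as the one for the under-represented youngest group here, is the kind of limitation the survey report would be expected to discuss.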
Statistical Treatment of Non-probability Samples
- There can be no statements made about margins of sampling error on population estimates when non-probability samples are used.
- The survey report must contain a statement on why no margin of sampling error is reported, based on the following template: "Respondents for this survey were selected from among those who have [volunteered to participate/registered to participate in (department/agency) online surveys]. [If weighting was done, state the following sentence on weighting:] The data have been weighted to reflect the demographic composition of (target population). Because the sample is based on those who initially self-selected for participation [in the panel], no estimates of sampling error can be calculated."
- This statement must be prominently placed in descriptions of the methodology in the survey report.
- For non-probability surveys it is not appropriate to use statistical significance tests or other formal inferential procedures for comparing sub-group results or for making population inferences about any type of statistic. The survey report cannot contain any statements about sub-group differences or other findings which imply statistical testing (e.g., the report cannot state that a difference is "significant").
Nevertheless, it is permissible to use descriptive statistics, including descriptive differences, appropriate to the types of variables and relations involved in the analyses. Any use of such descriptive statistics should clearly indicate that they are not formally generalizable to any group other than the sample studied, and there cannot be any formal statistical inferences about how the descriptive statistics for the sample represent any larger population.
The exception to the rule against reporting statistical significance tests of differences is non-probability surveys that employ an experimental design in which respondents are randomly assigned to different cells in the experimental design. In this case, it is appropriate to use and report on statistical significance tests to compare results from different cells in the design.
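The experimental design exception can be sketched as follows: respondents are randomly assigned to cells, and results are then compared between cells. The two-proportion z-test is used here only as one common choice of test, and the function names, cell labels and counts are hypothetical.

```python
import math
import random


def assign_cells(respondent_ids, cells, seed=None):
    """Randomly assign respondents to experimental cells in equal numbers."""
    rng = random.Random(seed)
    ids = list(respondent_ids)
    rng.shuffle(ids)  # random order, then deal out to cells in turn
    return {rid: cells[i % len(cells)] for i, rid in enumerate(ids)}


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two cell proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


assignment = assign_cells(range(400), ["Ad concept A", "Ad concept B"], seed=1)

# Hypothetical results: 120 of 200 favourable in cell A, 100 of 200 in cell B.
z, p_value = two_proportion_z_test(120, 200, 100, 200)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

With these hypothetical counts the test gives z of about 2.01 and a two-sided p-value of about 0.044. The test speaks to the difference between cells among the respondents studied; it does not support inference to a larger population.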
Guidelines for Non-probability Surveys
Assessment of Representativeness of Non-probability Samples
- Evidence on how well the obtained sample in a non-probability survey matches the target population on known parameters should be presented where possible. For this purpose, use high quality data sources such as Statistics Canada or well-designed probability surveys done in the past.
- Contingent on resources and on survey importance, consider the following:
- Proactively building into different surveys common questions that could be used on an ongoing basis to compare results obtained using different survey methodologies - e.g., the results for a common question could be compared when it is asked in a telephone probability survey of the target group versus in an online non-probability survey of the target group.
- Use of a multi-mode method for a survey project in order to be able to, for example, allow comparison of the results of a probability survey component with the results for a non-probability survey component, or allow exploration of questionnaire mode effects in order to assess whether one mode might elicit more realistic, honest, or elaborated responses than another mode.
Statistical Treatment of Non-probability Surveys
- Consider using other means for putting descriptive statistics in context, for example:
- If similar studies have been done in the past, it may be useful to comment on how statistical values obtained in the study compare to similar studies from the past.
- For statistics such as correlations, refer to guides on what are considered to be low, medium or high values of descriptive correlational statistics.
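As a sketch of the second approach, a descriptive correlation can be computed and then labelled against a published guide. The thresholds used below (0.10, 0.30 and 0.50) follow one commonly cited convention and are an assumption made for illustration; they are not thresholds recommended by the Panel, and the function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, used here as a descriptive statistic."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_strength(r):
    """Label |r| using assumed thresholds of 0.10, 0.30 and 0.50."""
    size = abs(r)
    if size < 0.10:
        return "negligible"
    if size < 0.30:
        return "low"
    if size < 0.50:
        return "medium"
    return "high"


# Hypothetical ratings from five respondents on two survey questions.
satisfaction = [2, 4, 3, 5, 4]
ease_of_use = [1, 4, 3, 4, 5]
r = pearson_r(satisfaction, ease_of_use)
print(f"r = {r:.2f} ({describe_strength(r)})")
```

The label describes the relationship within the sample studied and carries no claim about a larger population.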
The intent behind the "justification" standard is to ensure that the statistical limitations associated with non-probability surveys are taken into account in planning and reporting on such surveys.
That said, as noted in the introduction to this section, considerable work is being done by the research industry to overcome these statistical limitations, and there are promising developments and results. It may be that solutions to the statistical issues will be found in the future.
Standard for "Maximizing Representativeness" and Guidelines for "Assessment of Representativeness"
The word "representativeness" can be interpreted in a variety of ways, and there was some discussion of whether or not this term should be more tightly defined. However, the Panel decided the term should be used broadly for now, with an understanding that, as online survey methodologies and experience develop over time, the meaning of "maximizing representativeness" could perhaps be tightened up in the future.
With regard to the "Assessment of Representativeness" guidelines:
- The first guideline was suggested by the Panel in the context of general agreement that no margin of sampling error can be reported for non-probability surveys. As one Panelist stated, there could be demographic comparisons to census data, or comparisons to results of similar studies with similar dependent variables. This could provide some perspective on degree of "error" in population estimates.
- The second guideline was suggested by the Panel both with respect to assessing representativeness for particular surveys and with respect to developing a broader framework to explore issues with online and other survey methodologies by means of doing methodological research that makes use of existing POR studies.
With regard to the latter aspect of the recommended guideline:
- One suggestion was that surveys include attitudinal/evaluative/value variables in order to allow exploration over time of how the non-demographic component of online survey coverage might be changing. However, it was also noted that getting agreement on these variables could be challenging, and might be more easily accomplished at the level of particular departments or agencies.
- Some Panelists were particularly supportive of use of multi-mode methods. While multi-mode designs can potentially add to study cost, it was suggested multi-mode methods can be useful not only for assessing the representativeness of a particular survey, but also as a means of creating data sets that could allow exploration of how online survey results and coverage evolve over time relative to other methodologies (telephone in particular). The latter could be helpful in the future when it may be appropriate to revise standards and guidelines for online surveys.
"Statistical Treatment" Standard and Guidelines
With regard to the first two standards pertaining to margin of sampling error:
- The Advisory Panel supports the MRIA position that research companies must "refrain from making statements about margin of error on population estimates when probability samples are not used."
- The disclosure statement pertaining to not reporting margin of sampling error is modeled after one proposed by AAPOR.
With regard to the standard on "use of statistical significance tests to determine whether differences among sub-groups exist", most members of the Panel felt that it is not appropriate to report statistical significance tests when using non-probability sampling.
One Panelist, however, had a different view on reporting the results of statistical significance tests. They felt that statistical significance tests of sub-group differences are acceptable providing it is stated the results must be interpreted with caution since the differences may not be representative of the population (i.e., the significance test results may have low external validity). This Panelist referenced the frequent use of non-probability samples in scientific research in the social sciences (e.g., convenience samples of students, consumers, etc.), and that in this research statistical significance tests of sub-group differences are often reported. The Panelist felt it reasonable that GC POR follow these common practices in social science research. The Panelist also felt that if a strong case can be made for the representativeness of the non-probability sample, this lends additional credence to reporting of statistical differences among sub-groups.
"Guidance" on Setting Sample Sizes for Non-probability Surveys
Panelists were asked what, if any, guidance should be provided with respect to setting sample sizes for non-probability surveys, given that margin of sampling error does not apply to such samples for purposes of estimating population parameters. The issue is that margin of error provides a metric for assessing possible sample sizes, and without this metric other criteria must be used to make decisions about sample size.
The Panel recommended the following guideline.
Guidelines for Setting Sample Size for Non-probability Surveys
Because nonprobabilistic samples cannot be used for population inferences, the number of cases has no effect on the precision of the population estimates generated. Nonetheless, there are factors to consider when setting the sample size for a non-probability survey, including:
- Description of sample data: The sample size should take into account the complexity of the descriptive analyses that will be reported.
- Consider not only the total sample, but also the number and incidence levels of the sub-groups within the total sample for which descriptive statistics will be reported.
- For multivariate descriptive analyses, the sample size should be sufficient to support these types of analyses.
- Maximizing sample representativeness: As part of adhering to the "maximizing representativeness" standard for non-probability samples, one needs to take into account the number and incidence levels of the various sub-groups judged important for purposes of making a credible claim for apparent representativeness.
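The role of sub-group incidence in setting sample size can be illustrated with simple arithmetic. In the sketch below, the function names, the sub-groups, their incidence levels and the minimum of 150 cases are all hypothetical; what counts as a sufficient number of cases for descriptive reporting is a judgment for the researchers on a given project.

```python
import math


def expected_subgroup_sizes(total_n, incidences):
    """Expected number of cases per sub-group for a given total sample size."""
    return {group: total_n * inc for group, inc in incidences.items()}


def total_n_for_minimum(min_cases, incidences):
    """Smallest total sample expected to yield min_cases in the rarest sub-group."""
    return math.ceil(min_cases / min(incidences.values()))


# Hypothetical incidence levels of sub-groups to be reported on.
incidences = {"Urban": 0.5, "Rural": 0.25, "Remote": 0.125}

print(expected_subgroup_sizes(800, incidences))
print(total_n_for_minimum(150, incidences))
```

These are expected counts under an assumption that sub-groups appear in the sample at their incidence levels; quota controls or oversampling would change the calculation.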
"Guidance" on Currently Appropriate - and Inappropriate - Uses of Non-probability Surveys for GC Public Opinion Research
The members of the Panel fell into two camps with respect to providing guidance on appropriate or inappropriate uses of non-probability samples for GC public opinion research:
- Several Panelists essentially felt that no additional guidance should be stated, on the grounds that the various standards and guidelines the Panel is already recommending with respect to use of non-probability samples are sufficient. For reference these standards and guidelines covered the following areas:
- Maximizing representativeness
- Statistical treatment
- Justification of use of non-probability surveys
- Assessment of representativeness
- Statistical treatment
Working with these standards and guidelines, it would be up to the researchers on a particular project to draw conclusions on whether and how to use non-probability sampling for that project.
- Several Panelists felt the Panel should at least state examples of the more appropriate sorts of uses for non-probability surveys in a POR context, even if these are not stated as formal guidelines.
There was more agreement among the Panel with respect to the following points:
- While there are promising developments with respect to the accuracy that can be achieved using non-probability samples, there is not yet sufficient empirical (or theoretical) validation of either the accuracy or precision of population estimates to justify using non-probability samples interchangeably with probability samples.
For example, there was some discussion of the ability to use non-probability surveys to predict voting results in the U.S. The accuracy achieved in published results is impressive, particularly in that it is predicting an outcome for a population that includes Internet non-users. However:
- The published examples focus on success in predicting the total voting outcome, but questions remain on ability to predict outcomes for specific sub-groups. This is essentially a question about the likely accuracy of "multivariate" analyses. Such analyses are often important in POR surveys - e.g., to understand how results vary as a function of region, gender, age, etc.
- One cannot assume that ability to predict voting behaviour means there would be equal success in predicting the other types of dependent variables important in POR - e.g., awareness, satisfaction, preference, perceived importance, frequency of use, etc.
- Because of the commercial importance of successful prediction outcomes, there is reason to be concerned about a "publication bias" - i.e., unsuccessful predictions may not be publicized to the same extent as successful predictions.
- It is not always clear what sampling, weighting and methodological steps were required in order to achieve a successful prediction outcome - and indeed sometimes this is not provided in order to protect proprietary information. The problem is this makes it difficult to know what steps to take in a new survey on a new topic to achieve a similar level of success in accuracy of prediction.
- There was agreement among the Panel that non-probability samples should be used with caution, even though there was not consensus on how specifically to characterize when it is appropriate to use non-probability samples.
Among those who did attempt to characterize appropriate/inappropriate uses of non-probability surveys, suggestions included:
- Exploratory research
- Theory/perspective building research
- Use non-probability surveys in a manner similar to focus groups/qualitative research - e.g., to get ideas for what public opinions may be, but not to put much emphasis on the specific quantitative values obtained
- Use non-probability surveys to determine a direction (e.g., in policy or program design), but not to try to precisely estimate magnitudes/levels
- Get a quick read on something before validating it using a probability survey
- Experimental design, where the focus is on determining the existence of differences in response to some stimulus
- Non-probability surveys should not be used to make major program design or costing decisions unless no other alternative is available, and all possible steps are taken to put the results into some formal framework for assessing accuracy in representing the relevant population
- There was agreement among the Panel that the GC should continue to monitor methodological developments pertaining to the accuracy and precision of population estimates using non-probability samples.
This is a dynamic field, and it appears progress is being made. It may well be appropriate at some point in the not-too-distant future to broaden the range of POR studies where it would be acceptable to use non-probability sampling methodologies that meet proven design criteria.
In this regard, a few Panelists were concerned that the emphasis being put on the statistical limitations of non-probability samples might be perceived by some as implying that non-probability methodologies will forever be relegated to peripheral status in POR. They felt it important to emphasize that work is being done on how to generate output representative of a population, starting from non-probability samples - and that this work has already started to deliver promising results. Also relevant here are the difficulties that telephone probability samples face (e.g., declining/low response rates, coverage issues posed by cell phone usage), and a need to have a balanced perspective when judging what survey methodology will, in practical terms, deliver the most accurate and precise results for a given project.
- There were suggestions that the GC should use existing POR surveys to conduct methodological research with the goal of aiding in the development of best practices in the use and interpretation of different survey methodologies - particularly including (but not limited to) non-probability online surveys and telephone probability surveys.
Initiatives might include, for example, the following types of activities:
- Common benchmarking measures could be used to compare different survey methodologies, and to monitor trends in both demographic and non-demographic coverage of online methodologies relative to other methodologies.
- Post-hoc analyses could be done of the statistical properties of non-probability survey data, in order to explore the accuracy and precision of estimates. These could include, for example, resampling techniques (bootstrap, jackknife, etc.), and sensitivity tests showing how predictions from a non-probability sample change as a result of changes in sample size, weight factors, etc.
- Multi-mode research designs could be used in order to facilitate direct comparisons of different methodologies.
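As an illustration of the resampling and sensitivity analyses mentioned above, the sketch below bootstraps a proportion at several resample sizes. It is a minimal example under stated assumptions: the function name and the response data are hypothetical, and a simple percentile interval is used.

```python
import random


def bootstrap_interval(responses, resample_size=None, n_resamples=1000, seed=0):
    """Percentile bootstrap interval (2.5th to 97.5th) for a proportion.

    responses is a list of 0/1 values; resample_size defaults to the
    full number of responses.
    """
    rng = random.Random(seed)
    k = resample_size or len(responses)
    estimates = sorted(
        sum(rng.choices(responses, k=k)) / k for _ in range(n_resamples)
    )
    lower = estimates[int(0.025 * n_resamples)]
    upper = estimates[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical data: 600 of 1,000 respondents answered "yes".
responses = [1] * 600 + [0] * 400

for size in (250, 500, 1000):
    lower, upper = bootstrap_interval(responses, resample_size=size)
    print(f"resample size {size}: {lower:.3f} to {upper:.3f}")
```

The spread shown by such an exercise describes how stable an estimate is when the obtained sample is resampled. For a non-probability sample it is an exploratory diagnostic for methodological research and is not a margin of sampling error for the population.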
Multi-mode Surveys
Multi-mode surveys are ones where different methods of questionnaire administration are used. They will often involve a combination of online and telephone methods, although there are certainly other possibilities as well (e.g., in-person, mail, fax).
Multi-mode surveys might be done for any of several reasons:
- Multi-mode surveys can be a way of incorporating online questionnaire administration into a probability sample. For example, when doing a telephone RDD probability sample of the general public, one could provide respondents a choice of telephone or online questionnaire completion.
- Multi-mode methods may be useful for increasing survey response rate, if for whatever reason some respondents are more reachable through one mode than another.
- Multi-mode surveys can be valuable for exploring the strengths, weaknesses, and comparability of different modes of questionnaire administration.
- Multi-mode surveys may be helpful in accommodating different accessibility requirements, or different respondent preferences.
- Multi-mode surveys might in some circumstances reduce total survey cost by shifting some of the interviews from a higher cost method (e.g., telephone) to a lower cost method (e.g., online).
A challenge posed by multi-mode methods is the possibility of "mode effects" on responses. Notably, the online (visual, self-administered) and telephone (auditory, interviewer-administered) modes have some quite different characteristics in terms of how the respondent experiences the survey - and these can potentially lead to answering questions differently.
The overall purpose of the standards below is to ensure consideration of potential mode effects in the research results.
Standards for Multi-mode Surveys
When a survey is conducted using multiple modes of questionnaire administration:
- The reasons for using a multi-mode rather than a single-mode method must be stated, both in the research proposal and the survey report.
- When the plan is to combine data collected via different modes in the data analyses, then steps must be taken to ensure as much comparability as possible across the different survey modes in terms of question wording and presentation of response options.
- Steps must be taken to ensure avoidance of duplicate respondents in different modes. The steps taken, and the results, must be documented.
- The survey report must discuss whether there are any data quality issues arising from combining data collected via different modes. This could include, for example, discussion of possible impacts of mode on key survey variables, the impact of any differences in response rate by mode, and non-response bias analyses by mode.
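Two of the checks implied by these standards, detecting duplicate respondents and comparing response by mode, can be sketched as follows. The function names, respondent keys and counts are hypothetical, and the rate shown is a simple ratio of completes to invitations rather than a formal response rate definition.

```python
def find_duplicate_respondents(completes):
    """Return respondents with more than one completed questionnaire.

    completes is an iterable of (respondent_key, mode) pairs, one per
    completed questionnaire.
    """
    modes_by_key = {}
    for key, mode in completes:
        modes_by_key.setdefault(key, []).append(mode)
    return {key: modes for key, modes in modes_by_key.items() if len(modes) > 1}


def completion_ratio_by_mode(invited, completed):
    """Completed questionnaires divided by invitations, for each mode."""
    return {mode: completed.get(mode, 0) / n for mode, n in invited.items()}


# Hypothetical completes: respondent r1 answered in both modes.
completes = [("r1", "online"), ("r2", "telephone"), ("r1", "telephone")]
print(find_duplicate_respondents(completes))

# Hypothetical invitation and completion counts by mode.
invited = {"online": 1000, "telephone": 500}
completed = {"online": 200, "telephone": 150}
print(completion_ratio_by_mode(invited, completed))
```

How a respondent key is formed (e.g., a sample record identifier) and which of the duplicate questionnaires is retained are design decisions that would be documented along with the results.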
Attempted Census Surveys
In a census survey, an attempt is made to collect data from every member of a population. For example, an organization might want to do a survey of all of its employees. In this case, the population is "all of the organization's employees", and this would qualify as an attempted census survey if all employees are invited to participate in the survey.
Because all members of the population are invited to participate in the survey, rather than a randomly selected sample, there is no margin of sampling error. However, there are two other sampling-related sources of error that must be considered:
- Coverage error due to discrepancies between the sample source and the population
Using the example above: Perhaps the list of employee addresses is not completely up to date, and some new employees are missing from the sample source (under-coverage); or, perhaps some of the email addresses in the sample source are for non-employees such as contract workers (over-coverage).
- Non-response error: Ideally every member of the population will complete the survey questionnaire. However, this is unlikely to occur, resulting in the possibility of non-response error.
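Where an independent, authoritative list of the population is available, coverage error can be examined by comparing it with the sample source. The sketch below is illustrative only: the function name and the employee identifiers are hypothetical, and in practice an authoritative list may not exist, which is why coverage often has to be estimated.

```python
def coverage_report(population, sample_source):
    """Compare a sample source with the population it is meant to cover."""
    population, sample_source = set(population), set(sample_source)
    under = population - sample_source  # population members missing from the source
    over = sample_source - population   # source entries outside the population
    return {
        "under_coverage": sorted(under),
        "over_coverage": sorted(over),
        "pct_population_missing": 100 * len(under) / len(population),
    }


# Hypothetical identifiers: e5 is a new employee missing from the source,
# c1 is a contract worker who should not be in it.
employees = ["e1", "e2", "e3", "e4", "e5"]
email_list = ["e1", "e2", "e3", "e4", "c1"]
print(coverage_report(employees, email_list))
```

The percentage of the population missing from the source corresponds to the estimate of exclusion that the standards ask to be provided whenever possible.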
Because margin of sampling error does not apply to a census survey, statistical tests for differences among sub-groups that rely on estimated sampling error cannot be used.
The Panel recommends the following standards:
Standards for Attempted Census Surveys
- The list or sample source must be clearly stated, including any of its limitations/exclusions in representing the universe for the target sample and the potential for bias.
Note: Whenever possible, an estimate of the percentage of the universe that has been excluded or underrepresented should be provided.
- The number of attempted recontacts and procedure for attempted recontact should be stated.
- Do not state a margin of sampling error, as this does not apply to attempted census surveys.
Source document: "The Advisory Panel on Online Public Opinion Survey Quality - Final Report June 4, 2008"