Surveys and Polls

M.J. Peterson

            We tend to think of polls as devices for anticipating election results or finding out about public attitudes on particular questions that have become the topic of strong political controversy, and of surveys as devices for asking about a wider range of questions that may or may not include queries about individuals’ attitudes on political questions.  Though distinct in the number and focus of questions, polls and surveys use many of the same basic research methods.  Both involve contacting some people, asking them questions, and using their responses to make judgments about how everyone in the population of interest (all registered voters, all “likely voters,” all retired people, all parents with school-age children, all homeless people, all nurses, etc.) lives, experiences various events, or thinks about one or more public questions.  Pollsters and survey researchers both need to address two problems: finding the right people to ask and figuring out the exact questions to ask.

Identifying Respondents

            Polls usually focus on individuals; surveys can focus on many types of units.  A survey might treat individuals as the unit, but it could as easily use towns, schools, businesses, particular road intersections, or households as the unit.  Except when dealing with a relatively small number of units (the 50 state governments in the USA, for example), polls and surveys are based on interviewing some people – or the people in some other sort of unit – and then extrapolating from what those people (the sample) say to what the wider population believes.  Samples are usually fairly small.  While the US Census Bureau asked one of every ten households to fill out a longer set of questions about housing and other material resources available to them during the 2000 census, the typical opinion poll taken before a US presidential election is based on answers from 1500-2000 people.  You may have noticed charts reporting survey results in USA Today specifying (in very small type somewhere near the bottom of the graphic) that results reporting some trend in national public opinion are based on asking 400-500 people.

Relying on asking some units and then extrapolating to say how all units think about or would react to something requires paying careful attention to methods of selecting the respondents (the individuals or the members of units who are asked the questions) to be sure that those selected are representative of the whole population.  Most of the time, pollsters and survey researchers want to do probability sampling, a method based on drawing a random sample, one in which every unit of the population has an equal chance of being included in the sample of units to be asked.  Random samples should be less vulnerable to biased selection; they definitely have the advantage of allowing use of probability theory to calculate how likely it is that the pattern of answers in the sample will match the pattern of answers researchers would get if they could ask the whole population.  Statisticians report this calculation as the error rate, and want it to be quite small.  A 50% error rate says that the poll or survey has only 1 in 2 odds of reflecting the true opinions of the whole population.  Pollsters will admit that an election is “too close to call” if the difference in support for competing candidates among the likely voters responding to their poll is less than, equal to, or (to err on the safe side) not much more than the error rate.
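To make the idea of sampling variability concrete, here is a minimal simulation sketch (the population of 100,000 voters, the 52/48 split, and the sample size are all invented for illustration) showing why repeated random samples tend to come close to the true population value:

```python
# A minimal simulation sketch: repeated random samples from the same
# hypothetical population give sample percentages that cluster around the
# true value, which is why probability theory can be applied to them.
import random

random.seed(42)

# Hypothetical population of 100,000 voters, 52% of whom support candidate A.
population = ["A"] * 52_000 + ["B"] * 48_000

def sample_percent_for_A(sample_size):
    """Draw one random sample and return the percent supporting A."""
    sample = random.sample(population, sample_size)
    return 100 * sample.count("A") / sample_size

# Twenty polls of 1,500 respondents each; most land within a couple of
# percentage points of the true 52%.
estimates = [sample_percent_for_A(1_500) for _ in range(20)]
print(round(min(estimates), 1), round(max(estimates), 1))
```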

Securing a properly random set of respondents involves two distinct steps – developing a selection rule that will choose potential respondents at random and making sure the randomly-selected persons are part of the population of interest.  Random sample rules are easy to construct (though sometimes hard to carry out).

In a phone poll, researchers can dial every xth number in the residential section of the phonebook or use automated random dialers.

In face-to-face surveys they could approach every xth person who passes a particular point while the survey researchers are working.

When working from a list of potential respondents (that is, all units of the population of interest), a pollster or survey researcher can take the list, assign each unit on the list a number (1, 2, 3, etc), and use a random number table to pick out as many respondents as needed by matching numbers in the random number table to the numbers assigned to each individual in the list of potential respondents.

Alternatively, a researcher can take the list, divide the number of units on the list by the number of respondents desired, select an initial respondent from near the top of the list, and then repeatedly count down the list by that interval to pick each subsequent respondent.  Thus, if there were 200 units and 50 were desired for the sample, the researcher would count down 200/50 = 4 lines for each subsequent respondent (both of these list-based rules are sketched in code below).
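As a concrete illustration of the two list-based selection rules just described, here is a minimal sketch in Python (the list of 200 numbered units and the sample size of 50 are hypothetical, echoing the example above):

```python
# A minimal sketch of two list-based selection rules: simple random
# selection and "every kth unit" (systematic) selection from a numbered
# list of potential respondents.  All names and numbers are hypothetical.
import random

random.seed(7)

respondent_list = [f"Unit {i}" for i in range(1, 201)]   # 200 units, numbered 1-200
sample_size = 50

# Rule 1: assign each unit a number and pick 50 at random, the software
# equivalent of matching numbers from a random number table to the list.
random_sample = random.sample(respondent_list, sample_size)

# Rule 2: divide the list length by the desired sample size to get the
# skip interval (200 / 50 = 4), pick a random start near the top of the
# list, then take every 4th unit thereafter.
interval = len(respondent_list) // sample_size            # 4
start = random.randrange(interval)                        # random start within the first 4 lines
systematic_sample = respondent_list[start::interval]

print(len(random_sample), len(systematic_sample))         # 50 and 50
```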

Making sure that potential respondents are part of the population can be done in either of two ways.  Many populations can be identified with an existing list (voter registration lists, for example) or with a list that researchers assemble themselves (like using county records to get names and addresses of all farm owners).  Survey researchers seeking to understand reactions to the job-related stresses experienced by nurses today would want lists of persons employed as nurses from hospitals, rehab centers, nursing homes, and health clinics.  Once they have the list from each place employing nurses, they can make a consolidated list of all nurses in the area and then use the random selection techniques already described.  If researchers can’t get accurate lists, they need to include questions that will elicit answers revealing whether the respondent is in the relevant population.  Pollsters, who are always in this position because lists of registered voters are not equivalent to lists of everyone who actually votes in an upcoming election, ask the people they contact in a pre-election poll whether they are likely or unlikely to vote and include in the poll results only those who say they are likely to vote.
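A minimal sketch of these two approaches – consolidating several lists into a single sampling frame, and screening contacts when no accurate list exists – might look like this (all names and data are invented for illustration):

```python
# A minimal sketch of two ways to make sure respondents belong to the
# population of interest.  The lists and names below are invented.
import random

random.seed(3)

# Approach 1: merge lists of nurses from several employers into one
# consolidated sampling frame, dropping duplicates (a nurse who works at
# two places should appear only once), then sample at random.
hospital_nurses = ["Ng", "Ortiz", "Patel", "Smith"]
clinic_nurses = ["Patel", "Jones", "Lee"]
consolidated = sorted(set(hospital_nurses) | set(clinic_nurses))
sample = random.sample(consolidated, 3)

# Approach 2: no accurate list exists, so keep only contacts who pass a
# screening question, the way pollsters keep only self-described
# "likely voters."
contacts = [
    {"name": "Adams", "likely_voter": True},
    {"name": "Baker", "likely_voter": False},
    {"name": "Chen", "likely_voter": True},
]
screened = [c for c in contacts if c["likely_voter"]]

print(sample, [c["name"] for c in screened])
```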
 
            Survey researchers and pollsters have learned a lot about how to get a truly random sample since 1936, when the Literary Digest, a well-known weekly magazine at the time, predicted that Alf Landon, the Republican nominee, would win the presidential election (Franklin Roosevelt won in a landslide).  The magazine drew its enormous mail-ballot sample largely from telephone directories and automobile registration lists, and this got it into difficulties because in 1936 well-to-do and upper middle class households were much more likely to have a telephone or a car than working-class and poor households.  The Digest ended up oversampling the higher income groups, who – as all students of voting behavior already knew – are more likely to vote Republican.  (Looking from the other end of the income spectrum, it simultaneously undersampled the lower income groups.)  Because the distribution among income groups of people in its sample did not closely match the distribution of income among all voters in the USA, the Digest did not have a representative sample even though it had selected names from those lists more or less at random.

            Sometimes researchers face situations in which they know that units differing on some characteristic have different responses to the same question.  Most experts in business activity know that large and small firms react very differently to government regulations that involve a lot of reporting; large firms can absorb the burden more easily than small companies because they have more people available to deal with paperwork while others make the product, sell the goods, or provide the service.  Comparative politics experts agree that urban and rural individuals have different attitudes on many questions.  Gender, race, and income are known to strongly affect political attitudes on some issues.  When survey researchers believe that the units they want to investigate will have different views based on some characteristic, they will make sure that their sample closely reflects how the population divides on that characteristic by using a proportionate stratified sample.  In this procedure, the researcher divides the population into two or more groupings (strata) on the characteristic, and then draws random samples from the units in each grouping.  Thus, if they were investigating attitudes on war, on which individuals’ attitudes are known to differ significantly by gender, researchers would use separate lists of males and females to identify their respondents.  In a proportionate stratified sample, the size of the sample drawn from each stratum is adjusted so that the proportion of respondents in each grouping matches the proportion of the grouping in the population.  For a study of attitudes using genders as the strata, a proportionate stratified sample would include an equal number of males and females anywhere the gender ratio is close to 50-50.  In an area where 40% of the population is male, 40% of the respondents would be drawn from the list of males, and 60% from the list of females.

When a subgroup’s share of the population of units is very low, researchers may use a disproportionate stratified sample to make sure that they have enough respondents from the relatively small group to get an accurate assessment of the range of views within it.  A researcher surveying with genders as strata might use a sample of 300 males and 300 females; a researcher stratifying a population by ethnicity in which 48% were one ethnicity, 48% a second ethnicity, and the remaining 4% a third, might use a sample with 300 from each group.  This does not distort results if the goal is to compare attitudes within each group to attitudes within each other group.  It only becomes a problem if someone forgets how the samples were drawn and assumes that the set of responses was collected from a random sample and can be used to draw conclusions about the attitudes of the population as a whole rather than about the separate groupings.
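Here is a minimal sketch of a proportionate stratified sample drawn in software (the strata lists and the 40/60 gender split are hypothetical, matching the example above):

```python
# A minimal sketch of a proportionate stratified sample: separate lists
# (strata) are sampled so that each stratum's share of the sample matches
# its share of the population.  The lists below are hypothetical.
import random

random.seed(11)

# Strata lists for an area where 40% of the population is male and 60% female.
males = [f"M{i}" for i in range(400)]
females = [f"F{i}" for i in range(600)]
total_sample = 100

population_size = len(males) + len(females)
n_males = round(total_sample * len(males) / population_size)    # 40
n_females = total_sample - n_males                              # 60

stratified_sample = random.sample(males, n_males) + random.sample(females, n_females)
print(len(stratified_sample))   # 100 respondents, split 40 / 60 like the population

# A disproportionate stratified sample would instead fix the per-stratum
# counts (say, 300 and 300) to compare the groups with each other, at the
# cost of no longer representing the population as a whole.
```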

Formulating the Questions

           Having a good random or proportionate stratified sample of respondents is important to a good research project.  However, the data are generated by asking the respondents one or more questions, and the data cannot be any better than the questions.

           Pollsters seeking to predict election results have a very standard set of questions.  Early in the nomination process (during primaries, for example) they often try to assess the relative popularity of various candidates by asking how a person would vote if different pairs of candidates were running against each other.  You will see this in early 2008 as people start thinking about which Republican hopeful can or cannot defeat which Democratic hopeful.  At this point in the campaign season they can also ask likely voters in each party whom they plan to support in the primary.  You saw lots of polls of this type in late August and early September as Reilly, Gabrieli, and Patrick were all contending for the Democratic nomination.  As the general election comes closer, the questions shift to “which candidate will you vote for?” or, for the ballot questions, “will you vote yes or no on Question __?”

            Survey questionnaires are less standardized because the questions they are designed to answer vary far more.  Developing a good survey questionnaire requires careful work on six distinct steps:  a) defining the purpose of the survey and what will be included, b) selecting an appropriate form of questionnaire, c) deciding whether to use open-ended questions, closed-ended questions, or some mix of both, d) setting the order of questions, e) setting the wording of each question, and f) doing a “pretest” and “pilot” of the questionnaire to be sure that actual respondents understand and react to it as the researchers assume they will.

            Defining the purpose starts from the realization that surveys can be used to attain a number of research goals.  Some surveys are used to develop descriptive information about a population of individuals or some other unit.  Questions must be focused on getting the information desired.  Other surveys are used to test one or more causal hypotheses about why people hold particular views or act in some way.  The hypothesis will determine the question content since the researcher will want data confirming or challenging the hypothesis.  If a student of voting behavior claims that veterans in a country vote for conservative parties and there are no lists of veterans available, then the questionnaire must include questions about whether the respondent is a veteran as well as about how the respondent voted in the last election.  In both types of project, the questions can range from the very factual (“when were you born?” “have you been to the UK in the past five years?” “when do you get up on a workday?”) to the very judgmental (“how would you rate GW Bush’s performance as president?” “does the Israel lobby have too much influence over US policy in the Middle East?”).

            There are two main types of questionnaires: self-completion questionnaires – handout, mail, or e-mail surveys that respondents fill out and return to the researcher – and interview questionnaires that an interviewer follows in direct phone or face-to-face contact with respondents (I suppose instant messaging could also be used, but am not aware of anyone using it yet).  Self-completion questionnaires can be used for a short set of easily-understood questions; structured interviews, in which all of the questions are set in advance and the interviewer may not deviate from the list (the protocol), allow a trained researcher to ask more complex questions.  Sometimes researchers use unstructured interviews that allow respondents to talk about things in whatever order they come to mind; sometimes they use a combination of structured and unstructured parts on a single questionnaire.

            The particular questions used on a structured or an unstructured questionnaire can be open-ended or closed-ended.  Open-ended questions allow respondents to reply in their own words.  Closed-ended (sometimes called close-ended) questions ask the respondent to choose among a preset array of responses.  A question asking you whether you strongly agree, agree, disagree, strongly disagree, or have no opinion is a closed-ended question.  A question asking you to describe President Bush's job performance with the first word or phrase that comes to your mind is open-ended.  Open-ended responses are harder to code because they are not returned to the researcher in a standard format, but they can pick up additional information.  Some questionnaires use both types of question.  Company “exit interviews” with employees leaving for another job often include questions like “if there was one thing you could change in the job, what would it be?” as well as questions asking the employee to rate the safety of the company’s various workspaces on a one (best) to five (worst) scale.
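A small sketch shows why closed-ended answers are so much easier to code than open-ended ones (the numeric codes and the responses below are invented for illustration):

```python
# A minimal sketch of coding closed-ended responses: every answer maps onto
# a preset numeric scale, while open-ended answers arrive as free text that
# must be read and categorized by hand after the fact.
LIKERT_CODES = {
    "strongly agree": 1,
    "agree": 2,
    "disagree": 3,
    "strongly disagree": 4,
    "no opinion": 9,
}

closed_responses = ["agree", "strongly disagree", "no opinion"]
coded = [LIKERT_CODES[answer] for answer in closed_responses]
print(coded)   # [2, 4, 9] -- ready for analysis as soon as they are collected

# Open-ended responses, by contrast, come back looking like this and need a
# coding scheme built by the researcher before they can be tabulated:
open_responses = ["decisive", "in over his head", "doing his best"]
```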

            The order of questions on a self-completed form or in an interview can make a difference because earlier questions “prime” a respondent’s mind by suggesting a particular context or putting the respondent in a particular state of mind.  This priming can have any of several effects.  One of the better known is the consistency effect, which stems from the fact that most people want their answers to be consistent with each other.  This effect operates most strongly when questions on a similar topic appear close together.  In 1944, while the USA was involved in World War II, D. Rugg and H. Cantril explored the consistency effect by taking two surveys of different sets of randomly-selected respondents.  One set of respondents was asked:

            Should the United States allow its citizens to join the British and French armies?
            Should the United States allow its citizens to join the German army?

The other was asked:

            Should the United States allow its citizens to join the German army?
            Should the United States allow its citizens to join the British and French armies?

In the first version, 45% said yes to the first question and 31% said yes to the second question.  When the questions were reversed, only 22% agreed with letting US citizens join the German army (“The Wording of Questions,” in H. Cantril, editor, Gauging Public Opinion, Princeton University Press, 1944).  Even 22% in the midst of World War II is pretty amazing, but the point of the research was that putting the Allied armies first set up a pressure to be consistent and agree with allowing citizens to join an enemy army as well.  Another sort of influence, the part-whole effect, occurs when one question refers to a broad topic and another to some more specific matter that can be seen as falling within the broad topic.  H. Schuman and S. Presser tried reversing the order of two questions with two different sets of respondents.  First they asked:

Do you support having laws that permit abortion?
Do you support allowing abortion when there is a strong likelihood that the baby has a serious birth defect?

When they reversed the order, fewer respondents said there should be laws permitting abortion.  They argued that when the general question came first, respondents treated the more specific question as asking about a special case of the broader one, but when the specific question came first, they treated it as a separate topic.  Thus the respondents who were asked first about the strong likelihood of a severe birth defect redefined “abortion” in the general question as not covering that situation (Questions and Answers in Attitude Surveys, Sage Publications, 1996).

            Large surveys often use questions that have been used successfully in earlier surveys for either or both of two reasons.  First, they are proven to work, sparing the researcher from having to do pretesting (see below).  Second, asking the same question allows tracking attitudes or information over time.

  Writing new questions requires considerable thought and care.  Researchers need to make each question as clear as possible because ambiguous wording can lead some respondents to understand the question one way and others to understand it a different way.  If they do, their answers will not be comparable because they “agree” or “disagree” with different statements.  Using an interviewer rather than a self-completion questionnaire may not prevent this because many people are reluctant to admit that they do not understand a question.  It is also important to avoid double-barrelled questions, questions that ask about two possibilities simultaneously.  Any question with the word “or” – as in “Should the government increase the gas tax or leave it at the present level?” – is double-barrelled because a data coder cannot tell whether “yes” or “no” refers to raising the tax or leaving it where it is.  Even a question that merely covers two options – for instance, “would you use the library more if it were open later on weekdays or on the weekend?” – causes the same problem because the coder can’t tell whether a “yes” answer means the respondent would use the library more if it were open later on weekdays, if it were open on weekends, or both.  The library budget committee would certainly want to know, so it could attract as many users as possible for the additional money spent on staff pay if the library is to be open more hours.  Loaded questions, questions in which the first words use language that will encourage respondents to view an issue, problem, group, or individual in a positive or negative light, should also be avoided because they can affect answers.  “Should deadbeat dads be put in jail?” has a very different effect on respondents than “Should parents who fail to pay child support be punished?”

            Unless they are only using previously-asked questions, pollsters and survey researchers need to try out their questionnaires before using them on a full sample.  During pretesting they try out different question wordings and different question orders with a few individuals to see whether respondents have trouble understanding the questions and whether the wording or order biases responses or causes any other problems.  If they have a particularly long or complex set of questions, they will pretest parts of the questionnaire and then conduct a pilot – asking a few individuals to answer all of the questions.  A good pilot checks out all the phases of survey research – drawing a small sample, administering the questionnaire, and coding the responses for analysis.  Doing so allows the lead researchers designing the survey project to make sure that any assistants who will be helping administer the questionnaire or code the data complete those steps as the lead researchers expect.

Reporting the Results

            Polls are more likely to be reported in the media than surveys, though sometimes particular questions used in a survey are so interesting that journalists write about them and use the survey data in their stories.

            Both polls and surveys ask a sample of people questions in order to estimate the attitudes of a larger population.  Thus news stories seldom say that some percentage of those polled or surveyed gave some particular answer; they report that some percent of the relevant population (Americans, football fans, senior citizens, whatever) shares that opinion.  Estimates of public opinion can be reported as point estimates – single numbers saying what percent of the population would answer the question a particular way (as in “39% of the voters in Ohio will vote yes on Question 1”) – or as an interval – a range of numbers giving the high and low estimate (as in “between 37 and 41% of the voters will vote yes on Question 1”).  The point estimate treats the sample as if it exactly matches the population; the interval estimate acknowledges that even with a carefully-selected random sample that is not biased in any way, it is still possible for the sample’s responses to differ from those of the whole population.  Responsible pollsters and survey researchers never report a point estimate alone.  Rather, they report four things: the point estimate, a confidence interval around the point estimate that gives the highest and lowest value the actual population response could have, a confidence level stating how certain they are that the real population response is within the range of percentages covered by the confidence interval, and the size of (number of respondents in) the sample.  95% is the lowest confidence level commonly used; some researchers are more demanding and want to be 99% sure, but this means having to widen the confidence interval.  Sample size needs to be more than 100 for the confidence interval calculations to be correct; most pollsters prefer larger samples than that, particularly on national questions.

            Calculating the confidence interval at the 95% confidence level uses the formula that applies to proportions, since poll results are expressed as the percent of respondents answering a particular way.  This formula is:

c.i. = sample proportion ± 1.96 × √[(.5 × .5) / sample size]

Here 1.96 is a constant that generates the interval matching a 95% confidence level, and .5 is a constant used to make sure the interval is set at its maximum extent.  (Those who want the explanation of why this works can consult Joseph F. Healey, Statistics for Social Research, 6th edition (2002), pp. 173-174 and background on pp. 165-168.)
 
           Thus, professional pollsters who had surveyed 600 adults for their attitudes on Question 1 would give not only the point estimate that 39% will vote yes but also the confidence interval.  In this example it would be .39 ± 1.96 × √[(.5 × .5)/600] = .39 ± 1.96 × √[.25/600] = .39 ± 1.96 × √0.000417 ≈ .39 ± 1.96 × 0.0204 ≈ .39 ± .04, or (rounding to whole numbers) 35 to 43%.  They would report “this estimate is subject to sampling error of plus or minus 4% at a 95% confidence level and is based on a survey of 600 adults.”  Journalists should pass this information along to their readers.  When they do, it is usually in some corner of the box reporting the survey results in small type, so you have to look carefully for it.  If they don't, you can now calculate it yourself.
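For readers who want to check such figures themselves, here is a minimal sketch of the confidence-interval formula from the previous paragraphs applied to the 600-respondent example (the function name is mine, not a standard library routine):

```python
# A minimal sketch of the 95% confidence interval formula in the text:
# sample proportion ± 1.96 × sqrt((.5 × .5) / sample size).
from math import sqrt

def confidence_interval_95(proportion, sample_size):
    """Return the (low, high) 95% confidence interval for a reported proportion."""
    margin = 1.96 * sqrt((0.5 * 0.5) / sample_size)
    return proportion - margin, proportion + margin

# The worked example above: 39% of 600 adults say they will vote yes.
low, high = confidence_interval_95(0.39, 600)
print(f"{low:.2f} to {high:.2f}")   # roughly 0.35 to 0.43, i.e. plus or minus 4 points
```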

          Now you can see how an election can be “too close to call.”  If pollsters survey a random sample of likely voters, find that 48% say they will vote for Arnold and 52% say they will vote for Condi, and calculate that their sampling error is 3%, they have to acknowledge that Arnold could get anywhere from 45 to 51% of the vote and Condi anywhere from 49 to 55%.  Condi looks like she has a lead, but with an overlap in the confidence intervals, pollsters will report results but not predict a winner.  They will leave picking winners to the bettors in Las Vegas and elsewhere.