A brief explanation of survey weighting
by William Schatten
There has been much discussion of the Forum Poll’s recent Toronto mayoral poll, which showed Doug Ford closing the gap on John Tory. The discussion has focused on one subtable in the data release, which shows Doug Ford leading only among voters aged 18-34, while John Tory leads in every other age category.
To respond to this discussion, a brief look at the statistical method of survey weighting is required.
What is a weight?
A weight is a value assigned to each respondent in a survey sample. This value represents how much a respondent's answers count in the final survey results. Researchers assign these values based on how the sample matches up with the actual population. For actual population numbers, Forum uses Statistics Canada Census data.
Why do you use weights?
Weights are used in conjunction with random dialing to make a survey sample as representative of the target population as possible. In this case, the target population is Toronto.
Example: Weighting by Gender
For example, let’s assume the city of Toronto has approximately a 50-50 split of males and females. (The actual figures for voters in the Toronto Census Metropolitan Area are approximately 47% male and 53% female.)
Let's also assume that a survey was completed and it has a randomly selected sample of 600 males and 400 females.
The sample is therefore 60% male and 40% female, so there is a discrepancy between the sample's distribution and the population's assumed 50-50 split.
To solve this issue, researchers would assign a value to all male respondents that decreases the weight of their responses, and a value to all female respondents that increases the weight of theirs. These weights are used to bring the sample's distribution in line with the population's.
Example Calculation
The weighting formula is:
Population Distribution / Sample Distribution = Weight
In this case the equation is:
Male: 50% / 60% = 0.833
Female: 50% / 40% = 1.25
Therefore the assigned weight for
all male respondents is 0.833 and for all females it is 1.25.
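The same arithmetic can be sketched in a few lines of Python. This is only an illustration of the formula above, using the 50-50 population split and the 600/400 sample from the example; the function and variable names are hypothetical and are not taken from any Forum tool.

```python
def cell_weights(population_share, sample_counts):
    """Weight per group = population share / sample share."""
    total = sum(sample_counts.values())
    return {
        group: population_share[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Figures from the example: assumed 50-50 population, 600 males and 400 females sampled.
population_share = {"male": 0.50, "female": 0.50}
sample_counts = {"male": 600, "female": 400}

weights = cell_weights(population_share, sample_counts)

# Applying each weight to its group's head count shows the effect of weighting:
# the weighted sample is split evenly, like the assumed population.
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}

for group in sample_counts:
    print(group, round(weights[group], 3), round(weighted_counts[group]))
```

After weighting, the 600 males count as roughly 500 respondents and the 400 females count as roughly 500, which matches the assumed 50-50 population.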
Why is Doug ahead if only 18-34 year old voters favour him?
A bivariate subtable, such as the age table in question, only shows the vote broken down by one variable. The final popular vote numbers Forum publishes take into account multiple variables, including but not limited to: Age, Gender, Region, and vote likelihood (when applicable).
Therefore the final calculation
not only considers age but also a series of other factors.
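To see why a weighted total can differ from a raw count, here is a minimal Python sketch. Everything in it is hypothetical: the five respondents, their weights, and the candidate labels "A" and "B" are invented purely for illustration and are not Forum data. It assumes a weighted vote share is computed as the weight held by a candidate's supporters divided by the total weight.

```python
def weighted_share(respondents, candidate):
    """Share of the total weight held by respondents who chose `candidate`."""
    total = sum(r["weight"] for r in respondents)
    chosen = sum(r["weight"] for r in respondents if r["vote"] == candidate)
    return chosen / total

# Hypothetical respondents: supporters of "A" come from an over-sampled group
# (weight below 1), supporters of "B" from an under-sampled group (weight above 1).
respondents = [
    {"vote": "A", "weight": 0.8},
    {"vote": "A", "weight": 0.8},
    {"vote": "A", "weight": 0.8},
    {"vote": "B", "weight": 1.3},
    {"vote": "B", "weight": 1.3},
]

# Raw head count: 3 of 5 respondents chose "A".
unweighted = sum(r["vote"] == "A" for r in respondents) / len(respondents)

# Weighted share: each respondent counts in proportion to their weight.
weighted = weighted_share(respondents, "A")

print(round(unweighted, 2), round(weighted, 2))
```

In this invented sample, candidate "A" leads the raw count with 60% but holds only about 48% of the weighted total, because "A" supporters come from a group that was over-represented in the sample.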
Building on our male and female
example above, let’s see what happens when we add another factor into the
mix.
Example: Adding another variable - Region
We are now going to add region to our weighting. Let's assume Toronto is made up of approximately 750,000 people living in Old Toronto and 1,250,000 living in the suburbs to the west, north, and east. This population distribution equates to approximately 38% of eligible voters living in Old Toronto and 62% in the suburbs.
Remember that males have a lower
weight than females already based on the gender data. Now let’s assume that of
the 600 males surveyed in our example above, 45% were in Old Toronto and 55%
were in the suburbs.
The regional weight calculation therefore is:
Males in Old Toronto: 38% / 45% = 0.844
Males in the Suburbs: 62% / 55% = 1.13
The males who are in Old Toronto will again receive a slight reduction to their overall weight, while those who live in the suburbs will receive a slight increase. Again, the goal of the increase or decrease is to make the sample proportionate to actual Census data, so that the sample is more reliable and representative of the public as a whole.
The process continues on in this
manner until all applicable variables have been considered.
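As a rough Python sketch of how the gender and region adjustments might stack for the male respondents in the example: the article does not spell out how the individual weights are combined, so multiplying them together is an assumption made here for illustration, and the names below are hypothetical.

```python
def weight(population_share, sample_share):
    """Weight = population distribution / sample distribution."""
    return population_share / sample_share

# Gender step from the earlier example: 50% of the population, 60% of the sample.
gender_weight_male = weight(0.50, 0.60)

# Region step for the 600 males: 45% sampled in Old Toronto, 55% in the suburbs,
# against an assumed population split of 38% and 62%.
region_weight = {
    "old_toronto": weight(0.38, 0.45),
    "suburbs": weight(0.62, 0.55),
}

# ASSUMPTION: the overall weight is the product of the per-variable weights.
combined = {
    region: gender_weight_male * w for region, w in region_weight.items()
}

for region, w in combined.items():
    print(region, round(w, 3))
```

Under that assumption, a male respondent in Old Toronto ends up with an overall weight of about 0.70 and a male respondent in the suburbs with about 0.94. Each additional variable would adjust these values further in the same way.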
Why bother showing demographic subtables if they can't be used for vote prediction?
The subtables that we show in our data releases have value, but that value does not lie in calculating the final vote from the information in one table alone.
What the age table reveals to us is that overall, without considering any other factors, Doug Ford is the most popular candidate amongst young voters aged 18-34. Tory is the most popular in every other age category. This is the only conclusion we can draw from this table. We cannot project a final vote from this data, and if we did, the projection would not be representative of Toronto.
William Schatten is a Research
Director at Forum Research. He can be reached at WSchatten@ForumResearch.com.