Oregon Vital Statistics Annual Report 1995, Volume 1
Technical Notes — Step-by-Step Instructions
Data users are diverse, including public health officials evaluating a program by using death data, demographers projecting school enrollments with birth data, and business people deciding to open a formal-wear shop based on marriage data. Many of these users have a thorough knowledge of statistics. But others find the entire subject matter confusing and intimidating. For either group, a misunderstanding of what vital statistics mean can lead to wrong conclusions. Therefore, this section is included to provide an overview of how to use vital statistics. It is addressed to the person looking at vital events for the first time, but the experienced user may also find a review helpful.
STEP 1: FINDING THE CORRECT NUMBER
The first step is to determine how many of a particular vital event took place during the year. This involves asking two questions:
Which event or events are appropriate?
This may not be as simple as it sounds. For one thing, examining more than one type of event may be required. For example, someone concerned with teenage pregnancies will have to consider the number of induced abortions as well as the number of births which occur among teens. Taken together, they provide a useful measure of the number of pregnancies.
Deciding which events to use is important, since the choice of one event over another can sometimes lead to very different conclusions. To determine which events are appropriate, read the "Technical Notes: Definitions" section. The narratives also contain useful examples.
Who should be counted?
If you are a hospital planner who is deciding to expand or contract delivery services, you want to count the number of births which occurred in your area, regardless of where the parents live. If you are projecting school enrollment, you want to count only how many children will potentially be residing in your area. Fortunately, vital events are usually reported so that both of these data needs can be met.
- INFANT DEATHS
- NEONATAL DEATHS
- POSTNEONATAL DEATHS
- FETAL DEATHS
- LOW BIRTH WEIGHT INFANTS
- INDUCED ABORTIONS
Occurrence data: The event (the death, birth, marriage, etc.) actually took place in the geographic region indicated (either Oregon or a particular county). The person participating in the event may have lived in Podunk, New York.
Residence data: The person involved in the event lived in the geographic region mentioned, but the event itself may have taken place anywhere in the United States or Canada. In other words, a resident of Marion County who died in an accident while on vacation in Michigan has been added to the Marion County resident death figure.
When in doubt about which type of data to use, resident figures are usually the best choice. Most birth and death data are published by residence, which means that comparisons with other states or the United States as a whole will be easier. Exceptions to this rule are listed in the individual sections.
Once the right event has been determined, and the choice between occurrence and residence data has been made, the statistician can find the correct figures in the table(s) in this book. If the needed table is not listed, contact the Center for Health Statistics for more information.
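The difference between occurrence and residence counts can be sketched in a few lines of Python. The records below are hypothetical and serve only to illustrate the two ways of counting; the function names are assumptions made for this example and do not come from the report.

```python
from collections import Counter

# Hypothetical birth records, for illustration only:
# (county where the birth took place, county where the mother lives)
births = [
    ("Multnomah", "Multnomah"),
    ("Multnomah", "Clackamas"),
    ("Marion", "Marion"),
    ("Multnomah", "Marion"),
]


def count_by_occurrence(records):
    """Count events by the county in which they took place."""
    return Counter(occurred_in for occurred_in, _ in records)


def count_by_residence(records):
    """Count events by the county in which the person lives."""
    return Counter(lives_in for _, lives_in in records)


occurrence = count_by_occurrence(births)
residence = count_by_residence(births)

print(occurrence["Multnomah"])  # 3 births took place in Multnomah County
print(residence["Multnomah"])   # 1 birth was to a Multnomah County resident
```

In this sketch a hospital planner would read the occurrence count, while someone projecting school enrollment would read the residence count.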
STEP 2: MAKING THE NUMBER MEANINGFUL WITH RATES AND RATIOS
In many instances simply knowing the number of events is not sufficient. For example, we know more people died in Multnomah County than in Wheeler County because Multnomah County has a much larger population. But what is the risk of dying in each county?
In order to answer this question, statisticians calculate rates. This means that the number of events which occurred is compared to the population for which that event could have occurred, and the figure is then standardized to some number (such as 1,000 or 100,000) for convenience.
Here is an example:
CRUDE DEATH RATE = (DEATHS / POPULATION) X 1,000
where:
POPULATION = the number of people who could have died
1,000 = a number chosen by vital statisticians to improve the ease of comparisons
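As a minimal sketch, the crude death rate formula can be written as a small Python function. The function name and the counts used here are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def crude_death_rate(deaths, population, per=1_000):
    """Deaths per `per` people in the population that could have died."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * per / population


# Hypothetical county: 450 deaths in a population of 50,000.
rate = crude_death_rate(deaths=450, population=50_000)
print(round(rate, 1))  # 9.0 deaths per 1,000 population
```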
The more specifically a statistician can define the "population at risk" (the denominator or bottom part of the formula), the more meaningful the rate is. For example, the crude birth rate, which compares the number of births to the population, is not nearly as informative as the fertility rate, which uses only the number of women of childbearing age (15-44) for comparative purposes. The fertility rate is not distorted by changes in the number of men or pre-pubescent or post-menopausal women in the population. (The turn-of-the-century notion that only married women between the ages of 15 and 44 would be considered at risk of pregnancy has been abandoned for obvious reasons.)
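To see why the choice of denominator matters, consider the following sketch. The two areas and all of their figures are hypothetical; they have the same number of births and the same total population, but different numbers of women aged 15-44.

```python
def rate(events, population_at_risk, per=1_000):
    """Events per `per` members of the chosen population at risk."""
    return events * per / population_at_risk


# Hypothetical areas, for illustration only.
area_a = {"births": 150, "population": 10_000, "women_15_44": 2_500}
area_b = {"births": 150, "population": 10_000, "women_15_44": 1_500}

for name, area in (("A", area_a), ("B", area_b)):
    crude_birth_rate = rate(area["births"], area["population"])
    fertility_rate = rate(area["births"], area["women_15_44"])
    print(name, round(crude_birth_rate, 1), round(fertility_rate, 1))
```

The crude birth rates are identical (15 per 1,000 population), yet the fertility rates differ (60 versus 100 per 1,000 women aged 15-44), because the second area has fewer women of childbearing age.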
Unfortunately we do not always have the correct denominator for the equation. In these situations a substitute is used. For example, how many people are at risk of getting divorced? The number of married people is only available for census years. As a substitute, the crude divorce rate is calculated using the total population regardless of marital status. In other situations, the event is simply compared to another related number. For instance, the abortion ratio compares the number of abortions to the number of births. This is easier and more accurate than trying to determine the true denominator, which is the total number of pregnant women.
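The two substitutes described above can be sketched as follows. The counts are hypothetical, and the scaling constant of 1,000 is an assumption made for this example; the text here does not state which constant is used for either measure.

```python
def abortion_ratio(abortions, births, per=1_000):
    """Abortions per `per` births: a ratio, since births are not the population at risk."""
    return abortions * per / births


def crude_divorce_rate(divorces, total_population, per=1_000):
    """Divorces per `per` people, using total population in place of married people."""
    return divorces * per / total_population


# Hypothetical counts, for illustration only.
print(abortion_ratio(abortions=300, births=1_200))               # 250.0
print(crude_divorce_rate(divorces=50, total_population=10_000))  # 5.0
```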
STEP 3: COMPARING TWO OR MORE NUMBERS
Numbers are more meaningful when they are converted into rates and ratios. But problems can arise when rates or ratios are compared for different geographical areas, different time periods, or different categories such as men versus women.
Statisticians expect a certain amount of chance variation and have methods to take this into account. The confidence interval uses the number of cases and their distributions to determine what the rate "really is." For example, a statistician will say, "We are 95% sure that the true infant death rate for Oregon in 1986 was 9.47 ± 0.97; that is, it lies somewhere between 8.50 and 10.44." If two rates have overlapping confidence intervals, then the difference between them may be due to this chance variation. In other words, the difference is not statistically significant.
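The overlap check can be sketched in Python. This section does not give the formula behind its confidence intervals, so the sketch assumes a common normal approximation for rates based on counts (rate ± 1.96 × rate / √events). The function names and all counts are hypothetical.

```python
import math


def rate_confidence_interval(events, population, per=1_000, z=1.96):
    """Approximate 95% interval for a rate, assuming the event count behaves
    like a Poisson variable (an assumption, not a formula from this report)."""
    if events <= 0 or population <= 0:
        raise ValueError("events and population must be positive")
    rate = events * per / population
    half_width = z * rate / math.sqrt(events)
    return rate - half_width, rate + half_width


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical areas: infant deaths and births, for illustration only.
area_1 = rate_confidence_interval(events=40, population=4_000)  # rate 10.0
area_2 = rate_confidence_interval(events=25, population=2_000)  # rate 12.5

print([round(x, 1) for x in area_1])
print([round(x, 1) for x in area_2])
print(intervals_overlap(area_1, area_2))
```

Here the two intervals overlap, so the difference between 10.0 and 12.5 may be due to chance variation alone.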
Chance variation is a common problem when the numbers being used to calculate rates are extremely small. Large swings that do not reflect real changes often occur in the rates. Consider Tillamook County's infant mortality rates for a five-year period.
The overall rate of 10.1 is quite close to the state rate for the same time period (10.2). Yet for some years the rate is four times as high as the rate of other years simply because four additional infants died. Public health officials would waste a good deal of energy reacting to these annual rates.
Many rates based on small numbers are published in this book because readers demand them. But anyone preparing to make important decisions based on these rates should be wary. Consider this rule of thumb: a rate based on 20 cases has a 95% confidence interval about as wide as the rate itself (for example, the interval for a rate of 50 is between 25 and 75). Even large differences between two rates based on 20 or fewer cases are probably not statistically significant.
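The rule of thumb can be checked with a short sketch. It assumes the same normal approximation for count-based rates mentioned in many statistics texts, under which the full width of a 95% interval, expressed as a fraction of the rate, depends only on the number of cases. The function name is hypothetical.

```python
import math


def relative_interval_width(cases, z=1.96):
    """Full width of an approximate 95% interval, as a fraction of the rate
    (assumes a Poisson-style normal approximation)."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 2 * z / math.sqrt(cases)


for cases in (20, 100, 400):
    print(cases, round(relative_interval_width(cases), 2))
```

Under this assumption, 20 cases give a width of roughly 0.88 times the rate, which is in line with "about as wide as the rate itself," and the width shrinks only with the square root of the number of cases.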
If 20 is too few, how many cases are sufficient to say that a true difference exists? Unfortunately we have no easy rules for this. To be safe, the vital statistician should always try to combine several years of data or consolidate geographical areas. Confidence intervals should be calculated, and differences should be tested for statistical significance.
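Combining several years of data can be sketched as below. The annual counts are hypothetical and are not the Tillamook County figures; they only show how unstable single-year rates settle into one steadier pooled rate.

```python
def pooled_rate(events_by_year, population_by_year, per=1_000):
    """Rate for several years combined: total events over total population."""
    return sum(events_by_year) * per / sum(population_by_year)


# Hypothetical small county: infant deaths and births in five years.
infant_deaths = [1, 4, 2, 3, 1]
births = [250, 250, 250, 250, 250]

annual_rates = [d * 1_000 / b for d, b in zip(infant_deaths, births)]
print(annual_rates)                        # [4.0, 16.0, 8.0, 12.0, 4.0]
print(pooled_rate(infant_deaths, births))  # 8.8
```

The annual rates swing between 4 and 16 per 1,000 births because of a handful of deaths, while the five-year rate rests on 11 deaths rather than 1 to 4.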
Changes in measurement
Another problem is that the numbers being compared have not always been based on the same type of measurement. Definitions, population estimates, certificates, and coding procedures change from time to time as the need arises. This can create "artificial" differences and can disguise "real" differences. The cause-of-death item provides an excellent example of a comparability problem:
It appears that the incidence of hypertensive disease increased. But actually, a new coding scheme resulted in more deaths being coded as due to hypertensive disease.
Taking age, sex, and race into account
Mr. G.C. Whipple noted in 1923 that, "We might find that the death rate of bank presidents was higher than that of newsboys; but this would not be because of different occupations, but because of different ages." We expect older people to die at a higher rate than younger people. We also expect people in their twenties to have more babies than the very young or the very old. Sex and race, as well as age, can affect rates drastically.
When comparing two places or two points in time, it is necessary to take these influencing characteristics into account. Here is an example:
The crude death rate increased between 1950 and 1960 from 9.1 to 9.5 deaths per 1,000 population. But an examination of the death rates for each age group indicates that all these rates decreased. This apparent contradiction is explained by the fact that in 1960 a larger proportion of the population was older. Because the risk of death is higher in older persons, the crude death rate increased.
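The apparent contradiction can be reproduced with a small sketch. All figures are hypothetical and use only two age groups. The adjustment shown is direct standardization, one common way of taking age into account; this section does not say which method the report itself uses.

```python
def crude_rate(groups, per=1_000):
    """Total deaths over total population, ignoring age."""
    deaths = sum(g["deaths"] for g in groups.values())
    population = sum(g["population"] for g in groups.values())
    return deaths * per / population


def age_specific_rates(groups, per=1_000):
    """Death rate within each age group."""
    return {age: g["deaths"] * per / g["population"] for age, g in groups.items()}


def directly_adjusted_rate(groups, standard_population, per=1_000):
    """Apply each group's own rate to a fixed standard population."""
    expected_deaths = sum(
        g["deaths"] / g["population"] * standard_population[age]
        for age, g in groups.items()
    )
    return expected_deaths * per / sum(standard_population.values())


# Hypothetical populations: the later year has more older people.
earlier = {
    "under 65": {"deaths": 450, "population": 90_000},
    "65 and over": {"deaths": 500, "population": 10_000},
}
later = {
    "under 65": {"deaths": 320, "population": 80_000},
    "65 and over": {"deaths": 900, "population": 20_000},
}
standard = {age: g["population"] for age, g in earlier.items()}

print(round(crude_rate(earlier), 1), round(crude_rate(later), 1))
print(age_specific_rates(earlier), age_specific_rates(later))
print(round(directly_adjusted_rate(later, standard), 1))
```

In this sketch the crude rate rises from 9.5 to 12.2 even though the rate in each age group falls. Holding the age structure fixed at the earlier year's population gives an adjusted rate of 8.1 for the later year, which reflects the decline.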
STEP 4: ANALYZING THE DATA
The first three steps have been fairly mechanical:
(1) Choose the correct events and the correct group to determine the number of events which took place for the geographical areas and time periods.
(2) Calculate the rates.
(3) Compare these rates to determine if the differences are statistically significant.
Now the vital statistician must begin to ask the difficult questions. If we find that two rates are statistically significantly different, how can we find out why they are different? If the differences which we expected did not prove to be significant, is there another item which perhaps is masking an actual difference? Frequently the statistician has to refine the research question and begin all over again.
Consider the researcher who asks, "Since 1985, has chronic obstructive pulmonary disease posed a greater risk to Oregonians?" If the researcher looked at the overall rate, the answer would be "yes," but closer examination reveals that the death rate for males has declined. It is among women that the rate has moved sharply upward, reflecting their increased smoking prevalence during recent decades. This gender dichotomy would need to be addressed in a study of COPD fatalities.
Several sources of help are available. Many of the widely used rates and ratios are presented in the Quick Reference section, and narratives and figures are included throughout the book to illustrate changes. And finally, the staff of the Center for Health Statistics are available for data users who need assistance.
A more complete and accurate estimate of pregnancies based on outcomes would include: (1) births; (2) fetal deaths (stillbirths); (3) induced abortions; and (4) spontaneous abortions (miscarriages). However, fetal deaths occur in less than one percent of all pregnancies and are relatively constant in relation to births (see the Fetal and Infant Mortality chapter in Volume 2), and the number of miscarriages (perhaps 10 percent of all pregnancies) is not available in vital records. Thus, a measure which excludes these outcomes provides an adequate indicator of the number of pregnancies.