
Understanding Tests and Measurements for the Parent and Advocate

Part 2

Peter Wright and Pam Darr Wright


Understanding the Bell Curve

On all bell curves, the bottom or horizontal line is called the X axis. In our sample of fifth graders, the X axis represents “number of push-ups.” And, on all bell curves, the up-and-down vertical line is called the Y axis. In our sample, the Y axis represents the number of children who earned a specific score (number of push-ups completed).
Chart showing average number of push ups children of a certain age can do

As you can see in the diagram (above), the highest point of the bell curve on the X axis equals a score of ten push-ups. You recall that more children completed ten push-ups than any other number. Thus, the highest point on this bell curve represents a score of ten. The next most frequently obtained scores were 9 and 11, followed by 8 and 12. This pattern continues out toward the extreme ends of the bell curve. In our example, the extremes occurred at 1 and 19 push-ups.

Using the bell curve, we can now chart each child’s score and compare it to the scores achieved by all 100 students in the class. Look at the bell curve above, and find 10 push-ups. We know that Amy completed 10 push-ups so her raw score was 10. Ten push-ups placed her squarely in the middle of the class. Half of the youngsters in Amy’s class earned a score of 10 or more; half of the children scored 10 or less. If you look at the bell curve diagram (below), you see that Amy’s score of 10 placed her at the 50% level. The individual’s percent level is referred to as their percentile rank (PR). Amy’s percentile rank is 50 (PR=50).

Chart showing percentile rank of average push ups

Erik completed thirteen push-ups. Looking at the bell curve above, you see that his score of 13 placed him at the 84th percent level. Erik’s percentile rank is 84 (PR=84). Erik’s ability to do push-ups placed him at the 84th position out of the 100 fifth grade children tested on our measure of upper body strength.

Sam completed seven push-ups. His raw score of 7 placed him at the (bottom) 16 percent. Sam’s percentile rank was 16 (PR=16). Out of our sample of 100 fifth grade children, 84 children earned a higher score than Sam.

Larry completed 6 push-ups. We can convert his raw score of 6 to a percentile rank of 9 (PR=9). 91 children scored higher and 8 children scored lower than Larry in upper body strength as measured by the ability to do push-ups.

Oscar completed 2 push-ups. His raw score of 2 placed him in the bottom 1 percent of fifth graders tested (PR=1).

Nancy’s raw score of 17 placed her at the upper 99 percent. We say that Nancy scored at the 99th percentile rank (PR=99).

You can see the relationship between the number of push-ups completed and the child’s percentile rank (PR) reproduced in the table below:

Push-Up Scores and Percentile Ranks
Push-ups   Percentile Rank      Push-ups   Percentile Rank
19         99                   9          37
18         99                   8          25
17         99                   7          16
16         98                   6          9
15         95                   5          5
14         91                   4          2
13         84                   3          1
12         75                   2          1
11         63                   1          1
10         50

The bell curve is a powerful tool. When you use the bell curve, you can objectively compare any child’s percentile rank to that of a group of children. You can also measure a single child’s progress or regression relative to the group.

Using the bell curve, you can compare a single child’s score to the scores obtained by other children who are older or younger or in different grades.

Let’s see how this works. Again, we will measure the children’s upper body strength by the number of push-ups they can perform. In this case, we decide to evaluate all children in all the elementary grades, from Kindergarten through fifth grade. We will assume that the average chronological age of these elementary school children is exactly eight years (CA=8-0 years).

After we test the third graders, we find that the average or mean score of our sample of 100 eight year old third graders is 6 push-ups. This means that the “average” third grade child (who is 8 years old) can do 6 push-ups. We can also compare an individual child’s score on arithmetic problems answered correctly with the average number answered correctly by children the same age.

How can we compare children from different groups? Let’s look at Larry who was a member of our original group of fifth graders. Although the average fifth grader performed 10 push-ups, Larry only completed 6 push-ups. His raw score of 6 converts to a percentile rank of nine (PR=9).

When we compare Larry’s performance to all elementary school students, we learn that Larry (a fifth grader) is functioning at the level of the average third grader — who is also eight years old — in the ability to do push-ups. Therefore, we see that Larry’s age equivalent score is 8 years (AE=8-0) and his grade equivalent score is at the third grade level (GE=3-0).

Fifth Grade Students: Push Up Scores
Child’s Name   Raw Score   Percentile Rank
Oscar          2           1
Larry          6           9
Sam            7           16
Amy            10          50
Erik           13          84
Frank          15          95
Nancy          17          99

Look at the table above and find Amy. At the time of testing, Amy was 10-0 years old and in the fifth grade. She scored at the mean for her peers, i.e., 10 push-ups. Her grade equivalent score was fifth grade (GE=5-0) and her age equivalent score was 10.0 years (AE=10-0). If we tested a 20 year old person and found that this person was able to do 10 push-ups, then the 20 year old has an age equivalent score of 10-0 and a grade equivalent score of 5.0, i.e., the same score as Amy.

Look again at the table of scores above and find Frank’s name. You see that Frank earned a raw score of 15 push-ups which converts to a percentile rank of 95 (PR=95). Frank’s score looks great — until we remember that Frank was “held back” three times. Although he is in the fifth grade, Frank is 13 years old!

With this new information, let’s take another look at Frank’s performance. The average score for 8th graders (who are 13 years old) is 15. Frank scored 15. Frank had a grade equivalent score of 8th grade (GE = 8.0) and an age equivalent score of 13 years (AE = 13-0). When we compare Frank with other children in his expected grade, we see that his achievement is in the average range. Frank is in the 95th percentile level when compared to fifth graders, not when compared to eighth graders.

Frank’s case brings up some additional questions. Frank (age 13) was included in our sample of 5th graders who had an average age of 10. When compared to this group of children who were younger than him, Frank scored at the 95th percentile rank (PR=95). Question: If we compare Frank’s performance to that of children who are three years younger than him, will this comparison provide us with an accurate picture of his physical fitness? Answer: No.

In Frank’s case, statistics inform us of two facts. First, we see that Frank performs at a superior level when compared with other children in his grade. Second, we see that he performs at an average level when compared with children who are his age.

When you evaluate the significance of data from tests, you must know how the scores are being reported. Test scores can be reported using percentile ranks, age equivalents, grade equivalents, raw scores, scale scores, subtest scores, or standard scores.

Remember: Although Frank’s performance was superior for his grade, it was average for his age. If you did not know Frank’s age and grade, you would have been misled as to Frank’s actual achievement. But — if Frank was an 8 year old 3rd grader, his scores would be in the superior range, using both age equivalent and grade equivalent measures.

The number of push-ups each child completed was his or her raw score. Let’s assume that we want to obtain an overall fitness score. To obtain an overall or composite score, we will measure three skills (sit-ups, push-ups, a timed 50 yard dash) and obtain scores on each of these skills. In educational testing, the child’s overall score (in reading, math, etc.) is often a composite of several subtest scores.

Next, we will develop a weighting system that will convert each child’s raw score to a scale score. After we convert the raw scores to scale scores, we will be able to compare each of the three scores to each other (number of push-ups, number of sit-ups, seconds to complete the 50 yard dash). How do we convert raw scores into scale scores?

One way to convert scores is by developing a rank order system. In rank order scoring, the child who scores highest in an event (most push-ups, most sit-ups, fastest run) receives a scale score of 100; the lowest receives a score of 1. The other 98 children receive their respective “rank” as their scale score.
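
The rank order conversion can be sketched in a few lines of Python. This is only an illustration, not a published scoring procedure: the function name `rank_order_scale_scores`, the children's names, and their raw scores are all made up, and ties (which the example above does not address) are simply broken by sort order.

```python
def rank_order_scale_scores(raw_scores, higher_is_better=True):
    """Map each child's raw score to a rank-based scale score.

    The weakest performer receives 1 and the strongest receives
    len(raw_scores), which is 100 for a class of 100 children.
    Ties are broken by sort order (an assumption for this sketch).
    """
    # Order the children from weakest to strongest performance.
    ordered = sorted(raw_scores, key=raw_scores.get, reverse=not higher_is_better)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


# Hypothetical five-child sample (the class in the article has 100 children).
push_ups = {"Ann": 12, "Ben": 7, "Cal": 10, "Dee": 15, "Eli": 3}
dash_seconds = {"Ann": 9.1, "Ben": 8.4, "Cal": 10.2, "Dee": 8.9, "Eli": 7.8}

push_up_scale = rank_order_scale_scores(push_ups)
# For a timed run, a lower raw score (fewer seconds) is the better performance.
dash_scale = rank_order_scale_scores(dash_seconds, higher_is_better=False)

print(push_up_scale)
print(dash_scale)
```

Because both events now share the same 1-to-N scale, Eli's weak push-up result and his fast run can be compared directly, even though one was counted in repetitions and the other in seconds.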

After each child’s raw scores are converted to scale scores, we can easily compare an individual child to the group and to all children who are the same age or in the same grade. We can also compare an individual child’s performance at different times, i.e. before and after completing the fitness course. Was the child able to do significantly more push-ups after taking the fitness course? Was the child reading better after receiving reading remediation?

Composite Scores

You can see that after we develop a global composite score, the individual child’s raw scores on each of the three fitness subtests have less significance. This is exactly what happens with educational achievement and psychological tests. Most educational tests are composed of several subtests; the subtest scores are combined to develop composite scores. More about this shortly.

Let’s look at how composite scores can be used and some of the problems that arise when we rely on them.

John is a member of our original group of 100 fifth graders. He has good muscular strength (he scored at the 70% PR level in push-ups and at the 78% PR in sit-ups). But, John is very slow and uncoordinated. In the 50 yard dash, he finished 2nd from the last out of the 100 children (PR=2).

How will John’s composite fitness score be derived? In this example, we will average John’s percentile rank scores on the three events. John’s composite score is determined as follows: Add the percentile ranks of each event (70 + 78 + 2 = 150), then divide this score by the number of events (3). In John’s case, 150 / 3 = 50. (Note: It is actually improper to average percentile rank scores; you must use the standard scores or scale / subtest scores.)

John’s composite score is 50. This composite percentile rank score of 50 places him squarely in the “average” range. Is John an “average” child? His individual scores demonstrated a significant amount of subtest scatter. When you analyze his three subtest scores, you see that he has specific strengths and a very severe deficiency. Despite his average composite score, John is not an average child! (Note: As noted above, the proper calculation uses standard scores. When the same analysis is done with standard scores, John’s composite is a standard score of 96.5 and a percentile rank of 41. Again, John appears to be an average child.)
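
To see where the numbers in the note come from, here is a minimal Python sketch. It assumes that scores follow a normal bell curve with a mean of 100 and a standard deviation of 15, and that the composite is a plain average of the three standard scores, as the note describes. The function names are hypothetical. Because it uses the exact curve rather than a rounded conversion table, it lands at roughly 96 and the 40th percentile, slightly different from the 96.5 and 41 quoted above.

```python
from statistics import NormalDist, mean

# Standard-score scale used by most educational tests (mean 100, SD 15).
STANDARD = NormalDist(mu=100, sigma=15)


def pr_to_standard_score(pr):
    """Convert a percentile rank (1-99) to a standard score."""
    return STANDARD.inv_cdf(pr / 100)


def standard_score_to_pr(ss):
    """Convert a standard score back to a percentile rank."""
    return STANDARD.cdf(ss) * 100


john_prs = {"push-ups": 70, "sit-ups": 78, "50-yard dash": 2}

# The improper shortcut: average the percentile ranks directly.
naive_composite = mean(list(john_prs.values()))

# The approach described in the note: convert, average, convert back.
standard_scores = [pr_to_standard_score(pr) for pr in john_prs.values()]
composite_ss = mean(standard_scores)
composite_pr = standard_score_to_pr(composite_ss)

print(f"Average of percentile ranks: {naive_composite:.0f}")
print(f"Composite standard score:    {composite_ss:.1f}")
print(f"Composite percentile rank:   {composite_pr:.0f}")
```

Either way, the composite hides the gap between John's strength scores and his score in the 50 yard dash.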

Let’s look at another example of composite scores to see how they can mislead us. Oscar was at the 1 percent level in push-ups. But when the other fitness subtests were given, Oscar was the fastest child in the class, scoring at the 99% level. He was average in sit-ups, scoring at the 50% level. Oscar’s composite fitness score, using percentile ranking, is 50%. Is Oscar really an average child? Would he benefit from remediation to improve his upper body strength, as measured by push-ups? Oscar also shows a great deal of subtest scatter, i.e., from extremely weak upper body strength to superior speed.

Subtest Scatter

When subtest scores vary a great deal, this is called subtest scatter. If significant scatter exists, this suggests that the child has areas of strength and weakness that need to be explored.

How can you determine if significant subtest scatter is present? Most subtests have a mean score of 10. Most children will score + or – 3 points away from the mean of 10, i.e. most children will score between 7 and 13.

If the mean on a subtest is 10 (and most children score between 7 and 13), then scores between 9 and 11 will represent minimal subtest scatter. Let’s assume that Child A is given a test that is composed of 10 subtests. The child’s scores on the 10 subtests are as follows: on 4 subtests, the child scores 10; on 3 subtests, the child scores 9; and on 3 subtests, the child scores 11. In this case, the overall composite score is 10 and the scatter is very minimal. This child scored in the average range in all 10 subtests.

In our next example, we will assume that Child B earns 4 subtest scores of 10, 3 scores of 4, and 3 scores of 16. The child did extremely well on 3 subtests, very poorly on 3 subtests, and average on 4 subtests. Again, the child’s composite score would be 10. Subtest scatter is the difference between the highest and lowest scores. In this case, subtest scatter would be 12 (16 - 4 = 12). Is this an “average” child? Because the child’s scores demonstrate very significant subtest scatter, we need to know more about these weak and strong areas.
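
The arithmetic for Child A and Child B can be checked with a short Python sketch. The helper name `composite_and_scatter` is hypothetical, and the composite here is a simple average of the subtest scores, as in the examples above; real tests derive composites from their own norm tables.

```python
from statistics import mean


def composite_and_scatter(subtest_scores):
    """Return (composite, scatter) for a list of subtest scores.

    composite: the simple average of the subtest scores
    scatter:   the highest score minus the lowest score
    """
    return mean(subtest_scores), max(subtest_scores) - min(subtest_scores)


child_a = [10] * 4 + [9] * 3 + [11] * 3   # four 10s, three 9s, three 11s
child_b = [10] * 4 + [4] * 3 + [16] * 3   # four 10s, three 4s, three 16s

for name, scores in (("Child A", child_a), ("Child B", child_b)):
    composite, scatter = composite_and_scatter(scores)
    print(f"{name}: composite = {composite}, scatter = {scatter}")
```

Both children come out with the same composite, so only the scatter figure reveals how different their profiles are.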

In educational situations, it is essential that parents understand the nature of the weak areas, what skills need to be learned to strengthen those areas, and how the strong areas can be used to help remediate the child’s weak areas. The spread or variability between the subtest scores is called subtest scatter.

How do these concepts (composite scores and subtest scatter) relate to the information contained in your child’s evaluations?

The results of educational tests given to children are often provided in composite scores. On the Wechsler Intelligence Scale for Children, Third Edition (WISC-III), three scores are usually provided: a Verbal IQ (VIQ), a Performance IQ (PIQ), and a Full Scale IQ (FSIQ). Each of these IQs is a composite score. Both the Verbal and Performance IQ scores are composites of five different subtests, each of which measures a different area of ability. The Full Scale IQ is a composite of the Verbal and Performance scores, which makes it a composite of ten different subtests. IQs between 90 and 110 are considered within the “average range.”

If we rely on composite IQ scores, we may easily be misled — with serious consequences. Katie is the 14 year old youngster whose situation was outlined earlier in this article. On the Wechsler Intelligence Scale for Children-III, Katie achieved a Full Scale IQ of 101. If the only number you had was her Full Scale IQ score, you would probably assume that her IQ of 101 placed her squarely in the “average range” of intellectual functioning. Is Katie an “average” child?

Remember: The Full Scale IQ score is actually a “composite” of the Verbal IQ and Performance IQ scores. Checking further, you learn that Katie’s Verbal IQ is 114 and her Performance IQ is 86. IQ scores between 90 and 110 are considered “average.” You see that there is a 28 point difference between Katie’s Verbal and Performance IQ scores. If you did not have these additional two IQ scores, you might view Katie as an “average” child but you would be mistaken.

Katie’s Verbal IQ of 114 translates into a percentile rank of 82 (PR=82). Her Performance IQ of 86 converts to a percentile rank of 18 (PR = 18). We see that Katie has a percentile rank fluctuation of 64 points (82-18=64) between her verbal and performance abilities. We will look at more of Katie’s test scores shortly.

One of the commonly administered individual educational achievement tests is the Woodcock-Johnson Psycho-Educational Battery-Revised (WJ-R). The Woodcock-Johnson consists of a number of mandatory and optional subtests. The results obtained by the child on these different subtests are combined into composite or cluster scores. If we rely on composite or cluster scores, without examining the child’s scores on the individual subtests, we can easily overlook obvious deficiencies and significant strengths. Relying on composite or “cluster” scores can lead to faulty educational decision-making, with tragic consequences for children. To advocate effectively, parents must obtain all of the subtest scores on the tests that have been administered to their child.

When Apparent Progress Means Actual Regression

One serious concern that many parents have relates to the belief that their child is not making adequate progress in a special education program. How can parents determine if their perception is accurate? And, how can parents persuade school officials that the special education program being provided to the child needs to be strengthened?

Earlier in this article, we discussed how statistics can be used in medical treatment planning. We demonstrated how a medical problem was identified and the efficacy of treatment measured, using objective tests. In our example, the patient had pre- and post-testing as a means to determine whether or not the intervention was working. Based on the results of new testing, more medical decisions would be made: to continue, terminate or change the treatment plan.

This practice of measuring change, called pre- and post-testing, has great relevance to educational planning. After the child’s performance level is identified, we can re-test the child later to measure progress, regression, or whether the child is maintaining the same position within the group.

In this way, pre- and post-testing enables us to measure educational benefit (or lack of educational benefit). Using the scores obtained from pre- and post-testing, we can create graphs to visually demonstrate the child’s progress or lack of progress in an academic area.

To see how this works, let’s revisit our fifth grade fitness class. According to our earlier testing in September, Erik completed 13 push-ups, which placed him at the 84th percentile of all youngsters in his class. After a year of fitness training, all of the fifth grade children were re-tested. When Erik was re-tested, he completed 14 push-ups.

Question: Has Erik progressed? Answer: Yes and no.

The average performance of the fifth grade class improved by 2 push-ups (from an average raw score of 10 to an average raw score of 12). Erik’s raw score increased by 1 push-up, from 13 to 14. So, we see that although Erik’s age equivalent and grade equivalent scores increased slightly from the prior testing, his actual position in the group dropped from the 84th to about the 75th percentile level. While still ahead of his peers, Erik did regress.

What about Sam? Sam’s push-up performance also improved, from a raw score of 7 to a raw score of 8. Although Sam’s age equivalent and grade equivalent scores increased slightly, he also regressed. According to the new scores, his percentile rank dropped from the 16th percentile to about the 9th percentile. Sam is continuing to fall further behind his peer group.
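
Here is a minimal Python sketch of the pre- and post-test comparison for Erik and Sam. It assumes that the class scores follow a bell curve at both testings and that the spread of scores (a standard deviation of 3 push-ups) did not change over the year; only the class average moved from 10 to 12. The function name `percentile_rank` is our own.

```python
from statistics import NormalDist


def percentile_rank(raw_score, group_mean, group_sd):
    """Percentile rank of a raw score within a bell-curve group."""
    return round(NormalDist(group_mean, group_sd).cdf(raw_score) * 100)


SD = 3  # assumed to be the same at both testings

# name: (raw score in September, raw score one year later)
children = {"Erik": (13, 14), "Sam": (7, 8)}

for name, (before, after) in children.items():
    pr_before = percentile_rank(before, group_mean=10, group_sd=SD)
    pr_after = percentile_rank(after, group_mean=12, group_sd=SD)
    print(f"{name}: raw score {before} -> {after}, "
          f"percentile rank {pr_before} -> {pr_after}")
```

Each child's raw score went up by one push-up, yet each child's percentile rank went down, because the group improved by two.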

Let’s assume that we test Sam again when he re-enters school in the fall. Now, we have three sets of test data (beginning 5th grade, end 5th grade, beginning 6th grade). Has Sam’s score changed? If his percentile rank continues to drop, Sam is experiencing regression. We need to know how long it will take for Sam to recoup the skills he lost during the summer. Regression and recoupment are primary issues in determining the child’s legal need for extended school year services (ESY) during the summer.

Norm Referenced versus Criterion Referenced Tests

Most standardized tests are either norm referenced or criterion referenced.

When we evaluated our sample group of fifth graders, we compared each child’s performance to the norm group of fifth graders. Both Erik (raw score of 13, percentile rank of 84) and Sam (raw score of 7, percentile rank of 16) were referenced or compared to this norm group of fifth graders. To evaluate benefit, we looked at the norm group and the individual child’s relative position in that group at the time of the first and second tests. We computed each child’s change in position, i.e. progress or regression.

In our example, we also referenced the criterion of number of push-ups completed. A criterion referenced analysis determines whether or not a child meets certain criteria (without reference to a norm group). For example, at the beginning of the year, Sam completed 7 push-ups. If the criterion for success was 8 push-ups, then Sam failed to reach that goal. Let’s assume that Sam received a year of physical fitness remediation; after that year, Sam completed 8 push-ups. Does Sam now meet the criteria for success? The answer to this question depends on whether the criteria have increased now that Sam is a year older.

Another factor complicates this picture. We know that Sam’s peer group completed an average of 10 push-ups at the beginning of the year and 12 at the end of the year. Definitions of success are affected by the passage of time. If we rely on criterion referenced measures, we can be misled as to whether the child is falling further behind the peer group. We need to know exactly what the criterion is and what this means when the child is compared to a norm group.

Standard Deviation

Percentile ranks are computed by determining the mean score and the amount of variation of all scores around the mean score. Are the scores bunched around the number 10 in a tight uniform distribution? Are the scores evenly distributed? Do they peak and taper slowly, as in our earlier bell curves, or do they bunch at the ends, without any scores in the middle? In other words, is there a great variance, with the scores spread over a wide range with two or more peaks, or is there a normal bell curve distribution of scores?

On our push-up test, most of the 5th grade children earned scores around 10 push-ups, with an even distribution above and below 10 push-ups. But, if one-half of the children completed 5 push-ups, one-fourth completed exactly 14 push-ups, and the remaining one-fourth completed 16 push-ups, then the average or mean number of push-ups would still be 10. One-half of the children would have scored above 10 and one-half below 10.

In this case, the scores are not evenly distributed in a smooth curve above and below the score of 10. In fact, the variance is very large and would present a highly unusual curve with a peak at 5, a drop to zero between 6 and 13, then a jump at 14, a drop at 15, and another jump at 16. This distribution of scores would not present a normal bell curve distribution. Educational and psychological tests are designed to present normal bell curve distributions with predictable patterns of scores.
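
A few lines of Python make the point concrete. The sketch builds the lopsided class described above, assuming 100 children as in our other examples, and shows that the mean alone cannot tell this class apart from a bell-curve class. The variable names are our own.

```python
from statistics import mean, pstdev

# 100 hypothetical children: half do 5 push-ups, a quarter do 14, a quarter do 16.
lopsided_class = [5] * 50 + [14] * 25 + [16] * 25

class_mean = mean(lopsided_class)
class_sd = pstdev(lopsided_class)  # standard deviation of the whole class

# How many children are anywhere near the "average" score?
near_mean = sum(1 for score in lopsided_class if 7 <= score <= 13)

print(f"mean = {class_mean}")
print(f"standard deviation = {class_sd:.2f}")
print(f"children scoring between 7 and 13: {near_mean}")
```

The mean is still 10, but no child scores between 7 and 13. In a bell-curve class with the same mean, about two-thirds of the children would fall in that range.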

We simply need to know the mean and standard deviation of the test. In most educational and psychological tests, the mean is 100 and the standard deviation is 15. (Mean = 100, SD = 15) In most subtests, the mean is 10 and the standard deviation is 3. (Mean = 10, SD = 3) Average scores do not deviate far from the mean. As scores fall significantly above or below the mean, they are referred to as being a certain value or distance from the mean, e.g., 1 or 2 standard deviations from the mean.

In all tests, the mean is at 0 (zero) standard deviations from the mean. The next markers on the bell curve are +1 and -1 standard deviations from the mean, followed by +2 and -2 standard deviations from the mean. To interpret your child’s test scores, you will need to know the test instrument’s mean score and standard deviation score.

Using our original push-up example, the mean score was 10 push-ups and the standard deviation (SD) was 3 push-ups. This push-up example is identical to the subtest scores in almost all standardized educational and psychological testing.

REMEMBER: With most subtest scores, the mean is 10, and the standard deviation is 3.

One standard deviation above the mean is 10 plus 3, i.e. 10 + 3 = 13. One standard deviation below the mean is 10 minus 3, i.e. 10 – 3 = 7. One standard deviation above the mean always falls at the 84 percent level (PR = 84); one standard deviation below the mean is always at the 16 percent level (PR = 16). Two SDs above the mean is always at the 98 percent level (PR = 98); two SDs below the mean is always at the 2 percent level (PR = 2).
Chart showing the relationship between standard deviation and percentile ranks

Looking at actual test scores, we may see that the child scored “one standard deviation below the mean” on a particular test or subtest. If the score is one standard deviation below the mean, then the child’s percentile rank is 16.

REMEMBER: The subtest scores of most tests used with our children have a mean of 10 and standard deviation of 3. If a child scores 7 on a subtest, this means that the child scored at the 16th percentile. A subtest score of 13 means that the child scored at the 84th percentile.
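
If you know the mean and the standard deviation, you can compute the percentile rank for any subtest score yourself. The Python sketch below assumes a normal bell curve and uses the standard library's `NormalDist`. The function name `subtest_percentile_rank` is hypothetical, and clamping the result to the range 1 to 99 is a convention we adopt here to match the push-up table earlier in this article.

```python
from statistics import NormalDist


def subtest_percentile_rank(score, mean=10, sd=3):
    """Percentile rank of a subtest score on a bell curve.

    The result is rounded to a whole number and clamped to 1-99,
    the range used in this article's push-up table.
    """
    pr = round(NormalDist(mean, sd).cdf(score) * 100)
    return min(99, max(1, pr))


for score in (4, 7, 10, 13, 16):
    sds_from_mean = (score - 10) / 3
    print(f"score {score:2d} ({sds_from_mean:+.0f} SD): "
          f"PR = {subtest_percentile_rank(score)}")
```

The same function works for any test that follows the bell curve: pass in that test's own mean and standard deviation instead of the defaults.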

Standard Scores

One of the most difficult concepts for most parents to grasp is that of standard scores. Since many educational test scores are given in standard scores, it is essential for parents to understand what they mean.

At an IEP meeting, a parent may be told that the child earned a standard score of 85 in one area, a standard score of 70 in another area. Most parents are relieved when they get this news — because they believe that these numbers are similar to grades with 100 as the top score and 0 as the lowest. This is absolutely incorrect! Standard scores are NOT like grades.

In standard scores, the average score or mean is 100, with a standard deviation of 15. The average child will earn a standard score of 100. If a child scores 1 standard deviation above the mean, the standard score is 100 plus 15; i.e. 100 + 15 = 115. If the child scores 1 standard deviation below the mean, this is 100 minus 15, i.e. 100 – 15 = 85.

Since a standard score of 115 is 1 standard deviation above the mean, it is always at the 84 percent level. Since a standard score of 85 is 1 standard deviation below the mean, it is always at the 16 percent level. A standard score of 130 (+2 SD) is always at the 98 percent level. A standard score of 70 (-2 SD) is always at the 2 percent level.

Remember Katie? Earlier, we learned that on the Wechsler Intelligence Scale, Katie earned a Full Scale IQ of 101. Later, we saw that this score was misleading because Katie’s Verbal IQ score was 114 while her Performance IQ score was 86. The psychologist found that Katie scored 2 standard deviations above the mean on the Similarities subtest of the Wechsler Intelligence Scale for Children, Third Edition (WISC-III).

What does this mean?

You are learning that a score of 2 standard deviations above the mean places the child at the 98th percent level on the area being measured. Since the Similarities subtest of the WISC-III measures intellectual reasoning power, Katie’s intellectual reasoning power is at the 98 percent level.

The psychologist also found that Katie had a standard score of 68 — which was 2.5 standard deviations below the mean — on the spontaneous writing sample of the Test of Written Language (TOWL-III). Two SD’s below the mean is at the two percent level. With your new knowledge, you know that Katie’s ability to produce spontaneous writing samples was actually lower than the one percent level.

When we first introduced Katie, we posed two questions:

  1. Do these two test scores help to explain the academic problems Katie is having?
  2. Do her test scores tell us anything about her moodiness and her intense dislike of school?

Katie’s intellectual reasoning ability places her at the 98th percentile of all youngsters her age. However, her ability to convey her thoughts in writing is below the one percent level. If Katie is very bright but is unable to convey her knowledge to her teachers on written assignments and tests, would you expect her to feel frustrated and stupid? Do you question why, after years of frustration, Katie is angry, depressed and now wants to quit school?

Wrightslaw Rules

All educational and psychological tests that report scores using percentile ranks or standard scores are based on the bell curve. To interpret the test results, you should know the mean and the standard deviation. The Wechsler, Woodcock-Johnson, Kaufman, and most other standardized tests use this format.

* Since most educational and psychological tests use standard scores (SS) with a mean of 100 and a standard deviation of 15, a standard score of 100 is at the 50% percentile rank (PR) level. A standard score of 85 is at the 16% PR level. A standard score of 115 is at the 84% PR level.
* Most educational and psychological tests use subtest scores with a mean of 10 and standard deviation of 3. A subtest score of 10 is at the 50% PR level. Subtest scores of 7 and 13 are at the 16% and 84% PR levels.
* One half of all children fall above and one half fall below the mean, which is at the 50% level and is also represented as a standard score of 100. A standard score of 100 = PR 50.

  • Two-thirds of all children are between + 1 and – 1 standard deviations from the mean.
  • Two-thirds of all children are between the 16% and 84% percentile ranks. (84 minus 16 = 68)
  • A standard deviation of -1 is at the 16% level. Zero is at the 50% level. +1 SD is at the 84% level.
  • A standard score of 85 is at the 16% level; a SS of 100 is at the 50% level; a SS of 115 is at the 84% level.
  • A standard deviation of -2 is at the 2% level. A SD of +2 is at the 98% level.
  • A standard score of 70 is at the 2% level. A standard score of 130 is at the 98% level.
  • A standard score of 90 is at the 25% level. A standard score of 110 is at the 75% level.
  • One half of all children fall between the 75% level and 25% level. (75-25 = 50)
  • One half of all children achieve standard scores between 90 and 110.
  • A percentile rank between 25% and 75% corresponds to a standard score between 90 and 110 and is usually considered to be within the “average range.”
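
You can check these rules for yourself with a short Python sketch. It assumes a normal bell curve with a mean of 100 and a standard deviation of 15. The helper names `percentile_rank` and `share_between` are our own, not part of any test publisher's materials.

```python
from statistics import NormalDist

STANDARD = NormalDist(mu=100, sigma=15)  # standard-score scale


def percentile_rank(standard_score):
    """Percentile rank of a standard score, rounded to a whole number."""
    return round(STANDARD.cdf(standard_score) * 100)


def share_between(low, high):
    """Percentage of children expected to score between two standard scores."""
    return (STANDARD.cdf(high) - STANDARD.cdf(low)) * 100


for ss in (70, 85, 90, 100, 110, 115, 130):
    print(f"SS {ss:3d} -> PR {percentile_rank(ss)}")

print(f"Between 85 and 115: about {share_between(85, 115):.0f}% of children")
print(f"Between 90 and 110: about {share_between(90, 110):.0f}% of children")
```

The two shares at the end correspond to the "two-thirds" and "one half" rules in the list above.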

Understanding Test Data

The results of most educational tests are reported using standard scores. Parents must know how to convert standard scores into percentile ranks. Using the table below and bell curve above, you can convert any standard score into a percentile rank score. The earlier push-up example used standard educational scores.

Standard Scores, Subtest Scores, and Percentile Ranks
(SS = standard score; Sub = subtest score; PR = percentile rank; a dash means that no subtest score falls exactly at that standard score)

SS    Sub   PR    |  SS    Sub   PR   |  SS   Sub   PR   |  SS   Sub   PR
145   19    >99   |  107   -     68   |  97   -     42   |  87   -     19
140   18    >99   |  106   -     66   |  96   -     39   |  86   -     18
135   17    99    |  105   11    63   |  95   9     37   |  85   7     16
130   16    98    |  104   -     61   |  94   -     34   |  80   6     9
125   15    95    |  103   -     58   |  93   -     32   |  75   5     5
120   14    91    |  102   -     55   |  92   -     30   |  70   4     2
115   13    84    |  101   -     53   |  91   -     27   |  65   3     1
110   12    75    |  100   10    50   |  90   8     25   |  60   2     <1
109   -     73    |  99    -     47   |  89   -     23   |  55   1     <1
108   -     70    |  98    -     45   |  88   -     21   |

Other Tests: Means and Standard Deviations

Adding to the confusion about tests is the fact that test scores are sometimes reported differently. For example, test scores may be reported as “Z Scores.” A Z score simply states how many standard deviations a score falls from the mean, so Z scores have a mean of zero and a standard deviation of one (Mean = 0, SD = 1, instead of a mean of 100 and SD of 15 as we found with standard scores).

If you know that a particular child earned a Z score of -1, then you also know that the child’s score was one standard deviation below the mean, which is a percentile rank of 16. If you convert this score, using the standard score format with a mean of 100 and a standard deviation of 15, you will see that a Z score of -1 is the same as a standard score of 85.

Another test format uses T Scores. With T scores, the mean is 50 and each unit of standard deviation is equal to 10. A T score of 60 is the same as a Z score of +1. A T score of 60 and a Z score of +1 are equal to a percentile rank of 84. A T score of 70 is equal to a Z score of +2, a standard score of 130, and a percentile rank of 98.

Another measure is a Stanine test. In Stanine tests, the mean is five and the standard deviation is 2.
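
Because every one of these formats is defined by a mean and a standard deviation, a score can be moved from one format to another by keeping its distance from the mean, measured in standard deviations. The Python sketch below illustrates the idea. The `SCALES` table and the `convert` function are our own names, and the sketch returns unrounded values, whereas stanines are reported as whole numbers from 1 to 9.

```python
# (mean, standard deviation) for each score format discussed in this article
SCALES = {
    "z": (0, 1),
    "t": (50, 10),
    "standard": (100, 15),
    "subtest": (10, 3),
    "stanine": (5, 2),
}


def convert(score, from_scale, to_scale):
    """Re-express a score on another scale.

    The score keeps the same distance from the mean, in SD units.
    """
    from_mean, from_sd = SCALES[from_scale]
    to_mean, to_sd = SCALES[to_scale]
    z = (score - from_mean) / from_sd  # distance from the mean in SDs
    return to_mean + z * to_sd


print(convert(-1, "z", "standard"))        # Z score of -1 as a standard score
print(convert(60, "t", "z"))               # T score of 60 as a Z score
print(convert(70, "t", "standard"))        # T score of 70 as a standard score
print(convert(7, "subtest", "standard"))   # subtest score of 7 as a standard score
```

Once a score is expressed as a Z score, its percentile rank can be read from the bell curve in the usual way.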

Specific Tests

Since tests are always in a state of change with new versions being produced, we will not attempt to review and describe each test. There are a number of parent-oriented publications that you can refer to. You may also ask the examiner to photocopy relevant portions of the test manual for you. Examiners cannot copy actual test questions for you, but may be able to copy the instructions and explanations. This is your best source of current test information.

Earlier in this article, you learned that both the Verbal and Performance IQ scores are actually composites or averages of five different subtests. Each of the separate subtests measures very different abilities. Let’s analyze Katie’s subtest scores to see what else we can learn from them.

Wechsler Intelligence Scale for Children, Third Edition (WISC-III)

| Verbal Subtests | Score | Performance Subtests | Score |
|---|---|---|---|
| Information | 10 | Picture Completion | 6 |
| Similarities | 16 | Coding | 4 |
| Arithmetic | 11 | Picture Arrangement | 10 |
| Vocabulary | 13 | Block Design | 12 |
| Comprehension | 12 | Object Assembly | 7 |
| (Digit Span) | 8 | (Symbol Search) | 6 |

Verbal IQ = 114
Performance IQ = 86

Subtest scores on the Wechsler Intelligence Scale range from a low score of 1 to a maximum score of 19. As you learned earlier, these subtests have a mean of 10 and a standard deviation of 3. A subtest score of 7 is one standard deviation below the mean (-1 SD), which is the same as a percentile rank of 16 (PR = 16). You can also convert the subtest score of 7 into a standard score of 85, which has a percentile rank of 16.

When we discussed subtest scatter, we saw that variation among subtest scores is a valuable source of information. Look at Katie’s subtest scores. She has significant scatter, from a high score of 16 on Similarities (98th percentile) to a low score of 4 on Coding (2nd percentile).
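To see the scatter in Katie’s profile at a glance, you can convert every subtest score to a standard score and a percentile rank. The sketch below is one way to do that, assuming only what the article states: subtest scores have a mean of 10 and a standard deviation of 3, and percentile ranks follow the bell curve. The function and variable names are invented for this illustration.

```python
from statistics import NormalDist

# Wechsler subtest scores: mean 10, SD 3 (as stated in the article).
SUBTEST_CURVE = NormalDist(mu=10, sigma=3)

# Katie's WISC-III subtest scores from the table above.
katie_scores = {
    "Information": 10,
    "Similarities": 16,
    "Arithmetic": 11,
    "Vocabulary": 13,
    "Comprehension": 12,
    "Digit Span": 8,
    "Picture Completion": 6,
    "Coding": 4,
    "Picture Arrangement": 10,
    "Block Design": 12,
    "Object Assembly": 7,
    "Symbol Search": 6,
}


def subtest_to_standard_score(score: int) -> float:
    """Re-express a subtest score on the mean-100, SD-15 scale."""
    return 100 + 15 * (score - 10) / 3


def subtest_to_percentile_rank(score: int) -> int:
    """Percent of the bell curve at or below this subtest score, rounded."""
    return round(SUBTEST_CURVE.cdf(score) * 100)


# List the subtests from strongest to weakest.
for name, score in sorted(katie_scores.items(), key=lambda item: -item[1]):
    print(
        f"{name:20s} subtest {score:2d}  "
        f"standard score {subtest_to_standard_score(score):5.0f}  "
        f"PR {subtest_to_percentile_rank(score):2d}"
    )

# Scatter: the distance between the highest and lowest subtest scores.
scatter = max(katie_scores.values()) - min(katie_scores.values())
print(f"Scatter: {scatter} subtest points ({scatter / 3:.0f} standard deviations)")
```

Katie’s highest and lowest scores are 12 subtest points apart, which is four standard deviations.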

As a parent, you need to understand what the various subtests measure. When we discussed Katie’s test scores, you learned that the Similarities subtest is highly correlated with abstract reasoning. The Coding subtest measures visual-perceptual mechanics. The Coding subtest is highly correlated with reading achievement but has little relation to abstract reasoning.

Question: Which Wechsler subtest is most closely correlated to intellectual horsepower and reasoning ability?

Answer: The Similarities subtest.

Question: Which subtest measures a child’s ability to decode visual symbols?

Answer: The Coding subtest measures decoding of visual symbols.

Psychological Assessment Resources, Inc. describes each WISC-III subtest as follows:

Information: factual knowledge, long-term memory, recall.

Similarities: abstract reasoning, verbal categories and concepts.

Arithmetic: attention and concentration, numerical reasoning.

Vocabulary: language development, word knowledge, verbal fluency.

Comprehension: social and practical judgment, common sense.

Digit Span: short-term auditory memory, concentration.

Picture Completion: alertness to detail, visual discrimination.

Coding: visual-motor coordination, speed, concentration.

Picture Arrangement: planning, logical thinking, social knowledge.

Block Design: spatial analysis, abstract visual problem-solving.

Object Assembly: visual analysis and construction of objects.

Symbol Search: visual-motor quickness, concentration, persistence.

Mazes: fine motor coordination, planning, following directions.

Intelligence testing usually includes a measure of visual-motor speed (as in the Coding subtest) and a measure of intellectual reasoning ability (as in the Similarities subtest). To develop an accurate picture of your child’s strengths and weaknesses, you need to understand what the various subtests actually measure.

When subtest scores are in parentheses, this means that these scores are not computed as a part of the overall composite score. If you look at Katie’s scores, you will see that (Digit Span) and (Symbol Search) are in parentheses. On the WISC-III, the Digit Span, Symbol Search and Mazes subtest scores are not included in the Verbal, Performance and Full Scale IQ scores. They are used to develop other composite scores.

More than half of all children with disabilities served under the special education law have learning disabilities and/or an attention deficit disorder. The most commonly administered tests fall under three categories: intellectual, educational, and projective personality tests.

In most cases, the intelligence test given is the WISC-III and/or the Stanford-Binet. Specific training and education are required before a test publisher will allow a diagnostician to administer the WISC-III. The Woodcock Test of Cognitive Abilities measures specific cognitive areas. This test may be administered by an educational diagnostician and does not require the same high level of training and certification to administer.

Other Tests

The National Information Center for Children and Youth with Disabilities (NICHCY) has published a comprehensive free article entitled “Assessing Children for the Presence of a Disability” by Betsy B. Waterman, Ph.D. It is recommended that parents read this article to further their understanding of the assessment process.

In an issue of The International (Orton) Dyslexia Society’s newsletter Perspectives, Dr. Jane Fell Greene was asked about the proper tests to use with dyslexic and learning disabled children.

Dyslexia is difficulty with language. Dyslexics experience problems in psycholinguistic processing. They have difficulty translating language to thought (reading or listening), or thought to language (writing or speaking). Although psychological, behavioral, emotional or social problems may result from dyslexia, they do not cause dyslexia. One test is inadequate: a battery is required. Typical psychoeducational tests were not designed to identify dyslexia.

Dr. Greene recommended using the Detroit Tests of Learning Aptitude as a global test that primarily tests verbal and nonverbal language. “It measures the level at which the individual would perform if appropriate interventions were implemented (as is required by federal law).”

The article recommended additional tests by age group:

Preschool and kindergarten: Test of Phonological Awareness, Tests of Early Written Language, Test of Early Reading Ability, and the Preschool Evaluation Scale.

Primary years: Test of Phonological Awareness, Test of Language Development, Peabody Individual Achievement Test, Gray Oral Reading Test, PIAT Test of Written Expression, and the Wide Range Achievement Test.

Elementary students: Test of Language Development, Peabody Individual Achievement Test, Gray Oral Reading Test, PIAT Test of Written Expression, and the Wide Range Achievement Test.

Adolescents and adults: Test of Adolescent and Adult Language, Peabody Individual Achievement Test, Gray Oral Reading Test, PIAT Test of Written Expression, and the Wide Range Achievement Test.

The Detroit was recommended for all age levels.

Another area of assessment involves projective personality testing. Projective personality tests help to assess the child’s mental state, degree of anxiety, and areas of stress. They can be useful in showing that a child who is viewed as emotionally disturbed is actually a normal child who is intensely frustrated about educational problems. Children experience great frustration and unhappiness when they cannot succeed in school. If placed in a healthier environment where they are able to learn, many “emotional problems” disappear.

There are many other types of tests and “surveys.” Children who have difficulty processing information and whose tests show great scatter may benefit from a neuropsychological evaluation. Neuropsychological evaluations include tests that assess specific neurological issues that affect learning. Other measures include surveys and questionnaires that provide norm-referenced data, most often about behavior, how children see themselves, and how parents and teachers view them.

REMEMBER: To fully understand your child’s test scores, you must know the mean, the standard deviation, and the child’s specific score on the test, reported as either a standard score or a percentile rank. After you have the standard score or percentile rank, you can derive the other score.
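If you are comfortable with a little programming, the conversion in either direction can be sketched as follows. This is an illustration under one assumption: that scores on the test follow the bell curve. The function names are invented for this example, and the test publisher’s own tables remain the authoritative source.

```python
from statistics import NormalDist


def percentile_rank_from_score(score: float, mean: float, sd: float) -> float:
    """Percent of the bell curve at or below the given score."""
    return NormalDist(mu=mean, sigma=sd).cdf(score) * 100


def score_from_percentile_rank(pr: float, mean: float, sd: float) -> float:
    """Score that sits at the given percentile rank.

    The percentile rank must be greater than 0 and less than 100;
    otherwise the statistics module raises an error.
    """
    return NormalDist(mu=mean, sigma=sd).inv_cdf(pr / 100)


# A standard score of 85 on a test with a mean of 100 and an SD of 15.
print(round(percentile_rank_from_score(85, mean=100, sd=15)))

# A percentile rank of 84 on the same test.
print(round(score_from_percentile_rank(84, mean=100, sd=15)))
```

The same two functions work for any test once you know its mean and standard deviation, for example a T score with a mean of 50 and an SD of 10.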

Many test publishers also provide age equivalent and grade equivalent scores for specific raw scores.

After you master the information contained in this article, you will be able to convert test scores into easily understood numbers. You will be able to measure your child’s educational progress. After you master this material, the feelings of helplessness and confusion that you have experienced at earlier school meetings will dissipate. You will become an authority in discussing your child’s test score history and the significance of the data.

Private Sector Evaluations

In most of our cases, we do not rely on public school testing. Instead, we secure testing from private sector diagnosticians, child psychologists, school psychologists, and educational diagnosticians who are familiar with and able to administer a number of the multitude of tests that are available. We find that public school staff are often limited in the types of tests available for them to use and are unable to probe adequately, despite unusual scatter in a subtest profile.

Many private diagnosticians are eager to help parents learn how to chart the child’s test history. Assume that your child was tested three years ago on the WJ-R and scored at the 10% level in word identification, at the 60% level in passage comprehension, and had a global composite reading score of 35%. After three years of special education, during which the child was presumably receiving remediation in reading, the child is retested privately. Testing by the expert discloses that your child is now at the 5% level in word identification and at the 45% level in passage comprehension, with a composite reading score of 25%. Technically, both composite scores, 35% and 25%, fall within the “average range.” If you prepare a chart that demonstrates this regression, you may be able to convince school personnel to add true reading remediation to your child’s educational program.

Individualized Education Programs

You should also obtain our book Wrightslaw: Special Education Law. The book (available from the Wrightslaw store and by fax and mail) contains the complete federal statute (IDEA-97), the federal special education regulations, and Appendix A, the appendix that explains IEPs.

You should also obtain the special education regulations from your State Department of Education. The language in the State’s publication should be similar to the Federal Regulations.

By using this article and our law book, you will be able to write IEPs that contain measurable objectives.

For example, in an IEP that includes keyboarding, a typical public school IEP will measure typing success by using “teacher observation” at an 80 percent success rate. Your IEP will state that by December, 1996, on a five-minute timed typing test of text, your child will be able to type at fifteen words per minute with one word per minute deducted for each error. By June, 1997, on a five-minute timed typing test of text, your child will be able to type at thirty words per minute with five words per minute deducted for each error. This objective includes “Appropriate objective criteria and evaluation procedures and schedules, for determining, on at least an annual basis, whether the short term instructional objectives are being achieved.” 34 C.F.R. Section 300.346
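An objective like this can be checked with simple arithmetic, which is what makes it measurable. The sketch below scores the June, 1997 objective. It rests on an assumption the article does not state outright: that the deduction is subtracted from the child’s overall typing rate, and that the result must reach the target. The function names and the sample numbers are hypothetical.

```python
def net_words_per_minute(
    words_typed: int, minutes: float, errors: int, penalty_per_error: float
) -> float:
    """Typing rate after deducting a fixed penalty for each error.

    Assumption: the penalty (in words per minute) is subtracted from
    the gross rate of words typed divided by minutes.
    """
    gross_rate = words_typed / minutes
    return gross_rate - penalty_per_error * errors


def meets_objective(net_rate: float, target_rate: float) -> bool:
    """True when the net typing rate reaches the target in the IEP."""
    return net_rate >= target_rate


# Hypothetical result: 200 words typed in five minutes with 2 errors.
rate = net_words_per_minute(words_typed=200, minutes=5, errors=2, penalty_per_error=5)
print(f"Net rate: {rate:.0f} words per minute")
print("Objective met" if meets_objective(rate, target_rate=30) else "Objective not met")
```

Compare this with “teacher observation”: anyone who gives the same timed test can repeat the calculation and arrive at the same answer.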

Parent’s To Do List

  1. After you complete this article, make a list of all the times when your child has been tested. Arrange your list in chronological order. Include the names, dates, and scores of each test that has been administered to your child more than once.
  2. Begin your list with the test or tests that have been administered most frequently. In many cases, that will be the Wechsler Intelligence Test and the Woodcock-Johnson and/or Kaufman Educational Achievement Tests.
  3. Write down all of the scores from the first administration of a test battery. Convert these scores to percentile ranks. Complete the same process with the most recent testing of the same battery. Compare the results. You should be able to determine whether your child is being remediated (catching up), staying in the same position, or falling further behind the peer group.
  4. Dig for the standard scores or percentile rank scores in your child’s file. You may find that some scores are only reported in “ranges” (e.g., high-average, low-average) or in grade equivalent or age equivalent scores. If the standard scores are not available, you should ask for them. When you request the data in standard score format, the school staff may be surprised but they should be able to comply with your request.
  5. Take the most glaring deficiencies where your child has shown minimal progress or even regression and chart out the test results. If you do not have a computer, use graph paper. Software programs like Excel and PowerPoint allow for dramatic visual presentations of test data. If this is too difficult or confusing, consult with an expert. Gather your material — your bell curve chart and standard score / percentile rank chart, your list of test scores, and your child’s evaluations, and consult with a private sector psychologist or educational diagnostician who can explain the significance of the scores using percentile ranks.
  6. Ask the professional to use the bell curve chart that includes standard scores, standard deviations and percentile ranks. Be sure that you have a photocopy of the bell curve so you can take it home to study. If the professional is willing, it may be helpful to tape record this portion of the session so that you can go back over it at home with the test scores in front of you.
  7. Contact your state’s Department of Education and request all publications about special education and IEPs, along with your state regulations.
  8. Download our companion article, “Your Child’s IEP: Practical and Legal Guidance for Parents and Advocates.”
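The comparison described in step 3 can be sketched in code. The example below uses the reading percentile ranks from the WJ-R example in the Private Sector Evaluations section. It is a rough illustration only: the function names are invented, the standard scores are estimated from the bell curve (mean 100, SD 15) and not taken from a publisher’s table, and the trend labels ignore measurement error, so small changes deserve a professional’s interpretation.

```python
from statistics import NormalDist

# Standard score scale used throughout the article: mean 100, SD 15.
STANDARD_SCORE_CURVE = NormalDist(mu=100, sigma=15)


def percentile_to_standard_score(pr: float) -> int:
    """Estimate the standard score for a percentile rank (0 < pr < 100)."""
    return round(STANDARD_SCORE_CURVE.inv_cdf(pr / 100))


def describe_trend(first_pr: float, latest_pr: float) -> str:
    """Label the change in position relative to the peer group."""
    if latest_pr > first_pr:
        return "catching up"
    if latest_pr < first_pr:
        return "falling further behind"
    return "staying in the same position"


# (first testing, most recent testing) percentile ranks from the example.
reading_history = {
    "Word identification": (10, 5),
    "Passage comprehension": (60, 45),
    "Composite reading": (35, 25),
}

for area, (first_pr, latest_pr) in reading_history.items():
    print(
        f"{area:22s} PR {first_pr:2d} -> {latest_pr:2d}  "
        f"(standard score {percentile_to_standard_score(first_pr)} -> "
        f"{percentile_to_standard_score(latest_pr)}): "
        f"{describe_trend(first_pr, latest_pr)}"
    )
```

In this example, all three reading measures dropped between the first and most recent testing, even though the composite score stayed in the “average range.”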

For the professional, attorney, and the curious parent, an excellent book about tests and their meaning is Assessment of Children (currently being revised) written and published by Jerome M. Sattler, Publisher, Inc., P. O. Box 151677, San Diego, CA 92175. You can order this book from Dr. Sattler (619 460-3667) or from The Psychological Corporation (800-228-0752), or from the Advocate’s Bookstore at Wrightslaw. On page 17 of Dr. Sattler’s book, you will find a Bell Curve with percentile ranks for the Wechsler IQ tests, subtest scores, and most other tests that are used with special education children.

Go to: and where you can download and print bell curve charts and a list of standard scores, scale / subtest scores, standard deviation and percentile ranks!

Make several prints of both. You’ll be surprised at how often you’ll refer to them. Make copies for your friends.

Learn More About Tests and Assessments, See our New Slide Show – Educational Progress Graphs

Don’t forget to download Your Child’s IEP: Practical and Legal Guidance for Parents and Advocates.

Good Luck!

[We encourage you to visit the Wrightslaw website, and the new companion website “From Emotions to Advocacy – The Special Education Survival Guide”]