Judging Evidence Explanation and Glossary

 

How to Judge Evidence

By Edwin Mass

Some important things to consider when evaluating research evidence:

  • Study design: Does the study design control for potential bias or other potential reasons for improvement? Bias (explicit or implicit expectations/hope of improvement on the part of the child/family or researcher) is usually best controlled by blinding; for example, were the data analyzed by blinded research assistants or by the treating SLP or researcher? Other potential reasons that should be controlled include maturation (spontaneous, naturally occurring improvement as a function of getting older), other treatments a child may be receiving outside of the study, and becoming more familiar with the test or testing situation. Both group designs and single-case experimental designs (see Glossary below) have ways to control for these alternative explanations.
  • Source: Is the evidence published in a peer-reviewed journal? All else being equal, studies in peer-reviewed journals are more credible than studies published as book chapters, in magazines, or self-published on the internet (e.g., blog posts). See the Glossary below for an explanation of peer review.
  • Replication: Have the findings been replicated, or is there only one study? All else being equal, the more times a program or product has been shown to work, the more likely it is that the effect is real. It is even better when different researchers independently replicate the effects, because it indicates that the effects are not specific to particular clinicians but are likely due to the program or product itself.
  • Outcomes: Is the effect large and/or meaningful? For example, improvements may be small, or only present in some children, or on a measure that is not relevant to your child’s goals.
  • Conflict of interest: Is the only available evidence produced by people with potential conflicts of interest? For example, is the only research done by the developers of the program or product, who may have a financial interest? Note that this does not mean that evidence produced by people with a conflict of interest is necessarily flawed or dishonest or untrustworthy. This consideration just means that, all else being equal, evidence from independent researchers is more credible than evidence from researchers with a conflict of interest.

 

Glossary of Terms for Judging Evidence

 

Below are terms frequently seen in discussions of different types of research on best practices in speech-language pathology, as well as in other literature.

NOTE: Underlined terms are defined elsewhere in the glossary.

 

Case study

A study design in which a single child or a small number of children are studied in detail, usually over time (such as over a period of treatment). A case study is observational and does not include experimental control, meaning that the researcher takes data but there is no controlled manipulation of variables (such as treatment vs. no treatment) or control for other explanations (such as maturation, normal fluctuation in performance, other treatment provided, placebo effects). Importantly, this means that evidence from a case study can document improvement (whether or not a participant improves over time), but it cannot be used to determine treatment effects (that the treatment was responsible for that improvement). A case study is NOT the same as a single-case experimental design study or single-subject design study.

 

Group design (including randomized controlled trial or RCT)

In a group design, different groups of children are assigned to different conditions (for example, treatment vs. no treatment; or treatment A vs. treatment B), and the groups are compared before and after treatment. The groups should be similar and have similar levels of performance before treatment. If the treatment works, then the treated group should make greater improvement than the untreated group: the groups should differ after treatment but not before. To try to ensure that groups are similar before treatment, researchers can match the groups (for example, based on performance) or they can randomly assign children to the groups. Randomizing is typically better, because it can control for both known and unknown differences between groups. However, with small sample sizes, randomization can still leave the groups different before treatment. Group designs in which there is only one group (a group that receives treatment) are essentially the group equivalent of case studies, in that they cannot be used to determine that the treatment is responsible for the improvement. Group designs with a treatment group and a no-treatment group can be used to determine whether the treatment caused the improvement. As with SCED studies, group studies vary in their experimental strength and should be evaluated carefully. All else being equal, a well-controlled group design offers stronger evidence than a well-controlled SCED.

 

Improvement

A gain in performance on some measure of interest over the course of treatment (for example, a higher score after treatment than before). Improvement by itself does not mean that the treatment worked: that also requires a demonstration that untreated items/groups did not improve, or improved less than treated items/groups (a treatment effect).

 

Meta-analysis

A systematic statistical analysis of an aggregated body of research on the same or similar topic or treatment approach. The purpose of a meta-analysis is usually to determine what the likely effect is of a particular treatment method based on all available studies that have used that treatment approach. The focus is usually on primary literature.
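As a rough illustration of how a meta-analysis might combine studies, the sketch below pools effect sizes with inverse-variance weights, so that more precise studies count more. This is only one common approach and is an assumption here, not something the glossary specifies; the function name and all numbers are hypothetical and invented for illustration.

```python
def pooled_effect(effects, variances):
    """Inverse-variance weighted average of study effect sizes.

    Assumption: a simple fixed-effect model; real meta-analyses
    often use more elaborate models.
    """
    weights = [1.0 / v for v in variances]  # precise studies get larger weights
    weighted_sum = sum(w * e for w, e in zip(weights, effects))
    return weighted_sum / sum(weights)


# Hypothetical effect sizes and variances from three studies (not real data).
effects = [0.2, 0.5, 0.8]
variances = [0.04, 0.01, 0.04]

pooled = pooled_effect(effects, variances)
print(round(pooled, 3))
```

In this made-up example the middle study has the smallest variance, so it pulls the pooled estimate toward its own effect size more strongly than the other two.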

 

Narrative review

A review of the literature organized around a particular question or topic to provide novel insights or perspectives on existing literature. Narrative reviews are not systematic or exhaustive, but rather typically involve selection of representative studies, and may be biased or selective with respect to which studies are included for review.

 

Non-experimental group design

A design in which there is only a single group of children who receive treatment. In other words, there is no control group. In essence, this type of design is a group design equivalent of the case study design. This means that the design can be used to document improvement but not to conclude that the treatment was responsible for that improvement.

 

Peer review

A system of “quality control” in which experts in a specific area (for example, childhood apraxia of speech; treatment research design) review articles submitted to a professional scientific journal. These reviewers, usually two or three, are selected and invited by the Editor or Associate Editor of the journal; reviewers are NOT chosen by the authors and cannot have a conflict of interest with the authors. Reviewers provide the (Associate) Editor with a detailed written evaluation and a recommendation about publication. Recommendations typically involve one of the following: Accept, Minor Revision, Major Revision, Reject. The (Associate) Editor reads the paper and the reviewers’ evaluations and recommendations, and decides whether to accept, reject, or request revisions for the manuscript. Authors must address the reviewers’ concerns and revise the manuscript as needed before a manuscript is accepted for publication. In most cases, authors do not know who the reviewers are (single-blind review). In some cases, journals blind reviewers to the authors’ identities as well (double-blind review). It is important to note that peer review does not by itself mean that the study is well done (or important); however, peer review is a quality control mechanism that does not exist for other types of publications (such as books and book chapters, blog posts, Wikipedia, etc.), and therefore, peer-reviewed articles reflect the best available (gold standard) source of scientific information. Peer-reviewed articles are vetted carefully by independent, anonymous experts in the field.

 

Primary literature

Primary literature refers to peer-reviewed articles (in scientific journals) that present new data. This usually means that these articles include a “Methods” section and a “Results” section. Articles that review previously published studies but do not contribute new data are not considered primary literature but secondary literature. This includes systematic reviews and meta-analyses: these are secondary literature.

 

Quasi-experimental group design

A design in which there is a control group (a group that doesn’t receive treatment) but in which the assignment to groups is not random. Examples include forming groups based on matching speech disorder severity at baseline, or based on logistical factors (for example, children who can attend the university clinic are in the treatment group, and children who cannot attend the university clinic are in the control group). Studies with quasi-experimental group designs can document improvement and suggest that the treatment is responsible for that improvement, but they cannot provide strong evidence that the treatment was responsible, because there may be systematic pre-existing differences between the groups that could explain the apparent treatment effect (greater improvement for the treated group).

 

Randomized controlled trial (RCT)

A type of group design in which children are randomly assigned to different conditions (for example, treatment or no-treatment; Treatment A or Treatment B). The goal of randomly assigning children to conditions is to make the groups equal to one another for both known variables (for example, age, gender, speech disorder severity) and unknown or unmeasurable variables (for example, motivation, personality). RCTs are often considered to reflect the ‘gold standard’ of treatment research design because they can, in theory, control for all other potential explanations for improvement. RCTs do not provide detailed information about individual children and their responses to treatment; only group means are used.
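Random assignment as described above can be sketched in a few lines. This is a minimal illustration, assuming simple randomization into two equal-sized groups; the function name, the child labels, and the seed are hypothetical, and actual trials often use more elaborate schemes.

```python
import random


def randomly_assign(children, seed=None):
    """Shuffle the children and split them into two groups of (near) equal size."""
    rng = random.Random(seed)   # a seed makes the assignment reproducible
    shuffled = list(children)   # copy, so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical participant labels, invented for illustration.
children = [f"child_{i}" for i in range(1, 9)]
treatment_group, control_group = randomly_assign(children, seed=1)

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```

Because chance alone decides who lands in which group, neither the researcher nor the family can steer particular children toward treatment, which is what allows randomization to balance both known and unknown variables on average.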

 

Secondary literature

Secondary literature refers to peer-reviewed articles (in scientific journals) that review existing data. This includes systematic reviews and meta-analyses, as well as other peer-reviewed review articles such as the traditional narrative review.

 

Single-case experimental design study (aka single-subject design study)

Not to be confused with a case study design, a single-case experimental design (SCED) is a design in which a single participant undergoes different conditions (e.g., treatment vs. no treatment) under manipulation of the researcher, with some degree of experimental control to rule out potential alternative explanations. For example, such a design may have an extended baseline to control for maturation or normal fluctuations in performance; it may have treatment targets being treated at different times to determine whether improvements only occur once treatment is started; and it may test items that are never treated to control for maturation or effects of repeated testing. SCED studies vary in the degree of control over these other potential factors, so not all SCED studies are equally strong. SCED studies may also include replication across participants (for example, a study might include 6 children), but the critical feature is that each child is systematically compared to themselves: each child serves as their own comparison. Because the researcher manipulates whether/when treatment is provided and controls for certain other potential explanations, a SCED study can document both improvement and treatment effects (i.e., it can provide evidence that the improvement was caused by the treatment). If the treatment works, then improvement should only be observed when treatment is applied. SCED studies provide detailed information about individual children but cannot easily be used to generalize to other children.

 

Single-subject design study

See Single-case experimental design study

 

Systematic review

A review of the literature pertinent to a particular question in which available databases are searched systematically and exhaustively, with a formalized method of evaluating the quality of included studies. Systematic reviews usually focus on primary literature. The systematic and exhaustive nature of the review is intended to minimize the risk of selection bias in choosing which studies to review. Systematic reviews are typically published in peer-reviewed scientific journals and are considered secondary literature.

 

Tertiary literature

Any literature that is neither primary literature nor secondary literature. This includes books, book chapters, blog posts, self-published articles, Wikipedia or encyclopedia entries, conference posters, and conference proceedings (unless published in a peer-reviewed journal). Tertiary literature is the least credible source of information because it is not subject to independent, anonymous vetting by experts.

 

Treatment effect

A change in performance (usually improvement) that is likely due to the treatment provided. A treatment effect requires evidence that treated items or groups improve more than untreated items or groups. If both treated and untreated items/groups improve by the same amount, there is no treatment effect: no evidence that the treatment was responsible.
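The difference between improvement and a treatment effect can be shown with simple arithmetic. The sketch below is only an illustration, assuming a single before and after score for treated and untreated items (or groups); the function names and scores are hypothetical, and real studies would also ask whether the difference is statistically reliable.

```python
def gain(pre, post):
    """Improvement: the change in score from before to after treatment."""
    return post - pre


def treatment_effect(treated_pre, treated_post, untreated_pre, untreated_post):
    """Gain of the treated items/group minus gain of the untreated items/group."""
    return gain(treated_pre, treated_post) - gain(untreated_pre, untreated_post)


# Hypothetical percent-correct scores, invented for illustration only.
both_improve_equally = treatment_effect(40, 60, 40, 60)
treated_improves_more = treatment_effect(40, 70, 40, 45)

print(both_improve_equally)   # improvement, but no treatment effect
print(treated_improves_more)  # treated items gained more than untreated items
```

In the first case both sets of scores rise by the same amount, so there is improvement but nothing to attribute to the treatment; in the second, the extra gain for treated items is the part that may reflect the treatment.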

 

Treatment effectiveness

The likelihood that a treatment results in a treatment effect under routine circumstances. “Routine circumstances” is typically interpreted as in real life, in real-world clinical settings, with a sample that reflects the typical clinical population (for example, with comorbid diagnoses). This contrasts with treatment efficacy. Studies of treatment efficacy are typically conducted before studies of treatment effectiveness.

 

Treatment efficacy

The likelihood that a treatment results in a treatment effect under optimal circumstances. “Optimal circumstances” is typically interpreted as in a research lab, with a carefully selected sample of children. This contrasts with treatment effectiveness.

 



