Evaluating evaluations

Randomized controlled trials (RCTs) are well established as the best method of evaluating the effectiveness of interventions. However, quasi-experimental designs (QEDs), while scientifically less robust, are often chosen because of practical and ethical constraints. This raises two questions: how to compare results obtained by the two evaluation methods, and how to have more confidence in findings from QEDs.

Mary Falconer and colleagues have raised these questions while re-evaluating a QED evaluation of Healthy Families Florida, a home visitation program to reduce childhood abuse and neglect.

Their paper starts by asking a question that many prevention scientists will have mulled over at some time: when is an RCT appropriate, and when is it not? Answering their own question, the research team points out that it is a balancing act between getting the best evidence and meeting the requirements of the situation. RCTs can be extremely invasive and, at worst, disrupt the services that are being evaluated. Coupled with ethical concerns about allocating families to receive help, this point is particularly poignant when the aim is to reduce child maltreatment.

Quoting a recent review of Healthy Families America (as Healthy Families Florida is known nationally) that used evidence from both randomized and quasi-experimental designs, the authors highlight that QED evaluation results consistently indicated larger positive effects on parenting behavior, maternal education and child cognition than those from RCTs. This caused the Florida team to ask whether the evaluation methods used could be altering results.

Why would this be? Part of the explanation is randomization, the key ingredient found in RCTs. Quasi-experiments are similar to RCTs in some regards (they have a control group and an intervention group), but in an RCT the assignment of cases to each group is random, meaning that factors other than the program cannot systematically bias the comparison of outcomes. This is because these other factors will, on average, be evenly distributed between groups.
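The balancing effect of random assignment can be illustrated with a small simulation. This is a minimal sketch, not anything taken from the paper: the baseline "risk score", the 70%/30% enrolment probabilities and the function name `simulate_assignment` are all assumptions made for illustration.

```python
import random
import statistics

def simulate_assignment(n=10_000, seed=0):
    """Compare baseline balance under random vs. self-selected assignment."""
    rng = random.Random(seed)
    # Hypothetical baseline risk score per family (higher = more at risk).
    risk = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # RCT-style: a coin flip decides who receives the program.
    random_treated = [rng.random() < 0.5 for _ in range(n)]

    # QED-style (assumed mechanism): higher-risk families enrol more often.
    selected_treated = [rng.random() < (0.7 if r > 0 else 0.3) for r in risk]

    def gap(treated):
        # Difference in mean baseline risk: program group minus comparison group.
        t = [r for r, flag in zip(risk, treated) if flag]
        c = [r for r, flag in zip(risk, treated) if not flag]
        return statistics.mean(t) - statistics.mean(c)

    return gap(random_treated), gap(selected_treated)

rct_gap, qed_gap = simulate_assignment()
print(f"baseline risk gap, random assignment: {rct_gap:+.3f}")
print(f"baseline risk gap, self-selection:    {qed_gap:+.3f}")
```

Under random assignment the two groups start out with nearly the same average risk, so a later difference in outcomes is hard to attribute to that factor. Under the assumed self-selection the groups differ before the program has done anything.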

This does not mean that the findings from QEDs cannot be relied on. The authors show how scrutinizing a QED and adjusting the methods used can demonstrably improve confidence in its findings. Leading by example, the research team of four offered an overview of how to improve the internal, external and construct validity of their own post-test QED evaluation of Healthy Families Florida.

Falconer’s team argues that internal validity is the most important for experiments. This is because it determines whether an experiment can support conclusions about cause and effect, that is, whether the program reduced incidents of child maltreatment.

Internal validity is also where RCTs and QEDs differ most. As mentioned, randomization in RCTs rules out systematic differences between groups. In QEDs the absence of random assignment means these differences can exist and, therefore, can confound results. In their evaluation, Falconer and colleagues attempted to measure these systematic differences, and once they were identified, their impact on results was reduced or eliminated through statistical adjustments.
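One simple form of statistical adjustment is stratification: comparing program and comparison families only within groups that share the confounding characteristic, then averaging the results. The sketch below is illustrative and is not the method used in the paper. The enrolment probabilities, the report rates and the five-percentage-point program effect are all assumed values built into a simulation.

```python
import random

def simulate(n=100_000, seed=1):
    """Simulate (high_risk, treated, outcome) rows with confounded enrolment."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        high_risk = rng.random() < 0.5
        # Assumed selection: high-risk families enrol more often.
        treated = rng.random() < (0.8 if high_risk else 0.2)
        # Assumed outcome model: the report rate depends on risk, and the
        # program lowers it by 5 percentage points.
        p = (0.30 if high_risk else 0.10) - (0.05 if treated else 0.0)
        rows.append((high_risk, treated, rng.random() < p))
    return rows

def rate(rows):
    # Share of rows with a maltreatment report.
    return sum(outcome for _, _, outcome in rows) / len(rows)

def naive_effect(rows):
    # Raw difference in report rates: program group minus comparison group.
    treated = [r for r in rows if r[1]]
    control = [r for r in rows if not r[1]]
    return rate(treated) - rate(control)

def stratified_effect(rows):
    # Within-stratum differences, weighted by each stratum's share of the sample.
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += len(stratum) / len(rows) * naive_effect(stratum)
    return total

rows = simulate()
naive = naive_effect(rows)
adjusted = stratified_effect(rows)
print(f"naive difference:    {naive:+.3f}")
print(f"adjusted difference: {adjusted:+.3f}")
```

With these assumptions the unadjusted comparison makes the program look harmful, because its participants started at higher risk, while the adjusted estimate recovers the built-in reduction. The direction of the bias depends entirely on who ends up in each group. Adjustment can only correct for differences that were actually measured.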

The team’s analysis of differences between study groups turned up significant variations in age, race, education and risk of abuse and neglect, which the team subsequently controlled for. Differences in dropout rates and missing data were also examined. While no differences were identified between groups, the paper highlights the importance of assessing whether missing data are random. If they are not, results can be biased, even in an RCT, because non-random missingness introduces another systematic difference between groups.
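A first check on missing data is to compare how often outcomes are missing in each group. The sketch below uses a standard two-proportion z-test. The counts are hypothetical and are not taken from the paper, and the function name `two_proportion_z` is chosen here for illustration.

```python
import math

def two_proportion_z(missing_a, n_a, missing_b, n_b):
    """Two-sided z-test for a difference in missing-data rates between groups."""
    p_a = missing_a / n_a
    p_b = missing_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (missing_a + missing_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: families with missing outcome records in each group.
z_similar, p_similar = two_proportion_z(60, 500, 58, 500)
z_differs, p_differs = two_proportion_z(60, 500, 110, 500)
print(f"similar rates:   z = {z_similar:+.2f}, p = {p_similar:.3f}")
print(f"different rates: z = {z_differs:+.2f}, p = {p_differs:.3g}")
```

Similar rates of missingness do not prove the data are missing at random, since the families with missing records could still differ in other ways. A clear difference in rates, however, is a warning sign that should be investigated.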

External validity is also important, as it permits findings to be applied generally to the program’s target population. While this is an important consideration for QEDs, Falconer et al. are quick to point out that external validity can be just as important to consider in RCTs. In fact, some of the exclusion criteria applied to previous RCT evaluations of Healthy Families America reduced the general application of results. In their own QED evaluation all those enrolled in the state program, along with those on the waiting list, were included in the study. This meant the families in the evaluation were exactly those whom the program aimed to serve.

The four authors also cited construct validity, which again is important to get right in both types of evaluation. Construct validity is the ability to measure correctly what the program aims to alter. Falconer’s evaluation used child protection service records of abuse and neglect because they were objective, more reliable than self-report measures and easy to access. While resources did not permit it for their own study, the Florida team argues that multiple measures of the outcomes of interest should be used. A further consideration for QEDs is measuring other factors that are likely to affect the outcomes measured, as this will allow testing and improvement of internal validity through statistical methods.

Controlling for these factors did not alter the positive results reported in earlier evaluations of the program. This allowed the research team to conclude: “The additional analysis presented in this paper and special attention to internal, construct and external validity provides additional evidence regarding the effectiveness of Healthy Families Florida in preventing child abuse and neglect.”

Reference:

M. Falconer, M. Clark and D. Parris. (2011) Validity in an evaluation of Healthy Families Florida – a program to prevent child abuse and neglect. Children and Youth Services Review. 33, 66-77

Link:
www.healthyfamiliesamerica.org

Explainers

Healthy Families America

Healthy Families America is an early intervention program designed to address potential risk factors for child abuse and neglect where families are expecting a baby or have a child aged under three months.

quasi-experimental evaluations

An evaluation method in which children referred to a program or other intervention are compared with a group of matched children who do not receive the program.

randomized controlled trials

Sometimes referred to as experimental evaluations, randomized controlled trials or RCTs randomly allocate potential beneficiaries of an intervention to a program or treatment group (who receive the intervention) or a control group (who do not). Outcomes for the two groups are then compared.