MindTools.io's scientific approach rests on the rigorous and continuous development of Enlight – a comprehensive suite of criteria-based measures. Enlight currently covers 11 different quality constructs and checklists that are produced by trained raters. Based on a rigorous systematic review assessing different aspects of both eHealth and mHealth interventions, Enlight is the first suite of measures to incorporate behavior change, persuasive design, and therapeutic alliance concepts – concepts that have been shown to affect a program's therapeutic potential. Enlight was developed to enable the examination of eHealth interventions regardless of delivery medium (website, mobile, suite of products, text messaging, etc.) or clinical aim (mental health, chronic conditions, health-related behaviors, etc.).
In recent testing, each of Enlight's quality constructs showed excellent inter-rater reliability (ICC = .77–.98, median .91). This reliability enables stakeholders to examine individual constructs objectively, and enables us to develop a formula that combines the constructs based on empirical findings. Rather than a simple overall average, this formula weights each construct differently in order to evaluate a program's potential more accurately.
Enlight’s Criterion Validity – Predicting User Engagement in Real World Use
In collaboration with Microsoft Research, we examined the use patterns of more than 100,000 users across 30 web-based applications and found that Enlight scores predicted which programs were more engaging. This makes Enlight the first criteria-based scale to demonstrate criterion validity in real-world settings.
More about this study: We correlated the online activities of users of 30 web-based behavioral interventions – collected from a proprietary data set of anonymized logs from consenting users of a Microsoft Internet Explorer add-on – with the interventions' quality ratings, obtained by trained raters prior to the empirical examination. The quality ratings included: Usability, Visual Design, User Engagement, Content, Therapeutic Persuasiveness (i.e., persuasive design and the incorporation of behavior change techniques), and Therapeutic Alliance. Results indicated that Therapeutic Persuasiveness was the most robust predictor of adherence (i.e., duration of use and number of unique sessions; .40 ≤ rs ≤ .58, ps ≤ .005), explaining 42% of the variance in user adherence in our regression model. Results also indicated up to a sixfold difference, based on Therapeutic Persuasiveness, in the percentage of users who utilized the interventions beyond a minimum amount of time and number of sessions. These findings underscore the importance of incorporating persuasive design and behavior change techniques during the design and evaluation of digital behavioral interventions.
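The rank-correlation analysis described above can be sketched as follows. The numbers, variable names, and the tiny `spearman` helper are ours and purely illustrative – this is not the study's data or code, just a demonstration of correlating quality ratings with an adherence metric.

```python
# Illustrative sketch: Spearman correlation between a quality construct
# and an adherence metric (not the study's actual data or code).
def spearman(x, y):
    """Spearman rho: Pearson correlation of the ranks (no ties assumed)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank_pos, idx in enumerate(order, start=1):
            r[idx] = rank_pos
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical Therapeutic Persuasiveness ratings (1-5) and mean unique
# sessions per user for six imaginary programs:
tp_scores = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
sessions = [1.2, 1.5, 2.1, 3.0, 4.8, 5.5]
print(f"rho = {spearman(tp_scores, sessions):.2f}")  # rho = 1.00
```

Because the made-up adherence values here rise strictly with the quality ratings, the sketch yields a perfect rank correlation; real data would of course be noisier.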
Baumel, A., & Yom-Tov, E. (2018). Predicting user adherence to behavioral eHealth interventions in the real world: Examining which aspects of intervention design matter most. Translational Behavioral Medicine, [Epub ahead of print]. doi: 10.1093/tbm/ibx037
Other Validity Data
In our first examination of the combined scores based on averaging the various constructs (a method that is less accurate than the proposed formula), we found a significant, positive, moderate correlation between the score that uses several constructs and a score representing the research evidence supporting a program's effectiveness (r = .35, P = .001). This means that our score was in line with evidence of a program's efficacy measured in rigorous trials. We continue to examine the validity of our tool in order to perfect the formula we are using. Researchers can find more information on how to use Enlight Quality Assessment for Research in our resource center (to be opened June 2017).
Baumel A, Faber K, Mathur N, Kane JM, Muench F. Enlight: A Comprehensive Quality and Therapeutic Potential Evaluation Tool for Mobile and Web-Based eHealth Interventions. J Med Internet Res 2017;19(3):e82. doi: 10.2196/jmir.7270
The Enlight scale, along with more information about how to use Enlight Quality Assessment for Research, can be found in our resource center starting June 2017.
Enlight Score’s Formula
As can be seen in our scale, Enlight’s quality assessment section consists of six different constructs, with scores ranging from 1 (Very Poor) to 5 (Very Good):
- Usability: Assesses the ease of learning how to use the program and the ease of utilizing it properly.
- Visual Design: Assesses the look and feel of the program, and the visual quality of the Graphical User Interface (GUI).
- User Engagement: Assesses the extent to which the program design attracts users to utilize it.
- Content: Assesses the content provided or learned while using the eHealth program.
- Therapeutic Persuasiveness: Assesses the extent to which the eHealth program is designed to encourage users to make positive behavior changes or to maintain positive aspects of their lives.
- Therapeutic Alliance: Assesses the ability of the program to create an alliance with the user in order to effect a beneficial change.
- Not all constructs have the same impact: Some constructs were found to be better at predicting engagement and efficacy, while others are barriers rather than facilitators of successful programs. For example, while it is quite simple to examine whether a program is “easy to use” (Usability) or visually appealing (Visual Design), beyond a certain threshold higher scores no longer matter. Moreover, “ease of use” can be negatively correlated with other quality metrics, since very lean programs that lack sufficient content or features are often very easy to learn and use.
- The multimodal relationship between constructs means that some constructs are more impactful than others at certain quality levels. In our study, a conditional probability analysis revealed that 100% of the programs that received a score of fair or above (≥3.0) in Therapeutic Persuasiveness or Therapeutic Alliance received the same range of scores in User Engagement and Content – a pattern that did not appear in the opposite direction. This finding suggests that, above a certain score, some constructs are more important.
- Accuracy: The formula aims at increased sensitivity in identifying programs with high quality. It is not meant to be highly sensitive when it comes to programs with poor scores.
When assessing the program’s general potential to benefit its target audience in real-world use, we believe that programs achieving scores between 3.5 and 5 have reasonable potential to meet this criterion.
However, generally speaking, for people to find any of these tools helpful, a program must effectively target their needs and context of use; moreover, users must be motivated enough and able to use it. This means that programs with lower scores could still be helpful in cases where users find them more suited to their needs than programs with higher scores.
We therefore extend the 3.5 to 5 range to scores between 3 and 5, assuming that the lower the score, the more the program needs to foster users’ sense of effective targeting, motivation, ability, and human support in order for them to use it.
1–1.99 Very Poor; 2–2.99 Poor; 3–3.49 Fair; 3.5–5 Good
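As a sketch, the score bands above can be expressed as a small helper function (the function name is ours, not part of Enlight):

```python
def quality_band(score: float) -> str:
    """Map a 1-5 Enlight quality score to its descriptive band.

    Bands: 1-1.99 Very Poor; 2-2.99 Poor; 3-3.49 Fair; 3.5-5 Good.
    """
    if score < 2.0:
        return "Very Poor"
    if score < 3.0:
        return "Poor"
    if score < 3.5:
        return "Fair"
    return "Good"

print(quality_band(3.2))  # Fair
```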
- If Usability ≥ 3, Visual Design ≥ 3, and Content ≥ 3, then Quality Score = (3*TP + 1*Content + 1*UE + 0.5*TA)/5.5, where TP = Therapeutic Persuasiveness, UE = User Engagement, and TA = Therapeutic Alliance.
- If Usability or Visual Design < 3, that construct is added to the calculation with a weight of 1.
For example, if Usability < 3 and Visual Design ≥ 3, we add only Usability:
Quality Score = (3*TP + 1*Content + 1*UE + 1*Usability + 0.5*TA)/6.5
- If Content < 3, we give it 3 times more weight:
Quality Score = (3*TP+3*Content + 1*UE + 0.5*TA)/7.5
Note: This step is taken only if adding weight to Content reduces the program's score. In practice, this means the program will receive a poor score, since at these levels TP and TA are also low.
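The rules above can be sketched as a single function. This is our own illustrative reading of the formula (the function and parameter names are ours), not an official Enlight implementation:

```python
def enlight_quality_score(usability, visual_design, content, ue, tp, ta):
    """Sketch of the weighted Enlight quality score described above.

    All inputs are construct scores on a 1-5 scale.
    TP = Therapeutic Persuasiveness, UE = User Engagement,
    TA = Therapeutic Alliance.
    """
    # Base formula: TP dominates; Usability and Visual Design are
    # excluded as long as they clear the 3.0 threshold.
    weighted = 3 * tp + 1 * content + 1 * ue + 0.5 * ta
    total = 5.5

    # A below-threshold Usability or Visual Design re-enters the
    # calculation with a weight of 1.
    for barrier in (usability, visual_design):
        if barrier < 3:
            weighted += barrier
            total += 1

    score = weighted / total

    # Below-threshold Content gets triple weight (+2 on top of its base
    # weight of 1), but only if that lowers the overall score.
    if content < 3:
        penalized = (weighted + 2 * content) / (total + 2)
        score = min(score, penalized)

    return score

# All constructs at 4.0: (3*4 + 4 + 4 + 0.5*4) / 5.5 = 4.0
print(round(enlight_quality_score(4, 4, 4, 4, 4, 4), 2))  # 4.0
```

A quick check of the Content rule: with Content = 2 and all other constructs at 4, the penalized form (3*4 + 3*2 + 4 + 0.5*4)/7.5 = 3.2 is lower than the base 20/5.5 ≈ 3.64, so the function returns 3.2.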