Reliability and Concurrent Validity of the Bruininks-Oseretsky Test in Children with Cerebral Palsy
Cerebral palsy (CP) is the leading cause of motor disability in children in developed countries, affecting 2 to 3.5 per 1000 live births worldwide [1]. Since clinical presentation varies widely, with three CP subtypes currently accepted (spastic, dyskinetic and ataxic), it is important to perform a comprehensive and reliable evaluation of motor function within the framework of the International Classification of Functioning, Disability and Health (ICF), to enable better clinical decision making and follow-up [1-3]. In the activities domain of the ICF, two validated classification systems of motor function are the Gross Motor Function Classification System (GMFCS) [4] and the Manual Ability Classification System (MACS) [5]. Both provide a quick global picture of the child’s activity limitations, the GMFCS focusing on lower-limb and the MACS on upper-limb abilities. Despite their short administration time and their usefulness in classifying motor function in CP, they do not indicate in which areas the child is most impaired. This limits their use in follow-up and in adapting therapeutic strategies. Another reference tool to evaluate gross motor function in CP is the Gross Motor Function Measure-66 (GMFM-66) [6,7].
It is a standardised, validated observational instrument designed to measure change in gross motor function in CP children. However, its administration time is up to 60 minutes, it does not assess fine motor proficiency, and it is less suited to higher-functioning CP children because of its ceiling effect [8]. Various standardised tools assess both global and fine motor proficiency in healthy children, differing mainly in their target age range. A well-known tool is the Bruininks-Oseretsky Test of Motor Proficiency, second edition (BOT2) [9]. It is a standardised tool that assesses global and fine motor proficiency in healthy individuals aged 4 to 21 years [9,10]. Originally published in 1978, it was revised in 2005 as the BOT2 [9]. It is available as a complete form (BOT2-CF) and a short form (BOT2-SF). It evaluates the activities domain of the ICF and is regularly used in the evaluation and follow-up of CP children but, to our knowledge, it has not been validated in this population.
Clinimetric properties of a test should be studied before it is used in clinical routine, according to the COSMIN taxonomy [11]. Validity corresponds to a test’s ability to measure what it claims to measure. Concurrent validity is studied by comparing the results of a given test with those of another, already validated test that measures the same construct [12]. Reliability determines whether the test provides the same results on repeated measures in the same subject, when applied by the same evaluator (intra-rater reliability) or by two different evaluators (inter-rater reliability) [13]. According to the COSMIN taxonomy, measurement error and internal consistency must also be studied to evaluate reliability [11]. Validity and reliability of the BOT2 have been examined in healthy children [9,14,15] but never in CP [16]. We therefore set out to assess the concurrent validity and reliability of the BOT2 in CP children.
Materials and Methods
BOT2-CF
Two versions of the BOT2 were assessed: the BOT2-CF and the BOT2-SF. The BOT2-CF is divided into 4 motor area composites, each including 2 sub-tests (8 overall), which in turn group various items (46 overall). The 4 motor area composites are fine manual control (d440 Fine hand use), manual coordination (d445 Hand and arm use), body coordination (d415 Maintaining a body position) and strength and agility (d446 Fine foot use). Each composite is scored separately, and a global score out of 320 is obtained [9]. Higher scores indicate better motor proficiency. The BOT2-CF is thorough, but its administration time is up to one hour. This can be a limitation for children with attention deficits [17] and in terms of staff resources. We therefore extracted the items that evaluate upper-limb function and called this section of the test the BOT2-CF upper-limb evaluation (BOT2-UL). The five sub-tests that evaluate upper-limb function are fine motor precision, fine motor integration, manual dexterity, upper-limb coordination and bilateral coordination. The 29 items of the BOT2-UL are summarised in Table 1. Each child received a BOT2-UL score out of 172.
BOT2-SF
The BOT2-SF is an abridged version of the CF. It is divided into the same 4 motor area composites, each including the same 2 sub-tests, but only 14 of the 46 original items were selected, shortening its administration time to 30 minutes. The items of each sub-test are summarised in Table 2. An overall score out of 88 is calculated [9].
Participants
A prospective cohort study was performed. Fifteen CP children aged 4 to 21 years were evaluated with the BOT2-UL and fifteen with the BOT2-SF. Children were included if they had a diagnosis of CP, a MACS level ≤4 and a GMFCS level ≤3. They were excluded if they presented another concurrent progressive neurological disorder or a severe cognitive impairment preventing them from understanding instructions. Participants were recruited from two specialised schools in Belgium (“Centre Belge d’Education Thérapeutique pour Infirmes Moteurs Cérébraux” (CBIMC) and “Institut Royal d’Accueil pour le Handicap Moteur”). The study was approved by the UCL’s Hospital-Faculty Biomedical Ethics Commission, and parents signed an informed consent form. Participants’ characteristics are reported in Table 3.
Assessments
Children were evaluated on day 1 by two different evaluators (A and B), and again by evaluator A at most one week later (day 2; 4.8±1.4 days later). Regular activities were carried out as usual between the evaluations. Given that the BOT2 evaluates the dominant side, the dominant upper limb was tested in children with diplegia or quadriplegia. Dominance was determined by presenting a pencil or a tennis ball to the child and recording which hand the child used to take it. In hemiplegic children, however, the affected side was tested, given that the aim of our study was to assess the BOT2-UL and BOT2-SF as tools to evaluate motor impairment. Children were examined in a quiet room, with only the evaluator present. Each test took approximately 30 minutes.
Statistical Analysis
Calculations were performed with SPSS software (SPSS v22.0.0.1 for Windows; IBM SPSS; Armonk, NY, USA). For each test, statistical significance was set at p < 0.05.
Concurrent Validity: Concurrent validity allows us to confirm that a test measures what it was designed to measure. Nonparametric Spearman correlations were calculated between the BOT2-SF results and the MACS and GMFCS levels, and between the BOT2-UL results and the MACS level. We did not evaluate the correlation between the BOT2-UL results and the GMFCS level because the latter refers mainly to lower-limb activity. Correlation was considered good, moderate or poor if the absolute value of the correlation coefficient (|ρ|) was >0.6, between 0.3 and 0.6, or <0.3, respectively [18].
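As an illustration only, the Spearman correlation and the rating thresholds above can be sketched in Python. This is a minimal sketch, not the SPSS procedure used in the study: tied values receive their average rank (important here, since many children share the same MACS level), Pearson’s correlation is then computed on the ranks, and no p-value is calculated. The function names and scores are hypothetical, and treating the boundary values 0.3 and 0.6 as “moderate” is our assumption, since the thresholds leave them unspecified.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def rate_correlation(rho):
    """Rate |rho| with the thresholds of the Methods (>0.6, 0.3-0.6, <0.3)."""
    r = abs(rho)
    if r > 0.6:
        return "good"
    if r >= 0.3:  # boundaries 0.3 and 0.6 treated as moderate (assumption)
        return "moderate"
    return "poor"


# Hypothetical BOT2-UL totals and MACS levels for five children.
bot2_ul = [150, 130, 120, 90, 60]
macs = [1, 1, 2, 3, 4]
rho = spearman_rho(bot2_ul, macs)
print(round(rho, 2), rate_correlation(rho))  # -0.97 good
```

The negative sign reflects the scale direction: a higher MACS level means more impairment, whereas a higher BOT2 score means better proficiency, which is why the rating uses the absolute value.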
Reliability: Two aspects of reliability were studied, according to the COSMIN taxonomy [11]: internal consistency and inter- and intra-rater reliability. Internal consistency corresponds to the degree to which the items measure the same construct. Cronbach’s α coefficient was calculated from the sub-test scores of the first evaluation, for both the BOT2-UL and the BOT2-SF. Cronbach’s α was considered acceptable, good and excellent above 0.7, 0.8 and 0.9, respectively [19]. Inter- and intra-rater reliability were quantified with the intra-class correlation coefficient (ICC) and the minimal detectable change (MDC) [20]. Intra-rater reliability was calculated by comparing the results obtained by the same evaluator (A) on two different days, at most one week apart, and inter-rater reliability by comparing the results obtained on the first day by two different evaluators (A and B).
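Cronbach’s α compares the sum of the sub-test variances with the variance of the total score. The minimal Python sketch below assumes one row per child and one column per sub-test score from the first evaluation, and uses sample variances throughout; the function names, the label for values of 0.7 or below, and the toy data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha; scores[c][j] is child c's score on sub-test j."""
    k = len(scores[0])  # number of sub-tests
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def rate_alpha(alpha):
    """Thresholds from the Methods: >0.9 excellent, >0.8 good, >0.7 acceptable."""
    if alpha > 0.9:
        return "excellent"
    if alpha > 0.8:
        return "good"
    if alpha > 0.7:
        return "acceptable"
    return "below acceptable"


# Hypothetical sub-test scores (3 children x 2 sub-tests).
toy = [[1, 2], [2, 1], [3, 3]]
print(round(cronbach_alpha(toy), 3), rate_alpha(cronbach_alpha(toy)))  # 0.667 below acceptable
```

When all sub-tests rank the children identically and with equal spread, the sum of item variances is half the total variance for two items and α reaches 1.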
The ICC is the ratio of the variability between subjects to the total variability, which also includes the measurement error, i.e. the variability of repeated measures within each subject [21]. For inter- and intra-rater reliability, ICCs were calculated with a two-way mixed-effects model of the “absolute agreement” and “consistency” types, respectively [5]. Reliability was rated as excellent, moderate or poor for ICC values >0.75, 0.40–0.75 and <0.40, respectively [22]. The MDC corresponds to the smallest change in score that exceeds the measurement error. The MDC at the 95% confidence level (MDC95) was calculated as follows:

$$\mathrm{MDC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{SEM}$$

where 1.96 is the z-value of a two-sided 95% confidence interval, and √2 accounts for the measurement error contained in each of the two measurements being compared. The standard error of measurement (SEM) reflects the measurement error across repeated measures and was calculated as follows:

$$\mathrm{SEM} = SD_x \times \sqrt{1 - \mathrm{ICC}}$$

where SD_x is the standard deviation of all observations from the test sessions [23].
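The ICC, SEM and MDC95 described above can be chained as in the Python sketch below. It is an illustration under stated assumptions, not the SPSS output used in the study: it uses the single-measure two-way ICC forms of McGraw and Wong, ICC(A,1) for absolute agreement and ICC(C,1) for consistency, which, to our understanding, give the same point estimate whether raters are treated as fixed (mixed model) or random. The text does not state whether single or average measures were reported, so single measures are an assumption; SD_x is taken over all observations of both sessions, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def icc_two_way(data, agreement=True):
    """Single-measure two-way ICC.

    data[i][j] is the score of subject i in session (or by rater) j.
    agreement=True gives ICC(A,1) (absolute agreement),
    agreement=False gives ICC(C,1) (consistency).
    """
    n, k = len(data), len(data[0])
    grand = mean(x for row in data for x in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    # Two-way ANOVA sums of squares: subjects, sessions/raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    denom = ms_rows + (k - 1) * ms_err
    if agreement:
        # Systematic differences between sessions/raters count as error.
        denom += k * (ms_cols - ms_err) / n
    return (ms_rows - ms_err) / denom


def sem(data, icc):
    """SEM = SD_x * sqrt(1 - ICC), with SD_x over all observations."""
    return stdev([x for row in data for x in row]) * sqrt(1 - icc)


def mdc95(sem_value):
    """MDC95 = 1.96 * sqrt(2) * SEM."""
    return 1.96 * sqrt(2) * sem_value


# Hypothetical scores: rater B scores every child 2 points higher than rater A.
ab = [[10, 12], [20, 22], [30, 32]]
print(icc_two_way(ab, agreement=False))  # 1.0: perfectly consistent
print(round(icc_two_way(ab, agreement=True), 3))  # 0.98: offset lowers agreement
```

The example shows why the model type matters: a constant offset between evaluators leaves the consistency ICC untouched but lowers the absolute-agreement ICC. With the study design, the same function would be applied to the A1/A2 columns (intra-rater) and to the A1/B1 columns (inter-rater).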
Results
All subjects were able to complete the three evaluations. All results are presented in Table 4, illustrated in Figures 1 and 2, and summarised below. Participants were homogeneously distributed across the score range, as shown in Figure 1.
Table 4 legend: A1 = results of evaluation 1, performed by rater A on day 1; B1 = results of evaluation 2, performed by rater B on day 1; A2 = results of evaluation 3, performed by rater A on day 2 (4.8±1.4 days later). Values are median [Q1-Q3]. ρ = Spearman correlation coefficient; ICC = intra-class correlation coefficient; MDC95 = minimal detectable change at the 95% confidence level; *p < 0.05.
Concurrent Validity
To assess concurrent validity, we compared the BOT2-UL results with the MACS level, and the BOT2-SF results with the MACS and GMFCS levels. Results are presented in Table 4 and Figure 2. A good inverse correlation was found between the BOT2-UL results and the MACS level (ρ = -0.81, p = 0.001), as well as between the BOT2-SF results and the MACS level (ρ = -0.64, p = 0.007): children with a higher MACS level, and therefore more severe manual impairment, obtained lower scores on both the BOT2-UL and the BOT2-SF. No significant correlation was found between the BOT2-SF results and the GMFCS level (ρ = -0.35, p = 0.19).
Reliability
The internal consistency of the BOT2-UL and the BOT2-SF was excellent and good, respectively (Cronbach’s α of 0.94 and 0.89), indicating sufficient homogeneity of both tests. For both tests, intra- and inter-rater reliability were excellent (ICC > 0.95); in other words, the results obtained by the same evaluator at two different times, or by two different evaluators, are comparable. For the BOT2-UL, the MDC95 values for intra- and inter-rater reliability were 8.7 and 8.4 points, respectively. This means that when a patient is assessed before and after a treatment, by the same or by two different evaluators, the scores must differ by about 9 points or more before the change can be attributed to more than measurement error. For the BOT2-SF, the MDC95 values for intra- and inter-rater reliability were 9.5 and 5.8 points, respectively.
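The interpretation above amounts to a simple decision rule. The Python sketch below encodes the MDC95 values reported in this section and checks whether a hypothetical before/after change exceeds them; the function name is illustrative, and treating a change exactly equal to the MDC95 as not exceeding it is our choice.

```python
# MDC95 values reported above, in points on each test's total score.
MDC95 = {
    ("BOT2-UL", "intra"): 8.7,
    ("BOT2-UL", "inter"): 8.4,
    ("BOT2-SF", "intra"): 9.5,
    ("BOT2-SF", "inter"): 5.8,
}


def exceeds_measurement_error(test, design, before, after):
    """True if the score change is larger than the MDC95 for this test and design.

    design is "intra" (same evaluator) or "inter" (two different evaluators).
    """
    return abs(after - before) > MDC95[(test, design)]


# Hypothetical BOT2-UL scores before and after treatment, same evaluator.
print(exceeds_measurement_error("BOT2-UL", "intra", 100, 108))  # False: 8 < 8.7
print(exceeds_measurement_error("BOT2-UL", "intra", 100, 110))  # True: 10 > 8.7
```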
Discussion
The present study is the first to evaluate the concurrent validity and reliability of the BOT2-UL and BOT2-SF in CP children. Our results suggest that the BOT2-UL and the BOT2-SF can be used as reliable and valid tools to assess motor function in CP children with GMFCS levels 1-3 and MACS levels 1-4, as we obtained a good inverse correlation with the MACS level and excellent inter- and intra-rater reliability for both tests. The tests were also feasible, as all children were able to complete them.
Concurrent Validity
We found a good inverse correlation between the BOT2-UL results and the MACS level. For the BOT2-SF, we also found a good inverse correlation with the MACS level, and a moderate inverse correlation with the GMFCS level that was not statistically significant. This is illustrated in Figure 2C and could be explained by our small sample and by the absence of children with GMFCS levels ≥4. Moreover, the GMFCS classifies children according to their functional ability in self-initiated movement, focusing on sitting, transfers and the use of handheld mobility devices or wheeled mobility [4]. Given the large heterogeneity of motor impairment in the CP population, a major limitation in self-initiated movement is not necessarily associated with a major upper-limb impairment [1,17]. This may result in heterogeneous scores on the BOT2-SF, in which only 21% of the items exclusively evaluate lower-limb motor function.
Our concurrent validity results for both the BOT2-UL and the BOT2-SF are consistent with Bruininks’ original findings in healthy children. Few other studies have validated the BOT2, and these were mainly carried out in healthy children. For instance, Hassan et al. evaluated validity and reliability by comparing the sub-tests with the global score, and validated the BOT2-SF in the healthy Arab population. Fransen et al. [24] investigated the convergent and discriminant validity of the BOT2-SF in the Flemish population by comparing it with the Körperkoordinationstest für Kinder (KTK); they found a Pearson correlation of r = 0.61 and validated the test in this population [24]. However, no similar study had been performed in CP children. In summary, our results show that the BOT2-SF correlates significantly with the MACS level, suggesting that it is a valid tool to evaluate upper-limb activities in CP children with MACS levels 1-4. However, further studies are needed to confirm our findings regarding the correlation between the BOT2-SF and the GMFCS level.
Reliability
Reliability was assessed by calculating the ICC, the MDC [20] and internal consistency [11]. Reliability is defined as the extent to which measurements can be replicated, and the MDC corresponds to the change in score that exceeds measurement error, indicating whether an observed change in score is statistically significant [25]. The internal consistency of the total score was excellent for the BOT2-UL and good for the BOT2-SF. Our results are comparable to those originally obtained by Bruininks in healthy children (Cronbach’s α = 0.95, ICC > 0.92) and to those obtained in children with intellectual disabilities (Cronbach’s α = 0.92, ICC = 0.99) [9,19]. Both the BOT2-UL and the BOT2-SF showed excellent ICC values and low MDC values (about 5% of the maximum score for the BOT2-UL and 7% to 11% for the BOT2-SF), for both intra- and inter-rater reliability. Low MDC values indicate greater responsiveness [20]. This could help clinicians objectify a patient’s progression while taking measurement error into account [25].
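To put the MDC95 values in perspective, each can be expressed as a percentage of the maximum score of its test (172 points for the BOT2-UL and 88 for the BOT2-SF, as stated in the Methods). The short Python computation below does only this arithmetic; the function name is illustrative.

```python
MAX_SCORE = {"BOT2-UL": 172, "BOT2-SF": 88}
MDC95 = {
    ("BOT2-UL", "intra"): 8.7,
    ("BOT2-UL", "inter"): 8.4,
    ("BOT2-SF", "intra"): 9.5,
    ("BOT2-SF", "inter"): 5.8,
}


def mdc_percent_of_max(test, design):
    """MDC95 as a percentage of the test's maximum score."""
    return 100 * MDC95[(test, design)] / MAX_SCORE[test]


for test, design in MDC95:
    print(f"{test} {design}-rater: {mdc_percent_of_max(test, design):.1f}%")
# BOT2-UL intra-rater: 5.1%
# BOT2-UL inter-rater: 4.9%
# BOT2-SF intra-rater: 10.8%
# BOT2-SF inter-rater: 6.6%
```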
The MDC was lower for inter-rater than for intra-rater reliability, which is unusual, as greater variability would be expected between different evaluators. This may be due to the time of the week at which the tests were performed: inter-rater reliability was tested on a single day, usually at the beginning of the week, whereas intra-rater reliability was calculated from results obtained at the beginning and at the end of the week. Various factors may slightly influence the results, such as participation in physical activities or rehabilitation sessions [26,27]. Our ICC and MDC values were in accordance with those obtained by Bruininks et al. [9] in healthy children and by Lucas et al. [27] in children with foetal alcohol spectrum disorder (FASD), who also obtained lower MDC values for inter-rater than for intra-rater reliability. Our absolute MDC values are also comparable to those obtained by Wuang et al. [19] in children with intellectual disabilities. In summary, our results showed low MDC values for both the BOT2-UL and the BOT2-SF, suggesting good responsiveness of both tests and making them appropriate for clinical follow-up. We did not observe a ceiling or floor effect in our sample; none of the children obtained scores higher than 75% of the maximum on either the BOT2-UL or the BOT2-SF. Participants were homogeneously distributed across the score range, as shown in Figure 2. In the study by Wuang et al. [19], ceiling and floor effects concerned less than 15% of the participants, which was considered acceptable.
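Floor and ceiling effects are commonly summarised as the share of participants scoring at the lowest or highest possible score, which can then be compared with the 15% criterion reported for Wuang et al. [19]. The Python sketch below assumes a minimum possible score of 0 and uses the maximum scores given in the Methods; it also reports the share of participants above 75% of the maximum, the cut-off mentioned above. The function names and scores are hypothetical.

```python
def floor_ceiling_rates(scores, max_score, min_score=0):
    """Fractions of participants at the lowest and highest possible score."""
    n = len(scores)
    floor = sum(s <= min_score for s in scores) / n
    ceiling = sum(s >= max_score for s in scores) / n
    return floor, ceiling


def share_above(scores, max_score, fraction=0.75):
    """Fraction of participants scoring above `fraction` of the maximum."""
    return sum(s > fraction * max_score for s in scores) / len(scores)


# Hypothetical BOT2-SF totals (maximum 88) for six children.
sf_scores = [12, 30, 41, 55, 60, 66]
floor, ceiling = floor_ceiling_rates(sf_scores, max_score=88)
print(floor < 0.15, ceiling < 0.15)  # True True: within the 15% criterion
print(share_above(sf_scores, 88))  # 0.0: nobody above 66 points (75% of 88)
```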
Limits and Perspectives of the Study
One of the main limitations of our study is the small sample (n = 15 for each test), especially regarding the distribution across GMFCS levels (1, 2 and 3). However, the results were highly reproducible, suggesting that they are robust. We compared the BOT2 with the MACS and GMFCS levels because these two classifications provide a global picture of the motor abilities of the CP child. This is an important starting point, but these findings need to be complemented by comparing the BOT2 with the GMFM-66 and with other tests in the different ICF domains. The lower-limb items of the BOT2-CF should be evaluated in a similar study to complete our findings.