A Note on the Application of Advanced Statistical Methods in Medical Research
Introduction
Data on many aspects of life science include both categorical and
numerically measured values. Some of these data come from controlled
and/or random experiments. Whatever the source, analysis of the data
helps researchers infer the characteristics of the population from
which the data were collected. Researchers in medical science often
have difficulty analysing their data because they lack knowledge of the
statistical techniques suited to a particular data set. A proper
conclusion about a population parameter requires the application of the
proper analytical technique. Once a technique is identified, the
analysis can be performed using any statistical package. In bioscience,
the data mainly concern public health, and the main aim of analysis is
to suggest ways and means to control disease and to reduce premature
death, especially child and infant death. For some countries, the
failure of birth control is also a health hazard. Thus, health planners
need sound analytical findings from medical science and from other
areas of bioscience.
Empirical analysis in health science needs data on health hazards
collected from units that suffer from a health problem or are expected
to suffer from some communicable or non-communicable disease. Each unit
under study may provide different types of information (values of
several variables). Because the values of the variables are collected
from each member of the investigated units, the variables are expected
to be correlated. Such data are usually called multivariate data. There
are different methods for handling multivariate data. In this note, the
application of some multivariate techniques is discussed. Multivariate
analysis has two main aspects, viz.
a) Dependence analysis, and
b) Interdependence analysis
Multivariate regression analysis (including logistic regression
analysis), discriminant analysis, and multivariate analysis of variance
are topics of dependence analysis. Interdependence analysis, also known
as data reduction, deals with principal component analysis, factor
analysis, cluster analysis and canonical correlation analysis. The data,
multivariate or univariate, are collected according to pre-determined
objectives. The analytical plan is also pre-determined so that proper
conclusions can be drawn according to those objectives. All the
activities, along with the statistical interpretation, are presented
concisely; this presentation of analytical results is known as report
writing. However, the presentation of analytical results and of the
other activities of the research varies from work to work and with the
objective of the research. Let us now discuss some applications of
multivariate analysis using part of the real data collected by Urmi and
Bhuyan [1]. They have already carried out some analyses and presented
them in research papers published at home and abroad [2].
Application of the Multivariate Analysis
Multiple Regression Analysis
For regression analysis, the general consideration is that when n
sample units are investigated to collect information on several
variables, some of the variables may be interrelated.
For example, prevalence of diabetes [yes = 1, no = 0] and level of BMI
(kg/m2) are interrelated, along with other variables, viz. age, income,
residence, level of education, marital status, occupation, gender,
smoking habit, physical labor, food habit, habitual consumption of
processed food and restaurant food, etc. These factors are interrelated
and some of them depend on income. Again, income depends on level of
education and profession. If the blood sugar level (y mmol/l) of any
person, whether child, adolescent or adult, is expected to depend on age
(x1 years), height (x2 meters), weight (x3 kg), food habit (x4; taking
more protein = 3, more rice = 2, more sugar products = 1 and healthy
food = 0) and family income (x5 in thousand taka), a multiple regression
analysis of y on x1, x2, x3, x4 and x5 can be performed, where the
multiple regression model is given by
y = B0 + B1x1 + B2x2 + ... + B5x5 + e
In general, y depends on k (= 5) explanatory variables (the x’s), and
e is a random component included in the model to account for the
impact of other variables that are not in the model. The
objective of the study is to estimate the parameters Bi and to test
their significance. Under the usual assumptions, the
analysis can be carried out. Using part of the data of Atika and Bhuyan
[1], the regression results are shown in Tables 1 &
2. The analysis used 125 observations. It was observed
that blood sugar level depended significantly on income. This
conclusion rests on the assumptions that
the explanatory variables (x’s) are independent and that the random
component (e) is normally and independently distributed with
mean zero and common variance (Tables 1 & 2).
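The estimation step can be illustrated with a minimal sketch in Python.
It solves the normal equations (X'X)B = X'y for the model above using
only the standard library. The data are synthetic and the coefficient
values are arbitrary assumptions; they are not the data or results of
Tables 1 & 2. The function names fit_ols and solve_linear_system are
hypothetical helpers introduced here for illustration.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares estimates [B0, B1, ..., Bk] from (X'X) B = X'y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries B0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic illustration only: 125 made-up units, arbitrary coefficients.
random.seed(0)
true_b = [4.0, 0.02, 0.5, 0.01, 0.3, 0.05]  # hypothetical B0..B5
rows, y = [], []
for _ in range(125):
    x = [
        random.uniform(10, 70),       # x1: age (years)
        random.uniform(1.2, 1.9),     # x2: height (meters)
        random.uniform(30, 95),       # x3: weight (kg)
        float(random.randint(0, 3)),  # x4: food habit code 0..3
        random.uniform(5, 100),       # x5: family income (thousand taka)
    ]
    e = random.gauss(0, 0.5)          # random component
    rows.append(x)
    y.append(true_b[0] + sum(b * v for b, v in zip(true_b[1:], x)) + e)

estimates = fit_ols(rows, y)
for name, est in zip(["B0", "B1", "B2", "B3", "B4", "B5"], estimates):
    print(f"{name} = {est:.3f}")
```

The sketch covers estimation only. Testing the significance of each Bi
additionally needs the standard errors of the estimates, which a
statistical package reports alongside the coefficients.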
Logistic Regression Analysis
Let us consider that the prevalence of diabetes (y = 1 for a
diabetic patient, y = 0 for a non-diabetic person) depends on some of
the variables discussed above. Here the dependent variable is a binary
(indicator) variable instead of a continuous variable.
Thus, the usual multiple regression analysis is not suitable for
studying the effect of the explanatory variables on the dependent
variable. To overcome this problem, logistic regression analysis is used.
As an example, using the same data set as mentioned above, a
logistic regression analysis was performed, and the results are
presented in Table 3. This analysis also indicated that the income
and height of the respondent had significant impacts on the prevalence
of diabetes (Table 3). As a further example of logistic regression,
let us consider the analysis of data on the smoking habit of
students of American International University - Bangladesh [3,4],
where smoking habit [yes = 1, no = 0] was the dependent
variable and the age of students, father’s education, mother’s education,
family income, residential origin and the students’ knowledge of the
health hazard were the explanatory variables. The
results are shown in Table 4. They showed that
smoking habit was significantly influenced by the age
of students, residential origin and knowledge of the health
hazard of tobacco smoking. This example is a case of binary logistic
regression, where the dependent variable has two classes.
The dependent variable can also have several classes,
and a similar analysis can then be carried out.
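A minimal sketch of binary logistic regression in Python follows. It
fits P(y = 1) = 1 / (1 + exp(-(B0 + B1x1 + ... + Bkxk))) by gradient
ascent on the log-likelihood, which is one simple way to obtain the
maximum-likelihood estimates; statistical packages typically use a
Newton-type method instead. The two predictors are synthetic
standardized values and the coefficients are arbitrary assumptions, not
the results of Tables 3 or 4. The names sigmoid and fit_logistic are
hypothetical.

```python
import math
import random


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(rows, labels, learning_rate=0.5, n_iter=2000):
    """Fit a binary logistic model by gradient ascent on the
    log-likelihood; returns [B0, B1, ..., Bk]."""
    n, k = len(rows), len(rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for x, yi in zip(rows, labels):
            p = sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))
            grad[0] += yi - p                # score for the intercept
            for j, v in enumerate(x):
                grad[j + 1] += (yi - p) * v  # score for coefficient j + 1
        beta = [b + learning_rate * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic illustration only: two standardized predictors, made-up effects.
random.seed(1)
true_beta = [-0.5, 1.0, -0.8]  # hypothetical B0, B1, B2
rows, labels = [], []
for _ in range(200):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    p = sigmoid(true_beta[0] + true_beta[1] * x[0] + true_beta[2] * x[1])
    rows.append(x)
    labels.append(1 if random.random() < p else 0)

beta = fit_logistic(rows, labels)
print(f"intercept = {beta[0]:.3f}")
for name, b in zip(["x1", "x2"], beta[1:]):
    # exp(coefficient) is the odds ratio per one-unit increase in the predictor.
    print(f"{name}: coefficient = {b:.3f}, odds ratio = {math.exp(b):.3f}")
```

Gradient ascent with a fixed step is used here because it is short and
transparent; it relies on the predictors being on a comparable scale,
which is why the synthetic predictors are standardized.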
Canonical Correlation Analysis
In a separate study [5] it was observed that smoking habit
and awareness of the health hazard of tobacco smoking were
significantly associated. Again, these two variables were related
to other socioeconomic characteristics. Hence, Bhuyan and Urmi
(2018) decided to examine the joint relationship of these two
variables, smoking habit and awareness of the health hazard of tobacco
smoking, with other socioeconomic variables. This was done by canonical
correlation analysis, which is also a component of multivariate
analysis (Table 5).
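As a rough illustration of the idea, the sketch below computes
canonical correlations between two sets of variables in Python. To stay
short it assumes exactly two variables in each set, whereas the study
described above relates two variables to a larger set of socioeconomic
variables. The squared canonical correlations are the eigenvalues of
Sxx^-1 Sxy Syy^-1 Syx, where the S terms are covariance blocks. The
data are synthetic and all function names are hypothetical.

```python
import math
import random


def covariance(u, v):
    """Sample covariance of two equally long sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)


def cov_block(a_cols, b_cols):
    """Covariances between every column of a_cols and every column of b_cols."""
    return [[covariance(a, b) for b in b_cols] for a in a_cols]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (assumes a non-zero determinant)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matmul_2x2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def canonical_correlations(x_cols, y_cols):
    """Canonical correlations (largest first) for two sets of two variables."""
    sxx, syy = cov_block(x_cols, x_cols), cov_block(y_cols, y_cols)
    sxy, syx = cov_block(x_cols, y_cols), cov_block(y_cols, x_cols)
    m = matmul_2x2(matmul_2x2(inverse_2x2(sxx), sxy),
                   matmul_2x2(inverse_2x2(syy), syx))
    # Eigenvalues of the 2x2 matrix m from its trace and determinant.
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eigenvalues = [(trace + root) / 2.0, (trace - root) / 2.0]
    # Clamp tiny rounding excursions outside [0, 1] before the square root.
    return [math.sqrt(min(max(ev, 0.0), 1.0)) for ev in eigenvalues]


# Synthetic illustration only: the two sets share one latent component z.
random.seed(2)
n = 200
z = [random.gauss(0, 1) for _ in range(n)]
x_cols = [[zi + random.gauss(0, 1) for zi in z],
          [random.gauss(0, 1) for _ in range(n)]]
y_cols = [[zi + random.gauss(0, 1) for zi in z],
          [random.gauss(0, 1) for _ in range(n)]]
rho1, rho2 = canonical_correlations(x_cols, y_cols)
print(f"first canonical correlation  = {rho1:.3f}")
print(f"second canonical correlation = {rho2:.3f}")
```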
Factor Analysis
Factor analysis is a multivariate technique for reducing data. For
example, suppose a diabetic patient comes to a doctor for treatment.
Before starting treatment, the doctor needs to know the patient’s
height, weight and BMI, along with other characteristics. But BMI
depends on height and weight, so BMI here acts as a common factor. Thus,
instead of observing height, weight and BMI, it is better to observe
only BMI, which will help in reaching a conclusive decision regarding
the prevalence of diabetes. Instead of studying many variables,
a few common factors can be identified and used for the conclusion. The
technique of selecting a few factors for further analysis is known as
factor analysis, and it is a data reduction technique. The factors are
selected in such a way that most (around 90%) of the variability
of the data set is explained by the selected factors. One
selection procedure is principal component analysis, which is
also a technique of interdependence analysis in its own right. As an
example of factor analysis, the data mentioned above were
used to select some of the important factors for studying the prevalence
of diabetes. The factor analysis provided two important factors
explaining the variability in the data set on the prevalence of diabetes.
The two factors explained 65% of the inherent variation in the data set.
The first component indicated that the prevalence of diabetes was
mainly related to body weight, followed by age and height. The
analytical results are presented in Table 6.
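The share of variability explained by the leading components can be
sketched in Python as follows. The sketch uses principal components of
the correlation matrix, with eigenvalues obtained by Jacobi rotations;
this is an assumption made for illustration, and a full factor analysis
involves further steps (for example rotation of the factors) that are
not shown. The height, weight and BMI values are synthetic, and the
function names are hypothetical.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long data columns."""
    n, k = len(columns[0]), len(columns)
    means = [sum(c) / n for c in columns]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(columns, means)]
    r = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            cov = sum((a - means[i]) * (b - means[j])
                      for a, b in zip(columns[i], columns[j])) / (n - 1)
            r[i][j] = cov / (sds[i] * sds[j])
    return r


def jacobi_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations,
    returned largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) entry after the rotation.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Synthetic illustration only: BMI is computed from made-up height and weight.
random.seed(3)
height = [random.gauss(1.62, 0.08) for _ in range(125)]  # meters
weight = [random.gauss(62, 10) for _ in range(125)]      # kg
bmi = [w / h ** 2 for w, h in zip(weight, height)]       # kg/m2

eigenvalues = jacobi_eigenvalues(correlation_matrix([height, weight, bmi]))
total = sum(eigenvalues)
cumulative = 0.0
for i, ev in enumerate(eigenvalues, start=1):
    cumulative += ev / total
    print(f"component {i}: eigenvalue = {ev:.3f}, cumulative share = {cumulative:.1%}")
```

Components are retained until the cumulative share reaches the chosen
threshold, for example the figure of around 90% mentioned above.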
Discriminant Analysis
Discriminant analysis is also a multivariate technique. A data set is
classified into several classes according to some indicator
variable, and a mathematical method is applied to discriminate between
the sample units so that the variables important for discriminating
between the groups of observations are identified. For example, let us
consider that the sample units are classified as diabetic and non-diabetic.
It was observed in some studies [5,6] that diabetic and non-diabetic
people differed significantly in socioeconomic
variables and that some variables were very important for discriminating
between the two groups. Bhuyan et al. [6] carried out such an analysis to
discriminate between the students of public and private universities
with respect to some social characteristics. There are several
mathematical steps for estimating the discriminant scores of the
students. The correlation coefficient between each variable and the
discriminant scores is then calculated to identify the factors that are
important in discriminating between the two groups of students. The
correlation coefficients between the variables and the discriminant
scores are shown in Table 7. The analysis showed that public and private
university students differed significantly in their
social background. Parents’ education was very influential
in discriminating between the students of public and private universities.
The second most important factor was residential origin, followed
by the age of students. More urban students and older students
are admitted to private universities. Smoking habit did not
differ significantly between the two groups of students (r = 0.002)
(Table 7). As a further example of discriminant analysis, the
analysis presented by Fardus and Bhuyan [7] may be mentioned.
In that paper, diabetic patients from some urban and rural areas of
Bangladesh were discriminated by type of diabetes. Including
one unknown type, the patients were classified into 4 types of
diabetes, and 3 significantly different discriminant functions were
derived. The major causes of discrimination among the patients were
studied through the correlation coefficients between the variables and
the discriminant scores. The significant correlation coefficients are
presented in Table 8. The first function discriminated
well among the groups of patients, and the variables age and
education, followed by residence, were important in discriminating
among patients with different types of diabetes. The second function
also discriminated well among the groups of patients, and
the variables age and income, followed by education, were
identified as important for discrimination. The third function also
discriminated well among patients with different types of diabetes, and
the variables age and residence were identified as very important for
discrimination.
Further statistical analysis in medical science includes investigating
the association between two characteristics, and hence whether a
particular characteristic is more prevalent in one group. As an example,
let us consider the data used by Urmi and Bhuyan [1], where the
association between level of obesity and prevalence of diabetes was
studied. The results are shown below. It was observed that the
prevalence of diabetes and the level of obesity were significantly
associated (p-value = 0.033). From the odds ratio it was observed that
the overweight and obese group had 69% higher odds of being affected by
diabetes than the non-obese group. The risk ratio for this group was
1.47 (Table 9). In the above analysis both variables are qualitative in
nature and do not follow a normal distribution. Most test statistics
are based on the assumption of normality of the data, but the test used
here is valid because it is a non-parametric test. Other non-parametric
tests are also used in the analysis of medical data. The study of health
hazards and survival analysis are two other aspects of the analysis of
medical data.
In this note a short review of multivariate analysis was presented. For
further analysis, one can consult books on applied multivariate
analysis [8].
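The association measures discussed above can be computed from a 2x2
table as in the following Python sketch. The counts are hypothetical
and are not those of Table 9. The test of association is assumed here
to be a Pearson chi-square test with 1 degree of freedom and no
continuity correction; the note does not name the test it used. The
function name two_by_two_summary is hypothetical.

```python
import math


def two_by_two_summary(a, b, c, d):
    """Odds ratio, risk ratio, chi-square statistic and p-value for a
    2x2 table.

    a, b: exposed group with / without the disease
    c, d: unexposed group with / without the disease
    """
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    risk_ratio = (a / (a + b)) / (c / (c + d))
    # Pearson chi-square for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, risk_ratio, chi2, p_value


# Hypothetical counts: 30 of 100 overweight/obese and 20 of 100 non-obese
# persons are diabetic.
odds_ratio, risk_ratio, chi2, p_value = two_by_two_summary(30, 70, 20, 80)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"risk ratio = {risk_ratio:.2f}")
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3f}")
```

An odds ratio of 1.69, for instance, corresponds to the statement that
the odds of the disease are 69% higher in the exposed group.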