A total of 192 faults were detected in the framework at the time of writing. These faults occurred in 70 out
of 174 classes. The dichotomous dependent variable that we used in our study was the detection or non-
detection of a fault. If one or more faults are detected then the class is considered to be faulty, and if not
then it is considered not faulty.
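The dichotomization described above can be sketched as follows. This is a minimal illustration only: the class names and fault counts are hypothetical and are not the study's data.

```python
def is_faulty(fault_count: int) -> int:
    """Return 1 if at least one fault was detected in the class, else 0."""
    return 1 if fault_count >= 1 else 0

# Hypothetical per-class fault counts (illustrative values only).
fault_counts = {"ClassA": 0, "ClassB": 3, "ClassC": 1, "ClassD": 0}

# Dichotomous dependent variable: faulty (1) versus not faulty (0).
faulty = {name: is_faulty(n) for name, n in fault_counts.items()}
```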
3.3 Data Analysis Methods
3.3.1 Testing for a Confounding Effect
It is tempting to use a simple approach to test for a confounding effect of size: examine the association
between size and fault-proneness. If this association is not significant at a traditional alpha level, then
conclude that size is not different between cases and controls (and hence has no confounding effect),
and proceed with a usual univariate analysis.
However, it has been noted that this is an incorrect approach [38]. The reason is that traditional
significance testing places the burden of proof on rejecting the null hypothesis. This means that one has
to prove that the cases and controls do differ in size. In evaluating confounding potential, the burden of
proof should be in the opposite direction: before discarding the potential for confounding, the researcher
should demonstrate that cases and controls do not differ on size. This means controlling the Type II error
rather than the Type I error. Since one usually has no control over the sample size, this means setting
the alpha level to 0.25, 0.5, or even larger.
A simpler and more parsimonious approach is as follows. For an unmatched case-control study, a
measured confounding variable can be controlled through a regression adjustment [12][99]. A regression
adjustment entails including the confounder as another independent variable in a regression model. If the
regression coefficient of the object-oriented metric differs dramatically (in magnitude and statistical
significance) between the models with and without the size variable, then this is a strong indication that
there was indeed a confounding effect [61]. This is further elaborated below.
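The comparison can be sketched in code. The following is a rough illustration under stated assumptions, not the analysis procedure used in the study: it fits a logistic regression (the model introduced in Section 3.3.2) by Newton-Raphson in plain Python, once with a metric alone and once with the metric plus size. The helper names `solve` and `fit_logistic` and the synthetic data are hypothetical. The data are constructed so that, within each size stratum, the metric is distributed identically for faulty and non-faulty classes, so any apparent effect of the metric is due to size. Standard errors and significance tests are omitted for brevity.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X holds one row of predictors per class (no intercept column);
    the result is [beta0, beta1, ...] with the intercept first.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        info = [[0.0] * p for _ in range(p)]  # observed information X'WX
        for r, yi in zip(rows, y):
            z = sum(b * v for b, v in zip(beta, r))
            pi = 1.0 / (1.0 + math.exp(-z))
            w = pi * (1.0 - pi)
            for j in range(p):
                grad[j] += (yi - pi) * r[j]
                for k in range(p):
                    info[j][k] += w * r[j] * r[k]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Synthetic data (assumption): size -> (metric values of faulty classes,
# metric values of non-faulty classes). The metric grows with size, but
# within a stratum its mean is the same for both groups.
strata = {
    1: ([1, 3], [1, 3, 1, 3, 2, 2]),
    2: ([3, 5, 4, 4], [3, 5, 4, 4]),
    3: ([5, 7, 5, 7, 6, 6], [5, 7]),
}
data = []
for size, (faulty_metrics, clean_metrics) in strata.items():
    data += [(m, size, 1) for m in faulty_metrics]
    data += [(m, size, 0) for m in clean_metrics]

y = [d[2] for d in data]
b_unadjusted = fit_logistic([[d[0]] for d in data], y)      # metric only
b_adjusted = fit_logistic([[d[0], d[1]] for d in data], y)  # metric + size

print("metric coefficient without size:", b_unadjusted[1])
print("metric coefficient with size:   ", b_adjusted[1])
```

On this constructed data set the metric coefficient is clearly positive when size is left out and collapses to essentially zero once size is included, which is the pattern the text describes as a strong indication of confounding.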
3.3.2 Logistic Regression Model
Binary logistic regression is used to construct models when the dependent variable can only take on two
values, as in our case. It is most convenient to use a logistic regression (henceforth LR) model rather
than the contingency table analysis used earlier for illustrations since the model does not require
dichotomization of our product metrics. The general form of an LR model is:
$$\pi = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}} \qquad \text{(Eqn. 1)}$$
where π is the probability of a class having a fault, and the xi’s are the independent variables. The β
parameters are estimated through the (unconditional) maximization of a log-likelihood [61].
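As a small illustration of how such a model turns metric values into a probability, the sketch below evaluates the standard logistic form, π = 1 / (1 + exp(-(β0 + Σ βi xi))). The function name `fault_probability` and the coefficient values in the usage lines are hypothetical and are not estimates from the study.

```python
import math

def fault_probability(beta0, betas, xs):
    """Probability of a class having a fault under a logistic model.

    beta0 is the intercept; betas and xs are the coefficients and the
    corresponding independent-variable values (same length).
    """
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    z = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: one metric (univariate case), then metric + size.
p_univariate = fault_probability(-2.0, [0.5], [4.0])
p_with_size = fault_probability(-2.0, [0.5, 0.01], [4.0, 100.0])
```

With one independent variable this corresponds to the univariate model, and with two (the metric and a size measure) to the size-adjusted model.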
In a univariate analysis only one xi, x1, is included in the model, and this is the product metric that is
being validated (footnote 18):
$$\pi = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1 x_1\right)}} \qquad \text{(Eqn. 2)}$$
When controlling for size, a second xi, x2, is included that measures size:
$$\pi = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1 x_1 + \beta_2 x_2\right)}} \qquad \text{(Eqn. 3)}$$

Footnote 18: Conditional logistic regression is used when there has been matching in the case-control study and each matched set is treated
as a stratum in the analysis [12].