26 The International Journal of Conformity Assessment
| Question Number* | pj | qj | M1 | rjx | DI Level |
|---|---|---|---|---|---|
| 1 | 0.45 | 0.55 | 35.78 | 0.40 | Very good |
| 2 | 0.75 | 0.25 | 32.20 | -0.17 | Very poor |
| 37 | 0.95 | 0.05 | 33.26 | 0.27 | Should be corrected |
| 38 | 0.75 | 0.25 | 34.33 | 0.38 | Reasonable |
| 42 | 0.80 | 0.20 | 33.75 | 0.27 | Should be corrected |
| 43 | 0.90 | 0.10 | 33.27 | 0.19 | Should be corrected |
| 44 | 0.85 | 0.15 | 33.52 | 0.24 | Should be corrected |
| 45 | 0.80 | 0.20 | 34.50 | 0.49 | Very good |

*Questions selected typically from a total of 45.
Index of discrimination (D)
When calculating the DI with the simple method, the respondents are divided into a lower and an upper group. First, the total scores obtained from the measurement tool are calculated and ranked from highest to lowest. The 27% of respondents with the highest scores form the upper group, and the 27% with the lowest scores form the lower group. The remaining 46% are excluded from the calculation.
It is denoted as:
Pu: proportion of test takers in the upper group who get the item right
Pl: proportion of test takers in the lower group who get the item right
The index is then calculated as D = Pu - Pl.
For example, consider an exam with 20 participants that contains multiple-choice questions. If 67% of the upper group answered a question correctly (Pu = 0.67) and 33% of the lower group answered it correctly (Pl = 0.33), the item discrimination index would be approximately 0.33, which would classify the discrimination as reasonable.
Meanwhile, if both the upper and lower groups got
the question correct (Pu = Pl = 1), this would result in
an Item Discrimination Index of 0 and imply that said
item discriminates very poorly.
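The grouping procedure described above can be sketched in code. The scores and item responses below are invented for illustration, and a simple rounded 27% cut is assumed:

```python
# A sketch of the simple DI method: rank by total score, take the top and
# bottom 27% as upper and lower groups, and compare proportions correct.
def discrimination_index(scores, item_correct):
    """scores: total test scores; item_correct: 1 (right) / 0 (wrong) per test taker."""
    ranked = sorted(zip(scores, item_correct), key=lambda t: t[0], reverse=True)
    n = round(0.27 * len(ranked))           # 27% upper and lower group size
    upper, lower = ranked[:n], ranked[-n:]  # the middle 46% is excluded
    pu = sum(c for _, c in upper) / n       # Pu: proportion right in upper group
    pl = sum(c for _, c in lower) / n       # Pl: proportion right in lower group
    return pu - pl                          # D = Pu - Pl

scores = list(range(1, 21))                       # 20 invented total scores
item_correct = [1 if s in (20, 19, 18, 5) else 0  # invented per-item results
                for s in scores]
print(round(discrimination_index(scores, item_correct), 2))  # -> 0.4
```

Here 3 of the 5 upper-group candidates and 1 of the 5 lower-group candidates answer the item correctly, giving D = 0.6 - 0.2 = 0.4.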
2022 | Volume 1, Issue 1

Point-Biserial Correlation Coefficient (rpbis)
Point biserial, in the context of an exam, is a way of measuring the consistency of the relationship between a candidate's overall exam mark (a continuous variable, i.e., anywhere from 0-100%) and a candidate's item mark (a dichotomous variable, i.e., only two possible outcomes). It gives an indication of how strong or weak this correlation is compared to the other items in that exam. In other words, does the way in which candidates answer an item help to indicate whether they are strong or weak candidates?
It is denoted as:
M1: mean (for the entire test) of the group that received the positive binary variable (i.e., the "1")
M0: mean (for the entire test) of the group that received the negative binary variable (i.e., the "0")
Sn: standard deviation for the entire test
p: item difficulty index
q: (1 - p)
The coefficient is calculated as rpbis = ((M1 - M0) / Sn) · √(p · q).
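These quantities combine in the standard point-biserial formula, rpbis = ((M1 - M0) / Sn) · √(p · q). A minimal sketch, using the worked values that appear in this article (M1 = 35.78, M0 = 30.45, Sn = 6.651, p = 0.45):

```python
import math

# Point-biserial correlation from the quantities defined above.
def r_pbis(m1, m0, sn, p):
    q = 1 - p                            # proportion answering incorrectly
    return (m1 - m0) / sn * math.sqrt(p * q)

print(round(r_pbis(35.78, 30.45, 6.651, 0.45), 2))  # -> 0.4
```

This reproduces the 0.40 DI value reported for the first example question.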
For example, consider an exam with 20 participants
that contains 45 multiple-choice questions. If the
arithmetic mean of the exam scores is 32.85 and the
standard deviation of the scores is 6.651, the DI level
of certain questions can then be examined based
on the item difficulty index, the mean of the group of
test takers that answered correctly, and the mean of
test takers that answered incorrectly, as follows:
| Question Number* | pj | qj | M1 | M0 | rpbis | DI Level |
|---|---|---|---|---|---|---|
| 1 | 0.45 | 0.55 | 35.78 | 30.45 | 0.40 | Very good |
| 2 | 0.75 | 0.25 | 32.20 | 34.80 | -0.17 | Very poor |
| 3 | 0.45 | 0.55 | 35.78 | 30.45 | 0.40 | Very good |
| 20 | 0.85 | 0.15 | 33.23 | 30.66 | 0.136 | Very poor |
| 25 | 0.75 | 0.25 | 34.33 | 28.40 | 0.38 | Reasonable |
| 27 | 0.65 | 0.35 | 34.76 | 29.28 | 0.39 | Very good |
| 37 | 0.95 | 0.05 | 33.26 | 25.00 | 0.27 | Should be corrected |
| 38 | 0.75 | 0.25 | 34.33 | 28.40 | 0.38 | Reasonable |

*Questions selected typically from a total of 45.
Biserial Correlation Coefficient (rbis)
A biserial correlation coefficient is almost the same as the point-biserial correlation, except that one of the variables is dichotomous ordinal data with an underlying continuity.
It is denoted as:
M1: mean (for the entire test) of the group that received the positive binary variable (i.e., the "1")
M0: mean (for the entire test) of the group that received the negative binary variable (i.e., the "0")
Sn: standard deviation for the entire test
p: item difficulty index
q: (1 - p)
Y: Y ordinate of the normal distribution corresponding to the p value
The coefficient is calculated as rbis = ((M1 - M0) / Sn) · (p · q / Y).
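These quantities combine in the standard biserial formula, rbis = ((M1 - M0) / Sn) · (p · q / Y). A minimal sketch, reusing the point-biserial example values purely for illustration:

```python
from statistics import NormalDist

# Biserial correlation from the quantities defined above. Y is the
# ordinate (density) of the standard normal distribution at the point
# whose cumulative probability equals p.
def r_bis(m1, m0, sn, p):
    q = 1 - p
    y = NormalDist().pdf(NormalDist().inv_cdf(p))  # normal ordinate for p
    return (m1 - m0) / sn * (p * q) / y

print(round(r_bis(35.78, 30.45, 6.651, 0.45), 2))
```

As expected, the biserial estimate comes out somewhat larger than the point-biserial value for the same data, since it corrects for the artificial dichotomization.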
Using Item Analysis on Essay-Type Questions
Personnel certification bodies may want to evaluate
their candidates using various types of questions—
including essay, modified essay, short answer, and
multiple-choice types of questions. Among these,
the multiple-choice question (MCQ) is very common
and is the preferred type of question used in exams
due to the efficiency and reliability of scoring and
simplicity of analysis.
One of the most common tools used to assess
knowledge is the essay question. These evaluations
depend on test and item analysis, which consists of
analyzing individual questions as well as the whole
test. Although this analysis can be done more precisely for objective-type questions, it can also be applied to essay, structured-essay, and short-answer questions.
For item analysis, assessors must determine the
intermediate score ranges in accordance with the
maximum score that can be given to the essay-type
or short-answer question. This involves listing all test takers' marks for individual questions, arranging test takers in rank order by aggregate mark (with the highest score at the top), and dividing them into a high-ability group (HAG) and a low-ability group (LAG).
For example, if a question is given five points, each answered question that achieves 5 to 3.5 marks will be considered a correct answer (A). Each answered question that achieves 3 to 2 marks will be considered a near-to-correct answer (B). Each answered question that achieves 1.5 to 0.5 marks will be considered a near-to-incorrect answer (C). Each answered question that achieves 0 marks will be considered an incorrect answer (D).
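The binning scheme above can be sketched as follows; each range's lower bound is treated as the cut-off, an assumption that works because marks move in half-point steps:

```python
# Map a mark on a five-point essay question to its designated sign,
# using the ranges given in the text (A: 5-3.5, B: 3-2, C: 1.5-0.5, D: 0).
def correctness_sign(mark):
    if mark >= 3.5:
        return "A"  # correct answer
    if mark >= 2.0:
        return "B"  # near-to-correct answer
    if mark >= 0.5:
        return "C"  # near-to-incorrect answer
    return "D"      # incorrect answer

marks = [5.0, 3.0, 1.5, 0.0, 4.0]
print([correctness_sign(m) for m in marks])  # -> ['A', 'B', 'C', 'D', 'A']
```

Tallying these signs per question, separately for the HAG and LAG, produces tables like the ones below.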
Level of Correctness:

| Marks range | Designated sign |
|---|---|
| 5.0-3.5 | A (correct answer) |
| 3.0-2.0 | B (near-to-correct answer) |
| 1.5-0.5 | C (near-to-incorrect answer) |
| 0 | D (incorrect answer) |

No. of HAG test takers:

| Question | A | B | C | D | Total no. of considered test takers |
|---|---|---|---|---|---|
| Q1 | 11 | 14 | 0 | 0 | 25 |
| Q2 | 15 | 9 | 1 | 0 | 25 |
| Q3 | 5 | 14 | 5 | 1 | 25 |
| Q4 | 16 | 9 | 0 | 0 | 25 |
| Q5 | 8 | 10 | 7 | 0 | 25 |
| Q6 | 22 | 3 | 0 | 0 | 25 |
| Q7 | 16 | 8 | 1 | 0 | 25 |
| Q8 | 4 | 20 | 1 | 0 | 25 |
No. of LAG test takers:

| Question | A | B | C | D | Total no. of considered test takers |
|---|---|---|---|---|---|
| Q1 | 1 | 3 | 12 | 9 | 25 |
| Q2 | 1 | 22 | 2 | 0 | 25 |
| Q3 | 0 | 5 | 15 | 5 | 25 |
| Q4 | 2 | 13 | 8 | 2 | 25 |
| Q5 | 0 | 2 | 20 | 3 | 25 |
| Q6 | 4 | 18 | 3 | 0 | 25 |
| Q7 | 3 | 14 | 3 | 5 | 25 |
| Q8 | 0 | 12 | 11 | 2 | 25 |

For each question, the tables show the number of test takers whose marks fell in each range.
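One way to turn these tallies into a discrimination estimate is to count only "A" answers as correct and apply D = Pu - Pl to the HAG and LAG counts. This is an illustrative assumption, since the article does not specify how the near-to-correct categories should enter the index:

```python
# Illustrative only: D = Pu - Pl per question, counting only "A"
# (fully correct) answers as right. The counts are the Q1 rows from
# the HAG and LAG tables above (11 of 25 vs. 1 of 25).
def essay_discrimination(hag_counts, lag_counts):
    """hag_counts/lag_counts: (A, B, C, D) tallies for one question."""
    pu = hag_counts[0] / sum(hag_counts)  # proportion correct, upper group
    pl = lag_counts[0] / sum(lag_counts)  # proportion correct, lower group
    return pu - pl

print(round(essay_discrimination((11, 14, 0, 0), (1, 3, 12, 9)), 2))  # -> 0.4
```

A weighted variant (e.g., giving partial credit to "B" answers) would be a natural refinement, but the choice of weights is a policy decision for the certification body.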