job analysis results, then validated through
committee meetings and workshops. Use
and review the pilot test results. Operators,
supervisors, and trainers should participate in
the workshops.
9. Determine Passing Score: The passing score
for an exam should be set in accordance with
the purposes of the exam. The passing score is
defined as the minimum score a candidate must
achieve to provide assurance that the
certificate-holder is professionally competent.
10. Statistical Review: Statistically review exam
results to identify problem questions. Questions
that perform poorly should be withdrawn from
current use and may be referred back to the
examination committee for further review and
refinement, as sketched below.
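As a rough illustration of that flagging step, the following Python sketch screens items using the difficulty and discrimination statistics discussed later in this article; the item values and cut-off thresholds are illustrative assumptions, not figures from the text.

# Hypothetical item statistics; thresholds below are illustrative only.
items = [
    {"id": "Q1", "difficulty": 0.97, "discrimination": 0.05},
    {"id": "Q2", "difficulty": 0.45, "discrimination": 0.42},
    {"id": "Q3", "difficulty": 0.08, "discrimination": 0.15},
]
flagged = [
    item["id"]
    for item in items
    if item["difficulty"] > 0.90       # nearly everyone answers correctly
    or item["difficulty"] < 0.10       # almost no one answers correctly
    or item["discrimination"] < 0.20   # poor discrimination (see Table 2)
]
print(flagged)  # ['Q1', 'Q3'] would be referred back to the committee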
Details of the Exam Validation Process
It is essential to involve subject matter experts in
all parts of the validation process. To qualify as a
subject matter expert, a person must have direct,
up-to-date experience with the job, and enough
experience to be familiar with all of the tasks.
Subject matter experts may include operators,
supervisors, trainers, or other individuals with
specialized knowledge about the job.
The principal steps normally taken for exam
validation include:
1. Conduct a job analysis
2. Develop and validate items
3. Develop an exam
4. Establish a passing (cut) score
Step 1. Conduct a Job Analysis
Conducting a job analysis is an essential first step
in establishing the content validity of certification
exams. A job analysis often lists the capabilities
(i.e., knowledge, skills, and abilities) required to
perform work tasks. Job analysis information may
be gathered by directly observing people currently
in the job, conducting interviews with experienced
supervisors and job incumbents, and through
questionnaires, personnel and equipment records,
and work manuals.
Workshops are held to identify the specific job
tasks and capabilities required for successful job
performance. During these workshops, subject
matter experts verify that the task statements
developed are technically correct, unambiguous,
and accurately reflect the job. Identification of
capabilities must be done on a task-by-task basis,
so that a link is established between each task
statement and requisite capability.
Job analysis information is central in deciding what
to test for and which tests to use.
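As a rough sketch of the task-to-capability linkage described above, the mapping produced by a job analysis might be recorded as follows; the task statements and capabilities shown here are invented for illustration only.

# Hypothetical task-to-capability linkage from a job analysis.
# Each task statement is tied to at least one requisite capability.
task_capabilities = {
    "Adjust chemical feed rates": [
        "knowledge of dosing calculations",
        "ability to interpret process readings",
    ],
    "Record hourly process readings": [
        "knowledge of logging requirements",
        "skill in reading gauges and meters",
    ],
}
for task, capabilities in task_capabilities.items():
    print(f"{task}: {', '.join(capabilities)}")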
Step 2. Develop and Validate Items
Exam items are developed from the results of the
job analysis so that exams are representative of job
tasks. Once the new items are written, they must go
through a validation process, which includes:
1. Linking new questions to the results of the job
analysis. The purpose of this is to ensure that all
questions on the certification exam measure at
least one important aspect of an operator’s job.
During this process, subject matter experts are
asked to rate the extent to which the questions
reflect specific tasks in the job.
2. Analyzing questions for technical accuracy,
style, readability, and possible bias against
subgroups. This is done to determine whether the
correct answer is the best answer, confirm the
distractors (incorrect answers) are wrong, and
verify that the question is free from bias with
respect to race, gender, and culture.
3. Reviewing items for job importance. Importance
ratings should reflect how well the question
distinguishes between effective and ineffective
job performance and whether the knowledge tested
in the question is necessary for competent job
performance. The continued relevance of questions that have been validated must be ensured
through periodic reviews of the items by subject
matter experts. Evaluation of questions should
also be conducted through statistical analysis.
Of particular importance are the difficulty index
(the proportion of examinees who answer each question correctly) and the discrimination index (how
well the question distinguishes between the
more knowledgeable and less knowledgeable
examinees).
Conduct the Item Analysis
In this phase, statistical methods are used to identify any test items that are not working well. If an
item is too easy, too difficult, fails to show a difference between skilled and unskilled examinees, or
is scored incorrectly, an item analysis will reveal
it. The two most common statistics reported in an
item analysis are the item difficulty, which measures
the proportion of examinees who responded to an
item correctly, and the item discrimination, which
measures how well the item discriminates between
examinees who are knowledgeable in the content
area and those who are not.
The Item Difficulty Index (pj) describes the level of
difficulty of a question, which affects test validity. If
an exam is composed only of difficult questions or
only of easy questions, it cannot clearly distinguish
among the applicants. An exam is therefore expected
to have an intermediate overall level of difficulty,
which helps establish that distinction. The index is
also used in internal consistency formulas.
It is calculated as:

pj = n(D) / N

where:
n(D): Number of participants who answered the item correctly
N: Total number of participants who took the exam
TABLE 1
Evaluation of Item Difficulty Index

Item Difficulty Index    Item Difficulty Level
Close to 1.00            Easy
About 0.50               Medium
Close to 0.00            Difficult
For example, consider an exam with 20 participants
that contains multiple-choice questions. If 9 of the
20 test takers answered a question correctly, the
Item Difficulty Index (pj) would be 0.45, which
classifies the question as “medium” difficulty. If, on
the other hand, 19 of the 20 test takers answered a
question correctly, the pj would be 0.95, which
classifies it as an “easy” question.
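In code, that calculation might look like the following sketch; the function names are illustrative, and the classification cut points are assumptions that only approximate the bands in Table 1.

def item_difficulty(n_correct: int, n_total: int) -> float:
    # Item difficulty index: pj = n(D) / N
    return n_correct / n_total

def difficulty_level(pj: float) -> str:
    # Assumed cut points approximating Table 1 (illustrative only).
    if pj >= 0.75:
        return "easy"
    if pj >= 0.25:
        return "medium"
    return "difficult"

for n_correct in (9, 19):
    pj = item_difficulty(n_correct, 20)
    print(f"{n_correct}/20 correct: pj = {pj:.2f} ({difficulty_level(pj)})")
# 9/20 correct: pj = 0.45 (medium)
# 19/20 correct: pj = 0.95 (easy)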
The Item Discrimination Index (r) measures the
efficiency of test questions in distinguishing among
the applicants. It expresses the relationship between
the overall exam score and single-question scores,
and it indicates how well an item is able to
distinguish between examinees who are
knowledgeable and those who are not, or between
masters and non-masters. Item discrimination
should be high for a test to be reliable. When an
item discriminates negatively overall, this means
the most knowledgeable examinees are getting the
item wrong and the least knowledgeable examinees
are getting the item right. A negative discrimination
index may indicate the item is measuring something
other than what the rest of the test is measuring.
More often, it is a sign that the item has been
miskeyed.
When interpreting the value of a discrimination index, it
is important to be aware that there is a relationship
between an item’s difficulty index and its
discrimination index. If an item has a very high
(or very low) p-value, the potential value of the
discrimination index will be much less than if the
item has a midrange p-value. In other words, if an
item is either very easy or very hard, it is not likely to
be very discriminating.
There are over 20 discrimination indices used as
indicators of an item's discrimination effectiveness,
such as the index of discrimination (D), the Henryson
discrimination index (rjx), the point-biserial
correlation coefficient (rpbis), the biserial correlation
coefficient (rbis), etc.
TABLE 2
Evaluation of Item Discrimination Index

Item Discrimination Index    Item Discrimination Level
0.40 and above               Very good
0.30 - 0.39                  Reasonable
0.20 - 0.29                  Should be corrected
0.19 and below               Very poor; remove from test
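A small Python sketch of how the bands in Table 2 might be applied when screening items; the function name is illustrative.

def discrimination_level(r: float) -> str:
    # Bands taken from Table 2.
    if r >= 0.40:
        return "very good"
    if r >= 0.30:
        return "reasonable"
    if r >= 0.20:
        return "should be corrected"
    return "very poor; remove from test"

for r in (0.45, 0.33, 0.22, 0.05):
    print(f"r = {r:.2f}: {discrimination_level(r)}")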
Some of the statistical formulas are given below.
Henryson discrimination index (rjx)

It is calculated as:

rjx = ((X̄D − X̄) / Sx) · √(pj / qj)

where:
X̄D: Exam score average of those who answered the item correctly
X̄: Arithmetic mean of the exam scores
Sx: Standard deviation of the exam scores
pj: Item difficulty index of the item
qj: 1 − pj
For example, consider an exam with 20 participants
that contains 45 multiple-choice questions. If the
arithmetic mean of the exam scores is 32.85 and
the standard deviation of the scores is 6.651, the
discrimination index of individual questions can
then be examined based on each item's difficulty
index and the exam score average of those who
answered it correctly.
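A minimal Python sketch of that calculation follows; the per-item values used here (a mean score of 35.0 for those answering correctly and a pj of 0.60) are hypothetical placeholders, not figures from the exam described above.

import math

def henryson_index(mean_correct: float, mean_all: float, sd_all: float, pj: float) -> float:
    # rjx = ((X̄D - X̄) / Sx) * sqrt(pj / qj)
    qj = 1.0 - pj
    return (mean_correct - mean_all) / sd_all * math.sqrt(pj / qj)

# Exam-level statistics from the text; per-item values are hypothetical.
mean_all, sd_all = 32.85, 6.651
rjx = henryson_index(mean_correct=35.0, mean_all=mean_all, sd_all=sd_all, pj=0.60)
print(round(rjx, 3))  # about 0.396, "reasonable" per Table 2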