What Is a Test and What Is Not a Test?
A Review of the Literature
By Dr. Eric F. Grosse, Ed.D.
Important Definitions
Understanding what is—and what is not—a test begins with some important
definitions from the broader world of assessment. These definitions, and
their authors, are listed below, in alphabetical order.
assess: systematically collecting information (including but
not limited to quantitative data) without making judgments of worth
(adapted from Shrock & Coscarelli, 1996, p. 8).
criterion-referenced tests: tests that compare people against
a standard (adapted from Shrock & Coscarelli, 1996, p. 1).
evaluate: the process of making judgments regarding the appropriateness
of some person, program, process or product relative to a specific purpose
(Shrock & Coscarelli, 1996, p. 8).
measure: collecting quantitative data about a specific activity,
event, process, product or other observable phenomenon (my own definition).
norm-referenced tests: tests that compare people against each
other; also known as standardized tests (adapted from Shrock
& Coscarelli, 1996, p. 2).
psychological test: a set of questions, problems, or tasks designed
to elicit responses for use in measuring the traits, capacities, or
achievements of an individual. Examples include intelligence tests,
achievement tests, and aptitude tests (Aiken, 1998, pp. 1, 6, 10).
survey: a tool designed to gather information about an individual’s
attitudes, preferences, behavioral intentions, observations, and opinions.
Purposes include measuring employee relations, morale, and involvement;
predicting organizational outcomes; segmenting a population based on
common characteristics; and comparing employee perspectives across organizations
(adapted from Kraut, 1996, pp. 20-21, 24, 25).
test: a deliberate attempt by people to acquire information
about themselves or others (Westgaard, 1999, p. 1).
testing: the process of collecting quantitative information
about the degree to which a competence or ability is present in a test
taker, based on responses to questions where there are right and wrong
answers (adapted from Shrock & Coscarelli, 1996, p. 7).
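The difference between the criterion-referenced and norm-referenced definitions above can be made concrete with a small scoring sketch. The Python below is illustrative only: the cut score of 80, the group scores, and the function names are hypothetical and do not come from Shrock & Coscarelli. It scores the same raw result two ways, first against a fixed standard and then against the other test takers.

```python
from bisect import bisect_left


def criterion_referenced(score, cut_score):
    """Compare one test taker against a fixed standard (the cut score)."""
    return "master" if score >= cut_score else "non-master"


def norm_referenced(score, group_scores):
    """Compare one test taker against the group.

    Returns a percentile rank: the percentage of the comparison
    group that scored strictly below this score.
    """
    ordered = sorted(group_scores)
    below = bisect_left(ordered, score)  # count of scores strictly lower
    return 100.0 * below / len(ordered)


# Hypothetical data: ten test takers and an assumed cut score.
group = [55, 62, 68, 71, 74, 78, 81, 85, 90, 96]
CUT_SCORE = 80

for raw in (78, 85):
    print(raw, criterion_referenced(raw, CUT_SCORE), norm_referenced(raw, group))
```

With these invented numbers, a score of 78 does not meet the standard even though half the group scored lower, which is why the two kinds of test should not be treated as interchangeable.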
Testing in a Corporate or Government Setting
Testing is widespread in corporate and government settings. For training
in particular, however, several conditions and limitations on testing
deserve attention.
- A survey is not a test. Surveys merely gather information;
they have no evaluation component as defined above. Kirkpatrick
Level 1 instruments are properly classified as surveys.
- Not all tests are created equal. As the definitions of criterion-referenced
and norm-referenced tests indicate, these two major types of tests differ
enormously. Using one type where the other is called for, or treating
them as interchangeable, has serious consequences.
- All tests must first be validated. Validation is a process
that determines, among other things, that what is asked on a test accurately
reflects the content that was taught ("content validity")
and makes sense to an "educated" test-taker ("face validity").
- Tests can be used to evaluate personal mastery only after they
have been validated. Unvalidated tests are routinely, and successfully,
challenged by labor unions and in court when the results have been used
to deny an employee a promotion, job transfer, or other desirable outcome,
or to discharge an employee deemed incompetent on the basis of test results.
- Tests used to evaluate personal mastery must be based on competency
statements tied to specific job tasks. This is, in essence, another
level of test validity known as criterion validity. Tests that
simply require employees to demonstrate memorization of facts, figures,
relationships, and the like generally do not meet the standard
of criterion validity.
- Tests that meet the standards of face, content, and criterion validity
and are deemed reliable across time and across comparable populations
can be used successfully by any organization. When the proper development
standards for a test have been met, and employees who do not pass a test
are given an opportunity for remediation and retesting, most organizations
are willing to use test results as one part of the employee performance
appraisal process.
Main Advantages and Disadvantages of Different Types of Assessment Instruments

| Type of assessment instrument | Advantages | Disadvantages |
| --- | --- | --- |
| Ability tests | Mental ability tests are among the most useful predictors of performance across a wide variety of jobs; are usually easy and inexpensive to administer | Use of ability tests can result in high levels of adverse impact; physical ability tests can be costly to develop and administer |
| Achievement/proficiency tests | In general, job knowledge and work-sample tests have relatively high validity; job knowledge tests are generally easy and inexpensive to administer; work-sample tests usually result in less adverse impact than ability tests and written knowledge tests | Written job knowledge tests can result in adverse impact; work-sample tests can be expensive to develop and administer |
| Biodata inventories | Easy and inexpensive to administer; some validity evidence exists; may help to reduce adverse impact when used in conjunction with other tests and procedures | Privacy concerns may be an issue with some questions; faking is a concern (information should be verified when possible) |
| Employment interviews | Structured interviews, based on job analyses, tend to be valid; may reduce adverse impact if used in conjunction with other tests | Unstructured interviews typically have poor validity; skill of the interviewer is critical to the quality of interview (interviewer training can help) |
| Personality inventories | Usually do not result in adverse impact; predictive validity evidence exists for some personality inventories in specific situations; may help to reduce adverse impact when used in conjunction with other tests and procedures; easy and inexpensive to administer | Need to distinguish between clinical and employment-oriented personality inventories in terms of their purpose and use; possibility of faking or providing socially desirable answers; concern about invasion of privacy (use only as part of a broader assessment battery) |
| Honesty/integrity measures | Usually do not result in adverse impact; have been shown to be valid in some cases; easy and inexpensive to administer | Strong concerns about invasion of privacy (use only as part of a broader assessment battery); possibility of faking or providing socially desirable answers; test users may require special qualifications for administration and interpretation of test scores; should not be used with current employees; some states restrict use of honesty and integrity tests |
| Education and experience requirements | Can be useful for certain technical, professional, and higher level jobs to guard against gross mismatch or incompetence | In some cases, it is difficult to demonstrate job relatedness and business necessity of education and experience requirements |
| Recommendations and reference checks | Can be used to verify information previously provided by applicants; can serve as protection against potential negligent hiring lawsuits; may encourage applicants to provide more accurate information | Reports are almost always positive and do not typically help differentiate between good workers and poor workers |
| Assessment centers | Good predictors of job and training performance, managerial potential, and leadership ability; apply the whole-person approach to personnel assessment | Can be expensive to develop and administer; specialized training required for assessors, whose skill is essential to the quality of assessment centers |
| Medical examinations | Can help ensure a safe work environment when use is consistent with relevant federal, state, and local laws | Cannot be administered prior to making a job offer; restrictions apply to administering to applicants post-offer or to current employees; there is a risk of violating applicable regulations (a written policy, consistent with all relevant laws, should be established to govern the entire medical testing program) |
| Drug and alcohol tests | Can help ensure a safe and favorable work environment when program is consistent with relevant federal, state, and local laws | An alcohol test is considered a medical exam and applicable law restricting medical examination in employment must be followed; there is a risk of violating applicable regulations (a written policy, consistent with all relevant laws, should be established to govern the entire drug or alcohol testing program) |
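Several entries in the table above turn on adverse impact, which the table does not define. One widely used screen in the United States is the four-fifths rule of thumb from the Uniform Guidelines on Employee Selection Procedures: a group whose selection rate is less than 80 percent of the highest group's rate is flagged for closer review. That rule is not taken from this article; the sketch below assumes it, and the group labels, counts, and function names are hypothetical.

```python
def selection_rates(outcomes):
    """Map each group to its selection rate (selected / applicants)."""
    rates = {}
    for group, (selected, applicants) in outcomes.items():
        if applicants <= 0:
            raise ValueError(f"no applicants recorded for {group}")
        rates[group] = selected / applicants
    return rates


def impact_ratios(rates):
    """Divide every group's rate by the highest group's rate."""
    highest = max(rates.values())
    if highest == 0:
        raise ValueError("no group had any selections")
    return {group: rate / highest for group, rate in rates.items()}


def flagged_groups(outcomes, threshold=0.8):
    """Return groups whose impact ratio falls below the assumed threshold."""
    ratios = impact_ratios(selection_rates(outcomes))
    return sorted(g for g, ratio in ratios.items() if ratio < threshold)


# Hypothetical outcomes: (number selected, number of applicants) per group.
outcomes = {"group_a": (48, 80), "group_b": (12, 40)}
print(flagged_groups(outcomes))
```

A flag from a screen like this is only a prompt for further review of the instrument, not a legal conclusion.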
Checklist for Evaluating a Test
- Characteristic to be measured by test (skill, ability, personality
trait)
- Job/training characteristic to be assessed
- Candidate population (education or experience level, other background)
- Test name
- Version
- Type (paper-and-pencil, computer)
- Alternate forms available?
- Scoring method (hand-scored, machine-scored)
- Technical considerations
  - Reliability: r=
  - Validity: r=
  - Reference/norm group
  - Test fairness evidence
  - Adverse impact evidence
  - Applicability (indicate any special group)
- Administration considerations
  - Administration time
  - Materials needed (include start-up, operational, and scoring costs)
  - Costs
  - Facilities needed
  - Staffing requirements
  - Training requirements
- Other considerations (consider clarity, comprehensiveness, utility)
  - Quality of test manual
  - Supporting documents available from the publisher
  - Quality of publisher assistance
  - Independent reviews
- Overall evaluation
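The checklist's "Reliability: r=" and "Validity: r=" entries call for correlation coefficients. As a rough sketch, and assuming that reliability is estimated by test-retest correlation and validity by correlating test scores with a job performance measure (other estimation methods exist, and the checklist does not specify one), a Pearson r could be computed as follows. All scores and names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data for five employees.
first_administration = [72, 85, 90, 64, 78]    # test scores, first sitting
second_administration = [70, 88, 91, 60, 80]   # same test, later sitting
supervisor_ratings = [3.1, 4.0, 4.4, 2.8, 3.5]  # assumed job performance measure

reliability = pearson_r(first_administration, second_administration)
validity = pearson_r(first_administration, supervisor_ratings)
print(f"Reliability: r={reliability:.2f}")
print(f"Validity: r={validity:.2f}")
```

Five cases are far too few for a real validation study; the sample is kept small only so the arithmetic is easy to follow.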
References:
- Aiken, Lewis R. (1998). Tests & Examinations: Measuring Abilities
and Performance. New York: John Wiley & Sons, Inc.
- Kraut, Allen I. (Ed.). (1996). Organizational Surveys: Tools for
Assessment and Change. San Francisco: Jossey-Bass.
- Shrock, Sharon A., and Coscarelli, William C. C. (1996). Criterion-Referenced
Test Development: Technical and Legal Guidelines for Corporate Training.
Washington, DC: International Society for Performance Improvement.
- Westgaard, Odin. (1999). Tests That Work: Designing and Delivering
Fair and Practical Measurement Tools in the Workplace. San Francisco:
Jossey-Bass Pfeiffer.
- [no author]. (1999). Testing and Assessment: An Employer’s Guide
to Good Practices. Washington, DC: U.S. Department of Labor, Employment
& Training Administration.