AISP Toolkit Feb25 2025 - Flipbook - Page 32
Racial Equity in Data Collection
CENTERING RACIAL EQUITY THROUGHOUT THE DATA LIFE CYCLE
Data collection is the process of gathering information in an organized way. This process can involve
collecting primary data or using secondary information from another source. Both primary and
secondary data collection have significant benefits and limitations.
Primary data collection requires administering an instrument, such as a form or a survey, to a
specific population. This means that the data collection can be designed for the particular needs
of the project, ideally with the focus population in mind (see Oregon Health Authority in the Work
in Action). Primary data collection for specific populations can also be challenging as a result of
cultural norms, stigma, distrust, and fear of misuse that can lead to inadequate response rates
and incomplete responses. Secondary data collection involves using data originally collected for
a different purpose. The reuse of administrative data—data collected during the routine process
of administering programs—is commonplace. However, because these data are not necessarily
collected for reuse, there are benefits and risks that should be carefully considered (see Assessing
Risk & Benefit). Data minimization is an important principle for ethical data use, as collecting,
storing, using, and retaining data has implications for both privacy and environmental impact.12
Data minimization: The principle of limiting or minimizing the
collection, storage, and disclosure of data to only what is necessary
to accomplish a specific use.
All data are vulnerable to biases, inaccuracies, and missingness. Bias within administrative data is
commonplace and most often takes the form of selection bias (i.e., the individuals included in the data are
not random or do not represent the intended population), as these data tend to include communities that
are over-surveilled by government agencies. Confirmation bias (i.e., data used to confirm pre-existing
beliefs) is also a concern, due to the impacts of unexamined individual, institutional, and systemic racism
on data collection, such as an intake form that does not list a significant racial or ethnic group within
the population. Missing, poor quality, or inaccurate data on demographics, including race, ethnicity,
language, and disability (RELD) or sexual orientation, gender identity, and expression (SOGIE), can also
erode validity and community relevance of the study outcomes.
Administrative data are often collected from intake paperwork, self-reported online applications,
service records, and participant surveys, so the information is in different formats and inconsistently
defined across agencies, programs, and services. Sufficient metadata (i.e., data about data) is
essential to design valid and reliable analytic plans and to harmonize data prior to integration, yet is
not often created as part of data collection. This lack of data documentation is a significant risk that
can lead to misuse.
At the same time, administrative data also create opportunities to better capture and understand
the experiences of individuals and subgroups. The sheer volume of information available across
administrative data sources may allow for programs with low-quality or missing demographic data
to utilize higher-quality sources. Integrated administrative data may also enable the exploration of
intersectional experiences (e.g., how queer youth of color experience health care) in a way that is not
12 Down on the server farm. (2008). The Economist.