Quasi-Identifier Complexity

Meaning

Quasi-Identifier Complexity refers to how difficult it is for an adversary to link a set of indirect, seemingly innocuous data attributes (such as age range, gender, and specific clinical procedure codes) to an individual patient within a de-identified health dataset. Complexity is high when each combination of attribute values is shared by many records, so that no combination singles out one patient; an attacker would then need numerous external data sources and sophisticated computational methods to re-identify anyone. Complexity is low when combinations are rare or unique within the dataset, which significantly increases vulnerability to linkage attacks. Managing this complexity is a primary technical challenge in responsible clinical data sharing.
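One rough way to make this idea concrete is to count how many records share each combination of quasi-identifier values. The sketch below is illustrative only: the records, field names, and procedure codes are hypothetical, and the two measures shown (smallest group size and share of records with a unique combination) are common proxies assumed here, not a definition taken from this entry.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
records = [
    {"age_range": "30-39", "gender": "F", "procedure_code": "A01"},
    {"age_range": "30-39", "gender": "F", "procedure_code": "A01"},
    {"age_range": "30-39", "gender": "F", "procedure_code": "A01"},
    {"age_range": "40-49", "gender": "M", "procedure_code": "B02"},
    {"age_range": "40-49", "gender": "M", "procedure_code": "B02"},
    {"age_range": "70-79", "gender": "F", "procedure_code": "C03"},
]

# Assumed set of quasi-identifiers for this example.
QUASI_IDENTIFIERS = ("age_range", "gender", "procedure_code")


def equivalence_class_sizes(rows, keys):
    """Count how many rows share each combination of values for `keys`."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def smallest_class_size(rows, keys):
    """Size of the rarest combination (assumes `rows` is non-empty)."""
    return min(equivalence_class_sizes(rows, keys).values())


def unique_fraction(rows, keys):
    """Share of rows whose combination of values appears exactly once."""
    sizes = equivalence_class_sizes(rows, keys)
    singletons = sum(1 for size in sizes.values() if size == 1)
    return singletons / len(rows)


if __name__ == "__main__":
    print(smallest_class_size(records, QUASI_IDENTIFIERS))
    print(unique_fraction(records, QUASI_IDENTIFIERS))
```

In this toy data, one patient has a combination nobody else shares, so the smallest group size is 1 and that record would be the easiest target for a linkage attack. Dropping or coarsening an attribute (for example, measuring on `gender` alone) makes the groups larger, which corresponds to higher complexity in the sense described above.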