Fact-checked by Grok 2 weeks ago

Private set intersection

Private set intersection (PSI) is a cryptographic protocol in secure multiparty computation that allows two or more parties, each holding a private set of elements from a common domain, to jointly compute the intersection of their sets—or some function of it, such as its size—while revealing no additional information about their inputs beyond the output.^[1] PSI protocols emerged as a practical application of secure two-party computation, with foundational work tracing back to the 1980s in general secure computation frameworks by Yao and Goldreich et al., but the first dedicated efficient PSI construction was introduced in 2004 using homomorphic encryption and oblivious pseudorandom functions to achieve sublinear communication and computation for sets of size up to thousands of elements.^[2]^[1] Subsequent advancements have focused on optimizing performance through techniques like oblivious transfer extensions, garbled Bloom filters, and lattice-based cryptography, enabling PSI for billion-scale sets in under 10 minutes on commodity hardware while supporting malicious adversaries.^[2]^[3]^[4] Security in PSI is typically analyzed under semi-honest (honest-but-curious) or malicious (arbitrary adversary) models, with semi-honest providing stronger efficiency guarantees in the standard model and malicious requiring additional mechanisms like zero-knowledge proofs for robustness in the random oracle model.^[1]^[2] Variants extend PSI to multi-party settings, unbalanced set sizes, or over-threshold intersections, addressing real-world asymmetries in data holdings.^[2] Notable applications include privacy-preserving data sharing in genomics and healthcare for identifying overlapping patient records without exposing sensitive details, contact tracing during pandemics to detect shared exposures, secure ad targeting in digital marketing to match user lists without revealing identities, and collaborative filtering in recommendation systems.^[2] These uses highlight PSI's role in enabling secure collaboration amid growing data privacy regulations like GDPR.^[5]

Definition and Motivation

Private set intersection (PSI) is motivated by the need for parties to compute the intersection of private datasets in untrusted environments, allowing secure collaboration without revealing sensitive non-overlapping information, as a practical application of secure multiparty computation.^[1]

Formal Definition

Private set intersection (PSI) is a cryptographic protocol within the framework of secure multi-party computation (SMPC) that enables two parties, typically denoted as Alice and Bob, to jointly compute the intersection of their private input sets without revealing any information beyond the output.^[6] Formally, Alice holds a private set X = \{x_1, \dots, x_n\} \subseteq \mathcal{D} of size n, and Bob holds a private set Y = \{y_1, \dots, y_m\} \subseteq \mathcal{D} of size m, where both sets are subsets of the same universe \mathcal{D}. The protocol allows them to compute either the cardinality |X \cap Y| or the actual elements of the intersection X \cap Y, while ensuring that neither party learns any information about elements of the other's set beyond the intersection and the output.^[1]^[7] PSI protocols operate under the assumption that the parties communicate over public but authenticated channels and that their inputs remain private throughout the execution, with no reliance on a trusted third party unless explicitly part of a variant.^[6] Variants of PSI differ primarily in the output distribution: in symmetric PSI, both parties receive the intersection or its cardinality; in asymmetric PSI, only one party (e.g., Alice) receives the output while the other learns nothing; and in cardinality-only PSI (PSI-CA), the parties compute solely the size |X \cap Y| without disclosing the elements.^[7] These formulations ensure that the protocol reveals no additional information about the non-intersecting elements, preserving the privacy of each party's dataset.^[1]

Applications

Private set intersection (PSI) enables two or more parties to jointly compute the intersection of their private datasets without revealing any non-intersecting elements, facilitating secure collaboration in scenarios where data privacy is paramount.^[8] This primitive underpins numerous real-world applications across industries, allowing entities to identify common items or records while adhering to privacy regulations and minimizing trust requirements.^[9] In contact discovery for messaging applications, PSI allows users to determine mutual contacts on a platform without uploading or exposing their full address books to the service provider. Cryptographic PSI protocols have been proposed and analyzed for this purpose, enabling efficient matching of hashed identifiers while ensuring the server learns nothing about non-mutual contacts.^[8] PSI supports privacy-preserving recommendation systems in domains such as dating apps and e-commerce, where it facilitates matching user profiles or preferences without disclosing full datasets. In dating platforms, fuzzy PSI extensions enable approximate matching of user attributes like interests or locations to suggest compatible pairs, reducing false positives to levels comparable to non-private methods while preserving individual privacy.^[10] For e-commerce, PSI allows retailers to intersect customer purchase histories with partner inventories for collaborative filtering, as demonstrated in structure-aware protocols that handle fuzzy matching for item recommendations without revealing sensitive transaction details.^[11] In medical data sharing, PSI enables healthcare providers to identify overlapping patient records for collaborative research or treatment without violating regulations like HIPAA. Hospitals can use PSI to find common cases across datasets for epidemiological studies, ensuring that only intersection sizes or aggregated insights are disclosed, as outlined in policy-enhanced protocols that enforce access controls on shared intersections.^[9] This approach has been integrated into secure data-sharing schemes for medical information systems, combining PSI with key aggregation to support HIPAA-compliant queries on encrypted records.^[12] For fraud detection in the financial sector, banks leverage PSI to intersect lists of suspicious transactions or accounts without exposing their full datasets to competitors or third parties. This allows detection of cross-institutional fraud patterns, such as shared compromised credentials, through secure joins that reveal only matching entries, as implemented in collaborative machine learning frameworks for financial analytics.^[13] Pilot implementations demonstrate PSI's utility in peer-to-peer queries among banks to identify fraudulent customers, scaling to large transaction volumes while maintaining privacy.^[14] PSI finds critical use in genomic data analysis, where researchers or institutions can compute intersections of genetic markers across private datasets to identify shared variants without revealing individual sequences. This supports collaborative studies on disease associations by privately matching aligned DNA sequences, as in protocols for set-maximal matches that optimize for depth in genomic alignments.^[15] Encrypted PSI variants further enable queries on genomic databases, combining homomorphic encryption with set intersection to facilitate privacy-preserving kinship or ancestry analysis.^[16] In advertising technology, PSI powers audience matching and attribution by allowing advertisers and platforms to intersect user identifiers for targeted campaigns without sharing raw data. Google's private intersection-sum protocols use PSI to compute aggregate ad conversion rates, enabling advertisers to verify campaign effectiveness while bounding leakage to differential privacy levels.^[17] Industry standards like IAB Tech Lab's PAIR initiative incorporate PSI as a privacy-enhancing technology for first-party data matching, supporting scalable audience overlaps in clean rooms compliant with regulations like GDPR.^[18]

Security Model

Adversary Models

In private set intersection (PSI) protocols, adversary models define the capabilities and behaviors of potentially compromised parties, ensuring that security guarantees hold against specified threats. These models classify adversaries based on their adherence to the protocol and intent to extract unauthorized information or disrupt the computation. The semi-honest, or honest-but-curious, model assumes that parties strictly follow the protocol steps but may attempt to infer additional information from the messages they receive and their own inputs. In this setting, adversaries are passive, analyzing transcripts without deviating from the prescribed actions, which simplifies protocol design while providing protection against eavesdropping on intermediate data. Early PSI protocols, such as the 2004 construction by Freedman, Nissim, and Pinkas, were proven secure under this model using homomorphic encryption and polynomial evaluation techniques.^[19] The malicious model, in contrast, allows parties to behave arbitrarily, including deviating from the protocol, submitting falsified inputs, or aborting prematurely to gain an advantage or learn private data. Security in this model requires protocols to either detect such deviations with high probability or ensure that any misbehavior yields no additional information beyond what a semi-honest adversary could obtain. Protocols achieving malicious security often rely on zero-knowledge proofs or commitments to enforce correctness, as demonstrated in the 2010 work by De Cristofaro, Kim, and Tsudik, which provides linear-complexity PSI secure against malicious adversaries under standard assumptions.^[20] An intermediate model is the covert adversary, where malicious parties may cheat to extract information but prefer to avoid detection, as exposure could lead to punishment or reputational damage. This model, introduced by Aumann and Lindell in 2007, quantifies security by ensuring that cheating succeeds only with negligible probability, or is detected with overwhelming probability, providing a realistic balance between semi-honest and full malicious security. Hazay and Lindell extended this to PSI in 2009, constructing protocols for set intersection secure against covert adversaries using cut-and-choose techniques on oblivious transfers.^[21]^[22] PSI protocols typically assume computationally bounded adversaries running in probabilistic polynomial time relative to the security parameter. Adversaries are often modeled as static, where corruptions are chosen before protocol execution, though adaptive corruptions—where parties are compromised during execution—are considered in more robust settings. Collusion among a subset of parties is also accounted for, limiting the number of corrupted participants (e.g., up to t < n in multi-party PSI with n parties) to prevent total compromise. These assumptions align with standard secure multi-party computation frameworks.^[22]

Security Properties

Private set intersection (PSI) protocols provide security guarantees primarily through the simulation paradigm, where the view of an adversarial party during protocol execution is computationally indistinguishable from its view in an ideal execution involving a trusted third party that computes and reveals only the intersection. This ensures that no party learns any information about the other's input set beyond the elements in the intersection itself.^[1] In the semi-honest model, privacy holds if at least one party is honest, while in the malicious model, it requires simulation even against corrupt parties attempting to deviate.^[2] A key aspect of privacy is input indistinguishability, meaning that elements not in the intersection remain hidden from the other party, preventing inference about the full set composition or sizes except for the output. The receiver, who typically learns the intersection, sees only matching elements from their own set, while the sender reveals nothing. Correctness guarantees that, when both parties follow the protocol honestly, each receives the exact intersection of their input sets with overwhelming probability.^[1]^[2] PSI protocols often incorporate additional properties to enhance practicality. Balance refers to equitable distribution of computational load between parties, particularly in scenarios with similarly sized sets, avoiding one party bearing disproportionate costs. Non-interactivity allows parties to perform computations offline after an initial setup, reducing real-time coordination needs in some constructions. Resistance to side-channel attacks, such as timing or power analysis, is achieved through constant-time implementations or masking techniques to prevent leakage of intermediate values.^[23] Security metrics for PSI frequently rely on indistinguishability under chosen-plaintext attack (IND-CPA) for underlying primitives like homomorphic encryption, ensuring ciphertexts reveal no plaintext information. Stronger protocols may use indistinguishability under chosen-ciphertext attack (IND-CCA) for robustness against adaptive adversaries. However, standard PSI offers no inherent protection against traffic analysis, which could expose set sizes or participation patterns through message volumes or timings unless augmented with padding or anonymous channels.^[1]^[2]

Basic Protocols

Diffie-Hellman-Based PSI

The Diffie-Hellman-based private set intersection (PSI) protocol, introduced by Freedman, Nissim, and Pinkas in 2004, is one of the first practical protocols for two-party computation of set intersections while preserving privacy against semi-honest adversaries.^[1] This approach builds on the Diffie-Hellman key exchange to generate shared secrets that uniquely identify matching elements without revealing non-intersecting data or the full sets.^[1] It assumes the parties hold sets of comparable sizes and operate in a multiplicative cyclic group of prime order q modulo a prime p, with a generator g.^[1] In the protocol, Alice with input set X = \{x_1, \dots, x_n\} sends to Bob the values g^{x_i} \mod p for each x_i \in X.^[1] Bob, holding set Y = \{y_1, \dots, y_n\}, receives these and, for each i and each y_j \in Y, computes the value K_{i,j} = (g^{x_i})^{y_j} = g^{x_i y_j} \mod p.^[1] Under the decisional Diffie-Hellman assumption, K_{i,j} appears pseudorandom (indistinguishable from a random group element) unless x_i = y_j, in which case K_{i,j} = g^{x_i^2} \mod p, a value Alice can independently compute as (g^{x_i})^{x_i} \mod p.^[1] Bob then uses each K_{i,j} as a key to encrypt an identifier for y_j, such as y_j itself or its index j.^[1] To achieve efficient communication, the protocol applies balanced hashing: the identifiers are hashed into B = n / \ln \ln n bins using a function derived from the keys, with maximum load O(\ln \ln n) per bin, allowing Bob to send only the relevant encryptions grouped by i.^[1] Alice, upon receiving these, for each i attempts decryption of the encryptions in the corresponding group using g^{x_i^2} \mod p; successful decryptions yield matching elements where keys align, while mismatched keys produce invalid or random outputs.^[1] The security relies on the decisional Diffie-Hellman (DDH) assumption in the group, ensuring that non-matching keys leak no information about Y.^[1] It achieves semi-honest security through the simulation paradigm, where an ideal-world simulator can replicate each party's view using only its own input and the output intersection, without access to the other party's private data.^[1] For balanced sets of size n, the protocol incurs O(n) communication and O(n) computation (dominated by group exponentiations).^[1]

Hash-Based PSI

Hash-based private set intersection (PSI) protocols enable two parties to compute the intersection of their private input sets by first mapping elements to a common hashed domain, thereby concealing the original data while allowing secure matching. In these protocols, each party independently hashes its set elements using a collision-resistant hash function to project them into a fixed-size domain, such as bins or a common range, before applying cryptographic techniques to identify overlaps without revealing non-matching elements or their positions. This approach trades the computational intensity of public-key operations on raw elements for efficient hashing, making it suitable for semi-honest adversaries where parties follow the protocol but may attempt to infer additional information from observed messages. The foundational protocol, introduced by Freedman, Nissim, and Pinkas in 2004, uses balanced hashing to distribute client elements across a reduced number of bins, followed by oblivious polynomial evaluation over the hashed inputs under homomorphic encryption, such as Paillier's cryptosystem, to compute the intersection.^[1] Subsequent variants enhance this framework for better scalability and practicality. For exact PSI, techniques like oblivious pseudorandom functions (OPRFs) allow one party to blind the hashes of the other party's elements, ensuring that matches are verified without exposing positions; this is often combined with sorting or garbled circuits for final intersection revelation. Cuckoo hashing variants, which use multiple hash functions to resolve collisions by "kicking" elements to alternative bins, further optimize bin utilization and enable circuit-based evaluation with low overhead, as detailed in Pinkas et al.'s 2018 work, achieving near-linear time while minimizing communication through stash mechanisms for overflow handling.^[24] For approximate PSI or cardinality-only computations, simple hashing with Bloom filters represents sets as bit arrays, allowing parties to obliviously compute the filter for the intersection via homomorphic operations or secure multiplication, though with a tunable false-positive rate due to inherent collisions.^[25] Server-aided hashing extends exact PSI by outsourcing hash computations to an untrusted server, where the server generates blinded hashes under OPRF without learning the inputs, reducing client workload for unbalanced set sizes.^[26] Security in hash-based PSI relies on the collision resistance of the underlying hash functions and the random oracle model, ensuring that non-intersecting elements cannot be distinguished from random noise and preventing adversaries from guessing absent elements. These protocols provide semantic security against semi-honest parties, where the view of one party simulates an ideal functionality revealing only the intersection, but they assume a large hash domain to avoid information leakage from collisions. In the malicious setting, additional zero-knowledge proofs or cut-and-choose mechanisms can be integrated, though at higher cost, as analyzed in the original balanced hashing design.^[1] Efficiency is a key strength, with communication and computation typically linear in the input set size n, enabling practical deployment on datasets up to millions of elements; for instance, cuckoo hashing variants achieve runtimes of a few seconds and communication around 10-15 MB for sets of size up to $10^5 on standard hardware.^[23] Permutation-based hashing further reduces representation lengths, yielding up to 5x speedups over prior methods for web-scale applications. However, drawbacks include vulnerability to birthday attacks if the hash space is insufficiently large relative to n, potentially leaking partial information via collision probabilities exceeding $1/2 for spaces around \sqrt{2n}, and the need for pre-processing to generate and agree on hash parameters or OPRF keys. These protocols are widely applied in contact discovery for messaging services, where users intersect phone contacts with server databases without exposing full lists.^[27]^[28]

Advanced Protocols

Homomorphic Encryption-Based PSI

Homomorphic encryption-based private set intersection (PSI) protocols enable two parties to compute the intersection of their private sets by performing computations directly on encrypted data, without decrypting intermediate results. In these protocols, one party, typically the one with the larger set (the server), encrypts its set elements using a partially homomorphic encryption (PHE) scheme such as Paillier, which supports additive operations on ciphertexts. The other party (the client) then homomorphically evaluates a polynomial or circuit on these ciphertexts, where the polynomial is constructed such that its evaluation yields zero (or an indicator) precisely when an element from the client's set matches one in the server's set. For instance, the client can compute encrypted indicators for each of its elements and homomorphically sum them to obtain the encrypted intersection size or elements, which only the client decrypts. This approach was pioneered in the work of Kissner and Song, who developed secure methods for set intersection and related operations using additive homomorphic encryption.^[29] A key property of Paillier encryption underpinning these computations is its additive homomorphism, formalized as:

\text{Enc}(m_1) \times \text{Enc}(m_2) = \text{Enc}(m_1 + m_2)

where \text{Enc} denotes encryption under the public key, allowing the client to sum encrypted indicators for matches without revealing individual values. Security in these protocols relies on the IND-CPA security of the underlying homomorphic encryption scheme, ensuring that ciphertexts reveal no information about the plaintexts. To achieve malicious security against cheating parties, zero-knowledge proofs are integrated to verify the correctness of homomorphic evaluations and polynomial commitments, preventing the server from learning the client's set or the client from inferring non-matching elements.^[29]^[3] Efficiency in homomorphic encryption-based PSI stems from the computational overhead of encryption and homomorphic operations, typically resulting in O(n log n) time complexity for sets of size n due to sorting or polynomial evaluation steps. These protocols are particularly suited to unbalanced settings, where the client's set is much smaller than the server's, as communication scales with O(|client| log |server|), minimizing data transfer for the dominant party. Modern optimizations, such as partitioning and batching of evaluations, further reduce runtime and communication.^[3] Extensions of these protocols incorporate lattice-based cryptography to provide post-quantum security, resisting attacks from quantum computers. For example, lattice-based schemes based on Learning With Errors (LWE) enable secure PSI computations with security reductions to lattice problems, ensuring long-term applicability.^[30]^[2]

Oblivious Transfer-Based PSI

Oblivious transfer (OT) serves as a core building block in private set intersection (PSI) protocols, enabling a receiver to obtain specific encrypted data from a sender without revealing the selection choice, while the sender learns nothing about the receiver's inputs. This primitive facilitates selective disclosure essential for computing set intersections securely, particularly in scenarios demanding strong privacy guarantees beyond semi-honest assumptions. The foundation of OT-based PSI traces back to the efficient OT constructions by Naor and Pinkas.^[31] These introduced 1-out-of-N and k-out-of-N variants under standard cryptographic assumptions like trapdoor permutations. These were integrated into PSI frameworks in the 2000s and 2010s, notably by Jarecki and Liu, who developed an oblivious pseudorandom function (OPRF) protocol leveraging OT-like secure computation to enable adaptive OT and direct application to set intersection, where parties compute pseudorandom evaluations on their elements to identify matches without exposure.^[32] In typical OT-based PSI protocols, the process begins with OT extensions to generate numerous OT instances efficiently from a small number of base OTs, using techniques like oblivious switching networks (OSN). The receiver initiates (N,1)-OTs for each element in their set against the sender's hashed or encrypted database, allowing selective retrieval of matching bits or encrypted values; equality is then verified via private equality tests, such as XOR-based checks on randomized shares, often combined with garbled circuits for full intersection computation or dual execution to detect deviations.^[23] This setup ensures that non-matching elements remain hidden, with optimizations like cuckoo hashing reducing the number of OTs to O(n log log n). Security in these protocols extends to malicious adversaries through compiler-based transformations from semi-honest OT, achieving full simulation-based security under the decisional Diffie-Hellman (DDH) assumption or hardness of factoring; for instance, dual-execution techniques run the protocol twice with randomized inputs, aborting on inconsistencies to prevent cheating without leaking information.^[33]^[32] Efficiency benefits from OT extensions, yielding near-linear communication and computation costs O(n) in set size n, with base OT amortization enabling symmetric-key operations for the bulk of work; practical benchmarks show protocols handling sets of size 2^{18} in about 14 seconds with roughly 78 MB communication under 128-bit security.^[23] Variants include non-interactive OT extensions for cloud-assisted PSI, where the sender delegates asymmetric operations to a semi-honest cloud server, reducing client computation while maintaining security via outsourced base OTs.^[34] Recent advancements as of 2025 include post-quantum secure OT-based PSI using lattice assumptions and updatable PSI for dynamic sets, enhancing applicability in evolving data environments without duplicating multi-party extensions.^[35]^[36]

Variants and Extensions

Multi-Party PSI

Multi-party private set intersection (MPSI) generalizes the two-party PSI problem to k \geq 3 mutually distrusting parties, each holding a private input set X_i = \{x_{i1}, \dots, x_{in_i}\} for i = 1, \dots, k, who jointly compute the intersection \cap_{i=1}^k X_i such that no party learns any element outside the full intersection or partial intersections between subsets of parties.^[37] This formalization ensures that the output is revealed only to designated parties (e.g., all or one receiver), while preserving the privacy of non-intersecting elements and set sizes beyond what is implied by the result.^[38] Seminal protocols for MPSI, introduced by Kissner and Song, represent sets as polynomials over a finite field and employ additive secret sharing combined with additively homomorphic encryption (e.g., Paillier) to enable secure multipoint evaluation for intersection computation.^[37] Threshold secure multi-party computation (SMPC) frameworks extend this using Shamir's secret sharing, where shares of set elements are distributed such that any threshold t < k parties cannot reconstruct partial results, allowing reconstruction only with sufficient shares for the full intersection.^[39] For scalability, recent protocols incorporate topological structures like ring (O-Ring) or star (K-Star) aggregations, leveraging oblivious pseudorandom functions (OPRFs) and oblivious key-value stores (OKVS) to chain computations efficiently across parties without a trusted dealer.^[38] These avoid full broadcast by using pairwise-like interactions or designated aggregators, reducing rounds to constant in semi-honest settings.^[40] As of 2025, ongoing research includes constant-round protocols and delegated variants that minimize public-key operations.^[41]^[42] Security in MPSI typically assumes an honest majority (fewer than k/2 corruptions) for semi-honest adversaries, with proofs via simulation paradigms ensuring indistinguishability from ideal functionalities; malicious security adds zero-knowledge proofs and commitments to detect deviations.^[37] Protocols like O-Ring and K-Star achieve collusion resistance against up to t < k malicious parties in semi-honest models, while handling dropouts through resilient OKVS peeling that tolerates missing responses without aborting the computation.^[38] Efficiency starts from a naive O(n k) communication and computation per party (where n is average set size), but optimizations via pairwise PSI chaining or broadcast-optimized channels reduce this to O(n \log k) in balanced topologies, with empirical gains like 1.6×–48× lower communication than prior works for k=16, n=2^{14}.^[40]^[38] MPSI finds applications in consortium blockchain data matching, where nodes securely identify overlapping transactions across ledgers without exposing full datasets, enabling privacy-preserving record linkage in distributed networks.^[43] In multi-hospital genomic studies, it facilitates patient cohort intersection for rare disease research, allowing institutions to compute shared genomic records for collaborative analysis while preventing leakage of individual patient identities or unrelated data.^[44]

PSI-Cardinality and PSI-Stats

Private Set Intersection Cardinality (PSI-CA) is a privacy-preserving protocol variant that enables two parties, each holding a private set X and Y, to jointly compute the size of their intersection |X \cap Y| without revealing the actual intersecting elements or any other information about their inputs. This restricted output enhances privacy compared to standard PSI, as no individual elements are disclosed, making it suitable for scenarios where aggregate overlap is sufficient. Protocols for PSI-CA typically operate by securely summing binary indicators for each potential match in the sets, often leveraging secure multi-party computation (SMPC) techniques to ensure correctness and privacy. An early efficient construction achieves computational and communication complexities linear in the set sizes, secure in the random oracle model under the Decisional Diffie-Hellman (DDH) assumption. Constructions for PSI-CA frequently employ garbled Bloom filters to enable match counting without position revelation. In this approach, one party generates a garbled Bloom filter encoding their set with random values, constrained such that positions corresponding to set elements yield consistent outputs during evaluation by the other party, allowing secure aggregation of matches via oblivious transfer or similar primitives. Homomorphic encryption (HE) provides another avenue, where parties encrypt indicators of set membership and homomorphically sum them to obtain the cardinality, supporting aggregation without decryption until the final result. These methods ensure that only the count is revealed, preserving element confidentiality.^[45] PSI-CA offers stronger privacy guarantees than full PSI, as the lack of element disclosure limits leakage even under semi-honest or malicious adversaries. Security can be elevated to malicious settings using zero-knowledge proofs (ZKPs) to verify the integrity of computations, such as proof-of-correctness for filter garbling or homomorphic operations, preventing deviations without compromising efficiency significantly. Communication complexity is generally lower than full PSI, with protocols achieving O(n) or O(n \log N) overhead for sets of sizes n and N (where n \leq N), and some unbalanced variants reaching sublinear in the larger set via techniques like oblivious pseudorandom functions.^[45] Building on PSI-CA, Private Set Intersection with Statistics (PSI-Stats) extends the functionality to compute aggregate statistical measures over the intersection, such as the sum, average, minimum, or maximum of associated attributes for intersecting elements, without exposing the elements themselves. For example, if sets contain identifiers paired with numerical attributes (e.g., ages or scores), parties can securely evaluate the sum of attributes in X \cap Y. Constructions adapt garbled Bloom filters or HE to incorporate attribute encryptions, enabling homomorphic aggregation of values only for matching positions. A seminal framework supports a range of statistical functions, including count, sum, and inner product, with security in the semi-honest model and efficiency scaling linearly with set sizes.^[46] PSI-Stats maintains the enhanced privacy of PSI-CA while enabling useful analytics; malicious security is achievable via ZKPs for attribute commitments and computation verifiability, ensuring robustness against faulty behavior. Efficiency remains advantageous over full PSI, with communication often O(n) and computation leveraging packed HE for batching, though attribute handling may introduce modest overhead. In epidemiology, PSI-CA and PSI-Stats facilitate overlap counts and aggregate statistics across patient datasets from multiple institutions, such as estimating shared rare disease cases without identity revelation, supporting collaborative research while complying with privacy regulations like GDPR.^[46]^[47]

References

[1]
[PDF] Efficient Private Matching and Set Intersection - Benny Pinkas
We consider the problem of computing the intersection of private datasets of two parties, where the datasets contain lists of ele- ments taken from a large ...
[2]
[PDF] Private set intersection: A systematic literature review - NICS Lab
We review Private Set Intersection, a very wide problem that can be solved with many different techniques. • We identify Homomorphic Encryption, ...
[3]
Fast Private Set Intersection from Homomorphic Encryption - Microsoft
Oct 30, 2017 · Private Set Intersection (PSI) is a cryptographic technique that allows two parties to compute the intersection of their sets without ...Missing: definition | Show results with:definition
[4]
A Brief Overview of Private Set Intersection
Apr 19, 2021 · Private set intersection (PSI) is a special case of multiparty computation, in which each party has a set of items and the goal is to learn the ...Missing: definition | Show results with:definition
[5]
[PDF] Private Set Intersection - Computer Science
Dec 7, 2021 · Abstract. A private set intersection (PSI) protocol allows two (or more) parties, each with sets from a common domain, to learn the ...
[6]
Private set intersection: A systematic literature review - ScienceDirect
The classical definition of PSI allows two parties to compute the intersection on their sets. ... GCD-filter: Private set intersection without encryption. Wang L.
[7]
https://www.sciencedirect.com/science/article/pii/S1574013723000345
[8]
[PDF] Policy-Enhanced Private Set Intersection: Sharing Information While ...
However, due to privacy regula- tions such as the Health Insurance Portability and Ac- countability Act (HIPAA), they can only share a pa- tient's record if ...
[9]
Mobile Private Contact Discovery at Scale - USENIX
Mobile Private Contact Discovery at Scale ... The most promising approaches addressing this problem revolve around private set intersection (PSI) protocols.
[10]
https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0157752
[11]
LiPISC: A Lightweight and Flexible Method for Privacy-Aware ...
The basic idea is to compute the intersection of input sets without leaking privacy. Furthermore, PISC should be sufficiently flexible to recommend approximate ...
[12]
Structure-Aware Private Set Intersection, with Applications to Fuzzy ...
Aug 7, 2025 · A major impediment to using recommendation systems and collective knowledge for electronic commerce is the reluctance of individuals to ...
[13]
Advancing Compliance with HIPAA and GDPR in Healthcare
Oct 15, 2025 · Design of secure and privacy-preserving data sharing scheme based on key aggregation and private set intersection in medical information system.
[14]
[PDF] Secure Collaborative Machine Learning | Visa
During training our secret join/private set intersection techniques can allow the parties to join their respective datasets and then feed these into an ...
[15]
multiparty/psi-pilot-masstech: Private Set Intersection demo ... - GitHub
In this banks can query each other's list of fraudulent customers, while only learning if their query is a match or not. The first solution is a peer-to-peer ...
[16]
Privately computing set-maximal matches in genomic data
Jul 21, 2020 · We design a new depth-optimized algorithm for computing set-maximal matches between a database of aligned genetic sequences and the DNA of an individual.
[17]
Private queries on encrypted genomic data - PMC - NIH
Our protocol combines state-of-the-art techniques from homomorphic encryption and private set intersection protocols to minimize the computational and ...
[18]
Private Intersection-Sum Protocols with Applications to Attributing ...
The problem we consider is privately computing aggregate conversion rate of advertising campaigns. This underlying functionality can be abstracted as PrivateMissing: audience matching
[19]
PAIR Up With First Party Data - IAB Tech Lab
IAB Tech Lab PAIR will describe the data clean room processes and the privacy enhancing technologies (PETs) that enable a matching operation between two parties ...
[20]
https://link.springer.com/chapter/10.1007/978-3-642-17373-8_13
[21]
Linear-Complexity Private Set Intersection Protocols Secure in ...
In this paper, we construct PSI and Authorized PSI (APSI) protocols secure in the malicious model under standard cryptographic assumptions, with both linear ...
[22]
Security Against Covert Adversaries: Efficient Protocols for Realistic ...
We stress that in our definition, we quantify over all (possibly malicious) adversaries and do not assume that the adversary behaves in any particular way.
[23]
Efficient Protocols for Set Intersection and Pattern Matching with ...
We also present a protocol for set intersection that is fully simulatable in the model of covert adversaries. Loosely speaking, this means that a malicious ...
[24]
[PDF] Faster Private Set Intersection Based on OT Extension - USENIX
Aug 20, 2014 · Abstract. Private set intersection (PSI) allows two parties to com- pute the intersection of their sets without revealing any.
[25]
Non-Interactive Private Set Intersection From Functional Encryption
Private Set Intersection is a cryptographic scheme that helps two parties, Alice and Bob, to find the intersection of their input sets A and B while revealing ...
[26]
Authenticated Multi-Party Quantum Private Set Intersection ... - MDPI
In QPSI, multiple parties aim to compute the intersection of their private data sets without revealing any information beyond the intersection itself.
[27]
[PDF] Efficient Circuit-based PSI via Cuckoo Hashing
The proof technique presented in the full version for analyzing Cuckoo hashing in two dimensions is new and can be generalized to analyzing standard. Cuckoo ...
[28]
[PDF] When Private Set Intersection Meets Big Data: An Efficient and ...
It has linear complexity and relies mostly on efficient symmetric key operations. It has high scalability due to the fact that most operations can be ...
[29]
[PDF] Private Set Intersection with Linear Communication from General ...
Oct 2, 2019 · Our protocol is simple, generic and benefits from the permutation- hashing optimizations of Pinkas et al. (USENIX 2015) and the Batched,.
[30]
Phasing: Private Set Intersection Using Permutation-based Hashing
We describe a new approach for designing PSI protocols based on permutation-based hashing, which enables to reduce the length of items mapped to bins.Missing: 2010 | Show results with:2010
[31]
Privacy-Preserving Set Operations - SpringerLink
We design efficient, secure, and composable methods to enable privacy-preserving computation of the union, intersection, and element reduction operations.
[32]
Post-quantum secure multi-party private set-intersection in star ...
This paper addresses the issue by presenting the first post-quantum MPSI protocol in the so-called star network topology, using lattice-based public key ...
[33]
Efficient Oblivious Pseudorandom Function with Applications to ...
Abstract. An Oblivious Pseudorandom Function (OPRF) [15] is a two-party protocol between sender S and receiver R for securely computing a pseudoran-.
[34]
[PDF] Malicious-Secure Private Set Intersection via Dual Execution
Aug 9, 2017 · This simulation based paradigm defines security with respect to two interaction,. • Real interaction: a malicious adversary A attacks the ...
[35]
An Efficient Outsourced Oblivious Transfer Extension Protocol and ...
Dec 5, 2020 · In this paper, we propose a generic outsourced OT extension protocol ( ) that outsources all the asymmetric operations of the sender to a semihonest server.
[36]
[PDF] Privacy-Preserving Set Operations - CMU School of Computer Science
In this paper, we propose efficient techniques for privacy-preserving operations on multisets. By employing the mathematical properties of polynomials, we build ...
[37]
[PDF] O-Ring and K-Star: Efficient Multi-party Private Set Intersection
Aug 16, 2024 · In this paper, we propose two efficient. mPSI protocols that are secure against an arbitrary number of colluding parties. In the protocol O-Ring ...<|control11|><|separator|>
[38]
Malicious-Secure Threshold Multi-Party Private Set Intersection for ...
Within SMPC, Private Set Intersection (PSI) serves as a specialized application, allowing parties to determine the common elements of their respective private ...
[39]
[PDF] Scalable Multi-Party Private Set-Intersection
This protocol consists of a preprocessing phase that uses some- what homomorphic encryption scheme to generate correlated randomness, that is later used in an ...
[40]
A Parallel Multi-Party Privacy-Preserving Record Linkage Method ...
Jun 14, 2024 · We propose a parallel multi-party PPRL method based on consortium blockchain technology which can effectively address the issue of semi-trusted third-party ...
[41]
Record linkage based patient intersection cardinality for rare ...
Oct 8, 2022 · In this work, we extend MainSEL to allow the record linkage based calculation of the number of common patients between institutions.
[42]
[PDF] Efficient Unbalanced Private Set Intersection Cardinality and User ...
Aug 11, 2023 · An unbalanced private set intersection cardinality (PSI-. CA) protocol is a protocol to securely get the intersection cardinality of two sets X ...
[43]
PSI-Stats: Private Set Intersection Protocols Supporting Secure ...
May 28, 2020 · In this paper, we present protocols which enable the secure computations of statistical functions over PSI, which we collectively termed PSI-Stats.Missing: Troncoso- Pastoriza
[44]
Record linkage based patient intersection cardinality for rare ...
Oct 8, 2022 · In this work, we extend MainSEL to allow the record linkage based calculation of the number of common patients between institutions.