
Sequential pattern mining

Sequential pattern mining is a subfield of data mining focused on discovering frequent subsequences in large databases of sequences, such as purchase histories or biological sequences, where the order of elements matters. It identifies patterns that appear in a significant portion of the sequences, typically defined by a minimum support threshold, enabling the extraction of meaningful temporal relationships. The technique was pioneered by Rakesh Agrawal and Ramakrishnan Srikant in their seminal 1995 paper, which introduced algorithms to solve the problem over databases of ordered transactions. Central to sequential pattern mining are concepts like support, which measures the frequency of a subsequence across the dataset, and maximal patterns, which are frequent subsequences not contained in any longer frequent subsequence. Early approaches, such as the AprioriAll algorithm, adapted candidate-generation methods from association rule mining to prune infrequent candidates level by level. Subsequent generalizations, like the Generalized Sequential Patterns (GSP) algorithm, incorporated time constraints and taxonomic hierarchies to handle more complex scenarios. Over time, the field evolved with more efficient methods, including vertical database formats in SPADE for faster support counting via intersection operations and pattern-growth techniques in PrefixSpan, which project databases based on prefixes to avoid exhaustive candidate enumeration. These advancements have addressed scalability issues in massive datasets. Applications span diverse domains, including market-basket analysis for predicting customer behavior, bioinformatics for analyzing biological sequences, web usage mining for user navigation patterns, and recommender systems in e-commerce. Ongoing research tackles challenges like high-dimensional data and real-time processing in data streams and trajectory analysis.

Overview

Definition and Motivation

Sequential pattern mining is the task of discovering statistically relevant subsequences, or patterns, in a database of sequences, where the relative order of elements is preserved and interestingness is typically measured by frequency exceeding a user-specified minimum support threshold. This process identifies frequent ordered patterns that capture temporal dependencies, distinguishing it from unordered pattern mining tasks like frequent itemset mining. The motivation for sequential pattern mining stems from the need to analyze ordered data in real-world scenarios, such as transaction histories, where discovering sequences of events can reveal behavioral trends or causal links that unordered analysis overlooks. For instance, in retail databases, it uncovers patterns like a customer purchasing item a, followed later by item b, and then item c (denoted as the sequence ⟨a, b, c⟩), indicating potential cross-selling opportunities or product affinities over time. This approach was first formalized to support decision-making in large-scale sales analysis, enabling retailers to predict future behaviors from historical sequences. Beyond commerce, sequential pattern mining holds importance in data mining for modeling dynamic datasets, facilitating applications such as prediction of future events, detection of anomalies in event logs, and understanding user behaviors in sequential contexts like web navigation or biological processes. By emphasizing order, it provides deeper insights into temporal relationships, extending the capabilities of traditional mining techniques to handle inherently sequential information.

Historical Development

Sequential pattern mining originated from the broader field of association rule mining, with its early foundations laid in the mid-1990s through extensions of frequent itemset mining techniques. The Apriori algorithm, introduced by Rakesh Agrawal and Ramakrishnan Srikant in 1994 for discovering frequent itemsets in transaction databases, provided the conceptual basis for handling ordered data. This was extended to sequences in their seminal 1995 paper, where they formally defined the problem of mining sequential patterns—frequent subsequences in a database of sequences—and proposed three Apriori-inspired algorithms to address it efficiently. Building on this, Srikant and Agrawal introduced the Generalized Sequential Patterns (GSP) algorithm in 1996, which generalized the approach to incorporate time constraints, item hierarchies, and variable-length gaps between events, while demonstrating significant improvements over the initial methods. The late 1990s and early 2000s saw further refinements, with influential contributions from researchers like Mohammed J. Zaki, who developed SPADE in 2001, a vertical-id-list-based algorithm that replaces repeated database scans with id-list intersections, using lattice-based decomposition for efficient pattern discovery. Concurrently, Jiawei Han and colleagues advanced the field through pattern-growth methods, exemplified by FreeSpan (2000) and culminating in PrefixSpan (2001), which projected databases based on prefixes to grow patterns without exhaustive candidate enumeration, marking a pivotal shift from candidate-generation paradigms like Apriori and GSP to more scalable pattern-growth approaches. This evolution continued into the 2000s with optimizations for constrained and closed patterns, but the advent of big data in the 2010s necessitated adaptations for distributed environments. Post-2010, integration with frameworks like Hadoop and Spark enabled distributed mining of massive datasets; for instance, the BIDE-MR algorithm in 2012 extended closed sequential pattern mining to Hadoop clusters via MapReduce, achieving substantial speedups on large-scale data.
In the 2010s and 2020s, the field further evolved with the development of high-utility sequential pattern mining, which considers pattern profitability beyond frequency, and advancements in streaming and incremental mining for data streams. Recent reviews highlight continued progress in parallel and distributed algorithms, as well as integrations with machine learning for enhanced scalability and applications in domains like bioinformatics and e-commerce. Key figures such as Agrawal, Srikant, Han, and Zaki remain central to the field's development, influencing subsequent work on scalable and application-specific variants.

Core Concepts

Sequence Representation

In sequential pattern mining, a sequence is formally defined as an ordered list of itemsets, where each itemset is a non-empty set of items that occur together at a specific point in time. This representation captures the temporal order of events, with itemsets denoted in parentheses to indicate simultaneity, such as S = \langle (a)(bc)(d) \rangle, where a, b, c, and d are distinct items, and the itemset (bc) signifies that items b and c appear concurrently before d. A sequence database consists of a collection of such sequences, each associated with a unique identifier, such as a customer ID (SID) in customer transaction data. For instance, a database might include entries like SID 10: \langle (30)(90) \rangle and SID 20: \langle (10)(20\ 30)(40) \rangle, where each sequence aggregates transactions ordered by time, and items within an itemset are typically sorted alphabetically for consistency. This structure facilitates efficient storage and querying in horizontal or vertical formats, enabling the mining process to scan for patterns across the ordered data. A key distinction in sequence representation is between subsequences and substrings. A subsequence allows for non-consecutive matches while preserving order, meaning a pattern \alpha = \langle a_1 a_2 \dots a_m \rangle is contained in a sequence \beta = \langle b_1 b_2 \dots b_n \rangle (with m \leq n) if there exist strictly increasing indices 1 \leq i_1 < i_2 < \dots < i_m \leq n such that a_j \subseteq b_{i_j} for each j = 1 to m. For example, \langle (a)(c) \rangle is a subsequence of \langle (a)(b)(c) \rangle since a matches the first itemset and c matches the third, skipping the intervening b. In contrast, a substring requires consecutive itemsets, so \langle (a)(c) \rangle would not be a substring of the same sequence due to the gap. This non-consecutive allowance is fundamental to capturing realistic temporal relationships in applications like purchase histories.
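The containment test defined above can be sketched in a few lines of Python. This is an illustrative implementation (not taken from any specific paper), representing a sequence as a list of Python sets:

```python
def is_subsequence(pattern, sequence):
    """Check whether `pattern` (a list of itemsets) is contained in
    `sequence`: order is preserved, gaps are allowed, and each pattern
    itemset must be a subset of some later itemset of the sequence."""
    i = 0  # current position in the sequence
    for p in pattern:
        p = set(p)
        # advance until an itemset of the sequence contains p
        while i < len(sequence) and not p.issubset(sequence[i]):
            i += 1
        if i == len(sequence):
            return False
        i += 1  # the next pattern itemset must match strictly later
    return True

# <(a)(c)> is a subsequence of <(a)(b)(c)>, skipping the intervening (b)
print(is_subsequence([{"a"}, {"c"}], [{"a"}, {"b"}, {"c"}]))  # True
# but order matters, so it is not contained in <(c)(a)>
print(is_subsequence([{"a"}, {"c"}], [{"c"}, {"a"}]))         # False
```

Matching each pattern itemset at its earliest possible position is sufficient here: if any valid embedding exists, the greedy leftmost one does too.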
Extensions to basic sequence representations incorporate additional constraints to model real-world complexities. Timestamps can be integrated by associating each itemset with a specific time value, enabling constraints on the intervals between consecutive itemsets, such as requiring a minimum or maximum gap (e.g., items separated by 1 to 7 days). This is exemplified in generalizations where patterns specify time windows, like \langle (a) \rightarrow_{d_1}^{d_2} (b) \rangle, with d_1 and d_2 defining allowable gaps. Gaps are handled by permitting variable numbers of intervening itemsets or items, often bounded by constraints like a maximum gap size to avoid overly sparse patterns. Wildcards, represented by symbols like * for any single item or itemset, further extend flexibility, allowing patterns such as \langle (a)(*)(c) \rangle to match any item between a and c, which supports mining more generalizable motifs in noisy or variable data. These extensions maintain the ordered itemset structure while enhancing expressiveness for domains like web logs or biological sequences.
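A maximum-gap constraint can be folded directly into the containment check. The sketch below is illustrative (a greedy leftmost match is no longer complete once gaps are bounded, so it backtracks); `max_gap` is taken to mean the largest number of itemsets allowed between two consecutively matched positions:

```python
def contains_with_gap(pattern, sequence, max_gap):
    """True if `pattern` (a list of itemsets) occurs in `sequence` with
    at most `max_gap` intervening itemsets between consecutive matches."""
    def rec(p_idx, start):
        if p_idx == len(pattern):
            return True
        p = set(pattern[p_idx])
        # the first element may match anywhere; later elements must fall
        # within max_gap positions of the previous match
        limit = len(sequence) if p_idx == 0 else min(len(sequence),
                                                     start + max_gap + 1)
        return any(p.issubset(sequence[i]) and rec(p_idx + 1, i + 1)
                   for i in range(start, limit))
    return rec(0, 0)

pattern = [{"a"}, {"c"}]
seq = [{"a"}, {"b"}, {"c"}]
print(contains_with_gap(pattern, seq, max_gap=1))  # True: one itemset skipped
print(contains_with_gap(pattern, seq, max_gap=0))  # False: (b) intervenes
```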

Pattern Support and Frequency

In sequential pattern mining, the support of a pattern P measures its frequency in a sequence database D, defined as the number of sequences in D that contain P as a subsequence. A subsequence occurs when the items of P appear in the same order within a sequence, allowing for intervening items but preserving the relative ordering. The absolute support counts the raw occurrences, while the relative support normalizes this by the total number of sequences, providing a proportion between 0 and 1. The relative support is formally given by:

\text{Support}(P) = \frac{|\{ s \in D \mid P \text{ is a subsequence of } s \}|}{|D|}

This formulation, introduced in early work on the topic, enables the identification of frequent patterns by comparing against a user-specified threshold. Beyond basic support, additional measures evaluate the strength and reliability of patterns, particularly for sequential rules of the form A \rightarrow B, where A precedes B. Confidence quantifies the reliability of such a rule as the ratio of the support of the full rule to the support of the antecedent:

\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \rightarrow B)}{\text{Support}(A)}

This measure indicates the proportion of sequences supporting A that also support B afterward, aiding in the discovery of predictive relationships. For multi-element patterns, all-confidence, defined for an itemset X as \text{all-confidence}(X) = \frac{\text{Support}(X)}{\max_{i \in X} \text{Support}(\{i\})}, is the minimum confidence among association rules derived from X by placing one item in the consequent, helping to identify cohesive itemsets while being null-invariant. To manage the vast number of potential patterns, a minimum support threshold (minsup) is applied to retain only frequent ones, pruning infrequent candidates early. For instance, setting minsup = 0.5 requires a pattern to appear in at least 50% of the sequences, balancing completeness with computational efficiency.
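These definitions can be computed directly on a toy database; the sketch below is a minimal illustration of relative support and rule confidence, not tied to any particular mining algorithm:

```python
def contains(seq, pat):
    """Order-preserving containment with subset matching per itemset."""
    i = 0
    for p in pat:
        while i < len(seq) and not set(p) <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1
    return True

def support(db, pat):
    """Relative support: fraction of sequences containing pat."""
    return sum(contains(s, pat) for s in db) / len(db)

db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"c"}],
    [{"b"}, {"c"}],
    [{"a"}, {"b"}],
]
print(support(db, [{"a"}]))         # 0.75: three of four sequences contain a
print(support(db, [{"a"}, {"c"}]))  # 0.5
# confidence of the rule <a> -> <c>: support(<a, c>) / support(<a>)
conf = support(db, [{"a"}, {"c"}]) / support(db, [{"a"}])
print(round(conf, 3))               # 0.667
```

With minsup = 0.5, the pattern ⟨(a)(c)⟩ would be retained as frequent, since it appears in exactly half of the sequences.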
Computing support faces challenges, particularly with noisy data where minor perturbations can alter subsequence matches, leading to unreliable counts under standard definitions. Similarly, varying sequence lengths exacerbate the issue, as longer sequences yield exponentially more potential subsequences (up to 2^l for length l), inflating the search space and complicating uniform thresholding across heterogeneous data.

Types of Patterns

String-Based Patterns

String-based patterns in sequential pattern mining involve discovering frequent subsequences within strings composed of individual symbols, such as characters in text or nucleotides in DNA, where the relative order of symbols is preserved but intervening elements (gaps) are permitted. This contrasts with broader sequential mining by focusing exclusively on linear, atomic symbol sequences without grouping into multi-element sets at discrete points. The concept originates from the foundational definition of subsequences in sequential pattern mining, adapted to singleton items for string data. These patterns emphasize exact symbol matching while allowing variable gaps to identify meaningful ordered motifs in text data. Approximate variants incorporate similarity measures to handle variations, such as insertions, deletions, or substitutions in biological or noisy textual data. A key technique for assessing such similarity is the longest common subsequence (LCS), which computes the longest order-preserving subsequence shared between two strings, often used to evaluate pattern relevance or cluster similar sequences in mining tasks. For example, LCS has been applied to align and compare mined patterns against base motifs, maximizing common elements to filter robust subsequences. Short string-based patterns frequently leverage n-gram extensions, which capture contiguous subsequences of fixed length n (e.g., bigrams for n = 2) to efficiently estimate frequencies and build hierarchical representations in large corpora. This approach is particularly effective for initial pattern discovery in text, where n-grams serve as compact proxies for local sequential structure before extending to gapped subsequences. In bioinformatics applications, such techniques mine DNA sequences for motifs like "ATG-CGT", where "ATG" denotes a start codon and "CGT" a coding triplet, revealing gene regulatory patterns across genomes.
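The LCS length mentioned above has a standard dynamic-programming formulation; the following sketch illustrates it on two short DNA-like strings (the strings themselves are made up for illustration):

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t,
    computed with the classic O(len(s) * len(t)) DP table."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one symbol
    return dp[m][n]

print(lcs_length("ATGCGT", "ATCGGT"))  # 5 (e.g., "ATCGT" or "ATGGT")
```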

Itemset-Based Patterns

Itemset-based patterns in sequential pattern mining represent sequences where each element is a set of items, rather than individual items or characters. Formally, a pattern is defined as an ordered list of itemsets, denoted as \langle \{i_1, i_2, \dots, i_k\}, \{j_1, j_2, \dots, j_m\}, \dots \rangle, where each itemset \{i_1, i_2, \dots, i_k\} groups multiple items that co-occur within the same time point or transaction, and the order across itemsets reflects temporal progression across distinct time points. This formulation captures subsequences in a database of sequences, where the pattern appears if the items in each itemset are contained (in any order within the set) in corresponding transactions, and the itemsets follow one another in order. Key characteristics of itemset-based patterns include the allowance for multiple items per time step, enabling the modeling of concurrent events, while maintaining strict ordering between consecutive itemsets to preserve temporal relationships. Within an individual itemset, the order of items is typically non-strict, meaning the items are considered an unordered collection unless specified otherwise, which distinguishes these patterns from purely linear sequences. This structure supports the discovery of frequent sequential patterns, where support is measured by the proportion of input sequences containing the pattern as a subsequence, allowing for gaps between matching itemsets to reflect real-world delays in event occurrence. A representative example is in customer purchase histories, where a pattern like \langle \{\text{bread}, \text{milk}\}, \{\text{butter}\} \rangle indicates that customers who buy bread and milk together in one transaction often purchase butter in a subsequent transaction, potentially revealing cross-selling opportunities in retail data. Such patterns emerge from timestamped transactional records, where each customer's sequence consists of itemsets from their ordered purchases over time.
Extensions to itemset-based patterns incorporate taxonomic hierarchies, allowing patterns to generalize across item categories; for instance, a pattern involving "dairy products" could match specific items like "milk" or "cheese" based on an is-a relationship defined in a taxonomy tree, thereby reducing redundancy and capturing broader associations. Similarly, quantitative attributes can be integrated into itemsets, such as specifying ranges or exact values for item quantities (e.g., \langle \{\text{milk}:2\}, \{\text{diapers}:1\} \rangle), which enriches patterns with quantitative information for more nuanced analysis in domains like inventory management. These extensions maintain the core sequential structure while enhancing expressiveness. Itemset-based patterns are particularly relevant for relational or transactional datasets where temporal order arises from timestamps, such as customer transaction logs or event streams, enabling the identification of behavioral trends without assuming strict intra-transaction ordering.
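Returning to the purchase example, the support of an itemset-based pattern can be checked directly on a small transaction database; the customer histories below are hypothetical and purely illustrative:

```python
def contains(seq, pat):
    """Order-preserving containment with subset matching per itemset."""
    i = 0
    for p in pat:
        while i < len(seq) and not set(p) <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1
    return True

# each customer's history is an ordered list of transactions (itemsets)
db = [
    [{"bread", "milk"}, {"butter"}],
    [{"bread", "milk", "eggs"}, {"jam"}, {"butter"}],
    [{"milk"}, {"butter"}],
]
pattern = [{"bread", "milk"}, {"butter"}]
sup = sum(contains(s, pattern) for s in db) / len(db)
print(sup)  # 2/3: the third customer never bought bread
```

Note that the second customer still matches despite the extra item ("eggs") and the intervening transaction ({"jam"}): itemset matching is by subset, and gaps between itemsets are allowed.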

Algorithms

Apriori-Inspired Methods

Apriori-inspired methods for sequential pattern mining adapt the Apriori principle from frequent itemset mining, which posits that if a sequential pattern does not meet the minimum support threshold (i.e., it is infrequent), then all its extensions or supersets are also infrequent. This monotonicity property enables efficient pruning of candidate patterns during the mining process, reducing the search space by eliminating supersets of infrequent patterns early. By generating candidates level-by-level—starting from short sequences and extending to longer ones—these methods systematically explore the pattern space while minimizing unnecessary computations. The Generalized Sequential Patterns (GSP) algorithm, introduced by Srikant and Agrawal in 1996, exemplifies this approach as one of the earliest and most influential methods for mining sequential patterns in large databases. GSP employs a horizontal database representation, where sequences are stored as lists of itemsets, and proceeds in iterative passes over the data. In the first step, it scans the database to identify all frequent 1-sequences (single-item patterns) that exceed the user-specified minimum support. Subsequent iterations generate candidate (k+1)-length sequences by joining pairs of frequent k-length sequences that overlap in all but their first and last items, followed by pruning any candidates containing an infrequent k-subsequence based on the Apriori property. Support for these candidates is then counted in an additional database scan by matching patterns against the data, continuing until no more frequent patterns are found. This level-wise strategy ensures completeness but relies on multiple I/O operations, making it scalable only for moderate-sized datasets with adjustable support thresholds. Building on the Apriori framework, the Sequential PAttern Discovery using Equivalence classes (SPADE) algorithm, developed by Mohammed J. Zaki in 2001, addresses some inefficiencies of GSP by adopting a vertical database layout.
In this vertical representation, each distinct item is associated with an id-list recording the sequence identifiers and timestamps at which it appears, transforming support counting into efficient intersections (temporal joins) of these id-lists. SPADE decomposes the search space into a lattice of equivalence classes—groups of candidate sequences sharing a common prefix—and traverses it depth-first or breadth-first, computing support through joins on id-lists rather than full scans for each candidate level. This allows SPADE to discover all frequent sequential patterns using only three database passes: one to find frequent single items, one to find frequent 2-sequences, and one to build the vertical id-lists from which all longer patterns are derived, with pruning applied via the Apriori property at each step. SPADE demonstrates superior performance over GSP on dense datasets, where id-list operations accelerate support counting, though it requires more memory for storing id-lists. Despite their foundational role, Apriori-inspired methods like GSP and SPADE face inherent limitations that can hinder scalability in very large or sparse datasets. The need for multiple database scans in GSP leads to significant I/O overhead, particularly as pattern lengths increase, while both algorithms risk generating an explosion of candidates when many short patterns are frequent, overwhelming computational resources even with pruning. These challenges have motivated subsequent advancements, though the core Apriori property remains a cornerstone for ensuring the correctness and efficiency of candidate reduction in sequential mining.
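The level-wise join-and-prune cycle can be illustrated for the simplified case of single-item elements. The sketch below shows the GSP-style candidate-generation idea only (no itemsets, taxonomies, or time constraints), with support counted as an absolute number of sequences:

```python
def contains(seq, pat):
    """Subsequence test for single-item sequences (order preserved, gaps allowed)."""
    it = iter(seq)
    return all(p in it for p in pat)

def frequent_sequences(db, minsup):
    """Level-wise mining: join frequent k-sequences into (k+1)-candidates,
    prune via the Apriori property, then count support in a database pass.
    minsup is an absolute sequence count."""
    items = sorted({x for s in db for x in s})
    freq = {1: [(i,) for i in items
                if sum(contains(s, (i,)) for s in db) >= minsup]}
    result, k = list(freq[1]), 1
    while freq[k]:
        # join step: a + b's last item whenever a minus its head equals b minus its tail
        cands = {a + (b[-1],) for a in freq[k] for b in freq[k]
                 if a[1:] == b[:-1]}
        # prune step: every k-subsequence of a candidate must itself be frequent
        known = set(freq[k])
        cands = [c for c in cands
                 if all(c[:i] + c[i + 1:] in known for i in range(len(c)))]
        # count step: one pass over the database per level
        freq[k + 1] = [c for c in cands
                       if sum(contains(s, c) for s in db) >= minsup]
        result += freq[k + 1]
        k += 1
    return result

db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
print(sorted(frequent_sequences(db, 2)))
```

On this toy database the frequent patterns at minsup = 2 are the three single items plus ⟨a,b⟩, ⟨a,c⟩, and ⟨b,c⟩; the candidate ⟨a,b,c⟩ is generated and survives pruning but fails the support count, appearing in only one sequence.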

Projection and Pattern-Growth Methods

Projection and pattern-growth methods represent a class of algorithms for sequential pattern mining that avoid the candidate generation and testing overhead of Apriori-inspired approaches by recursively projecting the sequence database onto suffixes following frequent prefixes, thereby confining the search to relevant subspaces. This core idea enables efficient pattern discovery through divide-and-conquer strategies, where patterns grow incrementally from prefixes without exhaustive enumeration of all possible candidates. By focusing on postfix projections, these methods substantially reduce the computational cost, particularly for sparse or long-sequence datasets, as the projected databases shrink progressively with pattern extension. A seminal algorithm in this category is PrefixSpan, introduced in 2001, which performs prefix-projected sequential pattern mining by constructing projected databases for each frequent prefix and mining them recursively. PrefixSpan begins by scanning the original database once to identify frequent length-1 prefixes, then partitions the search space based on these prefixes. For each prefix, it projects the database to include only the suffixes that follow the prefix occurrences, and recurses to grow longer patterns. Pattern growth occurs by appending frequent items either to the last itemset in the prefix or as a new itemset, ensuring ordered extension while maintaining support thresholds. To enhance efficiency, PrefixSpan employs pseudo-projection, which uses in-memory pointers and offsets to represent projected databases without physically copying data, making it suitable when sequences fit in main memory. Additionally, it incorporates bi-level projection with a triangular S-matrix to optimize the discovery of length-2 patterns by counting item co-occurrences within sequences.
Experimental evaluations demonstrate that PrefixSpan requires fewer database projections than earlier methods like FreeSpan and scales well to large databases with up to millions of sequences. Building on similar principles, CloSpan, proposed in 2003, extends pattern-growth to mine closed sequential patterns—those frequent sequences with no proper supersequence having the same support—thereby eliminating redundancy in the output. CloSpan builds on the prefix-projection framework and employs pruning techniques such as backward sub-pattern and backward super-pattern checking to verify closure properties efficiently, discarding branches whose extensions cannot yield closed patterns. It detects equivalent projected databases during growth and uses early-termination techniques, such as checking for non-extendibility, to avoid redundant computations. This focus on closed patterns results in significantly smaller output sizes, often 10 to 60 times fewer than full frequent pattern sets from PrefixSpan, and faster execution times on benchmarks like synthetic datasets with varying sequence lengths. CloSpan's approach is particularly advantageous for applications requiring concise representations, as it mines only the essential patterns while preserving all support information through closure. These projection-based methods offer key advantages, including a single initial database scan followed by targeted projections, which minimizes I/O operations compared to multi-pass candidate enumeration. They scale effectively to long sequences by confining exploration to promising subspaces, often outperforming level-wise methods by orders of magnitude on dense datasets. Moreover, they naturally accommodate user-specified constraints, such as maximum gaps between elements, through modified projection rules that filter irrelevant suffixes early. A notable variant is SPAM (Sequential PAttern Mining), introduced in 2002, which mines frequent sequences using a vertical bitmap representation without explicit database projection.
SPAM transforms the sequence database into vertical bitmaps for each item, where bits indicate presence in specific transactions, enabling rapid support counting via bitwise AND operations together with bitmap transformations for sequence-extension steps. It traverses a lexicographic sequence tree depth-first, incrementally building patterns and pruning infrequent branches on-the-fly. Unlike pure projection methods, SPAM maintains the full database in memory as bitmaps, achieving up to an order-of-magnitude speedup over PrefixSpan on large, long-pattern datasets due to its efficient counting and compression of sparse bitmaps. This bitmap-centric design makes SPAM particularly effective for customer transaction logs with high dimensionality.
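The prefix-projection idea can be sketched for single-item sequences. The illustrative version below uses (sequence, offset) pairs as a stand-in for PrefixSpan's pseudo-projection; it omits itemset extension and constraints, and is not the full algorithm:

```python
from collections import Counter

def prefixspan(db, minsup):
    """Pattern growth on single-item sequences: recursively project the
    database on each locally frequent item and grow the prefix.
    minsup is an absolute sequence count."""
    results = []

    def mine(prefix, proj):
        # count items in the projected suffixes, once per sequence
        counts = Counter()
        for seq, start in proj:
            counts.update(set(seq[start:]))
        for item in sorted(counts):
            sup = counts[item]
            if sup < minsup:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, sup))
            # project on item: keep the suffix after its first occurrence
            new_proj = []
            for seq, start in proj:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        new_proj.append((seq, i + 1))
                        break
            mine(new_prefix, new_proj)

    mine([], [(s, 0) for s in db])
    return results

db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
for pat, sup in prefixspan(db, 2):
    print(pat, sup)
```

Because each recursive call only scans the ever-shorter projected suffixes, no global candidate set is ever materialized; on the toy database above this finds the same six frequent patterns as the level-wise approach.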

Applications

Business and Marketing

Sequential pattern mining extends traditional market basket analysis, which focuses on simultaneous item associations, by incorporating the temporal order of purchases to uncover sequences that predict future buying behavior and enable targeted strategies. For instance, in retail settings, it identifies sequences of product purchases across multiple transactions, allowing stores to recommend complementary items at the point of sale or through personalized promotions. This approach has been applied across retail sectors to detect sequential patterns that inform cross-selling opportunities, improving sales efficiency by anticipating customer needs based on historical sequences. In web usage mining, sequential pattern mining analyzes clickstream data from user sessions to reveal navigation patterns, which helps optimize website structure and enhance user experience. By extracting frequent sequences from server logs, such as paths from homepage to product pages and checkout, businesses can reorganize site layouts to reduce drop-off rates and guide users toward conversions. Research demonstrates that these patterns, derived from clickstream logs, provide actionable insights for improving site navigation and increasing engagement in e-commerce environments. Within customer relationship management (CRM), sequential pattern mining supports churn prediction by modeling sequences of customer interactions, such as login followed by purchase and then cancellation, to identify at-risk behaviors early. These patterns enable proactive retention strategies, like targeted interventions before churn occurs, particularly in telecommunications and banking where interaction histories are rich. Studies show that mining sequential sentiment patterns from CRM data can significantly improve churn prediction accuracy, allowing firms to retain valuable customers through timely offers or service adjustments. Retail chains have leveraged sequential pattern mining for inventory planning by analyzing seasonal buying sequences, such as holiday-related purchase progressions, to align stock levels with anticipated demand patterns.
For example, patterns in past transaction sequences help predict surges in specific items during peak seasons, reducing overstock and stockouts. Case studies in retail operations have integrated sequential pattern mining with time-series forecasting to enhance inventory decisions, demonstrating improved efficiency in managing seasonal variations across product categories. To prioritize actionable insights, businesses integrate metrics like support, which measures the frequency of a pattern in the database, and confidence, which indicates the reliability of one item following another in a sequence. These metrics allow ranking of patterns by their prevalence and predictive strength, ensuring that only high-impact sequences inform campaigns or operational changes. In practice, thresholds for support and confidence are tuned to filter noise, focusing on sequences with substantial support in marketing and operations applications.

Bioinformatics and Genomics

Sequential pattern mining plays a crucial role in bioinformatics and genomics by uncovering temporal and structural relationships in biological sequences, such as DNA, RNA, proteins, and gene expression data over time. This approach helps identify regulatory mechanisms, functional motifs, and disease-associated patterns that traditional frequency-based analyses might overlook, particularly when incorporating utility measures like gene importance or expression levels. Applications span from predicting protein folds to extracting gene interactions from literature, enabling deeper insights into genomic functions and disease pathways. In gene regulation studies, sequential pattern mining analyzes time-series data to discover patterns of gene expression changes, aiding in the understanding of disease progression. For example, the Top-HUGS algorithm transforms expression datasets into sequences based on fold-change thresholds and mines high-utility patterns by weighting genes according to their disease associations and internal utilities, revealing more biologically relevant regulations than conventional methods. Evaluated on the GSE6377 time-series expression dataset, Top-HUGS identified top-k patterns with higher average utility and popularity scores, while being over five times faster and using less memory than baselines like CTGR-Span. Similarly, extensions to high average-utility sequential rules have been used to associate mRNA expression patterns with miRNA regulations in cancer datasets, optimizing multiple objectives for biological relevance. For protein fold recognition, sequential pattern mining extracts frequent subsequences from amino acid chains to classify proteins into folds, supporting function annotation in large proteomic databases. The cSPADE algorithm, adapted with constraints for flexible gaps, mines patterns from sequences in the Protein Data Bank (PDB) and SCOP hierarchies, achieving 24.9% top-1 accuracy and 56.5% top-5 accuracy across 36 folds in a dataset of 2,410 proteins.
This method highlights discriminative motifs, such as those with maximum gaps of four residues, outperforming some alignment-based classifiers by focusing on sequential motifs rather than global similarities. In genomics, the technique identifies significant patterns in DNA sequences to reveal hidden biological functions, such as regulatory elements or evolutionary signals. An index-based spanning tree approach mines contiguous patterns using information gain and confidence thresholds on large datasets, like 19,979 real DNA sequences averaging 1,024 bases, efficiently detecting surprising motifs (e.g., IG(ATCG) ≈ 19.88) with reduced runtime compared to exhaustive scans. Additionally, in text mining for genomics, recursive sequential pattern mining with linguistic constraints extracts gene interaction patterns from PubMed abstracts, yielding 463 high-precision patterns (f-score 78.7%) that capture modalities and contexts, surpassing state-of-the-art on corpora like AIMed. These applications underscore the method's versatility in handling massive genomic data while prioritizing impactful, verifiable biological insights.

    Agrawal and Srikant introduced the sequential pattern mining problem in [2]. Many methods, which are based the. Apriori property [1], have been proposed for ...
  19. [19]
    Sequential PAttern mining using a bitmap representation
    We introduce a new algorithm for mining sequential patterns. Our algorithm is especially efficient when the sequential patterns in the database are very long.
  20. [20]
    [PDF] Detecting sequential patterns for cross-selling fast moving consumer ...
    In the marketing domain, sequential patterns have been usefi.dly deployed for predicting various aspects of customer purchase behavior.
  21. [21]
    Efficient web usage mining process for sequential patterns
    This paper presents a new web usage mining process for finding sequential patterns in web usage data which can be used for predicting the possible next move ...
  22. [22]
    Mining Significant Usage Patterns from Clickstream Data
    In this paper, a technique to generate Significant Usage Patterns (SUP) is proposed and used to acquire significant “user preferred navigational trails”. The ...
  23. [23]
    Sequential Sentiment Pattern Mining to Predict Churn in CRM Systems
    Sequential pattern detection is one of the most popular approaches to the churn prediction problem. Although promising, initiatives based on this approach do ...
  24. [24]
    Goal-oriented sequential pattern for network banking churn analysis
    In this research, we specify how to judge whether a customer is leaving and the retaining strategies. The Sequential Pattern cannot conduct a research function ...
  25. [25]
    Forecasting Supplies Inventory Using Sequential Pattern Analysis
    The goal is to identify what particular items need to have more stocks and determine when they will need it and to predict the request pattern of items of every ...Missing: case seasonal
  26. [26]
    [PDF] Integrating Data Mining and Predictive Modeling Techniques ... - arXiv
    Sequential pattern mining: Using the PrefixSpan algorithm, it ... Market basket analysis with association rules. Communications in Statistics ...