
Sequential pattern mining

Sequential pattern mining is a subfield of data mining focused on discovering frequent subsequences in large databases of sequences, such as purchase histories or biological sequences, where the order of elements matters. It identifies patterns that appear in a significant portion of the sequences, typically defined by a minimum support threshold, enabling the extraction of meaningful temporal relationships. The technique was pioneered by Rakesh Agrawal and Ramakrishnan Srikant in their seminal 1995 paper, which introduced algorithms to solve the problem over databases of ordered transactions. Central to sequential pattern mining are concepts like support, which measures the frequency of a subsequence across the dataset, and maximal patterns, which are frequent subsequences not contained in any longer frequent subsequence. Early approaches, such as the AprioriAll algorithm, adapted candidate-generation methods from association rule mining to prune infrequent candidates level by level. Subsequent generalizations, like the Generalized Sequential Patterns (GSP) algorithm, incorporated time constraints and taxonomic hierarchies to handle more complex scenarios. Over time, the field evolved with more efficient methods, including vertical database formats in SPADE for faster support counting via intersection operations and pattern-growth techniques in PrefixSpan, which project databases based on prefixes to avoid exhaustive candidate enumeration. These advancements have addressed scalability issues in massive datasets. Applications span diverse domains, including market-basket analysis for predicting customer behavior, bioinformatics for analyzing biological sequences, web usage mining for user navigation patterns, and recommender systems in e-commerce. Ongoing research tackles challenges like high-dimensional data and real-time processing in data streams and trajectory analysis.

Overview

Definition and Motivation

Sequential pattern mining is the task of discovering statistically relevant subsequences, or patterns, in a database of sequences, where the relative order of elements is preserved and interestingness is typically measured by frequency exceeding a user-specified minimum support threshold. This process identifies frequent ordered patterns that capture temporal dependencies, distinguishing it from unordered pattern mining tasks like frequent itemset mining. The motivation for sequential pattern mining stems from the need to analyze ordered data in real-world scenarios, such as transaction histories, where discovering sequences of events can reveal behavioral trends or causal links that unordered analysis overlooks. For instance, in retail databases, it uncovers patterns like a customer purchasing item a, followed later by item b, and then item c (denoted as the sequence ⟨a, b, c⟩), indicating potential cross-selling opportunities or product affinities over time. This approach was first formalized to support decision-making in large-scale sales analysis, enabling retailers to predict future behaviors from historical sequences. Beyond commerce, sequential pattern mining holds importance in data mining for modeling dynamic datasets, facilitating applications such as prediction of future events, detection of anomalies in event logs, and understanding user behaviors in sequential contexts like web navigation or biological processes. By emphasizing order, it provides deeper insights into temporal relationships, extending the capabilities of traditional mining techniques to handle inherently sequential information.

Historical Development

Sequential pattern mining originated from the broader field of association rule mining, with its early foundations laid in the mid-1990s through extensions of frequent itemset mining techniques. The Apriori algorithm, introduced by Rakesh Agrawal and Ramakrishnan Srikant in 1994 for discovering frequent itemsets in transaction databases, provided the conceptual basis for handling ordered data. This was extended to sequences in their seminal 1995 paper, where they formally defined the problem of mining sequential patterns—frequent subsequences in a database of sequences—and proposed three Apriori-inspired algorithms to address it efficiently. Building on this, Srikant and Agrawal introduced the Generalized Sequential Patterns (GSP) algorithm in 1996, which generalized the approach to incorporate time constraints, item hierarchies, and variable-length gaps between events, while demonstrating significant improvements over the initial methods. The late 1990s and early 2000s saw further refinements, with influential contributions from researchers like Mohammed J. Zaki, who developed SPADE in 2001, a vertical-id-list-based algorithm that replaces repeated database scans with id-list intersections, using lattice-based decomposition for efficient pattern discovery. Concurrently, Jiawei Han and colleagues advanced the field through pattern-growth methods, exemplified by FreeSpan (2000) and culminating in PrefixSpan (2001), which projected databases based on prefixes to grow patterns without exhaustive candidate enumeration, marking a pivotal shift from candidate-generation paradigms like Apriori and GSP to more scalable pattern-growth approaches. This evolution continued into the 2000s with optimizations for constrained and closed patterns, but the advent of big data in the 2010s necessitated adaptations for distributed environments. Post-2010, integration with frameworks like Hadoop and Spark enabled distributed mining of massive datasets; for instance, the BIDE-MR algorithm in 2012 extended closed sequential pattern mining to Hadoop clusters via MapReduce, achieving substantial speedups on large-scale data.
In the 2010s and 2020s, the field further evolved with the development of high-utility sequential pattern mining, which considers pattern profitability beyond frequency, and advancements in streaming and incremental mining for data streams. Recent reviews highlight continued progress in parallel and distributed algorithms, as well as integrations with machine learning for enhanced scalability and applications in domains like bioinformatics and e-commerce. Key figures such as Agrawal, Srikant, Han, and Zaki remain central to the field's development, influencing subsequent work on scalable and application-specific variants.

Core Concepts

Sequence Representation

In sequential pattern mining, a sequence is formally defined as an ordered list of itemsets, where each itemset is a non-empty set of items that occur together at a specific point in time. This representation captures the temporal order of events, with itemsets denoted in parentheses to indicate simultaneity, such as S = \langle (a)(bc)(d) \rangle, where a, b, c, and d are distinct items, and the itemset (bc) signifies that items b and c appear concurrently before d. A sequence database consists of a collection of such sequences, each associated with a unique identifier, such as a customer ID (SID) in customer transaction data. For instance, a database might include entries like SID 10: \langle (30)(90) \rangle and SID 20: \langle (10)(20\ 30)(40) \rangle, where each sequence aggregates transactions ordered by time, and items within an itemset are typically sorted alphabetically for consistency. This structure facilitates efficient storage and querying in horizontal or vertical formats, enabling the mining process to scan for patterns across the ordered data. A key distinction in sequence representation is between subsequences and substrings. A subsequence allows for non-consecutive matches while preserving order, meaning a pattern \alpha = \langle a_1 a_2 \dots a_m \rangle is contained in a sequence \beta = \langle b_1 b_2 \dots b_n \rangle (with m \leq n) if there exist strictly increasing indices 1 \leq i_1 < i_2 < \dots < i_m \leq n such that a_j \subseteq b_{i_j} for each j = 1 to m. For example, \langle (a)(c) \rangle is a subsequence of \langle (a)(b)(c) \rangle since a matches the first itemset and c matches the third, skipping the intervening b. In contrast, a substring requires consecutive itemsets, so \langle (a)(c) \rangle would not be a substring of the same sequence due to the gap. This non-consecutive allowance is fundamental to capturing realistic temporal relationships in applications like purchase histories.
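The containment test defined above can be sketched in a few lines of Python. This is an illustrative implementation (not taken from any specific paper), representing a sequence as a list of Python sets:

```python
def is_subsequence(pattern, sequence):
    """Check whether `pattern` (a list of itemsets) is contained in
    `sequence`: order is preserved, gaps are allowed, and each pattern
    itemset must be a subset of some later itemset of the sequence."""
    i = 0  # current position in the sequence
    for p in pattern:
        p = set(p)
        # advance until an itemset of the sequence contains p
        while i < len(sequence) and not p.issubset(sequence[i]):
            i += 1
        if i == len(sequence):
            return False
        i += 1  # the next pattern itemset must match strictly later
    return True

# <(a)(c)> is a subsequence of <(a)(b)(c)>, skipping the intervening (b)
print(is_subsequence([{"a"}, {"c"}], [{"a"}, {"b"}, {"c"}]))  # True
# but order matters, so it is not contained in <(c)(a)>
print(is_subsequence([{"a"}, {"c"}], [{"c"}, {"a"}]))         # False
```

Matching each pattern itemset at its earliest possible position is sufficient here: if any valid embedding exists, the greedy leftmost one does too.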
Extensions to basic sequence representations incorporate additional constraints to model real-world complexities. Timestamps can be integrated by associating each itemset with a specific time value, enabling constraints on the intervals between consecutive itemsets, such as requiring a minimum or maximum gap (e.g., items separated by 1 to 7 days). This is exemplified in generalizations where patterns specify time windows, like \langle (a) \rightarrow_{d_1}^{d_2} (b) \rangle, with d_1 and d_2 defining allowable gaps. Gaps are handled by permitting variable numbers of intervening itemsets or items, often bounded by constraints like a maximum gap size to avoid overly sparse patterns. Wildcards, represented by symbols like * for any single item or itemset, further extend flexibility, allowing patterns such as \langle (a)(*)(c) \rangle to match any item between a and c, which supports mining more generalizable motifs in noisy or variable data. These extensions maintain the ordered itemset structure while enhancing expressiveness for domains like web logs or biological sequences.
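A maximum-gap constraint can be folded directly into the containment check. The sketch below is illustrative (a greedy leftmost match is no longer complete once gaps are bounded, so it backtracks); `max_gap` is taken to mean the largest number of itemsets allowed between two consecutively matched positions:

```python
def contains_with_gap(pattern, sequence, max_gap):
    """True if `pattern` (a list of itemsets) occurs in `sequence` with
    at most `max_gap` intervening itemsets between consecutive matches."""
    def rec(p_idx, start):
        if p_idx == len(pattern):
            return True
        p = set(pattern[p_idx])
        # the first element may match anywhere; later elements must fall
        # within max_gap positions of the previous match
        limit = len(sequence) if p_idx == 0 else min(len(sequence),
                                                     start + max_gap + 1)
        return any(p.issubset(sequence[i]) and rec(p_idx + 1, i + 1)
                   for i in range(start, limit))
    return rec(0, 0)

pattern = [{"a"}, {"c"}]
seq = [{"a"}, {"b"}, {"c"}]
print(contains_with_gap(pattern, seq, max_gap=1))  # True: one itemset skipped
print(contains_with_gap(pattern, seq, max_gap=0))  # False: (b) intervenes
```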

Pattern Support and Frequency

In sequential pattern mining, the support of a pattern P measures its frequency in a sequence database D, defined as the number of sequences in D that contain P as a subsequence. A subsequence occurs when the items of P appear in the same order within a sequence, allowing for intervening items but preserving the relative ordering. The absolute support counts the raw occurrences, while the relative support normalizes this by the total number of sequences, providing a proportion between 0 and 1. The relative support is formally given by:

\text{Support}(P) = \frac{|\{ s \in D \mid P \text{ is a subsequence of } s \}|}{|D|}

This formulation, introduced in early work on the topic, enables the identification of frequent patterns by comparing against a user-specified threshold. Beyond basic support, additional measures evaluate the strength and reliability of patterns, particularly for sequential rules of the form A \rightarrow B, where A precedes B. Confidence quantifies the reliability of such a rule as the ratio of the support of the full rule to the support of the antecedent:

\text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \rightarrow B)}{\text{Support}(A)}

This measure indicates the proportion of sequences supporting A that also support B afterward, aiding in the discovery of predictive relationships. For multi-element patterns, all-confidence, defined for an itemset X as \text{all-confidence}(X) = \frac{\text{Support}(X)}{\max_{i \in X} \text{Support}(\{i\})}, is the minimum confidence among association rules derived from X by placing one item in the consequent, helping to identify cohesive itemsets while being null-invariant. To manage the vast number of potential patterns, a minimum support threshold (minsup) is applied to retain only frequent ones, pruning infrequent candidates early. For instance, setting minsup = 0.5 requires a pattern to appear in at least 50% of the sequences, balancing completeness with computational efficiency.
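These definitions can be computed directly on a toy database; the sketch below is a minimal illustration of relative support and rule confidence, not tied to any particular mining algorithm:

```python
def contains(seq, pat):
    """Order-preserving containment with subset matching per itemset."""
    i = 0
    for p in pat:
        while i < len(seq) and not set(p) <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1
    return True

def support(db, pat):
    """Relative support: fraction of sequences containing pat."""
    return sum(contains(s, pat) for s in db) / len(db)

db = [
    [{"a"}, {"b"}, {"c"}],
    [{"a"}, {"c"}],
    [{"b"}, {"c"}],
    [{"a"}, {"b"}],
]
print(support(db, [{"a"}]))         # 0.75: three of four sequences contain a
print(support(db, [{"a"}, {"c"}]))  # 0.5
# confidence of the rule <a> -> <c>: support(<a, c>) / support(<a>)
conf = support(db, [{"a"}, {"c"}]) / support(db, [{"a"}])
print(round(conf, 3))               # 0.667
```

With minsup = 0.5, the pattern ⟨(a)(c)⟩ would be retained as frequent, since it appears in exactly half of the sequences.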
Computing support faces challenges, particularly with noisy data where minor perturbations can alter subsequence matches, leading to unreliable counts under standard definitions. Similarly, varying sequence lengths exacerbate the issue, as longer sequences yield exponentially more potential subsequences (up to 2^l for length l), inflating the search space and complicating uniform thresholding across heterogeneous data.

Types of Patterns

String-Based Patterns

String-based patterns in sequential pattern mining involve discovering frequent subsequences within strings composed of individual symbols, such as characters in text or nucleotides in DNA, where the relative order of symbols is preserved but intervening elements (gaps) are permitted. This contrasts with broader sequential mining by focusing exclusively on linear, atomic symbol sequences without grouping into multi-element sets at discrete points. The concept originates from the foundational definition of subsequences in sequential pattern mining, adapted to singleton items for string data. These patterns emphasize exact symbol matching while allowing variable gaps to identify meaningful ordered motifs in text data. Approximate variants incorporate similarity measures to handle variations, such as insertions, deletions, or substitutions in biological or noisy textual data. A key technique for assessing such similarity is the longest common subsequence (LCS), which computes the longest order-preserving subsequence shared between two strings, often used to evaluate pattern relevance or cluster similar sequences in mining tasks. For example, LCS has been applied to align and compare mined patterns against base motifs, maximizing common elements to filter robust subsequences. Short string-based patterns frequently leverage n-gram extensions, which capture contiguous subsequences of fixed length n (e.g., bigrams for n = 2) to efficiently estimate frequencies and build hierarchical representations in large corpora. This approach is particularly effective for initial pattern discovery in text, where n-grams serve as compact proxies for local sequential structure before extending to gapped subsequences. In bioinformatics applications, such techniques mine DNA sequences for motifs like "ATG-CGT", where "ATG" denotes a start codon and "CGT" a coding triplet, revealing gene regulatory patterns across genomes.
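The LCS length mentioned above has a standard dynamic-programming formulation; the following sketch illustrates it on two short DNA-like strings (the strings themselves are made up for illustration):

```python
def lcs_length(s, t):
    """Length of the longest common subsequence of s and t,
    computed with the classic O(len(s) * len(t)) DP table."""
    m, n = len(s), len(t)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s[i - 1] == t[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend a common subsequence
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop one symbol
    return dp[m][n]

print(lcs_length("ATGCGT", "ATCGGT"))  # 5 (e.g., "ATCGT" or "ATGGT")
```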

Itemset-Based Patterns

Itemset-based patterns in sequential pattern mining represent sequences where each element is a set of items, rather than individual items or characters. Formally, a pattern is defined as an ordered list of itemsets, denoted as \langle \{i_1, i_2, \dots, i_k\}, \{j_1, j_2, \dots, j_m\}, \dots \rangle, where each itemset \{i_1, i_2, \dots, i_k\} groups multiple items that co-occur within the same time point or transaction, and the order across itemsets reflects temporal progression across distinct time points. This formulation captures subsequences in a database of sequences, where the pattern appears if the items in each itemset are contained (in any order within the set) in corresponding transactions, and the itemsets follow one another in order. Key characteristics of itemset-based patterns include the allowance for multiple items per time step, enabling the modeling of concurrent events, while maintaining strict ordering between consecutive itemsets to preserve temporal relationships. Within an individual itemset, the order of items is typically non-strict, meaning the items are considered an unordered collection unless specified otherwise, which distinguishes these patterns from purely linear sequences. This structure supports the discovery of frequent sequential patterns, where support is measured by the proportion of input sequences containing the pattern as a subsequence, allowing for gaps between matching itemsets to reflect real-world delays in event occurrence. A representative example is in customer purchase histories, where a pattern like \langle \{\text{bread}, \text{milk}\}, \{\text{butter}\} \rangle indicates that customers who buy bread and milk together in one transaction often purchase butter in a subsequent transaction, potentially revealing cross-selling opportunities in retail data. Such patterns emerge from timestamped transactional records, where each customer's sequence consists of itemsets from their ordered purchases over time.
Extensions to itemset-based patterns incorporate taxonomic hierarchies, allowing patterns to generalize across item categories; for instance, a pattern involving "dairy products" could match specific items like "milk" or "cheese" based on an is-a relationship defined in a taxonomy tree, thereby reducing redundancy and capturing broader associations. Similarly, quantitative attributes can be integrated into itemsets, such as specifying ranges or exact values for item quantities (e.g., \langle \{\text{milk}:2\}, \{\text{diapers}:1\} \rangle), which enriches patterns with quantitative information for more nuanced analysis in domains like inventory management. These extensions maintain the core sequential structure while enhancing expressiveness. Itemset-based patterns are particularly relevant for relational or transactional datasets where temporal order arises from timestamps, such as customer transaction logs or event streams, enabling the identification of behavioral trends without assuming strict intra-transaction ordering.
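Returning to the purchase example, the support of an itemset-based pattern can be checked directly on a small transaction database; the customer histories below are hypothetical and purely illustrative:

```python
def contains(seq, pat):
    """Order-preserving containment with subset matching per itemset."""
    i = 0
    for p in pat:
        while i < len(seq) and not set(p) <= seq[i]:
            i += 1
        if i == len(seq):
            return False
        i += 1
    return True

# each customer's history is an ordered list of transactions (itemsets)
db = [
    [{"bread", "milk"}, {"butter"}],
    [{"bread", "milk", "eggs"}, {"jam"}, {"butter"}],
    [{"milk"}, {"butter"}],
]
pattern = [{"bread", "milk"}, {"butter"}]
sup = sum(contains(s, pattern) for s in db) / len(db)
print(sup)  # 2/3: the third customer never bought bread
```

Note that the second customer still matches despite the extra item ("eggs") and the intervening transaction ({"jam"}): itemset matching is by subset, and gaps between itemsets are allowed.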

Algorithms

Apriori-Inspired Methods

Apriori-inspired methods for sequential pattern mining adapt the Apriori principle from frequent itemset mining, which posits that if a sequential pattern does not meet the minimum support threshold (i.e., it is infrequent), then all its extensions or supersets are also infrequent. This monotonicity property enables efficient pruning of candidate patterns during the mining process, reducing the search space by eliminating supersets of infrequent patterns early. By generating candidates level-by-level—starting from short sequences and extending to longer ones—these methods systematically explore the pattern space while minimizing unnecessary computations. The Generalized Sequential Patterns (GSP) algorithm, introduced by Srikant and Agrawal in 1996, exemplifies this approach as one of the earliest and most influential methods for mining sequential patterns in large databases. GSP employs a horizontal database representation, where sequences are stored as lists of itemsets, and proceeds in iterative passes over the data. In the first step, it scans the database to identify all frequent 1-sequences (single-item patterns) that exceed the user-specified minimum support. Subsequent iterations generate candidate (k+1)-length sequences by joining pairs of frequent k-length sequences that overlap in all but their first and last items, followed by pruning any candidates containing an infrequent k-subsequence based on the Apriori property. Support for these candidates is then counted in an additional database scan by matching patterns against the data, continuing until no more frequent patterns are found. This level-wise strategy ensures completeness but relies on multiple I/O operations, making it scalable only for moderate-sized datasets with adjustable support thresholds. Building on the Apriori framework, the Sequential PAttern Discovery using Equivalence classes (SPADE) algorithm, developed by Mohammed J. Zaki in 2001, addresses some inefficiencies of GSP by adopting a vertical database layout.
In this vertical representation, each distinct item is associated with an id-list recording the sequence identifiers and timestamps at which it appears, transforming support counting into efficient intersections (temporal joins) of these id-lists. SPADE decomposes the search space into a lattice of equivalence classes—groups of candidate sequences sharing a common prefix—and traverses it depth-first or breadth-first, computing support through joins on id-lists rather than full scans for each candidate level. This allows SPADE to discover all frequent sequential patterns using only three database passes: one to find frequent single items, one to find frequent 2-sequences, and one to build the vertical id-lists from which all longer patterns are derived, with pruning applied via the Apriori property at each step. SPADE demonstrates superior performance over GSP on dense datasets, where id-list operations accelerate support counting, though it requires more memory for storing id-lists. Despite their foundational role, Apriori-inspired methods like GSP and SPADE face inherent limitations that can hinder scalability in very large or sparse datasets. The need for multiple database scans in GSP leads to significant I/O overhead, particularly as pattern lengths increase, while both algorithms risk generating an explosion of candidates when many short patterns are frequent, overwhelming computational resources even with pruning. These challenges have motivated subsequent advancements, though the core Apriori property remains a cornerstone for ensuring the correctness and efficiency of candidate reduction in sequential mining.
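The level-wise join-and-prune cycle can be illustrated for the simplified case of single-item elements. The sketch below shows the GSP-style candidate-generation idea only (no itemsets, taxonomies, or time constraints), with support counted as an absolute number of sequences:

```python
def contains(seq, pat):
    """Subsequence test for single-item sequences (order preserved, gaps allowed)."""
    it = iter(seq)
    return all(p in it for p in pat)

def frequent_sequences(db, minsup):
    """Level-wise mining: join frequent k-sequences into (k+1)-candidates,
    prune via the Apriori property, then count support in a database pass.
    minsup is an absolute sequence count."""
    items = sorted({x for s in db for x in s})
    freq = {1: [(i,) for i in items
                if sum(contains(s, (i,)) for s in db) >= minsup]}
    result, k = list(freq[1]), 1
    while freq[k]:
        # join step: a + b's last item whenever a minus its head equals b minus its tail
        cands = {a + (b[-1],) for a in freq[k] for b in freq[k]
                 if a[1:] == b[:-1]}
        # prune step: every k-subsequence of a candidate must itself be frequent
        known = set(freq[k])
        cands = [c for c in cands
                 if all(c[:i] + c[i + 1:] in known for i in range(len(c)))]
        # count step: one pass over the database per level
        freq[k + 1] = [c for c in cands
                       if sum(contains(s, c) for s in db) >= minsup]
        result += freq[k + 1]
        k += 1
    return result

db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
print(sorted(frequent_sequences(db, 2)))
```

On this toy database the frequent patterns at minsup = 2 are the three single items plus ⟨a,b⟩, ⟨a,c⟩, and ⟨b,c⟩; the candidate ⟨a,b,c⟩ is generated and survives pruning but fails the support count, appearing in only one sequence.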

Projection and Pattern-Growth Methods

Projection and pattern-growth methods represent a class of algorithms for sequential pattern mining that avoid the candidate generation and testing overhead of Apriori-inspired approaches by recursively projecting the sequence database onto suffixes following frequent prefixes, thereby confining the search to relevant subspaces. This core idea enables efficient pattern discovery through divide-and-conquer strategies, where patterns grow incrementally from prefixes without exhaustive enumeration of all possible candidates. By focusing on postfix projections, these methods substantially reduce the computational cost, particularly for sparse or long-sequence datasets, as the projected databases shrink progressively with pattern extension. A seminal algorithm in this category is PrefixSpan, introduced in 2001, which performs prefix-projected sequential pattern mining by constructing projected databases for each frequent prefix and mining them recursively. PrefixSpan begins by scanning the original database once to identify frequent length-1 prefixes, then partitions the search space based on these prefixes. For each prefix, it projects the database to include only the suffixes that follow the prefix occurrences, and recurses to grow longer patterns. Pattern growth occurs by appending frequent items either to the last itemset in the prefix or as a new itemset, ensuring ordered extension while maintaining support thresholds. To enhance efficiency, PrefixSpan employs pseudo-projection, which uses in-memory pointers and offsets to represent projected databases without physically copying data, making it suitable when sequences fit in main memory. Additionally, it incorporates bi-level projection with a triangular S-matrix to optimize the discovery of length-2 patterns by counting item co-occurrences within sequences.
Experimental evaluations demonstrate that PrefixSpan requires fewer database projections than earlier methods like FreeSpan and scales well to large databases with up to millions of sequences. Building on similar principles, CloSpan, proposed in 2003, extends pattern-growth to mine closed sequential patterns—those frequent sequences with no proper supersequence having the same support—thereby eliminating redundancy in the output. CloSpan builds on the prefix-projection framework and employs pruning techniques such as backward sub-pattern and backward super-pattern checking to verify closure properties efficiently, discarding branches whose extensions cannot yield closed patterns. It detects equivalent projected databases during growth and uses early-termination techniques, such as checking for non-extendibility, to avoid redundant computations. This focus on closed patterns results in significantly smaller output sizes, often 10 to 60 times fewer than full frequent pattern sets from PrefixSpan, and faster execution times on benchmarks like synthetic datasets with varying sequence lengths. CloSpan's approach is particularly advantageous for applications requiring concise representations, as it mines only the essential patterns while preserving all support information through closure. These projection-based methods offer key advantages, including a single initial database scan followed by targeted projections, which minimizes I/O operations compared to multi-pass candidate enumeration. They scale effectively to long sequences by confining exploration to promising subspaces, often outperforming level-wise methods by orders of magnitude on dense datasets. Moreover, they naturally accommodate user-specified constraints, such as maximum gaps between elements, through modified projection rules that filter irrelevant suffixes early. A notable variant is SPAM (Sequential PAttern Mining), introduced in 2002, which mines frequent sequences using a vertical bitmap representation without explicit database projection.
SPAM transforms the sequence database into vertical bitmaps for each item, where bits indicate presence in specific transactions, enabling rapid support counting via bitwise AND operations together with bitmap transformations for sequence-extension steps. It traverses a lexicographic sequence tree depth-first, incrementally building patterns and pruning infrequent branches on-the-fly. Unlike pure projection methods, SPAM maintains the full database in memory as bitmaps, achieving up to an order-of-magnitude speedup over PrefixSpan on large, long-pattern datasets due to its efficient counting and compression of sparse bitmaps. This bitmap-centric design makes SPAM particularly effective for customer transaction logs with high dimensionality.
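The prefix-projection idea can be sketched for single-item sequences. The illustrative version below uses (sequence, offset) pairs as a stand-in for PrefixSpan's pseudo-projection; it omits itemset extension and constraints, and is not the full algorithm:

```python
from collections import Counter

def prefixspan(db, minsup):
    """Pattern growth on single-item sequences: recursively project the
    database on each locally frequent item and grow the prefix.
    minsup is an absolute sequence count."""
    results = []

    def mine(prefix, proj):
        # count items in the projected suffixes, once per sequence
        counts = Counter()
        for seq, start in proj:
            counts.update(set(seq[start:]))
        for item in sorted(counts):
            sup = counts[item]
            if sup < minsup:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, sup))
            # project on item: keep the suffix after its first occurrence
            new_proj = []
            for seq, start in proj:
                for i in range(start, len(seq)):
                    if seq[i] == item:
                        new_proj.append((seq, i + 1))
                        break
            mine(new_prefix, new_proj)

    mine([], [(s, 0) for s in db])
    return results

db = [["a", "b", "c"], ["a", "c"], ["a", "b"], ["b", "c"]]
for pat, sup in prefixspan(db, 2):
    print(pat, sup)
```

Because each recursive call only scans the ever-shorter projected suffixes, no global candidate set is ever materialized; on the toy database above this finds the same six frequent patterns as the level-wise approach.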

Applications

Business and Marketing

Sequential pattern mining extends traditional market basket analysis, which focuses on simultaneous item associations, by incorporating the temporal order of purchases to uncover sequences that predict future buying behavior and enable targeted strategies. For instance, in retail settings, it identifies sequences of product purchases across multiple transactions, allowing stores to recommend complementary items at the point of sale or through personalized promotions. This approach has been applied across retail sectors to detect sequential patterns that inform cross-selling opportunities, improving sales efficiency by anticipating customer needs based on historical sequences. In web usage mining, sequential pattern mining analyzes clickstream data from user sessions to reveal navigation patterns, which helps optimize website structure and enhance user experience. By extracting frequent sequences from server logs, such as paths from homepage to product pages and checkout, businesses can reorganize site layouts to reduce drop-off rates and guide users toward conversions. Research demonstrates that these patterns, derived from clickstream logs, provide actionable insights for improving site navigation and increasing engagement in e-commerce environments. Within customer relationship management (CRM), sequential pattern mining supports churn prediction by modeling sequences of customer interactions, such as login followed by purchase and then cancellation, to identify at-risk behaviors early. These patterns enable proactive retention strategies, like targeted interventions before churn occurs, particularly in telecommunications and banking where interaction histories are rich. Studies show that mining sequential sentiment patterns from CRM data can significantly improve churn prediction accuracy, allowing firms to retain valuable customers through timely offers or service adjustments. Retail chains have leveraged sequential pattern mining for inventory planning by analyzing seasonal buying sequences, such as holiday-related purchase progressions, to align stock levels with anticipated demand patterns.
For example, patterns in past transaction sequences help predict surges in specific items during peak seasons, reducing overstock and stockouts. Case studies in retail operations have integrated sequential pattern mining with time-series forecasting to enhance inventory decisions, demonstrating improved efficiency in managing seasonal variations across product categories. To prioritize actionable insights, businesses integrate metrics like support, which measures the frequency of a pattern in the database, and confidence, which indicates the reliability of one item following another in a sequence. These metrics allow ranking of patterns by their prevalence and predictive strength, ensuring that only high-impact sequences inform campaigns or operational changes. In practice, thresholds for support and confidence are tuned to filter noise, focusing on sequences with substantial support in marketing and operations applications.

Bioinformatics and Genomics

Sequential pattern mining plays a crucial role in bioinformatics and genomics by uncovering temporal and structural relationships in biological sequences, such as DNA, RNA, proteins, and gene expression data over time. This approach helps identify regulatory mechanisms, functional motifs, and disease-associated patterns that traditional frequency-based analyses might overlook, particularly when incorporating utility measures like gene importance or expression levels. Applications span from predicting protein folds to extracting gene interactions from literature, enabling deeper insights into genomic functions and disease pathways. In gene regulation studies, sequential pattern mining analyzes time-series data to discover patterns of gene expression changes, aiding in the understanding of disease progression. For example, the Top-HUGS algorithm transforms expression datasets into sequences based on fold-change thresholds and mines high-utility patterns by weighting genes according to their disease associations and internal utilities, revealing more biologically relevant regulations than conventional methods. Evaluated on the GSE6377 time-series expression dataset, Top-HUGS identified top-k patterns with higher average utility and popularity scores, while being over five times faster and using less memory than baselines like CTGR-Span. Similarly, extensions to high average-utility sequential rules have been used to associate mRNA expression patterns with miRNA regulations in cancer datasets, optimizing multiple objectives for biological relevance. For protein fold recognition, sequential pattern mining extracts frequent subsequences from amino acid chains to classify proteins into folds, supporting function annotation in large proteomic databases. The cSPADE algorithm, adapted with constraints for flexible gaps, mines patterns from sequences in the Protein Data Bank (PDB) and SCOP hierarchies, achieving 24.9% top-1 accuracy and 56.5% top-5 accuracy across 36 folds in a dataset of 2,410 proteins.
This method highlights discriminative motifs, such as those with maximum gaps of four residues, outperforming some alignment-based classifiers by focusing on sequential motifs rather than global similarities. In genomics, the technique identifies significant patterns in DNA sequences to reveal hidden biological functions, such as regulatory elements or evolutionary signals. An index-based spanning tree approach mines contiguous patterns using information gain and confidence thresholds on large datasets, like 19,979 real DNA sequences averaging 1,024 bases, efficiently detecting surprising motifs (e.g., IG(ATCG) ≈ 19.88) with reduced runtime compared to exhaustive scans. Additionally, in text mining for genomics, recursive sequential pattern mining with linguistic constraints extracts gene interaction patterns from PubMed abstracts, yielding 463 high-precision patterns (f-score 78.7%) that capture modalities and contexts, surpassing state-of-the-art on corpora like AIMed. These applications underscore the method's versatility in handling massive genomic data while prioritizing impactful, verifiable biological insights.

    Agrawal and Srikant introduced the sequential pattern mining problem in [2]. Many methods, which are based the. Apriori property [1], have been proposed for ...
  19. [19]
    Sequential PAttern mining using a bitmap representation
    We introduce a new algorithm for mining sequential patterns. Our algorithm is especially efficient when the sequential patterns in the database are very long.
  20. [20]
    [PDF] Detecting sequential patterns for cross-selling fast moving consumer ...
    In the marketing domain, sequential patterns have been usefi.dly deployed for predicting various aspects of customer purchase behavior.
  21. [21]
    Efficient web usage mining process for sequential patterns
    This paper presents a new web usage mining process for finding sequential patterns in web usage data which can be used for predicting the possible next move ...
  22. [22]
    Mining Significant Usage Patterns from Clickstream Data
    In this paper, a technique to generate Significant Usage Patterns (SUP) is proposed and used to acquire significant “user preferred navigational trails”. The ...
  23. [23]
    Sequential Sentiment Pattern Mining to Predict Churn in CRM Systems
    Sequential pattern detection is one of the most popular approaches to the churn prediction problem. Although promising, initiatives based on this approach do ...
  24. [24]
    Goal-oriented sequential pattern for network banking churn analysis
    In this research, we specify how to judge whether a customer is leaving and the retaining strategies. The Sequential Pattern cannot conduct a research function ...
  25. [25]
    Forecasting Supplies Inventory Using Sequential Pattern Analysis
    The goal is to identify what particular items need to have more stocks and determine when they will need it and to predict the request pattern of items of every ...Missing: case seasonal
  26. [26]
    [PDF] Integrating Data Mining and Predictive Modeling Techniques ... - arXiv
    Sequential pattern mining: Using the PrefixSpan algorithm, it ... Market basket analysis with association rules. Communications in Statistics ...