Sketch Engine

Sketch Engine is a web-based tool designed for exploring language patterns in large-scale text corpora, enabling users to query and visualize authentic usage across multiple languages. Developed collaboratively by linguists Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell starting in the early 2000s, it builds on innovations like word sketches (automatic summaries of a word's grammatical and collocational behavior), first introduced in the Macmillan English Dictionary in 2002. The tool was launched in 2004 as an extension of an existing corpus query system, initially aimed at supporting lexicography by automating corpus-based insights for dictionary compilation.

Over the subsequent two decades, Sketch Engine has evolved into a comprehensive platform supporting over 100 languages and more than 800 pre-built corpora totaling around 1 trillion words, with the largest individual corpora exceeding 80 billion words. Key features include word sketches, concordances, distributional thesauruses, term extraction, and diachronic trend analysis, allowing users to identify typical collocations, rare usages, neologisms, and multilingual parallels. It accommodates diverse writing systems such as Latin, Cyrillic, and Chinese, and facilitates custom corpus building from web sources or uploaded files. Sketch Engine serves linguists, lexicographers, translators, educators, and national institutes for tasks ranging from dictionary development to language teaching and historical text analysis. Major users include dictionary publishers and institutions such as the Czech academy and other national academies. Ongoing updates include enhancements to diachronic analysis tools in late 2023 and 2025 additions such as the ParlaTalk collection of parliamentary corpora from 22 states.

Introduction and History

Overview

Sketch Engine is a web-based corpus manager and text analysis software developed by Lexical Computing for querying and analyzing large collections of authentic texts across over 100 languages and more than 30 writing systems. It serves as a comprehensive platform for linguistic exploration, enabling users to uncover patterns in language use through data-driven methods.

The primary purposes of Sketch Engine include facilitating complex queries into text corpora for professionals such as lexicographers, linguists, researchers, teachers, and language learners, allowing them to study real-world language patterns, collocations, and contextual usages. It supports their work by providing empirical evidence from vast datasets. Originating from the Manatee and Bonito corpus tools, Sketch Engine has become integral to dictionary creation and language resource development.

Key to its utility are over 800 pre-built corpora encompassing a total of 1 trillion words, offering scalable resources from small specialized sets to massive general collections. Available as a subscription service with support, it also has a free open-source version called NoSketch Engine, which allows self-hosting but requires users to provide their own corpora.

In basic operation, users access or upload corpora on the platform, execute searches such as concordances to retrieve contextual examples, and produce visualizations that highlight grammatical, collocational, and distributional patterns in the data. This workflow supports evidence-based analysis without requiring advanced programming skills for most tasks.

Development History

Sketch Engine was developed in 2003 and launched in 2004 by Adam Kilgarriff, Pavel Rychlý, Pavel Smrž, and David Tugwell through their company, Lexical Computing, as a corpus analysis tool primarily aimed at lexicographers and linguists. The platform built upon earlier open-source components, including Manatee, a C++-based corpus indexer created by Rychlý during his time at Masaryk University, and Bonito, a web-based interface for corpus querying. An open-source variant, NoSketch Engine, was released alongside the commercial version to support self-hosted use, providing core functionality without proprietary corpora or advanced features.

Key early milestones included the integration of word sketches (automatic, corpus-derived summaries of a word's grammatical and collocational behavior) in 2004, which became a hallmark feature of the platform across languages. In 2014, Sketch Engine expanded with the launch of SKELL, a simplified web interface derived from the main platform, initially supporting English for language learners and later extending to other languages, including Russian. In 2020, the company discontinued support for the legacy Bonito-based interface to streamline development toward a modern, unified interface.

Post-2016 developments focused on performance and scalability, with the Manatee indexer undergoing a partial rewrite in the Go programming language starting that year to handle larger corpora more efficiently, culminating in significant speed improvements by the late 2010s. Following Kilgarriff's passing in 2015, the team emphasized multilingual capabilities, adding enhancements for bilingual term extraction and integrating multilingual resources such as the EUR-Lex parallel corpus, which covers all official EU languages for legal and translational analysis. By 2024, new features like the Timeline tool enabled diachronic analysis of word usage trends over time, while ongoing expansions added dozens of corpora annually, reaching over 800 preloaded options as of 2025 and incorporating AI-assisted functionalities for automated term extraction and word sense induction. In 2025, updates included new corpora such as the ParlaTalk parliamentary collections from 22 EU states and enhancements to concordance visualization.

Core Features

Search and Analysis Tools

Sketch Engine provides a suite of search and analysis tools designed to enable linguists, lexicographers, and researchers to explore linguistic patterns within large text corpora efficiently. At its core is the concordance search, which retrieves instances of words, phrases, or patterns in their surrounding contexts, typically displayed in keyword-in-context (KWIC) or full-sentence views. This tool supports extensive customization, including sorting results by corpus order, random selection, or relevance metrics such as Good Dictionary Examples, which prioritize illustrative usages based on linguistic criteria. Users can group concordances by frequency or by attributes such as part-of-speech tags, and apply filters to retain or exclude lines matching specific conditions, facilitating targeted analysis of up to 1,000 lines for download in preloaded corpora.

For deeper distributional analysis, Sketch Engine offers collocation tools that identify co-occurring words and phrases, revealing syntactic and semantic relationships through statistical measures. These include lists of frequent collocates within defined spans (e.g., left or right of the node word), sortable by metrics like t-score or logDice for reliability. Advanced querying is powered by the Corpus Query Language (CQL), a flexible syntax for specifying complex patterns, such as grammatical structures, optional elements, or constraints on attributes like lemma and part-of-speech. For instance, CQL allows searches like [lemma="run" & tag="V.*"] to capture the verb forms of "run", enabling precise retrieval of multi-word units or rare phenomena across corpora.

The platform's thesaurus and similarity functions leverage distributional semantics to automatically generate relations between words based on their co-occurrence patterns in the corpus. The distributional thesaurus computes similarity scores based on word sketch data to cluster synonyms, hyponyms, or contextually related terms, providing an automated alternative to manual thesaurus compilation. This tool supports exploratory queries, such as finding words similar to "bank" in financial versus river contexts, and is available for every word in supported corpora, drawing on principles established in early implementations like those from 2007.

Diachronic analysis tools in Sketch Engine track frequency changes over time in timestamped corpora (available in 18 languages as of 2025), aiding the study of language evolution. The Trends feature generates graphs of word usage across periods, highlighting neologisms or shifts in meaning. Introduced in 2024, the Timeline function enhances this by producing interactive visualizations for any search result, displaying normalized frequencies with options to compare multiple terms or filter by subcorpora, thus revealing granular trends such as the rise of a term in recent decades.

For multilingual research, Sketch Engine supports parallel corpus facilities, where aligned texts in multiple languages allow querying in one language to retrieve corresponding segments in others. The parallel concordance displays results side by side, supporting translation equivalence studies through alignment at sentence or paragraph levels, often built from bilingual or multilingual datasets using tools like Excel imports for 1:1 or M:N mappings. This enables cross-linguistic pattern analysis, such as identifying idiomatic translations, without requiring manual alignment for basic setups.
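
To make the collocation scoring mentioned in this section more concrete, here is a minimal Python sketch of logDice, computed as 14 + log2(2 · f(node, collocate) / (f(node) + f(collocate))). It is an illustration under simplifying assumptions, not Sketch Engine's implementation: the function name logdice_collocates, the fixed offset parameter (a stand-in for a configurable span), and the toy token list are all hypothetical.

```python
import math
from collections import Counter


def logdice_collocates(tokens, node, offset=-1):
    """Rank words found at a fixed offset from `node` by logDice.

    offset=-1 looks one token to the left of the node, offset=1 one to
    the right. Using a single offset (an assumption made for simplicity)
    means each co-occurrence is counted exactly once.
    """
    freq = Counter(tokens)   # corpus frequency of every word
    pair = Counter()         # co-occurrence frequency with the node
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == node and 0 <= j < len(tokens):
            pair[tokens[j]] += 1
    scores = {
        coll: 14 + math.log2(2 * f_xy / (freq[node] + freq[coll]))
        for coll, f_xy in pair.items()
    }
    # Highest-scoring collocates first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy corpus: which words typically stand directly before "tea"?
tokens = "strong tea strong tea weak tea strong coffee".split()
ranked = logdice_collocates(tokens, "tea")
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

On this toy input, "strong" outranks "weak" as a left-hand collocate of "tea". Because each co-occurrence is counted once, no score can exceed the theoretical maximum of 14.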

Word Sketches and Extraction

Word sketches in Sketch Engine are algorithm-generated, one-page summaries that capture a word's grammatical and collocational behavior by organizing typical collocations into predefined categories based on syntactic relations. These summaries highlight patterns such as verbs with direct objects (e.g., for the verb "give," collocations like "advice," "information," or "money" as objects), nouns with modifiers (e.g., for "university," adjectives like "leading," "top," or "prestigious"), or subjects of verbs, providing a concise linguistic profile derived from corpus data. The generation process relies on a sketch grammar, a set of rules written in the Corpus Query Language (CQL) that scans the corpus for patterns around the target word, scoring collocations by frequency and significance to filter the most relevant examples.

Keywords and terminology extraction tools in Sketch Engine identify significant single-word keywords and multi-word terms of a specific corpus or domain by comparing their frequencies against a reference corpus. These tools employ statistical measures such as log-likelihood or chi-squared tests to detect deviations from expected distributions, highlighting terms that are over-represented in the target text (e.g., extracting domain-specific vocabulary from AI-related documents). Specialized features like the Keywords & Terms tool and OneClick Terms automate this for user-uploaded texts, producing ranked lists of terms suitable for terminology management in specialized fields.

Customization of these features is achieved through adjustable sketch grammars, which can be tailored for different languages by adapting CQL rules to specific part-of-speech tagsets, or for domains by modifying relation definitions to capture relevant patterns (e.g., adding industry-specific categories). Sketch Engine provides pre-built grammars for word sketches in 34 languages, including English, with extensions available for additional languages through user-defined rules. This approach draws from Adam Kilgarriff's emphasis on distributional properties, where a word's meaning and usage are inferred from its co-occurrences in varied grammatical contexts across large corpora.
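
The comparison of a focus corpus against a reference corpus described above can be illustrated with a short Python sketch. It uses one common formulation of the log-likelihood statistic for two corpora; whether Sketch Engine's own tools apply exactly this formula is not stated here, so treat it as an assumption. The function names log_likelihood and keywords and both toy corpora are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness of one word.

    a, b: frequency of the word in the focus and reference corpus
    c, d: total token counts of the focus and reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the focus corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a:
        ll += a * math.log(a / e1)
    if b:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(focus_tokens, ref_tokens, top=5):
    """Words over-represented in the focus corpus, ranked by keyness."""
    focus, ref = Counter(focus_tokens), Counter(ref_tokens)
    c, d = len(focus_tokens), len(ref_tokens)
    scored = []
    for word, a in focus.items():
        b = ref[word]  # 0 when the word is absent from the reference
        if a / c > b / d:  # keep only over-represented words
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda kv: kv[1], reverse=True)[:top]


focus = "the neural network trains the neural model".split()
reference = "the cat sat on the mat near the door".split()
top_terms = keywords(focus, reference)
print(top_terms)
```

In this toy example "neural" ranks first, while "the" is excluded because it is relatively more frequent in the reference text. Real term extraction would also need lemmatization and multi-word candidates, which this sketch leaves out.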

SKELL Service

The SKELL (Sketch Engine for Language Learning) service was launched in 2014 as a free, public web-based tool providing simplified access to corpus data for non-experts, particularly language learners and educators, without requiring user login or registration. Developed by Lexical Computing, it offers a user-friendly interface for exploring authentic language usage through example sentences and basic analytical views, drawing from subsets of larger corpora maintained by Sketch Engine.

Key features include simplified concordances, which display up to 40 contextual example sentences for a queried word or phrase; word sketches, which highlight common collocations and grammatical patterns in a tabular format; and the Similar words view, an algorithm-driven thesaurus showing synonyms and related terms. Unlike the full Sketch Engine, SKELL omits advanced query languages like CQL and support for custom corpora, focusing instead on straightforward searches to promote intuitive discovery.

As of 2025, SKELL supports six languages, including English and Russian (via ruSKELL), with each interface tailored to provide relevant examples in the target language. The service uses sampled subsets of multi-billion-word corpora to ensure quick response times, though results include watermarks indicating the SKELL version for attribution.

Designed primarily for teachers and students, SKELL aims to bridge corpus linguistics with practical learning by offering real-world usage examples that enhance vocabulary acquisition and writing skills, as reported in educational studies. Limitations include restricted result volumes to prevent overload and the absence of export options, encouraging users to upgrade to the commercial Sketch Engine for deeper analysis. In the 2020s, improvements to mobile responsiveness have made it more accessible on handheld devices, supporting its use in classroom activities and online EFL programs.

Corpora and Data Management

Available Text Corpora

Sketch Engine provides access to over 800 preloaded text corpora spanning more than 100 languages, with sizes ranging from approximately 1,000 words to 86.8 billion words, enabling diverse linguistic analyses from small specialized datasets to massive general-purpose collections. The corpora draw from varied sources, including web-crawled content, legal documents, translated materials, and domain-specific texts such as environmental or academic materials, offering broad coverage for work in lexicography, translation, and language teaching.

Central to this collection is the TenTen family of corpora, which comprises web-derived texts for over 50 languages, each built with a target size of around 10 billion words and processed with cleaning, deduplication, and linguistic annotation to ensure high-quality data. Notable examples elsewhere in the collection include the British National Corpus (BNC), a 100-million-word balanced sample of late 20th-century British English encompassing both written and spoken varieties, and the EUR-Lex parallel corpus, a multilingual repository of EU legal and public documents in 24 official languages, totaling several billion words across all languages (e.g., English version: 630 million words) and aligned for cross-linguistic studies.

Additional domain-specific corpora feature the OpenSubtitles collection, which aggregates translated movie subtitles across 58 languages into 60 parallel sub-corpora, and the EcoLexicon English Corpus, a 23.1-million-word set of contemporary environmental texts supporting work on environmental topics. Multilingual capabilities extend to over 100 languages, including low-resource ones such as indigenous Australian languages, often bolstered by targeted web crawls to fill representation gaps in under-documented varieties.

Access is structured in tiers: open corpora, such as subsets of the BNC or EcoLexicon, are freely searchable without an account via the NoSketch Engine interface; trial users and subscribers gain access to full datasets, with ongoing updates ensuring relevance. Post-2020 expansions have addressed coverage gaps through additions like ukTenTen22 (7.6 billion words of Ukrainian web texts) and 2024 releases including arTenTen24 (6.6 billion words of Arabic), idTenTen24 (7.1 billion words of Indonesian), and fiTenTen24 (4.4 billion words of Finnish). As of July 2025, the ParlaTalk corpora of parliamentary debates have been expanded to 2.8 billion words in 20 languages.

Corpus Building and Customization

Corpus Architect serves as the core tool within Sketch Engine for enabling users to construct and tailor personalized text corpora without requiring specialized technical expertise. This web-based interface facilitates corpus creation either by uploading user-provided documents or by automatically crawling and harvesting content from the web using seed keywords or specified URLs via the integrated WebBootCaT technology. It supports a range of input formats, including plain text (.txt), HTML (.htm, .html), TEI XML (.tei, .xml), Microsoft Word (.doc, .docx), PDF (.pdf, with OCR for scanned documents), and zipped archives for bulk upload.

The building process begins with users naming the corpus, selecting the primary language, and optionally adding a description before proceeding to input data. Uploaded texts undergo preprocessing to clean and structure the content, such as removing boilerplate or non-linguistic elements from web pages and converting complex formats to a vertical text representation suitable for indexing. Deduplication is applied to eliminate exact or near-duplicates, ensuring the corpus maintains high-quality, non-redundant content. Following preprocessing, the tool automatically performs part-of-speech tagging and lemmatization for more than 30 languages, assigning positional attributes like lemmas and tags to each token to support subsequent linguistic queries.

Once prepared, the corpus is compiled and indexed, generating searchable structures including word sketches and thesauri where applicable. This indexing step creates a fully functional corpus that integrates directly with Sketch Engine's query interface, allowing users to analyze it using the same tools as pre-built collections. Small-scale corpora are available at no additional cost within standard subscriptions, while larger builds scale with institutional licensing for handling extensive datasets.

Customization enhances user control over the corpus structure and utility. Users can define subcorpora to isolate specific subsets, for example by time period, through configuration files that specify structural tags like documents (<doc>), paragraphs (<p>), or sentences (<s>). Metadata attributes, including details like author, publication date, or domain, can be added to enrich structural elements and enable filtered searches. For multilingual applications, parallel alignment is supported via formats like TMX, allowing sentence-level correspondences to be established for parallel querying. In the 2020s, updates to corpus building have introduced streamlined handling of large-scale datasets through optimized processing pipelines and built-in automated quality assessments, to facilitate reliable use in projects.
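
The vertical representation and metadata-based subcorpora described above can be sketched in a few lines of Python. This is an illustrative reading of the format, with one token per line, tab-separated attributes, and XML-like structure tags; it is not Sketch Engine code. The column order (word, tag, lemma), the author and year attributes, and the function name parse_vertical are assumptions made for the example.

```python
import re

# A tiny vertical file: structure tags on their own lines,
# one token per line with tab-separated attributes.
VERTICAL = """<doc author="A. Writer" year="2021">
<s>
Dogs\tNNS\tdog
bark\tVBP\tbark
</s>
</doc>
<doc author="B. Writer" year="1999">
<s>
Cats\tNNS\tcat
sleep\tVBP\tsleep
</s>
</doc>
"""

ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_vertical(text, columns=("word", "tag", "lemma")):
    """Group tokens by <doc> and keep each document's metadata."""
    docs, current = [], None
    for line in text.splitlines():
        if line.startswith("<doc"):
            current = {"meta": dict(ATTR_RE.findall(line)), "tokens": []}
            docs.append(current)
        elif line.startswith("<") or not line.strip():
            continue  # other structure tags (<s>, </s>, </doc>) or blank lines
        elif current is not None:
            current["tokens"].append(dict(zip(columns, line.split("\t"))))
    return docs


docs = parse_vertical(VERTICAL)

# A "subcorpus" defined by metadata: documents dated 2000 or later.
recent = [d for d in docs if int(d["meta"]["year"]) >= 2000]
lemmas = [t["lemma"] for d in recent for t in d["tokens"]]
print(lemmas)
```

The sketch treats every line beginning with "<" as a structure tag, so a real parser would need extra care with tokens that themselves start with that character.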

Technical Architecture

Manatee

Manatee serves as the core backend database and indexing system for Sketch Engine, managing the storage and efficient retrieval of large-scale text corpora. Originally developed in C++ by Pavel Rychlý, it was designed specifically for corpus applications, enabling the handling of corpora containing billions of words through optimized data structures such as inverted indexes for rapid query processing. Some components, including the corpus indexing tool mklcm, were rewritten in the Go programming language starting in 2016 to enhance performance and maintainability.

Key functions of Manatee include processing tokenized text into a vertical format where each token is annotated with attributes such as part-of-speech (POS) tags and lemmas, facilitating advanced linguistic analysis. During indexing, it builds positional inverted indexes that map attribute values to their occurrences in the corpus, supporting fast searches via the Corpus Query Language (CQL). Lemmatization and tagging are integrated as attributes, allowing queries to target base forms or grammatical categories without reprocessing raw text.

In terms of performance, Manatee is engineered to manage terabyte-scale corpora, with features like asynchronous query evaluation that displays initial results before full computation, making it suitable for interactive use. Indexing supports parallel processing to accelerate the building of large corpora, as introduced in version 2.152, reducing preparation time for multi-billion-token datasets.

The core of Manatee is available as open source as part of NoSketch Engine, which combines it with the Bonito interface for free corpus management, allowing customization for specific languages through extensible attribute handling and query optimizations. This open-source variant supports deployment in diverse environments while maintaining compatibility with Sketch Engine's proprietary extensions. Manatee interacts with the Bonito frontend to deliver query results, but its primary role remains backend data handling.
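
The idea of a positional inverted index over token attributes can be shown with a small Python sketch. It is a conceptual illustration only and says nothing about Manatee's actual data structures or on-disk format; the names build_index and lookup, the tag values, and the four-token corpus are hypothetical. The final query mimics the single-token condition lemma="run" & tag="V.*" by intersecting two position sets.

```python
import re
from collections import defaultdict


def build_index(tokens):
    """Map attribute -> value -> list of corpus positions."""
    index = defaultdict(lambda: defaultdict(list))
    for pos, tok in enumerate(tokens):
        for attr, value in tok.items():
            index[attr][value].append(pos)
    return index


def lookup(index, attr, pattern):
    """Positions whose `attr` value fully matches the regex `pattern`."""
    rx = re.compile(pattern)
    hits = set()
    # Scan the (small) set of distinct values, not every token.
    for value, positions in index[attr].items():
        if rx.fullmatch(value):
            hits.update(positions)
    return hits


tokens = [
    {"word": "She", "lemma": "she", "tag": "PP"},
    {"word": "runs", "lemma": "run", "tag": "VVZ"},
    {"word": "a", "lemma": "a", "tag": "DT"},
    {"word": "run", "lemma": "run", "tag": "NN"},
]
index = build_index(tokens)

# Both conditions must hold at the same position, so intersect the sets.
matches = sorted(lookup(index, "lemma", "run") & lookup(index, "tag", "V.*"))
print(matches)
```

The benefit of this layout is that a query touches only the distinct attribute values and their position lists, which is what keeps lookups fast as a corpus grows.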

Bonito

Bonito serves as the web-based graphical user interface (GUI) for Sketch Engine, enabling users to input queries and interact with corpus data through an intuitive platform. Developed as the client component in a client-server architecture, it handles the display of search results such as keyword-in-context (KWIC) concordances, graphs, distributions, and word sketches, all rendered dynamically via web technologies. Implemented in Python since version 2, Bonito uses an object-oriented structure for maintainability and extensibility, with a templating engine generating responsive outputs.

Key features include support for multilingual user interfaces, with localization added for languages such as Slovak in updates from 2021 to 2023, allowing language selection based on browser settings or user profiles. Additionally, it provides API access for programmatic interactions, enabling developers to retrieve results in JSON or XML formats, with keyword-related enhancements and customizable views introduced in versions 3.42 and 3.92. Post-2020 updates emphasized responsive design, incorporating mobile and touch compatibility, particularly for related services like SKELL, ensuring accessibility across devices by 2025.

Bonito integrates closely with the Manatee corpus management system by communicating queries to the server for processing and retrieving data for display, while handling frontend tasks independently. It manages user sessions through standard protocols, supporting features like subcorpus saving and query history to maintain continuity during interactions. Security is enforced via role-based access controls, configurable for user groups and shared corpora, including support for secured connections and permission checks to prevent unauthorized data access.

The interface evolved significantly with the release of Bonito 2, transitioning from an earlier Tcl/Tk-based standalone application to a fully web-based, CGI-driven application; the legacy interface was retired entirely by January 2020 to streamline maintenance. Subsequent versions, such as 3.70 in 2021 introducing trends and 3.101 in 2023 enabling multiword sketches for queries of three or more terms, have continued to refine its capabilities for advanced linguistic analysis.

Corpus Architect

Corpus Architect is a Python-based utility integrated into Sketch Engine, designed to facilitate the creation and maintenance of custom corpora from raw text files or web sources without requiring advanced technical expertise. It serves as a dedicated pipeline for corpus preparation, enabling users to convert diverse data inputs into structured, queryable formats compatible with Sketch Engine's ecosystem. By incorporating web crawling capabilities via the BootCaT module, it allows automated collection of domain-specific texts using seed keywords and search engines, streamlining the assembly of corpora for linguistic analysis.

The tool handles essential processes such as text cleaning to remove noise and inconsistencies, followed by annotation for linguistic features including part-of-speech tagging, lemmatization, and named entity recognition (NER). Deduplication is a core step, employing algorithms to eliminate exact or near-duplicate content, ensuring corpus quality and reducing redundancy during compilation. Once processed, Corpus Architect generates indexes in the Manatee format, which supports efficient storage and retrieval for subsequent querying. It also automates the compilation of derived structures like word sketches and thesauri, enhancing the corpus's utility for lexicographic and research purposes.

Advanced features include batch processing for handling large-scale data volumes and scripting interfaces for custom automation, allowing users to tailor workflows via scripts. The utility supports vertical file formats, where each token appears on a separate line with associated attributes, facilitating precise alignment and analysis in multilingual or parallel corpora. What distinguishes Corpus Architect within Sketch Engine is its integration with the query interface, enabling immediate querying and visualization of newly built corpora without additional setup.
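
The deduplication step mentioned above can be illustrated with a generic near-duplicate filter. The sketch below uses word shingles and Jaccard similarity, a common textbook technique chosen here for illustration; the text does not say which algorithm Corpus Architect actually uses. The function names, the shingle size of 3, and the 0.5 threshold are all assumptions.

```python
def shingles(text, n=3):
    """Set of overlapping n-word sequences in a lower-cased text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def jaccard(a, b):
    """Share of shingles two texts have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def deduplicate(paragraphs, threshold=0.5, n=3):
    """Keep a paragraph only if it is not too similar to one already kept."""
    kept, kept_shingles = [], []
    for para in paragraphs:
        current = shingles(para, n)
        if all(jaccard(current, seen) < threshold for seen in kept_shingles):
            kept.append(para)
            kept_shingles.append(current)
    return kept


paragraphs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox jumps over the lazy cat",  # near-duplicate
    "corpus tools index large text collections",
]
unique = deduplicate(paragraphs)
print(unique)
```

Here the second paragraph shares six of eight distinct shingles with the first (similarity 0.75) and is dropped. Comparing every paragraph against all kept ones is quadratic, so large-scale pipelines typically rely on hashing schemes instead.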

Applications

In Lexicography and Publishing

Sketch Engine has been widely adopted by major publishers in lexicography since the early 2000s, enabling evidence-based dictionary production through corpus analysis. Oxford University Press (OUP), Macmillan, Cambridge University Press, and Collins, four of the UK's five largest dictionary publishers, have integrated it into their workflows for creating and updating monolingual and bilingual dictionaries. Macmillan was the first to use word sketches in 1999, while OUP adopted the full system shortly thereafter for thesaurus development and beyond.

In dictionary compilation, Sketch Engine's word sketches provide concise summaries of a word's collocations, grammatical patterns, and usage, serving as draft entries for definitions and example selection. Lexicographers at these publishers employ term extraction tools to identify neologisms and multi-word units from large corpora, facilitating the detection of emerging language trends for inclusion in resources like learner's dictionaries. For instance, Macmillan's online dictionaries use these features to label core vocabulary (e.g., 7,500 "red words" for high-frequency terms) and generate corpus-attested examples, shifting from print to digital formats by 2012.

The tool's impact lies in promoting data-driven lexicography, replacing intuition-based methods with statistical evidence from billions of words, which has streamlined production and improved accuracy. Reports from 2014 highlight efficiency gains, such as generating detailed word profiles in seconds, allowing lexicographers to focus on curation rather than manual data gathering; this has supported the rapid growth of online dictionaries since 2009.

A key case study involves OUP's use of the Oxford English Corpus (nearly 2.1 billion words analyzed via Sketch Engine) for updating the Oxford English Dictionary (OED), including revisions to entries based on real-world usage across English variants. Similarly, multilingual projects, such as bilingual dictionaries, benefit from Sketch Engine's alignment tools for cross-language collocations.

In the 2020s, Sketch Engine has evolved to incorporate hybrid AI-human workflows, enhancing lexicographic processes with automated features like word sense induction, which uses language models to group collocations by meaning. This integration allows publishers to combine machine-generated insights with expert verification, as seen in recent updates extending term extraction to more languages, supporting faster detection of specialized vocabulary in global resources.

In Research and Education

Sketch Engine has been extensively applied in academic linguistic research, particularly for diachronic analysis in sociolinguistics, where its Trends and Timeline tools enable researchers to track changes in word usage and frequency over time. For instance, the Timeline feature generates visualizations of language evolution, allowing studies on neologisms, semantic shifts, and sociolinguistic variations in large-scale corpora spanning decades or centuries. In translation studies, parallel corpora such as the OPUS collection facilitate comparative analysis across languages, helping scholars identify translation equivalents, idiomatic expressions, and alignment patterns in aligned sentence pairs. Additionally, researchers in domain-specific fields build custom corpora to analyze specialized texts; historians, for example, upload historical documents to create tailored corpora for examining linguistic features in archival materials like Early English Books Online.

In education, Sketch Engine supports language teaching through its SKELL interface, a simplified version designed for classrooms that provides authentic examples of word usage without requiring advanced technical knowledge. Teachers in English as a Second Language (ESL) programs integrate SKELL to illustrate collocations, grammar patterns, and contextual examples, fostering corpus-based pedagogy that emphasizes real-language exposure over rote memorization. The platform also aids in analyzing learner corpora, where educators upload student writing to identify common errors, vocabulary gaps, and progress in language acquisition.

The tool's community includes numerous universities and research institutions worldwide, which provide institutional access for linguistic analysis. Examples of its impact include sociolinguistic studies using its diachronic tools to monitor sentiment shifts in economic terminology during crises, revealing patterns in public discourse. In ESL education, corpus-based approaches with Sketch Engine have been adopted in programs to enhance vocabulary teaching, as demonstrated in classroom activities exploring word sketches for nuanced usage.

Recent expansions from 2024 to 2025 have extended its utility to interdisciplinary areas, integrating corpus tools with AI for analyzing trends and multilingual data in research. As of November 2025, updates include the English Trends corpus, which exceeds 86 billion words, for enhanced diachronic studies, and timestamped corpora in 18 languages for time-specific multilingual analysis.

References

  1. [1]
    Sketch Engine: Create and search a text corpus
    Sketch Engine is the ultimate tool to explore how language works. Its algorithms analyze authentic texts of billions of words (text corpora)What can Sketch Engine do? · Word sketch · Price List · Quick Start Guide
  2. [2]
    [PDF] The Sketch Engine
    Now, we have developed the Sketch Engine, a corpus tool which takes as input a corpus of any language and a corresponding grammar patterns and which generates ...
  3. [3]
    How to use a corpus to get information about words - Sketch Engine
    Sketch Engine is an online text analysis tool that works with large samples of language, called text corpora, to identify what is typical and frequent in a ...
  4. [4]
    Sketch Engine news: enhanced tools, new corpora, and Lexicom ...
    Sketch Engine news: enhanced tools, new corpora, and Lexicom 2024 in Spain ... These regular updates enable you to use Trends, the #diachronic analysis ...Missing: 2025 | Show results with:2025
  5. [5]
    New 15-billion-word English corpus | Sketch Engine
    Check our new 15-billion-word English corpus (enTenTen) comprised of texts from the Web until the end of 2015. We used our newest advanced cleaning method ...Expand Your Linguistic... · New Corpora, Tips And... · Lexicom 2025, New Corpora...
  6. [6]
    About us - Lexical Computing
    Lexical Computing is a supplier of word databases, lexicons, n-gram databases and other language data and a developer of the Sketch Engine corpus software.
  7. [7]
    [PDF] Ten Years On 1. Introduction - The Sketch Engine
    in Pakistan (as an official language but not the mother tongue of many people) is a ... dedicated corpus linguistic tools, Google may be the best tool to use. For ...
  8. [8]
    NoSketch Engine and Sketch Engine
    NoSketch Engine is an open source version of Sketch Engine with certain functionality limitations. NoSketch Engine does not contain any corpora.Missing: GPL | Show results with:GPL
  9. [9]
    Open-source Natural Language Processing tools - Lexical Computing
    NoSketch Engine is an open-source corpus query system based on Sketch Engine. NoSketch Engine does not feature any of the automated corpus building tools ...Missing: GPL | Show results with:GPL
  10. [10]
    Choose the right corpus | Sketch Engine
    Sketch Engine provides you hundreds of corpora in various sizes from tiny (less than million words) to really huge (10+ billion words).
  11. [11]
    Sketch Engine team
    Adam Kilgarriff founded Lexical Computing, the company behind Sketch Engine, in 2003 and remained a central figure of Sketch Engine until November 2014.
  12. [12]
    The Sketch Engine: ten years on | Lexicography
    Jul 10, 2014 · The Sketch Engine is a leading corpus tool ... corpus websites and corpus tools as available for lexicography and corpus linguistics.
  13. [13]
    (PDF) The Sketch Engine - ResearchGate
    Now, we have developed the Sketch Engine, a corpus tool which takes as input a corpus of any language and a corresponding grammar patterns and which generates ...
  14. [14]
    [PDF] SkELL: Web Interface for English Language Learning - Sketch Engine
    SkELL is a web interface for English language learning, derived from Sketch Engine, offering concordance, word sketches, and a thesaurus.
  15. [15]
    Old interface closes down - Sketch Engine
    Sketch Engine decided not to maintain two interfaces. For this reason, the old interface closes down and will not be available any more after 20 January 2020.
  16. [16]
    Sketch Engine changelog - Manatee
    This software consists of three main components, which enable searching and building text corpora. Bonito – a graphical user interface to corpora maintained, ...
  17. [17]
    EUR-Lex parallel Corpus | Sketch Engine
    The EUR-Lex parallel corpus is a collection of multilingual corpora in all the official languages of the European Union.
  18. [18]
    Discover the new Timeline and other features. - Sketch Engine
    Jul 1, 2024 · Track how word usage and frequency change over time with Sketch Engine's Timeline Function! Discover trends, uncover new words, and delve into detailed changes.
  19. [19]
    Automated word sense identification, multi-word term extraction for ...
    Sketch Engine supports monolingual and bilingual term extraction. Read more about linguistic tools for term extraction on our blog. Sketch Engine free trial.
  20. [20]
    List of corpora | Sketch Engine
    This is a list of corpora preloaded in Sketch Engine and available to Sketch Engine users. In addition to these corpora, Sketch Engine holds other corpora.
  21. [21]
    Concordance - most powerful corpus search | Sketch Engine
    The concordance is the most powerful tool with a variety of search options. It can find words, phrases, tags, documents, text types or corpus structures.
  22. [22]
    CQL – Corpus Query Language - Sketch Engine
    The Corpus Query Language is a special code or query language used in Sketch Engine to search for complex grammatical or lexical patterns.
  23. [23]
    Word sketch - collocations and word combinations - Sketch Engine
    The word sketch shows the most typical collocations and word combinations of each word in the language identified in a text corpus.
  24. [24]
    distributional thesaurus - Sketch Engine
    Nov 13, 2024 · It draws on the theory of distributional semantics. The automatically produced thesaurus is available for each word in the corpus.
  25. [25]
    [PDF] An efficient algorithm for building a distributional thesaurus (and ...
    The Sketch Engine now allows the user to prepare keyword lists for any subcorpus, either in relation to the full corpus or in relation to another subcorpus.
  26. [26]
    Timeline – language use over time | Sketch Engine
    The timeline function displays the changing frequency of a word or phrase over time. It provides a detailed graph with information about word frequency ...
  27. [27]
    Parallel concordance - searching translations - Sketch Engine
    The parallel concordance searches for words, phrases, tags, documents, text types or corpus structures in one language and displays the results together.
  28. [28]
    Build parallel and multilingual corpora - Sketch Engine
    This method only supports 2 languages. If your parallel corpus has more languages, an external tool or a manual procedure should be used for the alignment.
  29. [29]
    Writing a Sketch Grammar
    Word Sketch Grammar is a series of rules written in the CQL query language that search for collocations in a text corpus and categorize them according to their ...
  30. [30]
    Keywords and term extraction - Sketch Engine
    Keyword and term extraction identifies typical words, single and multi-word units, and results in keywords (single words) and terms (multi-words).
  31. [31]
    The best term extraction - Sketch Engine
    Term extraction or terminology extraction is an automatic method of analysing text in order to identify phrases which fulfil the criteria for terms.
  32. [32]
    Supported languages - Sketch Engine
    This page lists all supported languages for which there are publicly available corpora. Languages with user corpora only are not included.
  33. [33]
    SKELL – corpus tool for language learners - Sketch Engine
    A simple tool for students and teachers of English to easily check whether or how a particular phrase or a word is used by real speakers of English.
  34. [34]
    A Critical Review of SkELL (Sketch Engine for Language Learning)
    May 26, 2025 · Sketch Engine's simplified language learning interface offers learners authentic usage of words and phrases by tapping into its mother corpus ...
  35. [35]
    (PDF) Technology Integration and SkELL: A Novelty in English ...
    Aug 10, 2025 · On purpose, the present study introduces the implementation of SkELL (Sketch Engine for Language Learning) in English Foreign Language (EFL) ...
  36. [36]
    Parallel corpora | Sketch Engine
    EUR-Lex ... This is an enormous corpus of various documents. The documents cover various topics. Although it is formal language on the legal side, it covers ...
  37. [37]
    TenTen Corpora | Sketch Engine
    TenTen corpora are web-based text corpora, with 10+ billion words per language, built using specialized technology for linguistic content.
  38. [38]
    British National Corpus (BNC) search | Sketch Engine
    The British National Corpus (BNC) is a 100-million-word collection of samples of the written and spoken language of British English from the latter part of the ...
  39. [39]
    OpenSubtitles parallel corpora - Sketch Engine
    The OpenSubtitles parallel corpora are a collection of 60 corpora in 58 languages made up of translated movie subtitles in the OpenSubtitles database.
  40. [40]
    EcoLexicon corpus search - Sketch Engine
    Search the EcoLexicon corpus, an English corpus of contemporary environmental texts prepared by the LexiCon Research Group at the University of Granada.
  41. [41]
  42. [42]
    Happy New Year with a bunch of new corpora! - Sketch Engine
    In December 2024 we introduced new corpora for Lithuanian, Finnish and Swedish language.
  43. [43]
    corpus architect - Sketch Engine
    Nov 12, 2024 · an intuitive tool inside Sketch Engine for creating corpora from documents or the Web which does not require any expert knowledge.
  44. [44]
    Create a corpus from the web - Sketch Engine
    Create a multi-million-word corpus from the web within minutes. Fully automatic corpus building, lemmatization and tagging in 30+ languages.
  45. [45]
    Create a corpus by uploading files - Sketch Engine
    To create a corpus by uploading files, name it, select language, drag/drop files, and select 'I have my own texts'. Multiple files can be uploaded as a zip.
  46. [46]
    Preparing a Text Corpus for Sketch Engine: Overview
    Steps to prepare a text corpus for Sketch Engine · Prepare the corpus configuration file · Compile (index) the corpus · Verify corpus consistency, integrity and ...
  47. [47]
    Build a corpus from the web | Sketch Engine
    May 13, 2019 · Sketch Engine uses a deduplication procedure which is able to detect perfect duplicates as well as texts which were slightly adapted, shortened or extended.
  48. [48]
    Compiling corpus on local installation - Sketch Engine
    You are ready to compile the corpus in Sketch Engine. This can either be done from the Corpus Architect interface, or from the command line.
  49. [49]
    Fine-tune your corpus | Sketch Engine
  50. [50]
    [PDF] Manatee/Bonito -- A Modular Corpus Manager
    1. Rychlý, P.: Corpus managers and their effective implementation. Ph.D. thesis, Faculty of Informatics, Masaryk University (2000).
  51. [51]
    [PDF] Optimization of Regular Expression Evaluation within the Manatee ...
    Manatee is a state-of-the-art corpus management system providing facilities for efficient indexing (compiling) and searching billion-word-sized corpora. [6].
  52. [52]
    [PDF] Manatee, Bonito and Word Sketches for Czech
    Manatee serves as a base for the Sketch Engine [4]. As it was defined in [3], Word Sketches is a short corpus-based summary of a word's grammatical and ...
  53. [53]
    [PDF] Accelerating Corpus Search Using Multiple Cores - Sketch Engine
    The Manatee system (Rychlý, 2000) is a corpus manager, designed to be able to deal with extremely large corpora, optimized for fast query evaluation. It ...
  54. [54]
    NoSketchEngine | langui.ch /'læŋgwɪtʃ/ /'læŋgwɪdʒ/
    Welcome to NoSketch Engine, an open-source project combining Manatee and Bonito into a powerful and free corpus management system.
  55. [55]
    acdh-oeaw/noske-ubi9: Building NoSkE for and with UBI9 - GitHub
    NoSketch Engine is an open-source project combining Manatee and Bonito and Crystal into a powerful and free corpus management and search system.
  56. [56]
    The Sketch Engine Changelog - Bonito
    Manatee – a corpus management tool including corpus building and indexing, fast querying and providing basic statistical measures, see the changelog of Manatee ...
  57. [57]
    [PDF] Corpus Query System Bonito – Recent Development - Sketch Engine
    At Masaryk University in Brno, a corpus manager Manatee/Bonito [1] is being developed, that is able to perform wide variety of tasks including e.g. fast ...
  58. [58]
    [PDF] Proceedings of the 3rd Workshop on Building and Using ... - LREC
    May 22, 2010 · Corpus Architect. A recent release made available through the Sketch Engine website, Corpus Architect is a system incorporating BootCaT, a ...
  59. [59]
    (PDF) Sketch Engine for Bilingual Lexicography - ResearchGate
    Sketch Engine is a leading corpus query and corpus management tool that has been used for many large dictionary projects. The paper summarizes its features ...
  60. [60]
    Using Computational Lexicography for Dictionary Production with ...
    The Sketch Engine has been adopted by four of the UK's five major dictionary publishers, national language institutes in nine European countries and over 100 ...
  61. [61]
    (PDF) The Sketch Engine: Ten Years On - ResearchGate
    The Sketch Engine is a leading corpus tool, widely used in lexicography. Now, at 10 years old, it is mature software.
  62. [62]
    Oxford English Corpus search - Sketch Engine
    The last version of this corpus contains nearly 2.1 billion words (almost 2.5 billion tokens). For more information visit Oxford Dictionaries's website. The ...
  63. [63]
    Trends – diachronic analysis | Sketch Engine
    Timelines are available via the Concordance or Wordlist tools. They are computed the same as the graphs in Trends, however, they can be generated for any word ...
  64. [64]
    OPUS parallel corpora | Sketch Engine
    Search the OPUS parallel corpora, the multilingual corpora in 40 languages. Make concordance or generate n-gram, word lists, collocations and more...
  65. [65]
    7 - Leveraging Large Corpora for Translation Using Sketch Engine
    Jun 10, 2019 · ... Sketch Engine or by building a new parallel corpus from their TM using Corpus Architect, the corpus-building component of Sketch Engine ...
  66. [66]
    Historians | Sketch Engine
    ... to create a corpus from files or use our tool for building corpora from the web, e.g. downloading specific websites containing historical texts or books.
  67. [67]
    (PDF) The Sketch Engine as infrastructure for historical corpora
    Abstract A part of the case for corpus building is always that the corpus will have many users and uses. For that, it must be easy to use.
  68. [68]
    Teachers - Sketch Engine
    SKELL is a simple user-friendly interface to Sketch Engine for students and teachers of English. No need to worry about settings, just type a word and see how ...
  69. [69]
    [PDF] A Critical Review of SkELL (Sketch Engine for Language Learning)
    Embracing. Topal's (2022) framework, this media review critically evaluates SkELL by addressing its strengths and weaknesses as a language learning resource.
  70. [70]
    [PDF] Corpora and Language Learning with the Sketch Engine and SKELL
    The Sketch Engine (Kilgarriff et al 2004) is a leading corpus tool which has been in use for lexicography and language research since 2004. It has two ...
  71. [71]
    Sketch Engine and other tools for language analysis – CASS
    All Lancaster University staff and students have now access to Sketch Engine, an online tool for the analysis of linguistic data.
  72. [72]
    New: Sketch Engine, tool for language research | Library
    Mar 11, 2025 · Sketch Engine is a tool for language research, which can also be used for text analysis or text mining.
  73. [73]
    Tracking diachronic sentiment change of economic terms in times of ...
    Our analysis shows that there were three clearly defined epochs during the timeline of the study: pre-crisis in 2007, the outburst of the crisis of 2008–2012, ...
  74. [74]
    Using the Sketch Engine Corpus Query Tool for Language Teaching
    Editor's Note: The web has brought a myriad of tools to our students' fingertips, and Keith Barrs has shared an engaging tool for investigating how language is ...
  75. [75]
    Bibliography of Sketch Engine
    To cite Sketch Engine in academic publications, use the following papers. If you refer to Sketch Engine in general, choose from the papers in General ...
  76. [76]
    Integrating critical corpus and AI literacies in applied linguistics
    These workshops focused on the use of the corpus analysis software, Sketch Engine and the Generative AI tool, ChatGPT for vocabulary and grammar learning.