
EMBOSS

EMBOSS, the European Molecular Biology Open Software Suite, is a free and open-source bioinformatics software package developed specifically for molecular biology sequence analysis and related tasks.^[1] It comprises hundreds of well-documented command-line applications that support a wide range of functions, including sequence alignment, database searching, protein structure prediction, and phylogenetic analysis, all unified under a consistent interface.^[2] Designed to run on UNIX-like systems, Microsoft Windows, and macOS, EMBOSS emphasizes extensibility through its AJAX library and integration with other open-source tools, making it accessible for both novice and expert users in the field.^[3]

The origins of EMBOSS trace back to the early 1980s amid the dominance of commercial software like GCG, leading to the creation of the EGCG extensions by the EMBnet community in 1988, which served over 10,000 users at 150 sites.^[3] In 1996, following GCG's decision to withhold source code, development of EMBOSS began under Peter Rice, Alan Bleasby, and Thure Etzold at the European Bioinformatics Institute, aiming to provide a free alternative that would replace EGCG while adding new capabilities and fostering open-source collaboration.^[3] The first release, version 1.0.0, occurred on July 15, 2000, with subsequent annual updates funded by the UK Biotechnology and Biological Sciences Research Council (BBSRC), culminating in version 6.6.0 in 2013.^[3]^[4]

EMBOSS holds significant importance in bioinformatics as a community-driven project that counters the trend toward proprietary software, offering robust, production-ready tools without licensing restrictions under the GNU General Public License.^[1] Its EMBASSY packages extend functionality by incorporating third-party applications, such as those for remote database access and advanced web services, enhancing its utility in research pipelines.^[5] Widely adopted in academic and institutional settings, EMBOSS supports the
EMBnet network's mission to democratize access to high-quality sequence analysis, with tens of thousands of downloads and ongoing contributions from developers worldwide. As of 2025, the last major release was version 6.6.0 in 2013, and it remains available in major open-source distributions.^[1]^[3]^[6]

Introduction

Overview and Purpose

EMBOSS, the European Molecular Biology Open Software Suite, is a free, open-source software package designed for molecular biology and bioinformatics analysis.^[1] It offers a comprehensive collection of command-line tools that enable users to perform a wide range of sequence manipulation and analysis tasks efficiently.^[7] The primary purpose of EMBOSS is to deliver robust, accessible tools for sequence analysis and related bioinformatics workflows, specifically tailored to the needs of the EMBnet (European Molecular Biology network) user community.^[1] This focus ensures that the suite addresses practical requirements in molecular biology research, promoting open-source principles to foster collaboration and accessibility across academic and research environments.^[8] Targeted at molecular biologists, bioinformaticians, and researchers who require straightforward, command-line-based solutions for everyday sequence handling, EMBOSS prioritizes usability without demanding advanced programming skills.^[1] Key benefits include seamless support for standard file formats like FASTA and GenBank, efficient batch processing capabilities with no imposed size restrictions, and a uniform interface across applications that simplifies learning and operation for non-programmers.^[1] These features make EMBOSS a reliable foundation for routine bioinformatics tasks, enhancing productivity in sequence-centric studies.^[7]

History and Development

The origins of EMBOSS trace back to the early 1980s, when the commercial Genetics Computer Group (GCG) Wisconsin Package dominated molecular biology software but imposed high costs and licensing restrictions that limited access for academic and research communities, particularly within the European Molecular Biology network (EMBnet).^[3] In response, EMBnet members began developing free alternatives, starting with extensions to GCG known as EGCG (Extended GCG), which emerged as a collaborative effort by 1988 to provide enhanced sequence analysis tools without proprietary dependencies.^[3] However, changes to GCG's source code licensing in 1996 halted further EGCG development, prompting the need for a fully independent open-source suite.^[9]

EMBOSS was formally initiated in 1996 by Peter Rice and Alan Bleasby at EMBL, with early contributions from Thure Etzold, aiming to create a comprehensive, freely available package tailored to EMBnet's needs for sequence analysis and beyond.^[3] The project gained momentum through EMBnet workshops, starting with the first in September 1998 at Hinxton, where 30 participants collaborated on its design.^[10] By 1998, EMBOSS merged with the UK Biotechnology and Biological Sciences Research Council (BBSRC)-funded SEQNET project and was hosted at the MRC Rosalind Franklin Centre for Genomics Research (RFCGR). Initial development involved a core team including Rice, Bleasby, and later Jon Ison, all based at the European Bioinformatics Institute (EBI). The first public version was released around 2000, marking EMBOSS as a mature open-source alternative.^[9]

Funding played a crucial role in EMBOSS's growth, beginning with a Wellcome Trust grant from 1997 to 2000 that supported initial tool development and integration.^[9] This was followed by joint funding from BBSRC and the Medical Research Council (MRC) from 2001 to 2004, enabling expansion to over 100 applications.
The closure of RFCGR in 2004 threatened the project, but new BBSRC funding facilitated its relocation to EMBL-EBI in 2005. Subsequent BBSRC grants, including BB/D018358/1 (2006-2009) and BBR/G02264X/1 (May 2009 onward), sustained development through 2011 and beyond, focusing on maintenance and community contributions.^[11]^[9] Over time, EMBOSS evolved from basic sequence utilities into a suite of over 200 integrated applications, incorporating third-party tools and adapting to open-source standards while addressing limitations of earlier packages like EGCG.^[3] The latest stable release, version 6.6.0 in July 2013, included enhancements such as XML data handling and improved efficiency for large datasets, ensuring compatibility with modern Unix-based computing environments.^[4] Ongoing updates via community patches have maintained its relevance, though major releases have slowed as the suite stabilized.^[12]

Features

Core Capabilities

EMBOSS provides robust support for diverse input and output formats essential for bioinformatics workflows, including FASTA, EMBL, GenBank, Swiss-Prot, and PDB, among others such as GCG, Clustal, and Phylip.^[13] This enables seamless handling of sequence data from various sources without manual conversion, as EMBOSS automatically detects input formats by examining file content and structure.^[14] For output, users can specify formats explicitly or rely on defaults, ensuring compatibility with downstream analyses and databases.^[13] The suite excels in batch processing and automation, allowing users to handle large datasets through command-line scripting on Unix-like systems.^[15] Tools can be invoked in loops or pipelines via shell scripts, supporting parallel execution where multiple instances run concurrently on multi-core systems or clusters, which is particularly useful for high-throughput sequence analysis. This design facilitates automation in workflows, such as processing entire genomic datasets or integrating with job schedulers for distributed computing.^[15] Core data manipulation capabilities include functions for sequence editing, such as reformatting, trimming, and reverse complementation via tools like seqret; restriction site analysis to identify and map enzyme cut sites using databases like REBASE^[16]; and phylogenetic tree construction from aligned sequences employing methods like neighbor-joining.^[17] These operations support precise editing of nucleotide or protein sequences and enable the extraction of biologically relevant features, such as open reading frames or motifs.^[18] Application groups in EMBOSS organize these tools thematically, such as nucleic or protein handling, to streamline access.^[19] EMBOSS features a modular design that enhances extensibility, permitting users to develop and integrate custom applications through AJAX Command Definition (ACD) files, which define parameters, qualifiers, and validation rules for new 
programs.^[20] This allows seamless addition of specialized tools while maintaining consistency with the suite's architecture, as developers can leverage existing libraries for I/O and processing.

Performance is optimized for computationally intensive tasks, with efficient algorithms for sequence alignment, such as the global (needle) and local (water) methods, and motif searching against databases like PROSITE, enabling rapid scans of large query sets.^[21] Built-in statistical methods, including E-values and significance scores, validate results by assessing match reliability against background models, ensuring robust interpretation without external software. These optimizations, combined with dynamic memory allocation, support analysis of sequences without predefined length limits, scaling effectively to available system resources.^[14] These features are based on EMBOSS version 6.6.0 (2013), the latest release as of 2025.^[12]
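The difference between the global and local alignment methods mentioned above can be illustrated with a small dynamic-programming sketch. This is not the needle or water implementation: it uses a flat match/mismatch score and a single linear gap penalty (the values 5, -4, and -10 are illustrative assumptions), whereas the EMBOSS tools use substitution matrices and separate gap opening and extension penalties. The function name `align_score` is hypothetical.

```python
def align_score(a, b, match=5, mismatch=-4, gap=-10, local=False):
    """Return the best alignment score of strings a and b.

    local=False: global score (Needleman-Wunsch style, whole sequences).
    local=True:  local score (Smith-Waterman style, best-scoring region).
    Simplification: one linear gap penalty instead of open/extend penalties.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment charges for gaps at the sequence ends.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment may restart anywhere, so never go below 0.
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell

    return best if local else score[-1][-1]


print(align_score("ACGT", "AGT"))                         # global score
print(align_score("TTTACGTTT", "GGACGGG", local=True))    # local score
```

Both modes fill an (m+1) by (n+1) table once, so the running time grows with the product of the two sequence lengths.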

User Interfaces

The primary user interface for EMBOSS is its command-line interface (CLI), which provides a consistent syntax across all applications through the AJAX Command Definition (ACD) language.^[22] This allows users to specify inputs using standardized qualifiers, such as -sequence for providing sequence data, enabling straightforward execution from the terminal with dynamically calculated defaults based on the input provided.^[23]

For graphical interaction, EMBOSS supports Jemboss, a Java-based graphical user interface that facilitates visual workflow design and parameter adjustment.^[24] Jemboss enables users to build and manage analysis pipelines interactively, including batch processing, job queuing with systems like NQS or OpenPBS, and editing of alignments or sequences through dedicated tools, all while parsing ACD files for seamless integration with EMBOSS applications.^[23] Web-based access is available through integrations like EMBOSS Explorer, which offers a browser-based graphical interface for executing EMBOSS tools without requiring local installation or configuration.^[25] This interface simplifies accessibility by handling tool dependencies and providing a demo environment for immediate use, supported by organizations such as the National Research Council of Canada.^[25]

Scripting support builds on the same ACD-defined command line: because every application accepts qualifiers in a uniform way, languages like Perl and Python can drive EMBOSS tools through wrapper modules, allowing programmatic control and automation of analyses.^[22] For example, BioPerl's Bio::Factory::EMBOSS and Bio::Tools::Run::EMBOSSApplication modules enable running EMBOSS programs from Perl scripts by constructing command lines and capturing outputs.^[26] Similar functionality is available in Python through command-line wrappers, though the dedicated Bio.Emboss.Applications module in Biopython is now obsolete.^[27] Simple wrapper scripts can thus automate repetitive tasks, such as processing multiple sequences with tools like needle for pairwise
alignment. Customization options enhance usability, permitting users to create personal menus or shell aliases for frequent commands and set environment variables like EMBOSS_DATA to specify paths for data files and resources.^[23] These features, documented in the EMBOSS user's guide, allow tailoring the interface to specific workflows without altering core application code.^[28]
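As a rough illustration of the wrapper-script approach described above, the following sketch builds an EMBOSS command line from keyword arguments using only the Python standard library. The helper names `emboss_command` and `run_emboss` are hypothetical and not part of any EMBOSS or Biopython API; the qualifiers shown for needle are the ones used elsewhere in this article. Actually running a command assumes the EMBOSS binaries are on the PATH, so the demo below only prints it.

```python
import shlex
import subprocess


def emboss_command(program, **qualifiers):
    """Build an argument list such as ['needle', '-asequence', 'seq1.fa', ...]."""
    args = [program]
    for name, value in qualifiers.items():
        args.append(f"-{name}")   # EMBOSS qualifiers are written as -name value
        args.append(str(value))
    return args


def run_emboss(program, **qualifiers):
    """Run the tool and return the completed process (requires EMBOSS on PATH)."""
    return subprocess.run(
        emboss_command(program, **qualifiers),
        check=True,            # raise CalledProcessError on a non-zero exit code
        capture_output=True,
        text=True,
    )


cmd = emboss_command(
    "needle",
    asequence="seq1.fa",
    bsequence="seq2.fa",
    gapopen=10.0,
    gapextend=0.5,
    outfile="align.needle",
)
print(shlex.join(cmd))
```

Passing an argument list rather than a shell string avoids quoting problems when file names contain spaces.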

Applications

Application Groups

EMBOSS applications are organized into over 20 logical groups based on their functions, enabling users to navigate and select tools for specific bioinformatics tasks such as sequence analysis and phylogenetic studies.^[29] This structure promotes efficient discovery and supports the construction of analytical workflows by grouping related functionalities.^[30] The primary categories include Nucleic, which encompasses subgroups for tasks like sequence alignment, restriction enzyme mapping, codon usage analysis, gene finding through ORF detection and promoter prediction, motif searching, repeat identification, and primer design.^[29] Similarly, the Protein category addresses protein-specific analyses, including motif searching, secondary and tertiary structure prediction, composition evaluation, profile generation, and mutation simulation.^[29] Phylogeny groups concentrate on evolutionary analyses, covering tree building, distance matrix calculations, consensus tree methods, and handling of continuous or discrete character data.^[29] Utilities form another key category, providing essential functions for file conversion, sequence merging, database creation, indexing, and general data manipulation.^[29] In addition to these core groups, EMBASSY packages serve as third-party extensions that integrate seamlessly with EMBOSS, offering specialized applications for areas like hidden Markov model analysis via HMMER wrappers and protein domain classification.^[31] Examples of EMBASSY packages include DOMAINATRIX for domain research, PHYLIPNEW for phylogenetic tools, and others focused on structure prediction and sequence editing, collectively expanding the suite's capabilities.^[32] The overall EMBOSS distribution includes over 200 core applications across these groups, with EMBASSY adding numerous further tools through its modular packages.^[7] To aid navigation, the seealso tool provides cross-references to related applications within and across groups, enhancing workflow 
integration.^[33]

Notable Tools

EMBOSS includes several widely used tools for sequence analysis, each designed for specific bioinformatics tasks such as alignment, translation, and visualization. These tools leverage established algorithms and databases to provide reliable results for researchers working with nucleotide and protein sequences.^[1] Needle performs global pairwise sequence alignment using the Needleman-Wunsch algorithm, which computes the optimal alignment over the entire length of two sequences by dynamic programming in O(mn) time complexity, where m and n are the sequence lengths. It supports both nucleotide and protein sequences, defaulting to the EDNAFULL matrix for DNA and EBLOSUM62 for proteins, and allows customization of gap penalties, including a default gap opening penalty of 10.0 and extension penalty of 0.5. Needle is commonly applied in genome alignment tasks to identify overall similarities between full-length sequences, such as comparing homologous genes across species.^[34] In contrast, Water implements the Smith-Waterman algorithm for local pairwise alignments, identifying the highest-scoring regions of similarity between sequences through a modified dynamic programming approach optimized for speed, also running in O(mn) time. Like Needle, it uses default matrices (EDNAFULL for nucleotides, EBLOSUM62 for proteins) and similar gap penalty settings (opening 10.0, extension 0.5), but focuses on subsequences rather than full sequences. This tool is particularly useful for detecting conserved domains or motifs within larger sequences, such as finding similar exons in genomic data.^[35] Sixpack translates DNA sequences into proteins across all six reading frames—three forward and three reverse—while highlighting open reading frames (ORFs) longer than a user-specified minimum length, defaulting to 1 amino acid, to aid in gene prediction. 
It employs a selectable genetic code (e.g., standard or vertebrate mitochondrial) and outputs formatted displays with numbering and optional ORF extraction in FASTA format. Researchers use Sixpack for initial screening of unannotated genomic regions to locate potential coding sequences.^[36] The Restrict tool scans nucleotide sequences for cleavage sites of specified restriction enzymes from the REBASE database, reporting positions in a tabular format and optionally generating fragment length lists or maps for cloning experiments. It filters sites by criteria like minimum recognition length (default 4 bases), number of cuts (default 1 to 2,000,000,000), and enzyme types (e.g., blunt or sticky ends), supporting ambiguities and methylation patterns. This is essential for molecular cloning workflows, such as designing restriction digests for vector insertion.^[37] Transeq translates nucleotide sequences to proteins in one or more of the six frames, using a chosen genetic code and options to trim stop codons or clean terminal asterisks, producing outputs labeled by frame for easy identification. Complementing this, Backtranseq reverses the process by back-translating a protein sequence to the most likely nucleotide sequence based on codon usage tables (default human, customizable), facilitating codon optimization for expression studies. Together, these tools support workflows in gene synthesis and protein engineering, such as optimizing codons for heterologous expression systems.^[38]^[39] Dotmatcher creates thresholded dot plots to visualize sequence similarities, comparing all pairwise positions with a scoring matrix (EBLOSUM62 or EDNAFULL) over a sliding window (default size 10), plotting dots where scores exceed a threshold (default 23) to reveal diagonals indicating alignments, repeats, or insertions/deletions. 
It outputs graphics in formats like PNG or PostScript, aiding quick visual assessment of structural features in sequences.^[40] For protein structure prediction, Pepwheel generates helical wheel diagrams projecting residues onto a circle viewed along the helix axis, using symbols like squares for hydrophobic residues to highlight amphipathicity, with defaults for alpha helices (18 steps per 5 turns). This visualization helps predict transmembrane helices or interaction interfaces in proteins.^[41]
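The six-frame translation that Sixpack and Transeq perform can be sketched in a few lines of Python. This is an independent illustration using the standard genetic code only, not the tools' own code. The function names are hypothetical, and the numbering of the reverse frames (-1, -2, -3 taken as offsets into the reverse complement) is a simplifying assumption that may not match the tools' conventions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Map frame number (1, 2, 3, -1, -2, -3) to its translated protein string."""
    rev = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


def open_reading_frames(protein, min_size=1):
    """Split a translation at stop codons and keep stretches of min_size or more."""
    return [part for part in protein.split("*") if len(part) >= min_size]


for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
    print(frame, protein, open_reading_frames(protein))
```

Here an ORF is simply any stretch between stop codons, mirroring the minimum-length filter described above. A search that also requires a start codon would need an extra check.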

Architecture and Implementation

Programming Libraries

The core of EMBOSS's programming infrastructure is the AJAX library, a comprehensive C library that provides foundational functions for input/output operations, sequence handling, and graphics rendering in bioinformatics applications.^[42] The library supports file I/O through modules like ajfile and ajfileio, which manage buffering, file lists, and data streams essential for reading and writing biological data formats.^[42] For sequence handling, AJAX includes the ajseq module, which defines datatypes and functions for manipulating biological sequences, such as ajSeqRead for parsing input sequences from files or databases in various formats like FASTA or EMBL.^[42] Graphics capabilities are handled via the ajgraph module, which interfaces with the PLplot library to generate plots, such as sequence alignments or phylogenetic trees, ensuring consistent visualization across EMBOSS tools.^[42]

Complementing AJAX is the ACD (AJAX Command Definition) system, which standardizes the definition of application parameters through declarative files written in a simple, XML-like syntax.^[43] These ACD files describe inputs (e.g., sequences, files), outputs, and qualifiers with attributes like defaults, ranges, and prompts, enabling uniform command-line interface (CLI) parsing across all EMBOSS programs.^[43] For instance, a parameter for a sequence input might be defined as sequence: inputseq [standard: "Y"], allowing the system to validate and process CLI arguments like -sequence file.fasta while handling missing values interactively.^[43] This abstraction layer simplifies development by decoupling parameter logic from core application code, promoting reusability and consistency in user interfaces.^[43]

EMBOSS also incorporates utility modules within NUCLEUS, a companion library to AJAX, offering specialized support for common operations in bioinformatics programming.
Mathematical utilities, such as those in the embmat module, provide datatypes and functions for matrix operations critical to sequence alignments, including substitution matrices like BLOSUM or PAM for scoring pairwise or multiple alignments.^[44] File management is facilitated by modules like embread for reading configuration data files and embdata for accessing embedded resources, ensuring efficient handling of auxiliary data without external dependencies.^[44] Error handling is streamlined through embexit, which offers standardized exit functions to report failures, log diagnostics, and clean up resources gracefully during application runtime.^[44] These libraries, as implemented in EMBOSS version 6.6.0 (released July 15, 2013), remain the foundation, with no major architectural changes as of 2025.^[44]^[4]

The development workflow for creating custom EMBOSS tools leverages these libraries through a structured process centered on compilation and testing. Developers write C source files that include emboss.h for AJAX access, initialize the environment with embInit(), and retrieve parameters via ACD-integrated functions like ajAcdGetSeq. To compile, tools are added to the Makefile.am (e.g., under bin_PROGRAMS), with corresponding ACD files placed in the acd/ directory, followed by running make or ajMake to build executables.^[45] Testing involves validating ACD files with the acdc utility and integrating into EMBOSS's quality assurance (QA) regression suites, which automate checks for output correctness and compliance with expected behaviors.^[45] A representative example of using AJAX APIs for a simple sequence reader is shown below, where the program reads a sequence via ACD and processes it minimally before exiting:

```c
#include "emboss.h"

int main(int argc, char **argv) {
    AjPSeq seq = NULL;

    embInit("seqreader", argc, argv);  /* Initialize EMBOSS and parse seqreader.acd */
    seq = ajAcdGetSeq("sequence");     /* Read the sequence named by the ACD parameter */

    /* Process the sequence here, e.g. inspect its length with ajSeqGetLen(seq) */

    ajSeqDel(&seq);                    /* Free the sequence object */
    embExit();                         /* Standard EMBOSS cleanup and exit */
    return 0;
}
```

This snippet demonstrates the integration of ACD for input and AJAX for sequence management, with the corresponding ACD file defining the sequence parameter.^[45]

Integration with Other Software

EMBOSS facilitates integration with external bioinformatics tools through its EMBASSY framework, which consists of packages that wrap third-party applications to provide a unified interface consistent with native EMBOSS programs. These wrappers allow users to access advanced functionalities from external suites without leaving the EMBOSS environment, ensuring seamless command-line operation and standardized input/output handling. For instance, the EMBASSY Clustal Omega package (eomega) wraps the Clustal Omega multiple sequence alignment tool, enabling progressive alignment of protein or nucleotide sequences using seeded guide trees and HMM profile-profile techniques directly via EMBOSS syntax.^[46]

Pipeline integration in EMBOSS emphasizes modular workflows in which tools are chained via Unix-style piping, so that each step processes the sequences produced by the previous one. This allows outputs from one application to serve as inputs for another, promoting efficient, scriptable analyses. A common example involves using seqret to reformat input sequences (e.g., converting FASTA to EMBL format) and then piping the result directly to needle for pairwise global alignment using the Needleman-Wunsch algorithm. Such piping supports both simple linear workflows and more complex scripts, enhancing reproducibility in high-throughput settings. Additionally, graphical tools like G-Pipe enable the definition and parameterization of pipelines using XML-stored protocols, integrating EMBOSS applications with web interfaces for broader workflow management.^[47]

EMBOSS demonstrates strong compatibility with prominent bioinformatics ecosystems, allowing it to be embedded in diverse computational pipelines.
Through the BioPerl library, EMBOSS applications can be invoked programmatically in Perl scripts, leveraging Bio::Factory::EMBOSS to execute tools like alignment or motif finding while handling sequence objects and parsing outputs in BioPerl formats.^[26] In Galaxy, numerous EMBOSS tools are wrapped as native modules, enabling their use within interactive workflows for tasks such as sequence alignment (e.g., needleall) or pattern searching (e.g., fuzznuc), with automatic provenance tracking and visualization.^[48] For high-throughput platforms like Snakemake, EMBOSS's command-line nature permits easy incorporation into rule-based workflows, where rules can call EMBOSS executables for scalable, dependency-managed analyses on cluster environments.^[49] Database access in EMBOSS is designed for flexibility, supporting both local installations and remote querying to streamline data retrieval in integrated workflows. Built-in modes include single-entry access by ID, query-based retrieval for multiple entries (e.g., via accession numbers or keywords), and full database streaming, configurable through the EMBOSS data resource catalogue. Remote access is achieved via URL methods, where databases are defined with web endpoints (e.g., SRSWWW or direct HTTP queries), allowing tools to fetch sequences from servers like those at EMBL-EBI without local indexing. This supports integration with external resources, such as querying NCBI or UniProt via constructed URLs, ensuring compatibility with distributed computing setups.^[50] Extensions to EMBOSS are enabled through user-contributed EMBASSY applications, which expand the suite for specialized analyses while maintaining the core interface. The EMBASSY framework provides a template (MYEMBOSS) for developers to create custom wrappers, facilitating community contributions for niche tools. 
A prominent example is the EMBASSY PHYLIP package (phylipnew), which adapts Joe Felsenstein's PHYLIP phylogeny inference programs—such as dnadist for distance calculation and neighbor for tree building—into EMBOSS-compatible executables, supporting phylogenetic workflows with EMBOSS sequence handling. These extensions are distributed alongside core EMBOSS, allowing users to install and invoke them identically to native tools.^[51]

Installation and Usage

System Requirements

EMBOSS primarily supports Unix-like operating systems, including Linux distributions such as Red Hat, SuSE, and Debian, as well as Solaris, Tru64 Unix, and macOS.^[28] It also runs on Windows through Cygwin, which emulates a Unix environment, or via pre-compiled binaries.^[52] Hardware requirements for EMBOSS are minimal, accommodating any modern mid-range PC.^[53] Disk space needs range from 100 MB to 200 MB for the core installation using shared libraries, potentially tripling with static executables, plus additional space for databases.^[12]

Key software dependencies include a C compiler like GCC and standard C libraries for compilation and execution.^[54] Optional components for graphical features, such as output in PNG format, require X11 development libraries or libgd (version 2.0.28 or later).^[54] Certain applications may need third-party tools in the system PATH, such as ClustalW for multiple sequence alignment or Primer3 for primer design.^[55] EMBOSS relies on a dedicated data directory, EMBOSS_DATA, containing essential files like enzyme tables, sequence motifs, and substitution matrices (e.g., BLOSUM62), totaling around 100 MB in size.^[56] These files are installed by default in locations like /usr/local/share/EMBOSS/data and are accessed via the embossdata utility.^[28] As of November 2025, the latest stable version is 6.6.0 (released July 15, 2013, with subsequent patches in distributions).^[57]

Basic Usage Examples

To use EMBOSS effectively, the environment must first be configured after installation, typically by adding the binary directory to the system PATH variable. For a standard installation in /usr/local/emboss, users can set this in a shell like bash by executing export PATH=/usr/local/emboss/bin:$PATH.^[58] Similarly, for csh or tcsh, the command is set path = (/usr/local/emboss/bin $path); rehash.^[58] The EMBOSS data directory, which contains essential files for many tools, is often automatically set during installation but can be specified via the EMBOSS_DATA environment variable if needed, such as export EMBOSS_DATA=/usr/local/emboss/share/EMBOSS/data.^[59] To verify the setup, run embossversion, which outputs the package version (e.g., EMBOSS 6.6.0) to confirm accessibility and correct installation.^[60] A simple introductory task is sequence format conversion using the seqret tool, which reads and writes sequences in various formats. For example, to convert a FASTA file to EMBL format, execute seqret myfile.fasta embl::output.embl, where the input is specified positionally and the output uses the USA (Uniform Sequence Address) notation with the desired format prefix.^[61] Alternatively, using qualifiers for clarity: seqret -sequence myfile.fasta -outseq output.embl -osformat embl.^[62] This command processes the input file and generates the reformatted output without further prompts if all parameters are provided.^[62] For pairwise sequence alignment, the needle tool implements the Needleman-Wunsch algorithm with user-defined gap penalties. 
A basic example aligns two FASTA files: needle -asequence seq1.fa -bsequence seq2.fa -gapopen 10.0 -gapextend 0.5 -outfile align.needle.^[63] Here, -gapopen sets the penalty for initiating a gap (default 10.0), and -gapextend adjusts the extension penalty (default 0.5); the output is a standard alignment file viewable with tools like showalign.^[63] If qualifiers are omitted, needle prompts interactively for inputs.^[62]

EMBOSS supports batch processing through shell scripting or list files, enabling operations on multiple inputs efficiently. For instance, to apply the restrict tool (for restriction site analysis) to all FASTA files in a directory, use a bash loop: for file in *.fa; do restrict "$file" -outfile "${file%.fa}.restrict"; done.^[64] Alternatively, for seqret on a list of files, create a file input.list with sequence USAs or paths (one per line) and run seqret @input.list -outseq batch_output.embl -osformat embl.^[64] This processes all entries non-interactively, appending results to the output file.

Common errors in EMBOSS usage often stem from path misconfigurations or missing resources, such as "Command not found" for tools, resolved by verifying and resetting the PATH as described earlier.^[65] Another frequent issue is failure to locate data files (e.g., scoring matrices), indicated by errors like "Data file not found," which can be fixed by confirming or setting EMBOSS_DATA to the correct directory and checking available files with embossdata -showall.^[66] Users should also ensure input files exist and match expected formats to avoid parsing errors.^[64]
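The batch patterns above translate directly to a scripting language. The sketch below, using only the Python standard library, derives one restrict command per FASTA file (the same output naming as the shell loop) and writes a list file for seqret. The function names `restrict_jobs` and `write_list_file` are hypothetical, and the sketch only builds the commands without running them, so it makes no assumption about which further qualifiers restrict would prompt for. The temporary directory and its files exist only for the demonstration.

```python
import tempfile
from pathlib import Path


def restrict_jobs(directory, pattern="*.fa"):
    """Return one ['restrict', input, '-outfile', output] list per matching file."""
    jobs = []
    for path in sorted(Path(directory).glob(pattern)):
        outfile = path.with_suffix(".restrict")   # same as ${file%.fa}.restrict
        jobs.append(["restrict", str(path), "-outfile", str(outfile)])
    return jobs


def write_list_file(paths, list_path):
    """Write one sequence path per line and return the '@listfile' argument."""
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return f"@{list_path}"


# Demonstration on throwaway files; nothing here invokes EMBOSS itself.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.fa", "b.fa", "notes.txt"):
        (Path(tmp) / name).write_text(">example\nACGT\n")

    demo_jobs = restrict_jobs(tmp)
    demo_list = write_list_file([job[1] for job in demo_jobs], Path(tmp) / "input.list")
    demo_list_lines = (Path(tmp) / "input.list").read_text().splitlines()

    for job in demo_jobs:
        print(" ".join(job))
    print("seqret", demo_list, "-outseq batch_output.embl -osformat embl")
```

Each argument list can then be handed to `subprocess.run` on a system where EMBOSS is installed.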

Community and Licensing

Development Team and Contributions

The core development team for EMBOSS consists of Peter Rice as the lead developer (Oryza Bioinformatics Ltd), alongside Alan Bleasby (European Bioinformatics Institute, Hinxton, UK) and Jon Ison (Odin Informatics Limited), who were previously all based at the European Bioinformatics Institute (EBI).^[67]^[68] Historical contributors include members from the EMBnet network, such as Thure Etzold, who collaborated on the project's inception in 1996.^[3]^[10] EMBOSS is hosted by the Open Bioinformatics Foundation (OBF), a non-profit organization dedicated to open-source bioinformatics software.^[69] The project has received institutional support through collaborations and funding from the Biotechnology and Biological Sciences Research Council (BBSRC) and the Medical Research Council (MRC), enabling its development and maintenance over the years.^[67]^[9]

Contributions to EMBOSS are welcomed via the project's SourceForge repository, where users can submit patches, new applications, or enhancements in response to feature requests and bug reports.^[68]^[70] All submissions must adhere to the coding standards defined in the AJAX library, which provides the foundational functions, data structures, and algorithms for EMBOSS applications, ensuring consistency and portability.^[42]^[71] Community engagement has historically occurred through dedicated mailing lists: one for general user discussions and announcements, one for developer coordination, and one for release notifications; however, activity on these lists has been low since around 2013.^[72] Bug tracking and support requests are handled via the SourceForge tracker, though recent issues are limited.^[73] The team organized hands-on workshops and courses on bioinformatics software development using EMBOSS, held periodically up to around 2007 to foster skill-building and project involvement.^[74]^[75] As of November 2025, EMBOSS maintenance emphasizes stability, with
efforts from distribution communities (such as Debian and Bioconda) focusing on bug fixes and ensuring compatibility with contemporary operating systems through updated packages and minor patches, despite the last major release (version 6.6.0) occurring in 2013.^[57]^[6]^[76]

Licensing and Distribution

EMBOSS applications are released under the GNU General Public License (GPL) version 2 or later, which permits free use, modification, and redistribution, provided that any distributed derivatives are also licensed under the GPL.^[77] The core libraries, AJAX and NUCLEUS, are released under the GNU Library General Public License (LGPL) version 2 or later, which allows them to be linked into both open-source and proprietary software while requiring that distributed modifications to the libraries themselves remain under the LGPL.^[77] Under these licenses users can freely modify the source code and create derivative works, such as the EMBASSY packages, but must share any redistributed modifications under the same terms.^[14]

The software is distributed through several channels. Source code and binaries are available via SourceForge, the primary hosting platform for EMBOSS.^[12] Official tarballs for stable releases can be downloaded from the EMBOSS FTP server at ftp://emboss.open-bio.org/pub/EMBOSS/, including versions for various platforms and patches for bug fixes.^[78] Pre-compiled packages are provided for Debian and Ubuntu on architectures such as Intel x86, AMD64, Alpha, and ARM, enabling straightforward installation via package managers.^[12]^[79]

For publications that use EMBOSS tools, the recommended citation is the foundational paper by Rice, Longden, and Bleasby (2000), which introduces the suite, or the EMBOSS User's Guide by Rice, Bleasby, and Ison.^[80] Commercial use is permitted under the GPL and LGPL, so EMBOSS can be incorporated into for-profit applications or services; the licenses disclaim any warranty and require that source code be made available when modified versions are distributed.
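
The download location and release naming described above can be sketched in a few lines of Python. This is an illustrative sketch rather than anything shipped with EMBOSS: the helper names `tarball_url` and `find_emboss` are hypothetical, and the `EMBOSS-<version>.tar.gz` file name is an assumption based on the release announcements (for example `EMBOSS-6.6.0.tar.gz`), so the actual server layout may differ.

```python
import shutil
from typing import Optional

# Base location of the official tarballs, as quoted in the text.
EMBOSS_FTP_BASE = "ftp://emboss.open-bio.org/pub/EMBOSS/"


def tarball_url(version: str) -> str:
    """Build the assumed download URL for a stable release tarball.

    Assumption: releases are named EMBOSS-<version>.tar.gz and sit
    directly under EMBOSS_FTP_BASE.
    """
    parts = version.split(".")
    # EMBOSS version numbers have three dot-separated numeric parts.
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(
            f"expected a three-part version such as 6.6.0, got {version!r}"
        )
    return f"{EMBOSS_FTP_BASE}EMBOSS-{version}.tar.gz"


def find_emboss(program: str = "embossversion") -> Optional[str]:
    """Return the path of an installed EMBOSS program, or None if not on PATH."""
    return shutil.which(program)


if __name__ == "__main__":
    print(tarball_url("6.6.0"))
    location = find_emboss()
    print(location if location else "EMBOSS not found on PATH")
```

Checking for `embossversion` on the search path is a lightweight way to tell whether a packaged installation (for instance from Debian or Bioconda) is already present before downloading a source tarball.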

References

[1]
About EMBOSS
EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (eg EMBnet) user community.
[2]
The Applications (programs) - Emboss
The programs are listed in alphabetical order, Look at the individual applications or go to the GROUPS page to search by category. ... Report the current EMBOSS ...
[3]
1.1. History - emboss
The origins of EMBOSS go back to the early 1980s when the prevailing general-purpose molecular biology software was a commercial package called GCG.
[4]
EMBOSS Homepage
A high-quality package of free, Open Source software for molecular biology. More > Applications Hundreds of useful, well documented applications.
[5]
GEMBASSY: an EMBOSS associated software package for ...
Aug 29, 2013 · The popular European Molecular Biology Open Software Suite (EMBOSS) currently contains over 400 tools used in various bioinformatics ...
[6]
1. Introduction to EMBOSS
The European Molecular Biology Open Software Suite (EMBOSS) is a high quality, well documented package of open source software tools for molecular biology.
[7]
EMBOSS: the European Molecular Biology Open Software Suite
EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000 Jun;16(6):276-7. doi: 10.1016/s0168-9525(00)02024-2.
[8]
A brighter future for Europe's favourite molecular biology software ...
Apr 25, 2006 · EMBOSS, an open source suite of tools for the analysis of biological data, has its origins in the late 1980s when Peter Rice, a co-founder of ...
[9]
Colloquium 1 - Emboss
EMBOSS: History. In the beginning there were the. "GCGEMBL utilities". Then ... EMBnet provides regular bioinformatics training courses. EMBnet has ...
[10]
EMBOSS: Change Log
[11]
EMBOSS Downloads
The EMBOSS package is available from our new FTP server ftp://emboss.open-bio.org/pub/EMBOSS/ in the file EMBOSS-6.5.7.tar.gz.
[12]
Sequence Formats - Emboss
This unofficial PIR format is what EMBOSS supports. If there is enough interest, we can also use NBRF database format with separate files for sequence (the main ...
[13]
1.3. Key Features - emboss
EMBOSS supports most common data formats: All sequence and many alignment and structural formats are handled. Many other data formats are handled automatically.
[14]
2. Administration of EMBOSS
EMBOSS was conceived as an open platform for bioinformatics applications and ... This makes the tools particularly suitable for scripting and for web interfaces.
[15]
Emboss opens up sequence analysis - ResearchGate
Aug 6, 2025 · Here, we adopted the EMBOSS epestfind program (using default parameters) [35] to count the number of PEST motifs in a protein, and only '' ...
[16]
EMBOSS: restrict manual - Bioinformatics
restrict scans one or more nucleotide sequences for cut sites for a supplied set of restriction enzymes. One or more restriction enzymes can be specified.
[17]
5.3. Introduction to Feature Formats - emboss
To handle the diversity, EMBOSS, where possible, uses the well defined and flexible feature formats that were developed for the major sequence databases: EMBL, ...
[18]
EMBOSS at CSC - extras
Nucleic restriction, Restriction enzyme sites in nucleotide sequences. Nucleic RNA folding, RNA folding methods and analysis. Nucleic transcription ...
[19]
4.1. Introduction to ACD File Development
Every EMBOSS and EMBASSY program has an ACD (AJAX Command Definition) file which describes the application, its options (parameters) and command line ...
[20]
Protein motif searches - EMBOSS
Search protein sequences with a sequence motif. patmatmotifs, Scan a protein sequence with motifs from the PROSITE database. preg, Regular expression search of ...
[21]
3.1. Application Documentation
The documentation includes usage examples, which give the command line session and example input and output files for typical uses of the application.
[22]
EMBOSS Interfaces
Jemboss can run the EMBOSS applications interactively or as a batch process. The progress of the batch processes are monitored by a job manager. Jemboss can ...
[23]
Jemboss Home Page
[24]
EMBOSS explorer
EMBOSS Explorer is a web-based graphical user interface for the EMBOSS suite of bioinformatics tools. It enables browser-based execution of EMBOSS tools without requiring local installation, making it easy to install, configure, maintain, and use.
[25]
Beginners HOWTO - BioPerl
BioPerl is an open-source toolkit for biologists to write bioinformatics scripts, focusing on sequence analysis and chaining tasks, not the Bio::Perl module.
[26]
Bio.Emboss.Applications module — Biopython 1.75 documentation
The Bio.Emboss.Applications module provides code to interact with and run various EMBOSS programs, following AbstractCommandline interfaces.
[27]
EMBOSS Users Guide
EMBOSS Users Guide Practical Bioinformatics Mr. Peter Rice Group Leader EMBL European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus
[28]
EMBOSS Application Groups
[29]
B.1. Applications and Packages Documentation - emboss
Application Groups. All applications are organised into logical groups according to their function. For descriptions of the groups see Section B. 2, “ ...
[30]
B.3. EMBASSY Packages (release R6) - emboss
The CBSTOOLS package is a set of wrappers to selected applications from the CBS group in Denmark. Domainatrix, The DOMAINATRIX programs were developed by Jon ...
[31]
EMBASSY Applications - Emboss
This is now split into 5 packages according to function. Application details and documentation for DOMAINATRIX. DOMAINATRIX is available as a Beta release 0.1.0 ...
[32]
EMBOSS: seealso manual - Bioinformatics
The groups that EMBOSS applications belong to have up to two levels, for example the primary group 'ALIGNMENT' has several sub-groups, or second-level groups, ...
[33]
needle - EMBOSS
Needle finds the alignment with the maximum possible score where the score of an alignment is equal to the sum of the matches taken from the scoring matrix, ...
[34]
water - EMBOSS
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and ...
[35]
sixpack - EMBOSS
backtranseq, Back-translate a protein sequence to a nucleotide sequence ... pepwheel, Draw a helical wheel diagram for a protein sequence. plotorf, Plot ...
[36]
EMBOSS: restrict
[37]
transeq - EMBOSS
The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and ...
[38]
backtranseq - EMBOSS
backtranseq reads a protein sequence and writes the nucleic acid sequence it is most likely to have come from. Algorithm. backtranseq uses a codon usage table ...
[39]
dotmatcher - EMBOSS
dotmatcher generates a dotplot from two input sequences. The dotplot is an intuitive graphical representation of the regions of similarity between two sequences ...
[40]
pepwheel - EMBOSS
pepwheel draws a helical wheel diagram for a protein sequence. This displays the sequence in a helical representation as if looking down the axis of the helix.
[41]
The AJAX Library Documentation - Emboss
The AJAX Library Documentation: CVS (developers) release. This documentation is automatically derived from the comments in the source file function headers.
[42]
ACD Syntax 3.0.0 - Emboss
Every EMBOSS program will be accompanied by a so-called ACD (Ajax Command Definitions) file, which describes the parameters that the program it refers to needs.
[43]
The NUCLEUS Library Documentation - Emboss
This documentation will be added as part of the general AJAX make-over. LIBRARY, DATATYPES, FUNCTIONS, NOTES. Include file for all applications, Datatypes ...
[44]
Guide to writing EMBOSS applications - SourceForge
There are 5 main levels in the EMBOSS hierarchy: AJAX: This directory contains all low-level library functions. For example, sequence reading and writing, file ...
[45]
EMBASSY: CLUSTALOMEGA: eomega - Emboss
In its current form Clustal-Omega can only align protein sequences but not DNA/RNA sequences. It is envisioned that DNA/RNA will become available in a future ...
[46]
[PDF] Tutorial-1.71.pdf - Biopython
Apr 3, 2018 · – Standalone Blast from NCBI. – Clustalw alignment program. – EMBOSS command line tools. • ... EMBASSY packages - third party tools with an EMBOSS.
[47]
7.5. Workflow Interfaces - emboss
G-Pipe is a graphical pipeline generator that allows the definition of pipelines and parameterisation of its component methods using Pise web interfaces.
[48]
EMBOSS: needleall - Galaxy
This tool uses the Needleman-Wunsch global alignment algorithm to find the optimum alignment (including gaps) of two sequences when considering their entire ...
[49]
Basics: An example workflow | Snakemake 9.13.7 documentation
A Snakemake workflow is defined by specifying rules in a Snakefile. Rules decompose the workflow into small steps (for example, the application of a single ...
[50]
Database access modes - Emboss
EMBOSS offers three modes for accessing databases: Single: EMBOSS retrieves a single sequence indexed by ID. Query: EMBOSS retrieves a set of sequences ...
[51]
PHYLIPNEW Application - Emboss
The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).
[52]
Installing under Cygwin - Emboss
1) install cygwin (http://www.cygwin.com). That's surely possible to compile without it, but it's making everything easier.
[53]
2.2. Hardware Requirements - emboss
2.2. Hardware Requirements. EMBOSS is used on a very wide variety of computer hardware. Any modern mid-range PC with a typical hardware configuration should be ...
[54]
[PDF] Building EMBOSS - Assets - Cambridge University Press
It goes without saying that your system must have a C compiler installed before attempting to build the EMBOSS package. Many, if not most, operating systems ...
[55]
2.3. Software Requirements - emboss
Most users of EMBOSS however do not need the latest source code from CVS, and will use an EMBOSS "stable" release instead (Section 2.4, “Software Releases”). ...
[56]
Debian -- Details of package emboss-data in sid
Download emboss-data. Download for all available architectures. Architecture, Package Size, Installed Size, Files. all, 59,666.0 kB, 463,019.0 kB ...
[57]
2.7. Installation - emboss
Only very basic information on EMBOSS installation is included here. For complete installation instructions see the EMBOSS Administrators Guide.
[58]
1.6. Post-installation of EMBOSS
Jul 15, 2010 · The most important post-installation step is to set your operating system environment so that it knows where to find the EMBOSS applications.
[59]
EMBOSS: embossversion manual - Bioinformatics
The version number is in three parts, separated by '.'s. The first number is the major version number - this only changes when substantial changes have been ...
[60]
EMBOSS command syntax
Which parameters and qualifiers can appear on the command line, is defined in the Ajax Command Definition (ACD) file that is associated with the EMBOSS program.
[61]
User Documentation - EMBOSS
EMBOSS programs are run by typing them at the UNIX prompt, or by using an interface. Interfaces. There are many available interfaces. Jemboss is our supported ...
[62]
EMBOSS: needle manual - Bioinformatics
It uses the Needleman-Wunsch alignment algorithm to find the optimum alignment (including gaps) of two sequences along their entire length.
[63]
[PDF] Introduction to Sequence Analysis using EMBOSS - Emboss
EMBOSS breaks the historical trend towards commercial software packages. The EMBOSS suite: • Provides a comprehensive set of sequence analysis programs (more ...
[64]
1.8. Troubleshooting EMBOSS Installations
A commonly reported error is that the operating system does not recognise any EMBOSS commands e.g. Command not found is reported when typing embossversion.
[65]
Miscellaneous data files - Emboss
Apparently inexplicable errors when running EMBOSS programs may be caused by the system not using the data files one expects. The search path can be displayed ...
[66]
1.2. EMBOSS Developers
The core development team consists of Peter Rice, Alan Bleasby and Jon Ison and is housed at the European Bioinformatics Institute (EBI, Hinxton, ...
[67]
Getting involved - EMBOSS
The EMBOSS Project is co-ordinated by Peter Rice, Alan Bleasby and Jon Ison at the European Bioinformatics Institute. Fortnightly coordination meetings are ...
[68]
EMBOSS Homepage
Software for molecular biology - high quality, open-source and free. More > Download Download a stable version, get the latest code and fixes.
[69]
1.5. Contributing Software to EMBOSS
A very valuable way to contribute is to write code in response to EMBOSS feature requests and bug reports posted by the EMBOSS users.
[70]
1.3. Developer Documentation - EMBOSS Homepage
1.3.2.1. AJAX Library Documentation. AJAX is the core library used by all EMBOSS applications. It covers standard data structures and algorithms:.
[71]
EMBOSS Support - SourceForge
Mailing Lists. There are three mailing lists for EMBOSS which users can join. These provide a forum for discussions about EMBOSS and its future development.
[72]
EMBOSS Suite / Bugs - SourceForge
EMBOSS Bug Reports Many bugs in this list are cross-posted from emboss-bug emails and so will not have the original submitter's name in the report.
[73]
EMBOSS Meetings
We run regular hands-on courses in 'Bioinformatics Software Development using EMBOSS ... Colloquia and other workshops are held whenever sufficient numbers ...
[74]
Bioinformatics Software Development Course - Emboss
The course will give a good introduction to programming in EMBOSS. You will gain experience in all the steps in writing a basic sequence analysis application ...
[75]
EMBOSS News
EMBOSS News. EMBOSS Release 6.6.0. 15th July 2013. EMBOSS-6.6.0.tar.gz is now available. It can be downloaded from the directory: ...
[76]
Debian -- Details of package emboss in sid
EMBOSS is a free Open Source software analysis package specially developed for the needs of the molecular biology (e.g. EMBnet) user community. The software ...
[77]
EMBOSS License
The AJAX and NUCLEUS libraries are released under the GNU Library General Public Licence. EMBOSS applications are released under the GNU General Public Licence.
[78]
1.1. Downloading EMBOSS
The file you need for the EMBOSS base installation is EMBOSS-latest.tar.gz. This will always be a link to the most recent version of EMBOSS.
[79]
Debian -- Package Search Results -- emboss
You have searched for packages that names contain emboss in all suites, all sections, and all architectures. Found 10 matching packages.
[80]
4. How To Cite EMBOSS
Cite EMBOSS where appropriate. Use the EMBOSS Users Guide by Rice P., Bleasby A and Ison J. or the EMBOSS website: http://emboss.open-bio.org/