Project Jupyter
Project Jupyter is a non-profit, open-source initiative focused on developing software, open standards, and services for interactive computing across multiple programming languages.[1] Originating from the IPython project in 2014, it provides tools that enable reproducible and collaborative workflows in data science, scientific computing, and education.[1] The project is best known for the Jupyter Notebook, a web-based application that integrates live code execution, visualizations, mathematical equations, and explanatory text within a single document, supporting over 100 programming languages through community-developed kernels.[2]

The roots of Project Jupyter trace back to 2001, when Fernando Pérez, then a graduate student at the University of Colorado Boulder, created IPython as an enhanced interactive shell for Python to facilitate exploratory programming during his thesis work.[3] In 2004, Brian Granger joined Pérez, and together they expanded IPython into a comprehensive architecture for interactive computing, incorporating features like notebook interfaces and multi-language support.[3] By 2014, as the project grew beyond Python to encompass dozens of languages, it was rebranded as Project Jupyter (a name drawn from Julia, Python, and R) to reflect its broadened scope and commitment to open scientific computing.[1] This evolution was driven by a core team including Pérez, Granger, and others, under a modified BSD license to ensure free global accessibility.[4]

Key components of Project Jupyter form a modular ecosystem designed for flexibility and extensibility. The Jupyter Notebook serves as the foundational interface, allowing users to execute code in cells and interleave it with rich media for literate programming.[2] Building on this, JupyterLab emerged in 2018 as the next-generation interface, offering a customizable workspace with multiple documents, a file browser, and extensions for advanced workflows. Other core tools include JupyterHub, a multi-user server for sharing notebooks in educational or organizational settings, and Binder, a service that launches interactive environments from repositories without installation.[1] These components are developed collaboratively on GitHub, with kernels enabling language-specific execution and frontends providing diverse user interfaces.[1]

Project Jupyter has profoundly influenced modern computational practices, becoming a standard tool in fields like machine learning, astronomy, and bioinformatics due to its emphasis on reproducibility and transparency.[4] Adopted by major institutions such as NASA, CERN, and numerous universities, it supports a global community of contributors governed by an Executive Council and Steering Committees, with ongoing recognition through programs like Distinguished Jupyter Contributors.[1] Funded by sponsors including NumFOCUS and tech companies like Google and AWS, the project continues to advance open science by promoting computational narratives that blend code, data, and prose.[1]

History
Origins and IPython
IPython was created in 2001 by Fernando Pérez, then a graduate student in physics at the University of Colorado Boulder, as an enhanced command-line shell for Python specifically tailored to the demands of interactive scientific computing.[5] Pérez developed IPython to meet his own research needs during his PhD thesis work, where the standard Python interactive shell proved inadequate for handling complex data analysis, debugging, and exploration in real time.[6] The project began as a personal set of extensions to Python's REPL, merging enhancements with open-source tools like the GNU Readline library for improved command-line editing and the Leo editor for advanced line execution. Early versions of IPython introduced key features that distinguished it from the vanilla Python shell, including tab completion for efficient code entry, comprehensive history management to recall and edit previous commands, and "magic" commands such as %timeit for precise timing of code snippets. These additions facilitated rapid prototyping and iteration, essential for scientific experimentation, by providing better introspection, object debugging, and integration with Python's ecosystem for numerical computing via libraries like NumPy. Over the next decade, IPython evolved into a robust system supporting parallel computing and distributed tasks, but its core remained focused on enhancing interactivity for Python users in research settings.[3]

In 2011, with the release of IPython 0.12, the project introduced the IPython notebook, a web-based interface that integrated executable code cells with their outputs and rich markdown documentation in a single, shareable document.[7] This innovation directly tackled longstanding challenges in scientific workflows, such as the difficulty of reproducing analyses from fragmented scripts and notes, by enabling literate programming where code, results, and narrative explanations coexist seamlessly.[3] The notebook format promoted transparency and collaboration, allowing researchers to document their processes in a format that preserved both computational and explanatory elements for verification and extension.[3] This foundation in IPython's interactive tools laid the groundwork for its later expansion into the multi-language Project Jupyter in 2014.[5]

Formation and Evolution
Project Jupyter was officially launched in 2014 as a spin-off from the IPython project, led by Fernando Pérez, Brian Granger, and Min Ragan-Kelley at the University of California, Berkeley.[8] This initiative separated the notebook interface and kernel components from the core IPython codebase to enable broader applicability beyond Python, fostering a more modular architecture for interactive computing.[9] The project emerged from collaborative efforts among IPython developers to address the growing need for language-agnostic tools in data science and scientific workflows.[1]

The name "Jupyter" derives from the core programming languages it initially targeted: Julia ("Ju"), Python ("Py"), and R ("R"), symbolizing its commitment to multi-language support from inception.[10] This rebranding and restructuring marked a pivotal shift toward open standards for interactive environments, distancing the project from its Python-centric origins while building on IPython's interactive shell foundations. Early organizational support came from NumFOCUS, which sponsored the project starting in 2014, alongside academic partnerships such as one with California Polytechnic State University (Cal Poly) for collaborative development.[11][12]

In 2015, coinciding with the release of IPython 4.0, Jupyter Notebook 4.0 was introduced as the first stable version under the new project umbrella, completing the separation and enabling seamless integration with diverse kernels.[13] This release facilitated the debut of the first non-Python kernels for Julia (via IJulia) and R (via IRkernel), allowing users to execute code in these languages within the same notebook interface and demonstrating the project's multi-language vision in practice.[14]

Key Milestones
In 2017, Project Jupyter held its inaugural JupyterCon conference in New York City from August 22 to 25, attracting over 700 attendees for tutorials, talks, and keynotes that fostered community collaboration and established an annual tradition for sharing advancements in interactive computing.[15][16]

JupyterLab development progressed significantly in this period: an alpha version was released in July 2016 as a preview of the next-generation interface, a public beta followed on February 20, 2018, and the stable JupyterLab 1.0 arrived in June 2019, introducing a flexible, plugin-based environment to succeed the classic Jupyter Notebook.[17][18]

Funding efforts advanced Jupyter's sustainability, including a 2015 grant from the Gordon and Betty Moore Foundation that supported ongoing development and community initiatives, alongside a 2020 award from the Chan Zuckerberg Initiative's Essential Open Source Software for Science program, which provided resources for enhancing collaboration features and long-term maintenance.[19][20]

Recent innovations include the 2021 launch of JupyterLite, a browser-based distribution enabling interactive computing without server dependencies via WebAssembly, and the August 2, 2023, release of Jupyter AI, which integrated generative AI models into JupyterLab for code generation and data analysis tasks.[21][22][23] Another milestone update, Jupyter Notebook 7.0, released on July 26, 2023, aligned the Notebook's architecture with JupyterLab for improved extensibility and real-time collaboration while maintaining backward compatibility.[24]

Governance evolved under NumFOCUS sponsorship until October 2024, when Project Jupyter transitioned to LF Charities for enhanced fiscal and operational support, culminating in the November 2024 formation of the Jupyter Foundation and the completion of 2025 Executive Council elections to guide future leadership.[25][26][27][28]

Architecture
Client-Server Model
Project Jupyter employs a client-server architecture that separates the user interface from the computational backend, enabling flexible and scalable interactive computing environments. The frontend, typically a browser-based web application such as JupyterLab or the classic Notebook interface, runs in the user's web browser and handles user interactions, rendering outputs, and managing the visual layout of documents. This frontend communicates with the backend server using RESTful APIs for operations like file management and session control, and WebSockets for real-time, bidirectional messaging to support interactive features like code execution feedback and live updates.[10][29]

At the core of the backend is the Jupyter Server, a Python-based application built on the Tornado web server framework, which orchestrates the overall system. It manages kernel sessions by starting and terminating language-specific kernels as needed, handles file operations within the notebook directory, and allocates resources for concurrent computations. This server acts as an intermediary, translating web requests into appropriate kernel communications and ensuring isolation between different user sessions. By decoupling the UI from the execution environment, the architecture allows the frontend to remain lightweight while offloading intensive computations to the server, which can run on local machines, remote clusters, or cloud infrastructure.[10][30]

The client-server model offers several key benefits, including support for remote execution where users can interact with powerful hardware without locally installing all dependencies, multi-user environments through session isolation (extendable via tools like JupyterHub), and the ability to swap or extend components independently, for instance replacing the default frontend with a custom interface while retaining the same server backend. Security is integrated into this design with token-based authentication enabled by default, where a randomly generated token authenticates API requests via headers, URL parameters, or login forms, preventing unauthorized access. Additionally, the server supports configurable Cross-Origin Resource Sharing (CORS) policies through settings that allow administrators to define allowed origins, methods, and headers, facilitating secure deployments in web-based or proxied setups.[10][31][32]
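To make the split concrete, the following minimal sketch drives a locally running Jupyter Server through its REST API using token authentication. The base URL, token value, and kernel name are illustrative assumptions; in a real session they come from the server's startup output.

```python
# Minimal sketch: querying a local Jupyter Server's REST API with a token.
# BASE_URL and TOKEN are placeholders; substitute the values your server prints.
import requests

BASE_URL = "http://localhost:8888"   # hypothetical local server
TOKEN = "replace-with-real-token"    # hypothetical token from the startup log

headers = {"Authorization": f"token {TOKEN}"}

# List currently running kernels (GET /api/kernels).
for kernel in requests.get(f"{BASE_URL}/api/kernels", headers=headers).json():
    print(kernel["id"], kernel["name"], kernel["execution_state"])

# Start a new kernel (POST /api/kernels); actual code execution then happens
# over a WebSocket at /api/kernels/<id>/channels, not over REST.
new_kernel = requests.post(
    f"{BASE_URL}/api/kernels", headers=headers, json={"name": "python3"}
).json()
print("started kernel:", new_kernel["id"])
```

The same token mechanism applies whether the request originates from a browser session, a script, or a reverse proxy in front of the server.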
Kernels and Language Support
Kernels in Project Jupyter are lightweight, separate processes that execute user code in a specified programming language and return the resulting outputs, such as standard output streams, images, errors, or other data representations.[33] These processes operate independently from the frontend interface, enabling interactive computing without embedding the execution environment directly into the client.[34] By design, kernels are pluggable components that adhere to Jupyter's language-agnostic architecture, allowing developers to create implementations for virtually any programming language as long as they conform to the Jupyter messaging protocol.[33]

This design facilitates broad language support, with over 100 official and community-developed kernels available as of 2025, enabling execution in diverse environments from scientific computing to general-purpose scripting.[35] Prominent examples include the Python kernel provided by ipykernel (formerly part of IPython), which powers the majority of Jupyter usage in data science and machine learning; the IRkernel for R, supporting statistical analysis and visualization; and IJulia for Julia, optimized for high-performance numerical computations.[36] Additional kernels exist for languages like JavaScript and Scala, and even domain-specific ones such as MATLAB or SQL, demonstrating Jupyter's versatility across ecosystems.[33]

The lifecycle of a kernel begins with its startup, typically initiated when a user selects it for a notebook or console session, establishing a connection via the Jupyter server.[37] Once active, the kernel receives code from individual cells or commands, executes them in its environment, and communicates results back; it supports interruption for long-running tasks, such as via a "Kernel > Interrupt" command in the interface, and graceful shutdown to release resources.[34] Throughout execution, kernels handle introspection features like code completion and variable inspection, enhancing interactivity.[37]

Kernels support rich, multimedia outputs beyond plain text, leveraging MIME types to transmit diverse data formats for rendering in the frontend.[38] For instance, a matplotlib plot in Python can be returned as an image/png MIME type, allowing inline visualization, while error messages might use text/plain or application/json for structured display.[38] This mechanism ensures outputs like tables, figures, or even HTML widgets are portable and consistently rendered across compatible interfaces.[39]

Installation and management of kernels are handled through the Jupyter command-line tools, primarily via the jupyter kernelspec command, which registers kernel specifications in standardized directories.[40] To add a Python kernel, users typically install the ipykernel package with pip install ipykernel and register it using python -m ipykernel install --user --name mykernel, specifying options for display names and execution paths.[40] For isolated environments, such as those created with Conda, one can activate the environment, install ipykernel, and register the kernel to enable version-specific execution without conflicts.[40] Commands like jupyter kernelspec list and jupyter kernelspec remove facilitate discovery and removal, ensuring clean management across multiple installations.[41]
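The lifecycle described above can also be driven programmatically. The sketch below uses the jupyter_client library to start a kernel, execute one statement, read the broadcast output, and shut the kernel down; it assumes ipykernel is installed so that the standard "python3" kernelspec exists.

```python
# Minimal sketch of the kernel lifecycle via jupyter_client:
# startup -> execute -> read IOPub output -> graceful shutdown.
from queue import Empty
from jupyter_client.manager import KernelManager

km = KernelManager(kernel_name="python3")
km.start_kernel()                       # startup: spawn the kernel process
kc = km.client()
kc.start_channels()
kc.wait_for_ready(timeout=30)

msg_id = kc.execute("print(6 * 7)")     # id appears in replies' parent_header

# Drain IOPub messages until the kernel reports it is idle again.
while True:
    try:
        msg = kc.get_iopub_msg(timeout=5)
    except Empty:
        break
    if msg["msg_type"] == "stream":
        print("kernel output:", msg["content"]["text"].strip())
    if (msg["msg_type"] == "status"
            and msg["content"]["execution_state"] == "idle"
            and msg["parent_header"].get("msg_id") == msg_id):
        break

kc.stop_channels()
km.shutdown_kernel()                    # graceful shutdown releases resources
```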
Communication Protocols
The Jupyter Messaging Protocol is a standardized, asynchronous communication system that enables interaction between Jupyter frontends (such as notebooks or consoles) and kernels, facilitating code execution, output display, and control operations across diverse programming languages. Built on the ZeroMQ library, it employs publish-subscribe and request-reply patterns over multiple dedicated channels (sockets) to handle concurrent messaging without blocking, ensuring efficient and reliable data exchange in distributed computing environments.[42]

The protocol defines five primary channels for distinct purposes: the shell channel manages request-reply interactions for core operations like code execution; the IOPub channel serves as a broadcast mechanism for publishing kernel outputs, such as stdout, stderr, and execution results, to all connected frontends; the stdin channel handles user input requests from the kernel; the control channel supports administrative requests like kernel interruption; and the heartbeat channel monitors kernel liveness through periodic ping-pong messages. Key message types include execute_request and execute_reply on the shell channel for initiating and responding to code runs, stream messages on IOPub for real-time stdout/stderr output, and input_request on stdin for prompting user data. Each message follows a structured format comprising a header (with metadata like session ID and message type), parent_header (referencing prior messages), content (the type-specific payload), metadata (optional annotations), and a signature for integrity.[42]

Security is enforced through HMAC-based signature verification (sketched below), in which a shared secret key is used to compute a hexadecimal digest of the serialized message that is carried in its signature field, allowing clients to authenticate incoming kernel messages and kernels to validate client requests; signing can be disabled by setting an empty key, though this is not recommended for production use. The heartbeat channel detects kernel failures by expecting timely replies to pings, triggering reconnection if unresponsive.[42]

The protocol originated in IPython and reached version 5.0 with the release of IPython 3.0 on February 27, 2015, marking a major standardization effort that decoupled language-specific kernel implementations from the core messaging framework and introduced robust support for multi-language interoperability. Subsequent evolutions, such as Jupyter Client 5.0.0 (released February 20, 2017), refined version 5.0 with enhanced streaming capabilities for stdin/stdout via dedicated IOPub messages, improving real-time interactivity without interrupting execution flows. Later versions, like 5.3, added features such as control channel interrupts for graceful kernel shutdowns, maintaining backward compatibility while extending functionality.[43][44][45]
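The signing scheme computes an HMAC over the four serialized dictionaries in a fixed order. A minimal sketch, assuming the hmac-sha256 signature scheme and a placeholder key (real keys are read from the kernel's connection file):

```python
# Minimal sketch of Jupyter wire-protocol message signing: the signature is
# the hex digest of an HMAC over the serialized header, parent_header,
# metadata, and content, in that order. Key and field values are placeholders.
import hashlib
import hmac
import json

key = b"placeholder-shared-secret"  # hypothetical; taken from the connection file

header = {"msg_id": "1", "msg_type": "execute_request",
          "session": "abc", "username": "user", "version": "5.3"}
parent_header = {}
metadata = {}
content = {"code": "print('hi')", "silent": False}

mac = hmac.new(key, digestmod=hashlib.sha256)
for part in (header, parent_header, metadata, content):
    mac.update(json.dumps(part).encode("utf-8"))

signature = mac.hexdigest()
print(signature)  # the receiver recomputes this digest and compares
```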
User Interfaces
Jupyter Notebook
The Jupyter Notebook is the original web-based interactive computing environment developed as part of Project Jupyter, enabling users to create and share documents that blend executable code, explanatory text, mathematical expressions, and visualizations. Introduced as the IPython Notebook with the release of IPython 0.12 on December 19, 2011, it was created by a team led by Fernando Pérez, Brian Granger, and Min Ragan-Kelley to facilitate exploratory computing in Python.[46][47] This interface revolutionized scientific workflows by allowing inline execution and rendering of outputs directly within the document, supporting reproducibility through a stateful kernel session.[5]

At its core, the Jupyter Notebook employs a cell-based editing model, where content is organized into discrete cells of three primary types: code cells for executing programming instructions, Markdown cells for formatted text and documentation, and raw cells for unprocessed text that preserves original formatting. The user interface features a simple, single-document layout with a toolbar at the top providing essential actions such as inserting new cells, running or deleting them, changing cell types, and advancing to the next cell upon execution. An outline view sidebar displays the document's structure, allowing quick navigation to specific cells or headings, while outputs from code cells, such as plots, tables, HTML elements, or rendered LaTeX equations, are displayed inline immediately below the cell for seamless integration of results.[48]

The workflow in Jupyter Notebook revolves around sequential cell execution within a persistent kernel session, where variables and results from prior cells remain in memory to support iterative development and analysis. Users can execute cells individually using Shift+Enter or run all cells via the toolbar to simulate a full script, with options to restart the kernel and clear outputs to ensure clean, reproducible runs without residual state. This design promotes a linear, narrative-driven approach to computation, ideal for prototyping and storytelling in data science, though it maintains a focus on one document at a time. Notebook files are saved in the JSON-based .ipynb format, which encapsulates source code, outputs, and metadata for easy sharing.

Over time, the classic Jupyter Notebook's emphasis on a streamlined, single-tab interface highlighted certain constraints, such as the absence of an integrated file browser and limited support for multi-document workflows, which were later addressed in subsequent interfaces like JupyterLab. Despite these limitations, it remains a foundational tool for interactive computing, widely used for its simplicity and direct support for core interactivity features.[5]

JupyterLab
JupyterLab is the next-generation user interface for Project Jupyter, launched in 2018 as a flexible and extensible environment that builds upon the classic Notebook interface. It features a modular design allowing users to arrange multiple components, such as notebooks, consoles, terminals, and file viewers, through drag-and-drop panels in a customizable layout. This enables a desktop-like workspace where documents and activities can be opened in tabs, split views, or sidebars, supporting efficient multitasking for interactive computing tasks.[18][2]

The extension system in JupyterLab is built on NPM-based plugins, which allow developers to create and distribute JavaScript modules that enhance functionality without modifying the core application. These plugins support a wide range of customizations, including themes for visual styling, widgets for interactive elements like variable inspectors, and integrations such as Git for version control directly within the interface. Over 100 official extensions are available, covering tools for code formatting, data visualization, and productivity features, all installable via pip, conda, or the built-in Extension Manager.[49]

JupyterLab supports real-time collaboration through shared sessions, enabled by the jupyter_collaboration extension starting from version 4.0, which allows multiple users to edit notebooks and other documents simultaneously with live cursors and changes visible in real time, without predefined user roles. Additionally, it integrates the Language Server Protocol (LSP) via the jupyterlab-lsp extension, providing advanced code completion, hover suggestions, linting, and navigation across supported languages like Python, Julia, and R.[50][51][52][53]

Migration from the classic Jupyter Notebook to JupyterLab is facilitated by backward compatibility: Notebook 7.0, released in 2023, adopted JupyterLab's server architecture to unify the ecosystem and enable seamless transitions. This update allows existing Notebook users to access JupyterLab's advanced features while maintaining support for legacy extensions through alternatives like nbclassic, ensuring a smooth path for adoption without disrupting workflows.[24][54]

Alternative Interfaces
JupyterLite is an in-browser implementation of JupyterLab that runs entirely within web browsers using WebAssembly and Pyodide, a port of CPython to WebAssembly, allowing users to execute Python code offline without requiring a server backend.[55][56] Released in 2021, it supports JupyterLab components and extensions, enabling interactive computing sessions directly in the browser for lightweight, portable environments.[57]

Voila, released in 2019, transforms Jupyter notebooks into standalone web applications or dashboards by rendering notebook outputs statically while preserving interactivity for widgets and code execution.[58] This tool facilitates sharing computational results as deployable apps, integrating with Jupyter kernels for languages like Python and Julia, without exposing the full notebook editing interface.

Other alternative interfaces include nteract, a desktop application launched in 2017 that provides local editing and execution of Jupyter notebooks using Electron for cross-platform support on Windows, macOS, and Linux.[59][60] Similarly, JupyterLab Desktop, introduced in 2021, offers a packaged distribution of JupyterLab as a self-contained application, bundling a Python environment with common libraries for quick setup and offline use, though maintenance has been limited since August 2025.[61][62]

As of 2025, mobile adaptations have emerged, such as accessing Jupyter environments via Binder for cloud-based execution through mobile web browsers, enabling on-the-go interaction with notebooks without native apps.[63][64]

Document Formats and Features
Notebook Document Structure
The Jupyter notebook document is stored in the .ipynb file format, a JSON-based structure that ensures portability and enables seamless sharing across Jupyter-compatible tools and platforms.[65] This format encapsulates executable code, outputs, and narrative content in a single, human-readable file, facilitating reproducibility in computational workflows.[66]

At the root level, the JSON object includes four primary fields: "cells", an array of cell objects; "metadata", a dictionary for notebook-wide information; "nbformat", an integer indicating the major version; and "nbformat_minor", an integer for the minor version.[65] The nbformat version 4, introduced in 2015 with Jupyter Notebook 4.0, established the current standard and has remained backward-compatible through subsequent minor updates, up to version 4.5 (nbformat_minor: 5) as of 2023.[67][68][69]

The "cells" field forms the document's core, comprising an ordered list of dictionaries, each representing a cell with keys like "cell_type", "id" (a unique string identifier for the cell, introduced in version 4.5), "source" (a list of strings for the cell's content), "metadata" (cell-specific options), and, for executable cells, "outputs" (an array of output objects) and "execution_count" (an integer tracking execution order).[65] Three cell types are defined: "code" for executable snippets with associated outputs; "markdown" for rendered narrative text using Markdown syntax; and "raw" for unprocessed input that passes through without execution or formatting.[65] Code cell outputs support multiple MIME types, including text/plain for textual results and image/svg+xml for scalable vector graphics, stored in a "data" dictionary within each output object alongside execution metadata.[65]

Notebook metadata encompasses fields such as "kernelspec" for kernel specifications, "language_info" detailing the default language (e.g., Python with its version and file extension), and "autosave_interval" for automatic saving frequency in milliseconds.[65] Output objects in code cells include an "execution_count" to maintain execution history, ensuring traceability even after re-execution.[65]

Validation of .ipynb files relies on the nbformat library, which implements the official JSON schema for parsing, serializing, and verifying notebook structure in Python environments.[66] This library enforces schema compliance using jsonschema and supports version migration for backward compatibility, allowing notebooks from earlier nbformat versions to be read and converted without data loss.[66][65]
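A minimal sketch of working with this format through the nbformat library: it creates a two-cell notebook, validates it against the schema, and round-trips it through the .ipynb serialization (the file name is illustrative).

```python
# Minimal sketch: build, validate, and round-trip a version-4 notebook
# using the nbformat library described above.
import nbformat
from nbformat.v4 import new_notebook, new_code_cell, new_markdown_cell

nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Analysis"),       # a markdown cell
    new_code_cell("x = 21 * 2\nx"),        # a code cell (no outputs yet)
]
nb.metadata["language_info"] = {"name": "python"}

nbformat.validate(nb)                      # raises ValidationError if non-compliant
nbformat.write(nb, "example.ipynb")        # serialize to JSON on disk

# Reading with as_version=4 migrates older nbformat versions on the fly.
reread = nbformat.read("example.ipynb", as_version=4)
print(reread.nbformat, reread.nbformat_minor, reread.cells[1].cell_type)
```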
Interactivity and Extensions
Project Jupyter supports core interactivity through features like cell magics and interactive widgets, enabling dynamic computations and visualizations within notebooks. Line magics, prefixed with a single '%', operate on the current line of input (e.g., %matplotlib inline, which configures Matplotlib to display static images of plots directly in the notebook output without opening external windows). This enhances workflow efficiency by integrating graphical outputs seamlessly into the document. Cell magics, prefixed with '%%', apply to the entire cell, supporting a range of interactive behaviors like timing code execution (e.g., %%time) or debugging.[70]

The ipywidgets library, introduced in 2015, provides interactive HTML widgets such as sliders, buttons, and dropdowns for creating user interfaces within notebooks.[71] These widgets allow real-time parameter adjustment and visualization updates, for example using a slider to vary function inputs and observe changes in plots (see the sketch below). Widget outputs are embedded in the notebook document, supporting reproducible interactive analyses.

JupyterLab's extension architecture enables customization through server extensions, which modify backend behavior, and frontend extensions, which enhance the user interface. Extensions are typically installed as prebuilt packages via pip or conda, with older methods using the jupyter labextension command for JavaScript-based frontend additions.[49] Examples include nbconvert preprocessors that transform notebooks before export or execution, such as applying filters to clean outputs, and code-folding capabilities configurable in settings to collapse code sections for better readability. The Extension Manager in JupyterLab's sidebar facilitates searching, installing, and managing these extensions from repositories like PyPI.
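A minimal sketch of the widget pattern referenced above, assuming it runs inside a notebook cell with ipywidgets and Matplotlib installed: the decorator renders a slider, and moving it re-invokes the function to redraw the plot.

```python
# Minimal sketch: an ipywidgets slider driving a Matplotlib plot.
# Intended to run in a notebook cell (e.g., with %matplotlib inline active).
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import interact

@interact(freq=(1.0, 10.0, 0.5))   # the tuple abbreviation renders a float slider
def plot_sine(freq=2.0):
    x = np.linspace(0, 2 * np.pi, 500)
    plt.plot(x, np.sin(freq * x))
    plt.title(f"sin({freq:.1f}x)")
    plt.show()
```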
Customization options in JupyterLab include themes for visual styling, applied via CSS files or dedicated extensions to alter colors, fonts, and layouts, and configurable keyboard shortcuts editable through the Advanced Settings Editor.[72] Users can override defaults, such as remapping cell execution commands, to match preferred workflows. Additionally, integration with the Language Server Protocol (LSP) via the jupyter-lsp extension provides language-agnostic autocompletion, hover tooltips, and linting across kernels like Python, R, and Julia, improving code editing efficiency.[53]
Security considerations in Jupyter address risks from untrusted notebooks, which may contain malicious HTML, JavaScript, or code. The platform implements a trust model where newly opened notebooks are marked untrusted by default, disabling JavaScript execution in outputs and prompting users to explicitly trust before running cells to prevent arbitrary code execution.[73] This model sanitizes outputs and requires user consent for full interactivity, mitigating threats like cross-site scripting while allowing safe handling of shared content. Once trusted—after re-execution or manual approval—the notebook enables all features, but users are advised to verify sources for potentially harmful code.
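The trust mechanics are exposed programmatically through nbformat's signing module. A minimal sketch, assuming a hypothetical shared_notebook.ipynb received from elsewhere: checking its signature reports it untrusted, and signing it (comparable to running the jupyter trust command) marks it trusted for the local user.

```python
# Minimal sketch of the notebook trust model via nbformat.sign: Jupyter keeps
# HMAC signatures of trusted notebooks in a local database; a notebook with a
# missing or stale signature is treated as untrusted.
import nbformat
from nbformat.sign import NotebookNotary

nb = nbformat.read("shared_notebook.ipynb", as_version=4)  # hypothetical file

notary = NotebookNotary()
print("trusted?", notary.check_signature(nb))   # False for an unseen notebook

notary.sign(nb)   # comparable in effect to: jupyter trust shared_notebook.ipynb
print("trusted?", notary.check_signature(nb))   # now True
```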