References
- [1] Software Performance Engineering - Smith - Wiley Online Library. Jan 15, 2002 · Software performance engineering (SPE) is a method for constructing software systems to meet performance objectives.
- [2] What is Performance Engineering? | OpenText. Performance engineering is proactive, continuous, and end-to-end application performance testing and monitoring.
- [3] What is Performance Engineering? - Splunk. Nov 14, 2023 · Performance engineering is the practice that ensures the software you're designing meets its expected speed and efficiency goals.
- [4] Introduction to Software Performance Engineering: Origins and ... This chapter first reviews the origins of Software Performance Engineering (SPE). It provides an overview and an extensive bibliography of the early ...
- [5] (PDF) Introduction to Software Performance Engineering. May 4, 2022 · Software Performance Engineering (SPE) provides a systematic, quantitative approach to constructing software systems that meet performance ...
- [6] Performance Engineering of Software Systems. 6.172 is an 18-unit class that provides a hands-on, project-based introduction to building scalable and high-performance software systems.
- [7]
- [8] Performance Solutions Book. Performance Solutions offers straightforward techniques and strategies that can be used by software developers, project managers, and performance specialists.
- [9] Performance Engineering, State of the Art and Current Trends. Aug 7, 2025 · Performance engineering aims to demonstrate that the software being developed will meet the performance needs. The goal of robustness ...
- [10] (PDF) Performance engineering and testing: The challenges on mobile platforms.
- [11] Performance Engineering: New and Conflicting Trends. May 5, 2025 · Performance engineering is adjusting to major industry trends - such as cloud computing, agile development, and DevOps. As systems scale and ...
- [12] Optimizing SQL Queries to Improve the Performance of Information ... This article addresses the issue of database performance, specifically the execution of SQL queries. Solving this problem is essential for improving the ...
- [13] Performance Impact of Microservices Architecture - ResearchGate. Aug 8, 2025 · This study aims to investigate and assess how the microservices architecture affects the performance, with a particular emphasis on vital elements like inter- ...
- [14] (PDF) Origins of Software Performance Engineering: Highlights and ... Aug 7, 2025 · ... The term software performance engineering (SPE) was first introduced by C.U. Smith in 1981 [41]. Several complementary definitions have ...
- [15] Validity of the single processor approach to achieving large scale ... The organization of a single computer has reached its limits and that truly significant advances can be made only by interconnection of a multiplicity of ...
- [16] (PDF) Introduction to Software Performance Engineering: Origins ... This chapter first reviews the origins of Software Performance Engineering (SPE). It provides an overview and an extensive bibliography of the early ...
- [17] IT Service Management: Automate Operations - Google SRE. SRE's approach to IT Service Management, Use software engineers to design scalable and reliable systems. Innovation and improve product development.
- [18] History of SRE: Why Google Invented the SRE Role - Rootly. Nov 19, 2021 · The first SRE team originated at Google in 2003 under the direction of Ben Treynor Sloss, who had begun his career as a software engineer at Oracle and several ...
- [19] Performance Engineering for Microservices - ACM Digital Library. These technologies allow to efficiently exploit cloud platforms, providing a high degree of scalability, availability, and portability for microservices.
- [20] What Is Green Software and Why Do We Need It? - IEEE Spectrum. Green software engineering is an emerging discipline consisting of best practices to build applications that reduce carbon emissions.
- [21] ISO/IEC 25010:2011 - Systems and software engineering. ISO/IEC 25010:2011 defines a quality in use model and a product quality model, providing consistent terminology for specifying, measuring, and evaluating ...
- [22] Nonfunctional Requirements - Scaled Agile Framework. Oct 13, 2023 · Nonfunctional Requirements (NFRs) are system qualities that guide the design of the solution and often serve as constraints across the relevant backlogs.
- [23] Nonfunctional Requirements: Examples, Types and Approaches. Dec 30, 2023 · Nonfunctional requirements, or NFRs, are a set of specifications that describe the system's operation capabilities and constraints.
- [24] Non-Functional Engineering | EY - Global. These non-functional requirements encompass attributes such as performance, reliability, usability, security, scalability, maintainability, and availability.
- [25] Reliability vs. Availability: Key Metrics for System Perform | Atlassian. Availability is often calculated using the formula: Availability = (MTBF / (MTBF + MTTR)), where MTTR is the mean time to repair.
- [26] [PDF] Elicitation and Modeling Non-Functional Requirements - arXiv. NFR is not equally considered as functional requirements in software development. Requirement gathering or discovering is known as the elicitation process.
- [27]
- [28] [PDF] Non-Functional Requirements (NFR) Framework. The NFR framework is for IT and IT-enabled business services, proposes enterprise architecture artifacts, and describes an NFR lifecycle and framework.
- [29] Why averages suck and percentiles are great - Dynatrace. Sep 23, 2024 · In other cases, we see the 98th percentile degrading from 1s to 1.5 seconds, while the 95th is stable at 900ms. This means your application is ...
- [30]
- [31] How is Performance Addressed in DevOps? - ACM Digital Library. The goal of DevOps is to bring software changes into production with a high frequency and fast feedback cycles. This conflicts with software quality assurance ...
- [32] PROFILING, PERFORMANCE, and PERFECTION. A commonly accepted rule in computing, known variously as Pareto's rule or the 80/20 rule, is that 80% of the processing of an application occurs in 20% of the ...
- [33] Metrics for everything as code - DevOps Guidance. High infrastructure code coverage implies improved manageability, reproducibility, and automation capabilities for systems.
- [34] (PDF) Model-Based Performance Prediction in Software Development. Aug 9, 2025 · In this paper, we present a comprehensive review of recent research in the field of model-based performance prediction at software development ...
- [35] queueing.html - FSU Computer Science. The objective of these notes is to make you aware of the existence of analytic techniques for predicting OS performance, to give you a flavor of how they ...
- [36] Discrete event systems: modeling and performance analysis. The aim of this text is to teach the student what discrete event systems are about and how they differ from "classical systems"; describe the differences ...
- [37] (PDF) An Introduction to Discrete-Event Modeling and Simulation. Jun 20, 2017 · This paper presents a short tutorial on modeling and simulation techniques, with a focus on discrete-event simulation.
- [38] Predicting the performance of queues–A data analytic approach. This paper proposes using data analytics, combining computer simulation to generate the data and an advanced non-linear regression technique called the ...
- [39] Software performance prediction at source level - IEEE Xplore. In this paper we present a comprehensive method that combines analytical modeling and statistical approach to predicting the performance of application software ...
- [40] [PDF] Performance Modeling and Scalability Optimization of Distributed ... Aug 10, 2015 · Modeling the performance of parallel computing and distributed systems for scalability analysis, resource allocation, and capacity planning is ...
- [41] [PDF] Performance Modeling for Cloud Microservice Applications. Apr 7, 2019 · Application bottleneck detection serves to identify the microservice(s) responsible for the performance degradation. The bottleneck of an ...
- [42] [PDF] Layered Queueing Network Solver and Simulator User Manual. Jan 20, 2022 · The Layered Queuing Network (LQN) model is a canonical form for extended queueing networks with a layered structure.
- [43] [PDF] JMT: performance engineering tools for system modeling. Layered Queueing Models (LQMs) are shown to be a robust alternative to basic QNMs, while still enjoying analytical solution algorithms that facilitate their ...
- [44] Performance validation tools for software/hardware systems. This paper compares layered queueing models (LQMs) and stochastic process algebras (SPAs) and their support for system performance validation.
- [45]
- [46] Flame Graphs. Jan 23, 2025 · The flame graph visualization is really an adjacency diagram with an inverted icicle layout, which I used to visualize profiled stack traces.
- [47] CPU Flame Graphs - Brendan Gregg. Aug 30, 2021 · Flame graphs are a visualization for sampled stack traces, which allows hot code-paths to be identified quickly. See the Flame Graphs main page ...
- [48] Java VisualVM - Browsing a Heap Dump - Oracle Help Center. Java VisualVM can open heap dumps saved in the .hprof file format. When you open a saved heap dump, the heap dump opens as a tab in the main window. Taking a ...
- [49] HPC Storage – Getting Started with I/O Profiling - ADMIN Magazine. In this article, I talk about different ways to measure the performance of your current HPC storage system, your applications, or both.
- [50] OpenTelemetry Documentation: Instrumentation.
- [51] The Python Profilers - Python 3.14.0 documentation. Python profilers (cProfile and profile) provide deterministic profiling, tracking how often and how long parts of a program execute, not for benchmarking.
- [52] Why continuous profiling is the fourth pillar of observability - Datadog. Jul 25, 2025 · Low overhead means that continuous profilers can be used alongside other monitoring tools, which is a crucial advantage. Unlike traditional code ...
- [53] Apache JMeter - User's Manual.
- [54] Gatling: Discover the most powerful load testing platform. The most powerful load testing platform for modern organizations · We integrate with the tools that matter · Worldwide community of 300,000 companies and millions ...
- [55] Locust - A modern load testing framework. Locust is an open-source load testing tool using Python code to define user behavior and simulate millions of users distributed over multiple machines.
- [56] Apache JMeter - Apache JMeter™. The Apache JMeter™ application is open source software, a 100% pure Java application designed to load test functional behavior and measure performance.
- [57] Gatling documentation. Gatling is a high-performance load testing tool built for efficiency, automation, and code-driven testing workflows. Test scenarios are defined as code using an ...
- [58] Distributed load generation - Locust 2.42.2 documentation. Locust supports distributed runs out of the box. To do this, you start one instance of Locust with the --master flag and one or more using the --worker flag.
- [59] Thinking Clearly about Performance - ACM Queue. Sep 1, 2010 · At low load, your response time is essentially the same as your response time at no load. As load ramps up, you sense a slight, gradual ...
- [60] Component Reference - Apache JMeter - User's Manual. This sampler lets you send an HTTP/HTTPS request to a web server. It also lets you control whether or not JMeter parses HTML files for images and other ...
- [61] Distributed Load Testing on AWS. Distributed Load Testing on AWS automates performance testing at scale, demonstrating how applications behave under various load conditions and helping ...
- [62] Breakpoint testing: A beginner's guide | Grafana Labs. Jan 30, 2024 · Learn how a breakpoint test identifies where and how a system starts to fail and helps you prepare for those limits.
- [63] Performance Testing vs. Load Testing vs. Stress Testing - BlazeMeter. Aug 19, 2025 · Load testing and stress testing are both performance testing types that check how your application performs when many people use it at once.
- [64] Deployment Automation: What is it & How to Start - Atlassian. CI/CD pipelines automate integrating, testing, and releasing code changes faster. ... Unit, integration, system, regression, load, and other software testing can ...
- [65] Google SRE monitoring ditributed system - sre golden signals. The four golden signals of monitoring are latency, traffic, errors, and saturation. If you can only measure four metrics of your user-facing system, focus on ...
- [66] Overview - Prometheus. Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud.
- [67] Dashboards | Grafana documentation. A Grafana dashboard is a set of one or more panels, organized and arranged into one or more rows, that provide an at-a-glance view of related information.
- [68] Elastic Stack: (ELK) Elasticsearch, Kibana & Logstash. Meet the search platform that helps you search, solve, and succeed. It's comprised of Elasticsearch, Kibana, Beats, and Logstash (also known as the ELK Stack) ...
- [69] Prometheus Alerting: Turn SLOs into Alerts - Google SRE. Turn SLOs into actionable alerts on significant events using Prometheus alerting. Improve precision, recall, detection time, and time for alerting.
- [70] Anomaly Detection in Machine Learning - IBM. In this blog we'll go over how machine learning techniques, powered by artificial intelligence, are leveraged to detect anomalous behavior.
- [71] Architecture strategies for capacity planning - Microsoft Azure Well ... Aug 6, 2025 · Use these trends as a basis for forecasting future demand. Trend analysis can also identify the effects of one-time events that cause rapid ...
- [72] Jaeger: open source, distributed tracing platform. Jaeger is 100% open source, cloud native, and infinitely scalable. With Jaeger you can insights, monitor distributed workflows, speed, find & fix performance ...
- [73] What is SLA (Service Level Agreement)? - Amazon AWS. A service level agreement (SLA) is a contract outlining a service level a supplier promises, including metrics like uptime and response time.
- [74] What Is an SLA (service level agreement)? - IBM. A service level agreement (SLA) is a contract between a service provider and a customer that outlines the terms and expectations of provided service.
- [75] What is an SLA? Best practices for service-level agreements - CIO. A service-level agreement (SLA) defines the level of service expected from a vendor, laying out metrics by which service is measured, as well as remedies ...
- [76] Amazon Compute Service Level Agreement. May 25, 2022 · AWS will use commercially reasonable efforts to make Amazon EC2 available for each AWS region with a Monthly Uptime Percentage of at least 99.99%.
- [77] Google SRE - Embracing risk and reliability engineering book. On error budgets: https://sre.google/sre-book/embracing-risk/
- [78]
- [79] SLA Monitoring & Reporting: Getting What You Paid For - Obkio. Jun 10, 2025 · SLA monitoring measures and tracks service metrics to compare against agreed standards, ensuring service providers meet their obligations.
- [80] Error budget and service levels best practices - New Relic. Mar 19, 2024 · Error budgets and burn rates help you quickly see when business-critical services are experiencing service degradations or failures, often before customers ...
- [81] Workload characterization for trend analysis - ACM Digital Library. Authors: A. Esposito, ...
- [82] A Combined Capacity Planning and Simulation Approach for ... - MDPI. Methods: The presented approach combines the use of capacity planning formulas and discrete event simulation for optimizing extensive automated guided vehicle ( ...
- [83] (PDF) Simulation Based Resource Capacity Planning with Constraints. Dec 9, 2021 · The research work represents the development of a new decision-making model intended for the resource capacity planning depending on the production system ...
- [84] Amazon EC2 Auto Scaling - AWS Documentation. Amazon EC2 Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application.
- [85] Right Sizing - Amazon AWS. Right sizing is the process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.
- [86] Best Practices - Apache JMeter - User's Manual. If you need large-scale load testing, consider running multiple CLI JMeter instances on multiple machines using distributed mode (or not). When using ...
- [87] Microsoft Fabric Capacity Planning Guide: Manage Growth and ... Sep 4, 2025 · After determining the number of subscriptions and sizes of Fabric capacities required, allow a 25%-50% buffer for peak usage or throttling.
- [88] Incident Management | IT Process Wiki. Dec 31, 2023 · The primary objective of this ITIL process is to return the IT service to users as quickly as possible.
- [89] Incident Management: Processes, Best Practices & Tools | Atlassian. The ITIL incident management workflow aims to reduce downtime and minimize impact on employee productivity from incidents.
- [90] Blameless Postmortem for System Resilience - Google SRE. For a postmortem to be truly blameless, it must focus on identifying the contributing causes of the incident without indicting any individual or team for bad or ...
- [91] Five Whys and Five Hows | ASQ. On the 5 Whys technique.
- [92] Problem Management in ITIL: Process & Implementation Guide. Problem Management enables IT teams to prevent incidents by identifying the root cause. Learn about the overall process, benefits, and best practices.
- [93] Incident Management - MTBF, MTTR, MTTA, and MTTF - Atlassian. MTTR is a metric support and maintenance teams use to keep repairs on track. The goal is to get this number as low as possible by increasing the efficiency of ...
- [94] [PDF] Reducing MTTR the Right Way Best practices for fast ... - New Relic. ... reduce MTTR by helping you step up your incident response game: 1. Create a robust incident-management action plan. At the most basic level, teams need a ...
- [95] Spike Testing: Definition, Best Practices & Examples - Queue-it. Feb 12, 2025 · Spike testing is a type of performance testing that involves flooding a site or application with sudden and extreme increases and decreases (spikes) in load.
- [96] The legacy problem in government agencies: an exploratory study. May 27, 2015 · Government organizations continue to be heavily reliant on legacy systems to support their business-critical functions.
- [97]
- [98] Critical Challenges to Adopt DevOps Culture in Software Organizations: A Systematic Review. IEEE Xplore, https://ieeexplore.ieee.org/document/9690862
- [99] How Are Performance Issues Caused and Resolved?-An Empirical ... Apr 20, 2020 · This paper contributes a large scale empirical study of 192 real-life performance issues, with an emphasis on software design.
- [100] Stop Wasting IT Budget: Unlock Efficiency and Business Value ... Jul 25, 2025 · According to the IDC IT benchmarking report, as much as 30% of IT spend is lost to inefficiency, and the impact on revenue can be as high as 20– ...
- [101] SERVIMON: AI-Driven Predictive Maintenance and Real-Time ... Oct 31, 2025 · Results: AI-based anomaly detection increases system resilience by identifying performance degradation at an early stage, minimizing downtime, ...
- [102] Predictive Modeling and Anomaly Detection in Large-Scale Web ... Feb 1, 2025 · This study investigates using datasets generated by the CAWAL framework [12] to improve the performance of predictive modeling and anomaly ...
- [103] Automated Machine Learning for Optimized Load Forecasting and ... Oct 25, 2024 · This study uses automated machine learning to forecast electrical load demand, achieving a 12.89% MAPE, and uses AutoML frameworks to assess ...
- [104] SYSTEMLENS: Integrating Performance Prediction, Anomaly ... Engineering self-adaptive systems for software applications necessitates accurate predictions about the state of the underlying application.
- [105] TESS: Automated Performance Evaluation of Self-Healing and Self ... Mar 30, 2018 · This paper deals with the problem of evaluating and testing recovery and adaptation frameworks (RAF) for distributed software systems.
- [106] Exploring Performance and Energy Optimization in Serverless ... Oct 30, 2025 · This review paper presents various performance metrics in serverless computing, including cost, scalability, latency, energy consumption, ...
- [107] [PDF] Technology Trends Outlook 2025 - McKinsey. Jul 1, 2025 · Key 2025 trends include AI, agentic AI, application-specific semiconductors, advanced connectivity, cloud/edge computing, and quantum ...
- [108] Exploring the Potential of Carbon-Aware Execution for Scientific ... Mar 19, 2025 · Resource Scaling. Carbon-aware resource scaling dynamically allocates more resources when CI is low and reduces demand when it is higher ...
- [109] Empowering Kubernetes Observability with eBPF on Amazon EKS. Dec 14, 2023 · In this blog post, we'll explore how eBPF (Extended Berkeley Packet Filter) is revolutionizing Kubernetes observability on Amazon EKS.
- [110] Quantum Optimization for Software Engineering: A Survey - arXiv. Jun 20, 2025 · Quantum optimization includes solving optimization problems using quantum hardware (or corresponding classical simulators), which includes ...
- [111] SEI Study Analyzes Applicability of Security and Zero Trust ... Oct 27, 2025 · October 27, 2025: The Department of War (DoW) is mandated to begin adopting zero-trust (ZT) cybersecurity practices for its weapon systems ...
- [112] Home - Chaos Monkey. Chaos Monkey is responsible for randomly terminating instances in production to ensure that engineers implement their services to be resilient to instance ...