Video Multimethod Assessment Fusion

Video Multimethod Assessment Fusion (VMAF) is an objective, full-reference perceptual video quality metric designed to predict human viewers' subjective judgments of video quality, particularly for distortions caused by compression and spatial resizing. Developed by Netflix in collaboration with researchers at the University of Southern California, VMAF was first released in June 2016 as an open-source tool under the Apache License 2.0. It addresses limitations of earlier metrics like Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) by fusing multiple elementary features through machine learning to better align with human perception.

At its core, VMAF employs a support vector machine (SVM) regression model to integrate scores from key perceptual features, including Visual Information Fidelity (VIF) for information integrity, the Detail Loss Metric (DLM) for sharpness preservation, and motion features derived from mean co-located pixel differences across frames. The model was trained on the Netflix Video Quality Dataset, comprising 34 diverse source clips distorted at various bitrates and resolutions, with subjective scores collected via the double-stimulus impairment scale (DSIS) method. This data-driven approach enables VMAF to achieve high correlation with mean opinion scores (MOS), outperforming PSNR and SSIM on test sets with lower root mean square error (RMSE) values.

Since its introduction, VMAF has become widely adopted in the video streaming industry for optimizing encoding ladders, quality monitoring, and perceptual optimization, with implementations available in libraries like FFmpeg and extensions including support for high dynamic range (HDR) content via HDR-VMAF as of 2023. Adaptations for 360-degree video have also been explored in research. Netflix continues to refine the metric through community contributions via GitHub, incorporating advancements like ensemble models and GPU acceleration to handle large-scale processing demands. Its Technology and Engineering Emmy Award recognition in 2021 underscores its impact on advancing perceptual video quality assessment standards.

Background

Video Quality Metrics

Objective video quality assessment metrics are computational algorithms that predict the perceptual quality of a video signal without involving human subjects, providing an automated alternative to subjective evaluations. These metrics are classified into three main categories based on the availability of reference information: full-reference (FR) metrics, which compare the distorted video against a complete pristine reference; reduced-reference (RR) metrics, which utilize partial features or parameters extracted from the reference; and no-reference (NR) metrics, which assess quality using only the distorted video itself.

The foundations of objective video quality metrics emerged in the 1970s with early techniques such as the mean squared error (MSE), which quantifies distortion by averaging the squared differences between corresponding pixels in the reference and test videos. For over 50 years, MSE served as a cornerstone metric in signal processing due to its simplicity and mathematical tractability, though it often failed to correlate well with human perception. By the 1990s, the field advanced toward perceptual models that incorporated aspects of human vision, driven by the growing demands of digital video compression and the recognition that simple error measures overlooked structural and contextual distortions.

In industry applications, particularly video streaming services, these metrics are essential for optimizing encoding ladders, adaptive bitrate algorithms, and quality monitoring, allowing providers to achieve the best perceptual quality at constrained bitrates while reducing bandwidth usage and storage costs. They enable automated quality control in production pipelines, ensuring consistent viewer experience across diverse devices and network conditions.

Developing robust metrics faces significant challenges in modeling the complexities of the human visual system (HVS), which governs how viewers perceive distortions. Key HVS factors include contrast sensitivity, which varies with luminance levels and affects the detection of subtle changes; temporal masking, where motion or scene changes hide artifacts; and spatial frequency response, which determines sensitivity to fine details across different resolutions. These elements make it difficult to create universally accurate predictors, as perceptual judgments depend on content type, viewing conditions, and individual differences.
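
As a minimal illustration of the pixel-wise error measures described above, the following Python sketch (the frame arrays and their sizes are hypothetical) computes the MSE between a reference frame and a distorted frame held as 8-bit NumPy arrays.

```python
import numpy as np

def mse(reference: np.ndarray, distorted: np.ndarray) -> float:
    """Mean squared error between two equally sized 8-bit frames."""
    ref = reference.astype(np.float64)
    dis = distorted.astype(np.float64)
    return float(np.mean((ref - dis) ** 2))

# Example: a synthetic 1080p luma frame and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(1080, 1920), dtype=np.uint8)
noisy = np.clip(reference + rng.normal(0.0, 5.0, reference.shape), 0, 255).astype(np.uint8)
print(f"MSE: {mse(reference, noisy):.2f}")
```

Because MSE treats every pixel error identically, two distortions with the same MSE can look very different to a human viewer, which is precisely the shortcoming that later perceptual metrics try to address.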

Limitations of Traditional Approaches

Traditional video quality assessment metrics, such as the Peak Signal-to-Noise Ratio (PSNR), have been widely adopted due to their computational simplicity, but they exhibit significant shortcomings in aligning with human visual perception. PSNR quantifies the difference between an original and distorted video by measuring the average squared error between corresponding pixels, formalized as \text{PSNR} = 10 \cdot \log_{10} \left( \frac{\text{MAX}^2}{\text{MSE}} \right), where \text{MAX} is the maximum possible pixel value (typically 255 for 8-bit images) and \text{MSE} is the mean squared error. Despite its ease of implementation, PSNR treats all errors equally regardless of their perceptual impact, ignoring key aspects of the human visual system (HVS) such as structural information, luminance masking, and contrast sensitivity, leading to poor correlation with subjective judgments. For instance, PSNR often fails to distinguish visually imperceptible distortions from noticeable ones, particularly in scenarios involving blurring or other artifacts that do not drastically alter pixel intensities.

The Structural Similarity Index (SSIM) addresses some of PSNR's deficiencies by incorporating perceptual principles, evaluating luminance, contrast, and structural similarity between reference and distorted frames. Its core formulation is \text{SSIM}(x,y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}, where \mu_x, \mu_y are the means, \sigma_x^2, \sigma_y^2 are the variances, \sigma_{xy} is the covariance of blocks x and y, and c_1, c_2 are stabilization constants. While SSIM improves upon PSNR by better capturing structural degradations and achieving higher perceptual relevance in static assessments, it remains limited in video contexts, as it operates primarily on individual frames without adequately accounting for temporal dynamics like motion artifacts or frame-to-frame inconsistencies. Additionally, SSIM struggles with color distortions and complex artifacts, such as those introduced by aggressive compression in high-motion scenes, where structural changes may not fully represent perceived quality degradation.

Other metrics, including the Video Quality Metric (VQM) standardized by the ITU and Multi-Scale SSIM (MS-SSIM), attempt to extend these principles but suffer from domain-specific weaknesses that hinder broad applicability. VQM incorporates motion and perceptual models to predict quality more holistically than pixel-error methods, yet it exhibits poor generalization across diverse content types, resolutions, and distortion scenarios due to its calibration to specific broadcast conditions. MS-SSIM enhances SSIM by applying the index at multiple scales to better handle varying resolutions and viewing distances, but it still falters in generalizing to dynamic video content with temporal distortions.

Empirical studies on benchmark datasets underscore these limitations, revealing consistently low correlations between traditional metrics and human subjective scores, typically measured via mean opinion scores (MOS). For example, on the LIVE Video Quality Assessment (LIVE-VQA) database, PSNR achieves a Spearman Rank Order Correlation Coefficient (SROCC) of approximately 0.52 with MOS, while SSIM yields 0.53; on the Video Quality Experts Group (VQEG) Phase 3 dataset, these values rise modestly to 0.72 and 0.68, respectively, indicating only moderate predictive power across compression-induced distortions. Similar trends hold for VQM and MS-SSIM, with SROCC values rarely exceeding 0.75 on heterogeneous datasets, highlighting their inability to robustly predict perceptual quality in real-world streaming applications.
These gaps motivated the development of fusion-based approaches that integrate multiple metrics to better approximate perception.
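
As a concrete illustration of the frame-level measures discussed above, the following Python sketch computes PSNR from the formula given earlier and a simplified, single-window form of SSIM. Production SSIM implementations instead average local scores over a sliding (typically Gaussian-weighted) window, so this is a didactic approximation rather than a reference implementation.

```python
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR in dB, following PSNR = 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((reference.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))

def ssim_global(x: np.ndarray, y: np.ndarray, max_val: float = 255.0) -> float:
    """Single-window SSIM using the luminance/contrast/structure formulation above."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # standard stabilization constants
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2))
                 / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

Both functions operate on a single frame pair; neither carries any notion of motion or temporal masking, which is the gap the fusion-based approaches described below are designed to close.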

Development

Origins at Netflix

The development of Video Multimethod Assessment Fusion (VMAF) originated within Netflix's engineering teams amid the company's expansion of global video streaming services in the mid-2010s. Initial research began around 2014-2015, coinciding with Netflix's intensified focus on perceptual quality optimization for adaptive bitrate streaming, where videos are dynamically adjusted based on users' network conditions and device capabilities to maintain consistent quality of experience. This effort was driven by the need to scale video encoding pipelines that produce thousands of variants per title, ensuring high perceptual quality across diverse playback scenarios without manual intervention.

A primary motivation was to automate quality control processes, as traditional subjective testing, conducted by human evaluators to gauge perceived quality, was too costly, time-consuming, and unscalable for Netflix's volume of content. Existing objective metrics like PSNR often failed to align with human judgments, particularly under variable network bandwidths that cause rebuffering or resolution changes, and across heterogeneous devices ranging from smartphones to large-screen TVs. VMAF was conceived as a perceptual quality metric to predict viewer experience more accurately, enabling automated decisions in encoding ladders and preprocessing to minimize bitrate while maximizing quality.

Early prototypes involved combining established perceptual features and testing them against Netflix's internal datasets, which included compressed video sequences derived from real streaming scenarios and annotated with subjective quality scores. These prototypes were evaluated to ensure robustness in reflecting human perception under compression artifacts typical of streaming delivery. The foundational concept was publicly introduced in a 2016 Netflix TechBlog post titled "Toward a Practical Perceptual Video Quality Metric," authored by Zhi Li, Anne Aaron, Ioannis Katsavounidis, Anush Moorthy, and Megha Manohara, marking VMAF's debut as an open-source tool. This initial work at Netflix laid the groundwork for subsequent refinements, including brief early collaborations with academic researchers to enhance the metric's perceptual modeling.

Collaboration and Evolution

The development of Video Multimethod Assessment Fusion (VMAF) has been driven by a close partnership between Netflix and the University of Southern California's Media Communications Laboratory (MCL), directed by Professor C.-C. Jay Kuo. Key researchers, including Anne Aaron, Zhi Li, and others, have led the effort, integrating academic expertise in signal processing and perceptual modeling to refine the metric for real-world streaming applications. This collaboration began in the mid-2010s, focusing on fusing multiple objective metrics to better predict human-perceived video quality, and has resulted in VMAF's open-source release and widespread adoption.

Major milestones include the initial open-sourcing of VMAF in June 2016, with early versions like 0.3.1 made available under a permissive license to encourage community contributions. In 2021, the collaboration earned a Technology and Engineering Emmy Award for the development of open perceptual metrics for video encoding optimization, recognizing VMAF's impact on industry standards. Subsequent updates, such as version 0.6.1 around 2018-2019, improved model accuracy for higher resolutions, while the framework's flexibility allowed for tailored prediction models without overhauling the core architecture.

VMAF's evolution has incorporated advancements in machine learning, starting with support vector machine (SVM) regression to fuse features like visual information fidelity and detail loss metrics. Over time, the model has expanded to include device-specific variants, such as the VMAF Phone model introduced in 2018, which accounts for the closer viewing distances and smaller screens typical of mobile devices to optimize bitrate allocation. This progression reflects iterative training on diverse datasets, enhancing robustness across viewing conditions while maintaining computational efficiency.

Ongoing Netflix-USC efforts have integrated support for high dynamic range (HDR) content, with the first HDR-VMAF version developed in collaboration with Dolby Laboratories by 2021 and fully released in 2023 as a format-agnostic extension. By late 2023, library version 3.0.0 added GPU acceleration via CUDA for faster processing, building on prior optimizations like integer arithmetic for up to 2x speedups. These updates, informed by continued academic-industry exchanges, have also validated VMAF's applicability to specialized formats like 360-degree video without requiring content-specific retraining.

Methodology

Feature Extraction Components

The feature extraction components of Video Multimethod Assessment Fusion (VMAF) comprise a set of perceptual features derived from both reference and distorted videos, designed to model human visual system (HVS) responses to various degradation types, including compression artifacts, scaling, and noise. These features emphasize information fidelity, structural detail, and motion, providing a robust foundation for quality prediction without relying on simplistic pixel-wise comparisons.

A core feature is Visual Information Fidelity (VIF), which quantifies information loss between the reference and distorted videos by modeling the HVS as an information channel distorted by noise. VIF captures how distortions degrade the information transmitted through visual channels, based on how natural scene statistics are preserved across subband components. It is particularly sensitive to blurring and additive noise, making it effective for assessing overall fidelity degradation. VIF is computed at multiple spatial scales to account for HVS multi-resolution processing.

Another foundational component is the Detail Loss Metric (DLM), which measures the impairment of fine details critical for perceived sharpness. DLM assesses the loss of visible structural information affecting detail visibility, excluding additive impairments like noise, and is applied across scales to reflect HVS detail sensitivity.

The motion feature, known as Mean Co-located Pixel Difference (MCPD), addresses temporal quality by measuring the average absolute difference in luminance values between co-located pixels in consecutive frames. This captures motion-related artifacts such as jerkiness or temporal inconsistencies, essential for dynamic content where static metrics fail.

The extraction process operates at multiple spatial and temporal scales to mimic HVS multi-resolution processing, using a Gaussian pyramid decomposition that progressively low-pass filters and subsamples frames into octave bands (e.g., four levels). This enables computation at coarse-to-fine resolutions, weighting contributions by HVS acuity (higher at foveal scales). Temporally, features aggregate over short windows (e.g., 5-10 frames) to balance local and global motion effects, ensuring computational efficiency while preserving perceptual relevance. The subsequent fusion of these components is trained using subjective scores from Netflix's Video Quality Dataset, comprising 34 diverse source clips distorted at various bitrates and resolutions.
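
Of the elementary features above, the temporal MCPD term is the simplest to express directly. The following Python sketch computes a simplified version on a sequence of luma frames; it omits the low-pass prefiltering that production implementations typically apply before differencing, so the values are illustrative rather than byte-exact.

```python
import numpy as np

def mcpd_per_frame(luma_frames: list[np.ndarray]) -> list[float]:
    """Mean co-located pixel difference between each frame and its predecessor.

    Simplified sketch of the motion feature described above: the first frame
    has no predecessor and is assigned 0.0 by convention here.
    """
    scores = [0.0]
    for prev, curr in zip(luma_frames[:-1], luma_frames[1:]):
        diff = np.abs(curr.astype(np.float64) - prev.astype(np.float64))
        scores.append(float(diff.mean()))
    return scores
```

A static scene yields values near zero, while rapid motion or scene cuts produce large values, in line with the temporal-masking rationale discussed earlier.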

Fusion and Prediction Model

The fusion and prediction model in Video Multimethod Assessment Fusion (VMAF) utilizes a support vector machine (SVM) regression to integrate multiple extracted features into a unified perceptual score that approximates human subjective judgments. This approach combines features such as Visual Information Fidelity (VIF) at four spatial scales, the Detail Loss Metric (DLM), and motion-related metrics like Mean Co-located Pixel Difference (MCPD) by learning a nonlinear mapping from feature vectors to Mean Opinion Scores (MOS) derived from subjective viewing tests. The SVM is selected for its effectiveness in handling high-dimensional inputs and providing robust predictions aligned with perceived video quality.

The prediction process normalizes the input features to a common scale and applies the SVM to output a score, expressed conceptually as VMAF = SVM(f_{\text{VIF}}, f_{\text{DLM}}, f_{\text{MCPD}}, ...), where each f represents a normalized feature value. The resulting score is then clipped and scaled to the range [0, 100], with 100 indicating pristine, undistorted video quality and lower values reflecting increasing perceptual degradation. This scaling facilitates intuitive interpretation in practical applications like streaming optimization.

Training of the SVM employs a nonlinear regression with a radial basis function (RBF) kernel to capture complex interactions among features, optimized via cross-validation on datasets encompassing diverse distortion types, including compression artifacts from codecs like H.264 and HEVC, as well as rescaling and transmission errors. Subjective data from controlled experiments, collected with rating methodologies such as the double-stimulus impairment scale (DSIS), serve as ground truth labels, ensuring the model generalizes across various video contents and resolutions. This process emphasizes correlation with human perception over specific distortion mechanisms.

Later versions of VMAF introduce model variants trained separately for robustness across display types and content characteristics, such as a phone-optimized model for smaller screens and closer viewing distances, and a model tailored for higher resolutions and wider viewing angles. These variants maintain the core SVM architecture but use specialized training datasets to enhance accuracy for specific scenarios, effectively providing domain-adapted predictions without altering the fundamental fusion logic.
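
To make the fusion step concrete, the following Python sketch trains an RBF-kernel support vector regressor on feature vectors and clips the prediction to [0, 100], mirroring the pipeline described above. The feature values, hyperparameters, and training labels are all placeholders, not Netflix's released model.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical training data: each row is [VIF_scale0..VIF_scale3, DLM, MCPD]
# for one clip, and each label is a subjective score on a 0-100 scale.
rng = np.random.default_rng(1)
X_train = rng.random((200, 6))
y_train = rng.random(200) * 100.0

# Feature normalization followed by RBF-kernel support vector regression.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=4.0, gamma="scale"))
model.fit(X_train, y_train)

def predict_score(features: np.ndarray) -> float:
    """Predict a VMAF-like score for one clip's 6-dimensional feature vector."""
    raw = model.predict(features.reshape(1, -1))[0]
    return float(np.clip(raw, 0.0, 100.0))  # clamp to the metric's reporting range

print(predict_score(rng.random(6)))
```

Keeping the learned regressor separate from the feature definitions is what allows the device-specific variants mentioned above to reuse the same architecture with different training data.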

Evaluation and Performance

Correlation with Human Perception

VMAF's predictions are designed to closely align with subjective human judgments of video quality, primarily evaluated through statistical metrics such as the Pearson Linear Correlation Coefficient (PLCC) and Spearman Rank Order Correlation Coefficient (SROCC). These measures quantify the linear and monotonic relationships between VMAF scores and Mean Opinion Scores (MOS) derived from human viewers. On Netflix's custom video dataset (NFLX-TEST), VMAF version 0.3.1 achieves a PLCC of 0.963 and demonstrates superior performance compared to traditional metrics like PSNR and SSIM. Similarly, on the VQEG HD Phase I dataset (vqeghd3 collection), VMAF attains a PLCC of 0.939, indicating strong generalization across diverse compression conditions.

Human validation studies for VMAF employ standardized double-stimulus impairment scale (DSIS) methodologies, where viewers rate distorted videos relative to pristine references on a scale from 1 (very annoying) to 5 (imperceptible). These tests typically involve 18-55 non-expert participants per clip, conducted under controlled viewing conditions on consumer television displays, with scores normalized to a 0-100 DMOS to facilitate comparisons. Prediction accuracy is further assessed via Root Mean Square Error (RMSE), with VMAF exhibiting low errors in aligning with subjective scores; for instance, RMSE values around 12.7 on Netflix test content underscore its precision on the 0-100 scale, though optimized configurations can yield even tighter fits.

Cross-dataset evaluations highlight VMAF's robustness, with SROCC exceeding 0.90 on public benchmarks like the VQEG HD dataset, even when trained primarily on Netflix-specific data. This generalization mitigates content-specific biases, such as those in per-title encoding scenarios, by leveraging machine-learned feature fusion that adapts to varied distortions. On datasets like LIVE and CSIQ, VMAF maintains competitive correlations (SROCC approximately 0.76 on LIVE and 0.61 on CSIQ for early versions), outperforming legacy metrics in overall predictive power.

In outlier analysis, VMAF particularly excels at detecting perceptual artifacts like blockiness in H.264-encoded videos, where traditional metrics such as PSNR often fail to correlate with human sensitivity to such impairments. For example, human raters perceive blockiness more severely at lower bitrates than PSNR suggests, and VMAF's multi-feature fusion captures this discrepancy effectively, reducing prediction outliers in compression-heavy scenarios.
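
The agreement statistics discussed above are straightforward to compute. The following Python sketch (using SciPy; the score arrays are hypothetical) evaluates PLCC, SROCC, and RMSE between a metric's predictions and the corresponding MOS values.

```python
import numpy as np
from scipy import stats

def evaluate_against_mos(predicted: np.ndarray, mos: np.ndarray) -> dict[str, float]:
    """Agreement between objective quality predictions and subjective MOS."""
    plcc = stats.pearsonr(predicted, mos)[0]    # linear correlation
    srocc = stats.spearmanr(predicted, mos)[0]  # monotonic (rank) correlation
    rmse = float(np.sqrt(np.mean((predicted - mos) ** 2)))
    return {"PLCC": float(plcc), "SROCC": float(srocc), "RMSE": rmse}

# Hypothetical scores for ten clips, both on a 0-100 scale.
predicted = np.array([92.0, 85.5, 60.2, 40.1, 75.3, 88.0, 55.7, 30.9, 68.4, 97.1])
mos = np.array([90.0, 83.0, 65.0, 45.0, 72.0, 86.0, 58.0, 35.0, 70.0, 95.0])
print(evaluate_against_mos(predicted, mos))
```

Note that published evaluations usually fit a nonlinear (e.g., logistic) mapping from metric scores to MOS before computing PLCC and RMSE; that step is omitted here for brevity.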

Comparative Benchmarks

In benchmarks conducted on the Netflix Video Quality (NFLX) database, VMAF demonstrates superior correlation with subjective human ratings compared to traditional full-reference metrics. Specifically, VMAF achieves a Spearman's Rank Order Correlation Coefficient (SROCC) of 0.943, outperforming PSNR (SROCC 0.663), SSIM (SROCC 0.800), and MS-SSIM (SROCC 0.904), as summarized below.
Metric     SROCC on NFLX Database
PSNR       0.663
SSIM       0.800
MS-SSIM    0.904
VMAF       0.943
This advantage is particularly pronounced for high-resolution content, such as HD and UHD videos, where VMAF better captures perceptual distortions that simpler metrics like PSNR overlook due to their focus on pixel-level errors rather than human visual system modeling. Real-world deployments at Netflix, validated through A/B studies, show that VMAF-guided encoding optimizes bitrate ladders for perceptual consistency across diverse content and networks. Compared with no-reference (NR) methods such as BRISQUE, which must operate without a pristine reference, VMAF demonstrates higher predictive accuracy on databases like LIVE (SROCC approximately 0.76 for VMAF). Regarding computational cost, VMAF requires approximately 10 times more processing than PSNR due to its multi-feature extraction and machine-learning fusion, but it remains substantially faster and more scalable than full subjective testing, which involves extensive human trials. Later versions, such as libvmaf v3.0.0 (as of December 2023), include optimizations for improved performance.

Applications and Implementations

Use in Video Streaming

Video Multimethod Assessment Fusion (VMAF) plays a pivotal role in encoding optimization for video streaming platforms, particularly through per-title encoding techniques pioneered by Netflix. In this approach, VMAF scores guide the selection of optimal bitrates for individual titles, ensuring maximum perceptual quality within bitrate constraints. For instance, encoders target VMAF scores above 93 to achieve quality that is "imperceptibly different" from the source, allowing for efficient compression without noticeable degradation (a simplified sketch of this selection logic appears at the end of this section). This method dynamically adjusts encoding parameters based on content complexity, such as scene changes or motion intensity, to allocate bits more effectively across frames or shots. Netflix's implementation of VMAF-driven per-title and per-shot encoding, rolled out in 2018, resulted in approximately 20% bitrate savings while preserving equivalent visual quality across their catalog.

In adaptive streaming protocols like Dynamic Adaptive Streaming over HTTP (DASH) and HTTP Live Streaming (HLS), VMAF facilitates the creation of bitrate ladders that enable seamless dynamic quality switching. By predicting VMAF scores for different representations, platforms can pre-encode segments to form ladders where each rung maintains consistent perceptual quality across varying network conditions. This integration accounts for network-induced impairments, such as compression artifacts from bitrate reductions, ensuring that switches minimize perceived quality drops. Content-driven VMAF estimation methods further refine these ladders by analyzing spatio-temporal features, improving bandwidth efficiency in real-time delivery scenarios.

VMAF has seen widespread industry adoption for quality assurance in video streaming workflows. AWS Elemental incorporates VMAF into its MediaConvert service to evaluate and report per-frame quality metrics during encoding, helping operators verify stream integrity. Similarly, Ericsson employs VMAF in its mobile QoE assessments to benchmark video encoding quality across resolutions in mobile network environments. These tools have also contributed to the development of efficient codecs like AV1, where VMAF evaluations demonstrated up to 30% bitrate reductions compared to HEVC at equivalent quality levels, accelerating AV1's integration into streaming pipelines. In live streaming applications, VMAF enables real-time quality assessment, with serverless implementations processing frames to monitor and adjust outputs dynamically for broadcast events. Software libraries like libvmaf support these deployments by providing the computational backbone for VMAF calculations in production encoders.
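
As a simplified sketch of the ladder-selection idea referenced above, the following Python snippet picks, from a set of hypothetical trial encodes, the lowest bitrate whose measured VMAF meets a target threshold. The threshold of 93 is used only because it is the figure quoted above; the bitrate/score pairs are invented.

```python
def lowest_bitrate_meeting_target(measurements: list[tuple[int, float]],
                                  target_vmaf: float = 93.0) -> int | None:
    """Return the lowest bitrate (kbps) whose VMAF score meets the target.

    `measurements` holds (bitrate_kbps, vmaf_score) pairs from trial encodes
    of a single title; returns None if no rung reaches the target.
    """
    qualifying = [bitrate for bitrate, score in sorted(measurements) if score >= target_vmaf]
    return qualifying[0] if qualifying else None

# Hypothetical per-title measurements.
trial_encodes = [(1500, 82.1), (2500, 89.4), (3500, 93.6), (5000, 95.8)]
print(lowest_bitrate_meeting_target(trial_encodes))  # -> 3500
```

Production per-title and per-shot pipelines optimize over many more dimensions (resolution, codec settings, shot boundaries), but the underlying principle of trading bitrate against a perceptual target is the same.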

Software and Hardware Tools

The core implementation of Video Multimethod Assessment Fusion (VMAF) is provided by libvmaf, a stand-alone C library developed and maintained by Netflix. This library enables developers to integrate VMAF into custom applications for perceptual video quality assessment and supports FFmpeg integration, allowing for efficient batch processing of video pairs to compute scores across multiple files. Python bindings for libvmaf, known as vmaf-python, facilitate scripting and automation in research and production workflows. These bindings include command-line tools that simplify score calculation on pairs of reference and distorted videos, supporting features like multiple output formats and logging for detailed per-frame analysis.

GPU acceleration for VMAF is available through VMAF-CUDA, an open-source release integrated into VMAF version 3.0 in late 2023. This GPU-based implementation leverages NVIDIA CUDA for feature extraction and prediction, achieving up to 36.9x faster processing compared to CPU-based computation and up to 4.4x higher throughput in FFmpeg workflows. It is fully compatible with FFmpeg version 6.1 for seamless deployment in video processing pipelines.

Extensions to VMAF include support for high dynamic range (HDR) content through HDR-VMAF, a format-agnostic model developed by Netflix that measures the perceptual quality of HDR signals, with BT.2020 support for wide color gamut handling. For 360-degree virtual reality (360VR) videos, VMAF applies effectively via spherical-to-planar projections, enabling quality assessment without core algorithmic modifications, as validated in subjective experiments. Third-party tools, such as the VapourSynth-VMAF plugin, extend accessibility by integrating VMAF computation directly into VapourSynth scripting environments for video filtering and analysis.
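
A common way to invoke libvmaf in practice is through FFmpeg's libvmaf filter. The following Python sketch wraps such an invocation with subprocess and reads the pooled score from the JSON log. It assumes an FFmpeg build configured with libvmaf support; the file names and the JSON field layout (that of recent libvmaf releases) are assumptions to verify against your installation.

```python
import json
import subprocess

# Hypothetical file names; requires an FFmpeg build with --enable-libvmaf.
distorted, reference, log_path = "distorted.mp4", "reference.mp4", "vmaf.json"

# The first input to the libvmaf filter is the distorted (test) video and the
# second is the pristine reference; results are written to a JSON log file.
subprocess.run(
    [
        "ffmpeg", "-i", distorted, "-i", reference,
        "-lavfi", f"libvmaf=log_fmt=json:log_path={log_path}",
        "-f", "null", "-",
    ],
    check=True,
)

with open(log_path) as f:
    report = json.load(f)

# Recent libvmaf JSON logs expose per-frame scores and pooled statistics.
print("Pooled VMAF (mean):", report["pooled_metrics"]["vmaf"]["mean"])
```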

Limitations and Extensions

Known Challenges

Video Multimethod Assessment Fusion (VMAF) is primarily designed as a full-reference (FR) perceptual video quality metric, which restricts its effectiveness in no-reference (NR) scenarios where a high-quality reference video is unavailable, such as in live monitoring or user-generated content workflows without pristine originals. It also faces challenges with extreme distortions, such as severe frame drops or geometric shifts, where precise spatial and temporal alignment between the reference and distorted videos is often compromised, leading to inaccurate predictions. These scope limitations highlight VMAF's optimization for controlled compression and scaling artifacts rather than real-world transmission errors.

VMAF's performance shows content dependencies, with reduced accuracy on synthetic or animated videos compared to natural content; for instance, its Spearman's rank order correlation coefficient (SROCC) drops to approximately 0.78 on high-frame-rate (HFR) clips versus higher values around 0.95 on standard natural videos. Similarly, it exhibits biases in low-resolution videos, where scaling artifacts are not fully captured, and on gaming or cloud-rendered content with unique statistical properties, achieving only moderate correlations (Pearson linear correlation coefficient of 0.715) due to unfamiliar distortion types like ghosting or flickering. These issues stem from VMAF's training primarily on natural video datasets from subjective lab experiments, which do not encompass the full diversity of perceptual qualities.

Computationally, VMAF demands significant resources for processing long videos, as it aggregates multiple elementary features across frames without inherent acceleration, resulting in high latency for real-time applications. It is particularly sensitive to misalignment errors between reference and test videos, which can degrade scores even for minor shifts, and performs best on short clips (a few seconds) while failing to account for long-term perceptual effects such as recency effects or rebuffering events. Additionally, the absence of chroma (color) features in its core model limits handling of color-related distortions.

Ethically, reliance on VMAF for encoding optimization raises concerns about over-optimization, where algorithms may introduce unnatural artifacts, such as excessive blurring or temporal inconsistencies, to inflate scores, potentially compromising overall viewer experience despite high metric values. Ongoing extensions, such as FUNQUE, aim to mitigate some computational and scope issues while preserving correlation with human judgments.

Recent Developments

Recent advancements in Video Multimethod Assessment Fusion (VMAF) from 2023 to 2025 have focused on integrating machine learning to refine feature extraction processes and boost computational efficiency. A notable update is the development of VMAF-E by MainConcept, which employs an AI-based architecture to accelerate quality predictions, delivering up to 10x faster performance compared to traditional VMAF implementations on CPU hardware while maintaining perceptual accuracy. This enhancement addresses bottlenecks in live and real-time applications by speeding up feature computation without compromising correlation with human judgments. Additionally, a 2024 study proposed a deep neural network-based no-reference VMAF for blind video quality assessment.

Extended models have emerged to adapt VMAF for specialized scenarios, including no-reference evaluations. The VMAF No Enhancement Gain (NEG) variant, refined in recent iterations, prevents artificial score inflation from preprocessing enhancements, enabling fairer assessments in codec comparisons and approximating no-reference utility by focusing on inherent distortions. For immersive media, adaptations have validated VMAF's efficacy in augmented reality (AR) and virtual reality (VR) environments; a 2024 study on the capacity of 5G networks to support 360-degree VR content utilized VMAF to assess video quality, noting its adequate performance for such immersive applications.

Standardization efforts have increasingly incorporated VMAF into evaluations for emerging codecs. In MPEG testing for Versatile Video Coding (VVC), VMAF has been widely adopted to benchmark coding efficiency, showing superior correlations over legacy metrics like PSNR in high-resolution scenarios. Community-driven contributions via open-source repositories have extended VMAF's accessibility for niche use cases. Forks and alternative implementations facilitate integration into pipelines for mobile devices (e.g., via the built-in phone model) and low-latency environments. NVIDIA's VMAF-CUDA extension further optimizes for GPU acceleration, reducing latency in large-scale processing.

In 2025, additional research examined VMAF's effectiveness with neural codecs, finding it reliable in some cases, while studies on adversarial attacks revealed vulnerabilities that could inflate scores unperceived by humans. Persistent challenges, such as limitations in fully no-reference scenarios, continue to drive these innovations.

    ITU-T. Recommendation ITU-T P.910—Subjective Video Quality Assessment Methods for Multimedia Applications. 2023. Available online: https://www.itu.int/rec/T-REC ...