References
- [1] BLEU: a Method for Automatic Evaluation of Machine Translation (PDF)
- [2] The AI paper at the foundations of multilingual NLP - IBM Research
- [3] A Structured Review of the Validity of BLEU - MIT Press Direct
- [4] BLEU: a Method for Automatic Evaluation of Machine Translation (PDF) - Sep 17, 2001
- [5] Evaluating the Output of Machine Translation Systems (PDF) - Sep 19, 2011
- [6] Moses: Open Source Toolkit for Statistical Machine Translation - Jun 23, 2007
- [7] Quality Expectations of Machine Translation - arXiv (PDF)
- [8] Computing and Reporting BLEU Scores - Mathias Müller, Dec 14, 2020
- [9] mjpost/sacrebleu: Reference BLEU implementation - GitHub
- [10] nltk.translate.bleu_score - NLTK documentation
- [11] Moses/SupportTools - Statmt.org, Feb 6, 2016
- [12] Understanding MT Quality: BLEU Scores - ModernMT Blog, Oct 25, 2021
- [13] A Call for Clarity in Reporting BLEU Scores - ACL Anthology W18-6319, revision 9, 18 Sep 2025 (PDF)
- [14] Summary of BLEU's Correlation with Human Judgments in the WMT22 Metrics Task
- [15] NIST 2005 Machine Translation Evaluation Official Results (PDF) - Aug 1, 2005
- [16] Tangled up in BLEU: Reevaluating the Evaluation of Automatic Machine Translation Evaluation Metrics (PDF)
- [17] BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation - arXiv, May 30, 2023
- [18] Re-evaluating the Role of BLEU in Machine Translation Research (PDF)
- [19] AdaBLEU: A Modified BLEU Score for Morphologically Rich Languages - Aug 23, 2021
- [20] Assessing Evaluation Metrics for Speech-to-Speech Translation (PDF)
- [21] A Call for Clarity in Reporting BLEU Scores - ACL Anthology
- [22] A Call for Clarity in Reporting BLEU Scores - Statmt.org (PDF)
- [23] iBLEU: Interactively Debugging and Scoring Statistical Machine Translation Systems
- [24] iBLEU: Interactively Debugging and Scoring Statistical Machine Translation Systems (PDF)
- [25] Comparative Study Between METEOR and BLEU Methods of MT (PDF)
- [26] Part 5: Machine Translation Evaluation (PDF)
- [27] Case-Sensitive Neural Machine Translation - PMC, NIH
- [28] A Systematic Comparison of Smoothing Techniques for Sentence-Level BLEU (PDF)