Evaluating Enterprise MT: Why BLEU is not enough and how COMET improves quality assessment
Enterprise buyers still see BLEU scores cited in RFPs, benchmarks, and vendor decks as a universal measure of “translation quality.” Yet BLEU was never designed to capture meaning, domain adequacy, or business impact; it measures string overlap with a...
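To make the string-overlap point concrete, here is a minimal sketch of clipped n-gram precision, the matching step at the core of BLEU. It is a simplification and not full BLEU: it leaves out the brevity penalty and the geometric mean over 1- to 4-grams. The function names and the example sentences are invented for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n):
    """Share of hypothesis n-grams that also occur in the reference.

    Each match is clipped to the reference count, as in BLEU's modified
    precision. Tokenization is naive lowercase whitespace splitting.
    """
    hyp = ngrams(hypothesis.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total


# Hypothetical sentences, chosen to show the failure mode.
reference = "the payment must not be refunded"
negation_dropped = "the payment must be refunded"        # meaning reversed
paraphrase = "refunding this charge is prohibited"       # meaning preserved

for label, hyp in [("negation dropped", negation_dropped),
                   ("paraphrase", paraphrase)]:
    p1 = clipped_precision(hyp, reference, 1)
    p2 = clipped_precision(hyp, reference, 2)
    print(f"{label}: unigram={p1:.2f} bigram={p2:.2f}")
# negation dropped: unigram=1.00 bigram=0.75
# paraphrase: unigram=0.00 bigram=0.00
```

The translation that reverses the meaning scores near the top on overlap, while the faithful paraphrase scores zero. A surface-matching metric cannot see this kind of gap.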


