Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Matthias Sperber; Ondřej Bojar; Barry Haddow; Dávid Javorský; Xutai Ma; Matteo Negri; Jan Niehues; Peter Polák; Elizabeth Salesky; Sudoh Katsuhito (須藤 克仁, すどう かつひと)

doi:10.48550/arXiv.2406.03881

http://hdl.handle.net/10935/0002006124

Name / File

https://arxiv.org/pdf/2406.03881

Item type

Journal Article

Title

Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation

Language

eng

Keywords (subject scheme: Other)

Human evaluation
speech translation
evaluation metrics

Resource type

journal article (http://purl.org/coar/resource_type/c_6501)

Access rights

metadata only access (http://purl.org/coar/access_right/c_14cb)

Authors

Matthias Sperber
Ondřej Bojar
Barry Haddow
Dávid Javorský
Xutai Ma
Matteo Negri
Jan Niehues
Peter Polák
Elizabeth Salesky
須藤克仁 (Sudoh Katsuhito; すどうかつひと; KAKEN2 1000000396152)


Abstract

Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior work exists on human evaluation for speech translation, which poses additional challenges such as noisy data and segmentation mismatches. We take first steps to fill this gap by conducting a comprehensive human evaluation of the results of several shared tasks from the most recent International Workshop on Spoken Language Translation (IWSLT 2023). We propose an effective evaluation strategy based on automatic resegmentation and direct assessment with segment context. Our analysis revealed that: 1) the proposed evaluation strategy is robust, and its scores correlate well with other types of human judgements; 2) automatic metrics are usually, but not always, well correlated with direct assessment scores; and 3) COMET is a slightly stronger automatic metric than chrF, despite the segmentation noise introduced by the resegmentation step. We release the collected human-annotated data in order to encourage further investigation.


Bibliographic information

Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

p. 6484-6485, published 2024-05

Publisher

ELRA and ICCL


DOI

Identifier type

DOI

Versions

Ver.1

2025-02-21 02:17:31.994644
