An Evaluation of Terminology-Augmented Generation (TAG) and Various Terminology Formats for the Translation Use Case

Abstract

In this study, we demonstrate the effectiveness of Terminology-Augmented Generation (TAG) for Large Language Model (LLM)-based Machine Translation (MT) and analyze the impact of terminology formats for that use case. By conducting empirical evaluations using OpenAI's GPT-4o and GPT-4o-mini and the open-weights Gemma3:12b, Mistral 7B, Mistral Nemo, and Mistral Large (2411) models, we systematically explore established terminology formats (including TBXv3), compare them against alternative structured formats, and assess their impact on generation quality. Our findings, on both a preexisting test dataset and a dataset created from real-world customer documents, show that TAG with capable LLMs delivers results on par with or better than a fine-tuned NMT baseline, and that specific formatting strategies can improve model accuracy and recall of in-context knowledge, albeit not to the extent we originally expected. Our findings inform the design of terminology integration strategies for LLM-based MT, improving term adherence, domain adequacy, and translation consistency in specialized communication.
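To illustrate what TAG means in practice, the sketch below shows one way terminology can be injected into a translation prompt, either as a plain glossary list or as a simplified TBX-inspired fragment. The function names, language pair, and rendering details are illustrative assumptions, not the exact prompt templates or formats evaluated in the paper.

```python
# Minimal sketch of Terminology-Augmented Generation (TAG) prompt construction.
# Names and formats are illustrative; they do not reproduce the paper's exact setup.

def format_terms_plain(term_pairs):
    """Render term pairs as a simple 'source = target' glossary list."""
    return "\n".join(f"{src} = {tgt}" for src, tgt in term_pairs)

def format_terms_tbx_like(term_pairs, src_lang="en", tgt_lang="de"):
    """Render term pairs as a simplified, TBX-inspired XML fragment."""
    entries = []
    for src, tgt in term_pairs:
        entries.append(
            "<conceptEntry>"
            f"<langSec xml:lang='{src_lang}'><termSec><term>{src}</term></termSec></langSec>"
            f"<langSec xml:lang='{tgt_lang}'><termSec><term>{tgt}</term></termSec></langSec>"
            "</conceptEntry>"
        )
    return "\n".join(entries)

def build_tag_prompt(source_text, term_pairs, fmt="plain"):
    """Assemble a translation prompt with the relevant terminology provided in-context."""
    glossary = (
        format_terms_plain(term_pairs)
        if fmt == "plain"
        else format_terms_tbx_like(term_pairs)
    )
    return (
        "Translate the following English text into German.\n"
        "Use the terminology below wherever it applies.\n\n"
        f"Terminology:\n{glossary}\n\n"
        f"Text:\n{source_text}\n"
    )

if __name__ == "__main__":
    terms = [("torque wrench", "Drehmomentschlüssel"), ("locking nut", "Sicherungsmutter")]
    print(build_tag_prompt("Tighten the locking nut with a torque wrench.", terms, fmt="plain"))
```

The resulting prompt string would then be sent to the LLM under evaluation; comparing the plain-list and TBX-like renderings is one way to probe the formatting effects discussed above.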

Lackner A., Vega-Wilson A., Lang C. (2025) "An Evaluation of Terminology-Augmented Generation (TAG) and Various Terminology Formats for the Translation Use Case", Journal of Digital Terminology and Lexicography, 1(2), 31-47. DOI: 10.25430/pupj.jdtl.1763379938
Year of Publication: 2025
Journal: Journal of Digital Terminology and Lexicography
Volume: 1
Issue Number: 2
Start Page: 31
Last Page: 47
Date Published: 11/2025
ISSN Number: 3103-3601
Serial Article Number: 3
DOI: 10.25430/pupj.jdtl.1763379938
Section: Article