Localising Drupal remains an arduous manual task. While LLMs offer high-quality translations, they often struggle with consistency without domain-specific context. In this talk, I will explain a RAG-based translation pipeline that supplements the LLM with relevant terminology and translation memory, ensuring more accurate and consistent results. I will also touch on how this system can be a basis for improving existing translations.
Participants should be familiar with Drupal's localisation system (.po files) and the general concept of Large Language Models (LLMs). Basic knowledge of Docker and API-driven workflows will help in understanding the system architecture, but not required.
Outline
This session introduces a Proof of Concept for semi-automating Drupal string translations using Retrieval-Augmented Generation (RAG) and LLMs, and explains how it can also improve the existing translations.
Manual Translation Fatigue: Translating new strings for Drupal core and modules is a tedious, ongoing process that often leads to community burnout. This project originated from community discussions seeking sustainable and scalable solutions to reduce the burden of manual localisation.
System Architecture and Components: The architecture relies on a vector DB to store and retrieve Drupal-specific context. While gpt-po-translator is used for robust .po file logic, its lack of native glossary support necessitated the creation of a custom RAG Proxy. This middleware intercepts LLM requests to inject relevant translation memory and glossary terms dynamically.
Ongoing Consistency and Maintenance: As localisation is an ongoing process, consistency can diminish over time. This system provides a foundation for reviewing existing strings by extracting common terms and applying a unified glossary to improve the quality of existing translations.
By the end of this session, participants will be able to:
- Understand the challenges of localisation and how a RAG-based translation tool reduces the manual burden.
- Identify the limitations of using LLMs for string translation and explain how RAG addresses terminology consistency issues.
- Leverage the tool as a foundation for auditing and aligning existing translations with updated community glossaries.
