Streamlining Drupal Localisation: Reducing Effort While Improving Consistency with RAG + LLM

Session Room
Room 1 (Amphitheater Pantheon)
Time Slot
Duration
40 min
Speaker(s)
Session track
Coding & Site Building
Experience level
Intermediate

Localising Drupal remains an arduous manual task. While LLMs offer high-quality translations, they often struggle with consistency without domain-specific context. In this talk, I will explain a RAG-based translation pipeline that supplements the LLM with relevant terminology and translation memory, ensuring more accurate and consistent results. I will also touch on how this system can be a basis for improving existing translations.

Prerequisite

Participants should be familiar with Drupal's localisation system (.po files) and the general concept of Large Language Models (LLMs). Basic knowledge of Docker and API-driven workflows will help in understanding the system architecture, but not required.

Outline

Outline

This session introduces a Proof of Concept for semi-automating Drupal string translations using Retrieval-Augmented Generation (RAG) and LLMs, and explains how it can also improve the existing translations.

Manual Translation Fatigue: Translating new strings for Drupal core and modules is a tedious, ongoing process that often leads to community burnout. This project originated from community discussions seeking sustainable and scalable solutions to reduce the burden of manual localisation.

System Architecture and Components: The architecture relies on a vector DB to store and retrieve Drupal-specific context. While gpt-po-translator is used for robust .po file logic, its lack of native glossary support necessitated the creation of a custom RAG Proxy. This middleware intercepts LLM requests to inject relevant translation memory and glossary terms dynamically.

Ongoing Consistency and Maintenance: As localisation is an ongoing process, consistency can diminish over time. This system provides a foundation for reviewing existing strings by extracting common terms and applying a unified glossary to improve the quality of existing translations.

Learning Objectives

By the end of this session, participants will be able to:

  • Understand the challenges of localisation and how a RAG-based translation tool reduces the manual burden.
  • Identify the limitations of using LLMs for string translation and explain how RAG addresses terminology consistency issues.
  • Leverage the tool as a foundation for auditing and aligning existing translations with updated community glossaries.

Educational Track - Drupal in a Day Sponsors

Social Night Sponsors

In-Kind Sponsors

Media Partner Sponsors