FAQs
Why doesn't your project use a neural system like Google Translate and all the other machine translation projects these days?
While most machine translation (MT) projects use neural methods, which work well with large parallel corpora, we developed a rule-based system for two key reasons:
No Training Corpora Required: Unlike neural systems that rely on large training datasets, our system doesn't need them. This is crucial for languages with limited or no written literature, where neural methods are unusable.
Basis for Publishable Grammars: Our rule-based system generates grammar rules that accurately describe a language, providing a strong foundation for publishable grammars, unlike neural systems which produce data, not grammatical descriptions. This is especially valuable for language documentation efforts.
What does your system use for source texts? Are you using the Greek and Hebrew texts?
Initially, we used the Friberg Annotated Greek New Testament, the Louw-Nida Greek Lexicon, an annotated Hebrew Old Testament, and the Brown-Driver-Briggs Hebrew Lexicon. However, like all natural languages, Hebrew and Greek lack certain features (e.g., multiple past/future tenses, inclusive/exclusive "we," singular/dual/trial/plural distinctions, proximity degrees). We constantly enriched the annotations in these texts to compensate. Furthermore, complex lexicalized concepts and long, multi-verse sentences in Greek presented challenges, especially for new readers. Instead of maintaining separate transfer grammars for Hebrew, Greek, and English source texts (needed for Bible stories, health information, etc.), we transitioned to a unified semantic representation system. This allows us to create semantic representations for various texts, requiring only *one* transfer grammar per target language. While building these representations is time-intensive, it enables high-quality translations for numerous languages once a book is analyzed.
How does your system translate the Bible? Is it like Google Translate?
Our system is fundamentally different from neural machine translation systems like Google Translate. Because these systems rely on bilingual corpora, they are unsuitable for the many languages needing Bible translations that lack such resources. Our approach is linguistic, consisting of two core components:
Semantic Representations: We meticulously analyze biblical texts and other Christian literature, capturing their meaning in simple words and structures. Consulting various resources, we create a simplified version of each book, noting alternate interpretations where necessary. Crucially, we add detailed linguistic information (e.g., noun Number: Singular, Dual, Trial, Quadrial, Plural, Paucal) to every word, phrase, and clause. This "semantic representation" serves as the source for translation. Though time-consuming, this analysis is done only once per book, enabling translation into numerous languages.
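The idea of a semantic representation can be pictured as structured data that attaches explicit features to every word — including distinctions the source language itself does not mark. The sketch below is purely illustrative: the class, field names, and feature values are hypothetical, not the system's actual format.

```python
# Illustrative sketch of semantic-representation entries (hypothetical format).
# Each word carries explicit features that many source languages leave
# implicit, such as clusivity or number distinctions beyond singular/plural.

from dataclasses import dataclass, field

@dataclass
class SemanticWord:
    concept: str                  # language-neutral sense label
    part_of_speech: str           # "noun", "verb", "pronoun", ...
    features: dict = field(default_factory=dict)

# "We went" annotated with features English itself does not mark:
clause = [
    SemanticWord("we", "pronoun",
                 {"Number": "Dual", "Clusivity": "Exclusive"}),
    SemanticWord("go", "verb",
                 {"Tense": "RemotePast", "Aspect": "Perfective"}),
]

for w in clause:
    print(w.concept, w.features)
```

Because every word is annotated this explicitly, a target grammar that distinguishes, say, dual exclusive "we" has the information it needs without guessing.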
Linguistically Based Natural Language Generator: This program uses target language lexicons and grammars created by linguists and missionaries. Our software executes these rules, handling affixation, constituent ordering, pronoun usage, collocation correction, relativization strategies, and more. Applying this linguistic knowledge to the semantic representations, the software produces initial draft translations that are understandable, grammatically correct, and faithful to the original meaning. These drafts are then refined by native speakers for naturalness and cultural relevance, resulting in high-quality translations in a fraction of the time required for manual translation.
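In outline, generation applies a sequence of such rules to each semantic entry: look up a target-language stem, apply affixation, then order the constituents. The toy example below shows only the shape of the idea — the lexicon entries, the "na-" prefix, and the rule set are invented for illustration and are not drawn from any real target-language grammar in the system.

```python
# Toy rule-based generator: each rule is a function applied in sequence.
# The lexicon stems and the past-tense prefix below are invented
# illustrations, not rules from an actual grammar.

def lookup(concept, lexicon):
    """Map a semantic concept to a target-language stem."""
    return lexicon[concept]

def affix_past(word, features):
    """Attach a (hypothetical) past-tense prefix to verbs."""
    if features.get("Tense") == "Past":
        return "na-" + word
    return word

def order_vso(subject, verb, obj):
    """Reorder constituents for a verb-initial target language."""
    return " ".join([verb, subject, obj])

lexicon = {"see": "kita", "man": "lalaki", "dog": "aso"}  # illustrative stems

verb = affix_past(lookup("see", lexicon), {"Tense": "Past"})
sentence = order_vso(lookup("man", lexicon), verb, lookup("dog", lexicon))
print(sentence)  # → "na-kita lalaki aso"
```

A real grammar contains many such rules in a carefully sequenced order, which is why a software-development background helps when building one.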
Is your system able to translate the entire Bible?
Our software translates any content for which we have a semantic representation, including any book of the Bible. Ultimately, we would like to create semantic representations for the complete Bible, commentaries, Bible study materials, and classic Christian literature, enabling rapid translation of all these resources into new languages.
How does your system handle poetry?
Translators prioritize conveying the meaning of the source text over its form. Poetry, as a form, presents a significant challenge, as replicating rhyme and rhythm across languages is nearly impossible, even for human translators. Hebrew poetry, unlike many other forms, relies primarily on patterns of stress and meaning rather than rhyme or rhythm. Consequently, whether a translation is human- or software-generated, it likely won't replicate the original stress patterns. This is a common issue in meaning-focused translations, including scholarly English versions.
Fortunately, Hebrew poetry's strength lies in its semantics. As Derek Kidner notes, it "loses less than perhaps any other in the process of translation" and "survives transplanting into almost any soil." These semantic patterns, known as parallelisms, connect related thoughts through echoing, building, contrasting, or detailing. ATW translations aim to preserve these parallelisms wherever possible.
Metaphor, another key feature of Hebrew poetry, is also addressed. We preserve metaphors in our semantic representations unless they are culturally specific and unlikely to be understood. In such cases, we might use a simile instead or provide both metaphor and simile options for the translator to choose from. For example, Psalm 23:1 is represented as both "Yahweh is my shepherd" (metaphor) and "Yahweh is like a shepherd. He cares for me" (simile).
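One way to store such alternates is to carry both renderings in the representation and let the translator choose. The structure below is a hypothetical sketch, using the Psalm 23:1 example above; the field names are invented for illustration.

```python
# Hypothetical structure for a passage that offers the translator a choice
# between a metaphor and an equivalent simile (Psalm 23:1).

psalm_23_1 = {
    "reference": "Psalm 23:1",
    "alternates": [
        {"figure": "metaphor", "text": "Yahweh is my shepherd."},
        {"figure": "simile",
         "text": "Yahweh is like a shepherd. He cares for me."},
    ],
}

def choose(passage, figure):
    """Return the rendering with the requested figure of speech."""
    for alt in passage["alternates"]:
        if alt["figure"] == figure:
            return alt["text"]
    raise KeyError(figure)

print(choose(psalm_23_1, "simile"))
```

A translator working in a culture without shepherding would select the simile; one whose audience knows shepherds would keep the metaphor.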
How long does it generally take a person to start producing translations using your system, and what skills are necessary?
The time required varies depending on several factors, but we can illustrate the process using Tod Allman's experiences with English, Tagalog, and Ayta Mag-Indi.
First, users must familiarize themselves with the software and semantic representation system (approximately 40-50 hours, with video tutorials available). A background in software development is helpful, as building a language grammar involves creating sequenced rules.
Starting a new language project involves working through a series of ~300 simple sentences ("Grammar Introduction") illustrating various grammatical features. This process took Tod ~40 hours in English (with prior software knowledge). For those unfamiliar with the software, it will take longer. Working with a native speaker significantly speeds up the process. For Tagalog, the Grammar Introduction required ~30 two-hour meetings with a native speaker, followed by 2-3 hours of implementation per meeting, totaling ~150 hours.
Next, a simple story (e.g., about preventing eye infections) is translated. This took ~50 hours for Tagalog (~10 two-hour meetings plus implementation). Because Ayta Mag-Indi is similar to Tagalog, the Grammar Introduction was skipped, and the story translation took only ~30 hours (~7 two-hour meetings).
After this, users are ready to tackle a biblical book, starting with Ruth. The Tagalog translation required ~50 two-hour meetings (~200 hours total). The translation process accelerates significantly after Ruth, as the grammar becomes more complete. Later books like Luke, Esther, Daniel, and Genesis can be translated at a much faster pace (several chapters per two-hour meeting).
Ideally, with semantic representations for the entire Bible, a complete initial draft translation could be generated automatically in a couple of days, followed by native speaker editing for a publishable version—a fraction of the time required for manual translation.