Dictionary Coding User Guide

Copyright © 2002 by SYSTRAN


1. Introduction

1.1. Types of entries

To adapt translations to specific terminological needs, it is possible to:

  • Reserve words that should not be translated: DNT entries (Do Not Translate)
  • Create multilingual entries to modify default translations by giving equivalents or adding new words.

1.1.1. DNT entries (Do Not Translate)

Do Not Translate (DNT) entries prevent the translation of company names, proper names, locations, trademarks, or any titles or expressions that should remain untranslated.

Enter each DNT entry in the dictionary as is, without a translation. DNT entries are case-sensitive: enter the word exactly as it should appear, paying attention to capital letters, accents, etc. Each entry must be on a separate line.

When a compound DNT entry has more than three words, put it between quotation marks and, where possible, add external clues such as (proper noun) or (location).
See the section: General coding rules.

Example 1. DNT entries

“George Bush” (proper noun) (masculine)
“Virgin Mega Store” (proper noun)
“Los Angeles” (city)
Apple (company name)
Lu (company name)
Alcampo (company name)
Telefónica (company name)
Hoechst (company name)

Example 2. English to French

« Virgin Mega Store is not a virgin mega store »

Dictionary                        | Translated text
“Virgin Mega Store” (proper noun) | Virgin Mega Store n’est pas une mémoire mega vierge.

1.1.2. Multilingual entries

You may also want to change the default translation to:

  • Give a technical equivalent for a general word,
  • Define the specific meaning of a word with multiple possible translations,
  • Add words that are not part of SYSTRAN’s standard dictionaries (Not Found Word).

For each entry, enter source=target for bilingual format (multi-target and Microsoft Excel formats are also available; refer to the Dictionary Manager documentation for details). Each entry must be on a separate line.

Example 3. Multilingual entries

a game = un gibier
to play = jouer
a product = un produit

Both simple and compound words can be entered as long as the whole entry can be treated as a single unit. This is akin to a traditional paper dictionary wherein the translation of compound words is given alongside the main entry.

Example 4. Simple and compound multilingual entries

a download store = une boutique en ligne
a drive shaft = un arbre d’entraînement
a watering can = un arrosoir
“all rights reserved” (sentence) = “tous droits réservés” (sentence)
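The bilingual source=target format, with optional quoted sequences and parenthesized clues, is regular enough to sketch as a small parser. The Python below is a hypothetical illustration written for this guide, not SYSTRAN code or the Dictionary Manager's actual import logic. The names `Side`, `split_entry`, `parse_side` and `parse_entry` are invented, and the sketch assumes that clues never contain nested parentheses. A line without `=` is treated as a DNT entry with no target.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# One parenthesized clue such as "(sentence)" or "(prep: de)".
# Assumption: clues never contain nested parentheses.
CLUE = re.compile(r"\(([^()]*)\)")


@dataclass
class Side:
    """One side of a dictionary entry: the term as written plus its clues."""
    term: str
    clues: list[str] = field(default_factory=list)


def split_entry(line: str) -> tuple[str, str | None]:
    """Split on the first '=' that is not inside quotation marks."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == "“":
            in_quotes = True
        elif ch == "”":
            in_quotes = False
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "=" and not in_quotes:
            return line[:i], line[i + 1:]
    return line, None  # no '=': treated as a DNT entry without a target


def parse_side(text: str) -> Side:
    """Extract the clues, then normalize the whitespace left in the term."""
    clues = [c.strip() for c in CLUE.findall(text)]
    term = " ".join(CLUE.sub(" ", text).split())
    return Side(term, clues)


def parse_entry(line: str) -> tuple[Side, Side | None]:
    source, target = split_entry(line)
    return parse_side(source), (parse_side(target) if target is not None else None)


if __name__ == "__main__":
    src, tgt = parse_entry("to protect (prep:from) = protéger (prep:de)")
    print(src)  # Side(term='to protect', clues=['prep:from'])
    print(tgt)  # Side(term='protéger', clues=['prep:de'])
```

Quoted terms keep their quotation marks in `term`, so a later step could still tell protected sequences apart from ordinary entries.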

The quality of the translation results depends greatly on the grammatical accuracy of the original document and on the proper use of the basic punctuation and typographic conventions.


1.2. Coding principles

1.2.1. Common sequences

As a general principle, enter the canonical form of your entry (the “simplest” form, as found in a paper dictionary, for both single and compound words). This way, the entry will be recognized in all its forms, inflected or not. Using a large linguistic thesaurus, the system recognizes the linguistic behaviour of your entry and adds the implicit information needed to generate all inflected and conjugated forms.

  • For nouns, enter the singular form (and, in languages with grammatical case, the nominative), not the plural. When an entry is coded as plural, the system only considers the plural inflection.
  • For adjectives, always enter the primary form (masculine singular). If an adjective is coded as feminine or plural, its other basic forms will not be recognized.
  • For verbs, enter the infinitive (in English, preceded by to), not a conjugated form. If a verb is not coded in its base form, its other forms will not be recognized.

When an inflected form is entered in the dictionary instead of the canonical form, the system will only translate this inflected form, not the other inflected forms.


1.2.2. Protected sequences

Protected sequences are words and phrases (fixed expressions) that do not undergo linguistic analysis but are accepted “as is” for the final translation. As a consequence, none of their individual items is inflected, and the sequence is translated exactly as entered in the dictionary. This is why the dictionary entry must keep the original formatting of the sequence: if it appears in capital letters in your document, enter it in capital letters in your dictionary.

Protected sequences must be entered between quotation marks, with their grammatical category specified in parentheses.
See the section: Advanced coding: Forcing the grammatical category.

Example 5. Protected sequences

“if desired” (sentence) = “si lo desea” (sentence)
“all rights reserved” (sentence) = “tous droits réservés” (sentence)
“OTAN” (noun) = “NATO” (noun)
“bi-parting” (adjective) = “à deux battants” (adjective)

The entries will remain invariable unless specified otherwise via additional clues or linguistic information.
See the section: Advanced coding: Forcing the number.

Quotation marks can be used for all or part of a fixed expression. They also make it possible to use special characters that would otherwise not be recognized.

Example 6. Protected sequences with special characters

“R&D” department (noun) = département “R&D” (noun)

Any sequence of fewer than two letters or of more than five words must be written between quotation marks.
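As a rough aid, the quoting rule above can be expressed as a check run on each term (without its clues). This is a hypothetical sketch, not part of the Dictionary Manager. It assumes that "letters" means alphabetic characters, that words are separated by whitespace, and that either typographic (“ ”) or straight (") quotation marks are used. It covers only this general rule, not the DNT guideline given earlier or the special-character case.

```python
from __future__ import annotations

OPENING = ("“", '"')  # assumption: typographic or straight quotation marks
CLOSING = ("”", '"')


def needs_quotes(sequence: str) -> bool:
    """Fewer than two letters, or more than five whitespace-separated words."""
    letters = sum(ch.isalpha() for ch in sequence)
    words = len(sequence.split())
    return letters < 2 or words > 5


def is_quoted(term: str) -> bool:
    return len(term) >= 2 and term.startswith(OPENING) and term.endswith(CLOSING)


def check_term(term: str) -> str | None:
    """Return a warning when a term falls under the rule but is not quoted."""
    term = term.strip()
    if not is_quoted(term) and needs_quotes(term):
        return f"{term!r} should be written between quotation marks"
    return None


if __name__ == "__main__":
    for term in ["a watering can", "X", "“X”", "one two three four five six"]:
        print(term, "->", check_term(term) or "ok")
```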


1.2.3. Upper-case

The use of capital letters in the dictionary follows the same guideline as the canonical form. This means that the entry must be written in its native case (in French and German, for example, proper nouns begin with an uppercase letter). Otherwise, the system will interpret the uppercase letters as an additional linguistic clue.

For example, if a word is written in capital letters in the original document to be translated, there is no need to enter it in capital letters in the dictionary (except with regard to protected sequences) since the original format is automatically detected and respected.

Example 7. English to French

« We offer Machine Translation. »

Coding level | Dictionary                                   | Translated text
Intuitive    | machine translation = traduction automatique | Nous proposons de la Traduction Automatique.

In fact, in most languages, upper-case letters are a clue for proper nouns and acronyms. It is therefore recommended to restrict their use in the dictionary to these cases.


1.2.4. Accentuation

The use of accented characters in the dictionary adheres to the same guidelines as for canonical form. This means that the entry must be correctly accented to be properly recognized and interpreted by the system.


1.3. Coding enhancement: linguistic clues

The SYSTRAN Dictionary Manager makes it possible to add specific linguistic information (“linguistic clues”) to dictionary entries. Linguistic clues can greatly improve the linguistic analysis and, consequently, the translation. There are two main levels of coding:

  1. Intuitive coding: adding user-friendly linguistic clues such as determiners or particles. This level does not require specific linguistic knowledge.
  2. Advanced coding: adding external information such as the grammatical category, the gender or the context of an entry. This level requires basic to advanced linguistic knowledge.

Note that the two levels are compatible and that they can be used in the same dictionary.


2. Intuitive coding

Intuitive coding is the practice of adding intuitive grammatical clues to an entry in order to provide more information on its nature.

Adding these simple intuitive clues (determiners, particles) will give the system valuable information about the kind of entry you are entering: whether it is a noun, a verb or an expression (sentence), masculine or feminine, singular or plural.

Example 8. Spanish to English dictionary with intuitive coding

una interfaz = an interface
ejecutar = to run
unos tipógrafos = a typeface

2.1. Forcing the grammatical category

When an entry is ambiguous, it is possible to force its grammatical category by adding a determiner (definite or indefinite article) next to it.

Example 9. English to French

« They run the run every week. »

Coding level | Dictionary               | Translated text
Intuitive    | to run = faire partie de | Ils font partie de la course chaque semaine.
             | a run = une course       |

2.2. Forcing the gender

When the SYSTRAN dictionaries contain only the masculine or only the feminine form of an entry (or assume one gender by default), or when an entry is ambiguous, it is possible to force its gender by adding a determiner next to it.

Example 10. English to French

« He left a check mark in the book. »

Coding level | Dictionary             | Translated text
General      | check mark = coche     | Il a laissé un coche dans le livre.
Intuitive    | check mark = une coche | Il a laissé une coche dans le livre.

2.3. Forcing the number

When a singular entry needs to be translated by a plural form, it is possible to force its number by adding the plural form in the dictionary.

Example 11. English to Spanish

« His business is prosperous. »

Coding level | Dictionary          | Translated text
General      | business = negocio  | Su negocio es próspero.
Intuitive    | business = negocios | Sus negocios son prósperos.

Here, the subject business will be translated into the Spanish plural form negocios and any dependent items will bear the plural inflection (verb, adjectives, determiners).


3. Advanced coding

The features that fall within advanced coding best demonstrate SYSTRAN’s capacity for customization. They allow a finer level of control over the translation, but require a good general understanding of linguistic phenomena, since they act on the inflection and behaviour of an entry.

Advanced coding is the practice of adding advanced linguistic information (semantic, syntactic, morphological, contextual) about the nature of an entry. These linguistic clues are always enclosed in parentheses.

Each language has its own set of linguistic clues.

Example 12. English to French dictionary

“Virgin Mega Store” (proper noun)
John (proper noun) (masculine)
Portugal = Portugal (country)
red = rouge (adjective)
check box = coche (feminine)
business = affaire (plural)

3.1. Morphology

3.1.1. Forcing the grammatical category

When the grammatical category of an entry is ambiguous, it is possible to specify it by adding the category in parentheses next to the entry:

  • For a verbal entry: (verb)
  • For an adjectival entry: (adjective)
  • For an adverbial entry: (sentence)
  • For a nominal entry: (noun)
  • For proper nouns: (proper noun)
  • For an acronym: (acronym)

3.1.2. Forcing the gender

If the SYSTRAN dictionaries contain only the masculine form of an entry (or assume it to be masculine), or if an entry is ambiguous, it is possible to force the feminine form by adding the gender of the entry in parentheses. Conversely, feminine entries can be forced to the masculine form.

This applies only to nouns and proper nouns: add (masculine) or (feminine).

Example 13. English to French

« He left a check mark in the book. »

Coding level | Dictionary                    | Translated text
General      | check mark = coche            | Il a laissé un coche dans le livre.
Advanced     | check mark = coche (feminine) | Il a laissé une coche dans le livre.

Here, the French word coche appears in the SYSTRAN French monolingual dictionary only as a masculine noun, and therefore does not correspond to the English noun check mark.

Adding the grammatical clue (feminine) to the user dictionary marks the entry as feminine regardless of the content of the monolingual dictionary, and the entry is inflected accordingly.


3.1.3. Forcing the number

When a singular entry needs to be translated by a plural form, it is possible to force its number by adding the corresponding grammatical clue in parentheses.

This applies only to nouns: add (singular) or (plural).

Example 14. English to Spanish

« His business is prosperous. »

Coding level | Dictionary                  | Translated text
General      | business = negocio          | Su negocio es próspero.
Advanced     | business = negocio (plural) | Sus negocios son prósperos.

Here, the subject business will be translated into the Spanish plural form negocios and any dependent items will bear the plural inflection (verb, adjectives, determiners).


3.1.4. Inflects like

Using this advanced coding feature, you can tell the system the correct inflection paradigm of an unknown entry by naming another word that belongs to the same grammatical category and inflects in the same way. The system then recognizes the inflection pattern and applies it to the entry. This works for all grammatical categories: add the clue (inflects like: XXX).

Example 15. French to English

« Il formate son fichier. »

Coding level | Dictionary                                    | Translated text
General      | formater = to format                          | He is formating his file.
Advanced     | formater = to format (inflects like: to quit) | He is formatting his file.
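To make the idea of borrowing an inflection pattern concrete, the toy Python sketch below describes a model word's inflected form as "drop N final letters, then append a suffix" and replays that change on the new word. This is only an illustration of inflection by analogy, not how SYSTRAN stores paradigms. The function names are hypothetical, and real paradigms cover many forms and irregularities that a single suffix rule cannot.

```python
def learn_rule(model_lemma: str, model_form: str) -> tuple[int, str]:
    """Describe model_form as: drop N final letters of model_lemma, then add a suffix."""
    n = 0
    limit = min(len(model_lemma), len(model_form))
    while n < limit and model_lemma[n] == model_form[n]:
        n += 1
    return len(model_lemma) - n, model_form[n:]


def apply_rule(lemma: str, rule: tuple[int, str]) -> str:
    drop, suffix = rule
    # Slice with an explicit end so that drop == 0 keeps the whole lemma.
    return lemma[: len(lemma) - drop] + suffix


def inflect_like(lemma: str, model_lemma: str, model_form: str) -> str:
    """Inflect `lemma` the way `model_lemma` becomes `model_form` (hypothetical helper)."""
    return apply_rule(lemma, learn_rule(model_lemma, model_form))


if __name__ == "__main__":
    print(inflect_like("format", "play", "playing"))   # formating
    print(inflect_like("format", "quit", "quitting"))  # formatting
```

With to play as the model, the sketch reproduces the General-level error (formating); with to quit, the doubled consonant carried by the model's suffix yields formatting.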

3.1.5. Plural form

This advanced coding feature, which applies only to nouns, allows users to force a particular plural form using (plural: XXX).

Besides providing a means of translating a singular source entry with a plural form, this feature lets users specify the plural form itself. The system then no longer uses the form found in SYSTRAN’s linguistic resources (inflection tables and monolingual dictionary), but the form given by the user.

The plural form feature is especially useful for words of Latin or Greek origin, whose plural the system does not always guess correctly.

Example 16. English to French

« He has written many interesting film scripts. »

Coding level | Dictionary                               | Translated text
General      | film script = scénario                   | Il a écrit beaucoup de scénarios intéressants.
Advanced     | film script = scénario (plural: scénari) | Il a écrit beaucoup de scénari intéressants.
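The precedence described above (a user-supplied plural wins over the system's own resources) can be sketched as follows. The default rule here is a deliberately crude stand-in for SYSTRAN's French inflection tables, and all names are hypothetical. The point is only that a (plural: XXX) clue overrides the computed form, while a bare (plural) clue, which forces number rather than form, does not.

```python
from __future__ import annotations

from collections.abc import Iterable


def default_plural_fr(noun: str) -> str:
    """Crude stand-in for the system's own French plural rules (hypothetical)."""
    if noun.endswith(("s", "x", "z")):
        return noun  # e.g. "prix" stays "prix"
    return noun + "s"


def plural_override(clues: Iterable[str]) -> str | None:
    """Return XXX from a "plural: XXX" clue; a bare "plural" clue is ignored."""
    for clue in clues:
        key, sep, value = clue.partition(":")
        if sep and key.strip() == "plural":
            return value.strip()
    return None


def pluralize(noun: str, clues: Iterable[str] = ()) -> str:
    override = plural_override(clues)
    return override if override is not None else default_plural_fr(noun)


if __name__ == "__main__":
    print(pluralize("scénario"))                       # scénarios
    print(pluralize("scénario", ["plural: scénari"]))  # scénari
```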

3.2. Syntax

3.2.1. Prepositions

Nouns, verbs, and adjectives can be linked to a preposition. Specify the preposition for both the source and the target entries using (prep: XXX).

If an entry does not take a preposition, add (no preposition).

Example 17. English to French

« He protects his car from the rain. »

Coding level | Dictionary                                  | Translated text
General      | to protect = protéger                       | Il protège sa voiture contre la pluie.
Advanced     | to protect (prep:from) = protéger (prep:de) | Il protège sa voiture de la pluie.

Example 18. French to English

« Le maire fait don de son terrain. »

Coding level | Dictionary                                      | Translated text
General      | faire don = to offer                            | The mayor offered of his ground.
Advanced     | faire don (prep:de) = to offer (no preposition) | The mayor offers his field.

3.2.2. Bracketing

Square brackets ([...]) are meta-characters that isolate a compound within a larger entry. This makes the relation between the elements of the compound explicit, thereby improving the translation. Brackets are especially useful when translating from English.

Example 19. English to Spanish

« The technical support hours are available on the web site. »

Coding level | Dictionary                                              | Translated text
General      | technical support hour = horario del servicio técnico   | Las horas técnicas de la ayuda están disponibles en el website.
Advanced     | technical support hour = horario del [servicio técnico] | Los horarios del servicio técnico están disponibles en el sitio web.
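Conceptually, brackets turn a flat sequence of words into a nested grouping. The hypothetical Python sketch below, which is not SYSTRAN's parser, shows the structure that the bracketed target entry of Example 19 expresses, and rejects unbalanced brackets.

```python
import re

# A token is "[", "]", or a run of characters that are neither spaces nor brackets.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def parse_brackets(text: str) -> list:
    """Return the words of `text` as a nested list; each [...] group is a sublist."""
    stack = [[]]  # stack[-1] is the group currently being filled
    for token in TOKEN.findall(text):
        if token == "[":
            group = []
            stack[-1].append(group)
            stack.append(group)
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return stack[0]


if __name__ == "__main__":
    print(parse_brackets("horario del [servicio técnico]"))
    # ['horario', 'del', ['servicio', 'técnico']]
```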

3.3. Semantics

3.3.1. Adding a semantic category

These categories will modify the preposition or determiner that introduces the entry in the translation.

The following semantic categories apply to proper nouns only.

  • (location)
  • (city)
  • (country)
  • (first name)
  • (last name)
  • (product name)
  • (company name)

The following semantic categories apply to both nouns and proper nouns.

  • (human)
  • (non human)

Example 20. English to French

« Portugal is a beautiful country. »

Coding level | Dictionary                    | Translated text
General      | Portugal = Portugal           | Portugal est un beau pays.
Advanced     | Portugal = Portugal (country) | Le Portugal est un beau pays.

3.4. Context

The translation of polysemous entries can be controlled by defining their semantic and/or syntactic context.

3.4.1. Semantic category

Each noun can be linked to one or more semantic categories by adding (semcat: XXXX), where XXXX is a user-defined category name written in uppercase letters.

For a semantic category to take effect, the dictionary must also contain syntactic context entries that refer to it.


3.4.2. Syntactic context

Each verb can be linked to specific syntactic contexts by adding (context: XXXX), where XXXX is either a noun or a semantic category that must already be defined in the dictionary.

Example 21. Semantic category creation and use. English to Spanish

« He saved three files. He saved the records. He saved many repertories. He saved money. He saved energy. »

Dictionary:
file (semcat: FILE) = fichero
record (semcat: FILE) = archivo
repertory (semcat: FILE) = repertorio
money (semcat: RESOURCE) = dinero
energy (semcat: RESOURCE) = energía
to save (context: FILE) = guardar
to save (context: RESOURCE) = ahorrar

Translated text:
Él guardó tres ficheros.
Él guardó los archivos.
Él guardó muchos repertorios.
Él ahorró el dinero.
Él ahorró la energía.

Example 22. Simple syntactic context. English to Italian

« My soul was saved. The files were saved. »

Dictionary:
to save (context: a soul) = liberare
to save (context: a file) = conservare

Translated text:
La mia anima è stata liberata.
Le lime sono state conservate.
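The lookup implied by Examples 21 and 22 can be sketched as a small table-driven function: a verb's context entries are keyed either by a noun or by a semantic category, and the object noun's categories are consulted when no noun-specific context matches. This is a hypothetical model of the behaviour described in this section, not SYSTRAN's implementation. It assumes the object noun has already been reduced to its canonical form (file rather than files), and the precedence of a noun context over a category context is likewise an assumption.

```python
from __future__ import annotations

# Example 21 as data: noun -> its semantic categories (semcat clues).
SEMCATS: dict[str, set[str]] = {
    "file": {"FILE"},
    "record": {"FILE"},
    "repertory": {"FILE"},
    "money": {"RESOURCE"},
    "energy": {"RESOURCE"},
}

# verb -> {context (a noun or a semcat name) -> target verb}.
CONTEXTS: dict[str, dict[str, str]] = {
    "to save": {"FILE": "guardar", "RESOURCE": "ahorrar"},
}


def translate_verb(verb: str, obj: str,
                   semcats: dict[str, set[str]] = SEMCATS,
                   contexts: dict[str, dict[str, str]] = CONTEXTS) -> str | None:
    """Pick a target verb from the head noun of its object, or None if no context matches."""
    options = contexts.get(verb, {})
    # Assumption: a context that names the noun itself wins over a category match.
    if obj in options:
        return options[obj]
    for category in sorted(semcats.get(obj, set())):
        if category in options:
            return options[category]
    return None  # caller falls back to the default translation


if __name__ == "__main__":
    for noun in ["file", "record", "money", "energy"]:
        print("to save +", noun, "->", translate_verb("to save", noun))
```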

3.5. Expert features

3.5.1. Noun form

By indicating, with (noun form: XXX), the noun derived from a verbal entry, you give the system the option of translating the verb with a nominal construction.

Example 23. English to French

« Using this tool is simple. »

Coding level | Dictionary                                 | Translated text
Advanced     | to use = utiliser (noun form: utilisation) | L’utilisation de cet outil est simple.
