| HOW TO USE SYSTRAN
The following description assumes the reader is already familiar with the basics of machine translation. For more information on the basic concepts and principles referred to here, see Arnold et al. (1994), Hutchins & Somers (1992), or Vasconcellos et al. (1993). Thanks to SYSTRAN S.A. for kind permission to publish this description. |
||||||||
| OVERVIEW | ||||||||
| SYSTRAN, well known for its long history as a service provider to government and industry, has a flexible organization that allows it to keep pace with technological change and emerging ideas in computational linguistics.
Without giving up the benefit of the hundreds of person-years invested since 1968 in developing linguistic dictionaries and rules for its impressive range of language pairs, SYSTRAN has successfully evolved into a sophisticated transfer-based machine translation system. |
||||||||
| Multisource / multitarget approach | ||||||||
| Starting with the English - French system in 1974, SYSTRAN went on to become multitarget. Several target languages can be linked to a single analysis module because analysis deals exclusively with data from the source text; in other words, no information about the target language is processed during the analysis stage.
In 1987, the modularity, consistency, and economy of the analysis modules were improved by combining most of the previously separate analysis functions for the Romance languages into a single multisource analysis program. Development of a second series of separate analysis functions began for the Altaic languages (named after the Altai mountain region), Japanese and Korean. The fact that SYSTRAN is a multitarget system matters for modularity and for the maintenance of its many language pairs. For a given source language, the development work on dictionaries and linguistic rules needs to be done only once. It is then applied to the synthesis modules of all target languages through a transfer module for each language pair. When analysis of the source text is finished, the data, represented symbolically, are passed to the transfer module, which applies a series of rules that make translation into different target languages possible. This transfer component is what characterizes a transfer-based machine translation architecture. The final stage of the translation process is synthesis, which produces the text in the target language(s). In theory, the number of target languages is unlimited. The three components, Analysis, Transfer, and Synthesis, are described in more detail below. In all, SYSTRAN has developed 11 source-language modules, giving a total of 29 language pairs. |
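The division of labor described above (one source analysis reused by every target language, then a transfer step per language pair, then synthesis per target) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Analysis` class, the function names, and the toy word tables are hypothetical and do not reflect SYSTRAN's actual data structures or rules.

```python
from dataclasses import dataclass, field


@dataclass
class Analysis:
    """Hypothetical symbolic result of analyzing one source sentence."""
    source_lang: str
    tokens: list
    features: dict = field(default_factory=dict)


def analyze(sentence, source_lang):
    # Source analysis only: no target-language information is consulted.
    return Analysis(source_lang, sentence.rstrip(".").lower().split())


# One transfer table per language pair (toy word lists, not SYSTRAN data).
TRANSFER = {
    ("en", "fr"): {"the": "le", "cat": "chat", "sleeps": "dort"},
    ("en", "de"): {"the": "die", "cat": "Katze", "sleeps": "schläft"},
}


def transfer(analysis, target_lang):
    # Pair-specific step: map analyzed source tokens to target words.
    table = TRANSFER[(analysis.source_lang, target_lang)]
    return [table.get(token, token) for token in analysis.tokens]


def synthesize(words, target_lang):
    # Target-side step: here reduced to capitalization and punctuation.
    text = " ".join(words)
    return text[:1].upper() + text[1:] + "."


def translate_multitarget(sentence, source_lang, target_langs):
    analysis = analyze(sentence, source_lang)  # performed once
    return {
        lang: synthesize(transfer(analysis, lang), lang)
        for lang in target_langs
    }


print(translate_multitarget("The cat sleeps.", "en", ["fr", "de"]))
```

The point of the sketch is only that `analyze` runs once however many target languages are requested, which is why work on a source language benefits all of its language pairs.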
||||||||
| Sources of knowledge | ||||||||
| The basic task of any machine translation system is to gather and use knowledge. SYSTRAN has two sources of knowledge: its electronic dictionaries and the linguistic rules that interact with them. The dictionaries contain information about how each specific word behaves, while the linguistic rules concern the syntax of a language or sublanguage and its semantic relations. The dictionaries contain many bottom-up rules on the specific requirements and preferences of words, while the linguistic rules work top-down to establish syntactic and semantic links.
The large body of linguistic knowledge held in the dictionaries and rule bases has kept SYSTRAN at the forefront of the machine translation industry for more than a quarter of a century. |
||||||||
| A language-independent design | ||||||||
| SYSTRAN's multitarget approach is easy to extend, and new language pairs can be developed quickly, because the basic characteristics shared by every SYSTRAN linguistic module are independent of the language being processed: the architecture, the dictionary coding, the symbolic representation system, and the "recipes" for analysis, transfer, and synthesis. This consistency gives the SYSTRAN engine its power and efficiency in extracting linguistic data. | ||||||||
| Reliability | ||||||||
| SYSTRAN staff have always been committed to developing robust systems able to process large quantities of general text that was not written with machine translation in mind. When SYSTRAN is used for this kind of work, the source text may be badly structured, idiosyncratic (non-standard), or even corrupted in some way.
When analysis fails, SYSTRAN's local bottom-up rules still make it possible to produce a translation. Even the elements and phrases of an incomplete sentence can be analyzed and synthesized successfully. When a word in the source text cannot be found in the STEM dictionary (the dictionary of simple words), the first step is to look for an alternative spelling, including accent variants. If the word is still not found, the system tries to determine its function from its morphology and immediate context. SYSTRAN also has an error-detection program that can be activated at the end of the analysis module. When analysis fails, a flag signals the transfer module so that the translation process does not continue, which would only compound the problem. |
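The fallback sequence for words missing from the STEM dictionary (exact lookup, then spelling and accent variants, then a guess from morphology and immediate context) might look roughly like the following Python sketch. The dictionary contents, the suffix heuristics, and all function names are assumptions chosen for illustration; the source does not describe the actual rules.

```python
import unicodedata

# Toy stand-in for the STEM dictionary (hypothetical entries).
STEM = {"cafe": "noun", "naive": "adjective", "walk": "verb"}


def strip_accents(word):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def guess_from_morphology(word, previous=None):
    # Crude illustrative heuristics, not SYSTRAN's actual rules.
    if word.endswith("ly"):
        return "adverb"
    if word.endswith(("tion", "ness")):
        return "noun"
    if previous in {"the", "a", "an"}:
        return "noun"
    return "unknown"


def lookup(word, previous=None):
    """Return (part_of_speech, how_it_was_found) for one word."""
    key = word.lower()
    if key in STEM:
        return STEM[key], "exact"
    variant = strip_accents(key)
    if variant in STEM:
        return STEM[variant], "spelling variant"
    return guess_from_morphology(key, previous), "guessed"


print(lookup("café"))
print(lookup("quickly"))
```

Each step is tried only when the previous one fails, so a well-formed dictionary word never reaches the guessing heuristics.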
||||||||
| Document type | ||||||||
| While SYSTRAN's priority is to process general-purpose text, some types of text have their own lexical usage and grammatical conventions. This means that a single set of linguistic rules cannot suit every kind of document.
The document type must be specified by the user at run time, either for a whole text or for excerpts. The following document types are available: resume, business correspondence, journalism, certificate, user handbook, meeting reports, word lists, prose, spoken and colloquial language. This option is implemented through switches at various points in the analysis, which locally adapt the rules to the characteristics of each type of text. Certain stylistic choices are then made during the analysis.
|
||||||||
| ELECTRONIC DICTIONARIES | ||||||||
| SYSTRAN's dictionaries, rich and fully coded, are essential to producing a quality translation. For each source language there are two dictionaries: the dictionary of simple words and the dictionary of expressions.
Most source-language dictionaries are multitarget. As noted above, the 11 source-language modules combine to give 29 language pairs. In all, the dictionaries contain more than 2.3 million fully coded words and expressions.
Differences between subject fields are handled by codes included in the source-language dictionaries. These codes are paired with the different possible translations in the target language, according to the field. The system selects a particular translation according to the specialized dictionary chosen by the user at translation time. Up to four specialized dictionaries can be selected, ranked in order of preference. |
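Selecting a translation from up to four specialized dictionaries in preference order can be sketched as follows. The entry layout, the field names, and the fallback to a "general" translation when no selected field matches are assumptions made for this example, not documented SYSTRAN behavior.

```python
# word -> {field code: translation}; toy English-to-French entries.
ENTRIES = {
    "chip": {"general": "éclat", "electronics": "puce", "food": "frite"},
    "plant": {"general": "plante", "industry": "usine"},
}

MAX_SPECIALIZED = 4  # the text states that up to four can be selected


def select_translation(word, preferred_fields):
    """Pick the translation for the first preferred field that has one."""
    if len(preferred_fields) > MAX_SPECIALIZED:
        raise ValueError("at most four specialized dictionaries")
    candidates = ENTRIES[word]
    for field_code in preferred_fields:  # ordered by user preference
        if field_code in candidates:
            return candidates[field_code]
    return candidates["general"]  # assumed fallback


print(select_translation("chip", ["food", "electronics"]))
print(select_translation("plant", ["electronics"]))
```

Because the list is scanned in order, swapping the two fields in the first call would select the electronics translation instead.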
||||||||
| Dictionary of simple words | ||||||||
| This dictionary contains simple words (terminology and stems). Each word carries extensive coded information on its morphology, its syntactic behavior, its possible functions in the case of homographs, its semantic roles and attributes, and its relations to other concepts, based on a semantic taxonomy of 500 categories.
Target-language translations are provided for all the languages the developer chooses to include. The codes assigned to them indicate part of speech, morphology, syntactic behavior, and prepositional government.
For each polysemous word (a homograph within the same part of speech), several translations can be offered depending on the subject field or the usage of the word (e.g., animate / inanimate usage for nouns, transitive / intransitive or reflexive / non-reflexive for verbs). |
||||||||
| Dictionary of expressions | ||||||||
| The dictionary of expressions can hold several types of entries, classified below in order of complexity.
The phrase substitution expression allows fixed phrases and prepositional or adverbial locutions to be merged into a single pseudo-stem, which is then coded in the stem dictionary as a single word and analyzed as a single token.
The collocation assigns a single meaning to an expression whose elements are analyzed and fixed; it is common for technical noun phrases.
The conditional expression states the conditions under which one or more words require a specific target translation. These conditions can be any syntactic criterion (including syntactic features), any semantic attribute, or any semantic relation defined by SYSTRAN. The rules can be more or less complex. They come into play at transfer time to select the translation, and they can improve other operations (syntactic rearrangement, handling of prepositions, determiners, tense) so as to best meet the requirements of the target language.
The syntactic directive expression applies specific rules to a word during analysis. It is used in particular to resolve ambiguities due to polysemy or to syntactic usage. Any information in the source-language dictionary can be modified, even semantic features, and rules and information can likewise be added. A syntactic directive expression can apply at any point in the analysis.
The homograph expression resolves homograph ambiguities and assigns a word the part of speech that fits it.
For more detail on the structure of SYSTRAN dictionaries and the types of expressions, see Wheeler (1983, 1987).
|
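As an illustration of the conditional expression type, the sketch below selects a target translation by testing conditions against the information gathered about a word's context. The rule format, the context keys (`object_semantics`, `object_type`), and the English-to-French example are all hypothetical; they only show the general idea of condition-driven selection at transfer time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConditionalExpression:
    """Hypothetical rule: if the condition holds, use this translation."""
    word: str
    condition: Callable[[dict], bool]
    translation: str


RULES = [
    # "know" + human object -> "connaître"; "know" + clause -> "savoir".
    ConditionalExpression(
        "know", lambda ctx: ctx.get("object_semantics") == "HUMAN", "connaître"
    ),
    ConditionalExpression(
        "know", lambda ctx: ctx.get("object_type") == "clause", "savoir"
    ),
]

DEFAULTS = {"know": "savoir"}  # used when no condition matches


def choose_translation(word, context):
    for rule in RULES:
        if rule.word == word and rule.condition(context):
            return rule.translation
    return DEFAULTS[word]


print(choose_translation("know", {"object_semantics": "HUMAN"}))
print(choose_translation("know", {"object_type": "clause"}))
```

Rules are tried in order, so more specific conditions would be listed before more general ones in a scheme like this.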
||||||||
| DIFFERENT STEPS | ||||||||
| Control software: input processing (preprocessing) | ||||||||
| SYSTRAN's input module has filters for a wide range of word-processing and desktop publishing (DTP) formats. Layout codes are separated from the text before it is passed to translation and are stored so that they can be reattached later.
Preprocessing converts the input text into a SYSTRAN-compatible format. The input text is processed sentence by sentence. The dictionary lookup subprogram performs a morphological analysis and identifies capital letters, punctuation, and word boundaries. When lookup is complete, the input module assigns a part of speech to each word not found in the dictionary, on the basis of its morphology and immediate context. |
||||||||
| Analysis element | ||||||||
| The analysis module works through the sentence methodically, identifying step by step the function and correct sense of each word, phrase, and clause by means of a series of analysis programs.
Each program makes choices or draws conclusions about a specific type of syntactic or semantic phenomenon, for example disambiguation and basic syntactic relations, preposition handling, semantic links, clause types, or coordinate structures (see Wheeler, 1987, for a more detailed description). Each program adds new data to the information accumulated about the sentence. Once acquired, this knowledge is stored in symbolic form in an analysis area. It is important to note that the information accumulated throughout analysis relates to the source language. One of the roles of the analysis component is to record information about the subject and predicate of the sentence for later reference. In addition to the syntactic analysis, the following semantic relations are identified: predicate - agent, predicate - subject, modifier - head word. These relations supplement the syntactic information by linking the elements to one another. At the same time, SYSTRAN's semantic taxonomy (500 categories) provides information about the characteristics and relations of things, states, actions, and qualities, which helps determine the behavior of words or of the objects they represent. The semantic categories, represented by "tags" (labels), can be assigned to words or expressions either in the dictionary of simple words or by a general linguistic rule. The taxonomy is organized around 6 hierarchical trees. In general, lower branches inherit all the properties of higher branches, but inheritance can be blocked where necessary. |
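The inheritance behavior described for the semantic taxonomy (lower branches inherit the properties of higher ones unless inheritance is blocked) can be modeled with a small Python sketch. The `Category` class, the category names, and the properties are invented for illustration and are not taken from SYSTRAN's actual 500-category taxonomy.

```python
class Category:
    """Hypothetical node in one hierarchical tree of a semantic taxonomy."""

    def __init__(self, name, parent=None, properties=None, blocked=()):
        self.name = name
        self.parent = parent
        self.properties = dict(properties or {})
        self.blocked = set(blocked)  # inherited properties to stop here

    def resolve(self):
        # Start from the parent's resolved properties (a fresh dict).
        inherited = self.parent.resolve() if self.parent else {}
        # Stop the transmission of any blocked properties.
        for key in self.blocked:
            inherited.pop(key, None)
        # Properties declared locally override inherited ones.
        inherited.update(self.properties)
        return inherited


thing = Category("THING", properties={"concrete": True})
animate = Category("ANIMATE", thing, {"can_move": True})
bird = Category("BIRD", animate, {"can_fly": True})
penguin = Category("PENGUIN", bird, blocked={"can_fly"})

print(bird.resolve())
print(penguin.resolve())
```

Blocking at `PENGUIN` also affects any category placed below it, since descendants resolve their properties through their parent.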
||||||||
| Transfer element | ||||||||
| One of the most important functions of the transfer module is to handle grammatical differences between languages. SYSTRAN can also modify or restructure clauses and phrases to meet the syntactic requirements of the target language.
The second function of the transfer module is to select the appropriate target translation. The linguistic rules and the dictionary expressions are used extensively here. The many syntactic and semantic relations established up to this point allow a wide range of tests to be applied. Another important aid to disambiguation is the wealth of semantic information attached to the words in the surrounding context. In addition, during transfer, supplementary lexical rules can be applied to word classes to adjust tense, aspect, number, voice, or any other grammatical feature. These adjustments are essential because they are what ensures a target translation that is both grammatically correct and idiomatic. |
||||||||
| Synthesis element | ||||||||
| Because a transfer approach is used, the final translation tends to stay close to the syntax of the source language. The synthesis module makes the grammatical choices in the target language (gender, number, tense, aspect, ...) on the basis of the information derived from analysis and the syntactic requirements of the target language.
Finally, a large number of rules and tables specific to the target language are applied. The module can also insert or delete determiners, including definite and indefinite articles, and any other particles. |
||||||||
| Control software: output processing (postprocessing) | ||||||||
| At the end of the sequence, control routines retrieve the layout codes that were separated and stored at the beginning and reattach them to the translation. They then output the target text sentence by sentence. Postprocessing converts the SYSTRAN-compatible format back into ordinary text. | ||||||||
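The round trip of layout codes (separated before translation, stored, then reattached afterwards) can be sketched in Python as follows. The sketch assumes, purely for illustration, that layout codes look like angle-bracket tags and that text is translated segment by segment between codes; real word-processing and DTP formats, and SYSTRAN's sentence-based processing, are more involved.

```python
import re

# Assumed layout-code syntax for this example: anything in angle brackets.
LAYOUT_CODE = re.compile(r"(<[^>]+>)")


def separate_layout(text):
    """Split text into translatable segments and stored layout codes."""
    parts = LAYOUT_CODE.split(text)
    segments = parts[0::2]  # even positions: plain text
    codes = parts[1::2]     # odd positions: layout codes
    return segments, codes


def reattach_layout(segments, codes):
    """Interleave (translated) segments with the stored layout codes."""
    pieces = []
    for index, segment in enumerate(segments):
        pieces.append(segment)
        if index < len(codes):
            pieces.append(codes[index])
    return "".join(pieces)


def fake_translate(segment):
    # Placeholder for the translation engine: upper-cases the text.
    return segment.upper()


segments, codes = separate_layout("<b>Hello</b> world")
translated = [fake_translate(s) for s in segments]
print(reattach_layout(translated, codes))
```

Because the codes never pass through the translation step, they come back unchanged and in their original positions relative to the text segments.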
||||||||
| REFERENCES | ||||||||
| Arnold, D.; Balkan, L.; Humphreys, R. Lee; Meijer, S.; Sadler, L. (1994) Machine Translation: An Introductory Guide. Manchester and Oxford: NCC Blackwell.
Hutchins, W. John, & Somers, Harold L. (1992) An Introduction to Machine Translation. London, New York, etc.: Academic Press.
Vasconcellos, M.; Hovy, E.; Scott, B.E.; Miller, L.C. (1993) "Machine Translation: State of the Art." Byte, January, pp. 153-186.
Wheeler, Peter J. (1983) "The Errant Avocado." Newsletter of the British Computer Society, Natural Language Translations Specialist Group, 13.
Wheeler, Peter J. (1987) "SYSTRAN." In King, Margaret, ed., Machine Translation Today: The State of the Art. Proceedings of the Third Lugano Tutorial (Lugano, 2-7 April 1984). Edinburgh: Edinburgh University Press. Information Technology Series 2. Pp. 192-208.
Copyright © SYSTRAN S.A. |
||||||||