
SALAMA - Swahili Language Manager



Automatic dictionary compilation

A recent addition to SALAMA is fully automatic, corpus-based compilation of dictionaries from Swahili to English. Any amount of text can be converted into a dictionary, with examples of use in context. The following features are fully supported:
  • lexical form of a word as a head-word
  • multi-word expressions as head-words, including idioms, nominal structures, adjectival and adverbial expressions, and even proverbs
  • full linguistic, etymological and domain-specific information for each head-word
  • glosses in English for each head-word
  • homonyms treated separately
  • frequency information for each head-word can be included
  • examples of use for each head-word, with user-defined length of context (the default is a sentence)
  • source information for each example can be included (e.g. bibliographical information including page number)
  • the maximum number of examples per head-word can be defined (currently between one and eleven)
  • a sophisticated method of identifying frequently occurring restricted contexts in sentence-length examples; these can be given a higher priority to ensure that they will be selected as representative examples of use (useful in selecting good examples among thousands of cases)
  • examples of use in context are placed immediately after the corresponding head-word, on separate lines (if needed, they can instead be appended to the same line as the head-word, which is useful for automatic sorting)
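
The following is a minimal Python sketch of how several of the features listed above (head-word frequency, source information, a cap on the number of examples, and priority for frequently occurring contexts) might fit together. It is not SALAMA's implementation. It assumes the text has already been analysed into (surface form, lexical form) pairs, and it approximates a "restricted context" as the head-word plus the lexical form that follows it; the function name `compile_entries` and the data layout are hypothetical.

```python
from collections import Counter, defaultdict


def compile_entries(analysed_sentences, max_examples=3):
    """Build dictionary entries from pre-analysed sentences.

    analysed_sentences: iterable of (sentence, tokens, source), where
    tokens is a list of (surface_form, lexical_form) pairs.
    The input format is an assumption made for this sketch.
    """
    freq = Counter()             # how often each head-word occurs
    context_freq = Counter()     # how often each (head-word, next word) pair occurs
    candidates = defaultdict(list)

    for sentence, tokens, source in analysed_sentences:
        lemmas = [lemma for _surface, lemma in tokens]
        for i, lemma in enumerate(lemmas):
            following = lemmas[i + 1] if i + 1 < len(lemmas) else None
            context = (lemma, following)  # assumed stand-in for a restricted context
            freq[lemma] += 1
            context_freq[context] += 1
            candidates[lemma].append((sentence, source, context))

    entries = {}
    for lemma in sorted(candidates):
        # Examples whose context is frequent come first; sorted() is stable,
        # so examples with equal priority keep their corpus order.
        ranked = sorted(candidates[lemma], key=lambda c: -context_freq[c[2]])
        chosen, seen = [], set()
        for sentence, source, _context in ranked:
            if len(chosen) >= max_examples:
                break
            if sentence not in seen:  # do not repeat the same sentence
                seen.add(sentence)
                chosen.append((sentence, source))
        entries[lemma] = {"frequency": freq[lemma], "examples": chosen}
    return entries
```

Each entry then holds the head-word's frequency and at most `max_examples` example sentences with their sources, which could be printed either on separate lines or on the same line as the head-word.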
    Application to other languages

    The system can currently be applied to the compilation of dictionaries between Swahili and any other language, provided that a conversion dictionary between English and the target language is available. Using an electronic conversion dictionary, most of the English glosses can be converted into the target language. Manual editing is needed to check and correct the result, because only part of the lexical data can be converted in this way.
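
The conversion step described above can be sketched as follows. This is an illustration under stated assumptions, not SALAMA's actual procedure: the conversion dictionary is assumed to be a plain mapping from an English gloss to a list of target-language equivalents, and `convert_glosses` is a hypothetical name. Glosses with no match are collected separately, reflecting the need for manual editing.

```python
def convert_glosses(entries, conversion):
    """Convert English glosses into a target language.

    entries: {head_word: [english_gloss, ...]}
    conversion: {english_gloss: [target_gloss, ...]}  (assumed format)
    Returns (converted, needs_review), where needs_review lists the
    (head_word, english_gloss) pairs that could not be converted.
    """
    converted = {}
    needs_review = []
    for head_word, glosses in entries.items():
        target = []
        for gloss in glosses:
            hits = conversion.get(gloss.lower())
            if hits:
                # Keep the order of the hits, skipping duplicates
                for t in hits:
                    if t not in target:
                        target.append(t)
            else:
                # No equivalent found: leave for manual editing
                needs_review.append((head_word, gloss))
        converted[head_word] = target
    return converted, needs_review
```

Even glosses that do convert would still need checking by hand, since a word-for-word lookup cannot choose the right sense of an ambiguous English gloss.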

    More general applicability

    The automatic dictionary compilation system presented here is not language-specific. It can be adapted to a dictionary compilation task between any two languages, if two conditions are fulfilled.


           