Universal Language Dictionary

filename: introduc.txt
 version: 1995.09.01
The current version of this subject has been found at:
http://www.invisiblelighthouse.com/uld/ - July'02

The following is from an earlier ULD site that closed with no forwarding address. The earlier project included a program for preparing parallel word lists of any two languages, which I don't see on the new site, and I find the new all-languages-in-one-listing approach more difficult for those interested only in English. -- jlb 10Aug97
Copyright 1992-1995 by Richard K. Harrison. All rights reserved.

Permission is hereby granted for unrestricted use of these files by any individual for his/her own pleasure, private research, personal communication, etc. Use of these files by any government agency, business entity, educational institution, or any other organization requires permission.

acknowledgement
---------------

This project is entirely the result of volunteer effort. Special thanks to those who took the time to type various vocabularies into their computers and e-mail them to me. (Their names are mentioned at the beginning of each *.DIC file.)

introduction
------------

The Universal Language Dictionary is an attempt to create a list of concepts, described in English, along with words to express those concepts in several "natural" and "artificial" (constructed) languages. This vocabulary of 1600 terms, along with some knowledge of a language's grammar, enables one to engage in elementary conversation and correspondence on a wide variety of topics.

And, of course, it is quite interesting to see the equivalent words listed side-by-side for comparison. Furthermore, if you are creating an _a_posteriori_ planned language, these wordlists can be extremely helpful.

The list of terms can also be used as a general guide for those who are in the process of designing artificial languages. A vocabulary which cannot express most of these concepts is not ready to cope with the communicative challenges of the modern world.

In their current condition, the wordlists are not really sufficient for use in automated translation, because they do not contain much information about irregular verbs, noun gender, inflected forms of words and so forth.

The list does not include a complete array of function words, for the following reason.
Pronouns, structural particles, articles, derivational affixes and similar morphemes are difficult to translate from one language to another, and every language has a different array of them, depending on its characteristics.

bibliography
------------

The choice of items was not made arbitrarily, but was based upon an examination of the lists mentioned below. Of course, a few of the items might seem idiosyncratic, and there are always difficulties involved in trying to translate a list of terms from one language to another; sometimes exact equivalents are not available. These flaws are inevitable in this kind of project.

The selection of vocabulary items was influenced by: Basic English wordlist; Esperanto baza radikaro; Loglan predicate list; Lojban predicate list; VOA Special English wordlist; Concise Dictionary of 26 Languages; Roget's Thesaurus; New Horizon Ladder Dictionary of the English Language; Minimum Vocabularies of Written Chinese; Jo^yo^ kanji list.

arrangement
-----------

The items are grouped into 38 topic-categories, which should make it easier to find a desired item without an index. However, each item has a three-character hexadecimal serial number, which makes possible the automated generation of an index for each language included. The categories are:

 1. adpositions (001-020)
 2. function words (021-03E)
 3. people (03F-05C)
 4. titles (05D-060)
 5. groupings of people (061-069)
 6. body parts and substances (06A-0BE)
 7. body terms (0BF-0D1)
 8. bodily actions (0D2-0F9)
 9. animal species and types (0FA-125)
10. plant species and types (126-15F)
11. natural world (160-188)
12. tools and implements (189-219)
13. clothing (21A-22D)
14. buildings and institutions (22E-24D)
15. government and hierarchy (24E-270)
16. business and transactions (271-2A2)
17. religion and the supernatural (2A3-2B1)
18. mind and emotion (2B2-31F)
19. communication (320-36E)
20. games (36F-380)
21. identity (381-389)
22. numerals (38A-398)
23. quantity (399-3BD)
24. degree (3BE-3CA)
25. dimension, direction (3CB-40E)
26. motion (40F-439)
27. vehicles, etc. (43A-44A)
28. time and sequence (44B-4A0)
29. substances (4A1-4D9)
30. foodstuffs (4DA-4F0)
31. forms of matter (4F1-536)
32. qualities of matter (537-557)
33. matter-related actions (558-58B)
34. misc. matter/energy terms (58C-5A2)
35. light (5A3-5BC)
36. sound (5BD-5C4)
37. heat (5C5-5CC)
38. assorted abstract concepts (5CD-640)

how to assemble the dictionary and add more languages
-----------------------------------------------------

Each language's vocabulary is kept in a separate file. These files are strictly formatted to facilitate automated processing. Although I've hacked together some crude programs that assemble these files into sequential dictionaries, I'm hoping that others will be inspired to create applications to use these files in a more sophisticated way.

The program entitled COLLATOR.BAS is written in a version of BASIC called Microsoft QuickBasic version 4.5. This program will also run under version 3.0 or later of Microsoft BASIC for the Macintosh, and under QBasic which comes with version 6 of MS-DOS. The program entitled COMBINER.C is written in "generic" C.

These programs enable you to assemble any selected group of properly formatted vocabulary lists into a multi-lingual wordlist. If you want a German-Novial-UNI dictionary, you can create one using these programs. Simply boot the program, type in the file names of the first two vocabulary lists you want to include (e.g. DEUTSCH.DIC and NOVIAL.DIC), then wait for the first batch of interleaving to be performed; then enter the name of the next vocabulary list to include (e.g. UNI.DIC) and so on.

Presently we are limited to 7-bit ASCII text files; diacritical marks and other non-English characters have to be represented by various work-arounds. (These typographical work-arounds are described in the typo_con comment lines at the beginning of each language's .DIC file.)
We are planning to use the ISO 8859-1 character set (when appropriate) in the future.

Each language is assigned a 3-character "tag"; this should be the first 3 letters of the language's actual name (e.g. "Deu" for Deutsch/German); however, if the first 3 letters would not be sufficiently distinct -- e.g. "Esp" might mean "Esperanto" or "Espan~ol," "Int" might mean "Interling" or "Interlingua" or "Interglossa" -- then something more distinctive must be invented.

Clarifications of definitions should be in (parentheses); part-of-speech, gender of noun, other grammatical data in [brackets]; lexicographers' comments in {braces}.

abbreviations
-------------

aj     adjective
aux    auxiliary
av     adverb
cj     conjunction
ij     interjection
n      noun
num    numeral
pfx    prefix
pn     pronoun
pr     preposition
pres   present tense
rel    relative
sfx    suffix
v      verb
vi     intransitive verb
vt     transitive verb

``natural'' languages:

Deu    Deutsch (German)
Eng    English
Ned    Nederlands (Dutch)

planned languages:

       Basic English (Jeffrey Henning) Langmaker.
E-o    Esperanto (L. L. Zamenhof)
Igl    Interglossa (Lancelot Hogben)
Nov    Novial (Otto Jespersen)
Tso    Tsolyani (Muhammad Abd-al-Rahman Barker)
UNI    UNI (Elisabeth Wainscott)
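
The annotation conventions and abbreviations above lend themselves to mechanical processing. The following is a minimal Python sketch, not part of the original project (whose programs are in BASIC and C), of how an entry's annotations might be separated from its headword. The function name split_entry and the sample entry are hypothetical, and the sketch assumes that annotations are never nested; the actual layout of a .DIC line is not described here and is not modelled.

```python
import re

# Abbreviations copied from the list in this document.
ABBREVIATIONS = {
    "aj": "adjective", "aux": "auxiliary", "av": "adverb",
    "cj": "conjunction", "ij": "interjection", "n": "noun",
    "num": "numeral", "pfx": "prefix", "pn": "pronoun",
    "pr": "preposition", "pres": "present tense", "rel": "relative",
    "sfx": "suffix", "v": "verb", "vi": "intransitive verb",
    "vt": "transitive verb",
}

# One pattern per delimiter pair; nesting is not handled in this sketch.
PATTERNS = {
    "clarification": re.compile(r"\(([^)]*)\)"),  # (parentheses)
    "grammar": re.compile(r"\[([^\]]*)\]"),       # [brackets]
    "comment": re.compile(r"\{([^}]*)\}"),        # {braces}
}
ANY_ANNOTATION = re.compile(r"\([^)]*\)|\[[^\]]*\]|\{[^}]*\}")


def split_entry(text):
    """Separate an entry's headword from its three kinds of annotation."""
    result = {
        name: [match.strip() for match in pattern.findall(text)]
        for name, pattern in PATTERNS.items()
    }
    # Expand known grammatical abbreviations; unknown tags are kept unchanged.
    result["grammar"] = [ABBREVIATIONS.get(tag, tag) for tag in result["grammar"]]
    # Whatever remains once the annotations are removed is taken as the headword.
    result["headword"] = " ".join(ANY_ANNOTATION.sub(" ", text).split())
    return result


# Hypothetical entry, invented for illustration only.
print(split_entry("bank (of a river) [n] {hypothetical example}"))
```

Keeping the three annotation kinds in separate lists means a program can, for instance, show clarifications to readers while hiding lexicographers' comments.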
site: ftp.gate.net
directory: pub/users/hrick/dictionary

* * * * welcome to the Universal Language Dictionary project * * * *

Introduction    explains the project and describes the file formats
collator.bas    combines *.DIC files into a multi-lingual dictionary
combiner.C      combines *.DIC files into a multi-lingual dictionary
counter.bas     counts the empty and non-empty entries in a *.DIC file
indexer.bas     helps create an alphabetical index to any *.DIC file

basiceng.dic    BASIC English
english.dic     English
esperant.dic    Esperanto
interglo.dic    Interglossa
nederlan.dic    Dutch
novial.dic      Novial
tsolyani.dic    Tsolyani
uni.dic         UNI
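
As a first step toward the kind of automated indexing that the hexadecimal serial numbers are meant to allow, the category ranges given in the arrangement section can be turned into a lookup table. The Python sketch below is an illustration only and is not one of the project's programs; the names CATEGORIES and category_of are hypothetical, and the ranges are copied from the category list in this document.

```python
from bisect import bisect_right

# (first serial, last serial, category name), in the order given in the document.
CATEGORIES = [
    ("001", "020", "adpositions"),
    ("021", "03E", "function words"),
    ("03F", "05C", "people"),
    ("05D", "060", "titles"),
    ("061", "069", "groupings of people"),
    ("06A", "0BE", "body parts and substances"),
    ("0BF", "0D1", "body terms"),
    ("0D2", "0F9", "bodily actions"),
    ("0FA", "125", "animal species and types"),
    ("126", "15F", "plant species and types"),
    ("160", "188", "natural world"),
    ("189", "219", "tools and implements"),
    ("21A", "22D", "clothing"),
    ("22E", "24D", "buildings and institutions"),
    ("24E", "270", "government and hierarchy"),
    ("271", "2A2", "business and transactions"),
    ("2A3", "2B1", "religion and the supernatural"),
    ("2B2", "31F", "mind and emotion"),
    ("320", "36E", "communication"),
    ("36F", "380", "games"),
    ("381", "389", "identity"),
    ("38A", "398", "numerals"),
    ("399", "3BD", "quantity"),
    ("3BE", "3CA", "degree"),
    ("3CB", "40E", "dimension, direction"),
    ("40F", "439", "motion"),
    ("43A", "44A", "vehicles, etc."),
    ("44B", "4A0", "time and sequence"),
    ("4A1", "4D9", "substances"),
    ("4DA", "4F0", "foodstuffs"),
    ("4F1", "536", "forms of matter"),
    ("537", "557", "qualities of matter"),
    ("558", "58B", "matter-related actions"),
    ("58C", "5A2", "misc. matter/energy terms"),
    ("5A3", "5BC", "light"),
    ("5BD", "5C4", "sound"),
    ("5C5", "5CC", "heat"),
    ("5CD", "640", "assorted abstract concepts"),
]

# Numeric start of each range, used for binary search.
STARTS = [int(first, 16) for first, _, _ in CATEGORIES]


def category_of(serial):
    """Return (category number, category name) for a hexadecimal serial number."""
    value = int(serial, 16)
    if not 0x001 <= value <= 0x640:
        raise ValueError("serial number out of range: " + serial)
    index = bisect_right(STARTS, value) - 1
    return index + 1, CATEGORIES[index][2]


print(category_of("0FA"))
```

The ranges are contiguous, and the last serial number, hexadecimal 640, equals 1600 in decimal, which agrees with the 1600 terms mentioned in the introduction.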


About this Page : DicIntro.html
Text file copied from the Universal Language Dictionary project just before its web site was deleted.
Last updated December 19, 1996
URL: http://ogden.basic-english.org/dicintro.html