Lexiguru documentation
1 About Lexiguru
This is the complete documentation for Lexiguru version b2.0.1
Lexiguru is an online application that randomly generates words from a given definition of phonemes, frequencies and word patterns. Applications like Lexiguru are called "word generators" or "vocabulary generators".
You can use it to make words for a constructed language, to get an original nickname or password, or just for fun.
2 Interface
- Use the Examples dropdown button to load a number of example definitions into the file editor
- The phonology definition file editor is the main input. It defines the phonology and the word shapes you get from the word generator. The editor starts with either a default phonology definition or the definition you last generated words with
- Use the Generate button to see Lexiguru produce words
- Use the Copy button to copy the words to your clipboard
2.1 Options
- Use the Number of words textbox to choose the number of words to generate. The default number is 100
- Word-list mode will produce a list of words
- Paragraph mode will produce words that look vaguely like sentences by injecting punctuation into the word list and capitalising the first word of each sentence
- Debug mode will show, line by line, each step in creating each word
- Editor wrap lines will make the file editor wrap a line onto the next line when it exceeds the width of the file editor
- Remove duplicates will make sure all words generated are unique
- Force words will force the generator to try to generate the complete number of words requested within 30 seconds, regardless of how many words are rejected or removed as duplicates
- Sort words and Capitalise words should be self-explanatory
- The Word divider textbox sets the delimiter, in other words, what is placed between each word in the output. It is a space by default. Use `\n` to get one word for each line
2.2 File save / load
- Use the Save button to download your phonology definition as a file called 'lexiguru.txt', or whatever you named your file in the File name: field. The file is always a ".txt" type
- Use the Load button to load a file on your system into the file editor
3 Using comments
If a line contains a `;`, everything after it on that line is ignored and not interpreted as Lexiguru syntax, unless the `;` is escaped. You can use this to leave notes about what something does or why you made certain decisions.
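The comment rule can be sketched in a few lines of Python. This is only an illustration, not Lexiguru's own code: the function name `strip_comment` is hypothetical, and it assumes that escaping means enclosing the `;` in double quotes, as described in the section on escape characters.

```python
def strip_comment(line):
    """Return the line with any unescaped ';' comment removed.

    Assumption: a ';' counts as escaped when it sits between double quotes.
    """
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Toggle quoting state on every double quote.
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            # First unquoted ';' starts the comment; drop it and the rest.
            return line[:i].rstrip()
    return line.rstrip()


print(strip_comment("V = a i e u o ; the five vowels"))
print(strip_comment('P = ";" , ; punctuation marks'))
```

The first call keeps only the class definition; the second keeps the quoted `;` and drops the trailing note.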
4 Word creation
4.1 Classes
Classes are groups of phonemes with single-character names. For example:

    C = t n k m ch l ʔ s r d h w b y p g
    F = n l ʔ t k r p
    V = a i e u o

This creates three groupings. `C` is the group of all consonants, `V` is the group of all vowels, and `F` is a group of some of the consonants.
By default, the phonemes' frequencies decrease as they go to the right, according to the Gusein-Zade distribution. In the above example, when Lexiguru needs to choose a `V`, it will choose `a` the most at 43%, `i` the second-most at 26%, `e` the third-most at 17%, `u` the fourth-most at 10%, and `o` the fifth-most at 4%.
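The percentages above can be reproduced with a short Python sketch. It assumes the usual form of the Gusein-Zade distribution, where the item at rank r out of n gets a raw weight of ln((n + 1) / r), normalised so the weights sum to 1. Whether Lexiguru computes it exactly this way is not stated here, and the function name `gusein_zade` is made up for the example.

```python
import math


def gusein_zade(n):
    """Probabilities for ranks 1..n under a normalised Gusein-Zade distribution."""
    raw = [math.log((n + 1) / rank) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# The class V = a i e u o from the example above.
for phoneme, p in zip("a i e u o".split(), gusein_zade(5)):
    print(f"{phoneme}: {p:.0%}")
```

Rounded to whole percentages this gives 43, 26, 17, 10 and 4, matching the figures quoted for `V`.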
4.1.2 Class-drop-off
You can modify the phonemes' frequencies using this option. The options are `zipfian`, `gusein-zade`, and `flat`. As already stated, the default is `gusein-zade`.

    class-drop-off: flat
4.1.1 Weights
If you want to set your own frequency for a class, you can use a colon (`:`) to specify the weight for each phoneme, like so:

    V = a:5 e:4 i:3 o:2 u:1

`V` has approximately the following probabilities: a: 33%, e: 27%, i: 20%, o: 13%, u: 7%.
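Those probabilities are just each weight divided by the sum of all weights. A minimal Python sketch, assuming every phoneme in the class carries an explicit integer weight (the helper name `parse_weighted_class` is hypothetical and is not part of Lexiguru):

```python
from fractions import Fraction


def parse_weighted_class(definition):
    """Turn 'V = a:5 e:4 ...' into (name, {phoneme: probability})."""
    name, _, body = definition.partition("=")
    # Split each 'phoneme:weight' item on its last colon.
    pairs = [item.rsplit(":", 1) for item in body.split()]
    total = sum(int(weight) for _, weight in pairs)
    probs = {phoneme: Fraction(int(weight), total) for phoneme, weight in pairs}
    return name.strip(), probs


name, probs = parse_weighted_class("V = a:5 e:4 i:3 o:2 u:1")
for phoneme, p in probs.items():
    print(f"{phoneme}: {float(p):.0%}")
```

With weights 5, 4, 3, 2, 1 the total is 15, so `a` is 5/15, `e` is 4/15, and so on down to `u` at 1/15.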
4.1.3 "Pick-one" and classes in classes
Phonemes grouped in square brackets (`[` `]`) are treated as a single unit in terms of frequency.

    V = a i e o [aa ii ee oo]

`V` has a one-in-six chance of being a long vowel. Using a class inside a class has the same result:

    L = aa ii ee oo
    V = a i e o L
4.2 Macros
Macros are a system designed to provide an abbreviation for syllable shapes. They are defined similarly to phoneme classes, but with several important differences:
- Every macro's name must start with `$`. `S = s` is a phoneme class; `$S = s` is a macro.
- Macros allow phoneme classes inside of them. `C = D` is not valid, but `$C = D` works as expected.
- Macros do not support multiple possibilities. `$M = a b c` will not work the way you may think.

For example:

    $S = CVD?
    words: V?$S$S V?$S V?$S$S$S
4.2.1 Pick-one
Using square brackets (`[` `]`), the enclosed set is treated as if it were a class or macro.

    C = t
    A = a u
    V = e y
    $S = C[A V]

ta, tu, te, ty
4.2.2 Optional
Optionals, written with round brackets (`(` `)`), work the same way as “pick-one”; the only difference is that what's inside them may or may not appear in the word. How often it appears is set by the `optionals-rate:` directive. The default is a probability of 10%.

    $S = CV(F)
    words: V?$S$S V?$S V?$S$S$S
4.2.3 optionals-weight
The `optionals-rate:` directive specifies how often optional phonemes or classes are selected. This number is a percentage and, as previously stated, the default is 10%. For example:

    optionals-rate: 20
4.2.4 Inter-pick-one
“Inter-pick-one”, using curly braces (`{` `}`), works the same as pick-one. The only difference is that only one "Inter-pick-one" set will be chosen for that macro.
Inter-pick-one is a feature designed to generate words with stress or pitch accent systems.

    C = t
    V = a
    $X = ({'}CV){'}CV
    words: $X

This produces any of the following words: `'ta`, `ta'ta`, `'tata`. Notice here that `ta` is not possible.
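To see why exactly those three words come out, the macro can be modelled in Python as a list of syllables, each preceded by a stress slot, where exactly one slot fires. This is a simplified model written for this example only (the function `stress_variants` and its input format are hypothetical); it does not parse real Lexiguru syntax.

```python
from itertools import product


def stress_variants(syllables, marker="'"):
    """Enumerate outputs when exactly one stress slot is chosen.

    syllables: list of (text, is_optional) pairs, one stress slot before each.
    """
    results = []
    # Each optional syllable may be kept or dropped; others are always kept.
    choices = [(True, False) if optional else (True,) for _, optional in syllables]
    for keep in product(*choices):
        kept = [text for (text, _), k in zip(syllables, keep) if k]
        # Exactly one of the remaining slots places the marker.
        for chosen in range(len(kept)):
            results.append("".join(
                (marker if i == chosen else "") + text
                for i, text in enumerate(kept)
            ))
    return results


# ({'}CV){'}CV with C = t and V = a: an optional syllable, then a required one.
print(stress_variants([("ta", True), ("ta", False)]))
```

Because one slot must always fire, an unmarked `ta` never appears in the result.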
There are a few restrictions and peculiarities to it. Most notably, Inter-pick-ones may not be nested inside each other. Let's look at another example.

    class-drop-off: flat
    $Z = {a b}{x}
    words: $Z

The above example is rather silly, as there is nothing between each "Inter-pick-one", defeating its whole purpose. However, it is useful here in showing that it produces equivalent results to the example below, which uses "pick-ones" instead.

    class-drop-off: flat
    $Z = [[a b][x]]
    words: $Z

In both of the above examples, we have a 25% chance of producing `a`, a 25% chance of `b`, and a 50% chance of `x`.
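The 25% / 25% / 50% split follows from picking a set first and then an item inside it. A small Python sketch of that arithmetic, assuming that `flat` means a uniform choice at each of the two levels (the function name is made up for the example):

```python
from fractions import Fraction


def set_then_item(sets):
    """Probability of each item when a set is picked uniformly, then an item."""
    probs = {}
    for items in sets:
        for item in items:
            share = Fraction(1, len(sets)) * Fraction(1, len(items))
            probs[item] = probs.get(item, Fraction(0)) + share
    return probs


# {a b}{x}, or equivalently [[a b][x]], under class-drop-off: flat
for item, p in set_then_item([["a", "b"], ["x"]]).items():
    print(item, p)
```

Each set has a 1/2 chance; `a` and `b` then share their set's half equally, while `x` keeps the whole of its half.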
See the "Romance-like" example for a language that uses "Inter-pick-one" for its stress system, and "BTX" for a language that uses it for a complex pitch accent system.
4.3 Words
The most common way to make a word is to use the `words:` directive. Words are weighted similarly to how phonemes are weighted in classes. A word can consist of individual phonemes, phoneme classes, or a mixture of both.
Phonemes or classes that are optional can be indicated by a `?`. For example, `words: CVD?` is similar to `words: CV CVD`, although the weights are quite different.
If you choose from the same class twice in a row, you may put a `!` after the second one to indicate that they must not be the same phoneme. For example, `CC` may generate `tt`, but `CC!` never will.
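One way to picture what `CC!` does is the Python sketch below. It is an assumption about the behaviour, not Lexiguru's implementation: here the second pick simply excludes the first phoneme and reuses the remaining weights, whereas the real generator might instead reject and retry. The name `pick_distinct_pair` is hypothetical.

```python
import random


def pick_distinct_pair(phonemes, weights, rng):
    """Pick two phonemes from one class so that the second differs from the first."""
    first = rng.choices(phonemes, weights=weights, k=1)[0]
    # Remove the first phoneme from the pool before drawing the second.
    rest = [(p, w) for p, w in zip(phonemes, weights) if p != first]
    second = rng.choices(
        [p for p, _ in rest], weights=[w for _, w in rest], k=1
    )[0]
    return first, second


rng = random.Random(0)  # fixed seed so runs are repeatable
consonants = ["t", "n", "k", "m"]
weights = [4, 3, 2, 1]  # illustrative weights only
print(["".join(pick_distinct_pair(consonants, weights, rng)) for _ in range(5)])
```

Whatever the seed, a doubled phoneme such as `tt` cannot be drawn.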
By default, words are selected using the Zipf distribution.
4.3.1 Word-drop-off
This directive modifies how quickly the words' frequencies decrease as they go to the right, unless they have explicit weights. The options are `zipfian`, `gusein-zade`, and `flat`. The default is `zipfian`.
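As a rough illustration of the `zipfian` default, the Python sketch below gives the word shape at rank r a raw weight of 1/r and normalises. This is the textbook Zipf form and is an assumption here; the exact exponent or formula Lexiguru uses is not given in this documentation.

```python
def zipf(n):
    """Probabilities for ranks 1..n with raw weight 1/rank, normalised."""
    raw = [1 / rank for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


# The three word shapes from the macro example: words: V?$S$S V?$S V?$S$S$S
shapes = "V?$S$S V?$S V?$S$S$S".split()
for shape, p in zip(shapes, zipf(len(shapes))):
    print(f"{shape}: {p:.0%}")
```

Under this assumption the first shape is chosen a little over half the time, and the third roughly a third as often as the first.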
4.4 The graphs: directive
The `graphs:` directive can be an important element of your phonology definition file.
Alphabetisation
The `graphs:` directive gives Lexiguru a custom sort order for words when the Sort words checkbox is selected.
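A custom sort order of this kind can be sketched in Python as follows. The approach shown (split each word into graphemes with a longest-match-first scan, then compare the graphemes' positions in the list) is an assumption about how such sorting typically works, not a description of Lexiguru's internals. The function names are hypothetical, and the grapheme list is an illustrative one in which `ch` is its own letter after `c`.

```python
def tokenize(word, graphs):
    """Split a word into graphemes, preferring the longest match at each position."""
    by_length = sorted(graphs, key=len, reverse=True)
    tokens = []
    i = 0
    while i < len(word):
        for g in by_length:
            if word.startswith(g, i):
                tokens.append(g)
                i += len(g)
                break
        else:
            # Character not listed in graphs: keep it as its own token.
            tokens.append(word[i])
            i += 1
    return tokens


def sort_words(words, graphs):
    """Sort words by the position of each grapheme in the graphs list."""
    rank = {g: n for n, g in enumerate(graphs)}
    # Unlisted characters sort after every listed grapheme (an assumption).
    return sorted(
        words,
        key=lambda w: [rank.get(t, len(graphs)) for t in tokenize(w, graphs)],
    )


graphs = "a b c ch d e f g h i j k l m n o p p' r s t t' u v y".split()
print(sort_words(["chit", "cumin", "cat", "frog"], graphs))
# ['cat', 'cumin', 'chit', 'frog']
```

Because `ch` is a separate grapheme placed after `c`, `chit` sorts after `cumin` instead of between `cat` and `cumin`.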
Sometimes you may want to tell Lexiguru which graphemes are multigraphs without alphabetising them.

    graphs: a b c c(h) d e f g h i j k l m n o p p' r s t t' u v y
    cat chit-chat cumin frog
Defining multigraphs and others
Tells Lexiguru which multigraphs, including character + combining diacritic sequences, to treat as single phonemes.
Tells Lexiguru which multigraphs to treat as a single unit when using filters.

    graphs: a b ch d e f g h i j k l m n o p p' r s t t' u v y
    words: CVD?

Tells Lexiguru which character + combining diacritic sequences to treat as a single grapheme.

    V = a a̋ e i o u
    words: CVD?

Tells Lexiguru which character + combining diacritic sequences to treat as alternatives of another grapheme.

    graphs: a <[á à ǎ â] b d e <[é è ě ê] f g h i <[í ì ǐ î] k l m n o <[ó ò ǒ ô] p r s t u <[ú ù ǔ û] w y
    words: CVD?
The graphs: directive has the following aliases: alphabet, letters, graphemes, multigraphs, digraphs
4.5 Word creation escape characters
Characters enclosed in a set of double quotes ignore any meaning they might have had in the generator, including double quotes themselves. This way, anything including capital letters that have already been defined as classes, brackets, even spaces, can be generated.
These are the characters you must escape if you want to use them in classes, macros, or words:
Characters | Meaning |
---|---|
`;` | Creates a comment |
`C =` | Creates a class |
` ` | Space, separates choices |
`$` | Defines a macro |
`:` | Weight |
`[ ]` | Pick-one |
`( )` | Optionals |
`{ }` | Inter-pick-one |
`"` | Escapes characters enclosed in them |