Lexifer documentation

Contents

  1. About Lexifer
  2. Interface
    1. Options
    2. File save / load
  3. On using comments
  4. The with: directive
    1. Featuresets
    2. Engines
  5. On defining frequency, phonology, and word creation
    1. Alphabetisation
    2. Phoneme classes
    3. Macros
    4. The random-rate: directive
    5. Building words
    6. Categories
  6. On filtering and rejecting words
    1. Filters
    2. Rejections
    3. Using Regular Expressions
    4. Cluster fields

1 About Lexifer

This is the complete documentation for Lexifer version b2.0.1.

Lexifer Online is an online application that randomly generates words from a given definition of phonemes, frequencies and word patterns. Applications like Lexifer are called "word generators" or "vocabulary generators".

You can use it to make words for a constructed language, to get an original nickname or password, or just for fun.

This version of Lexifer is a fork of the TypeScript version of Lexifer written by u/bbrk24. Software Copyright (c) 2021-2022 bbrk24. Copyright (c) 2006-2023 William S. Annis.

2 Interface

2.1 Options

2.2 File save / load

3 Comments

If a line contains a #, everything after it on that line is ignored. You can use this to leave notes about what something does or why you made certain decisions.

4 The with: directive

The first line of the default definition starts with with:. The with: directive selects a featureset and engines.

4.1 Featuresets

If you have a with: statement, you must use exactly one featureset. Currently, there are two options: std-ipa-features and std-digraph-features. The former is IPA, and the latter is ASCII-friendly. The recognised consonants are as follows:

IPA  Digraph  Features
p    p        voiceless bilabial plosive
b    b        voiced bilabial plosive
ɸ    ph       voiceless bilabial fricative
β    bh       voiced bilabial fricative
f    f        voiceless labiodental fricative
v    v        voiced labiodental fricative
m    m        voiced labial¹ nasal
t    t        voiceless alveolar plosive
d    d        voiced alveolar plosive
s    s        voiceless alveolar sibilant
z    z        voiced alveolar sibilant
θ    th       voiceless alveolar² fricative
ð    dh       voiced alveolar² fricative
ɬ    lh       voiceless alveolar lateral fricative
ɮ    ldh      voiced alveolar lateral fricative
tɬ   tl       voiceless alveolar lateral affricate
dɮ   dl       voiced alveolar lateral affricate
ts   ts       voiceless alveolar affricate
dz   dz       voiced alveolar affricate
ʃ    sh       voiceless postalveolar sibilant
ʒ    zh       voiced postalveolar sibilant
tʃ   ch       voiceless postalveolar affricate
dʒ   j        voiced postalveolar affricate
n    n        voiced alveolar nasal
ʈ    rt       voiceless retroflex plosive
ɖ    rd       voiced retroflex plosive
ʂ    sr       voiceless retroflex sibilant
ʐ    zr       voiced retroflex sibilant
ʈʂ   rts      voiceless retroflex affricate
ɖʐ   rdz      voiced retroflex affricate
ɳ    rn       voiced retroflex nasal
c    ky       voiceless palatal plosive
ɟ    gy       voiced palatal plosive
ɕ    sy       voiceless palatal sibilant
ʑ    zy       voiced palatal sibilant
ç    hy       voiceless palatal fricative
ʝ    yy       voiced palatal fricative
cç   cy       voiceless palatal affricate
ɟʝ   jy       voiced palatal affricate
ɲ    ny       voiced palatal nasal
k    k        voiceless velar plosive
g    g        voiced velar plosive
x    kh       voiceless velar fricative
ɣ    gh       voiced velar fricative
ŋ    ng       voiced velar nasal
q    q        voiceless uvular plosive
ɢ    gq       voiced uvular plosive
χ    qh       voiceless uvular fricative
ʁ    gqh      voiced uvular fricative
ɴ    nq       voiced uvular nasal

¹ Labial here covers both bilabial and labiodental. For example, the assimilations engine turns nf into mf and nɸ into mɸ, even though f and ɸ have different places of articulation.

² Yes, the IPA describes these as dental. However, the IPA does not make the dental/alveolar distinction elsewhere, so it is simpler to say that these are alveolar.

Choosing a specific featureset does not mean you have to use it for everything. Rather, you only need to use it for the consonants that will be considered by the engines you use (see below). Any unrecognised segments will be ignored.

4.2 Engines

Engines are applied after word generation and before any user-defined filters.

std-assimilations

This engine has two behaviours.

The first affects all consonants for which both voiced and voiceless versions exist. It applies leftward assimilation of voicing. For example, it would turn akda into agda and abta into apta.

The second only changes nasals, but considers all consonants except for approximants, lateral approximants, and trills. It applies leftward assimilation of place of articulation. For example, it would turn amta into anta and anka into aŋka.
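As a rough illustration of the two behaviours, here is a minimal Python sketch. The feature tables are a hypothetical, heavily reduced stand-in (three plosive pairs and three nasals), and the name assimilate is made up for this example; it is not Lexifer's actual implementation.

```python
# Hypothetical, reduced feature data: three plosive pairs and three nasals.
TO_VOICED = {"p": "b", "t": "d", "k": "g"}
TO_VOICELESS = {voiced: voiceless for voiceless, voiced in TO_VOICED.items()}
PLACE = {"p": "labial", "b": "labial", "t": "alveolar", "d": "alveolar",
         "k": "velar", "g": "velar"}
NASAL_AT = {"labial": "m", "alveolar": "n", "velar": "ŋ"}


def assimilate(word: str) -> str:
    """Make each consonant agree with the plosive that follows it."""
    segs = list(word)
    # Walk right to left so every segment adapts to its right-hand neighbour.
    for i in range(len(segs) - 2, -1, -1):
        cur, nxt = segs[i], segs[i + 1]
        if nxt not in PLACE:
            continue  # the next segment is not a plosive in this sketch
        if cur in NASAL_AT.values():
            segs[i] = NASAL_AT[PLACE[nxt]]   # nasal takes the plosive's place
        elif cur in TO_VOICED and nxt in TO_VOICELESS:
            segs[i] = TO_VOICED[cur]         # voiceless before voiced
        elif cur in TO_VOICELESS and nxt in TO_VOICED:
            segs[i] = TO_VOICELESS[cur]      # voiced before voiceless
    return "".join(segs)


for word in ["akda", "abta", "amta", "anka"]:
    print(word, "->", assimilate(word))
```

The sketch works on single characters only, so it does not handle digraphs or affricates.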

coronal-metathesis

This engine only affects bilabial, alveolar, and velar plosives and nasals. It ensures that clusters of these segments have the alveolar element last. For example, it would turn atka into akta and anma into amna. It does not metathesise a nasal with a plosive; anpa would not become apna.
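The rule can be sketched in a few lines of Python. The segment sets and the function name coronal_metathesis are assumptions made for this illustration, and the sketch only looks at adjacent pairs of single-character segments.

```python
PLOSIVES = {"p", "b", "t", "d", "k", "g"}
NASALS = {"m", "n", "ŋ"}
ALVEOLAR = {"t", "d", "n"}


def coronal_metathesis(word: str) -> str:
    """Move an alveolar plosive or nasal after its non-alveolar partner."""
    segs = list(word)
    for i in range(len(segs) - 1):
        a, b = segs[i], segs[i + 1]
        # Only plosive+plosive or nasal+nasal clusters are swapped.
        same_manner = {a, b} <= PLOSIVES or {a, b} <= NASALS
        if same_manner and a in ALVEOLAR and b not in ALVEOLAR:
            segs[i], segs[i + 1] = b, a
    return "".join(segs)


for word in ["atka", "anma", "anpa"]:
    print(word, "->", coronal_metathesis(word))
```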

5 On defining frequency, phonology, and word creation

This section covers the core of a definition: how words are generated in the first place, before any filters or rejections modify them.

5.1 Alphabetisation – the letters: directive

If you have a with: directive, there must also be letters:. If not, letters: is optional. letters: tells Lexifer what symbols you use and how to alphabetise them. It also affects how digraphs are parsed, even if std-ipa-features was chosen. For example, consider the following statements:

with: std-ipa-features
letters: t ʃ

In this case, if tʃ occurs, it will not be treated as an affricate tʃ, but as a plosive t followed by a sibilant ʃ. Additionally, words starting with t will be sorted alphabetically above words starting with ʃ. Contrast this with the following statements:

with: std-ipa-features
letters: tʃ t ʃ

In this case, tʃ is treated as an affricate. Additionally, words starting with tʃ will be sorted above words starting with tt, even though t by itself comes before ʃ.
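The effect of letters: on parsing and sorting can be imitated with a small Python sketch. It assumes longest-match segmentation and places unlisted segments last; both are assumptions for illustration, and the helper names tokenize and sort_key are hypothetical.

```python
def tokenize(word, letters):
    """Split a word into segments, preferring the longest listed letter."""
    by_length = sorted(letters, key=len, reverse=True)
    segs, i = [], 0
    while i < len(word):
        # Fall back to a single character if no listed letter matches here.
        seg = next((l for l in by_length if word.startswith(l, i)), word[i])
        segs.append(seg)
        i += len(seg)
    return segs


def sort_key(word, letters):
    """Rank each segment by its position in the letters list."""
    rank = {letter: n for n, letter in enumerate(letters)}
    # Assumption: segments missing from letters sort after all listed ones.
    return [rank.get(seg, len(letters)) for seg in tokenize(word, letters)]


words = ["tta", "tʃa"]
without_affricate = ["t", "ʃ", "a"]
with_affricate = ["tʃ", "t", "ʃ", "a"]

print(tokenize("tʃa", without_affricate))
print(tokenize("tʃa", with_affricate))
print(sorted(words, key=lambda w: sort_key(w, without_affricate)))
print(sorted(words, key=lambda w: sort_key(w, with_affricate)))
```

With the first list, tʃa is parsed as t followed by ʃ and sorts after tta; with the second, tʃ is one segment and sorts first.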

5.2 Phoneme classes

These are groupings of phonemes that have one-letter names. For example, here are the classes from the default definition:

C = t n k m ch l ꞌ s r d h w b y p g
D = n l ꞌ t k r p
V = a i e á u o

This creates three groupings. C is the group of all consonants, V is the group of all vowels, and D is a group of some of the consonants. A class cannot contain another class; this is not legal:

C = D m ch s d h w b y g

If you do this, and you have a letters: directive, Lexifer will warn you:

A phoneme class contains 'D' missing from 'letters'. Strange word shapes are likely to result.

By default, the phonemes' frequencies decrease as they go to the right, according to the Gusein-Zade distribution. In the above example, when Lexifer needs to choose a C, it will choose t the most, n the second-most, k the third-most, and so on. If you are not satisfied with the frequencies, you can use a colon (:) to specify the weight for each phoneme, like so:

V = a e i o u
# V has approximately the following probabilities:
# a: 43%, e: 26%, i: 17%, o: 10%, u: 4%
U = a:5 e:4 i:3 o:2 u:1
# U has approximately the following probabilities:
# a: 33%, e: 27%, i: 20%, o: 13%, u: 7%

Weights are relative, so a:5 e:4 i:3 o:2 u:1 is the same as a:50 e:40 i:30 o:20 u:10. Changing the order or weights of phonemes is a good way to change the feel of the language without changing the phonotactics.
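The percentages quoted for V and U can be reproduced with a short Python sketch. It uses one common form of the Gusein-Zade formula, where the phoneme at rank r out of n gets a weight of ln(n + 1) - ln(r); this form is an assumption chosen because it matches the figures above, and the function names are hypothetical.

```python
import math


def gusein_zade(n):
    """Probabilities for n ranked phonemes under the assumed formula."""
    raw = [math.log(n + 1) - math.log(rank) for rank in range(1, n + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def normalise(weights):
    """Turn explicit relative weights into probabilities."""
    total = sum(weights)
    return [w / total for w in weights]


# V = a e i o u (no explicit weights)
print([round(100 * p) for p in gusein_zade(5)])               # [43, 26, 17, 10, 4]
# U = a:5 e:4 i:3 o:2 u:1
print([round(100 * p) for p in normalise([5, 4, 3, 2, 1])])   # [33, 27, 20, 13, 7]
```

Because only the ratios matter, normalise([50, 40, 30, 20, 10]) gives the same probabilities as normalise([5, 4, 3, 2, 1]).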

If you specify a weight for any phoneme in a class, you must specify the weight for all of them. If you specify a weight of 0, the phoneme will never be selected.

Weights can be fractions, for example: C = t:2.5 k:1 n:0.75

5.3 Macros

Macros are a system designed to provide an abbreviation for syllable shapes. They are defined similarly to phoneme classes, but their names start with $ and they stand for a sequence of classes rather than a choice between phonemes.

The default definition has one macro:

$S = CVD?
words: V?$S$S V?$S V?$S$S$S

This is exactly equivalent to the following definition:

words: V?CVD?CVD? V?CVD? V?CVD?CVD?CVD?

However, since most syllables are CVD?, it is quicker to use a macro.
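The equivalence can be checked with a simple text substitution, sketched here in Python. The function expand_macros is hypothetical and assumes macros are expanded by plain replacement of their names.

```python
def expand_macros(pattern, macros):
    """Replace every macro name in a words: pattern with its definition."""
    for name, body in macros.items():
        pattern = pattern.replace(name, body)
    return pattern


macros = {"$S": "CVD?"}
print(expand_macros("V?$S$S V?$S V?$S$S$S", macros))
# V?CVD?CVD? V?CVD? V?CVD?CVD?CVD?
```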

5.4 The random-rate: directive

The random-rate: directive specifies how often optional phonemes or classes are selected. This number is a percentage. For example,

random-rate: 25
words: CVD?

is equivalent to

words: CV:75 CVD:25

The default random-rate is 10%.
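The equivalence above can be worked out mechanically. The following Python sketch expands a shape with optional elements into explicit alternatives; it assumes each optional element is kept independently with the given rate, and the function name expand_optionals is hypothetical.

```python
from itertools import product


def expand_optionals(shape, random_rate):
    """Map each concrete shape to its probability for a given random-rate."""
    # Split into (element, is_optional); "?" marks the element before it.
    parts = []
    for ch in shape:
        if ch == "?":
            parts[-1] = (parts[-1][0], True)
        else:
            parts.append((ch, False))

    keep = random_rate / 100
    optional_count = sum(1 for _, optional in parts if optional)
    result = {}
    # Try every keep/drop combination of the optional elements.
    for choices in product([True, False], repeat=optional_count):
        picks = iter(choices)
        concrete, prob = "", 1.0
        for elem, optional in parts:
            if not optional:
                concrete += elem
            elif next(picks):
                concrete += elem
                prob *= keep
            else:
                prob *= 1 - keep
        result[concrete] = result.get(concrete, 0) + prob
    return result


print(expand_optionals("CVD?", 25))   # {'CVD': 0.25, 'CV': 0.75}
```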

5.5 Building words

The most common way to make a word is to use the words: directive. Words are weighted similarly to how phonemes are weighted in classes.

A word can consist of individual phonemes, phoneme classes, or a mixture of both.

Phonemes or classes that are optional can be indicated by a ?. For example, words: CVD? is similar to words: CV CVD, although the weights are quite different.

If you choose from the same class twice in a row, you may put an ! after the second one, to indicate they must not be the same phoneme. For example, CC may generate tt, but CC! never will.
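To show what ! guarantees, here is a minimal Python sketch that draws two phonemes from one class. It picks uniformly and redraws on a repeat, both of which are simplifying assumptions; Lexifer's own selection is weighted, and its internal method may differ. The name pick_pair is hypothetical.

```python
import random


def pick_pair(phonemes, rng, forbid_repeat):
    """Pick two phonemes from one class, as CC (False) or CC! (True) would."""
    first = rng.choice(phonemes)
    second = rng.choice(phonemes)
    # With "!", redraw until the second phoneme differs from the first.
    while forbid_repeat and second == first:
        second = rng.choice(phonemes)
    return first + second


rng = random.Random(0)
pairs = [pick_pair(["t", "n", "k"], rng, forbid_repeat=True) for _ in range(10)]
print(pairs)
```

With forbid_repeat=True the output never contains tt, nn, or kk.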

By default, words are selected using the Zipf distribution.

5.6 Categories

The categories: directive is an alternative to words:. You may not include both directives in the same definition.

categories: lets you define multiple types of words. The general syntax is:

categories: cat1 cat2 # ...etc
cat1 = # word shapes for cat1
cat2 = # word shapes for cat2

The categories themselves can also be weighted, but these weights only apply in paragraph mode. If you give a number of words, that is the number of words generated per category. This is where a weight of 0 could be helpful. If you want to generate parts of a word when you enter a number, but only show complete words in paragraph mode, you could have something like:

categories: root:0 prefix:0 suffix:0 full-word:1
# ...definitions of each category...

The order in which the categories are declared is the order in which they are presented when generating a specific number of words.

6 Filters and rejections

Filters and rejections modify or remove words generated from the words: or categories: directive. They are executed in the order they are written.

6.1 Filters

Filters are a way to change words after they have been generated and run through the engines in the with: directive. If your spelling doesn't match a featureset exactly, you can use filters to convert the engines' output into your own spelling.

Filters are expressed as filter: pattern > replacement. For example, if you want to spell [ŋ] the same as [n], you would say:

filter: ŋ > n

Multiple filters on one line are separated by semicolons:

filter: pattern1 > replacement1; pattern2 > replacement2

This does not mean that the two filters are run at the same time. It is identical to:

filter: pattern1 > replacement1
filter: pattern2 > replacement2

If the replacement is !, the pattern is removed from the word, but the rest of the word is left alone.

6.2 Rejections

To outright forbid a sequence from occurring, use the reject: directive. The default definition contains a few of these. The first two are:

reject: wu yi

This prevents any word from having wu or yi. In reality, reject: is an abbreviation, and that statement is equivalent to:

filter: wu > REJECT; yi > REJECT

As such, you can intersperse filters and rejections, and they will be performed in order.
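The in-order behaviour of filters and rejections can be modelled with a short Python sketch. It uses Python's re module as a stand-in for ECMAScript regular expressions, and the function apply_rules and the rule-list format are hypothetical.

```python
import re


def apply_rules(word, rules):
    """Apply (pattern, replacement) rules in order; None means rejected."""
    for pattern, replacement in rules:
        if replacement == "REJECT":
            if re.search(pattern, word):
                return None                      # the word is discarded
        elif replacement == "!":
            word = re.sub(pattern, "", word)     # remove the matched part
        else:
            word = re.sub(pattern, replacement, word)
    return word


# filter: ŋ > n, followed by reject: wu yi
rules = [("ŋ", "n"), ("wu", "REJECT"), ("yi", "REJECT")]
print(apply_rules("aŋka", rules))   # anka
print(apply_rules("wuta", rules))   # None

# Order matters: each rule sees the output of the one before it.
print(apply_rules("a", [("a", "b"), ("b", "c")]))   # c
print(apply_rules("a", [("b", "c"), ("a", "b")]))   # b
```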

6.3 Using Regular Expressions

filter: and reject: use ECMAScript regular expressions. If you know what that means, great; but if not, don't worry about it. The important things are:

If you want to prevent an entire part of a word from appearing twice in a row, you can reject: (..+)\1. This would prevent e.g. kiki from being generated, as it is just ki twice.

If you're confident that it is okay to simplify such occurrences, you may instead filter: (..+)\1+ > $1. This would simplify kiki into simply ki. This may not be desirable as it can make words that are significantly shorter than expected.

If you need to stop a pattern for a plain character from also matching that character when it carries a combining diacritic, put (?=\w|$) after the character. For example, filter: o(?=\w|$)x > oy will prevent őx from becoming oy.
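The two reduplication patterns can be tried out in Python, whose re module treats the backreference \1 the same way for these simple cases. Note that Python writes the replacement as \1 where Lexifer's filter: uses $1; this sketch only illustrates the patterns and is not part of Lexifer.

```python
import re

# reject: (..+)\1
redup = re.compile(r"(..+)\1")
print(bool(redup.search("kiki")))   # True: "ki" occurs twice in a row
print(bool(redup.search("kata")))   # False: nothing is repeated

# filter: (..+)\1+ > $1
print(re.sub(r"(..+)\1+", r"\1", "kiki"))     # ki
print(re.sub(r"(..+)\1+", r"\1", "takiki"))   # taki
```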

6.4 Cluster fields

Cluster fields are a way to put a lot of related filters or rejections in a smaller space. They are laid out like tables, and start with %. For example, a cluster field could look like:

% a  i  u
a +  +  o
i -  +  uu
u -  -  +

The first character is the row, and the second character is the column. In this example, au becomes o and iu becomes uu. + means to leave the combination as-is, and - means to reject it. This table would permit ai but reject ia.

Cluster fields can also use ! in them to remove a sequence.

As with filters, these are parsed in the order presented. The cluster field ends at a blank line.
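One way to read a cluster field is as shorthand for a list of filters and rejections. The Python sketch below converts the example table into such a list; the function parse_cluster_field and the (pattern, replacement) output format are hypothetical and only meant to illustrate how the table is read.

```python
def parse_cluster_field(lines):
    """Turn a cluster field into a list of (sequence, replacement) rules."""
    header = lines[0].split()[1:]        # column labels after the leading %
    rules = []
    for line in lines[1:]:
        row, *cells = line.split()
        for col, cell in zip(header, cells):
            if cell == "+":
                continue                 # leave the combination as-is
            # "-" rejects; anything else (including "!") is kept as written.
            replacement = "REJECT" if cell == "-" else cell
            rules.append((row + col, replacement))
    return rules


field = [
    "% a  i  u",
    "a +  +  o",
    "i -  +  uu",
    "u -  -  +",
]
for rule in parse_cluster_field(field):
    print(rule)
```

For the example table this yields au > o, ia > REJECT, iu > uu, ua > REJECT and ui > REJECT, in that order; ai produces no rule, so it is permitted.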