Lexiguru documentation

About Lexiguru
Interface
1. Options
2. File save / load
Using comments
Categories
1. Long form category name
2. Category-drop-off directive
3. Categories inside categories and category-sets
Assigning weights
Building words
1. Words
  1. Word-drop-off directive
2. Segments
3. Pick-one
4. Optional
  1. Optional weight
5. Inter-pick-one
  1. Inter-pick-one weight
Alphabetisation and graphs
1. Alphabetisation
2. Defining graphemes
  1. Alternative graphemes
3. Invisibility
Word creation character escape
Sound change
The change
1. Concurrent set
2. Merging set
3. Optional set
4. Reject
The condition
1. Multiple conditions in one rule
2. Optional and concurrent set
3. Word boundary
4. Syllable boundary
5. Word-based conditions
The exception
Using categories
The features directive
Wildcards and positioning
1. Wildcard
2. Ditto-mark
3. Greedy-ditto-mark
4. Anythings-mark
5. Quantifier
6. Positioner
Insertion and deletion
Advanced sound-change
1. Blocker
2. Metathesis
Logic blocks
1. If block
2. Chance block
3. Rule macro
Cluster-field
Engine
Sound-change character escape

1 About Lexiguru

This is the complete documentation for Lexiguru version b2.0.1

Lexiguru is an online application that randomly generates words from a given definition of graphemes, frequencies and word patterns. Applications like Lexiguru are called "word generators" or "vocabulary generators".

You can use it to make words for a constructed language, to get an original nickname or password, or just for fun.

2 Interface

Use the Examples dropdown button to load one of a number of example definitions into the file editor
The phonology definition file editor is the main input. It defines the phonology and the word shapes you get from the word generator. It starts out containing either a default phonology definition or the definition you last generated words with
Use the Generate button to see Lexiguru produce words
Use the Copy button to copy the words to your clipboard

2.1 Options

Use the Number of words textbox to choose the number of words to generate. The default number is 100
Word-list mode will produce a list of words
Paragraph mode will produce words that look vaguely like sentences by injecting punctuation into the word list and capitalising the first word of each sentence
Debug mode will show, line by line, each step in creating each word
Editor wrap lines will make the file editor wrap a line onto the next line if it exceeds the width of the file editor
Remove duplicates will make sure all words generated are unique
Force words will force the generator to keep trying, for up to 30 seconds, to generate the complete number of words requested, regardless of how many words are rejected or removed as duplicates
Sort words and Capitalise words should be self-explanatory
The Word divider textbox sets the delimiter, or in other words, what the content will be between each word in the output. It is a space ( ) by default. Use \n to get one word for each line

2.2 File save / load

Use the Save button to download your phonology definition as a file called 'lexiguru.txt', or under the name you entered in the File name: field. The file is always a ".txt" type
Use the Load button to load a file on your system into the file editor

3 Using comments

If a line contains a semicolon ; everything after it on that line is ignored and not interpreted as Lexiguru syntax -- unless ; is escaped. You can use this to leave notes about what something does or why you made certain decisions.

4 Categories

Note: Lexiguru uses a concept called a 'grapheme'. A grapheme is an indivisible character, or sequence of characters, that makes up part of a generated word in Lexiguru. So you may think of a phoneme as a type of grapheme. Taking the English words shy and sky as examples, shy would be made up of the digraph sh + the grapheme y, while sky would be made up of the sequence of graphemes s + k + y.

A category is a set of graphemes with a name, usually a single character. For example:

C = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
F = n, l, ꞌ, t, k, r, p
V = a, i, e, u, o

This creates three groups of graphemes. C is the group of all consonants, V is the group of all vowels, and F is a group of some of the consonants.

By default, the graphemes' frequencies decrease as they go to the right, according to the Gusein-Zade distribution. In the above example, when Lexiguru needs to choose a V, it will choose a the most at 43%, i the second-most at 26%, e the third-most at 17%, u the fourth-most at 10%, and o the fifth most at 4%.
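
The percentages above can be reproduced with a short Python sketch. It assumes the usual form of the Gusein-Zade distribution, in which the item at rank r out of n gets a weight proportional to ln((n + 1) / r). The function name gusein_zade is hypothetical, and this is not necessarily how Lexiguru computes the frequencies internally:

```python
import math

def gusein_zade(n):
    """Return normalised Gusein-Zade probabilities for ranks 1..n.

    Assumption: the weight of rank r is ln((n + 1) / r), then normalised
    so that all weights sum to 1.
    """
    weights = [math.log((n + 1) / r) for r in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

vowels = ["a", "i", "e", "u", "o"]
for grapheme, p in zip(vowels, gusein_zade(len(vowels))):
    print(f"{grapheme}: {p:.0%}")
# a: 43%
# i: 26%
# e: 17%
# u: 10%
# o: 4%
```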

In the previous example, the graphemes were separated by commas. Alternatively, you can separate options with spaces:

C = t n k m ch l ꞌ s r d h w b y p g
F = n l ꞌ t k r p
V = a i e u o

You may not use both commas and spaces as separators on the same line, e.g. "a b, c".

There are two advantages to using commas over spaces. Firstly, they make it clearer what separates options -- in the above example things look very simple, but they can get a lot more complicated. Secondly, commas make it possible to define a null / zero grapheme in a category. For example C = t, , k, p would be a category of three graphemes, and nothing. For these reasons, this document uses a comma followed by a space throughout.

4.1 Long-form category names

You can also give categories long names, but you will need to enclose them in curly brackets { and } when using them:

consonant = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
vowel = a, i, e, u, o
words: {consonant}{vowel}

4.2 Category-drop-off

You can modify the graphemes' frequencies using the category-drop-off directive. For example: category-drop-off: flat

zipfian has the fastest drop-off; grapheme frequencies will decrease as they go to the right, according to the Zipf distribution
gusein-zade approximates a natural frequency distribution of graphemes. As already stated, this is the default
flat will make all graphemes have an equal chance of being chosen

4.3 Categories inside categories and set-categories

You can use categories inside categories, as long as the referenced category has previously been defined. For example:

category-drop-off: flat
L = aa, ii, ee, oo
V = a, i, e, o, L

In the example above, V has a 20% chance of being a long vowel.
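
The 20% figure can be checked with a small Python sketch. It models only the flat drop-off and treats a referenced category as a single option that shares its probability equally among its members. The function name flat_probabilities is hypothetical, not part of Lexiguru:

```python
from fractions import Fraction

def flat_probabilities(items):
    """Flatten a nested list of graphemes into grapheme -> probability.

    A nested list stands for a referenced category (or set-category):
    it competes as one item, then splits its share among its members.
    """
    result = {}
    share = Fraction(1, len(items))
    for item in items:
        if isinstance(item, list):
            for grapheme, p in flat_probabilities(item).items():
                result[grapheme] = result.get(grapheme, 0) + share * p
        else:
            result[item] = result.get(item, 0) + share
    return result

L = ["aa", "ii", "ee", "oo"]
V = ["a", "i", "e", "o", L]

probs = flat_probabilities(V)
long_vowel_chance = sum(probs[g] for g in L)
print(long_vowel_chance)  # 1/5
print(probs["aa"])        # 1/20
```

Each short vowel keeps a 20% share, while each individual long vowel ends up at 5%.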

You can also enclose a set of graphemes in square brackets [ and ]. This is called a "set-category". This set will be treated as if it were a reference to a category in terms of frequency. For example, we could write the same example as this:

category-drop-off: flat
V = a, i, e, o, [aa, ii, ee, oo]

You can assign weights to categories inside categories and to "Set-categories".

Categories inside categories and "Set-categories" CANNOT be a part of any sequence. For example, C = Xz or C = x[c, d] or C = [a, b][c, d] will not give the results you might want. To get sequence-like behaviour like that, you will need to use Segments.

5 Assigning weights

If you want to set your own frequency for graphemes in a Category, items in a Pick-one, Optional, or Inter-pick-one set, or word-shapes in the words directive, you can use a colon : to specify the weight for each item, like so:

V = a:5, e:4, i:3, o:2, u:1
$S = [V:8 x:2]
words: $S:2 y

V has approximately the following probabilities: a: 33%, e: 27%, i: 20%, o: 13%, u: 7%. The Pick-one set in the $S segment has an 80% chance of being the V category over the x grapheme. And the first word-shape in the words: directive has twice the chance of being chosen over the next word-shape.

As the example above shows, giving any option in a set a weight overrides the drop-off frequencies for that set. Note also that any option you have not given a weight receives a default weight of 1.
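
As a rough illustration of how weights turn into probabilities, here is a Python sketch that divides each weight by the total. The helper names parse_weighted and weights_to_percentages are hypothetical, and the parser only handles a flat, comma-separated list:

```python
def parse_weighted(items):
    """Parse 'a:5, e:4, x' into {'a': 5.0, 'e': 4.0, 'x': 1.0}."""
    parsed = {}
    for item in items.split(","):
        name, _, weight = item.strip().partition(":")
        # Options without an explicit weight get the default weight of 1.
        parsed[name] = float(weight) if weight else 1.0
    return parsed

def weights_to_percentages(weights):
    """Each option's chance is its weight divided by the sum of all weights."""
    total = sum(weights.values())
    return {item: 100 * w / total for item, w in weights.items()}

V = parse_weighted("a:5, e:4, i:3, o:2, u:1")
for grapheme, pct in weights_to_percentages(V).items():
    print(f"{grapheme}: {pct:.0f}%")
# a: 33%
# e: 27%
# i: 20%
# o: 13%
# u: 7%
```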

6 Building words

6.1 Words

The words: directive defines a set of "Word-shapes" that Lexiguru will choose from to create words. A word-shape can consist of individual graphemes, Categories, Segments or a mixture of these.

By default, words are selected using the Zipf distribution. The first word-shape will be chosen the most often, then the second word shape the second most often and so on. Below is a very simple example that will generate words with one to three CV syllables:

C = t, n, k, m, l, s, r, d, h, w, b, j, p, g
V = a, i, o, e, u
words: CV, CVCV, CVCVCV

6.1.1 Word-drop-off

The word-drop-off directive modifies how quickly the word-shapes' frequencies decrease as they go to the right, unless they have weights. The options are zipfian, gusein-zade, and flat. The default is zipfian.

You probably don't want to think about using this directive or giving word-shapes weights -- it is an uphill battle. For example, if you chose to remove duplicates in the above example, it is already removing one syllable words the most often. And if you have paragraph mode turned on, you would want simple syllables to occur very often. So it is better to rearrange the word-shapes in the words directive to get good-looking results. Nevertheless, maybe you want to use a flat distribution because you are only generating CVCV syllables of different types, or generating something that doesn't play by the rules.

6.2 Segments

Segments are a system that provides an abbreviation of parts of a word-shape. Typically you would use it to define syllable-shapes. Segments are defined similarly to Categories, but with several important differences:

Every Segment's name starts with $. S = s is a Category; $S = s is a Segment.
Segments are not sets like categories are. $M = a, b, c will not work. You would need to use a "Pick-one" set, e.g. $M = [a, b, c]
Segments have an effect on the logic behind Inter-pick-ones. In this sense, Segments are not just abbreviations.

For example, you could write the previous words example like so:

$S = CV
words: $S $S$S $S$S$S

6.3 Pick-one set

A Pick-one set is a group of graphemes and Categories separated by spaces or commas, enclosed in square brackets [ and ]. Lexiguru will pick an option from that Pick-one just like it would from a Category. For example:

V = a, u
words: t[V, x]

This will produce either ta, tu or tx.

Pick-one sets can be nested inside each other.

Anything inside the Pick-one can be assigned a weight, and a Pick-one itself can be assigned a weight as well if it is nested inside another set:

words: [a:1, b:2, [c, d]:2]

6.4 Optional set

An Optional set, written with round brackets ( and ), works the same way as a Pick-one. The only difference is that its contents may or may not appear in the word. By default, the contents appear 10% of the time.

words: ta(n, t, l)

In the above example, there is a 10% chance of getting one of tan, tat or tal, but a 90% chance of ta.

6.4.1 Optional weight

This default probability can be modified in two ways. The first is by attaching a percentage-based weight following a ? inside the Optional set:

$S = ta(n, t, l ?30)
words: $S

Now there is a 30% chance of getting one of tan, tat or tal.

The other way to change this probability is through the optional-rate: directive. This directive specifies how often an Optional set is selected. This number is a percentage and as previously stated the default is 10%. For example:

optional-rate: 20

You can write this number with a percentage sign on the end if you want to.

6.5 Inter-pick-one set

An Inter-pick-one, written with less-than and greater-than signs < and >, works the same as a Pick-one. The difference is that only one Inter-pick-one set will be chosen for that Segment or word-shape.

Inter-pick-one is a feature designed to help generate words with stress or pitch accent systems. Here is an example where it is used for a stress system:

C = t
V = a
$X = (<'>CV)<'>CV
words: $X

This produces any of the following words: 'ta, ta'ta, 'tata. Notice here that ta is not possible -- an Inter-pick-one set is only chosen after any sets that enclose Inter-pick-one sets have been dealt with.

There are a few restrictions and peculiarities to it. Most notably, Inter-pick-ones may not be nested inside each other. Let's look at another example:

category-drop-off: flat
words: <a, b><x>

The above example is rather silly, as there is nothing between each Inter-pick-one, defeating its whole purpose. However, it is useful here in showing that it is equivalent to the example below, which uses Pick-ones instead.

category-drop-off: flat
words: [[a, b][x]]

In both of the above examples, there is a 25% chance of producing a, a 25% chance of b, and a 50% chance of producing x.

6.5.1 Inter-pick-one weight

Inter-pick-one weights begin with an @ inside the set. The number of the weight behaves like colon weights rather than percentage-based weights. Let's look at a scenario and break it down:

category-drop-off: flat
$Y = <a:2, b:1 @3><c>
words: <$Y @8>-<d @2>

In the Segment $Y, the set containing a and b has three times the chance of being chosen over c, while a has a weight that makes it twice as probable as b. In the words: directive, there is one word-shape, and that word-shape has an 80% chance of being the Segment $Y followed by -, and a 20% chance of being -d.
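
To make the arithmetic for $Y concrete, the Python sketch below multiplies each set's share by each item's share within the set. It assumes that a set without an @ weight (here the set containing c) gets a default weight of 1, in line with the three-to-one ratio described above. The function name is hypothetical:

```python
from fractions import Fraction

def inter_pick_one_probabilities(sets):
    """sets is a list of (set weight, {item: item weight}) pairs."""
    total_set_weight = sum(weight for weight, _ in sets)
    probabilities = {}
    for weight, items in sets:
        set_share = Fraction(weight, total_set_weight)
        total_item_weight = sum(items.values())
        for item, item_weight in items.items():
            probabilities[item] = set_share * Fraction(item_weight, total_item_weight)
    return probabilities

# $Y = <a:2, b:1 @3><c>
# Assumption: the unweighted set containing c has weight 1.
segment_y = [
    (3, {"a": 2, "b": 1}),
    (1, {"c": 1}),
]
result = inter_pick_one_probabilities(segment_y)
for item, p in result.items():
    print(item, p)
# a 1/2
# b 1/4
# c 1/4
```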

See the "Romance-like" example for a language that uses Inter-pick-one for its stress system, or the "BTX" example for a language that uses it for a complex pitch accent system.

7 Alphabetisation and graphemes

The alphabet:, graphs:, alphabet-and-graphs: and invisible-alphabet: directives can be an important part of your phonology definition file. Let's go over their uses.

7.1 Alphabetisation

The alphabet: directive gives Lexiguru a custom alphabetisation order for words, used when the Sort words checkbox is selected.

alphabet: a, b, c, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y

This would order generated words like so: cat chat cumin frog tray t'a
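
One way to picture custom alphabetisation is as a sort key made of alphabet positions, with multi-character letters such as t' matched before single characters. The Python sketch below is only an illustration: the function name sort_key is hypothetical, and placing characters that are missing from the alphabet last is an assumption, not documented Lexiguru behaviour:

```python
ALPHABET = ["a", "b", "c", "e", "f", "h", "i", "k", "l", "m", "n",
            "o", "p", "p'", "r", "s", "t", "t'", "y"]

def sort_key(word, alphabet=ALPHABET):
    """Turn a word into a list of alphabet positions, longest match first."""
    by_length = sorted(alphabet, key=len, reverse=True)
    key = []
    i = 0
    while i < len(word):
        for letter in by_length:
            if word.startswith(letter, i):
                key.append(alphabet.index(letter))
                i += len(letter)
                break
        else:
            # Assumption: characters missing from the alphabet sort last.
            key.append(len(alphabet))
            i += 1
    return key

words = ["t'a", "tray", "chat", "cat"]
print(sorted(words, key=sort_key))  # ['cat', 'chat', 'tray', "t'a"]
```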

7.2 Defining graphs

The graphs: directive tells Lexiguru which (multi)graphs, including character + combining diacritics, are to be treated as grapheme units when using sound-changes.

graphs: a, b, c, ch, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y

In the above example, we defined ch as a grapheme. This would stop a sound change such as c -> g changing the word chat into ghat, but it will make cobra change into gobra.
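
The effect of declaring ch as a unit can be sketched in Python by splitting each word into grapheme units (longest match first) before applying the change. The helper names tokenise and change are hypothetical, and the sketch covers only an unconditional single-grapheme change:

```python
GRAPHS = ["a", "b", "c", "ch", "e", "f", "h", "i", "k", "l", "m", "n",
          "o", "p", "p'", "r", "s", "t", "t'", "y"]

def tokenise(word, graphs=GRAPHS):
    """Split a word into grapheme units, preferring the longest graph."""
    by_length = sorted(graphs, key=len, reverse=True)
    units = []
    i = 0
    while i < len(word):
        for graph in by_length:
            if word.startswith(graph, i):
                units.append(graph)
                i += len(graph)
                break
        else:
            units.append(word[i])  # undeclared characters stand alone
            i += 1
    return units

def change(word, before, after):
    """Replace every unit equal to `before` with `after`."""
    return "".join(after if unit == before else unit for unit in tokenise(word))

print(change("chat", "c", "g"))   # chat
print(change("cobra", "c", "g"))  # gobra
```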

"But my list of graphemes is the same as my list of alphabeticalising letters, I don't want to list them twice", you might exclaim. Well, you can create an alphabetisation order and list your graphemes in one line using the alphabet-and-graphs: directive.

7.2.1 Alternative graphemes

The graphs: directive can tell Lexiguru which character + combining diacritic sequences are to be treated as alternatives of a base grapheme. Let's call these alternatives the 'children' and the base grapheme the 'parent'. You can do this by enclosing the 'children' in <[ and ] as a set, directly after their 'parent'.

Important: The left-most precomposed character of a 'child' must be the same as its 'parent'.

This should be useful for tonal languages that mark tone with diacritics on vowels. In these tonal languages, we no longer need to list every variation of a vowel + diacritic to target a vowel:

  graphs: a, <[á, à, ā, ǎ], h, i, <[í, ì, ī, ǐ], k, l, m, n, o, <[ó, ò, ō, ǒ], t
  a -> e
; mápǎ ==> mépě

However we can still target a vowel with a tone mark, such as ǎ:

  ǎ -> e
; mápǎ ==> mápě

7.3 Invisibility

Sometimes you will want characters, such as syllable dividers, to be invisible to alphabetisation. You can do this by listing these characters in the invisible-alphabet: directive.

invisible-alphabet: ., ˈ

This would order the words ˈpa.ta ˈca.ta za.ˈta ca.ˈa as ca.ˈa, ˈca.ta, ˈpa.ta, za.ˈta

8 Word creation character escape

Characters enclosed in a set of double quotes ignore any meaning they might have had in the generator, including double quotes themselves. This way, anything including capital letters that have already been defined as categories, brackets, even spaces, can be generated.

These are the characters you must escape if you want to use them in Categories, Segments, the words directive, or the graphs directive:

Characters	Meaning
`;`	Comment
`C`, `D`, `Ḱ`, ...	Any one-length character can refer to a category
`{`, `}`	References a long-form category name
`,`	Separates choices
` `	Space, separates choices. An alternative to commas
`$`	Defines a Segment
`:`	Gives weight to a grapheme, Segment, set, or word-shape
`?`	Gives probability of an Optional set being chosen
`@`	Gives weight of an Inter-pick-one set being chosen over others
`[`, `]`	Pick-one set
`(`, `)`	Optional set
`<`, `>`	Inter-pick-one set
`"`	Escapes characters enclosed in them

9 Sound change

Once words are generated, you might want to modify them to prevent certain sequences, outright reject certain words, or simulate historical sound changes. This is the purpose of the sound-change block, which implements the NASC program.

All sound changes must be used inside this block. To terminate a block you must have an END line. However, all unterminated blocks are automatically terminated at the end of the file:

BEGIN soundchange:
; Your rules go here
END

The format of all parts of a NASC rule can be summarised as CHANGE / CONDITION ! EXCEPTION | FLAG

Every rule begins on a new line and must contain a CHANGE. The CONDITION, EXCEPTION or FLAG parts are optional.

10 The change

The format of the change can be expressed as BEFORE -> AFTER.

BEFORE specifies which part of the word is being changed
It is followed by a space and the > character. > can be swapped with any of ->, =>, ⇒ or → if you prefer
AFTER is what BEFORE changes into, or in other words, its replacement

Let's look at a simple unconditional rule:

; Replace every /o/ with /x/
  o -> x
; bodido ==> bxdidx

In this rule, we see every instance of o become x.

10.1 Concurrent set

A concurrent set in a change is achieved by listing multiple graphemes in BEFORE separated by commas, and listing the same number of resultant graphemes in AFTER separated by commas. The changes in a concurrent set execute at the same time.

; Switch /o/ and /a/ around
  o, a -> a, o
; boda ==> bado

Notice that the above example is different to the example below:

  o -> a
  a -> o
; boda ==> bodo

where each change is on its own line. We can see o merge with a, then a becomes o.
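
The difference between the two versions can be sketched in Python, assuming single-character graphemes. The concurrent version looks up every grapheme in one pass, while the sequential version rewrites the whole word once per rule. Both function names are hypothetical:

```python
def concurrent(word, mapping):
    """Apply all replacements at the same time, one grapheme at a time."""
    return "".join(mapping.get(ch, ch) for ch in word)

def sequential(word, rules):
    """Apply each rule to the result of the previous one."""
    for before, after in rules:
        word = word.replace(before, after)
    return word

print(concurrent("boda", {"o": "a", "a": "o"}))      # bado
print(sequential("boda", [("o", "a"), ("a", "o")]))  # bodo
```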

10.2 Merging set

A merging change is accomplished by placing graphemes enclosed in square brackets in BEFORE, with a corresponding singular grapheme in AFTER that the graphemes in the set will merge into:

; Three graphemes becoming two graphemes
  [ʃ, z], dz -> s, d
; zeʃadzas ==> sesadas

10.3 Optional set

Items in an Optional set can be targeted whether or not they appear as part of a grapheme or sequence of graphemes:

; Merge /x/ and /xw/ into /k/
  x(w) -> k
; xwaxaħa ==> kakaħa

An Optional set can also attach to a concurrent or merging set:

; Merge /x/, /xw/, /ħ/ and /ħw/ into /k/
  [x, ħ](w) -> k
; xwaxaħa ==> kakaka

Looking at the above example, let's say you wanted to preserve this optional /w/ following /x/ or /ħ/. We can do this by writing this /w/ in AFTER, enclosed by round brackets:

; Like the previous rule, but preserve labialisation
  [x, ħ](w) -> k(w)
; xwaxaħa ==> kwakaka

The Optional set itself can also be a merging or concurrent set:

; Like the previous rule, but preserve palatalisation and labialisation
  [x ħ](w, j) -> k(w, j)
; xwaxjaxa ==> kwakjaka

10.4 Reject

To remove, or in other words, reject a word from your list of generated words, you use the ^REJECT keyword in AFTER:

a, bi -> ^REJECT

In the above example, any word that contains a or bi will be rejected.

11 The condition

Conditions follow The change and are placed after a forward slash. The condition may also be called the environment.

The format of a condition is / PRE_POST

PRE is everything before the target
The underscore _ is a reference to the target
POST is everything after the target

For example:

; Change /o/ into /x/ only when it is between /p/s
  o -> x / p_p
; opoptot ==> opxptot

11.1 Multiple conditions in one rule

Multiple conditions for a single rule can be made by separating each condition with additional forward slashes. The change will happen if it meets either, or both of the conditions:

; Change /o/ into /x/ only when it is between /p/s or /t/s
  o -> x / p_p / t_t
; opoptot ==> opxptxt
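
For readers who know regular expressions, a condition behaves much like a lookbehind (PRE) plus a lookahead (POST) around the target, and several conditions behave like alternatives. The Python sketch below handles plain graphemes only. The function apply_rule is hypothetical and is not how NASC is implemented:

```python
import re

def apply_rule(word, before, after, conditions):
    """conditions is a list of (pre, post) pairs; any one of them may match."""
    alternatives = [
        f"(?<={re.escape(pre)}){re.escape(before)}(?={re.escape(post)})"
        for pre, post in conditions
    ]
    # A function is used as the replacement so that `after` is taken literally.
    return re.sub("|".join(alternatives), lambda _match: after, word)

print(apply_rule("opoptot", "o", "x", [("p", "p")]))              # opxptot
print(apply_rule("opoptot", "o", "x", [("p", "p"), ("t", "t")]))  # opxptxt
```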

11.2 Optional and concurrent sets

Optional and concurrent sets can be used in conditions:

  o -> x / k(w)_[p, s]
; kwop-po-kos-po ==> kwxp-po-kxs-po

11.3 Word boundary

# matches a word boundary: the beginning of the word if it is in PRE, or the end of the word if it is in POST:

  o -> x / p_p#
; opoppop ==> opoppxp

11.4 Syllable boundary

$ matches to syllable boundaries. A syllable boundary is either the beginning or end of the word, or any of the symbols defined in the syllable-boundary: directive.

For example:

  syllable-boundary: .
  t$t -> d$d
; at.ta ==> ad.da

11.5 Word-based condition

If we want to execute a sound change only on certain words, we simply write those words as a list in a condition, without any underscores:

sw -> s / _o / swore, sworn

In the above example, the sound change will only execute if the word is swore or sworn

12 The exception

Exceptions are placed following a ! and go after the condition, if there is one. Exceptions work as the opposite of conditions -- a change will not execute wherever the exception matches:

sw -> s / _o ! swore, sworn

In the above example, the sound change will not execute if the word is swore or sworn

13 Using categories

You can reference categories in sound-changes by enclosing a category name in curly brackets { and }. The category will behave in the same way as a concurrent or merging set:

  B = x, y, z
  BEGIN soundchange:
  {B} -> ^
; xapay ==> apa

14 The features directive

Let's say you had a grapheme that was a phoneme such as /i/ and wanted to target it by its distinctive features as a vowel, +high and +front, and turn it into a phoneme with +high and +back features, i.e. /ɯ/. The features: directive block lets you do this:

BEGIN features:
  -voice = p, t, k, f, s
  +voice = b, d, g, v, z
END

  {-voice} -> {+voice}
; tamefa ==> dameva

The very simple example above changes all voiceless phonemes that have a voiced counterpart into their voiced counterparts. To explain accurately how this directive works, some nomenclature must be introduced:

+voice is a feature, and references a series of graphemes.
-voice is an antipode, and it is the antipode counterpart of the voice feature.
In the sound-change, in BEFORE, the -voice feature is being targeted inside a matrix using curly brackets { and }. A matrix can contain multiple features to narrow down the graphemes being targeted. For example {+high, +back} might target the graphemes u, ɯ
In the sound-change, in AFTER, {+voice} has a symmetrical one-to-one correspondence of graphemes with its counterpart in BEFORE
Let's quickly imagine a scenario where the only +voice grapheme was b. The result will be a merging of all -voice graphemes into b: tamepfa ==> bamebba. Similarly, in a different scenario where the only -voice grapheme was p, p would become the first grapheme in +voice, which happens to be b: tamepfa ==> tamebfa
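
The one-to-one correspondence, and the two edge cases just described, can be sketched in Python by pairing graphemes by position. Falling back to the last grapheme of AFTER when it is the shorter list is an assumption chosen to reproduce the merging behaviour. The function change_feature is hypothetical, and single-character graphemes are assumed:

```python
def change_feature(word, before, after):
    """Map each grapheme in `before` to the grapheme at the same index in `after`.

    Assumption: if `after` is shorter, the remaining graphemes map to its
    last member, which gives the merging described in the text.
    """
    mapping = {
        grapheme: after[min(i, len(after) - 1)]
        for i, grapheme in enumerate(before)
    }
    return "".join(mapping.get(ch, ch) for ch in word)

minus_voice = ["p", "t", "k", "f", "s"]
plus_voice = ["b", "d", "g", "v", "z"]

print(change_feature("tamefa", minus_voice, plus_voice))  # dameva
print(change_feature("tamepfa", minus_voice, ["b"]))      # bamebba
print(change_feature("tamepfa", ["p"], plus_voice))       # tamebfa
```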

Feature-pool

Feature-pools do two things at once. The graphemes that belong to a feature-pool are the graphemes listed in the features inside that feature-pool. The antipode graphemes of a feature-pool are the graphemes listed in the graphs: directive that do not appear in that feature-pool.

Features inside a feature-pool without a defined antipode will be given one. The antipode's graphemes are the graphemes found in the feature-pool but not in the feature.

Here is an example of a comprehensive set of vowel features:

graphs: a, e, i, o, u, m, n, p, b, t, d, k, g, f, v, s, z, h, l, r, j
BEGIN features:
  BEGIN feature-pool vowel:
    high = i, u
    mid = e, o
    low = a
    front = i, e
    back = o, u, a
    +foo = i, u
    -foo = e, o
  END
END

Here are some matrices of features and what graphemes they would capture:

{+vowel} would target the graphemes a, e, i, o, u
{-vowel} would target the graphemes m, n, p, b, t, d, k, g, f, v, s, z, h, l, r, j
{+high} would target the graphemes i, u
{-high} would target the graphemes a, e, o
{-foo} would target the graphemes e, o
{+high, +front} would target the grapheme i
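
The way antipodes are derived for a feature-pool can be sketched in Python. The names build_features and matrix are hypothetical, and treating an unsigned feature name such as high as +high is an assumption based on the matrices listed above:

```python
GRAPHS = ["a", "e", "i", "o", "u", "m", "n", "p", "b", "t", "d",
          "k", "g", "f", "v", "s", "z", "h", "l", "r", "j"]

VOWEL_POOL = {
    "high": ["i", "u"], "mid": ["e", "o"], "low": ["a"],
    "front": ["i", "e"], "back": ["o", "u", "a"],
    "+foo": ["i", "u"], "-foo": ["e", "o"],
}

def build_features(graphs, pool_name, pool):
    features = {}
    for name, graphemes in pool.items():
        signed = name if name[0] in "+-" else "+" + name
        features[signed] = list(graphemes)
    members = {g for graphemes in pool.values() for g in graphemes}
    # The pool is itself a feature; its antipode is every other declared graph.
    features["+" + pool_name] = [g for g in graphs if g in members]
    features["-" + pool_name] = [g for g in graphs if g not in members]
    # Features without an explicit antipode get the rest of the pool.
    for name in list(features):
        antipode = "-" + name[1:]
        if name.startswith("+") and antipode not in features:
            features[antipode] = [
                g for g in features["+" + pool_name] if g not in features[name]
            ]
    return features

def matrix(features, *names):
    """Graphemes that carry every feature listed in the matrix."""
    first, *rest = names
    return [g for g in features[first] if all(g in features[n] for n in rest)]

features = build_features(GRAPHS, "vowel", VOWEL_POOL)
print(matrix(features, "-high"))            # ['a', 'e', 'o']
print(matrix(features, "+high", "+front"))  # ['i']
```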

15 Wildcards and positioning

15.1 Wildcard

The wildcard * will match once to any character, or multigraph defined in the graphs: directive. The wildcard does not match word boundaries, and it cannot be used in AFTER:

  a -> e / _*#
; apappap ==> apappep

15.2 Ditto-mark

The ditto-mark < will match once to the grapheme, or grapheme in a set, category, or feature, to the left of it:

  a< -> a
; aata ==> ata

15.3 Greedy-ditto-mark

The greedy-ditto-mark + will match as many times as possible to the grapheme, or grapheme in a set, category, or feature, to the left of it:

  a+ -> a
; raraaaaa ==> rara

15.4 Anythings-mark

The anythings-mark is the ellipsis character … (U+2026). It will match any character, or multigraph defined in the graphs: directive, as many times as possible. However, it will stop matching once the grapheme to the right of the anythings-mark is matched:

  b…t -> x
; babãittati ==> xtati

As we can see, the rule matched b followed by anything else until it reached t, then stopped matching. The example below uses the anythings-mark in the condition:

; Simulate spreading of nasality to vowels
  [a i u] -> [ã ĩ ũ] / [ã ĩ ũ](…)_
; babãittati ==> babãĩttãtĩ
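
If you are used to regular expressions, the three simpler marks from sections 15.2 to 15.4 have rough analogues. The Python sketch below reproduces only those unconditional examples. The mapping to regex syntax is an approximation made for illustration, not a description of how NASC works internally:

```python
import re

# Ditto-mark:  a< -> a   (the grapheme followed by one copy of itself)
print(re.sub(r"aa", "a", "aata"))            # ata

# Greedy-ditto-mark:  a+ -> a   (the grapheme followed by as many copies as possible)
print(re.sub(r"aa+", "a", "raraaaaa"))       # rara

# Anythings-mark:  b…t -> x   (anything, stopping at the first following t)
print(re.sub(r"b.+?t", "x", "babãittati"))   # xtati
```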

15.5 Quantifier

The quantifier =[N] matches the thing to its left exactly N times.

; Change /o/ into /x/ only when preceded by three /r/s
  o -> x / r=[3]_
; rrrorro ==> rrrxrro

The number in the quantifier can also be a list of numbers:

; Change /o/ into /x/ only when preceded by zero or four /r/s
  o -> x / r=[0, 4]_
; orrrrorro ==> xrrrrxrro

The number in the quantifier can also be a range. To do this, put a : between the lowest and highest numbers of the range:

; Change /o/ into /x/ only when preceded by two to four /r/s
  o -> x / r=[2:4]_
; rrrorro ==> rrrxrrx

Here is a useful lookup table on getting quantities of ditto-marks or wildcards:

	Wildcard	Ditto-mark
Exactly 1 of	`*`	`<`
0 or 1 of	`(*)`	`(<)`
1 or more of	`…`	`+`
0, 1, or more of	`(…)`	`(+)`
Specific number(s) of	`*=[N]`	`<=[N]`
Number range(s) of	`*=[N:N]`	`<=[N:N]`

15.6 Positioner

A positioner allows a grapheme to be captured only when it is the Nth occurrence in the word:

; Change the second /o/ in a word to /x/ after the second /s/
  o@[2] -> x / s@[2]_
; sososo ==> sosxso

If we want to match the last occurrence of a grapheme in a word, we use -1. For the second-last occurrence of a grapheme in a word, we use -2, and so forth:

; Change the last /o/ in a word to /x/
  o@[-1] -> x
; sososo ==> sososx
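
The counting behind a positioner can be sketched in Python. The function replace_nth is hypothetical and ignores conditions. In the first example above, the condition s@[2]_ happens to hold for the second /o/, so the simplified call gives the same result:

```python
def replace_nth(word, target, replacement, n):
    """Replace the n-th occurrence of target (1-based; negative counts from the end)."""
    positions = []
    start = word.find(target)
    while start != -1:
        positions.append(start)
        start = word.find(target, start + len(target))
    index = n - 1 if n > 0 else n
    if n == 0 or not -len(positions) <= index < len(positions):
        return word  # no such occurrence: leave the word unchanged
    at = positions[index]
    return word[:at] + replacement + word[at + len(target):]

print(replace_nth("sososo", "o", "x", 2))   # sosxso
print(replace_nth("sososo", "o", "x", -1))  # sososx
```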

16 Insertion and deletion

Insertion requires a condition to be present, and for the ^ to be present in BEFORE, representing nothing.

; insert /a/ in between /b/ and /t/
  ^ -> a / b_t
; bt ==> bat

Deletion happens when ^ is present in AFTER

; delete every /b/
  b -> ^
; bubda ==> uda

17 Advanced sound-changes

17.1 Blocker

A Blocker is designed to block greedy, spreading behaviours: when the blocker is matched, the change stops executing. For example, we might want the graphemes k or g to prevent the rightward spread of nasal vowels to non-nasal vowels:

  [a, i, u] -> [ã, ĩ, ũ] / [ã, ĩ, ũ]…~[k, g]_
; pabãdruliga ==> pabãdrũlĩga

17.2 Metathesis

Metathesis in NASC refers to the reordering of graphemes in a word. Metathesis in real-world diachronics is usually sporadic, but can be regular.

To make a rule a metathesis rule, use these symbols:

The ampersand & marks the content (if any) between the targets we want to reorder. You must use the same number of &s in BEFORE and AFTER
Numbers in AFTER refer to the targets. Reordering these numbers reorders the targets. It is possible to have up to nine.
Underscores _ in a condition or exception are references to the targets. Unlike a normal rule, we can have multiple.

Local metathesis

A typical type of metathesis is local two-place metathesis:

; An intervocalic stop + nasal sequence becomes nasal + stop
  {stop}&{nasal} -> 2&1 / {V}__{V}
; watna ==> wanta

Long-distance metathesis

The example below approximates metathesis that occurred in Spanish:

r&l -> 2&1 / _(…){plosive}_
; parabla ==> palabra

One-place metathesis

To simulate one-place metathesis, move the &s in AFTER.

The example below is metathesis where words beginning with stop + vowel will try and move an r in a stop + r cluster to form a word initial stop + r cluster:

{stop}&r -> 12& / #_{vowel}…{stop}_ 
; kabatros ==> krabatos

Metathesis madness

Three or more sounds, to a maximum of 9, switching places, are possible, with shuffling of any &:

  x&y&z -> &&321
; xaayooz ==> aaoozyx
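
The role of the digits and the &s in AFTER can be sketched in Python. The sketch assumes the targets and the in-between content have already been matched, and only shows how the AFTER template reassembles them. The function metathesise is hypothetical:

```python
def metathesise(targets, between, template):
    """Rebuild a matched span from an AFTER template such as '2&1' or '&&321'.

    targets  -- the matched targets, in their original order
    between  -- what each & stood for in BEFORE, in order
    template -- a digit picks a target (1-based); each & emits the next
                in-between string
    """
    gaps = iter(between)
    out = []
    for symbol in template:
        if symbol == "&":
            out.append(next(gaps))
        else:
            out.append(targets[int(symbol) - 1])
    return "".join(out)

print(metathesise(["t", "n"], [""], "2&1"))                 # nt      (watna ==> wanta)
print(metathesise(["k", "r"], ["abat"], "12&"))             # krabat  (kabatros ==> krabatos)
print(metathesise(["x", "y", "z"], ["aa", "oo"], "&&321"))  # aaoozyx
```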

18 Logic blocks

Logic blocks are a way of executing sound changes depending on a trigger event that we are listening for.

18.1 If block

Using an If block, you can make sound changes execute on a word if, or if not, other sound change(s) were applied to the word.

It should feel familiar to anyone who knows a bit about programming languages.

BEGIN if: starts the If block. The sound changes placed here are listened to, and trigger other sound changes on the word depending on whether or not they applied to that word
then: is where you put sound changes that will execute if the sound changes in if: did apply
else: is where you put sound changes that will execute if the sound changes in if: did not apply
END is the end of the block

For example:

BEGIN if:
  ; Deletion of schwa before r
  ə -> ^ / _r
then:
  ; Then do metathesis of r and l
  r&l -> 2&1 / _&{plosive}_
else:
  ; Schwa becomes e if the first rule did not apply
  ə -> e
END

Note: The above example is actually quite bogus if it were a historical sound change. Sound change in natural diachronics has no memory. We can have "two-part" sound-changes such as this triggered metathesis, but a sound change executing on a word because another sound change did not apply to the word does not occur, at least not in real-life natural human languages.

18.2 Chance block

The chance block is a way to apply sound-change depending on percentage-based chance:

BEGIN chance 15:
  a -> e
END

In the above example we have a 15% chance of words with an a in them such as pa becoming pe

18.3 Rule macro

A rule macro saves rules to be used later in the file as many times as needed. The rules inside the def-rule-macro block do not run until they are invoked with do-rule-macro: as in the example below:

BEGIN def-rule-macro resyllabify:
  i -> j / _[a,e,o,u]
  u -> w / _[a,e,i,o]
END

  do-rule-macro: resyllabify
  ʔ -> ^
  do-rule-macro: resyllabify
; iaruʔitua ==> jaruʔitwa ==> jaruitwa ==> jarwitwa

In the above example we saved two rules as a macro under the name "resyllabify" and used that macro twice.
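
In programming terms, a rule macro is comparable to a function that is defined once and called wherever it is needed. The Python sketch below retraces the example using regular-expression lookaheads as stand-ins for the conditions. The names resyllabify and run are hypothetical:

```python
import re

def resyllabify(word):
    """The two rules saved in the macro, applied in order."""
    word = re.sub(r"i(?=[aeou])", "j", word)  # i -> j / _[a,e,o,u]
    word = re.sub(r"u(?=[aeio])", "w", word)  # u -> w / _[a,e,i,o]
    return word

def run(word):
    word = resyllabify(word)       # do-rule-macro: resyllabify
    word = word.replace("ʔ", "")   # ʔ -> ^
    word = resyllabify(word)       # do-rule-macro: resyllabify
    return word

print(run("iaruʔitua"))  # jarwitwa
```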

19 Cluster-field

Cluster-fields are a way to target and change sequences of graphemes. They are laid out like tables, and start with %. For example:

% a  i  u
a +  +  o
i -  +  uu
u -  -  +

The first grapheme is the row, and the second grapheme is the column. In this example, au becomes o and iu becomes uu. + means to leave the combination as-is, and - means to reject the word. This table would permit ai but reject ia.

Cluster-fields can also use ^ in them to remove a sequence.

As with filters, these are parsed in the order presented. The cluster-field ends at a blank line.
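
The table lookup can be pictured with a Python sketch. It assumes single-character graphemes, scans the word from left to right, and does not rescan a sequence it has just replaced. Those scanning details are assumptions, and the function apply_cluster_field is hypothetical:

```python
COLUMNS = ["a", "i", "u"]
TABLE = {
    "a": ["+", "+", "o"],
    "i": ["-", "+", "uu"],
    "u": ["-", "-", "+"],
}

def apply_cluster_field(word):
    """Return the changed word, or None if the word is rejected."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair[0] in TABLE and pair[1] in COLUMNS:
            cell = TABLE[pair[0]][COLUMNS.index(pair[1])]
            if cell == "-":
                return None          # reject the word
            if cell != "+":
                out.append(cell)     # replace the whole sequence
                i += 2
                continue
        out.append(word[i])          # "+" or no table entry: keep as-is
        i += 1
    return "".join(out)

print(apply_cluster_field("pau"))  # po
print(apply_cluster_field("piu"))  # puu
print(apply_cluster_field("pai"))  # pai
print(apply_cluster_field("pia"))  # None
```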

20 Engine

The engine statement provides useful functions that you can call at any point in the file. You can also call a list of these functions in one line, e.g. engine: compose, capitalise

decompose will break down all characters in a word into their "Unicode Normalization, Canonical Decomposition" form. For example, ñ as a single Unicode character, \u00F1, will be broken down into a sequence of two characters, n \u006E + ◌̃ \u0303. This corresponds to the JavaScript string method normalize("NFD")
compose does the opposite of decompose: it converts all characters in a word to the "Unicode Normalization, Canonical Decomposition followed by Canonical Composition" form. For example ñ as two characters \u006E\u0303, will be transformed into one character, \u00F1
capitalise will convert the first character to uppercase.
de-capitalise will convert the first character to lowercase.
to-upper-case will convert all characters to uppercase.
to-lower-case will convert all characters to lowercase.
xsampa_to_ipa will convert graphemes written in X-SAMPA into IPA
ipa_to_xsampa will convert graphemes written in IPA into X-SAMPA
unicode_entities will convert the HTML name of a Unicode character, written following a backslash \ instead of its usual ampersand, into that character. For example \Agrave will make À

21 Sound-change character escape

Characters	Meaning
`;`	Comment
`>`, `->`, `=>`, `⇒`, `→`	Indicates change
`,`	Separates choices
`[`, `]`	Set
`(`, `)`	Optional set
`^REJECT`	Rejects a word
`/`	Condition
`_`	The underscore `_` is a reference to the target
`#`	Word boundary
`$`	Syllable boundary
`!`	Exception
`{`, `}`	Category
`*`	Wildcard, matches exactly 1 of any character
`<`	Ditto-mark, matches exactly 1 of the previous character
`+`	Greedy-ditto-mark, matches 1 or more of the previous character
`…`	Anythings-mark, matches 1 or more of any character, equivalent to `*(+)`
`=[`, `]`	Quantifier
`@[`, `]`	Positioner
`^`	Insertion when in `BEFORE`, deletion when in `AFTER`
`~[`, `]`	Blocker
`&`	Indicates metathesis, and the reordered contents
`1`, `2`, ... `9`	In a Metathesis rule, in `AFTER`, these represent the changing graphemes
`"`	Escapes characters enclosed in them
