
Vocabug
documentation
Version 0.0.0
Contents
- About Vocabug
- Interface
- Using comments
- What graphemes are
- Categories
- Assigning weights
- Building words
- Alphabetisation and graphs
- Word creation character escape
- Transform
- The change
- The condition
- The exception
- Using categories
- The features directive
- Wildcards and positioning
- Insertion and deletion
- Advanced sound-change
- Logic blocks
- Cluster-field
- Engine
- Sound-change character escape
1About Vocabug
This is the complete documentation for Vocabug, version 0.0.0.
Vocabug is an online application that randomly generates vocabulary from a given definition of graphemes, frequencies and word patterns. You can use it to make words for a constructed language, to get an original nickname or password, or just for fun.
2Interface
- The textbox at the top of the program is the definition-build editor. A definition-build defines the graphemes, frequencies, word-shapes and sound changes that generate the final words. There will already be a default definition-build in the definition-build editor, or the previous definition-build that generated words
- Use the Generate button to see Vocabug produce words
- Use the Copy button to copy the words to the clipboard
2.1Options
- Use the Number of words textbox to choose the number of words to generate. The default number is 100
- Word-list mode will produce a list of words
- Paragraph mode will produce words that look vaguely like sentences by injecting punctuation into the word list and capitalising the first word of each sentence
- Debug mode will show, line by line, each step in creating each word
- Editor wrap lines will make the definition-build editor wrap a line onto the next line when it exceeds the width of the editor
- Remove duplicates will make sure all generated words are unique
- Force words will make the generator keep trying, for up to 30 seconds, to produce the full number of words requested, regardless of how many words are rejected or removed as duplicates
- Sort words and Capitalise words should be self-explanatory
- The Word divider textbox sets the delimiter, or in other words, the content placed between each word in the output. The default is a space. Use \n to get one word for each line
2.2File save / load
- Use the Save button to download the definition-build as a file called 'vocabug.txt', or whatever you named your file in the File name: field. The file is always a ".txt" type
- Use the Load button to load a file from your system into the definition-build editor
- Use the buttons in the Examples dropdown to load a number of example definition-builds into the definition-build editor
3Using comments
If a line contains a semicolon ; everything after it on that line is ignored and not interpreted as Vocabug syntax, unless the ; is escaped. You can use this to leave notes about what something does or why you made certain decisions.
4What graphemes are
Graphemes are the indivisible, meaningful units that make up a generated word in Vocabug. Phonemes can be thought of as graphemes. To illustrate with the English words sky and shy: sky is made up of the graphemes s + k + y, while shy is made up of sh + y.
If a word is built using the syntax character ^, that character will disappear in the generated word. In other words, ^ is a null grapheme. If you want to use ^ as a grapheme, you will need to escape it. Other syntax characters must also be escaped to be used as graphemes.
5Categories
A category is a set of graphemes with a name. The name is usually a single character, but can be long-form. For example:
C = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
F = n, l, ꞌ, t, k, r, p
V = a, i, e, u, o
This creates three groups of graphemes. C is the group of all consonants, V is the group of all vowels, and F is the group of some of the consonants that will be used syllable-finally.
These graphemes are separated by commas; an alternative is to use spaces: C = t n k m ch l ꞌ s r d h w b y p g. You may not mix commas and spaces as separators on the same line, as in A = a b, c.
By default, the graphemes' frequencies decrease as they go to the right, according to the Gusein-Zade distribution. In the above example, when Vocabug needs to choose a V, it will choose a the most at 43%, i the second-most at 26%, e the third-most at 17%, u the fourth-most at 10%, and o the fifth-most at 4%.
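The percentages quoted above can be reproduced with a short Python sketch. It assumes that the weight of the grapheme at rank r out of n is ln((n + 1) / r), normalised so the weights sum to 1. That form is an assumption chosen because it matches the figures given here, not a statement about how Vocabug is implemented, and the function names are hypothetical.

```python
import math

def gusein_zade_weights(n):
    # Assumed form: the weight of rank r (1-based) among n items is ln((n + 1) / r).
    return [math.log((n + 1) / r) for r in range(1, n + 1)]

def probabilities(weights):
    # Normalise weights so that they sum to 1.
    total = sum(weights)
    return [w / total for w in weights]

vowels = ["a", "i", "e", "u", "o"]
percentages = [round(100 * p) for p in probabilities(gusein_zade_weights(len(vowels)))]
print(dict(zip(vowels, percentages)))
# prints {'a': 43, 'i': 26, 'e': 17, 'u': 10, 'o': 4}
```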
5.1Long-form category names
You can also give categories long names, but you will need to enclose them in curly brackets { and } when using them:
consonant = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
vowel = a, i, e, u, o
words: {consonant}{vowel}
5.2Category-drop-off
This option modifies the default grapheme frequencies of categories. For example: category-drop-off: flat. There are three options:
- zipfian has the fastest drop-off: grapheme frequencies decrease quickly as they go to the right, according to the Zipf distribution
- gusein-zade approximates a natural frequency distribution of graphemes. As already stated, this is the default
- flat will make all graphemes have an equal chance of being chosen
5.3Categories inside categories and set-categories
You can use categories inside categories, as long as the referenced category has previously been defined. For example:
class-drop-off: flat
L = aa, ii, ee, oo
V = a, i, e, o, L
In the example above, V has a 20% chance of being a long vowel.
You can also enclose a set of graphemes in square brackets [ and ]. This is called a 'set-category'. The set is treated as if it were a reference to a category in terms of frequency. For example, we could write the same example as this:
class-drop-off: flat
V = a, i, e, o, [aa, ii, ee, oo]
Categories inside categories and set-categories can be assigned weights.
Categories inside categories and set-categories CANNOT be part of a sequence. For example, C = Xz or C = x[c, d] or C = [a, b][c, d] will not give the results you might want. To get sequence-like behaviour like that, you will need to use segments.
6Assigning weights
If you want to set your own frequency for graphemes in a category, items in a set-category, Pick-one, Optional, or Inter-pick-one set, or word-shapes in the words directive, you can use a colon : to specify the weight for each item, like so:
V = a:5, e:4, i:3, o:2, u:1
$S = [V:8 x:2]
words: $S:2 y
V has approximately the following probabilities: a: 33%, e: 27%, i: 20%, o: 13%, u: 7%. The Pick-one set in the $S segment has an 80% chance of being the V category rather than the x grapheme. And the first word-shape in the words: directive is twice as likely to be chosen as the next word-shape.
As the example above shows, once an option has a weight, the weights override any drop-off frequencies. Note also that any option you have not given a weight receives a default weight of 1.
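The arithmetic behind these percentages is a simple normalisation: each weight is divided by the sum of all weights in the same set. A minimal Python sketch, with a hypothetical helper name:

```python
def weight_probabilities(weights):
    # Divide each weight by the total so the results sum to 1.
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}

v = weight_probabilities({"a": 5, "e": 4, "i": 3, "o": 2, "u": 1})
v_percent = {g: round(100 * p) for g, p in v.items()}
print(v_percent)
# prints {'a': 33, 'e': 27, 'i': 20, 'o': 13, 'u': 7}

pick_one = weight_probabilities({"V": 8, "x": 2})
print(pick_one["V"])
# prints 0.8
```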
7Building words
7.1Words
The words: directive defines a set of 'word-shapes' that Vocabug will choose from to create words. A word-shape can consist of individual graphemes, categories, segments or a mixture of these.
By default, word-shapes are selected using the Zipf distribution. The first word-shape will be chosen the most often, the second word-shape the second-most often, and so on. Below is a very simple example that will generate words with one to three CV syllables:
C = t, n, k, m, l, s, r, d, h, w, b, j, p, g
V = a, i, o, e, u
words: CV, CVCV, CVCVCV
7.1.1Word-drop-off
This directive modifies how quickly the word-shapes' frequencies decrease as they go to the right, unless they have weights. The options are zipfian, gusein-zade, and flat. The default is zipfian.
It is better not to use this directive or give word-shapes weights; it is an uphill battle. For example, if you chose to remove duplicates in the above example, one-syllable words are already the ones being removed most often. And if you have paragraph mode turned on, you would want simple syllables to occur very often. So it is best to simply rearrange the word-shapes in the words directive to get good-looking results. Nevertheless, you might want a flat distribution because you are only generating CVCV words of different types, or generating something that doesn't play by the rules.
7.2Segments
Segments are a system that provides an abbreviation of parts of a word-shape. Typically you would use it to define the shape of a syllable. Segments are defined similarly to categories, but with several important differences:
- Every segment's name starts with $. S = s is a category; $S = s is a segment.
- Segments are not sets like categories are. $M = a, b, c will not work. You would need to use a "Pick-one" set, i.e: $M = [a, b, c]
- Segments have an effect on the logic behind Inter-pick-ones. In this sense, segments are not just abbreviations.
For example, you could write the last example like so:
$S = CV
words: $S $S$S $S$S$S
7.3Pick-one set
A Pick-one set is a group of graphemes and categories separated by spaces or commas, enclosed in square brackets [ and ]. Vocabug will pick an option from that Pick-one just like it would from a category. For example:
V = a, u
words: t[V, x]
This will produce either ta, tu or tx.
Pick-one sets can be nested inside each other.
Anything inside the Pick-one can be assigned a weight, and a Pick-one itself can be assigned a weight as well if it is nested inside another set:
words: [a:1, b:2, [c, d]:2]
7.4Optional set
An Optional set uses round brackets, ( and ), and works the same way as a Pick-one. The only difference is that its contents may or may not appear in the word. By default, there is a 10% chance that they appear.
words: ta(n, t, l)
In the above example, there is a 10% chance of getting one of tan, tat or tal, but a 90% chance of ta.
7.4.1Optional weight
This default probability can be modified in two ways. The first is by attaching a percentage-based weight following a ? inside the Optional set:
$S = ta(n, t, l ?30)
words: $S
Now there is a 30% chance of getting one of tan, tat or tal.
The other way to change this probability is through the optional-rate: directive. This directive specifies how often an Optional set is selected. The number is a percentage and, as previously stated, the default is 10%. For example:
optional-rate: 20
You can write this number with a percentage sign on the end if you want to.
7.5Inter-pick-one set
An Inter-pick-one, using less-than and greater-than signs < and >, works the same as Pick-one. The difference is that only one Inter-pick-one set will be chosen for that segment or word-shape.
Inter-pick-one is a feature designed to help generate words with stress or pitch accent systems. Here is an example where it is used for a stress system:
C = t
V = a
$X = (<'>CV)<'>CV
words: $X
This produces any of the following words: 'ta, ta'ta, 'tata, and never any words with a double '. Notice here that ta is not possible: an Inter-pick-one set is only chosen after any sets that Inter-pick-one sets are nested in have been dealt with.
There are a few restrictions and peculiarities to it. Most notably, Inter-pick-ones may not be nested inside each other. Let's look at another example:
class-drop-off: flat
words: <a, b><x>
The above example is rather silly, as there is nothing between each Inter-pick-one, defeating its whole purpose. However, it is useful here in showing that it is equivalent to the example below, which uses Pick-ones instead.
class-drop-off: flat
words: [[a, b],[x]]
In both of the above examples, there is a 25% chance of producing a, a 25% chance of b, and a 50% chance of producing x.
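The 25% / 25% / 50% split follows from choosing evenly at each level of nesting. The Python sketch below works this out for nested lists standing in for nested Pick-one sets. It assumes a flat drop-off at every level, as in the example, and the function name is hypothetical.

```python
from fractions import Fraction

def leaf_probabilities(options, share=Fraction(1)):
    # Every option at one level receives an equal part of the share
    # passed down from the level above (flat drop-off assumed).
    result = {}
    each = share / len(options)
    for option in options:
        if isinstance(option, list):
            for leaf, p in leaf_probabilities(option, each).items():
                result[leaf] = result.get(leaf, 0) + p
        else:
            result[option] = result.get(option, 0) + each
    return result

probs = leaf_probabilities([["a", "b"], ["x"]])
print({leaf: str(p) for leaf, p in probs.items()})
# prints {'a': '1/4', 'b': '1/4', 'x': '1/2'}
```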
7.5.1Inter-pick-one weight
Inter-pick-one weights begin with an @ inside the set. The number of the weight behaves like colon weights rather than percentage-based weights. Let's look at a scenario and break it down:
class-drop-off: flat
$Y = <a:2, b:1 @3><c>
words: <$Y @8>-<d @2>
In the segment $Y, a and b together are three times as likely to be chosen as c, while a has a weight that makes it twice as probable as b. In the words: directive, there is one word-shape, and that word-shape has an 80% chance of being the segment $Y followed by -, and a 20% chance of being -d.
See the "Romance-like" example for a language that uses Inter-pick-one for its stress system, or the "BTX" example for a language that uses it for a complex pitch accent system.
8Alphabetisation and graphemes
The alphabet:, graphs:, alphabet-graphs: and invisible-alphabet: directives can be important elements of your definition-build. Let's go over their uses.
8.1Alphabetisation
The alphabet directive gives Vocabug a custom alphabetisation order for words, when the sort words checkbox is selected.
alphabet: a, b, c, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y
This would order generated words like so: cat chat cumin frog tray t'a
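As a rough illustration of how a custom alphabet with multigraphs such as t' can drive sorting, here is a Python sketch. It splits each word into alphabet letters by longest match and sorts by each letter's position in the list. Where characters missing from the alphabet (such as u and g in the example words) should go is an assumption: here they sort after all listed letters. The function name is hypothetical.

```python
ALPHABET = ["a", "b", "c", "e", "f", "h", "i", "k", "l", "m",
            "n", "o", "p", "p'", "r", "s", "t", "t'", "y"]

def sort_key(word, alphabet=ALPHABET):
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest_first = sorted(alphabet, key=len, reverse=True)
    key, i = [], 0
    while i < len(word):
        for letter in longest_first:
            if word.startswith(letter, i):
                key.append((rank[letter], ""))
                i += len(letter)
                break
        else:
            # Assumption: unlisted characters sort after every listed letter.
            key.append((len(alphabet), word[i]))
            i += 1
    return key

words = ["tray", "t'a", "frog", "cumin", "chat", "cat"]
ordered = sorted(words, key=sort_key)
print(ordered)
# prints ['cat', 'chat', 'cumin', 'frog', 'tray', "t'a"]
```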
8.2Defining graphs
The graphs: directive tells Vocabug which (multi)graphs, including character + combining diacritics, are to be treated as grapheme units when using sound-changes.
graphs: a, b, c, ch, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y
In the above example, we defined ch as a grapheme. This stops a sound change such as c -> g from changing the word chat into ghat, while still changing cobra into gobra.
"But my list of graphemes is the same as my alphabetisation order, I don't want to list them twice", you might exclaim. Well, you can create an alphabetisation order and list your graphemes in one line using the alphabet-and-graphs: directive.
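The effect of declaring ch as a grapheme can be sketched in Python: a word is first split into grapheme units by longest match, and a change then compares whole units rather than single characters. This is only an illustration of the idea, with hypothetical function names, not Vocabug's own code.

```python
GRAPHS = ["a", "b", "c", "ch", "e", "f", "h", "i", "k", "l",
          "m", "n", "o", "p", "p'", "r", "s", "t", "t'", "y"]

def graphemise(word, graphs=GRAPHS):
    # Split a word into grapheme units, preferring the longest listed graph.
    longest_first = sorted(graphs, key=len, reverse=True)
    units, i = [], 0
    while i < len(word):
        match = next((g for g in longest_first if word.startswith(g, i)), word[i])
        units.append(match)
        i += len(match)
    return units

def change(word, before, after):
    # Replace only units that are exactly equal to `before`.
    return "".join(after if unit == before else unit for unit in graphemise(word))

print(graphemise("chat"))        # prints ['ch', 'a', 't']
print(change("chat", "c", "g"))  # prints chat
print(change("cobra", "c", "g")) # prints gobra
```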
8.2.1Alternative graphemes
The graphs: directive can tell Vocabug which character + combining diacritic sequences are to be treated as alternatives of a base grapheme. Let's name these alternatives the 'children' and the base grapheme the 'parent'. You can do this by enclosing the 'children' in <[ and ] as a set, directly after their 'parent'.
Important: The left-most precomposed character of a 'child' must be the same as its 'parent'.
This should be useful for tonal languages that mark tone with diacritics on vowels. In these tonal languages, we no longer need to list every variation of a vowel + diacritic to target a vowel:
graphs: a, <[á, à, ā, ǎ], h, i, <[í, ì, ī, ǐ], k, l, m, n, o, <[ó, ò, ō, ǒ], t
a -> e
; mápǎ ==> mépě
However, we can still target a vowel with a tone mark, such as ǎ:
ǎ -> e
; mápǎ ==> mápě
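One way to picture the parent and child relationship is through Unicode decomposition, where á is the base letter a followed by a combining accent. The Python sketch below reproduces both examples under that assumption: changing the parent rewrites the base letter and leaves every diacritic in place, while changing a specific child rewrites the base letter only where that exact diacritic follows. The function names are hypothetical and this is not a description of Vocabug's internals.

```python
import unicodedata

def change_parent(word, before, after):
    # Work on decomposed text so that base letters can be replaced
    # while the combining diacritics stay where they are.
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(before, after))

def change_child(word, child, after):
    # Replace the base letter only where the child's own diacritics follow it.
    decomposed = unicodedata.normalize("NFD", word)
    target = unicodedata.normalize("NFD", child)
    replaced = decomposed.replace(target, after + target[1:])
    return unicodedata.normalize("NFC", replaced)

print(change_parent("mápǎ", "a", "e"))  # prints mépě
print(change_child("mápǎ", "ǎ", "e"))   # prints mápě
```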
8.3Invisibility
Sometimes you will want characters, such as syllable dividers, to be invisible to alphabetisation. You can do this by listing these characters in the invisible-alphabet: directive.
invisible-alphabet: ., ˈ
This would order generated words ˈpa.ta ˈca.ta za.ˈta ca.ˈa
as ca.ˈa, ˈca.ta, ˈpa.ta, za.ˈta
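The ordering above can be sketched in Python by removing the invisible characters before comparing words. The sketch assumes no custom alphabet is set and falls back on Python's default string ordering, which is an assumption about the default order.

```python
INVISIBLE = [".", "ˈ"]

def visible(word, invisible=INVISIBLE):
    # Drop characters that should not take part in alphabetisation.
    return "".join(ch for ch in word if ch not in invisible)

words = ["ˈpa.ta", "ˈca.ta", "za.ˈta", "ca.ˈa"]
ordered = sorted(words, key=visible)
print(ordered)
# prints ['ca.ˈa', 'ˈca.ta', 'ˈpa.ta', 'za.ˈta']
```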
9Word creation character escape
Characters enclosed in a set of double quotes ignore any meaning they might have had in the generator, including double quotes themselves. This way, anything including capital letters that have already been defined as categories, brackets, even spaces, can be generated.
These are the characters you must escape if you want to use them in categories, segments and the words directive:
Characters | Meaning |
---|---|
; | Comment |
C , D , Ḱ , ... | Any one-length character can refer to a category |
{ , } | References a long-form category name |
, | Separates choices |
(space) | Space, separates choices. An alternative to commas |
$ | Defines a segment |
: | Gives weight to a grapheme, segment, set, or word-shape |
? | Gives probability of an Optional set being chosen |
@ | Gives weight of an Inter-pick-one set being chosen over others |
[ , ] | Pick-one set |
( , ) | Optional set |
< , > | Inter-pick-one set |
" | Escapes characters enclosed in them |
10Transform
Once words are generated, you might want to modify them to prevent certain sequences, outright reject certain words, or simulate historical sound changes. This is the purpose of the sound-change block, which implements the NASC program.
All sound changes must be used inside this block. To terminate a block you must have an END line. However, all unterminated blocks are automatically terminated at the end of the definition-build:
BEGIN soundchange:
; Your rules go here
END
The format of all parts of a NASC rule can be summarised as CHANGE / CONDITION ! EXCEPTION | FLAG
Every rule begins on a new line and must contain a CHANGE. The CONDITION, EXCEPTION and FLAG parts are optional.
If you want to target graphemes that are normally syntax characters in sound changes, you will need to escape them.
11The change
The format of the change can be expressed as BEFORE -> AFTER.
- BEFORE specifies which part of the word is being changed
- It is followed by a space and the > character. > can be swapped with either ->, =>, ⇒ or → if you prefer
- AFTER is what BEFORE is changing into, or in other words, its replacement
Let's look at a simple unconditional rule:
; Replace every /o/ with /x/
o -> x
; bodido ==> bxdidx
In this rule, we see every instance of o become x.
11.1Concurrent set
A concurrent set in a change is achieved by listing multiple graphemes in BEFORE separated by commas in square brackets, and listing the same number of resultant graphemes in AFTER separated by commas in square brackets. Changes in a concurrent set execute at the same time:
; Switch /o/ and /a/ around
[o, a] -> [a, o]
; boda ==> bado
Notice that the above example is different to the example below:
o -> a
a -> o
; boda ==> bodo
where each change is on its own line. We can see o merge with a, then a becomes o.
In the first example, square brackets were used, but because the entire rule was a concurrent set, the square brackets are optional:
; Switch /o/ and /a/ around
o, a -> a, o
; boda ==> bado
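The difference between a concurrent set and two separate rules can be sketched in Python. A concurrent change looks every grapheme up in one table based on the original word, while separate rules each run over the output of the previous one. The sketch assumes single-character graphemes, and the function names are hypothetical.

```python
def concurrent(word, before, after):
    # All replacements are decided from the original word at once.
    table = dict(zip(before, after))
    return "".join(table.get(ch, ch) for ch in word)

def sequential(word, rules):
    # Each rule sees the result of the rule before it.
    for before, after in rules:
        word = word.replace(before, after)
    return word

print(concurrent("boda", ["o", "a"], ["a", "o"]))    # prints bado
print(sequential("boda", [("o", "a"), ("a", "o")]))  # prints bodo
```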
11.2Merging set
A merging change is accomplished by placing graphemes enclosed in square brackets in BEFORE, with a corresponding single grapheme in AFTER that the graphemes in the set will merge into:
; Three graphemes becoming two graphemes
[ʃ, z], dz -> s, d
; zeʃadzas ==> sesadas
11.3Optional set
An Optional set lets a rule target a grapheme or sequence of graphemes whether or not the items in the set are present:
; Merge /x/ and /xw/ into /k/
x(w) -> k
; xwaxaħa ==> kakaħa
An Optional set can also attach to a concurrent or merging set:
; Merge /x/, /xw/, /ħ/ and /ħw/ into /k/
[x, ħ](w) -> k
; xwaxaħa ==> kakaka
Looking at the above example, let's say you wanted to preserve this optional /w/ following /x/ or /ħ/. We can do this by writing this /w/ in AFTER, enclosed by round brackets:
; Like the previous rule, but preserve labialisation
[x, ħ](w) -> k(w)
; xwaxaħa ==> kwakaka
The Optional set can itself be a merging or concurrent set too:
; Like the previous rule, but preserve palatalisation and labialisation
[x ħ](w, j) -> k(w, j)
; xwaxjaxa ==> kwakjaka
11.4Reject
To remove, or in other words, reject a word, use the ^REJECT keyword in AFTER:
a, bi -> ^REJECT
In the above example, any word that contains a or bi will be rejected.
12The condition
Conditions follow the change and are placed after a forward slash. The condition may also be called the environment.
The format of a condition is / PRE_POST
- PRE is anything in the word before the target
- The underscore _ is a reference to the target
- POST is anything in the word after the target
For example:
; Change /o/ into /x/ only when it is between /p/s
o -> x / p_p
; opoptot ==> opxptot
12.1Multiple conditions in one rule
Multiple conditions for a single rule can be given by separating each condition with an additional forward slash. The change will happen if it meets either or both of the conditions:
; Change /o/ into /x/ only when it is between /p/s or /t/s
o -> x / p_p / t_t
; opoptot ==> opxptxt
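For readers familiar with regular expressions, a condition behaves much like a lookbehind for PRE and a lookahead for POST, and several conditions behave like alternatives. The Python sketch below shows that analogy for plain-text environments only. It is not how NASC is implemented, and apply_rule is a hypothetical helper.

```python
import re

def apply_rule(word, before, after, conditions):
    # Each condition is a (PRE, POST) pair; the target must sit between them.
    pattern = "|".join(
        f"(?<={re.escape(pre)}){re.escape(before)}(?={re.escape(post)})"
        for pre, post in conditions
    )
    return re.sub(pattern, after, word)

print(apply_rule("opoptot", "o", "x", [("p", "p")]))              # prints opxptot
print(apply_rule("opoptot", "o", "x", [("p", "p"), ("t", "t")]))  # prints opxptxt
```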
12.2Optional and concurrent sets
Optional and Concurrent sets can be used in conditions:
o -> x / k(w)_[p, s]
; kwop-po-kos-po ==> kwxp-po-kxs-po
12.3Word boundary
# matches word boundaries: either the beginning of the word if it is in PRE, or the end of the word if it is in POST
o -> x / p_p#
; opoppop ==> opoppxp
12.4Syllable boundary
$ matches syllable boundaries. A syllable boundary is either the beginning or end of the word, or any of the symbols defined in the syllable-boundary: directive.
For example:
syllable-boundary: .
t$t -> d$d
; at.ta ==> ad.da
12.5Word-based condition
If we want to execute a sound change only on a list of words, we simply write those words as a list in a condition without any underscores:
sw -> s / _o / swore, sworn
In the above example, the sound change will only execute if the word is swore or sworn.
13The exception
Exceptions are placed following a ! and go after the condition, if there is one. Exceptions function as the opposite of the condition: the change will not execute wherever the exception matches:
sw -> s / _o ! swore, sworn
In the above example, the sound change will not execute if the word is swore or sworn.
14Using categories
You can reference categories in sound-changes by enclosing a category in curly brackets { and }. The category will behave in the same way as a concurrent or merging set:
B = x, y, z
sound-change:
{B} -> ^
; xapay ==> apa
15The features directive
Let's say you had the grapheme, or rather, phoneme /i/ and wanted to target it by its distinctive vowel features, +high and +front, and turn it into a phoneme marked with +high and +back features, perhaps /ɯ/. The features: directive block lets you do this:
- Features are defined inside the features block. The features block begins with BEGIN features and terminates with END
- A feature prepended with a plus sign + is a 'pro-feature'. For example +voice. In the features block, we can define a set of graphemes that are marked by this feature by using this pro-feature. For example: +voice = b, d, g, v, z
- A feature prepended with a minus sign - is an 'anti-feature'. For example -voice. In the features block, we can define a set of graphemes that are marked by a lack of this feature by using this anti-feature. For example: -voice = p, t, k, f, s
- Where does this leave graphemes that are not marked by either the pro-feature or the anti-feature of a feature, you might ask? Such graphemes are unmarked by that feature.
- To target graphemes that are marked by features in a sound change, the features must be listed in a 'feature-matrix' using curly brackets { and }. The graphemes in a word must be marked by each pro-/anti-feature in the feature-matrix to be targeted. For example, if a feature-matrix {+high, +back} targets the graphemes u, ɯ, another feature-matrix {+high, +back, -round} would target ɯ only.
The very simple example below is written to change all voiceless graphemes that have a voiced counterpart into their voiced counterparts:
BEGIN features:
-voice = p, t, k, f, s
+voice = b, d, g, v, z
END
{-voice} -> {+voice}
; tamefa ==> dameva
In this sound-change, {+voice} in AFTER has a symmetrical one-to-one correspondence with the graphemes of {-voice} in BEFORE, leading to a concurrent change. Let's quickly imagine a scenario where the only {+voice} grapheme was b. The result would be a merging of all -voice graphemes into b: tamepfa ==> bamebba. Similarly, in a different scenario where the only -voice grapheme was p, p would become the first grapheme in {+voice}, which happens to be b: tamepfa ==> tamebfa
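The three scenarios above can be sketched in Python by pairing the graphemes of the two feature sets by position. Reusing the last AFTER grapheme when AFTER is shorter than BEFORE is an assumption that covers the merging case described here; other unequal lengths are not specified in this section. The function name is hypothetical and single-character graphemes are assumed.

```python
def feature_change(word, before, after):
    # Pair graphemes by position. If `after` is shorter, its last grapheme
    # is reused (an assumption that reproduces the merging case).
    table = {g: after[min(i, len(after) - 1)] for i, g in enumerate(before)}
    return "".join(table.get(ch, ch) for ch in word)

voiceless = ["p", "t", "k", "f", "s"]
voiced = ["b", "d", "g", "v", "z"]

print(feature_change("tamefa", voiceless, voiced))  # prints dameva
print(feature_change("tamepfa", voiceless, ["b"]))  # prints bamebba
print(feature_change("tamepfa", ["p"], voiced))     # prints tamebfa
```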
Para-feature
- A feature defined without a prepended plus or minus sign is a 'para-feature'. A para-feature is a pro-feature without a listed anti-feature counterpart. Instead, the graphemes marked as the anti-feature are the graphemes in the graphs: directive that are not marked by the para-feature.
Notice: If there is no graphs: directive in the definition-build, there will be zero anti-feature phonemes. If you define an anti-feature as the counterpart of a para-feature, your anti-feature will be ignored.
graphs: a, b, h, i, k, n, o, t
BEGIN features:
vowel = a, i, o
END
In the above example, the matrix {-vowel} targets the graphemes b, h, k, n, t
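In other words, the anti-feature of a para-feature is the complement of its graphemes within the graphs list. A minimal Python sketch of that set difference, using the example's data:

```python
graphs = ["a", "b", "h", "i", "k", "n", "o", "t"]
vowel = ["a", "i", "o"]  # the para-feature

# The anti-feature is every listed graph that the para-feature does not mark.
anti_vowel = [g for g in graphs if g not in vowel]
print(anti_vowel)
# prints ['b', 'h', 'k', 'n', 't']
```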
Combining features
We can 'combine' features. Or to be more accurate, a feature's graphemes can mirror the graphemes of other features by defining a feature with features in it. The combined features must be a pro-feature or anti-feature:
BEGIN features:
labial = p, b, m
alveolar = t, d, s, l, n
palatal = j
velar = k, g
glottal = h
consonant = +labial, +alveolar, +palatal, +velar, +glottal
END
15.1Feature-field
Feature-fields allow graphemes to be easily marked by multiple features at the same time.
- The feature-field begins with a % followed by a para-feature. Think of this para-feature as the parent feature of the other features in that feature-field. The graphemes marked by this para-feature are listed in the first row. The graphemes marked by the anti-feature counterpart are the graphemes in the graphs: directive that are not marked by the para-feature.
- The graphemes being marked by the features are listed in the first row
- The features are listed in the first column
- A + means to mark the grapheme by that feature's pro-feature
- A - means to mark the grapheme by that feature's anti-feature
- A . means to leave the grapheme unmarked by that feature
Here is an example of comprehensive features of consonants and vowels:
graphs: a, e, i, o, p, b, t, d, k, g, s, h, l, j, m, n
BEGIN features:
%consonant m n p b t d k g s h l j
voice      + + - + - + - + - - + +
plosive    - - + + + + + + - - - -
nasal      + + - - - - - - - - - -
fricative  - - - - - - - - + + - -
approx     - - - - - - - - - - + +
labial     + - + + - - - - + + - -
alveolar   - + - - + + - - - - + -
palatal    - - - - - - - - - - - +
velar      - - - - - - + + - - - -
glottal    - - - - - - - - - + - -
%vowel a e i o
high   - - + -
mid    - + - +
low    + - - -
front  - + + -
back   + - - +
round  - - - +
END
Here are some matrices of these features and which graphemes they would capture:
- {+plosive} targets the graphemes b, d, g, p, t, k
- {+voice, +plosive} targets the graphemes b, d, g
- {+voice, +labial, +plosive} targets the grapheme b
- {+vowel} targets the graphemes a, e, i, o
- {-vowel} targets the graphemes p, b, t, d, k, g, s, h, l, j, m, n
Notice a problem that could occur with the above example? The above example has no overlapping features between consonants and vowels, which is fine. But the example below describes a language that has overlapping features between vowels and consonants, namely, syllabic consonants that carry tone. The solution here is to list all phonemes in just one feature-field:
BEGIN features:
%phoneme m n p b t d k g s h l j n̩ ń̩ ǹ̩ a á à e é è i í ὶ o ó ὸ
syllabic - - - - - - - - - - - - + + + + + + + + + + + + + + +
vowel - - - - - - - - - - - - - - - + + + + + + + + + + + +
high . . . . . . . . . . . . . . . - - - - - - + + + - - -
mid . . . . . . . . . . . . . . . - - - + + + - - - + + +
low . . . . . . . . . . . . . . . + + + - - - - - - - - -
front . . . . . . . . . . . . . . . - - - + + + + + + - - -
back . . . . . . . . . . . . . . . + + + - - - - - - + + +
round . . . . . . . . . . . . . . . - - - - - - - - - + + +
low_tone . . . . . . . . . . . . . . - - - + - - + - - + - - +
mid_tone . . . . . . . . . . . . + - - + - - + - - + - - + - -
high_tone . . . . . . . . . . . . . . + - + - - + - - + - - + -
consonant + + + + + + + + + + + + + + + - - - - - - - - - - - -
voice + + - + - + - + - - + + + + + + + + + + + + + + + + +
plosive - - + + + + + + - - - - - - . . . . . . . . . . . . .
nasal + + - - - - - - - - - - + + . . . . . . . . . . . . .
fricative - - - - - - - - + + - - - - . . . . . . . . . . . . .
approx - - - - - - - - - - + + - - . . . . . . . . . . . . .
labial + - + + - - - - + + - - + - . . . . . . . . . . . . .
alveolar - + - - + + - - - - + - - + . . . . . . . . . . . . .
palatal - - - - - - - - - - - + - - . . . . . . . . . . . . .
velar - - - - - - + + - - - - - - . . . . . . . . . . . . .
glottal - - - - - - - - - + - - - - . . . . . . . . . . . . .
END
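To make the mechanics of a feature-field concrete, the Python sketch below parses a small field (three rows taken from the first consonant example) and finds the graphemes that satisfy every entry of a feature-matrix. The parser and the names parse_field and match are hypothetical; they only illustrate the idea of intersecting the grapheme sets of each listed feature.

```python
FIELD = """\
%consonant m n p b t d k g s h l j
voice      + + - + - + - + - - + +
plosive    - - + + + + + + - - - -
labial     + - + + - - - - + + - -
"""

def parse_field(text):
    # Map each pro-/anti-feature name (e.g. "+voice") to its graphemes.
    lines = [line.split() for line in text.strip().splitlines()]
    graphemes = lines[0][1:]
    marks = {}
    for name, *cells in lines[1:]:
        for grapheme, cell in zip(graphemes, cells):
            if cell in ("+", "-"):  # "." leaves the grapheme unmarked
                marks.setdefault(cell + name, []).append(grapheme)
    return marks

def match(marks, matrix):
    # A grapheme is targeted only if every feature in the matrix marks it.
    sets = [set(marks.get(feature, [])) for feature in matrix]
    return [g for g in marks.get(matrix[0], []) if all(g in s for s in sets)]

marks = parse_field(FIELD)
print(match(marks, ["+plosive"]))                      # prints ['p', 'b', 't', 'd', 'k', 'g']
print(match(marks, ["+voice", "+plosive"]))            # prints ['b', 'd', 'g']
print(match(marks, ["+voice", "+labial", "+plosive"])) # prints ['b']
```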
16Wildcards and positioning
16.1Wildcard
Wildcard will match once to any character, or multigraph defined in the graphs: directive. Wildcard does not match word boundaries. Wildcard cannot be used in AFTER:
a -> e / _*#
; apappap ==> apappep
16.2Ditto-mark
Ditto-mark will match once to the grapheme, or grapheme in a set, category, or feature, to the left of it:
a< -> a
; aata ==> ata
16.3Greedy-ditto-mark
Greedy-ditto-mark will match as many times as possible to the grapheme, or grapheme in a set, category, or feature, to the left of it
a+ -> a
; raraaaaa ==> rara
16.4Anythings-mark
The anythings-mark is the ellipsis character … (U+2026). It will match any character, or multigraph defined in the graphs: directive, as many times as needed. For example:
b…t -> x
; babãittati ==> xtati
As we can see, the rule matched b followed by anything else until it reached t, then stopped matching. Why did the anythings-mark not continue matching t and beyond like *+ would? This is because it is non-greedy, or in other words, lazy. The anythings-mark keeps matching graphemes only until it reaches a grapheme that matches the item following the anythings-mark.
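The lazy behaviour described above has a close analogue in regular expressions, where .*? is lazy and .* is greedy. The Python sketch below uses that analogy to show the difference on the same word. It illustrates the concept only and is not NASC syntax.

```python
import re

word = "babãittati"

# Lazy: stop at the first t that allows a match.
lazy = re.sub(r"b.*?t", "x", word)
# Greedy: run on to the last t in the word.
greedy = re.sub(r"b.*t", "x", word)

print(lazy)    # prints xtati
print(greedy)  # prints xi
```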
The example below uses the anythings-mark in the condition:
; Simulate spreading of nasality to vowels
[a i u] -> [ã ĩ ũ] / [ã ĩ ũ](…)_
; babãittati ==> babãĩttãtĩ
16.5Quantifier
The quantifier matches the item to its left the number of times given.
; Change /o/ into /x/ only when preceded by three /r/s
o -> x / r=[3]_
; rrrorro ==> rrrxrro
The number in the quantifier can also be a list of numbers:
; Change /o/ into /x/ only when preceded by zero or four /r/s
o -> x / r=[0, 4]_
; orrrrorro ==> xrrrrxrro
The number in the quantifier can also be a range. To do this, put a : between the lowest and highest numbers of the range:
; Change /o/ into /x/ only when preceded by two to four /r/s
o -> x / r=[2:4]_
; rrrorro ==> rrrxrrx
Here is a useful lookup table on getting quantities of ditto-marks or wildcards:
 | Wildcard | Ditto-mark |
---|---|---|
Exactly 1 of | * | < |
0 or 1 of | (*) | (<) |
1 or more of | … | + |
0, 1, or more of | (…) | (+) |
Specific number(s) of | *=[N] | <=[N] |
Number range(s) of | *=[N:N] | <=[N:N] |
16.6Positioner
A positioner allows a grapheme to be captured only when it is the Nth occurrence in the word:
; Change the second /o/ in a word to /x/ after the second /s/
o@[2] -> x / s@[2]_
; sososo ==> sosxso
If we want to match the last occurrence of a grapheme in a word, use -1. For the second-last occurrence of a grapheme in a word, use -2, and so forth:
; Change the last /o/ in a word to /x/
o@[-1] -> x
; sososo ==> sososx
The number in the positioner can also be a list of numbers:
; Change the first and third /o/ in a word to /x/
o@[1, 3] -> x
; sososo ==> sxsosx
The number in the positioner can also be a range. To do this, put a : between the lowest and highest numbers of the range:
; Change the first to third /o/ in a word to /x/
o@[1:3] -> x
; sososoo ==> sxsxsxo
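The positioner examples can be mirrored in Python by collecting the positions of every occurrence of the target and then picking occurrences by 1-based number, with negative numbers counting from the end. The sketch ignores conditions and assumes single-character graphemes. The function name is hypothetical.

```python
def replace_positions(word, target, replacement, positions):
    # Indexes of every occurrence of the target, in order.
    hits = [i for i, ch in enumerate(word) if ch == target]
    chosen = set()
    for p in positions:
        if p == 0:
            continue  # occurrences are numbered from 1 (or -1 from the end)
        index = p - 1 if p > 0 else p
        if -len(hits) <= index < len(hits):
            chosen.add(hits[index])
    return "".join(replacement if i in chosen else ch for i, ch in enumerate(word))

print(replace_positions("sososo", "o", "x", [2]))             # prints sosxso
print(replace_positions("sososo", "o", "x", [-1]))            # prints sososx
print(replace_positions("sososo", "o", "x", [1, 3]))          # prints sxsosx
print(replace_positions("sososoo", "o", "x", range(1, 4)))    # prints sxsxsxo
```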
17Insertion and deletion
Insertion requires a condition to be present, and for the ^ to be present in BEFORE, representing nothing.
; insert /a/ in between /b/ and /t/
^ -> a / b_t
; bt ==> bat
Deletion happens when ^ is present in AFTER:
; delete every /b/
b -> ^
; bubda ==> uda
18Advanced sound-changes
18.1Blocker
A Blocker is designed to block the spread of greedy, spreading behaviours. For example, we might want the graphemes k or g to prevent the rightward spread of nasal vowels to non-nasal vowels:
[a, i, u] -> [ã, ĩ, ũ] / [ã, ĩ, ũ]…~[k, g]_
; pabãdruliga ==> pabãdrũlĩga
18.2Metathesis
Metathesis in NASC refers to the reordering of graphemes in a word. Metathesis in real-world diachronics is usually sporadic, but can be regular.
To make a rule a metathesis rule, use these symbols:
- The pipe | marks the content (if any) between the targets we want to reorder. You must use the same number of |s in BEFORE as in AFTER
- Numbers in AFTER refer to the targets. Reordering these numbers reorders the targets. It is possible to have up to nine targets
- Underscores _ in a condition or exception are references to the targets. Unlike a normal rule, we can have multiple underscores
Local metathesis
A typical type of metathesis is local two-place metathesis:
; An intervocalic stop + nasal sequence becomes nasal + stop
[stop]|[nasal] -> 2|1 / V__V
; watna ==> wanta
Long-distance metathesis
The example below approximates metathesis that occurred in Spanish:
r|l -> 2|1 / _(…)[plosive]_
; parabla ==> palabra
One-place metathesis
To simulate one-place metathesis, move |
s.
The example below is metathesis where words beginning with stop
+ vowel
will try and move an r
in a stop
+ r
cluster to form a word initial stop
+ r
cluster:
{stop}|r -> 12| / #_{vowel}…{stop}_
; kabatros ==> krabatos
Metathesis madness
Three or more sounds, up to a maximum of nine, can switch places, and the |s can be shuffled as well:
x|y|z -> ||321
; xaayooz ==> aaoozyx
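How the numbers and pipes in AFTER rearrange a match can be sketched in Python. The function `metathesise` is hypothetical, and it assumes that each | in AFTER stands for the in-between contents in their original left-to-right order. Matching of BEFORE and of the condition is left out: the targets and the contents between them are passed in already split.

```python
def metathesise(targets, gaps, after):
    """Rebuild a matched span from an AFTER template such as "2|1".

    targets: the matched target graphemes, numbered from 1
    gaps:    the contents between neighbouring targets, left to right
    after:   digits pick targets, each "|" emits the next gap
    """
    gap_iter = iter(gaps)
    out = []
    for symbol in after:
        if symbol == "|":
            out.append(next(gap_iter))
        else:
            out.append(targets[int(symbol) - 1])
    return "".join(out)


# Local: [stop]|[nasal] -> 2|1, watna ==> wanta
print("wa" + metathesise(["t", "n"], [""], "2|1") + "a")      # wanta
# Long-distance: r|l -> 2|1, parabla ==> palabra
print("pa" + metathesise(["r", "l"], ["ab"], "2|1") + "a")    # palabra
# One-place: {stop}|r -> 12|, kabatros ==> krabatos
print(metathesise(["k", "r"], ["abat"], "12|") + "os")        # krabatos
# Three targets: x|y|z -> ||321, xaayooz ==> aaoozyx
print(metathesise(["x", "y", "z"], ["aa", "oo"], "||321"))    # aaoozyx
```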
19Logic blocks
Logic blocks are a way of executing sound changes depending on a trigger event that we are listening for.
19.1If block
Using an if block, you can make sound changes execute on a word if, or if not, other sound change(s) were applied to the word.
It should feel familiar to anyone who knows a bit about programming languages:
- BEGIN if: starts the if block. The sound changes placed here are the ones being listened to: whether or not they are executed on a word decides which of the following parts runs on that word
- then: is where you put sound changes that will execute if the sound changes in if: did apply
- else: is where you put sound changes that will execute if the sound changes in if: did not apply
- END is the end of the block
For example:
BEGIN if:
; Deletion of schwa before r
ə -> ^ / _r
then:
; Then do metathesis of r and l
r|l -> 2|1 / _|[plosive]_
else:
; Schwa becomes e if the first rule did not apply
ə -> e
END
Note: The above example would be quite bogus as a historical sound change. Sound change in natural diachronics has no memory. We can have "two-part" sound changes such as this triggered metathesis, but a sound change executing on a word because another sound change did not apply to that word does not occur, at least not in real-life natural human languages.
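The control flow of an if block can be sketched in Python as follows. Everything here is illustrative: `run_if_block` is a hypothetical helper, "applied" is assumed to mean "the word came out different", and the metathesis rule is a simplified stand-in that ignores the plosive condition.

```python
import re


def run_if_block(word, if_rules, then_rules, else_rules):
    result = word
    for rule in if_rules:
        result = rule(result)
    # Assumption: the if: rules "applied" when they changed the word
    applied = result != word
    for rule in (then_rules if applied else else_rules):
        result = rule(result)
    return result


def delete_schwa_before_r(word):
    # ə -> ^ / _r
    return re.sub(r"ə(?=r)", "", word)


def swap_r_and_l(word):
    # Simplified stand-in for r|l -> 2|1 (no plosive condition)
    return re.sub(r"r(.*?)l", r"l\1r", word, count=1)


def schwa_to_e(word):
    # ə -> e
    return word.replace("ə", "e")


for w in ["pərabla", "pətə"]:
    print(w, "==>", run_if_block(w, [delete_schwa_before_r],
                                 [swap_r_and_l], [schwa_to_e]))
# pərabla ==> plabra   (if: applied, so then: ran)
# pətə ==> pete        (if: did not apply, so else: ran)
```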
19.2Chance block
The chance block is a way to apply sound-change depending on percentage-based chance:
BEGIN chance 15:
a -> e
END
In the above example, a word with an a in it, such as pa, has a 15% chance of becoming pe.
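A chance block behaves roughly like a random gate in front of a group of rules. The Python sketch below assumes the chance is rolled once per word; the helper name `chance_block` and the optional `rng` parameter are inventions for this illustration.

```python
import random


def chance_block(word, percent, rules, rng=random):
    # Assumption: one roll per word decides whether the whole block runs
    if rng.random() * 100 < percent:
        for rule in rules:
            word = rule(word)
    return word


def a_to_e(word):
    # a -> e
    return word.replace("a", "e")


rng = random.Random(0)   # seeded only to make the run repeatable
changed = sum(chance_block("pa", 15, [a_to_e], rng) == "pe"
              for _ in range(10_000))
print(f"{changed / 100:.1f}% of words became pe")   # close to 15%
```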
19.3Rule macro
A rule macro saves rules to be used later in the definition-build as many times as needed. The rules inside the def-rule-macro block do not run until they are invoked using do-rule-macro:, as shown here:
BEGIN def-rule-macro resyllabify:
i -> j / _[a,e,o,u]
u -> w / _[a,e,i,o]
END
do-rule-macro: resyllabify
ʔ -> ^
do-rule-macro: resyllabify ; iaruʔitua ==> jaruʔitwa ==> jaruitwa ==> jarwitwa
In the above example we saved two rules as a macro under the name "resyllabify" and used that macro twice.
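The same idea, storing a list of rules under a name and replaying it on demand, looks like this in Python. The regexes are approximations of the three rules, and the `macros` dictionary and `do_rule_macro` function are hypothetical names chosen to mirror the directives.

```python
import re


def i_to_j(word):
    # i -> j / _[a,e,o,u]
    return re.sub(r"i(?=[aeou])", "j", word)


def u_to_w(word):
    # u -> w / _[a,e,i,o]
    return re.sub(r"u(?=[aeio])", "w", word)


def drop_glottal_stop(word):
    # ʔ -> ^
    return word.replace("ʔ", "")


# Defining the macro only stores the rules; nothing runs yet
macros = {"resyllabify": [i_to_j, u_to_w]}


def do_rule_macro(name, word):
    for rule in macros[name]:
        word = rule(word)
    return word


word = "iaruʔitua"
word = do_rule_macro("resyllabify", word)   # jaruʔitwa
word = drop_glottal_stop(word)              # jaruitwa
word = do_rule_macro("resyllabify", word)   # jarwitwa
print(word)
```

The second invocation matters: deleting ʔ brings u next to i, which only then meets the condition of the second rule.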
20Cluster-field
Cluster-fields are a way to target and change sequences of graphemes. They are laid out like tables, and start with %. For example:
% a i u
a + + o
i - + uu
u - - +
The first grapheme is the row, and the second grapheme is the column. In this example, au becomes o and iu becomes uu. + means to leave the combination as-is, and - means to reject the word. This table would permit ai but reject ia.
Cluster-fields can also use ^ in them to remove a sequence.
As with filters, these are parsed in the order presented. The cluster-field ends at a blank line or the end of the definition-build.
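One way to picture a cluster-field is as a lookup from a pair of graphemes to an action. The Python sketch below encodes the example table and applies it in a single left-to-right pass. The scanning order, the handling of overlapping pairs, and the name `apply_cluster_field` are all assumptions; the documentation does not spell these details out.

```python
# Row = first grapheme, column = second grapheme
TABLE = {
    "a": {"a": "+", "i": "+", "u": "o"},
    "i": {"a": "-", "i": "+", "u": "uu"},
    "u": {"a": "-", "i": "-", "u": "+"},
}


def apply_cluster_field(word, table):
    """Return the changed word, or None if the word is rejected."""
    out = []
    i = 0
    while i < len(word):
        first = word[i]
        second = word[i + 1] if i + 1 < len(word) else None
        cell = table.get(first, {}).get(second)
        if cell is None or cell == "+":
            out.append(first)      # not in the table, or leave as-is
            i += 1
        elif cell == "-":
            return None            # reject the word
        elif cell == "^":
            i += 2                 # remove the sequence
        else:
            out.append(cell)       # replace the sequence
            i += 2
    return "".join(out)


print(apply_cluster_field("pau", TABLE))   # po
print(apply_cluster_field("piu", TABLE))   # puu
print(apply_cluster_field("pai", TABLE))   # pai
print(apply_cluster_field("pia", TABLE))   # None
```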
21Engine
The engine statement provides useful functions that you can call at any point in the definition-build. You can also call a list of these functions in one line, e.g. engine: compose, capitalise
- decompose will break down all characters in a word into their "Unicode Normalization, Canonical Decomposition" form. For example, ñ as a single Unicode character, \u00F1, will be broken down into a sequence of two characters, n \u006E + ◌̃ \u0303. The TypeScript function is called normalize("NFD")
- compose does the opposite of decompose. It converts all characters in a word to the "Unicode Normalization, Canonical Decomposition followed by Canonical Composition" form. For example, ñ as two characters, \u006E\u0303, will be transformed into one character, \u00F1. The TypeScript function is called normalize("NFC")
- capitalise will convert the first character of a word to uppercase
- de-capitalise will convert the first character of a word to lowercase
- to-upper-case will convert all characters of a word to uppercase
- to-lower-case will convert all characters of a word to lowercase
- xsampa_to_ipa will convert graphemes of a word written in X-SAMPA into IPA
- ipa_to_xsampa will convert graphemes of a word written in IPA into X-SAMPA
- unicode_entities will convert the HTML name of a Unicode entity following an ampersand into its Unicode character. For example, &Agrave will make À
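The normalisation and case functions can be tried out in Python, whose `unicodedata.normalize` performs the same NFD and NFC transformations as the string `normalize` method in TypeScript. The `ENGINE_FUNCTIONS` table and the `engine` helper below are assumptions used to mimic a line such as engine: compose, capitalise, and only some of the functions are covered.

```python
import unicodedata

# Hypothetical stand-ins for some of the engine functions
ENGINE_FUNCTIONS = {
    "decompose": lambda w: unicodedata.normalize("NFD", w),
    "compose": lambda w: unicodedata.normalize("NFC", w),
    "capitalise": lambda w: w[:1].upper() + w[1:],
    "de-capitalise": lambda w: w[:1].lower() + w[1:],
    "to-upper-case": str.upper,
    "to-lower-case": str.lower,
}


def engine(word, names):
    # Apply the named functions in the order they are listed
    for name in names:
        word = ENGINE_FUNCTIONS[name](word)
    return word


decomposed = engine("\u00f1", ["decompose"])
print([hex(ord(c)) for c in decomposed])            # ['0x6e', '0x303']
print(engine(decomposed, ["compose"]) == "\u00f1")  # True
print(engine("n\u0303a", ["compose", "capitalise"]))  # Ña
```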
22Sound-change character escape
Characters | Meaning |
---|---|
; | Comment |
> , -> , => , ⇒ , → | Indicates change |
, | Separates choices |
[ , ] | Concurrent or merging set |
( , ) | Optional set |
^REJECT | Rejects a word |
/ | The condition follows this character |
_ | The underscore _ is a reference to the target |
# | Word boundary |
$ | Syllable boundary |
! | The exception follows this character |
{ , } | Category or feature-matrix |
* | Wildcard, matches exactly 1 of any character |
< | Ditto-mark, matches exactly 1 of the previous character |
+ | Greedy-ditto-mark, matches 1 or more of the previous character |
… | Anythings-mark, matches 1 or more of any character. It is non-greedy |
=[ , ] | Quantifier |
@[ , ] | Positioner |
^ | Insertion when in BEFORE , deletion when in AFTER |
~[ , ] | Blocker |
| | Indicates metathesis, and the reordered contents |
1 , 2 , ... 9 | In a Metathesis rule, in AFTER , these represent the changing graphemes |
" | Escapes characters enclosed in them |