NASC documentation
Contents
- About NASC
- Interface
- Using comments
- Defining graphs
- The change
- The condition
- The exception
- Using categories
- The features directive
- Wildcards and positioning
- Insertion and deletion
- Advanced sound-change
- Logic blocks
- Cluster-field
- Engine
- Character escape
1 About NASC
This is the complete documentation for NASC version 0.0.0.
NASC (Neonnaut's Applier of Sound Changes) is a sound change applier. It is designed to be an easy-to-use yet powerful tool for applying sound changes to words, intended for conlangers, linguists, and anyone else who needs to apply sound changes to words. NASC has been influenced by similar SCAs, most notably Brassica, Lexurgy, Geoff's Sound Change Applier, and KathTheDragon's SCE.
2 Interface
- Use the Examples dropdown button to load a number of examples into the file editor
- The Input words textbox is where you list all the words you want sound changes applied to
- The Word divider textbox sets the delimiter, or in other words, what separates each word in the input. It can be a space, or \n if each word is on its own line
- The file editor is the main input. It defines the sound changes. It will already contain either a default definition or the definition you last applied to words
- Use the Apply button to see NASC apply your sound changes to your words
- Use the Copy button to copy the words to your clipboard
2.1 Options
- Debug mode will show, line by line, each step in changing each word
- Editor wrap lines will make the file editor wrap a line onto the next line if it exceeds the width of the file editor
- The Word divider textbox sets the delimiter, or in other words, what separates each word in the output. It can be a space, or \n to get one word on each line
2.2 File save / load
Use the Save button to download your sound changes as a file called 'NASC.txt', or whatever you named your file in the File name: field. The file is always a ".txt" type.
Use the Load button to load a file on your system into the file editor.
3 Using comments
If a line contains a semicolon ; everything after it on that line is ignored and not interpreted as NASC syntax, unless the ; is escaped. Comments are useful to explain what a rule does.
; This is a comment
e > o ;and this is a comment following a rule.
4 Defining graphs
The graphs: directive tells NASC which (multi)graphs, including characters with combining diacritics, are to be treated as grapheme units in sound changes.
graphs: a, b, c, ch, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y
In the above example, we defined ch as a grapheme. This stops a sound change such as c -> g from changing the word chat into ghat, but cobra will still change into gobra.
"But my list of graphemes is the same as my list of letters for alphabetisation, and I don't want to list them twice", you might exclaim. In that case, you can create an alphabetisation order and list your graphemes in one line using the alphabet-and-graphs: directive.
4.1 Alternative graphemes
The graphs: directive can also tell NASC which character + combining diacritic sequences are to be treated as alternatives of a base grapheme. Let's call these alternatives the 'children' and the base grapheme the 'parent'. You do this by enclosing the 'children' in <[ and ] as a set, directly after their 'parent'.
Important: The left-most precomposed character of a 'child' must be the same as its 'parent'.
This is useful for tonal languages that mark tone with diacritics on vowels. In these languages, we no longer need to list every vowel + diacritic variation in a rule to target a vowel:
graphs: a, <[á, à, ā, ǎ], h, i, <[í, ì, ī, ǐ], k, l, m, n, o, <[ó, ò, ō, ǒ], t
a -> e
; mápǎ ==> mépě
However, we can still target a vowel with a specific tone mark, such as ǎ:
ǎ -> e
; mápǎ ==> mápě
5 The change
The format of the change can be expressed as BEFORE -> AFTER.
- BEFORE specifies which part of the word is being changed
- It is followed by a space and the > character. > can be swapped with ->, =>, ⇒ or → if you prefer
- AFTER is what BEFORE is changing into, or in other words, being replaced with
Let's look at a simple unconditional rule:
; Replace every /o/ with /x/
o -> x
; bodido ==> bxdidx
In this rule, we see every instance of o become x.
5.1 Concurrent set
A concurrent set in a change is achieved by listing multiple graphemes in BEFORE separated by commas, and listing the same number of resultant graphemes in AFTER separated by commas. Changes in a concurrent set execute at the same time.
; Switch /o/ and /a/ around
o, a -> a, o
; boda ==> bado
Notice that the above example is different to the example below:
o -> a
a -> o
; boda ==> bodo
where each change is on its own line. There, o first merges with a, then every a becomes o.
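The difference between a concurrent set and two separate rules can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not NASC's implementation; the function names are made up for the example, and it assumes every grapheme is a single character.

```python
def apply_concurrent(word: str, mapping: dict[str, str]) -> str:
    """Replace every grapheme in one pass, so outputs are never re-matched."""
    return "".join(mapping.get(ch, ch) for ch in word)


def apply_sequential(word: str, rules: list[tuple[str, str]]) -> str:
    """Apply each rule in turn; later rules see the output of earlier ones."""
    for before, after in rules:
        word = word.replace(before, after)
    return word


print(apply_concurrent("boda", {"o": "a", "a": "o"}))      # bado
print(apply_sequential("boda", [("o", "a"), ("a", "o")]))  # bodo
```

The single pass is what keeps the new a (from o) from being turned back into o.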
5.2 Merging set
A merging change is accomplished by placing graphemes enclosed in square brackets in BEFORE, with a corresponding single grapheme in AFTER that the graphemes in the set will merge into:
; Three graphemes becoming two graphemes
[ʃ, z], dz -> s, d
; zeʃadzas ==> sesadas
5.3 Optional set
Items in an optional set, enclosed in round brackets, are targeted whether or not they appear as part of the grapheme or sequence of graphemes:
; Merge /x/ and /xw/ into /k/
x(w) -> k
; xwaxaħa ==> kakaħa
An optional set can also attach to a concurrent or merging set:
; Merge /x/, /xw/, /ħ/ and /ħw/ into /k/
[x, ħ](w) -> k
; xwaxaħa ==> kakaka
Looking at the above example, let's say you wanted to preserve this optional /w/ following /x/ or /ħ/. We can do this by writing this /w/ in AFTER, enclosed in round brackets:
; Like the previous rule, but preserve labialisation
[x, ħ](w) -> k(w)
; xwaxaħa ==> kwakaka
The optional set can itself be a merging or concurrent set too:
; Like the previous rule, but preserve palatalisation and labialisation
[x ħ](w, j) -> k(w, j)
; xwaxjaxa ==> kwakjaka
5.4 Reject
To remove, or in other words, reject a word from your list of words, use the ^REJECT keyword in AFTER:
a, bi -> ^REJECT
In the above example, any word that contains a or bi will be rejected.
6 The condition
Conditions follow the change and are placed after a forward slash. The condition may also be called the environment.
The format of a condition is / PRE_POST
- PRE is everything before the target
- The underscore _ is a reference to the target
- POST is everything after the target
For example:
; Change /o/ into /x/ only when it is between /p/s
o -> x / p_p
; opoptot ==> opxptot
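For readers who know regular expressions, a condition behaves much like a lookbehind (PRE) and a lookahead (POST) around the target. The Python sketch below illustrates that analogy only; apply_conditional is a hypothetical helper, it treats PRE and POST as literal strings, and it says nothing about how NASC itself is implemented.

```python
import re


def apply_conditional(word: str, before: str, after: str,
                      pre: str = "", post: str = "") -> str:
    """Replace `before` with `after` only where `pre` precedes and `post` follows."""
    pattern = re.escape(before)
    if pre:
        pattern = f"(?<={re.escape(pre)})" + pattern   # PRE: lookbehind
    if post:
        pattern += f"(?={re.escape(post)})"            # POST: lookahead
    return re.sub(pattern, after, word)


print(apply_conditional("opoptot", "o", "x", pre="p", post="p"))  # opxptot
```

Because lookarounds do not consume characters, a p can serve as POST for one target and PRE for the next.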
6.1 Multiple conditions in one rule
Multiple conditions for a single rule can be made by separating each condition with an additional forward slash. The change will happen if the target meets at least one of the conditions:
; Change /o/ into /x/ only when it is between /p/s or /t/s
o -> x / p_p / t_t
; opoptot ==> opxptxt
6.2 Optional and concurrent sets
Optional and concurrent sets can be used in conditions:
o -> x / k(w)_[p, s]
; kwop-po-kos-po ==> kwxp-po-kxs-po
6.3 Word boundary
# matches word boundaries: the beginning of the word if it is in PRE, or the end of the word if it is in POST:
o -> x / p_p#
; opoppop ==> opoppxp
6.4 Syllable boundary
$ matches syllable boundaries. A syllable boundary is either the beginning or end of the word, or any of the symbols defined in the syllable-boundary: directive.
For example:
syllable-boundary: .
t$t -> d$d
; at.ta ==> ad.da
6.5 Word-based condition
If we want to execute a sound change only on a list of words, we simply write those words as a list in a condition without any underscores:
sw -> s / _o / swore, sworn
In the above example, the sound change will only execute if the word is swore or sworn.
7 The exception
Exceptions are placed after a ! and go after the condition, if there is one. Exceptions function as the opposite of a condition: the change will not execute wherever the exception matches:
sw -> s / _o ! swore, sworn
In the above example, the sound change will not execute if the word is swore or sworn.
8 Using categories
A category is a set of graphemes with a name, usually a single character. You must declare categories inside the categories block. For example:
BEGIN categories
C = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
F = n, l, ꞌ, t, k, r, p
V = a, i, e, u, o
END
This creates three groups of graphemes. C is the group of all consonants, V is the group of all vowels, and F is a group of some of the consonants.
By default, the graphemes' frequencies decrease as they go to the right, according to the Gusein-Zade distribution. In the above example, when NASC needs to choose a V, it will choose a the most at 43%, i the second-most at 26%, e the third-most at 17%, u the fourth-most at 10%, and o the fifth-most at 4%.
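The percentages quoted above can be reproduced with one common formulation of the Gusein-Zade distribution, in which the item at rank r out of n gets a weight of ln(n + 1) - ln(r), normalised so the weights sum to 1. Whether NASC uses exactly this formula is an assumption; the Python sketch below only shows that it yields the same figures.

```python
import math


def gusein_zade_weights(n: int) -> list[float]:
    """Normalised weights for ranks 1..n (assumed formula: ln(n + 1) - ln(rank))."""
    raw = [math.log(n + 1) - math.log(rank) for rank in range(1, n + 1)]
    total = sum(raw)
    return [weight / total for weight in raw]


for grapheme, p in zip("aieuo", gusein_zade_weights(5)):
    print(f"{grapheme}: {p:.0%}")
# a: 43%
# i: 26%
# e: 17%
# u: 10%
# o: 4%
```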
In the previous example, the graphemes were separated by commas. An alternative is to separate options with spaces:
BEGIN categories
C = t n k m ch l ꞌ s r d h w b y p g
F = n l ꞌ t k r p
V = a i e u o
END
You may not use both commas and spaces as separators on the same line, e.g. "a b, c".
There are two advantages to using commas over spaces. Firstly, they make it clearer what separates options: the above example looks very simple, but things can get a lot more complicated. Secondly, commas make it possible to define a null / zero grapheme in a category. For example, C = t, , k, p would be a category of three graphemes and nothing. This document uses a comma followed by a space throughout for these reasons.
You can also give categories long names:
consonant = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g
You reference categories in sound changes by enclosing the category name in curly brackets { and }. The category will behave in the same way as a concurrent or merging set:
BEGIN categories
B = x, y, z
END
{B} -> ^
; xapay ==> apa
9 The features directive
Let's say you had a grapheme representing a phoneme such as /i/ and wanted to target it by its distinctive features as a vowel, +high and +front, and turn it into a phoneme with +high and +back features, i.e. /ɯ/. The features: directive block lets you do this:
BEGIN features:
-voice = p, t, k, f, s
+voice = b, d, g, v, z
END
{-voice} -> {+voice}
; tamefa ==> dameva
The very simple example above changes all voiceless phonemes that have a voiced counterpart into that counterpart. To explain accurately how this directive works, some nomenclature is needed:
- +voice is a feature, and references a series of graphemes.
- -voice is an antipode, and it is the antipode counterpart of the voice feature.
- In the sound-change, in BEFORE, the -voice feature is being targeted inside a matrix using curly brackets { and }. A matrix can contain multiple features to narrow down the graphemes being targeted. For example, {+high, +back} might target the graphemes u, ɯ
- In the sound-change, in AFTER, {+voice} has a symmetrical one-to-one correspondence of graphemes with its counterpart in BEFORE
- Let's quickly imagine a scenario where the only +voice grapheme was b. The result will be a merging of all -voice graphemes into b: tamepfa ==> bamebba. Similarly, in a different scenario where the only -voice grapheme was p, p would become the first grapheme in +voice, which happens to be b: tamepfa ==> tamebfa
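The one-to-one correspondence and the two edge cases in the list above can be sketched in Python as a position lookup. The helper shift_feature is hypothetical, and clamping the index to the last grapheme of the shorter list is an assumption that happens to cover both scenarios described; NASC may resolve unequal lists differently.

```python
def shift_feature(word: str, source: list[str], target: list[str]) -> str:
    """Map each grapheme in `source` to the grapheme at the same position in `target`."""
    out = []
    for ch in word:
        if ch in source:
            # Assumption: if `target` is shorter, fall back to its last grapheme.
            index = min(source.index(ch), len(target) - 1)
            out.append(target[index])
        else:
            out.append(ch)
    return "".join(out)


voiceless = ["p", "t", "k", "f", "s"]
voiced = ["b", "d", "g", "v", "z"]

print(shift_feature("tamefa", voiceless, voiced))   # dameva
print(shift_feature("tamepfa", voiceless, ["b"]))   # bamebba
print(shift_feature("tamepfa", ["p"], voiced))      # tamebfa
```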
Feature-pool
Feature-pools do two things at once. The graphemes that belong to a feature-pool are the graphemes defined in the features inside that feature-pool. The antipode graphemes of a feature-pool are the graphemes defined in the graphs directive that are not in that feature-pool.
Features inside a feature-pool without a defined antipode will be given one. The antipode's graphemes are the graphemes found in the feature-pool but not in the feature.
Here is an example of comprehensive features of vowels:
graphs: a, e, i, o, u, m, n, p, b, t, d, k, g, f, v, s, z, h, l, r, j
BEGIN features:
BEGIN feature-pool vowel:
high = i, u
mid = e, o
low = a
front = i, e
back = o, u, a
+foo = i, u
-foo = e, o
END
END
Here are some matrices of features and what graphemes they would capture:
- {+vowel} would target the graphemes a, e, i, o, u
- {-vowel} would target the graphemes m, n, p, b, t, d, k, g, f, v, s, z, h, l, r, j
- {+high} would target the graphemes i, u
- {-high} would target the graphemes a, e, o
- {-foo} would target the graphemes e, o
- {+high, +front} would target the grapheme i
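The feature-pool rules amount to set arithmetic: a pool is the union of its features, an undefined antipode is the pool minus the feature, and a matrix is an intersection. The Python sketch below models the vowel example on that reading; the names GRAPHS, FEATURES, ANTIPODES and matrix are illustrative only and are not part of NASC.

```python
GRAPHS = "a e i o u m n p b t d k g f v s z h l r j".split()

POOL_NAME = "vowel"
FEATURES = {
    "high": {"i", "u"}, "mid": {"e", "o"}, "low": {"a"},
    "front": {"i", "e"}, "back": {"o", "u", "a"}, "foo": {"i", "u"},
}
ANTIPODES = {"foo": {"e", "o"}}  # -foo is the only explicitly defined antipode


def feature_set(signed: str) -> set[str]:
    """Resolve one signed feature such as '+high' or '-vowel' to a set of graphemes."""
    sign, name = signed[0], signed[1:]
    pool = set().union(*FEATURES.values())
    if name == POOL_NAME:
        return pool if sign == "+" else set(GRAPHS) - pool
    if sign == "+":
        return FEATURES[name]
    # Undefined antipodes default to "in the pool but not in the feature".
    return ANTIPODES.get(name, pool - FEATURES[name])


def matrix(*signed: str) -> list[str]:
    """Intersect several signed features, keeping the order of the graphs list."""
    result = set(GRAPHS)
    for item in signed:
        result &= feature_set(item)
    return [g for g in GRAPHS if g in result]


print(matrix("+vowel"))           # ['a', 'e', 'i', 'o', 'u']
print(matrix("-high"))            # ['a', 'e', 'o']
print(matrix("-foo"))             # ['e', 'o']
print(matrix("+high", "+front"))  # ['i']
```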
10 Wildcards and positioning
10.1 Wildcard
The wildcard * will match once to any character, or multigraph defined in the graphs: directive. The wildcard does not match word boundaries and cannot be used in AFTER:
a -> e / _*#
; apappap ==> apappep
10.2 Ditto-mark
The ditto-mark < will match once to the grapheme, or grapheme in a set, category, or feature, to the left of it:
a< -> a
; aata ==> ata
10.3 Greedy-ditto-mark
The greedy-ditto-mark + will match as many times as possible to the grapheme, or grapheme in a set, category, or feature, to the left of it:
a+ -> a
; raraaaaa ==> rara
10.4 Anythings-mark
The anythings-mark is the ellipsis character … (U+2026). It will match any character, or multigraph defined in the graphs: directive, as many times as possible. However, it will stop matching as soon as the grapheme to the right of the anythings-mark is matched:
b…t -> x
; babãittati ==> xtati
As we can see, the rule matched b followed by anything else until it reached t, then stopped matching. The example below uses the anythings-mark in the condition:
; Simulate spreading of nasality to vowels
[a i u] -> [ã ĩ ũ] / [ã ĩ ũ](…)_
; babãittati ==> babãĩttãtĩ
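Readers familiar with regular expressions may find it helpful to compare the ditto-marks and the anythings-mark with a backreference and a lazy quantifier. The Python lines below are rough analogues only: they operate on single characters rather than on graphemes declared with graphs:, and they are not how NASC is implemented.

```python
import re

# a<  ~ "a" followed by exactly one repeat of itself
ditto = re.sub(r"(a)\1", "a", "aata")
# a+  ~ "a" followed by one or more repeats of itself
greedy_ditto = re.sub(r"(a)\1+", "a", "raraaaaa")
# b…t ~ "b", then one or more characters, stopping at the first "t"
anythings = re.sub(r"b.+?t", "x", "babãittati")

print(ditto)         # ata
print(greedy_ditto)  # rara
print(anythings)     # xtati
```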
10.5 Quantifier
The quantifier matches the item to its left the number of times given inside it.
; Change /o/ into /x/ only when preceded by three /r/s
o -> x / r=[3]_
; rrrorro ==> rrrxrro
The number in the quantifier can also be a list of numbers:
; Change /o/ into /x/ only when preceded by zero or four /r/s
o -> x / r=[0, 4]_
; orrrrorro ==> xrrrrxrro
The number in the quantifier can also be a range. To do this, put a : between the lowest and highest numbers of the range:
; Change /o/ into /x/ only when preceded by two to four /r/s
o -> x / r=[2:4]_
; rrrorro ==> rrrxrrx
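One way to picture a quantifier in PRE is as a check on the length of the run of graphemes immediately before the target. The Python sketch below takes that reading; change_after_run is a hypothetical helper, and counting the whole uninterrupted run (rather than just its last N graphemes) is an assumption about NASC's behaviour.

```python
def change_after_run(word: str, target: str, result: str,
                     unit: str, counts: set[int]) -> str:
    """Replace `target` when the run of `unit` right before it has an allowed length."""
    out = []
    run = 0  # length of the run of `unit` immediately before the current character
    for ch in word:
        out.append(result if ch == target and run in counts else ch)
        run = run + 1 if ch == unit else 0
    return "".join(out)


# r=[3]_ : exactly three /r/s before the target
print(change_after_run("rrrorro", "o", "x", "r", {3}))                  # rrrxrro
# r=[2:4]_ : two, three or four /r/s before the target
print(change_after_run("rorrorrrrro", "o", "x", "r", set(range(2, 5))))  # rorrxrrrrro
```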
Here is a useful lookup table on getting quantities of ditto-marks or wildcards:
Quantity | Wildcard | Ditto-mark |
---|---|---|
Exactly 1 of | * | < |
0 or 1 of | (*) | (<) |
1 or more of | … | + |
0, 1, or more of | (…) | (+) |
Specific number(s) of | *=[N] | <=[N] |
Number range(s) of | *=[N:N] | <=[N:N] |
10.6 Positioner
A positioner allows a grapheme to be captured only when it is the Nth occurrence in the word:
; Change the second /o/ in a word to /x/ after the second /s/
o@[2] -> x / s@[2]_
; sososo ==> sosxso
If we want to match the last occurrence of a grapheme in a word, use -1. For the second-last occurrence of a grapheme in a word, use -2, and so forth:
; Change the last /o/ in a word to /x/
o@[-1] -> x
; sososo ==> sososx
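The positioner's counting, including negative numbers counting from the end of the word, can be sketched in Python as follows. replace_nth is a hypothetical helper for single-character graphemes and ignores conditions; it only illustrates how the Nth occurrence is selected.

```python
def replace_nth(word: str, target: str, result: str, n: int) -> str:
    """Replace the nth occurrence of `target` (1-based; negative n counts from the end)."""
    positions = [i for i, ch in enumerate(word) if ch == target]
    if n == 0:
        return word  # there is no "zeroth" occurrence
    try:
        index = positions[n - 1] if n > 0 else positions[n]
    except IndexError:
        return word  # fewer occurrences than requested: leave the word alone
    return word[:index] + result + word[index + 1:]


print(replace_nth("sososo", "o", "x", 2))   # sosxso
print(replace_nth("sososo", "o", "x", -1))  # sososx
```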
11 Insertion and deletion
Insertion requires a condition to be present, and for ^ to be present in BEFORE, representing nothing:
; insert /a/ in between /b/ and /t/
^ -> a / b_t
; bt ==> bat
Deletion happens when ^ is present in AFTER:
; delete every /b/
b -> ^
; bubda ==> uda
12 Advanced sound-changes
12.1 Blocker
A blocker is designed to stop the spread of greedy, spreading behaviours: when the blocker is matched, the change does not execute. For example, we might want the graphemes k or g to prevent the rightward spread of nasality from nasal vowels to non-nasal vowels:
[a, i, u] -> [ã, ĩ, ũ] / [ã, ĩ, ũ]…~[k, g]_
; pabãdruliga ==> pabãdrũlĩga
12.2 Metathesis
Metathesis in NASC refers to the reordering of graphemes in a word. Metathesis in real-world diachronics is usually sporadic, but can be regular.
To make a rule a metathesis rule, use these symbols:
- The ampersand & marks the content (if any) between the targets we want to reorder. You must use the same number of &s in BEFORE and AFTER
- Numbers in AFTER refer to the targets. Reordering these numbers reorders the targets. It is possible to have up to nine.
- Underscores _ in a condition or exception are references to the targets. Unlike a normal rule, we can have multiple.
Local metathesis
A typical type of metathesis is local two-place metathesis:
; An intervocalic stop + nasal sequence becomes nasal + stop
[stop]&[nasal] -> 2&1 / V__V
; watna ==> wanta
Long-distance metathesis
The example below approximates metathesis that occurred in Spanish:
r&l -> 2&1 / _(…)[plosive]_
; parabla ==> palabra
One-place metathesis
To simulate one-place metathesis, move the &s.
The example below is metathesis where words beginning with stop + vowel will try to move an r out of a later stop + r cluster to form a word-initial stop + r cluster:
{stop}&r -> 12& / #_{vowel}…{stop}_
; kabatros ==> krabatos
Metathesis madness
Three or more sounds, to a maximum of 9, can switch places, with any shuffling of the &s:
x&y&z -> &&321
; xaayooz ==> aaoozyx
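How AFTER rebuilds the matched stretch can be sketched in Python once the targets and the content between them are known. The helper reorder below is hypothetical and covers only this rebuilding step; finding the targets in the word, and any condition, is left out.

```python
def reorder(targets: list[str], between: list[str], template: str) -> str:
    """Rebuild a match from an AFTER template such as '2&1', '12&' or '&&321'.

    Digits pick a target (1-based); each '&' emits the next stretch of
    in-between content, in its original order.
    """
    gaps = iter(between)
    out = []
    for symbol in template:
        if symbol == "&":
            out.append(next(gaps))
        else:
            out.append(targets[int(symbol) - 1])
    return "".join(out)


print(reorder(["t", "n"], [""], "2&1"))                # nt      (watna ==> wanta)
print(reorder(["k", "r"], ["abat"], "12&"))            # krabat  (kabatros ==> krabatos)
print(reorder(["x", "y", "z"], ["aa", "oo"], "&&321"))  # aaoozyx
```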
13 Logic blocks
Logic blocks are a way of executing sound changes depending on a trigger event that we are listening for.
13.1 If block
Using an if block, you can make sound changes execute on a word if, or if not, other sound change(s) were applied to the word.
It should feel familiar to anyone who knows a bit about programming languages:
- BEGIN if: starts the if block. The sound changes placed here are listened to, and trigger other events on the word depending on whether or not they executed on that word
- then: is where you put sound changes that will execute if the sound changes in if: did apply
- else: is where you put sound changes that will execute if the sound changes in if: did not apply
- END is the end of the block
For example:
BEGIN if:
; Deletion of schwa before r
ə -> ^ / _r
then:
; Then do metathesis of r and l
r&l -> 2&1 / _&[plosive]_
else:
; Schwa becomes e if the first rule did not apply
ə -> e
END
Note: The above example is actually quite bogus if it were a historical sound change. Sound change in natural diachronics has no memory. We can have "two-part" sound-changes such as this triggered metathesis, but a sound change executing on a word because another sound change did not apply to the word does not occur, at least not in real-life natural human languages.
13.2 Chance block
The chance block is a way to apply sound-change depending on percentage-based chance:
BEGIN chance 15:
a -> e
END
In the above example, words with an a in them, such as pa, have a 15% chance of becoming pe.
13.3 Rule macro
A rule macro saves rules to be used later in the file as many times as needed. The rules inside the def-rule-macro block do not run until invoked using do-rule-macro:
BEGIN def-rule-macro resyllabify:
i -> j / _[a,e,o,u]
u -> w / _[a,e,i,o]
END
do-rule-macro: resyllabify
ʔ -> ^
do-rule-macro: resyllabify ; iaruʔitua ==> jaruʔitwa ==> jaruitwa ==> jarwitwa
In the above example we saved two rules as a macro under the name "resyllabify" and used that macro twice.
14 Cluster-field
Cluster-fields are a way to target and change sequences of graphemes. They are laid out like tables, and start with %. For example:
% a i u
a + + o
i - + uu
u - - +
The first grapheme is the row, and the second grapheme is the column. In this example, au becomes o and iu becomes uu. + means to leave the combination as-is, and - means to reject the word. This table would permit ai but reject ia.
Cluster-fields can also use ^ in them to remove a sequence.
As with filters, these are parsed in the order presented. The cluster-field ends at a blank line.
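To make the row-then-column reading concrete, here is a small Python sketch that turns the example cluster-field into a lookup from two-grapheme sequences to outcomes. It assumes the table is written one row per line with cells separated by spaces; parse_cluster_field is a hypothetical name, and applying the outcomes to words is not shown.

```python
FIELD = """\
% a i u
a + + o
i - + uu
u - - +
"""


def parse_cluster_field(text: str) -> dict[str, str]:
    """Map each row+column sequence to its cell: '+', '-', or a replacement."""
    lines = [line.split() for line in text.strip().splitlines()]
    columns = lines[0][1:]  # skip the leading %
    table = {}
    for row, *cells in lines[1:]:
        for column, cell in zip(columns, cells):
            table[row + column] = cell
    return table


table = parse_cluster_field(FIELD)
print(table["au"], table["iu"], table["ai"], table["ia"])  # o uu + -
```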
15 Engine
The engine statement provides useful functions that you can call at any point in the file. You can also call a list of these functions in one line, e.g. engine: compose, capitalise
- decompose will break down all characters in a word into their "Unicode Normalization, Canonical Decomposition" form. For example, ñ as a single Unicode character, \u00F1, will be broken down into a sequence of two characters, n \u006E + ◌̃ \u0303. The TypeScript method is normalize("NFD")
- compose does the opposite of decompose: it converts all characters in a word to the "Unicode Normalization, Canonical Decomposition followed by Canonical Composition" form. For example, ñ as two characters, \u006E\u0303, will be transformed into one character, \u00F1
- capitalise will convert the first character to uppercase.
- de-capitalise will convert the first character to lowercase.
- to-upper-case will convert all characters to uppercase.
- to-lower-case will convert all characters to lowercase.
- xsampa_to_ipa will convert graphemes written in X-SAMPA into IPA
- ipa_to_xsampa will convert graphemes written in IPA into X-SAMPA
- unicode_entities will convert the HTML name of a Unicode entity, written after a backslash \ instead of its usual ampersand, into the corresponding character. For example, \Agrave will make À
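The effect of decompose and compose can be checked outside NASC with any Unicode library. The sketch below uses Python's standard unicodedata module as a stand-in for the normalisation NASC performs; it illustrates the two forms and is not NASC code.

```python
import unicodedata

composed = "\u00F1"  # ñ as a single code point

# decompose: canonical decomposition (NFD) splits ñ into n + combining tilde
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006E', 'U+0303']

# compose: canonical composition (NFC) joins them back into one code point
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
```

Running decompose before rules that target base letters, and compose afterwards, is one way such functions can be combined.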
16 Character escape
Characters enclosed in a set of double quotes lose any special meaning they might have had, including double quotes themselves. This way, anything, including brackets and even spaces, can be changed or created.
These are the characters you must escape if you want to use them as graphemes:
Characters | Meaning |
---|---|
; | Comment |
> , -> , => , ⇒ , → | Indicates change |
, | Separates choices |
[ , ] | Set |
( , ) | Optional set |
^REJECT | Rejects a word |
/ | Condition |
_ | Reference to the target |
# | Word boundary |
$ | Syllable boundary |
! | Exception |
{ , } | Category |
* | Wildcard, matches exactly 1 of any character |
< | Ditto-mark, matches exactly 1 of the previous character |
+ | Greedy-ditto-mark, matches 1 or more of the previous character |
… | Anythings-mark, matches 1 or more of any character, equivalent to *(+) |
=[ , ] | Quantifier |
@[ , ] | Positioner |
^ | Insertion when in BEFORE, deletion when in AFTER |
~[ , ] | Blocker |
& | Indicates metathesis, and the reordered contents |
1 , 2 , ... 9 | In a metathesis rule, in AFTER, these represent the changing graphemes |
" | Escapes characters enclosed in them |