Nesca documentation
Version 0.3.3
Contents
- About Nesca
- Interface
- Using comments
- About graphemes
- Defining graphemes
- Transform
- The change
- Insertion and deletion
- The condition
- The exception
- Categories
- Features
- Alternator and Optionalator
- Cluster-field
- Wildcard, repetition and positioning
- Advanced rules
1 About Nesca
This is the complete documentation for Nesca version 0.3.3.
Nesca is a "sound change applier": it takes a set of transformation rules and applies them to words to simulate historical sound changes or similar processes. It also offers other word-modification utilities, including capitalisation and X-SAMPA to IPA conversion. Nesca is an easy-to-use but powerful tool for conlangers and linguists.
2 Interface
- The textbox at the bottom of the program is the definition-build editor. A definition-build defines the sound changes. The editor starts out containing either a default definition-build or the previous definition-build that you applied to words
- The Input words textbox is where you list all the words you want sound changes applied to
- The Output words textbox is where your changed words will appear
- Use the Apply button to see Nesca apply your sound changes to your words
- Use the Copy button to copy the words in Output words to your clipboard
2.1 Options
- Word-list mode will produce a list of changed words
- Old-to-new mode will produce a list of changed words in the format old word -> new word
- Debug mode will show, line by line, each step in changing each word
- Show keyboard will reveal a 'keyboard', a character selector, below the editor
- Editor wrap lines will make the definition-build editor wrap a line onto the next line when it exceeds the width of the editor
- Word divider sets the delimiter, or in other words, the content that separates each word. It is a newline by default. Use \n for newline
2.2 File save / load
Use the Save button to download your sound changes as a file called 'Nesca.txt', or whatever you named your file in the File name: field. The file is always a ".txt" type.
Use the Load button to load a file on your system into the file editor.
3 Using comments
Comments are made with a semicolon ; and anything after it on the same line is treated as a comment. Comments are useful to explain what a rule does.
; This is a comment
e > o ;and this is a comment following a rule.
4 About graphemes
Graphemes are the indivisible, meaningful characters that make up a word. Phonemes can be thought of as graphemes. Taking the English words sky and shy as examples, sky is made up of the graphemes s + k + y, while shy is made up of sh + y.
4.2 Escaping characters
A single character that follows the syntax character \ loses any special meaning it would otherwise have in the program. This includes the backslash itself. This way, anything can be a grapheme, including capital letters that have already been defined as categories, brackets, and even spaces.
4.2.1 Category and feature character escape
These are the characters you must escape if you want to use them in categories and features:
Characters | Meaning |
---|---|
; | Comment |
\ | Escapes a character after it |
"{ and } | Named escape |
C , D , K , ... | Any one-length capital letter can refer to a category |
+ or - or > ... | A feature |
+- | Begins a feature-field |
, or (space) | Separates choices |
4.2.2 Transform character escape
These are the characters you must escape if you want to use them in the transform block:
Characters | Meaning |
---|---|
; | Comment |
\ | Escapes a character after it |
"{ and } | Named escape |
> , -> , => , ⇒ or → | Indicates change |
, or (space) | Separates choices |
[ and ] | Alternator-set |
( and ) | Optionalator-set |
C , D , K , ... | Any one-length capital letter can refer to a category |
@{ and } | A feature-matrix |
^ or ∅ | Insertion when in TARGET, deletion when in RESULT |
^REJECT or ^R | Rejects a word |
/ | A condition follows this character |
? | A chance condition follows this character |
! or // | An exception follows this character |
_ | The underscore _ is a reference to the target |
# | Word boundary |
+ | Quantifier, matches 1 or more of the previous grapheme |
+{ and } | Bounded quantifier |
: | Geminate-mark, matches the previous grapheme twice |
* | Wildcard, matches exactly 1 of any grapheme |
& or … | Anythings-mark, matches 1 or more wildcards non-greedily |
&{ or …{ and } | Blocked-anythings-mark |
| | Engines are placed after this character, and a space |
< | Target-reference |
~ | Indicates a metathesis change |
4.2.3 Named escape
Named escapes, enclosed in "{ and }, allow spaces and combining diacritics to be used without needing to type these characters directly.
The supported characters are:
Named escape | Character |
---|---|
"{Space} | (space) |
"{Acute} | ◌́ |
"{DoubleAcute} | ◌̋ |
"{Grave} | ◌̀ |
"{DoubleGrave} | ◌̏ |
"{Circumflex} | ◌̂ |
"{Caron} | ◌̌ |
"{Breve} | ◌̆ |
"{InvertedBreve} | ◌̑ |
"{TildeAbove} | ◌̃ |
"{TildeBelow} | ◌̰ |
"{Macron} | ◌̄ |
"{Dot} | ◌̇ |
"{DotBelow} | ◌̣ |
"{Diaeresis} | ◌̈ |
"{DiaeresisBelow} | ◌̤ |
"{Ring} | ◌̊ |
"{RingBelow} | ◌̥ |
"{Horn} | ◌̛ |
"{Hook} | ◌̉ |
"{CommaAbove} | ◌̓ |
"{CommaBelow} | ◌̦ |
"{Cedilla} | ◌̧ |
"{Ogonek} | ◌̨ |
If you use named escapes for combining diacritics, you will probably also be interested in the compose engine.
5 Defining graphemes
The graphemes: directive tells Nesca which (multi)graphs, including a character plus combining diacritics, are to be treated as grapheme units when using transformations.
For example, suppose ch has been defined as a grapheme. This would stop a rule such as c -> g from changing the word chat into ghat, but the rule would still change cobra into gobra.
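The effect of declaring a multigraph can be modelled outside Nesca. The Python sketch below is only an illustration of the idea, not Nesca's implementation: the function names are hypothetical, and the longest-match-first strategy is an assumption.

```python
def split_graphemes(word, graphemes):
    """Split a word into grapheme units, preferring the longest declared match."""
    # Assumption: longer declared graphemes win, so "ch" is tried before "c".
    ordered = sorted(graphemes, key=len, reverse=True)
    units, i = [], 0
    while i < len(word):
        for g in ordered:
            if g and word.startswith(g, i):
                units.append(g)
                i += len(g)
                break
        else:
            # No declared grapheme matched here: fall back to one character.
            units.append(word[i])
            i += 1
    return units


def apply_simple_rule(word, target, result, graphemes):
    """Replace every grapheme unit equal to `target` with `result`."""
    return "".join(result if unit == target else unit
                   for unit in split_graphemes(word, graphemes))


declared = ["ch"]  # hypothetical stand-in for a graphemes: directive declaring ch
print(apply_simple_rule("chat", "c", "g", declared))   # chat
print(apply_simple_rule("cobra", "c", "g", declared))  # gobra
```

Because chat is split into ch + a + t, there is no standalone c for the rule to match, whereas cobra still begins with a plain c.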
6 Transform
All transforms must be placed inside a transform block, which begins with a BEGIN transform: line. To terminate this block, use an END line. However, all unterminated blocks are automatically terminated at the end of the definition-build:
BEGIN transform:
; Your rules go here
END
A rule can be summarised in three fields: CHANGE / CONDITION ! EXCEPTION. The characters / and ! that precede each field (except for CHANGE) are necessary for signalling each field. For example, including a ! will signal that this rule contains an exception, and all text following it until the next field marker will be interpreted as such.
Every rule begins on a new line and must contain a CHANGE. The CONDITION and EXCEPTION fields are optional.
If you want to capture graphemes that are normally syntax characters in transforms, you will need to escape them.
When this document uses examples to explain transformations, the last comment shows an example word transforming. For example, ; amda ==> ampa means the rule will transform the word amda into ampa.
7 The change
The format of the change can be expressed as TARGET -> RESULT.
- TARGET specifies which part of the word is being changed
- TARGET is followed by a space and the > character. > can be swapped with ->, =>, ⇒ or → if you prefer
- RESULT is what TARGET changes into, or in other words, what replaces it
Let's look at a simple unconditional rule:
o -> x
; bodido ==> bxdidx
In this rule, we see every instance of o become x.
7.1 Concurrent change
Concurrent change is achieved by listing multiple graphemes in TARGET separated by commas, and listing the same number of resultant graphemes in RESULT separated by commas. Changes in a concurrent change execute at the same time:
o, a -> a, o
; boda ==> bado
Notice that the above example is different from the example below, where each change is on its own line:
o -> a
a -> o
; boda ==> bodo
Here we can see o merge with a, and then every a becomes o.
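The difference between a concurrent change and two separate rules can be sketched in Python. This is an illustrative model only, assuming single-character graphemes; the function names are hypothetical and do not come from Nesca.

```python
def concurrent_change(word, targets, results):
    """Apply all target -> result pairs in a single pass over the word."""
    mapping = dict(zip(targets, results))
    # Each character is looked up once, so an output is never re-read as input.
    return "".join(mapping.get(ch, ch) for ch in word)


def sequential_change(word, rules):
    """Apply each (target, result) rule in order, one full pass per rule."""
    for target, result in rules:
        word = word.replace(target, result)
    return word


print(concurrent_change("boda", ["o", "a"], ["a", "o"]))    # bado
print(sequential_change("boda", [("o", "a"), ("a", "o")]))  # bodo
```

In the sequential version the first pass produces bada, so the second pass can no longer tell the original a from the a that used to be o.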
7.2 Merging change
Instead of listing each RESULT in a concurrent change, we can list just one that all the TARGETs will merge into:
o, a -> x
; boda ==> bxdx
This is equivalent to:
o, a -> x, x
; boda ==> bxdx
7.3 Reject
To remove, or in other words, reject a word, use the ^REJECT keyword in RESULT. For example, a rule whose TARGET lists a and bi and whose RESULT is ^REJECT will reject any word that contains a or bi.
A shorthand version of ^REJECT is ^R.
8 Insertion and deletion
Insertion requires a condition to be present, and a caret ^ in TARGET, representing nothing.
Deletion happens when ^ is present in RESULT.
9 The condition
Conditions follow the change and are placed after a forward slash. When a transform has a condition, the target must meet the environment described in the condition for the transform to execute.
The format of a condition is / BEFORE_AFTER
- A forward slash / begins a condition
- BEFORE is anything in the word before the target
- The underscore _ is a reference to the target in a condition
- AFTER is anything in the word after the target
For example:
o -> x / p_p
; opoptot ==> opxptot
9.1 Multiple conditions in one rule
Multiple conditions for a single rule can be made by separating each condition with additional forward slashes. The change will happen if the target meets either or both of the conditions:
o -> x / p_p / t_t
; opoptot ==> opxptxt
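How a target is tested against one or more environments can be sketched in Python. This is a simplified model rather than Nesca's matcher: it assumes a single-character target and literal BEFORE and AFTER strings, and the function name is hypothetical.

```python
def apply_conditioned(word, target, result, conditions):
    """Replace `target` with `result` wherever any (before, after) environment fits."""
    out = []
    for i, ch in enumerate(word):
        # Environments are checked against the original word, not the partial output.
        if ch == target and any(
            word[:i].endswith(before) and word[i + 1:].startswith(after)
            for before, after in conditions
        ):
            out.append(result)
        else:
            out.append(ch)
    return "".join(out)


print(apply_conditioned("opoptot", "o", "x", [("p", "p")]))              # opxptot
print(apply_conditioned("opoptot", "o", "x", [("p", "p"), ("t", "t")]))  # opxptxt
```

The second call mirrors a rule with two conditions: the o between two p and the o between two t both change, while the word-initial o meets neither environment.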
9.2 Word boundary
Hash # matches word boundaries: the beginning of the word if it is in BEFORE, or the end of the word if it is in AFTER.
; opoppop ==> opoppxp
9.3 The chance condition
The chance condition is a number from 0 to 100 placed after a ?. This number represents the percentage chance of the transformation occurring. For example, a chance condition of 30 means the transformation will execute only 30% of the time.
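A chance condition can be modelled as a random draw against a percentage. The Python sketch below is an assumption-laden illustration: the function name is hypothetical, and it draws once per word, which may not be how Nesca decides.

```python
import random


def apply_with_chance(word, target, result, chance, rng=random):
    """Apply target -> result to the word with `chance` percent probability."""
    # rng.random() lies in [0.0, 1.0), so chance=0 never fires and chance=100 always does.
    if rng.random() * 100 < chance:
        return word.replace(target, result)
    return word


rng = random.Random(0)  # seeded only so that the demonstration is repeatable
trials = 10_000
hits = sum(apply_with_chance("bodo", "o", "x", 30, rng) == "bxdx" for _ in range(trials))
print(hits / trials)  # close to 0.30
```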
10 The exception
Exceptions are placed following an exclamation mark ! and go after the condition, if there is one. Exceptions function as the exact opposite of conditions -- when a transform has an exception, the target must meet the environment described in the exception to prevent execution. For example, an exception can stop a transformation from executing when aa is at the end of the word.
An alternative to using an exclamation mark is to use two forward slashes //.
11 Categories
A category is a set of graphemes with a key. The key is a single capital letter. Categories must be defined outside of the transform block. For example, a definition-build might define three categories: C as the group of all consonants, V as the group of all vowels, and F as the group of some of the consonants that will be used syllable-finally.
The graphemes in a category are separated by commas. An alternative is to use spaces: C = t n k m ch l ꞌ s r d h w b y p g.
Need more than 26 categories? Nesca supports the following additional characters as the key of a category or segment: Á Ć É Ǵ Í Ḱ Ĺ Ḿ Ń Ó Ṕ Ŕ Ś Ú Ẃ Ý Ź À È Ì Ǹ Ò Ù Ẁ Ỳ Ǎ Č Ď Ě Ǧ Ȟ Ǐ Ǩ Ľ Ň Ǒ Ř Š Ť Ǔ Ž Ä Ë Ḧ Ï Ö Ü Ẅ Ẍ Ÿ Γ Δ Θ Λ Ξ Π Σ Φ Ψ Ω
You can also use categories inside categories.
11.1 Using categories
You can reference categories in transforms. The category will behave in the same way as an alternator-set:
BEGIN transform:
B -> ^
; xapay ==> apa
If the category is part of a target, it MUST be inside an alternator set:
BEGIN transform:
[B]v -> ^
; xvapay ==> apay
12 Features
Let's say you had the grapheme, or rather, phoneme /i/ and wanted to capture it by its distinctive vowel features, +high and +front, and turn it into a phoneme marked with +high and +back features, perhaps /ɯ/. Features let you do this.
The key of a feature must consist only of lowercase letters a to z, uppercase letters A to Z, ., - or +
12.1 Pro-feature
A feature prepended with a plus sign + is a 'pro-feature', for example +voice. We can define a set of graphemes that are marked by this feature by using this pro-feature.
12.2 Anti-feature
A feature prepended with a minus sign - is an 'anti-feature', for example -voice. We can define a set of graphemes that are marked by a lack of this feature by using this anti-feature.
12.3 Para-feature
A feature prepended with a greater-than sign > is a 'para-feature'. A para-feature is simply a pro-feature where the graphemes marked as the anti-feature of this feature are the graphemes in the graphemes: directive that are not marked by this para-feature:
>vowel = a, i, o
Is equivalent to the below example:
+vowel = a, i, o
-vowel = b, h, k, n, t
'Where does this leave graphemes that are not marked by either the pro-feature or the anti-feature of a feature?', you might ask. Such graphemes are unmarked by that feature.
12.4 Referencing features inside features
Features can be referenced inside features. For example:
+non-yod = +vowel, ^i
Use a caret in front of a grapheme to ensure that the grapheme is not part of the pro/anti/para-feature. In the example above, the pro-feature '+non-yod' is composed of the graphemes a and o -- the grapheme i is not part of this pro-feature. Due to the recursive nature of nested features, this removed grapheme will be removed... aggressively. For example, if +non-yod were to be referenced in a different feature, that feature would never have i as a grapheme.
12.5 Using features
To capture graphemes that are marked by features in a transform, the features must be listed in a 'feature-matrix' surrounded by @{ and }. A grapheme in a word must be marked by every pro-/anti-feature in the feature-matrix to be captured. For example, if a feature-matrix @{+high, +back} captures the graphemes u, ɯ, another feature-matrix @{+high, +back, -round} would capture ɯ only.
A very simple rule can change all voiceless graphemes that have a voiced counterpart into their voiced counterparts by using @{-voice} in TARGET and @{+voice} in RESULT.
In such a rule, @{+voice} in RESULT has a symmetrical one-to-one change of graphemes from the graphemes in @{-voice} in TARGET, leading to a concurrent change. Let's quickly imagine a scenario where the only @{+voice} grapheme was b. The result will be a merging of all -voice graphemes into b: tamepfa ==> bamebba.
12.6 Feature-field
Feature-fields allow graphemes to be easily marked by multiple features in table format.
The feature-field begins with a +- and a space. The graphemes being marked by the features are listed on the first row. The features are listed in the first column.
For example:
+-         m n p b t d k g s h l j
voice      + + - + - + - + - - + +
plosive    - - + + + + + + - - - -
nasal      + + - - - - - - - - - -
fricative  - - - - - - - - + + - -
approx     - - - - - - - - - - + +
labial     + - + + - - - - - - - -
alveolar   - + - - + + - - + - + -
palatal    - - - - - - - - - - - +
velar      - - - - - - + + - - - -
glottal    - - - - - - - - - + - -

+-     a e i o
high   - - + -
mid    - + - +
low    + - - -
front  - + + -
back   + - - +
round  - - - +
- A + means to mark the grapheme by that feature's pro-feature
- A - means to mark the grapheme by that feature's anti-feature
- A . means to leave the grapheme unmarked by that feature
Here are some matrices of these features and which graphemes they would capture:
- @{+plosive} captures the graphemes b, d, g, p, t, k
- @{+voice, +plosive} captures the graphemes b, d, g
- @{+voice, +labial, +plosive} captures the grapheme b
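The way a feature-matrix narrows down graphemes can be modelled as set intersection. The Python sketch below is a hedged illustration, not Nesca's parser: the function names are hypothetical, and only three rows of the consonant feature-field are reproduced to keep it short.

```python
FIELD = """\
+- m n p b t d k g s h l j
voice + + - + - + - + - - + +
plosive - - + + + + + + - - - -
labial + - + + - - - - - - - -
"""


def parse_feature_field(text):
    """Return a dict such as {"+voice": {...}, "-voice": {...}} from a feature-field."""
    rows = [line.split() for line in text.strip().splitlines()]
    graphemes = rows[0][1:]  # skip the leading +- marker
    features = {}
    for name, *marks in rows[1:]:
        for grapheme, mark in zip(graphemes, marks):
            if mark in ("+", "-"):  # a "." leaves the grapheme unmarked
                features.setdefault(mark + name, set()).add(grapheme)
    return features


def capture(features, matrix):
    """Graphemes that are marked by every feature listed in the matrix."""
    sets = [features.get(feature, set()) for feature in matrix]
    return set.intersection(*sets) if sets else set()


features = parse_feature_field(FIELD)
print(sorted(capture(features, ["+plosive"])))                      # ['b', 'd', 'g', 'k', 'p', 't']
print(sorted(capture(features, ["+voice", "+plosive"])))            # ['b', 'd', 'g']
print(sorted(capture(features, ["+voice", "+labial", "+plosive"]))) # ['b']
```

Each additional feature in the matrix can only shrink the captured set, which is why the three-feature matrix ends up with b alone.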
13 Alternator and Optionalator
These cannot be nested.
13.1 Alternator-set
Enclosed in square brackets, [ and ], only one item in an alternator-set will be part of each sequence, so a rule that contains an alternator-set is equivalent to writing out each alternative sequence in full.
Alternator-sets can also be used in exceptions and conditions.
13.2 Optionalator-set
Items in an optionalator-set, enclosed in ( and ), can be captured whether or not they appear as part of a grapheme or as part of a sequence of graphemes:
x(w) -> k
; xwaxaħa ==> kakaħa
An optionalator-set can also attach to an alternator-set:
[x, ħ](w) -> k
; xwaxaħa ==> kakaka
An optionalator-set cannot be used on its own; it must be connected to other content.
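The two optionalator rules above behave much like an optional element in a regular expression. The Python sketch below uses the standard re module as an analogy only. It assumes every grapheme is a single character and says nothing about how Nesca itself matches.

```python
import re

word = "xwaxaħa"

# x(w) -> k : an x, optionally followed by w
print(re.sub(r"xw?", "k", word))     # kakaħa

# [x, ħ](w) -> k : either x or ħ, optionally followed by w
print(re.sub(r"[xħ]w?", "k", word))  # kakaka
```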
14 Cluster-field
Cluster-field is a way to target sequences of graphemes and change them. They are laid out as tables, and start with % followed by a space. The first part of a sequence is in the first column, and the second part is in the first row. For example:
- In this example, np becomes mp and mt becomes nt
- + means to not change the target cluster at all
- - means to reject the word if it contains that sequence
- Cluster-fields can use ^ or ∅ to delete the target sequence
- These are executed concurrently, just like concurrent changes. Their order does not matter
- Cluster-fields can also use conditions and exceptions; just put them on their own line
15 Wildcards, repetition and positioning
Wildcards and the like in this section are special tokens that can represent arbitrary numbers of arbitrary graphemes. This is especially useful when you don't know precisely how many graphemes, or what kind of graphemes, there will be between two target graphemes in a word.
15.1 Quantifier
Quantifier, using +, will match the grapheme to its left once or as many times as possible. Quantifier cannot be used in RESULT:
; raraaaaa ==> roro
15.2 Bounded quantifier
The bounded quantifier matches the item to its left as many times as the digit(s) enclosed in +{ and }.
o -> x / r+{3}_
; ororrro ==> ororrrx
The digits in the quantifier can also be a range:
o+{2,4} -> x
; tootooooo ==> txtxo
At the beginning of the list, , represents all the possible numbers lower than the number to the right, not including zero.
o+{,4} -> x
; tootooooo ==> txtx
And finally, at the end of the list, , represents all possible numbers larger than the number to the left.
o+{4,} -> x
; toootooooo ==> toootx
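Ranged and open-ended bounded quantifiers resemble the {m,n} and {m,} repetition of regular expressions. The Python sketch below builds such patterns with the standard re module as an analogy. The helper name bounded is hypothetical, and Nesca's own matching may differ in edge cases.

```python
import re


def bounded(grapheme, low, high=None):
    """Regex for `grapheme` repeated low..high times (no upper limit if high is None)."""
    upper = "" if high is None else str(high)
    return f"(?:{re.escape(grapheme)}){{{low},{upper}}}"


# o+{2,4} -> x : two to four consecutive o, matched greedily
print(re.sub(bounded("o", 2, 4), "x", "tootooooo"))  # txtxo

# o+{4,} -> x : four or more consecutive o
print(re.sub(bounded("o", 4), "x", "toootooooo"))    # toootx
```

In the first call the run of five o is consumed four at a time, leaving a single o that is too short to match again.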
15.3 Geminate-mark
Geminate-mark, using colon :, will match twice the grapheme, or a grapheme from a set or category, to the left of it.
; aaata ==> oata
Unlike quantifier, a geminate-mark can be used in RESULT:
; tat ==> taat
15.4 Kleene-star
Occasionally, you may want to match a grapheme whether it is absent, occurs once, or occurs several times consecutively. This is known as a "Kleene-star". There is no dedicated character for a Kleene-star. Instead, wrap the content, followed by a quantifier, in an optionalator:
; ruaruaaaaa ==> roro
15.5 Wildcard
Wildcard, using asterisk *, will match once to any grapheme. Wildcard does not match word boundaries. Wildcard cannot be used in RESULT:
; Any grapheme becomes /x/ when any grapheme follows it
* -> x / _*
; aomp ==> xxxp
Wildcard can be placed by itself inside an optionalator (*), thereby allowing it to match nothing as well.
15.6 Anythings-mark
The anythings-mark uses ampersand & or the ellipsis character … (U+2026). It will match any grapheme as many times as needed, but at least once. For example:
; babitto ==> xto
As we can see, the rule matched b followed by anything else until it reached the first t, then stopped matching. Why did the anythings-mark not continue matching t and beyond like *+ would? This is because it is non-greedy, or in other words, lazy. The anythings-mark will continue matching graphemes only until the next grapheme matches the item that follows the anythings-mark.
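The contrast between lazy and greedy matching is the same one found in regular expressions. The Python sketch below uses the standard re module purely as an analogy. It assumes a target of b, then the anythings-mark, then t, with single-character graphemes.

```python
import re

word = "babitto"

# Lazy: stop at the first t that can end the match, as the anythings-mark does
print(re.sub(r"b.+?t", "x", word))  # xto

# Greedy: run on to the last t that can end the match
print(re.sub(r"b.+t", "x", word))   # xo
```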
The example below uses an optional anythings-mark in the condition:
a, i, u -> ã, ĩ, ũ / [ã, ĩ, ũ]&_
; pabãdruliga ==> pabãdrũlĩgã
15.7 Blocked-anythings-mark
Blocked-anythings-mark is designed to block the spreading behaviour of the anythings-mark when certain graphemes are ahead of it. You enclose a set of graphemes that will block spreading inside &{ and }. For example, we might want the graphemes k or g to prevent the rightward spread of nasal vowels to non-nasal vowels:
; pabãdruliga ==> pabãdrũlĩga
16 Advanced rules
16.1 Engine
The engine statement provides useful functions that you can call at any point in the transform block. These engines are called following a | and a space on a new line. You can also call a list of these functions in one line. For example: | compose, capitalise
- decompose will break down all characters in a word into their "Unicode Normalization, Canonical Decomposition" form. For example, ñ as a single Unicode entity, \u00F1, will be broken down into a sequence of two characters, \u006E + \u0303
- compose does the opposite of decompose. It converts all characters in a word to the "Unicode Normalization, Canonical Decomposition followed by Canonical Composition" form. For example, ñ as two characters, \u006E\u0303, will be transformed into one character, \u00F1
- capitalise will convert the first character of a word to uppercase
- decapitalise will convert the first character of a word to lowercase
- to-upper-case will convert all characters of a word to uppercase
- to-lower-case will convert all characters of a word to lowercase
- xsampa_to_ipa will convert graphemes of a word written in X-SAMPA into IPA
- ipa_to_xsampa will convert graphemes of a word written in IPA into X-SAMPA
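The two normalisation forms named for decompose and compose correspond to Unicode NFD and NFC. The Python sketch below demonstrates them with the standard unicodedata module. It illustrates the Unicode forms themselves and is not a description of Nesca's internals.

```python
import unicodedata

composed = "\u00F1"  # ñ as a single code point

# Canonical Decomposition (NFD): one code point becomes base letter + combining tilde
decomposed = unicodedata.normalize("NFD", composed)
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+006E', 'U+0303']

# Canonical Decomposition followed by Canonical Composition (NFC): back to one code point
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)  # True
```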
16.2 Target-reference
A target-reference is a reference to the captured TARGET graphemes. It cannot be used in TARGET. It is written with the less-than symbol <.
Target-reference can be employed for, among other things:
- Full reduplication
- "Haplology"
- Rejecting a word when a word-initial consonant is identical to the next consonant
16.3 Metathesis change
Simple metathesis involves a tilde ~ in RESULT. This will swap the first and last grapheme from the captured TARGET graphemes:
; apma ==> ampa
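The swap performed on the captured graphemes can be sketched in Python. This is a minimal model with a hypothetical function name. It assumes the captured TARGET in the apma example is the sequence p, m.

```python
def metathesise(captured):
    """Swap the first and last grapheme of a captured sequence; the middle is kept."""
    if len(captured) < 2:
        return list(captured)
    return [captured[-1], *captured[1:-1], captured[0]]


swapped = "".join(metathesise(["p", "m"]))
print(swapped)              # mp
print("a" + swapped + "a")  # ampa
```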