
Nesca
documentation

Version 1.0.9

Contents

  1. About Nesca
  2. Interface
    1. Options
    2. File save / load
  3. Overall structure
    1. Comments
      1. The note directive
    2. About graphemes
    3. Escaping characters
      1. Transform character escape
    4. Named escape
  4. Categories
  5. Schema
  6. The alphabet directive
  7. The invisible directive
  8. The graphemes directive
  9. The stage directive
    1. Naming stages
    2. Rules
  10. The change
    1. Concurrent change
    2. Merging change
    3. Reject
  11. Insertion and deletion
  12. The condition
    1. Multiple conditions in one rule
    2. Word boundary
    3. Syllable boundary
      1. Syllable boundaries directive
  13. The exception
  14. Alternator and Optionalator
    1. Alternator-set
    2. Optionalator-set
  15. Using categories
  16. Features
    1. Pro-feature
    2. Anti-feature
    3. Para-feature
    4. Referencing features inside features
    5. Using features
    6. The feature-field directive
  17. Quantifiers and wildcards
    1. Quantifier
    2. Bounded quantifier
    3. Ditto-mark
    4. Wildcard
    5. Anythings-mark
      1. Laziness and cowardliness
  18. Cluster-field
  19. Advanced
    1. Routine
    2. Target-mark
    3. Metathesis-mark
    4. Empty-mark
    5. Reference
      1. Reference of singular grapheme
      2. Reference of grapheme sequence
    6. Associatemes
    7. Letter case field
    8. Recasts
    1 About Nesca

    This is the complete documentation for Nesca version 1.0.9.

    Nesca is a Sound Change Applier. It applies transformation rules to words to change them. It can be used for historical or fictional sound changes, to spell words differently, or to convert words to other alphabets. Nesca is an easy-to-use but powerful tool for conlangers and linguists.

    Nesca is one of the applications belonging to The Conlangers Suite. You can install Nesca for use in your own projects, or as a command-line interface, here.

    2 Interface

    • The textbox at the top of the program is the definition-build editor. A definition-build defines the sound changes. The definition-build editor will already contain either a default definition-build or the last definition-build you applied to words
    • Use the Apply button to see Nesca apply your sound changes to your words. Yes, there are two apply buttons.
    • The Clear editor button clears the definition-build editor and the input words
    • The Config button will bring you to the configuration options
    • The Input words textbox is where you list all the words you want the sound changes applied to
    • The Help button shows this document
    • The Import button imports input words from a file on your system
    • The Output words textbox is where your changed words will appear
    • Use the copy button to copy the words in Output words to your clipboard
    • Use the download button to download the words in Output words to your system

    2.1 Options

    • Word-list mode will produce a list of changed words
    • Old-to-new mode will produce a list of changed words in the format old word -> new word
    • Debug mode will show, line by line, each step in changing each word
    • Input divider sets the delimiter that separates input words. It is a newline by default. Use \n for newline
    • Output divider sets the delimiter that separates output words. It is a newline by default. Use \n for newline
    • Sort words sorts the output words in alphabetical order, or the order defined in the alphabet: directive
    • Editor wrap lines will make the definition-build editor wrap a line onto the next line if it exceeds the width of the definition-build editor
    • Show keyboard will reveal a 'keyboard', a character selector, below the options. Clicking on a character will insert that character into the editor
    • Use the buttons in the Themes dropdown to change the colour theme of the editor

    2.2 File save / load

    • Use the Save file button to download your sound changes as a file called 'Nesca.txt', or what you named your file in the File name: field. The file is always a ".txt" type.
    • Use the Load file button to load a file on your system into the file editor.
    • Use the buttons in the Examples dropdown to load an example into the definition-build editor

    3 Overall structure

    A definition-build is made up of two top-level concepts: 'directives' and 'decorators'.

    Directives are laid out like blocks and define the functions of Nesca. The primary directive is the stage directive, which modifies each word with transforms. The other directives define concepts that this primary directive uses.

    Directives are written with their name at the start of a new line, followed by a colon : on the same line, and then a line break. The payload after the directive's declaration is interpreted according to that directive's semantics. A directive ends when a new directive begins, or when there are no more lines in the definition-build. For example:

    stage:
    example

    Decorators change a property of a directive to modify the directive's behaviour.

    Decorators start on a new line above the directive they modify, with an at sign @, followed by the directive's name, a ., the property, optional whitespace, =, optional whitespace, and then the new value of the property. For a boolean flag, write just the property. For example:

    @stage.name = "Latin-to-Portuguese"
    stage:
    example

    To disable any directive, use the disabled flag decorator. This has the same effect as commenting out all the lines inside the directive:

    @stage.disabled
    stage:
    example

    3.1 Comments

    If a line contains a semicolon ; everything after it on that line is ignored and not interpreted as Nesca syntax -- unless ; is escaped. You can use this to leave notes about what something does or why you made certain decisions.

    3.1.1 The note directive

    The note directive allows you to write a comment that spans multiple lines. Be aware that a new directive or decorator will interrupt a note.

    3.2 About graphemes

    Graphemes are the indivisible, meaningful units that make up a word in Nesca. Phonemes can be thought of as graphemes. Taking the English words sky and shy as examples, sky is made up of the graphemes s + k + y, while shy is made up of sh + y.

    3.3 Escaping characters

    Any single character that follows the escape character \ loses whatever special meaning it would otherwise have in the program; this includes the backslash itself. This way, anything (including capital letters already defined as categories, and brackets, but not whitespace) can be a grapheme.

    3.3.1 Transform character escape

    These are the characters you must escape if you want to use them in the stage directive:

    3.4 Named escape

    Named escapes, enclosed in &[ and ], let you use a space or combining diacritics without having to type these characters directly.

    The supported characters are:

    If you use named escapes, you should be very interested in the compose routine.

    4 Categories

    Categories are declared inside the categories directive, one per line. A category is a set of graphemes with a key. The key is a single capital letter. For example:

    categories:
    C = t, n, k, m, ch, l, ꞌ, s, r, d, h, w, b, y, p, g

    This creates two groups of graphemes. C is the group of all consonants and V is the group of all vowels.

    Here the graphemes are separated by commas, but you can also separate them with spaces: C = t n k m ch l ꞌ s r d h w b y p g.

    Need more than 26 categories? Nesca supports the following additional characters as the key of a category or unit: Á Ć É Ǵ Í Ḱ Ĺ Ḿ Ń Ó Ṕ Ŕ Ś Ú Ẃ Ý Ź À È Ì Ǹ Ò Ù Ẁ Ỳ Ǎ Č Ď Ě Ǧ Ȟ Ǐ Ǩ Ľ Ň Ǒ Ř Š Ť Ǔ Ž Ä Ë Ḧ Ï Ö Ü Ẅ Ẍ Ÿ Γ Δ Θ Λ Ξ Π Σ Φ Ψ Ω

    You can use categories inside categories, as long as the referenced category has previously been defined. For example:

    categories:
    L = aa, ii, ee, oo
    V = a, i, e, o, L

    5 Schema

    With the schema directive, you can split each input word into fields such as "word" and "class", plus any other fields in the input, such as "meaning" or "gloss".

    For example, suppose the input words are in the format below, and this is one of them:

    wiggyfoobar [a type of spigot] "noun"

    To mark where a field is, put the field's name between a less-than sign and a greater-than sign. All other characters are treated as delimiters. word is a required field. For example:

    schema:
    input = <word> [<meaning>] "<class>"
    output = <word> "<class>"

    This means the input has the fields word, meaning and class. They are delimited by a space and [ between word and meaning, then ], a space and " between meaning and class, and a closing " after class.

    The word will then be recomposed according to the format you choose in output. You can leave out any fields you do not want.
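    As a rough model of how a schema splits and recomposes a line, the Python sketch below turns each <field> into a named regular-expression group and treats everything else as literal delimiter text. It assumes that fields match as little text as possible and that the whole line must fit the template; compile_schema and apply_schema are hypothetical names, and this is an illustration, not Nesca's implementation.

```python
import re


def compile_schema(template: str) -> "re.Pattern[str]":
    """Turn a template like '<word> [<meaning>]' into an anchored regex.

    Each <field> becomes a lazy named group; all other text is matched literally.
    """
    pattern, pos = "", 0
    for m in re.finditer(r"<(\w+)>", template):
        pattern += re.escape(template[pos:m.start()])  # literal delimiter text
        pattern += f"(?P<{m.group(1)}>.*?)"             # the field's content
        pos = m.end()
    pattern += re.escape(template[pos:])
    return re.compile(f"^{pattern}$")


def apply_schema(line: str, input_tpl: str, output_tpl: str) -> str:
    """Split `line` with the input template, then rebuild it with the output template."""
    match = compile_schema(input_tpl).match(line)
    if match is None:
        raise ValueError(f"line does not fit the schema: {line!r}")
    fields = match.groupdict()
    return re.sub(r"<(\w+)>", lambda m: fields[m.group(1)], output_tpl)


line = 'wiggyfoobar [a type of spigot] "noun"'
print(apply_schema(line, '<word> [<meaning>] "<class>"', '<word> "<class>"'))
# wiggyfoobar "noun"
```

    Fields left out of the output template, such as meaning here, are simply dropped.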

    6 The alphabet directive

    The alphabet directive provides a custom alphabetisation order for words, used when the Sort words checkbox is selected.

    alphabet:
    a, b, c, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y

    This would order generated words like so: cat, chat, cumin, frog, tray, t'a, yanny
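    A custom alphabetisation like this can be modelled as a sort key: split each word into the longest alphabet entries available (so p' and t' count as single letters), then compare the letters' positions in the alphabet. The Python sketch below assumes that characters missing from the alphabet (such as u and g) sort after every listed letter; how Nesca actually handles them is not stated here, and alphabet_key is a hypothetical name.

```python
ALPHABET = "a b c e f h i k l m n o p p' r s t t' y".split()


def alphabet_key(word, alphabet=ALPHABET):
    """Sort key ranking each letter by its position in the custom alphabet.

    Longer alphabet entries are tried first, so "t'" is one letter, not "t" + "'".
    Characters not in the alphabet rank after every listed letter (an assumption).
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(map(len, alphabet))
    key, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rank:
                key.append((rank[chunk], chunk))
                i += size
                break
        else:  # no alphabet entry starts at this position
            key.append((len(alphabet), word[i]))
            i += 1
    return key


words = ["yanny", "t'a", "tray", "frog", "cumin", "chat", "cat"]
print(sorted(words, key=alphabet_key))
# ['cat', 'chat', 'cumin', 'frog', 'tray', "t'a", 'yanny']
```

    By contrast, a plain sorted(words) would put t'a before tray, because the apostrophe has a lower code point than r.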

    7 The invisible directive

    Sometimes you will want characters, such as syllable dividers, to be invisible to alphabetisation order. You can do this by listing these characters in the invisible directive.

    invisible:
    ., '

    This will make these generated words: za'ta, 'ba.ta, 'za.ta be reordered into: 'ba.ta, za'ta, 'za.ta
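    One way to picture the invisible directive is as a sort key that deletes the listed characters before words are compared. In the hypothetical Python sketch below, words that become identical (za'ta and 'za.ta both compare as zata) keep their input order because Python's sorted is stable; that Nesca breaks ties the same way is an assumption.

```python
def visible_key(word, invisible=(".", "'")):
    """Sort key that ignores invisible characters such as syllable dividers."""
    return "".join(ch for ch in word if ch not in invisible)


words = ["za'ta", "'ba.ta", "'za.ta"]
print(sorted(words, key=visible_key))
# ["'ba.ta", "za'ta", "'za.ta"]
```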

    8 The graphemes directive

    The graphemes directive dictates which multigraphs, including characters with combining diacritics, are treated as single grapheme units in transforms.

    graphemes:
    a, b, c, ch, e, f, h, i, k, l, m, n, o, p, p', r, s, t, t', y

    In the above example, we defined ch as a grapheme. This stops a rule such as c -> g from changing the word chat into ghat, while it still changes cobra into gobra.

    The graphemes directive is also where you declare which graphemes are "associatemes" of their "bases". Read more about this in the Associatemes section of this documentation.
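    To see why declaring ch protects chat, the Python sketch below splits a word into graphemes by trying the longest declared grapheme first at each position, then applies an unconditional c -> g to whole graphemes only. Longest-match segmentation is an assumption about how Nesca resolves overlapping graphemes; tokenize and apply_change are hypothetical helpers.

```python
GRAPHEMES = {"a", "b", "c", "ch", "e", "f", "h", "i", "k", "l", "m",
             "n", "o", "p", "p'", "r", "s", "t", "t'", "y"}


def tokenize(word, graphemes=GRAPHEMES):
    """Split a word into graphemes, preferring the longest declared one.

    A character that starts no longer declared grapheme becomes a grapheme on its own.
    """
    longest = max(map(len, graphemes))
    tokens, i = [], 0
    while i < len(word):
        for size in range(min(longest, len(word) - i), 0, -1):
            if size == 1 or word[i:i + size] in graphemes:
                tokens.append(word[i:i + size])
                i += size
                break
    return tokens


def apply_change(word, target, replacement):
    """Unconditional rule target -> replacement, matched against whole graphemes."""
    return "".join(replacement if g == target else g for g in tokenize(word))


print(tokenize("chat"))                 # ['ch', 'a', 't']
print(apply_change("chat", "c", "g"))   # chat
print(apply_change("cobra", "c", "g"))  # gobra
```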

    9 The stage directive

    Once you have your words, you might want to modify them to prevent certain sequences, outright reject certain words, or simulate historical sound changes. This is the purpose of transforms, which are all declared in the stage directive:

    stage:
    ; Your transforms go here

    The default transform is a rule. These should be familiar to anyone who knows a little about phonological rules. The two other types of transforms are cluster-fields and routines.

    If you want to capture graphemes that are normally syntax characters in transforms, you will need to escape them.

    When this document uses examples to explain transformations, the last comment shows an example word being transformed. For example, ; amda ==> ampa means the rule will transform the word amda into ampa.

    9.1 Naming stages

    Using a decorator on a stage, you can name that stage. In debug mode, the name of the stage will be printed out when each word is being processed on that stage.

    @stage.name = My Transform Stage
    stage:
    ; Your transforms go here

    9.2 Rules

    A rule can be summarised as three fields: CHANGE / CONDITION ! EXCEPTION. The operators / and ! that precede each field (except for the CHANGE) are necessary for signalling each field. For example, including a ! signals that this rule contains an exception, and all text following it until the next field marker will be interpreted as such.

    Every rule begins on a new line and must contain a CHANGE. The CONDITION and EXCEPTION fields are optional.

    Rules can be wrapped onto multiple lines after the field-indicating operators:

    k -> ch / _i

    10 The change

    The format of a rule's CHANGE can be expressed as TARGET -> REPLACEMENT.

    • TARGET specifies which part of the word is being changed
    • Then followed by a hyphen and greater-than-sign ->. -> can be swapped with either >>, =>, or if you prefer
    • REPLACEMENT is what TARGET is changing into, or in other words, replacing

    Let's look at a simple unconditional rule:

    ; Replace every <o> with <x>
    o -> x
    ; bodido ==> bxdidx

    In this rule, we see every instance of o become x.

    10.1 Concurrent change

    Concurrent change is achieved by listing multiple graphemes in TARGET separated by commas, and listing the same number of replacement graphemes in REPLACEMENT separated by commas. Changes in a concurrent change execute at the same time:

    ; Switch <o> and <a> around
    o, a -> a, o
    ; boda ==> bado

    Notice that the above example is different from the example below:

    o -> a
    a -> o
    ; boda ==> bodo

    When each change is on its own line, o first merges with a, and then every a (including each former o) becomes o.
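    The difference between these two examples can be modelled in a few lines of Python: a concurrent change looks each grapheme up once in a single mapping, while separate rules each run on the previous rule's output. This sketch treats every character as one grapheme for simplicity, and the function names are hypothetical.

```python
def concurrent(word, targets, replacements):
    """o, a -> a, o : every grapheme is looked up once, in one mapping."""
    mapping = dict(zip(targets, replacements))
    return "".join(mapping.get(g, g) for g in word)


def sequential(word, changes):
    """o -> a, then a -> o : each rule sees the previous rule's output."""
    for target, replacement in changes:
        word = "".join(replacement if g == target else g for g in word)
    return word


print(concurrent("boda", ["o", "a"], ["a", "o"]))    # bado
print(sequential("boda", [("o", "a"), ("a", "o")]))  # bodo
```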

    10.2 Merging change

    Instead of listing each REPLACEMENT in a concurrent change, we can instead list just one that all the TARGETs will merge into:

    ; Merge <o> and <a> into <x>
    o, a -> x
    ; boda ==> bxdx

    This is equivalent to:

    ; Merge <o> and <a> into <x>
    o, a -> x, x
    ; boda ==> bxdx

    10.3 Reject

    To remove, or in other words, reject a word, use a zero 0 in REPLACEMENT:

    a, bi -> 0

    In the above example, any word that contains a or bi will be rejected.

    11 Insertion and deletion

    Insertion requires a condition, and a caret ^ in TARGET representing nothing:

    ; Insert <a> in between <b> and <t>
    ^ -> a / b_t
    ; bt ==> bat

    Deletion happens when ^ is present in REPLACEMENT:

    ; Delete every <b>
    b -> ^
    ; bubda ==> uda

    12 The condition

    Conditions follow the change and are placed after a forward slash. When a transform has a condition, it only executes where the target is in the environment described by the condition.

    The format of a condition is / BEFORE_AFTER

    • A forward slash / begins a condition
    • BEFORE is anything in the word before the target
    • The underscore _ is a reference to the target in a condition
    • AFTER is anything in the word after the target

    For example:

    ; Change <o> into <x> only when it is between <p>s
    o -> x / p_p
    ; opoptot ==> opxptot

    12.1 Multiple conditions in one rule

    Multiple conditions for a single rule can be made by separating each condition with additional forward slashes. The change happens wherever the target meets at least one of the conditions:

    ; Change <o> into <x> only when it is between <p>s or <t>s
    o -> x / p_p / t_t
    ; opoptot ==> opxptxt

    12.2 Word boundary

    The hash # matches a word boundary: the beginning of the word if it is in BEFORE, or the end of the word if it is in AFTER.

    o -> x / p_p#
    ; opoppop ==> opoppxp
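    The Python sketch below models how a single-grapheme rule checks its conditions: each BEFORE_AFTER pair is compared against the graphemes on either side of the target, # stands for the edge of the word, and the change happens when any condition matches. It treats each character as one grapheme and checks every position against the original word; both are simplifying assumptions, and matches_at and apply_rule are hypothetical names.

```python
def matches_at(tokens, i, before, after):
    """True if tokens[i] sits between the BEFORE and AFTER sequences.

    The word is padded with '#' so that # in a condition matches a word edge.
    """
    padded = ["#"] + tokens + ["#"]
    j = i + 1  # position of the target inside `padded`
    return (padded[max(0, j - len(before)):j] == before
            and padded[j + 1:j + 1 + len(after)] == after)


def apply_rule(word, target, replacement, conditions):
    """target -> replacement / BEFORE_AFTER / BEFORE_AFTER ..."""
    tokens = list(word)
    out = []
    for i, g in enumerate(tokens):
        hit = g == target and any(
            matches_at(tokens, i, list(before), list(after))
            for before, after in (cond.split("_") for cond in conditions))
        out.append(replacement if hit else g)
    return "".join(out)


print(apply_rule("opoptot", "o", "x", ["p_p"]))         # opxptot
print(apply_rule("opoptot", "o", "x", ["p_p", "t_t"]))  # opxptxt
print(apply_rule("opoppop", "o", "x", ["p_p#"]))        # opoppxp
```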

    12.3 Syllable boundary

    The dollar sign $ matches the character ., any of the syllable-divider graphemes stated in the syllable-boundaries directive, or, if neither matches, a word boundary: the beginning of the word if it is in BEFORE, or the end of the word if it is in AFTER.

    o -> x / p_p$
    ; o.pop.pop ==> o.pxp.pxp

    12.3.1 Syllable boundaries directive

    The syllable-boundaries directive lets you define which graphemes are to be treated as a syllable-boundary:

    syllable-boundaries:
    ., '
    stage:
    o -> x / p_p$
    ; o.pop'pop ==> o.pxp'pxp

    13 The exception

    Exceptions are placed after an exclamation mark ! and come after the condition, if there is one. An exception works as the opposite of a condition: when a rule has an exception, the rule does not execute where the target is in the environment described by the exception:

    aa -> a ! _#

    In the above example, the transformation will not execute if aa is at the end of the word.

    If there are multiple exceptions, the transform must meet all of the exceptions for it not to execute.

    An alternative to using an exclamation mark is to use two forward slashes //.
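    For literal environments, a condition and an exception behave much like a regular-expression lookaround and a negative lookaround. The Python sketch below uses this analogy to model aa -> a ! _#, and a rule that combines a condition with an exception; it is only an approximation of Nesca's matching, not how Nesca is implemented.

```python
import re

# aa -> a ! _#  : the negative lookahead (?!$) blocks the change at the word end
print(re.sub(r"aa(?!$)", "a", "baaraa"))      # baraa

# o -> x / p_ ! _p  : a condition (lookbehind) combined with an exception
print(re.sub(r"(?<=p)o(?!p)", "x", "popot"))  # popxt
```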

    14 Alternator and Optionalator

    These are sets, just like the sets used in word creation, but they cannot be nested.

    14.1 Alternator-set

    An alternator-set is enclosed in curly braces { and }. Only one item in an alternator-set will be part of each matched sequence. For example:

    p{w, j} -> pp

    The above example is equivalent to:

    pw, pj -> pp

    These can also be used in exceptions and conditions:

    a -> e / {b, d}_

    14.2 Optionalator-set

    The contents of an optionalator-set, enclosed in ( and ), are captured if they are present, but the match also succeeds when they are absent:

    ; Merge <x> and <xw> into <k>
    x(w) -> k
    ; xwaxaħa ==> kakaħa

    An optionalator-set can also attach to an alternator-set:

    ; Merge <x>, <xw>, <ħ> and <ħw> into <k>
    {x, ħ}(w) -> k
    ; xwaxaħa ==> kakaka

    An optionalator-set cannot be used on its own; it must be attached to other content.
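    Both kinds of set can be thought of as shorthand for a list of plain sequences. The Python sketch below expands each { } set into one option per item and each ( ) set into "absent or present", which reproduces the equivalence between p{w, j} -> pp and pw, pj -> pp. It assumes sets are not nested (as this section requires) and that items are plain graphemes; expand is a hypothetical helper.

```python
import re
from itertools import product


def expand(pattern):
    """List every plain sequence a pattern with {a, b} and (c) sets can match."""
    parts = [p for p in re.split(r"(\{[^}]*\}|\([^)]*\))", pattern) if p]
    choices = []
    for part in parts:
        if part.startswith("{"):    # alternator: exactly one item
            choices.append([item.strip() for item in part[1:-1].split(",")])
        elif part.startswith("("):  # optionalator: absent or present
            choices.append(["", part[1:-1].strip()])
        else:                       # plain graphemes
            choices.append([part])
    return ["".join(combo) for combo in product(*choices)]


print(expand("p{w, j}"))    # ['pw', 'pj']
print(expand("x(w)"))       # ['x', 'xw']
print(expand("{x, ħ}(w)"))  # ['x', 'xw', 'ħ', 'ħw']
```

    When such a list is matched against a word, an applier still has to decide which option wins when several start at the same position (xw versus x, for instance); this sketch leaves that question aside.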

    15 Using categories

    You can reference categories in transforms. The category will behave in the same way as an alternator set:

    categories:
    B = x, y, z
    stage:
    B -> ^
    ; xapay ==> apa

    If the category is inside a set, it MUST be listed as an item on its own:

    categories:
    B = x, y
    stage:
    {B, z}v -> ^
    ; xvayazv ==> aya

    This is to say {Bz}v -> ^ is invalid.

    16 Features

    Let's say you had the grapheme, or rather, phoneme /i/ and wanted to capture it by its distinctive vowel features, +high, -round and +front, and turn it into a phoneme marked with +high, -round and +back features, perhaps /ɯ/. The features directive and feature matrices let you do this. The features can be described as binary and "fully-specified".

    Feature keys must consist only of lowercase letters a to z, uppercase letters A to Z, ., - or +.

    16.1 Pro-feature

    A feature prepended with a plus sign + is a "pro-feature". For example +voice. We can define a set of graphemes that are marked by this feature by using this pro-feature. For example:

    features:
    +voice = b, d, g, v, z

    16.2 Anti-feature

    A feature prepended with a minus sign - is an "anti-feature". For example -voice. We can define a set of graphemes that are marked by a lack of this feature by using this anti-feature. For example:

    features:
    -voice = p, t, k, f, s

    16.3 Para-feature

    A feature prepended with a greater-than sign > is a "para-feature". A para-feature is simply a pro-feature whose anti-feature is automatically made up of every grapheme in the graphemes: directive that the para-feature does not mark:

    graphemes:
    a, b, h, i, k, n, o, t
    features:
    >vowel = a, i, o

    Is equivalent to the below example:

    features:
    +vowel = a, i, o
    -vowel = b, h, k, n, t
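    The equivalence above amounts to a set difference: the anti-feature of a para-feature is every declared grapheme the para-feature does not list. A minimal Python illustration, with hypothetical variable names:

```python
graphemes = ["a", "b", "h", "i", "k", "n", "o", "t"]  # graphemes: directive
pro_vowel = ["a", "i", "o"]                           # >vowel = a, i, o

# The anti-feature -vowel is filled in with every other declared grapheme
anti_vowel = [g for g in graphemes if g not in pro_vowel]
print(anti_vowel)  # ['b', 'h', 'k', 'n', 't']
```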

    'Where does this leave graphemes that are not marked by either the pro-feature or the anti-feature of a feature?', you might ask. Such graphemes are unmarked by that feature.

    16.4 Referencing features inside features

    Features can be referenced inside features. For example:

    features:
    +vowel = a, i, o
    +non-yod = +vowel, ^i

    Use a caret in front of a grapheme to exclude that grapheme from the pro-, anti- or para-feature. In the example above, the pro-feature "+non-yod" is made up of the graphemes a and o; the grapheme i is not part of it. Because nested features are resolved recursively, a removed grapheme is removed... aggressively. For example, if +non-yod were referenced in a different feature, that feature would never include i as a grapheme.

    16.5 Using features

    To capture graphemes that are marked by features in a transform, the features must be listed in a "feature-matrix" surrounded by [ and ]. The graphemes in a word must be marked by each pro-/anti-feature in the feature-matrix to be captured. For example if a feature-matrix [+high, +back] captures the graphemes: u, ɯ, another feature-matrix [+high, +back, -round] would capture ɯ only.

    The very simple example below is written to change all voiceless graphemes that have a voiced counterpart into their voiced counterparts:

    features:
    -voice = p, t, k, f, s
    +voice = b, d, g, v, z
    stage:
    [-voice] -> [+voice]
    ; tamefa ==> dameva

    In this rule, in REPLACEMENT, [+voice] has a symmetrical one-to-one change of graphemes from the graphemes in [-voice] in TARGET, leading to a concurrent change. Let's quickly imagine a scenario where the only [+voice] grapheme was b. The result will be a merging of all -voice graphemes into b: tamepfa ==> bamebba.

    It should be noted that feature-matrices in TARGET have no carryover to feature-matrices in REPLACEMENT. For example, in a bogus rule such as o -> [+high], the program will not try to transform <o> into its [+high] counterpart; instead, it will try to replace <o> with some grapheme marked as [+high], and will probably fail unless only one grapheme is marked as [+high].
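    Both behaviours in this section can be modelled with plain sets and lists. In the Python sketch below, capture returns the graphemes marked by every feature in a matrix, and feature_change pairs TARGET and REPLACEMENT graphemes by their position in each feature's list, merging everything into one grapheme when REPLACEMENT has only one. Pairing by list position is an assumption about how Nesca finds counterparts; the feature lists and function names are hypothetical.

```python
def capture(matrix, features):
    """Graphemes marked by every pro-/anti-feature in the matrix."""
    return set.intersection(*(set(features[f]) for f in matrix))


def feature_change(word, source, target, features):
    """[source] -> [target], e.g. [-voice] -> [+voice].

    Equal-length lists give a concurrent one-to-one change; a single target
    grapheme gives a merging change.
    """
    src, dst = features[source], features[target]
    if len(dst) == 1:
        dst = dst * len(src)
    mapping = dict(zip(src, dst))
    return "".join(mapping.get(ch, ch) for ch in word)


vowels = {"+high": ["i", "u", "ɯ"], "+back": ["u", "ɯ", "o"], "-round": ["i", "ɯ"]}
print(sorted(capture(["+high", "+back"], vowels)))            # ['u', 'ɯ']
print(sorted(capture(["+high", "+back", "-round"], vowels)))  # ['ɯ']

voice = {"-voice": ["p", "t", "k", "f", "s"], "+voice": ["b", "d", "g", "v", "z"]}
print(feature_change("tamefa", "-voice", "+voice", voice))     # dameva
only_b = {"-voice": voice["-voice"], "+voice": ["b"]}
print(feature_change("tamepfa", "-voice", "+voice", only_b))   # bamebba
```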

    If a feature-matrix is inside a set, it MUST be listed as an item on its own:

    features:
    +example = x, y
    stage:
    {[+example], z}v -> ^
    ; xvayazv ==> aya

    This is to say {[+example]z}v -> ^ is invalid.

    16.6 The feature-field directive

    Feature-fields allow graphemes to be easily marked by multiple features in table format.

    The graphemes being marked by the features are listed on the first row. The features are listed in the first column.

    For example:

    feature-field:
    m n p b t d k g s h l j
    voice + + - + - + - + - - + +
    plosive - - + + + + + + - - - -
    nasal + + - - - - - - - - - -
    fricative - - - - - - - - + + - -
    approx - - - - - - - - - - + +
    labial + - + + - - - - - - - -
    alveolar - + - - + + - - + - + -
    palatal - - - - - - - - - - - +
    velar - - - - - - + + - - - -
    glottal - - - - - - - - - + - -

    feature-field:
    a e i o
    high - - + -
    mid - + - +
    low + - - -
    front - + + -
    back + - - +
    round - - - +
    • A + means to mark the grapheme by that feature's pro-feature
    • A - means to mark the grapheme by that feature's anti-feature
    • A . means to leave the grapheme unmarked by that feature

    Here are some matrices of these features and which graphemes they would capture:

    • [+plosive] captures the graphemes b, d, g, p, t, k
    • [+voice, +plosive] captures the graphemes b, d, g
    • [+voice, +labial, +plosive] captures the grapheme b
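    Reading a feature-field amounts to turning each + or - cell into membership of a pro- or anti-feature. The Python sketch below parses three rows of the consonant table above (voice, plosive and labial) and reproduces the three example matrices. Whitespace-separated cells are an assumption, and parse_feature_field and capture are hypothetical names.

```python
TABLE = """\
m n p b t d k g s h l j
voice + + - + - + - + - - + +
plosive - - + + + + + + - - - -
labial + - + + - - - - - - - -"""


def parse_feature_field(text):
    """Map '+voice', '-voice', ... to sets of graphemes; '.' marks neither."""
    rows = [line.split() for line in text.splitlines()]
    graphemes, features = rows[0], {}
    for name, *marks in rows[1:]:
        for grapheme, mark in zip(graphemes, marks):
            if mark in ("+", "-"):
                features.setdefault(mark + name, set()).add(grapheme)
    return features


def capture(matrix, features):
    """Graphemes marked by every feature in the matrix."""
    return set.intersection(*(features.get(f, set()) for f in matrix))


features = parse_feature_field(TABLE)
print(sorted(capture(["+plosive"], features)))                       # ['b', 'd', 'g', 'k', 'p', 't']
print(sorted(capture(["+voice", "+plosive"], features)))             # ['b', 'd', 'g']
print(sorted(capture(["+voice", "+labial", "+plosive"], features)))  # ['b']
```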

    17 Quantifiers and wildcards

    Quantifiers and wildcards in this section are special tokens that can represent arbitrary amounts of arbitrary graphemes, which is especially useful when you don't know precisely how many graphemes, or what kind of graphemes, there will be between two target graphemes in a word.

    17.1 Quantifier

    The quantifier, +, matches the grapheme to its left one or more times, as many times as possible. The quantifier cannot be used in REPLACEMENT:

    a+ -> o
    ; raraaaaa ==> roro

    17.2 Bounded quantifier

    The bounded quantifier, written as a number enclosed in ?[ and ], matches the item to its left exactly that many times.

    ; Change <o> into <x> only when preceded by three <r>s
    o -> x / r?[3]_
    ; ororrro ==> ororrrx

    The number in the quantifier can also be a range:

    ; Change a sequence of 2 to 4 <o>s into <x>
    o?[2,4] -> x
    ; tootooooo ==> txtxo

    A comma at the start of the range represents every number from 1 up to the number on its right; zero is not included.

    ; Change a sequence of 1 to 4 <o>s into <x>
    o?[,4] -> x
    ; tootooooo ==> txtx

    Finally, a comma at the end of the range represents every number from the number on its left upwards.

    ; Change a sequence of 4 to as many as possible <o>s into <x>
    o?[4,] -> x
    ; toootooooo ==> toootx

    A bounded quantifier can be used in REPLACEMENT as long as there is a definite maximum quantity. Or in other words, you cannot produce an infinite amount of something!
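    The bounded quantifier corresponds closely to the {m,n} quantifier of regular expressions. The Python sketch below translates the spellings shown above into that notation, treating an empty lower bound as 1 as the text describes, and reproduces three of the examples. Treating the two notations as equivalent is an assumption, and to_regex is a hypothetical helper.

```python
import re


def to_regex(item, spec):
    """Translate item?[spec]: 3 -> {3}, 2,4 -> {2,4}, ,4 -> {1,4}, 4, -> {4,}."""
    low, comma, high = spec.partition(",")
    if not comma:
        return f"{re.escape(item)}{{{low}}}"
    return f"{re.escape(item)}{{{low or 1},{high}}}"


print(to_regex("o", ",4"))                             # o{1,4}
print(re.sub(to_regex("o", "2,4"), "x", "tootooooo"))  # txtxo
print(re.sub(to_regex("o", "4,"), "x", "toootooooo"))  # toootx

# o -> x / r?[3]_ : a fixed-width lookbehind for exactly three <r>s
print(re.sub("(?<=" + to_regex("r", "3") + ")o", "x", "ororrro"))  # ororrrx
```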

    17.3 Ditto-mark

    The ditto-mark, a colon :, duplicates the grapheme to its left, or the grapheme matched from a set or category to its left. In other words, the ditto-mark lets you capture an item only when it is doubled:

    a: -> o
    ; aaata ==> oata

    A ditto-mark can be used in REPLACEMENT:

    a -> a:
    ; tat ==> taat

    17.4 Wildcard

    The wildcard, an asterisk *, matches exactly one grapheme of any kind. The wildcard does not match word boundaries and cannot be used in REPLACEMENT:

    ; Any grapheme becomes <x> when any grapheme follows it
    * -> x / _*
    ; aomp ==> xxxp

    Wildcard can be placed by itself inside an optionalator (*), thereby allowing it to match nothing as well.

    17.5 Anythings-mark

    The anythings-mark is a percent sign % followed by a pair of square brackets [ and ]. It matches one or more graphemes of any kind, as many as possible. For example:

    b%[] -> x
    ; abitto ==> ax

    As we can see, the rule matched b and greedily matched every and any grapheme after it.

    The example below uses an anythings-mark in the condition:

    ; Simulate spreading of nasality to vowels
    a, i, u -> ã, ĩ, ũ / {ã, ĩ, ũ}%[]_
    ; pabãdruliga ==> pabãdrũlĩgã

    17.5.1 Laziness and cowardliness

    By listing graphemes and grapheme sequences inside the square brackets, we can alter the "greedy" behaviour of an anythings-mark with degrees of "laziness" and "cowardliness".

    Consuming negative lookahead, AKA "laziness":

    Sometimes it is necessary for the anythings-mark to consume the graphemes we are monitoring for, and then stop consuming:

    b%[t, d] -> x
    ; babitto ==> xto

    As we can see, the rule matched b followed by anything else until it reached the first t, consumed that, then stopped matching. This behaviour in Regular Expression terminology is called "lazy".

    The items to monitor for can also be sequences of graphemes:

    b%[tr, d] -> x
    ; batitro ==> xo

    Sets, categories and features can also be used when monitoring for laziness and cowardliness:

    ; Capture up to a plosive + <r> cluster
    %[{p,t,k}r] -> x
    ; kotatros ==> xos

    Negative lookahead, AKA "cowardliness":

    Sometimes it is necessary for graphemes to block the spread without being consumed, which I have dubbed "cowardliness". To do this, put a pipe | after the lazy items and list the cowardly items. For example, we might want the graphemes k or g to prevent the rightward spread of nasality to non-nasal vowels:

    a, i, u -> ã, ĩ, ũ / {ã, ĩ, ũ}%[| k, g]_
    ; pabãdruliga ==> pabãdrũlĩga
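    The three behaviours have close regular-expression counterparts, which the Python sketch below uses to reproduce the examples: a greedy .+ for %[], a lazy .*? followed by the stop items for laziness, and a tempered (?:(?![kg]).)+ for cowardliness. These patterns are analogies that treat each character as one grapheme, not Nesca's own implementation.

```python
import re

greedy = r"b.+"               # b%[]
lazy = r"b.*?(?:t|d)"         # b%[t, d]: consume up to and including t or d
lazy_seq = r"b.*?(?:tr|d)"    # b%[tr, d]
coward = r"b(?:(?![kg]).)+"   # b%[| k, g]: stop before k or g, leaving it unconsumed

print(re.sub(greedy, "x", "abitto"))     # ax
print(re.sub(lazy, "x", "babitto"))      # xto
print(re.sub(lazy_seq, "x", "batitro"))  # xo
print(re.sub(coward, "x", "abakog"))     # axkog
```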

    18 Cluster-field

    A cluster-field is a way to target sequences of graphemes and change them. Cluster-fields are laid out as tables, and start with < followed by a space. The first part of a sequence is in the first column, and the second part is in the first row. The cluster-field ends with a > on its own line. For example:

    < p t k m n
    m + nt nk + mm
    n mp + + nn +
    >
    • In this example, np becomes mp and mt becomes nt
    • These are executed concurrently just like concurrent changes. Their order does not matter
    • + means to not change the target cluster at all
    • Cluster-fields can use 0 to reject the word if it contains that sequence
    • Cluster-fields can use ^ to delete the target sequence
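    A cluster-field behaves like a lookup table keyed by two-grapheme clusters. The Python sketch below encodes the table above as a dictionary (leaving out the + cells, which mean no change) and scans the word left to right, replacing each listed cluster once. The scanning order and the one-character-per-grapheme simplification are assumptions; CLUSTER_FIELD and apply_cluster_field are hypothetical names.

```python
CLUSTER_FIELD = {
    ("m", "t"): "nt", ("m", "k"): "nk", ("m", "n"): "mm",
    ("n", "p"): "mp", ("n", "m"): "nn",
}


def apply_cluster_field(word, field=CLUSTER_FIELD):
    """Replace every listed two-grapheme cluster, without overlapping matches."""
    out, i = [], 0
    while i < len(word):
        pair = tuple(word[i:i + 2])
        if pair in field:
            out.append(field[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


print(apply_cluster_field("anpamta"))  # ampanta
```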

    19 Advanced

    This is the advanced section. It presents solutions to edge-cases and novel systems to achieve the desired forms of words.

    19.1 Routine

    The routine transform provides useful functions that you can call at any point in the transform block. You call a routine on a new line with <routine, optional whitespace, =, optional whitespace, the name of the routine, and a closing >. For example:

    <routine = compose>

    The routines are:

    • decompose will break down all characters in a word into their "Unicode Normalization, Canonical Decomposition" form (NFD). For example, ñ as a single Unicode code point, \u00F1, will be broken down into a sequence of two characters, \u006E + \u0303
    • compose does the opposite of decompose. It converts all characters in a word to the "Unicode Normalization, Canonical Decomposition followed by Canonical Composition" form. For example, ñ as two characters \u006E + \u0303, will be transformed into one character, \u00F1
    • capitalise will convert the first character of a word to uppercase
    • decapitalise will convert the first character of a word to lowercase
    • to-uppercase will convert all characters of a word to uppercase
    • to-lowercase will convert all characters of a word to lowercase
    • reverse will reverse the order of graphemes in a word
    • xsampa-to-ipa will convert characters of a word written in X-SAMPA into IPA. ipa-to-xsampa will convert them back
    • latin-to-hangul converts, or rather, transliterates characters written in an arbitrary romanisation into Hangul Jamo blocks. hangul-to-latin converts them back.

      When there is no initial to be found, the jamo will have an initial Ieung. Forming an initial of the next jamo is preferred over creating a final for the current jamo

    • latin-to-greek converts, or rather, transliterates characters written in an arbitrary romanisation into Greek letters. greek-to-latin converts them back
    • latin-to-cyrillic converts, or rather, transliterates characters written in an arbitrary romanisation into Cyrillic letters. cyrillic-to-latin converts them back
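    The decompose and compose routines correspond to the Unicode normalization forms NFD and NFC, which Python exposes through unicodedata.normalize. The sketch below reproduces the ñ example. The capitalise helper is a hypothetical illustration of a routine that touches only the first character; note that Python's own str.capitalize also lowercases the rest of the word, so it is not an equivalent.

```python
import unicodedata

composed = "\u00F1"                                    # ñ as a single code point
decomposed = unicodedata.normalize("NFD", composed)    # like the decompose routine
print([f"U+{ord(ch):04X}" for ch in decomposed])       # ['U+006E', 'U+0303']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True (compose)


def capitalise(word):
    """Uppercase only the first character, leaving the rest untouched."""
    return word[:1].upper() + word[1:]


print(capitalise("mcDonald"))    # McDonald
print("mcDonald".capitalize())   # Mcdonald
```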

    19.2 Target-mark

    A target-mark is a reference to the captured TARGET graphemes. It is written as an ampersand followed by a capital T, &T, and cannot be used in TARGET.

    Here are some examples where target-mark is employed:

    Full reduplication:

    %[] -> &T&T
    ; malak ==> malakmalak

    "Haplology":

    CV -> ^ / _&T
    ; haplology ==> haplogy

    Reject a word when a word-initial consonant is identical to the next consonant:

    C -> 0 / #_%[C]&T
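    In regular-expression terms, a target-mark in REPLACEMENT resembles reusing the whole match, and a target-mark in a condition resembles a backreference. The Python sketch below reproduces the reduplication and haplology examples under that analogy; the consonant and vowel classes C and V are hypothetical stand-ins for the categories.

```python
import re

# %[] -> &T&T : repeat the whole captured target
print(re.sub(r".+", lambda m: m.group(0) * 2, "malak"))  # malakmalak

# CV -> ^ / _&T : delete a CV sequence that is immediately repeated
C, V = "[bcdfghjklmnpqrstvwxyz]", "[aeiou]"
print(re.sub(rf"({C}{V})(?=\1)", "", "haplology"))  # haplogy
```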

    19.3 Metathesis-mark

    Simple metathesis involves an ampersand and a capital M, &M, in REPLACEMENT. This swaps the first and last graphemes of the captured TARGET graphemes:

    ; Swap a plosive and nasal stop
    {p, t, k}{m, n} -> &M
    ; apma ==> ampa

    Since the metathesis-mark swaps the first and last grapheme, we can effectively simulate long-distance metathesis using an anythings-mark:

    ; Simulate Old Spanish "Hyperthesis"
    r%[l] -> &M
    ; parabla ==> palabra
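    The metathesis-mark can be modelled as a function that swaps the first and last characters of whatever the TARGET matched. The Python sketch below reproduces both examples, using [ptk][mn] for {p, t, k}{m, n} and a lazy r.*?l for r%[l]; these regex stand-ins are analogies, and swap_ends is a hypothetical helper.

```python
import re


def swap_ends(match):
    """&M: exchange the first and last graphemes of the captured target."""
    s = match.group(0)
    return s[-1] + s[1:-1] + s[0] if len(s) > 1 else s


print(re.sub(r"[ptk][mn]", swap_ends, "apma"))  # ampa
print(re.sub(r"r.*?l", swap_ends, "parabla"))   # palabra
```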

    19.4 Empty-mark

    The empty-mark, &E, inserts an "empty" grapheme into the captured TARGET graphemes. It is only allowed in TARGET.

    One use for it is a trick to make one-place long-distance metathesis work, for example:

    ; The <r> of a plosive + <r> cluster is moved
    ; between a word-initial plosive and a vowel
    &E{a,e,i,o,u}%[|{p,t,k}r]{p,t,k}r -> &M / #{p,t,k}_
    ; kotatros ==> krotatos

    19.5 Reference

    Sometimes graphemes must be copied or asserted to be a certain grapheme between other graphemes. This is the purpose of "reference". Reference is fairly straightforward, but there is a lot of jargon and different behaviour between fields to explain.

    19.5.1 Reference of singular grapheme

    A grapheme (or graphemes) is bound to a reference using a "reference-capture" placed to the right of it. A reference-capture looks like = followed by a single-digit positive number. This number is called the "reference-key" of the reference. The grapheme (or graphemes) bound to the reference is called the "reference-value".

    The key behaviours of reference-capture are:

    • If there are no graphemes to be captured by the reference-capture, nothing is captured
    • A reference forgets its reference-value between rules. References do not persist from one rule to the next
    • You can have up to nine references per rule
    • A reference's reference-value can be overwritten with a new reference-capture
    • For reference-capture in conditions, a grapheme is captured only if that condition is met

    The captured grapheme can then be reproduced elsewhere in the rule with a "reference-mark", even before the reference-capture. The reference-mark invokes the reference-key of a reference.

    The key behaviours of reference-mark are:

    • A reference-mark may not be used in the TARGET of a rule.
    • In each condition or exception of a rule, a reference-mark cannot be used before content has been bound to its reference with a reference-capture. For example a -> e / 1x=1_ is invalid, and so is a -> e / 1_x=1. Reference is not recursive in conditions and exceptions.
    • If a reference-mark is used where its reference-capture has not captured anything yet, it fails silently and outputs the digit of its reference-key instead.

    Here are some examples:

    ; Delete <ʔ> between identical vowels
    ʔ -> ^ / [+vowel]=1_1
    ; baʔaʔe ==> baaʔe

    In the rule above, we bind the [+vowel] feature-matrix to reference 1 by appending =1 to it. Whichever [+vowel] grapheme is matched when the condition is met is captured as the value of 1. The reference-mark 1 in AFTER then requires the grapheme after the <ʔ> to be that same vowel, so only the first <ʔ> of baʔaʔe is deleted.
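    If you know regular expressions, reference-capture and reference-mark behave much like a capture group and a backreference. The Python sketch below reproduces the rule above with Python's re module. It is an analogy, not a description of how Nesca works internally, and the vowel set [aeiou] is an assumption standing in for [+vowel].

```python
import re

# ([aeiou]) plays the role of [+vowel]=1, and the lookahead (?=\1)
# plays the role of the reference-mark 1 in AFTER: the next grapheme
# must be the same vowel, but it is not consumed.
GLOTTAL_BETWEEN_SAME_VOWELS = re.compile(r"([aeiou])ʔ(?=\1)")


def delete_glottal(word):
    # Keep the captured vowel and drop the <ʔ> after it
    return GLOTTAL_BETWEEN_SAME_VOWELS.sub(r"\1", word)


print(delete_glottal("baʔaʔe"))  # baaʔe
```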

    ; Insert an "echo vowel" at the end of <ʔ> final words
    ^ -> 1 / [+vowel]=1ʔ_#
    ; foobaʔ ==> foobaʔa

    Here, [+vowel]=1 again captures a vowel as the value of 1, this time the vowel preceding the word-final <ʔ>. The reference-mark is now in REPLACEMENT, so the captured vowel is inserted after the <ʔ>.
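    The same regular-expression analogy covers insertion. In the hypothetical Python sketch below, the capture group records the vowel, and the backreference in the replacement string writes it out again after the <ʔ>. As before, [aeiou] is an assumed stand-in for [+vowel].

```python
import re

# ([aeiou]) plays the role of [+vowel]=1; ʔ$ is the <ʔ> before the word boundary
FINAL_GLOTTAL = re.compile(r"([aeiou])ʔ$")


def echo_vowel(word):
    # \1 in the replacement plays the role of the reference-mark 1 in REPLACEMENT
    return FINAL_GLOTTAL.sub(r"\1ʔ\1", word)


print(echo_vowel("foobaʔ"))  # foobaʔa
```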

    19.5.2 Reference of grapheme sequence

    Now that "reference-capture" and "reference-mark" have (hopefully) been explained adequately, let's look at how to capture and reference a sequence of graphemes.

    To start capturing a sequence, place a "start-reference-capture", &=, before the graphemes to be captured. Then, after the last of those graphemes, use a reference-capture to bind the whole sequence to a reference:

    ; Insert an "echo double vowel" at the end of vowel + vowel + <ʔ> final words
    ^ -> 1 / &=VV=1ʔ_#
    ; foobaaʔ ==> foobaaʔaa
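    In the regular-expression analogy, &=VV=1 corresponds to a single group wrapped around two vowels, so the whole sequence is reproduced by one reference. The Python sketch below assumes that V is a vowel category containing a, e, i, o and u; that category is not defined in this section.

```python
import re

# ([aeiou]{2}) plays the role of &=VV=1: one group around the whole sequence
FINAL_DOUBLE_VOWEL_GLOTTAL = re.compile(r"([aeiou]{2})ʔ$")


def echo_double_vowel(word):
    # \1 reproduces the entire captured sequence, not just its last grapheme
    return FINAL_DOUBLE_VOWEL_GLOTTAL.sub(r"\1ʔ\1", word)


print(echo_double_vowel("foobaaʔ"))  # foobaaʔaa
```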

    19.6 Associatemes

    If your language encodes tone, stress, breathy voice, or other phonological features directly on vowels, you'll often need to target a particular grapheme across its variants.

    One method is to target each variant manually:

    {a, á, à} -> {e, é, è}
    ; daná ==> dené

    This workaround uses alternators, but it lacks semantic clarity, does not scale, and is outright tedious.

    To solve this, Nesca provides "associatemes": variant graphemes aligned with their base grapheme and with one another. Other SCAs might use the terms "floating diacritics" or "autosegmentals". Associatemes let you target all forms of a grapheme with a single token. To set them up, state them in the graphemes directive: put the base set of an entry inside curly braces, then put each variant set in curly braces, joined to the base set (or to the preceding variant set) by a <, like so:

    graphemes: {a, e, i, o, u}<{á, é, í, ó, ú}<{à, è, ì, ò, ù}

    Associatemes behave as follows:

    • Each grouping must contain an equal number of graphemes, aligned by position. This creates a traceable overlay across tone, stress, and other features.
    • Variants do not have to differ by diacritics; the association is arbitrary. For example {a,b,c}<{x,y,z} is valid.
    • You are not limited to one grouping set. For example {a,i,o}<{á,í,ó}, {a,b,c}<{x,y,z} is valid.

    In a rule, you then put a tilde after the grapheme to mark it as a base associateme. This is called a "based-mark". For example:

    graphemes:
      {a, e, i, o, u}<{á, é, í, ó, ú}<{à, è, ì, ò, ù}
    stage:
      a~ -> e~
    ; daná ==> dené

    This rule targets every variant of a and carries its association over to e, so a becomes e, á becomes é, and à becomes è.
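    One way to picture the based-mark is as a lookup table that records, for every grapheme, its position within its set and which set (base or variant) it belongs to. The Python sketch below models a~ -> e~ that way. It is a conceptual model, not Nesca's code: it assumes each grapheme is a single precomposed character, and the name based_change is hypothetical.

```python
# Aligned sets from: {a, e, i, o, u}<{á, é, í, ó, ú}<{à, è, ì, ò, ù}
TIERS = [
    ["a", "e", "i", "o", "u"],  # base set
    ["á", "é", "í", "ó", "ú"],  # first variant set
    ["à", "è", "ì", "ò", "ù"],  # second variant set
]

# Map every grapheme to (position within its set, index of its set)
LOOKUP = {g: (pos, tier) for tier, row in enumerate(TIERS) for pos, g in enumerate(row)}


def based_change(word, source, target):
    """Model of `source~ -> target~`: each variant of source becomes the same variant of target."""
    src_pos, _ = LOOKUP[source]
    tgt_pos, _ = LOOKUP[target]
    out = []
    for g in word:
        if g in LOOKUP and LOOKUP[g][0] == src_pos:
            tier = LOOKUP[g][1]                 # keep the same variant set
            out.append(TIERS[tier][tgt_pos])
        else:
            out.append(g)
    return "".join(out)


print(based_change("daná", "a", "e"))  # dené
```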

    19.7 Letter case field

    Nesca can change the case of individual letters or of whole words with routines or with paragraph mode. However, your language may not have the expected correspondences between lowercase and uppercase. Some examples:

    Turkish: The lowercase i becomes uppercase İ, and the uppercase I becomes lowercase ı.

    Some Polynesian languages: The vowel after an "okina" is capitalised instead of the okina itself.

    Some styles in a few European languages: Sometimes both letters of a digraph will be capitalised.

    To accommodate these special cases, you can define a letter-case-field directive. For example:

    letter-case-field: ı i 'o th uppercase I İ 'O TH
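    The example appears to pair each lowercase form with the uppercase form in the same position: ı with I, i with İ, 'o with 'O, and th with TH. Under that reading, the Python sketch below shows how such a field could drive capitalisation using longest-match lookup, so that multi-character entries like th take priority over single letters. The function names are hypothetical, and this is not Nesca's implementation.

```python
LOWER = ["ı", "i", "'o", "th"]
UPPER = ["I", "İ", "'O", "TH"]
TO_UPPER = dict(zip(LOWER, UPPER))
# Try longer entries first so that "th" wins over a plain "t"
KEYS = sorted(TO_UPPER, key=len, reverse=True)


def capitalise(word):
    """Uppercase the first grapheme of a word, honouring the field."""
    for lower in KEYS:
        if word.startswith(lower):
            return TO_UPPER[lower] + word[len(lower):]
    return word[:1].upper() + word[1:]


def uppercase(word):
    """Uppercase a whole word, honouring the field."""
    out, i = [], 0
    while i < len(word):
        for lower in KEYS:
            if word.startswith(lower, i):
                out.append(TO_UPPER[lower])
                i += len(lower)
                break
        else:
            # Not in the field: fall back to ordinary Unicode uppercasing
            out.append(word[i].upper())
            i += 1
    return "".join(out)


print(capitalise("istanbul"))  # İstanbul
print(uppercase("thin"))       # THİN
```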

    19.8 Recasts

    Recasts, introduced in version 1.0.9, behave like rules, except that the RESULT is regenerated from a category according to that category's distribution.

    Recasts are one way to ensure that certain combinations of graphemes are, or are not, produced during word generation. They do not reproduce a natural linguistic phenomenon; they are a shortcut for enforcing your phonotactic constraints.

    A recast is written the same way as a rule, except that the arrow is replaced with <recast-as> and the RESULT is a single category, with optional modifiers such as quantifiers.

    Let's look at an example:

    categories:
      V = a, i, u, e, o
      Y = u, a, o, e
    stage:
      V <recast-as> Y / y_

    This means that whenever a grapheme from V follows a y, it is regenerated from Y, so its pool of graphemes is constrained to u, a, o, e. As a result, yi can no longer occur, and yu is the most likely to be randomly picked.
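    The Python sketch below mimics this recast on already generated words: any grapheme from V that follows a y is redrawn from Y. It is a sketch under assumptions. In particular, it assumes that earlier graphemes in a category are picked more often, using simple descending weights; Nesca's actual distribution may differ, and the function names are hypothetical.

```python
import random

V = ["a", "i", "u", "e", "o"]
Y = ["u", "a", "o", "e"]


def pick(category, rng):
    # Assumed distribution: descending weights, so the first grapheme is likeliest
    weights = [len(category) - k for k in range(len(category))]
    return rng.choices(category, weights=weights)[0]


def recast_after_y(word, rng):
    """Approximates `V <recast-as> Y / y_`."""
    out = list(word)
    for k in range(1, len(out)):
        if out[k - 1] == "y" and out[k] in V:
            out[k] = pick(Y, rng)  # regenerate the RESULT from Y
    return "".join(out)


rng = random.Random(7)
print(recast_after_y("yiyi", rng))  # the result never contains yi
```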

    stage:
      ^ <recast-as> C / F$_V

    This inserts a consonant from C after a grapheme from the F category and a syllable boundary, whenever a vowel follows. This keeps a coda from F from being reinterpreted as the onset of the following syllable.
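    Here is a similar Python sketch for the inserting recast. It uses hypothetical category contents (C as p, t, k; F as n, m, s; vowels a, i, u) and assumes syllable boundaries are written as a full stop inside the word. Consonants are drawn uniformly here for simplicity.

```python
import random

C = ["p", "t", "k"]
F = ["n", "m", "s"]
VOWELS = ["a", "i", "u"]


def recast_onset(word, rng):
    """Approximates `^ <recast-as> C / F$_V` with '.' as the syllable boundary."""
    out = []
    for k, g in enumerate(word):
        out.append(g)
        after_coda = g == "." and k > 0 and word[k - 1] in F
        before_vowel = k + 1 < len(word) and word[k + 1] in VOWELS
        if after_coda and before_vowel:
            out.append(rng.choice(C))  # insert a freshly drawn consonant
    return "".join(out)


print(recast_onset("tan.a", random.Random(3)))  # "tan." + a consonant + "a"
```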

    20 Blocks

    Blocks modify the behaviour of transforms that are inside them with "condition and event" logic.

    A block begins with <@ at the start of a line and ends with > at the start of a line.

    20.1 Chance block

    This block gives the transforms inside it a chance of occurring. Chance blocks cannot be nested. They are useful for sporadic sound change.

    stage:
      <@chance = 60%
      a -> b
      x -> y
      >

    In the example above, the transforms in the block have a 60% chance of occurring on each word.
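    A chance block can be pictured as a random check made per word before its transforms run. The Python sketch below assumes that a single roll decides the whole block for a given word, rather than one roll per transform, and uses plain string replacements as stand-ins for a -> b and x -> y. Both are assumptions made for illustration; the function name is hypothetical.

```python
import random


def chance_block(word, rng, chance=0.60):
    """Models <@chance = 60% ... > around the rules a -> b and x -> y."""
    if rng.random() < chance:  # one roll per word (assumption)
        word = word.replace("a", "b")
        word = word.replace("x", "y")
    return word


rng = random.Random(42)
results = [chance_block("max", rng) for _ in range(1000)]
print(results.count("mby") / len(results))  # close to 0.6
```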