A guide to small consonant inventories

...mostly in the form of an inventory dump. By 'small' I mean 12 or fewer consonants, which isn't completely arbitrary since the smallest consonant inventory in Europe (Finnish) has 13, if you ignore the glottal stop and all the loan phonemes.

Inventory dump:

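Rotokas: p b t d k g (6)
Iau: b t d k f s (6)
Buin: p t k g m n r (7)
Puinave: p t k s h m n (7)
Hawaiian: p k ʔ h m n l w (8)
I'saka: p b t d k s j w (8)
Nasioi: p b t d k ʔ m n (8)
Piraha: p b t k g ʔ s h (8)
Taoripi: p t k f s h m l (8)
Abau: p k s h m n r j w (9)
Aita Rotokas: p b t d k g m n ŋ (9)
C. Miyako: p t k f s m n r ʋ (9)
Gadsup: p t d k ʔ m n j β (9)
Onondaga: t ʤ k ʔ s h n j w (9)
Pawnee: p t ʦ k ʔ s h r w (9)
Rapoisi: p t k ʔ s ɣ h r β (9)
Roro: p b t k ʔ h m n r (9)
Tahitian: p t ʔ f h m n r v (9)
Bari: b t d k s h m ɾ r j (10)
Cheyenne: p t k ʔ s ʃ h m n v (10)
Crow: p t ʧ k s ʃ x h m n (10)
Ekari: p b t d k m n j w (10)
Gilbertese: p t k m n ŋ r βˠ (10)
Keuw: p b t d k g s l j w (10)
Iñapari: p t ʔ s h m n r j w (10)
Mandan: p t k ʔ s ʃ x h r w (10)
Maori: p t k f h m n ŋ r w (10)
Maxakali: p b t d k g ʔ ʃ h j (10)
Mekeo: p t k ʔ f s m n ŋ l (10)
Naskapi Cree: p t ʧ k s h m n j w (10)
Niuean: p t k f h m n ŋ l v (10)
Palauan: b t d k ʔ s m ŋ l r (10)
Samoan: p k ʔ f s h m ŋ l v (10)
Sentani: p t k f h m n l j w (10)
S. Barasano: b t d k g s h r j w (10)
Tinputz: p t k ʔ s h m n r β (10)
Tiriyo: p t k s h m n r j w (10)
Warao: p t k s h m n r j w (10)
Wichita: t ʦ k ʔ s h n j w (10)
Xavante: p b t d ʧ ʤ ʔ h r w (10)
Ache: p b t d ʧ ʤ k g m l v (11)
Ainu: p t ʦ k s h m n r j w (11)
Angaataha: p t k ʔ ʃ m n ŋ r j w (11)
Arabela: p t k s ʃ h m n r j w (11)
Arikapu: p t ʧ k h m n r j w (11)
Asmat: p t ʧ k f s m n r j w (11)
Auca: p b t d k g m n ɲ ŋ w (11)
Barasana Ed.: p b t d c ɟ k g h r w (11)
Cayuga: t ʦ k ʔ s h n r j w (11)
Cherokee: t ʦ k ʔ s h m n l j ɰ (11)
Cubeo: p b t d ʧ k ð h r j w (11)
Dera: p b t d k g m n ŋ j w (11)
Fasu: p t k ɸ s h m n r j w (11)
Karitiana: p t k s h m n ŋ r j w (11)
Koiari: b t d k g f ð h m n r (11)
Irantxe: p t k ʔ s h m n l j w (11)
Iwam: p t k s h m n ŋ r j w (11)
Maranao: p t k ʔ m n ŋ l r j w (11)
Menominee: p t ʧ k ʔ s h m n j w (11)
Miami-Ill.: p t ʧ k s ʃ h m n j w (11)
Mohawk: t ʦ k ʔ s h n r j w (11)
Nukak: p b t d ʧ ʤ k g ʔ h r (11)
Oneida: t ʦ k ʔ s h n l j w (11)
Seneca: t ʣ ʤ k ʔ s ʃ h n j w (11)
Shawnee: p t ʧ k ʔ θ ʃ m n j w (11)
Tigak: p b t k g s m n ŋ l r (11)
Tuscarora: t ʧ k ʔ θ s h n r j w (11)
Tuyuca: p b t d k g s h r j w (11)
Yagua: p t ʦ ʧ k h m n r j w (11)
Arapaho: b t ʧ k ʔ θ s x h n j w (12)
Awa: p b t k g ʔ s m n r j w (12)
Bandjalang: p t c k m n ɲ ŋ l r j w (12)
Cacua: p t ʧ k ʔ ʍ h m n ŋ l w (12)
Chuave: b t d k g f s m n r j w (12)
Comanche: p t ʦ k ʔ s h m n j w (12)
Cree: p t ʧ k s ʃ h m n r j w (12)
Djeoromitxi: p t ʧ ʤ k h m n r w (12)
Huaorani: p b t d ɟ k g m n ɲ ŋ w (12)
Ikpeng: p t ʧ k g m n ŋ l r j w (12)
Irarutu: b t d k g ɸ s m n r j w (12)
Jamamadi: b t ɟ k ʔ ɸ s m n r w (12)
Karaja: b d ɗ ʧ ʤ k θ ʃ h l r w (12)
Ket: b t d k q s ç h m n ŋ ɮ (12)
Nagovisi: p b t d k g ʔ s m n r β (12)
Nimboran: p b t d k g s h m n ŋ r (12)
Rarotongan: p t k ʔ f s h m n ŋ r v (12)
S-Kiwai: p b t d k g ʔ s m n r w (12)
Tifal: b t d k f s m n ŋ l j w (12)
Tongan: p t k ʔ f s h m n ŋ l v (12)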

There are a number of things you can do in small consonant inventories.

Voicing in plosives:

May be totally absent, as in the Polynesian languages (sometimes they have /v/ but I grouped that as a form of /w/).
May be fully present, as in Xavante or Rotokas.
May be present for only some of the plosives. Note that you can eliminate voicing contrasts for any one POA, though if you eliminate it only in the labials, you'll end up with /b/, not /p/. Piraha, Awa, and Tigak have /p b t k g/ with no /d/; I'm guessing it became /r/. (Iau /d/ is notable within the Lakes-Plain languages for not allowing flapping -- most of them do.)
Some of these languages have /g/ with no /b d/; this seems to come from lenition processes, where either p t k or b d g > w r g -- so g fails to lenite but the other plosives do. This happened in Ikpeng (where lenition applied intervocalically and /k/ redeveloped through cluster loss) and also in Rotokas, though it doesn't show up on the chart: /b d/ are usually [β ɾ], but I don't know if /g/ lenites.

POAs: May have no labials, as with Oneida and Tuscarora. Comanche is the only language here to have both labials and /kʷ/, but it's spoken in the general vicinity of languages with /kʷ/ and no labials.
Xavante and Tahitian have no velars. I'm guessing /k/ backed to /ʔ/ in Tahitian and /k g/ fronted in Xavante, but I don't know.
It is not necessarily the case that you need three stop POAs. Abau only has /p k/.
Samoan merged its alveolars into velars, except /l/. Chain shift in the stops: t > k > ʔ. It already had /ŋ/ when /n/ merged into it, though.
Affricates appear almost exclusively in American languages in this sample (Ainu /ʦ/ and Asmat /ʧ/ are the exceptions in the chart). Most of the American languages made it onto the list by having no labials.
Maximum of three non-glottal POAs except in Bandjalang, which is the only Australian language here.

Fricatives: Bandjalang has the largest inventory here with no fricatives, and it's Australian.
The presence of fricatives usually implies /h/, but some have /s/ as their only fricative. If there's a fricative that isn't /s/, there's also /h/ -- except in Cubeo, which only has /x/. If there's /f/, there's usually /s h/; the only exceptions are Polynesian, except Sentani, which is Papuan, and Koiari, which for some bizarre reason has /ð/.
None of these languages has more than three fricatives, unless you count Polynesian /v/ as a fricative. Tuscarora has /θ s h/, Seneca has /s ʃ h/, and Koiari has /f ð h/, but the most common three-fricative inventory by far is /f s h/. /x/ doesn't appear in any of these languages except Cubeo.

The glottal stop: Not as necessary as you might think: about a third of the languages here don't have it. Of the ones that do, some (Polynesian) got it through debuccalization of another plosive.

Nasals: Nasals actually do not imply /n/. Samoan, which merged its alveolars into velars, is no longer the only language here to have nasals without /n/.
/ŋ/ implies /m/. /p n/ also imply /m/.
Many of the languages here with missing nasals are Amazonian languages with a full inventory of nasal vowels -- nasals are allophones of voiced stops around nasal vowels. In Piraha, nasals [m n] are allophones of /b g/ word-initially. In Keuw, voiced stops vary freely with nasals, and voiceless stops can be freely prenasalized. Rotokas and Iau really do have no nasals.
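
The implications above are easy to check mechanically. Here is a minimal sketch in Python, using a handful of inventories copied from the chart; the function names are my own, and with only five languages it illustrates the method rather than proving anything about the whole sample.

```python
# A small sample of inventories copied from the chart (not the full list).
INVENTORIES = {
    "Rotokas": "p b t d k g",
    "Hawaiian": "p k ʔ h m n l w",
    "Maori": "p t k f h m n ŋ r w",
    "Palauan": "b t d k ʔ s m ŋ l r",
    "Samoan": "p k ʔ f s h m ŋ l v",
}

def holds(inventory, if_present, then_present):
    """True if 'all of if_present imply all of then_present' holds here."""
    segments = set(inventory.split())
    if not set(if_present) <= segments:
        return True  # the premise is not met, so the implication holds vacuously
    return set(then_present) <= segments

def counterexamples(if_present, then_present):
    """Names of the sample languages that break the implication."""
    return [name for name, inv in INVENTORIES.items()
            if not holds(inv, if_present, then_present)]

print(counterexamples(["ŋ"], ["m"]))       # /ŋ/ implies /m/
print(counterexamples(["p", "n"], ["m"]))  # /p n/ imply /m/
print(counterexamples(["m"], ["n"]))       # having a nasal does not imply /n/
```

The first two calls find no counterexamples in this sample, while the third turns up the languages that have /m/ without /n/.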

Glides and liquids: Surprisingly, /w/ (or /v/) is more common than /j/ -- the only language with /j/ and no /w/ is Maxakali. Ekari has a velar lateral affricate /gʟ/.
/l r/ contrast is more common than you might think, even in these small inventories. It's really not that European a feature.

Iau: ...is worthy of special mention here for being probably the most phonologically bizarre language on the planet. It has six consonants, /b t d k f s/. /f/ is [ɸ~h] word-initially, but is [x] preceding /i/; word-medially it's [h]; and word-finally (/f/ is the only consonant that can occur word-finally) it's an unreleased stop [p]. /b d/ vary with nasals, and can be implosive before /ã/; /d/ can also be [l], but is never flapped.
There are eight vowels: /ã æ~ɛ ɪ i ɔ ʊ u/ and a fricated vowel /i̝/. /ã/ is always nasalized.
Despite all this, most words are monosyllabic -- and the reason Iau can pull this off is that, well, not only does it have eight tones (two level and six contour), it has tone clusters -- more than one tone can appear on a word. There is an extensive system of tone-based verbal derivation:
tai2 'pull'
tai3 'has been pulled off'
tai21 'might pull'
tai43 'land on'
tai24 'fell to ground'
tai23 'fall to ground (incompletive)'
tai34 'pull off'
tai243 'falling to ground (durative)'
tai21-34 'pull on, shake' (nb: two different contour tones)
tai21-3 'have pulled on, have shaken'
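
If you want to play with forms like these, the notation splits apart easily. This is a rough sketch that assumes the reading the list suggests: a stem, then one or more tones separated by hyphens, where a single digit is a level tone and a run of digits is a contour. The function names are made up for illustration.

```python
import re

def parse_tones(form):
    """Split a form such as 'tai21-34' into its stem and its list of tones."""
    match = re.fullmatch(r"([^\d-]+)(\d+(?:-\d+)*)", form)
    if match is None:
        raise ValueError(f"not a stem followed by tone digits: {form!r}")
    stem, tones = match.groups()
    return stem, tones.split("-")

def describe(tone):
    """ASSUMPTION: one digit is a level tone, several digits are a contour."""
    return "level" if len(tone) == 1 else "contour"

for form in ["tai2", "tai243", "tai21-34", "tai21-3"]:
    stem, tones = parse_tones(form)
    print(stem, [(tone, describe(tone)) for tone in tones])
```

Under that reading the ten forms use exactly eight distinct tones (2, 3, 21, 43, 24, 23, 34, 243), two level and six contour, which matches the count given above.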


Vowel Systems

This resource is for anyone who wants to make a realistic vowel system. There are a couple of overviews elsewhere on the interwobz- I think WeepingElf has one?- but as far as I've seen them they're relics of an internet past. (I'm thinking in particular of a page with no Unicode that had to write ɨ as "i-" and had a garish pink background- ah, others have informed me it's run by whatever Bricka goes by now.) I'll run through it by numbers of vowels, like those other pages, and point out interesting deviations.

Edit: in the interests of attribution, much of this was from this page by an unknown author, Wikipedia, or Bricka's page on vowel systems.

I am stealing a classification system I found on one of those pages. It is as follows:

T indicates a triangular vowel system, by far the most common.

This is followed by the number of vowels, and:

This is by no means hard and fast, but it is useful.
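
For anyone who wants to handle these labels programmatically, here is a minimal sketch of a parser. Only T is defined above; reading S, V and C as square, vertical and cubic is my inference from how the labels are used later on, and the trailing letters are kept as an uninterpreted variant tag rather than guessed at.

```python
import re

# T is defined in the text; S, V and C are inferred from later usage.
SHAPES = {"T": "triangular", "S": "square", "V": "vertical", "C": "cubic"}

def parse_label(label):
    """Split a label such as 'T7Rc' into shape, vowel count and variant tag."""
    match = re.fullmatch(r"([TSVC])(\d+)([A-Za-z]*)", label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    shape, count, variant = match.groups()
    return {"shape": SHAPES[shape], "vowels": int(count), "variant": variant}

print(parse_label("T5"))
print(parse_label("T7Rc"))
print(parse_label("V2"))
```

The one-vowel labels used below ("1" and "1a") have no shape letter, so this sketch rejects them on purpose.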

For the sake of easy analysis, I'll not analyze pure length, tone or nasalization, just vowel quality. I will also not include diphthongs.

One Vowel

Few if any modern languages have just one vowel; the possible cases are Nuxalk (see below) and some Chadic languages. One or two reconstructions of Proto-Indo-European have just /e/, but this is somewhat untenable, given that all of its daughters have at least three vowels (I think), many of its daughters have at least four or five, and you'd be hard-pressed to explain ablaut that way.

In a conlang with just one vowel, you could write the phoneme as just about anything you want, although its most common realization would probably be a centralized vowel like [ə].

1

 ə 

1a

The Salish language Nuxálk has the system T3 (see the section on three vowels below), but can be analyzed as having just /a/, with /i u/ analyzed as syllabic /j w/, since /n m/ also have syllabic forms. This would be 1a:

 a 

Two Vowels

This too is rare.

S2

Some other reconstructions of PIE have given it two vowels, /e o/:

e     o

V2

Several Northwest Caucasian languages have two central vowels, one low, one mid. An example is Ubykh, which also holds the world record for most consonants outside of a click language:

 ə 
a

The claim that Ubykh has only these two vowels is however a bit dubious, as a much wider range of vowels appears phonetically, influenced by the surrounding consonants. There are also several analyses of Mandarin Chinese that analyze it as having a V2 system, with the extra vowel phones coming from sequences of vowel and approximant. The Australian language Arrernte also has a V2 system, and the Ndu languages of New Guinea are rumored to have a V2 system as well. (The Ndu claim is particularly suspect; although they can all theoretically be analyzed as having a V2 system, at least one linguist has analyzed the Ndu language Iatmül as having twelve phonemic vowels. This is par for the course with strange little vowel systems.)

Because the only thing differentiating the two phonemes in a V2 system is height, they tend to have very wide ranges of allophones. In Arrernte, for example, /ə/ has the range [ɪ ~ e ~ ə ~ ʊ], with little regard for context.

Three Vowels

This is where we really get started; almost all languages have at least three vowels.

T3

Usually, this is the T3 system, as seen in Quechua, Inuktitut, Classical Arabic, most Australian languages, and Aleut:

i     u
a

T3b

A few languages have T3b, the same as above, but with no high vowels. An example is the /a e o/ system of Yanesha' (also known as Amuesha) and Cheyenne:

e     o
a

T3c

There is a final variation T3c, too, where /u/ is lowered. It's found in Pirahã and the short vowel system of Ojibwe:

i 
o
a

Presumably something out there is best analyzed as having /ə i u/ (a theoretical T3d), with no low vowels, but I don't know of one.

V3

Beyond this, there is the system V3, which is found in the Sepik family (of which Ndu is a subfamily) of New Guinea:

 ɨ 
ə
a

Like V2 systems, allophony will be rampant, and you're likely to have many more phonetic vowels than just these three. Some researchers analyze Irish as basically having a V3 system in its short vowels:

ɪ~ʊ
ɛ~ɔ
a

V3F

A variant of V3, V3F, is found in the Caddoan language Wichita:

i
ɛ
a

In other words, you have a high front vowel, a mid front vowel, and a front to back low vowel. Again, allophony is rampant.

Four Vowels

Here, we start to see that a language can have a triangular system, or a square one. The triangular-central-square analysis starts to lose its accuracy after about six vowels, but it's still useful for classificatory purposes.

T4

Most languages with four vowels have some variant of T4, of course, since most languages have triangular systems. Usually this is T3 with some sort of addition:

Central Alaskan Yup'ik (in the short vowels only) and the Taiwanese language Rukai have T4C, which we'll refer to as just T4:

i     u
ə
a

/i ɨ u a/ (*T4C) can be found in Yimas, and there are apparently some analyses of Rukai as having /ɪ ɨ u a/, which is similar. /i u ə a/ fills out the space pretty well, which is important- vowels, like gases, tend to spread out to fill their container (the mouth space) so that they're maximally distinct.

S4

There is also the vowel system /i e a u/, which can be analyzed as being the square

i   u
e a

Or as a triangular variation T4F:

i     u
e
a

T4Fb

In T4Fb, /u/ is lowered to /o/. It's common in North America: Nahuatl, Navajo, Proto-Algonquian and a slew of its descendants all have T4Fb:

i
e o
a

Regardless, S4 is found in Akkadian, Hittite (...maybe), Malagasy, and Proto-Slavic. I wouldn't be surprised if there were a language with a true S4 system /i u æ ɑ/, but I don't know of one. I also believe there is at least one language with /i u e o/, i.e. S4 with no low vowels at all, but I can't remember what it is.

V4

Finally, there is the V4 system, found only in the bizarre Marshallese language of Micronesia:

 ɨ 
ɘ
ɜ
a

The problem, as I said, is that Marshallese is batshit, and no fewer than twelve phonemic vowels will pop out of the woodwork if you only look for minimal pairs. The proof of the pudding is that, under such an analysis, Marshallese's glides /j w/ will have a very uneven distribution, and that eight of these vowels are best analyzed as a central vowel with a glide tacked on.

V4 is basically the largest vertical vowel system that you'd ever see in nature. Past this point, the square-vertical-triangular split begins to become less and less useful.

Five Vowels

T5

Here we find the most common vowel system, T5, found in Classical Latin, Modern Greek, Spanish, Hebrew, Japanese, Swahili, the Polynesian languages, and Basque:

i     u
e o
a

T5C

There are a few variants on this. Lokono Arawak, spoken in Suriname, is rumored to have T5C:

i   ɨ   u
e
a

T5B

The mirror image of T5C, with /o/ instead of /e/, was the vowel system of Proto-Uto-Aztecan, and is still found in some of its daughters. We'll call it T5B:

i  ɨ  u
o
a

S5

And S5 is the vowel system of the Vanuatuan language Big Nambas:

i     u
ə
e a

Six Vowels

Most of the systems I could find were just T5 with an extra vowel.

T6C

T6C, with /ə/ added, is found in Nepali and Armenian, as well as Southern Welsh (in a pinch):

i     u
e ə o
a

T6Cb

T6Cb adds /ɨ/ instead, and is found in many Slavic languages, as well as Guaraní and Comanche.

i   ɨ   u
e o
a

S6

T6F adds /æ/ or /ɛ/, and is found in Chamorro, Menominee, Persian, Forest Nenets and pre-umlaut Old English. You could refer to it as S6 if you wanted, even, and that's what we'll do.

i      u
e o
æ a

In S6, /a/ is not uncommonly /ɑ/. I don't know of any language which adds /ɔ/ or /ɑ/ instead of /ɛ/ or /æ/, and there probably isn't one, since having significantly more back vowels than front vowels like that is almost unheard of.

T6R

Common among conlangers, I think, is T6R:

i y   u
e o
a

S6R

I don't know of any languages where this appears in nature; however, the rather strange S6R appears in the Uto-Aztecan language Hopi, thus being the only North American language to my knowledge to natively possess a front rounded vowel of any sort:

i     ɨ
ø o
ɛ a

T6Rc

Also bizarre is the system T6Rc, which is the system in the Chapacuran language Wari', spoken along the Bolivian-Brazilian border:

i y
e ø o
a

Seven Vowels

Almost all systems past this point will be using T5 as a base.

T7L

In plain T7L, usually, /ɛ ɔ/ is added. This is the vowel system of Vulgar Latin, Italian, Bengali, Brazilian Portuguese, and Yoruba:

i   u
e o
ɛ ɔ
a

T7R

T7R, found in Hungarian, adds front rounded vowels instead:

i y    u
e ø o
a

T7C

In T7C, found in Northern Welsh, Kashmiri, and Romanian, the additions are central vowels:

i ɨ u
e ə o
a

With /y/ instead of /ɨ/, we get T7Cb, the vowel system of Albanian.

T7Rc

T7Rc was found in Occitan, after a chain shift of VL ɔ -> o -> u -> y:

i y  u
e o
ɛ
a

It was also, with /æ/ instead of /ɛ/, the vowel system of West Saxon Old English (S7R).

T7Cb

Finally, there is T7Cb, found in Amharic:

i    u
e o
ɛ ə
a

Eight Vowels

Many of these are extended versions of T7L.

T8C

In T8C, found in Javanese, Catalan, São Tomean Creole, Lo-Toga (with fronting of /u/ to /ʉ/) and Slovene, /ə/ is added:

i    u
e o
ɛ ə ɔ
a

Presumably there is a language which adds /ɨ/ instead of /ə/, but I don't know of one.

T8F

Here is T8F, found in Finnish:

i y  u
e ø o
æ ɑ

Note that Finnish's system is a product of vowel harmony; a word can either have /y ø æ/, or it can have /u o ɑ/, but (except in compounds) it can't have both. Without the vowel harmony, this was also the system of some dialects of Old English.
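
The harmony rule just described is simple enough to state as code. A minimal sketch, assuming words are given in rough IPA and ignoring the compound exception; the function name is my own.

```python
FRONT = set("yøæ")  # the front harmonic set named above
BACK = set("uoɑ")   # the back harmonic set named above

def obeys_harmony(word):
    """True unless the word mixes vowels from the front and back sets."""
    vowels = set(word)
    return not (vowels & FRONT and vowels & BACK)

print(obeys_harmony("tɑlo"))  # talo 'house': back vowels only
print(obeys_harmony("kylæ"))  # kylä 'village': front vowels only
print(obeys_harmony("tɑly"))  # made-up form mixing both sets
```

/i e/ belong to neither set here, so they can turn up in words of either kind.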

T8B

A variant is T8B, Legion's dialect of French (disregarding nasalization) (thanks, Legion):

i y   u
e ø o
ɔ
a

T8R

T8R is the usual analysis- er, or a more usual analysis- of Mandarin:

i y   u
ɪ
e ə o
a

C8

Here, in C8, found in Igbo, we get our first "cubic" vowel system, which is just a square vowel system with an additional variable, like laxness or roundedness.

i u
ɪ ʊ
e o
a ɔ

This is the result of vowel harmony: /ɪ ʊ ɔ/ are one set of vowels, /a e i o u/ another.

C8R

C8R is Turkish's system:

i y    ɯ u
e ø a o

If it looks like I've shoehorned it into a cube when it shouldn't be there, that's not true; Turkish morphology fits the vowels neatly like this.

S8L

My own dialect of American English would probably be S8L:

i     u
ɪ ʊ
ɛ ʌ
æ ɑ

Nine vowels

By now we're well past the usual number of vowels for natural languages. The systems will start getting increasingly baroque, but also much less common.

T9L

T9L is found in Maasai; it's T7L with lax variants of /i u/:

i   u
ɪ ʊ
e o
ɛ ɔ
a

S9C

S9C adds /ɨ ə/ to T7L instead. It's found in European Portuguese and Thai:

i  ɨ  u
e ə o
ɛ ɔ
a

Some analyses of European Portuguese have /ə ɐ/ instead of /ɨ ə/, however, which would be T9C instead. With /ɯ ɤ/ instead of /ɨ ə/ [S9U] we have Lao.

S9R

S9R is- in a pinch- standard French (kudos to Legion):

i y  u
e ø o
ɛ ɔ
a

S9Rb

A variety of S9R is found in Southern Sami:

i y  ɨ ʉ  u
e o
ɛ ɑ

S9L

S9L is standard American English:

i     u
ɪ ʊ
ɛ ʌ ɔ
æ ɑ

T9F

T9F is the vowel system of Estonian and Meadow Mari; it takes Finnish and adds /ɤ/. (Estonian has lost vowel harmony, but /ɤ/ is a retention from Proto-Finnic, I believe.)

i y     u
e ø ɤ o
æ ɑ

T9Fb

T9Fb adds /ʉ/ instead of /ɤ/, and is Swedish without vowel length or reduction to schwa factored in. You could put in /ɨ/ instead, however.

i y  ʉ  u
e ø o
ɛ ɑ

T9Fc

T9Fc is found in Korean:

i    ɯ u
e ø o
ɛ ʌ
a

Past this point, really, your imagination's the limit; many large vowel systems will have weird outliers in an otherwise symmetrical or ordered system, like baby gamma in Estonian. But we'll keep going...

Ten Vowels

T10L

T10L is found in Hindi and Panjabi, and I believe several African languages with vowel harmony, where /ɪ ʊ ɛ ɔ a/ alternate with /i u e o ə/:

i   u
ɪ ʊ
e o
ɛ ə ɔ
a

T10R

T10R is found in Breton, adding front rounded vowels to T10L:

i y  u
e ø o
ɛ œ ɔ
a

S10C

S10C, found in Khmer, adds central vowels to a variant of C8:

i ɨ u
e o
ɛ ə ɔ
a ɑ

S10R

S10R is found in Skolt Sami:

i   u
e ɘ o
ɛ ɐ ɔ
a ɑ

Eleven Vowels

Past this point, almost everything you'll see is from Northwest Europe.

T11R

T11R takes T10R and adds a back variant to /a/. It's found in a language of Vanuatu called Sakao:

i y  u
e ø o
ɛ œ ɔ
a ɑ

T11Rb

Switch out the low vowels and you get T11Rb, standard Danish (sort of; Danish phonology is a complete cluster****):

i y   u
e ø o
ɛ œ ə ɔ
a

T11C

T11C is found in Vietnamese:

i   u
ɪ ʊ
e ɘ o
ɛ ɐ ɔ
a

Twelve Vowels

Nothing major here, except for Selkup. At this point categorization starts to become an exercise in extremely iffy pedantry, so I'll stop:

Selkup:

i y  ɨ   u
ɪ
e ø ɘ o
ɛ ɔ
æ a

Received Pronunciation:

i       u
ɪ ʊ
ə
ɛ ɜ ɔ
æ ʌ ɑ ɒ

More than Twelve Vowels

Everything on this list is a Germanic language.

Dutch:

i y       u
ɪ ʏ o
e ø ə ɔ
ɛ
a ɑ

Danish again, according to Routledge:

i y    u
e ø ə o
ɛ œ ɐ ɔ
a ɑ

German:

i y    u
ɪ ʏ ʊ
e ø ə o
ɛ œ ɐ ɔ
a

Swedish is probably the all-time record-holder for number of phonemic qualities, although Danish probably has it beat on phonetics. By this analysis we have 16 vowel qualities:

i y ʉ  u
ɪ ʏ ʊ
e ø ɵ o
ɛ œ ɔ
a ɑ

Well that's great and all, but how do I make a realistic vowel system?

Don't necessarily just copy a vowel system from this list.

You've probably noticed by now that certain patterns tend to recur. In particular, there are a lot of "base" vowel systems (/i a u/ (T3), /i e a o u/ (T5), and /i e ɛ a ɔ o u/ (T7L) in particular) to which languages add one or two "outliers". Hungarian and French, for example, add /y ø/ to T5 and T7L respectively.
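
The "base plus outliers" idea is just set union, which makes it easy to experiment with. A small sketch: the T7R (Hungarian) and S9R (French) charts from earlier come out as T5 and T7L with /y ø/ added. The `extend` helper is a made-up name.

```python
T5 = set("i e a o u".split())
T7L = T5 | {"ɛ", "ɔ"}

def extend(base, outliers):
    """Return a new vowel system: the base plus the given outliers."""
    return set(base) | set(outliers)

t7r = extend(T5, ["y", "ø"])   # the T7R chart (Hungarian)
s9r = extend(T7L, ["y", "ø"])  # the S9R chart (French, in a pinch)

print(len(t7r), sorted(t7r))
print(len(s9r), sorted(s9r))
```

Swapping in other outlier sets, such as /ɨ ə/ or a lax pair /ɪ ʊ/, is a quick way to generate candidate systems before checking them against the overview.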

Often you'll want to take a vowel system and add an extra dimension to it, like throwing in some central vowels, or a lax set of vowels.

For small vowel systems, most of the possible bases are covered in the overview- there simply aren't very many options. As I noted, vowels are kind of like a gas; they usually spread out to fill the vowel space very well. As a result, having a handful of vowels that are relatively close together is really only an option once you've already filled the space; a vowel system like /a ɑ ə/ is basically impossible, as is /i ɨ u/. (However, if your vowel system is very small, the vowels often will centralize a bit. Modern Quechua, for example, has /i a u/, but before the Spanish arrived it was more like /ɪ æ ʊ/.) Throwing in a random vowel often is justified when you're just one more than a "standard" system- I can't find a language that has /i y e a o u/, for example, but it wouldn't surprise me in the least if there was one, and it's certainly fair game for your conlangs.
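
One crude way to put a number on "filling the space" is the smallest distance between any two vowels in a system. The sketch below is only an illustration: the coordinates are values I made up to stand in for the vowel chart, not measurements, and the score is only meaningful when comparing systems of the same size.

```python
import math
from itertools import combinations

# ASSUMPTION: rough (backness, height) positions invented for illustration.
POSITIONS = {
    "i": (0.0, 3.0), "ɨ": (1.0, 3.0), "u": (2.0, 3.0),
    "e": (0.0, 2.0), "ə": (1.0, 1.5), "o": (2.0, 2.0),
    "a": (1.0, 0.0), "ɑ": (2.0, 0.0),
}

def min_distance(system):
    """Smallest pairwise distance between the vowels of a system."""
    vowels = system.split()
    return min(math.dist(POSITIONS[v], POSITIONS[w])
               for v, w in combinations(vowels, 2))

for system in ["i a u", "a ɑ ə", "i ɨ u"]:
    print(system, round(min_distance(system), 2))
```

With these made-up coordinates, /i a u/ keeps its vowels further apart than /a ɑ ə/ or /i ɨ u/ do, which is the intuition behind calling the latter two basically impossible.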

"Filling the available space" is generally a good strategy in larger systems as well. E.g. I'd question the realism of a vowel system:

i ɨ ʉ u
e ə ɵ o
a

because distinctions between rounded and unrounded vowels simply aren't as audible towards the center of the board. (But note Southern Sami under the section about languages with nine vowels.) I'd expect this to shift very quickly to:

i y ɨ  u
e ø ə o
a

which does fill out the available space pretty well.

As your vowel system gets larger, you'll start to run out of places to put your vowels, and will generally want to play with things like roundedness. Germanic languages are so large in part because they have roundedness distinctions in a lot of vowels.

As it gets larger, too, it will get easier to throw in random vowels- there's nothing very symmetrical-looking about English, for example. It will also get a lot more unstable. Diphthongization is a classic way to deal with this; indeed it's one of the reasons English sounds so distinctive- it cleared off several vowels by making them diphthongs. Even so, there are universals that will pretty much be followed at any size: you're not going to have very many more back than front vowels (so /i a u o/ is- probably- a no-no; but then it's just Proto-Uto-Aztecan without /ɨ/, so that's what you get for following universals too heavily); vertical vowel systems aside, vowels generally like to spread out to the margins (so a system like /i ɨ u ə ɐ a/ is very unlikely); a system tends to be higher than it is wide (so a system /i ɨ u e ə o/ is also probably not possible). There are other universals about vowel systems too, I'm sure, but the relevant PDF seems to have 404ed in the mists of time.

You can often create a distinctive-looking vowel system by taking a more boring one and putting in some sound changes. Occitan, for example, underwent a chain shift in the Vulgar Latin back vowels. The English Great Vowel Shift is another example. (Or take a distinctive vowel system and make it boring...Modern Greek's /i e a o u/ is the descendant of /i y e ɛ a ɔ o/.)

How to design your own script

I have given some advice on con-scripting in the past in various threads, but I think it would be nice to assemble everything in one easily-accessible place and go into greater detail on all the points that I think are important. So, without further ado, here is my list of suggestions for designing a constructed script.

Step 1: Choose a direction

Scripts can be written in a number of directions. The reason it's important to choose a direction early on is because it can affect the shape of your glyphs and how they interact with each other.

The most basic directions are:

left-to-right, top-to-bottom

The majority of world scripts are written in this direction. The Roman alphabet follows this direction.

right-to-left, top-to-bottom

This is common in Middle Eastern scripts such as Arabic and Hebrew, and many ancient scripts associated with that area.

top-to-bottom, right-to-left

This is the traditional writing direction for East Asian languages, though nowadays, left-to-right, top-to-bottom is also very commonly used.

top-to-bottom, left-to-right

This is used for some scripts, such as Mongolian.

bottom-to-top

This is extremely uncommon, but there are existing real-world scripts that were written bottom-to-top. Both left-to-right and right-to-left examples exist.

If you are starting out, I recommend picking something basic from the above list. However, there are more complex directions as well, though they are all variations of the above basic forms. They include:

(Partially) diagonal horizontal

The Nastaliq form of Arabic script, which is the standard form of writing Urdu, is I think unique in the world by being written in occasionally overlapping diagonals. The letters are connected to each other in a string that moves gradually downward, and when a new word is started, the beginning of the word often appears above the ending of the previous word in order to fill up space and make it more aesthetically pleasing.

Boustrophedon

This is when lines of text are alternately written left-to-right and then right-to-left. This may be accompanied by a 180° rotation of the glyphs, a result of the writing surface having been rotated in the scribe's hands. For obvious reasons, no modern scripts are written this way, but if you are creating an ancient script, it could be an option.
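
If you want to mock up boustrophedon text on a computer, the layout itself is trivial. A minimal sketch: it only reverses the order of glyphs on every second line, and does not attempt the 180° rotation of the glyphs themselves, which you would have to handle in your font or image.

```python
def boustrophedon(lines):
    """Reverse every second line so the reading direction alternates."""
    return [line if index % 2 == 0 else line[::-1]
            for index, line in enumerate(lines)]

for line in boustrophedon(["ABC", "DEF", "GHI"]):
    print(line)
```

The first and third lines come out unchanged, and the second comes out as FED, so the eye never has to jump back across the page.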

Mixed directionality

Some scripts are written in more than one direction at the same time. For example, many (but by no means all) Mayan inscriptions were written left to right, but only in pairs; after two glyphs, a new line is started below the previous one, leading to columns two glyphs wide.

[Image: Maya stone tablet with inscription]

Sumerian Cuneiform was similarly written with mixed directionality. Phrases or sentences were written horizontally left-to-right within cells, but the cells were arranged vertically.
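
Mixed directionality is easiest to get right if you write the reading order down explicitly. Here is a rough sketch for the paired-column Mayan layout described above, treating the inscription as a grid of (row, column) cells. It assumes that once a double column reaches the bottom, reading continues at the top of the next pair to the right.

```python
def paired_column_order(rows, cols):
    """Reading order for a grid read in columns two glyphs wide."""
    order = []
    for left in range(0, cols, 2):       # each double column, left to right
        for row in range(rows):          # top to bottom within the pair
            for col in (left, left + 1): # left glyph, then right glyph
                if col < cols:           # a final odd column stands alone
                    order.append((row, col))
    return order

print(paired_column_order(2, 4))
```

For a grid two rows high and four columns wide, the order visits the left pair of columns completely before moving to the right pair.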

Variable directionality

Many scripts could be written in more than one direction. Ancient Egyptian was variably written in all sorts of directions, while Modern Chinese and Japanese are frequently written both horizontally left-to-right and vertically right-to-left.

Step 2: Choose your aesthetic

In this step, you will consider the aesthetic of your script. To understand just what this means, let's look at an example. Take a look at this sample script I have just designed:

[Image: sample script]

You will note that it really sucks. But why? If you are designing something you want to be visually pleasing, it's not enough to know that it sucks; you need to know why it sucks. The reason this script is so bad is that it lacks any sort of guiding aesthetic. Each letter appears as though it was designed independently, without any reference to the other glyphs. There is no consistency from glyph to glyph, and as a result, when they are arranged together in a line of text, they clash, and just look like a collection of random shapes.

So how can we resolve this problem?

There is no one way to resolve it, because it is a creative endeavour. You will need to come up with your design aesthetic on your own. However, there are concrete suggestions I can give to help you in your decision.

1. Decide which strokes appear frequently

Looking carefully at just about any modern script will reveal that each has certain shapes or lines or angles that appear quite frequently. Some examples:

The majority of Latin lower case letters are built either out of vertical lines, circles (or portions of circles), or a combination of the two. Six letters also incorporate diagonals. You will note that while the exact angles of the diagonals differ slightly, they are as close as possible to 45° while maintaining an aesthetically pleasing form.

Georgian is similar, but different. It also incorporates circles and vertical lines, but it has fewer vertical lines, and many more c-shaped semi-circles.

Almost all Oriya letters have rounded tops. There are also a lot of circles, "n" shapes, and angled or very short straight lines.

Most Thai letters have a small circle or two attached to them somewhere. Also, every single letter has at least one straight vertical line in it, and most have two. Also, similar to Oriya, the majority of them have rounded tops.

Arabic has many large cup shapes, many small vertical hooks, and of course, lots of dots.

Chinese has many straight vertical and horizontal lines, as well as gently-curving diagonals.

Glagolitic has circles and triangles everywhere. Yet, oddly enough, there is no letter that is just O or Δ.

Even something like Egyptian hieroglyphics reveals common patterns. Looking closely at a lot of signs will reveal many curves, including many S curves, that gradually become wider and more open or flat on one side, sort of like part of a Fibonacci spiral.

Mayan, by contrast, tends to favour very blunt curves. Nearly every round shape is squared off, like a square with rounded corners. As a result, there are very few real circles in Mayan, and all that do exist are small.

2. Decide which strokes appear infrequently or not at all

It should come as no surprise that if some stroke types are frequent, others may not be so frequent, or may even be entirely absent. Let's take a look:

No Latin letters have very open curves, like (. There are also very few horizontal lines: in the lower-case letters, horizontal strokes appear only in e, f and t; in upper case, only A, E, F, G, H, L, T and Z.

Chinese characters entirely lack tight curves and circles.

Buginese entirely lacks horizontal or vertical lines of any kind. All lines are diagonals, and although the script lacks any curved lines per se, all corners are rounded.

Futhark has no curves of any kind; all strokes are completely straight lines. It also entirely lacks horizontal lines.

Khmer has many small hooks, as well as flat M shapes on the tops of letters. Some letters also have W shapes on the bottom. Although these are all formed from diagonal strokes, the script lacks longer diagonal strokes that cover the height or width of a character.

Tibetan has many elongated descenders. It also has many curves that have one end lower than the other. Although many letters have horizontal lines on the tops, horizontal lines are otherwise almost entirely absent (only one letter has a horizontal anywhere other than the top). The most likely locations for non-top horizontals are instead occupied by the lopsided curves mentioned above.

Hiragana has relatively few straight lines, favouring curves for the most part.

Javanese never allows an entirely vertical line to appear on the left side of a letter; it always curves in at the bottom. It is also extremely hesitant about allowing a single vertical on the right side; usually, there will be at least two verticals pretty close together on the right side (though not quite always).

Some scripts don't have stroke types that they outright forbid, but there will always be a tendency toward certain strokes over others.
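
One way to keep yourself honest about these choices is to write them down and check your glyphs against them. The sketch below is purely illustrative: the glyph names, stroke labels and forbidden set are all invented, and describing a glyph as a list of stroke types is a simplification I am assuming, not an established method.

```python
from collections import Counter

# Hypothetical glyphs, each described by the stroke types it is built from.
GLYPHS = {
    "ka": ["vertical", "arc"],
    "ti": ["vertical", "vertical", "dot"],
    "mu": ["arc", "arc", "dot"],
    "so": ["horizontal", "zigzag"],
}

# Hypothetical design decision: these strokes should not appear at all.
FORBIDDEN = {"horizontal", "zigzag"}

def stroke_frequencies(glyphs):
    """Count how often each stroke type is used across the whole script."""
    return Counter(stroke for strokes in glyphs.values() for stroke in strokes)

def rule_breakers(glyphs, forbidden):
    """Names of glyphs that use at least one forbidden stroke type."""
    return sorted(name for name, strokes in glyphs.items()
                  if forbidden & set(strokes))

print(stroke_frequencies(GLYPHS).most_common())
print(rule_breakers(GLYPHS, FORBIDDEN))
```

Here the frequency count shows verticals and arcs dominating, and the one glyph built from the forbidden strokes is flagged as the odd one out.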

Think about it

Look again at that sample script I made up.

[Image: sample script]

Can you apply any rule at all to it? Is there any guiding principle such as the ones we have covered so far that seems to govern the formation of the characters? The answer is no, and the reason is that when I designed the letters, I did not make any attempt to unify them in any way, resulting in an ugly, fake-looking mess.

Remember: this is a creative process here. You have to decide what you want to include, how frequent it is, what you want to eliminate, if anything, and so on. These are all just suggestions.