Kannada Unicode Design Guide

 

 

Abstract: This document provides general information about the Kannada language and conventions of its usage in computers. It provides information about the Input, Storage, Display and Printing of Kannada Characters. We strongly feel that this information gathered from various standards is necessary for the correct usage of the language in various applications of Kannada Language Computing. It also includes the sorting sequence for Kannada in Unicode.

 

Note 1: This document contains Unicode characters and can be viewed using MS Office XP on Windows XP or an equivalent.

Note 2: This document follows the conventions of Chapter 9 (South and Southeast Asian Scripts) of the Unicode Standard, Version 3.0, which might differ from the notation commonly used for the Kannada script.

 

Contact Information:

 

Chief Investigator

Resource Centre for Indian Language Technology Solutions- Kannada

Department of Management Studies

Indian Institute of Science

Bangalore – 560 012

 

Phone   : 91-80-346 6022 / 394 2377 (Dir)

                91-80-394 2378 / 394 2567

Fax       : 91-80-346 6022 / 3600683 / 3600085

Email     : root@iltwebserver.mgmt.iisc.ernet.in

 

 

Table of Contents

1. History of Kannada Language

1.1 Description of Kannada Script

1.2 Brief introduction to Kannada language

1.2.1 Vowels

1.2.2 Anuswaras

1.2.3 Visarga

1.2.4 Avagraha

1.2.5 Consonants

1.2.6 Basic Language Rule in Kannada

2. Technical Characteristics

2.1 Kannada Alphabet Characteristic

2.1.1 Consonant Letters

2.1.2 Independent Vowel Letters

2.1.3 Dependent Vowel Signs

2.1.4 Virama (Halant)

2.1.5 Consonant Conjuncts

2.1.6 Visarga

2.1.7 Avagraha

2.1.8 Numerals

2.1.9 Punctuation Marks

2.1.10 Ancient Signs

2.2 Fonts

2.2.1 Font developing Tools

2.3 Keyboard

2.4 Presentation and Storage Considerations

2.5 Rendering Rules

2.5.1 Dead Consonant Rule

2.5.2 Consonant RA Rules

2.5.3 Ligature Rules

2.6 Sorting issues in Kannada

2.6.1 Sorting of Nukta characters

2.6.2 Sorting the data records containing anuswara and visarga

2.6.3 Sorting of words with dead consonants

2.6.4 Sorting of Conjuncts having two different display forms

2.6.5 Sorting of Diacritic characters

2.6.6 Conclusion

3. References

Appendix 1: Unicode chart and the Collation chart if deletion and relocation are not allowed

Appendix 2: Unicode chart and the Collation chart if deletion and relocation are allowed

Appendix 3: Output from FontLab displaying all glyphs in the glyph set standardised by KGP

 

1. History of Kannada Language

Kannada is a South Indian language spoken in the state of Karnataka, India. It is a Dravidian language, as are the other South Indian languages Telugu, Tamil and Malayalam. The Kannada and Telugu scripts are almost the same, and the Malayalam and Tamil scripts resemble each other. Kannada as a language has undergone modifications since the BC era. Its history can be divided into four periods:

Purva Halegannada (from the beginning till 10th Century)

Halegannada (from 10th Century to 12th Century)

Nadugannada (from 12th Century to 15th Century)

Hosagannada (from 15th Century)

 

1.1 Description of Kannada Script

Kannada script is the visual form of the Kannada language. It originated from the southern Brahmi script (lipi) of the Ashoka period and was modified periodically during the reigns of the Sathavahanas, Kadambas, Gangas, Rashtrakutas and Hoysalas. Even before the seventh century, the Telugu-Kannada script was used in the inscriptions of the Kadambas of Banavasi and the early Chalukyas of Badami in the west. From the middle of the seventh century, the archaic variety of the Telugu-Kannada script developed into a middle variety. The modern Kannada and Telugu scripts emerged in the thirteenth century. The Kannada script is also used to write the Tulu, Konkani and Kodava languages.

Kannada shares a large number of structural features with other Indian scripts. Its writing system combines the principles of syllabic writing systems and phonemic writing systems (alphabets). The effective unit of writing in Kannada is the orthographic syllable, consisting of a consonant and vowel (CV) core and, optionally, one or more preceding consonants, with the canonical structure ((C)C)CV. The orthographic syllable need not correspond exactly to a phonological syllable, especially when a consonant cluster is involved, but the writing system is built on phonological principles and tends to correspond quite closely to pronunciation. The orthographic syllable is built up of alphabetic pieces, the actual letters of the Kannada script. These consist of distinct character types: consonant letters, independent vowels and the corresponding dependent vowel signs. In a text sequence, these characters are stored in logical phonetic order.
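The ((C)C)CV structure described above can be sketched as a regular expression over stored code points. The following Python fragment is only an illustration, not a complete segmenter: the function name orthographic_syllables is hypothetical, the character ranges are taken from the Kannada block of the Unicode Standard, and nukta, avagraha, digits and joiner characters are deliberately ignored.

```python
import re

# Character classes taken from the Kannada block (U+0C80 to U+0CFF).
# The consonant range also spans two unassigned code points, which is
# harmless for this sketch.
CONSONANT = "[\u0C95-\u0CB9]"
INDEPENDENT_VOWEL = "[\u0C85-\u0C94]"
VOWEL_SIGN = "[\u0CBE-\u0CCC\u0CD5\u0CD6]"  # matras and the two length marks
VIRAMA = "\u0CCD"                            # halant
MODIFIER = "[\u0C82\u0C83]"                  # anusvara, visarga

SYLLABLE = re.compile(
    # Any number of dead consonants (consonant + virama), one consonant,
    # then either vowel sign(s) or a final virama, then an optional modifier.
    # The pattern does not limit the cluster to two preceding consonants.
    f"(?:{CONSONANT}{VIRAMA})*{CONSONANT}(?:{VOWEL_SIGN}+|{VIRAMA})?{MODIFIER}?"
    # Alternatively, an independent vowel with an optional modifier.
    f"|{INDEPENDENT_VOWEL}{MODIFIER}?"
)


def orthographic_syllables(text: str) -> list[str]:
    """Split text in logical (stored) order into orthographic syllables."""
    return SYLLABLE.findall(text)


if __name__ == "__main__":
    # The word "Kannada": KA, NA + VIRAMA + NA, DDA
    print(orthographic_syllables("\u0C95\u0CA8\u0CCD\u0CA8\u0CA1"))
```

Because the characters are stored in logical phonetic order, the pattern can be applied directly to the stored text without any reordering step.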

The Kannada block of the Unicode Standard (U+0C80 to U+0CFF) is based on ISCII-1988 (Indian Standard Code for Information Interchange). The Unicode Standard (Version 3) encodes the Kannada characters in the same relative positions as those coded in the ISCII-1988 standard.
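As a small illustration of the block range given above, the following Python sketch tests whether a character lies in the Kannada block and looks up its Unicode character name. The helper names in_kannada_block and kannada_names are hypothetical; only the standard unicodedata module is used.

```python
import unicodedata

# Limits of the Kannada block as stated above.
KANNADA_FIRST, KANNADA_LAST = 0x0C80, 0x0CFF


def in_kannada_block(ch: str) -> bool:
    """Return True if the single character ch lies in U+0C80..U+0CFF."""
    return KANNADA_FIRST <= ord(ch) <= KANNADA_LAST


def kannada_names(text: str) -> list[str]:
    """Return the Unicode names of the Kannada-block characters in text."""
    return [
        unicodedata.name(ch, "<unassigned>")
        for ch in text
        if in_kannada_block(ch)
    ]


if __name__ == "__main__":
    # A Latin letter followed by KA and the virama (halant).
    for name in kannada_names("A\u0C95\u0CCD"):
        print(name)
```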

 

1.2 Brief introduction to Kannada language

1.2.1 Vowels (Swaras)   Vowels, called Swaras, are letters that exist independently. They are-

There are two types of Swaras, distinguished by the time taken to pronounce them: Hrasva Swara and Deergha Swara.

Hrasva Swara

A free-standing independent vowel that can be pronounced in a single matra time (matra kala), also called a matra. They are-

       

Deergha Swara

A free-standing independent vowel that is pronounced in two matras. They are-

1.2.2 Anuswaras   ಅಂ

1.2.3 Visarga   ಅಃ

1.2.4 Avagraha  Also called Plutha, it is used for the third matra in either a consonant or a vowel.

1.2.5 Consonants (Vyanjanas) Consonants depend on vowels to take an independent form. They can be divided into Vargeeya and Avargeeya.

Vargeeya Vyanjanas

ಕ್ ಖ್ ಗ್ ಘ್ ಙ್

ಚ್ ಛ್ ಜ್ ಝ್ ಞ್

ತ್ ಥ್ ದ್ ಧ್ ನ್

ಟ್ ಠ್ ಡ್ ಢ್ ಣ್

ಪ್ ಫ್ ಬ್ ಭ್ ಮ್

 

Avargeeya Vyanjanas

ಯ್ ರ್ ಲ್ ವ್ ಶ್ ಷ್ ಸ್ ಹ್ ಳ್

 

1.2.6 Basic Language Rule in Kannada

When a dependent consonant combines with an independent vowel, an Akshara is formed.

Consonant (Vyanjana) + Vowel (matra) ---> Letter (Akshara)

Example:           ಕ್      +      ಅ      --->  ಕ

Based on this rule we can combine all the Consonants (Vyanjanas) with the existing Vowels (matra) to form the Kagunitha for the Kannada alphabet.

 

ಕ ಕಾ ಕಿ ಕೀ ಕು ಕೂ ಕೃ ಕೆ ಕೇ ಕೈ ಕೊ ಕೋ ಕೌ ಕಂ ಕಃ

ಖ ಖಾ ಖಿ ಖೀ ಖು ಖೂ ಖೃ ಖೆ ಖೇ ಖೈ ಖೊ ಖೋ ಖೌ ಖಂ ಖಃ

ಗ ಗಾ ಗಿ ಗೀ ಗು ಗೂ ಗೃ ಗೆ ಗೇ ಗೈ ಗೊ ಗೋ ಗೌ ಗಂ ಗಃ

ಘ ಘಾ ಘಿ ಘೀ ಘು ಘೂ ಘೃ ಘೆ ಘೇ ಘೈ ಘೊ ಘೋ ಘೌ ಘಂ ಘಃ

ಙ ಙಾ ಙಿ ಙೀ ಙು ಙೂ ಙೃ ಙೆ ಙೇ ಙೈ ಙೊ ಙೋ ಙೌ ಙಂ ಙಃ

ಚ ಚಾ ಚಿ ಚೀ ಚು ಚೂ ಚೃ ಚೆ ಚೇ ಚೈ ಚೊ ಚೋ ಚೌ ಚಂ ಚಃ

ಛ ಛಾ ಛಿ ಛೀ ಛು ಛೂ ಛೃ ಛೆ ಛೇ ಛೈ ಛೊ ಛೋ ಛೌ ಛಂ ಛಃ

ಜ ಜಾ ಜಿ ಜೀ ಜು ಜೂ ಜೃ ಜೆ ಜೇ ಜೈ ಜೊ ಜೋ ಜೌ ಜಂ ಜಃ

ಝ ಝಾ ಝಿ ಝೀ ಝು ಝೂ ಝೃ ಝೆ ಝೇ ಝೈ ಝೊ ಝೋ ಝೌ ಝಂ ಝಃ

ಞ ಞಾ ಞಿ ಞೀ ಞು ಞೂ ಞೃ ಞೆ ಞೇ ಞೈ ಞೊ ಞೋ ಞೌ ಞಂ ಞಃ

ತ ತಾ ತಿ ತೀ ತು ತೂ ತೃ ತೆ ತೇ ತೈ ತೊ ತೋ ತೌ ತಂ ತಃ

ಥ ಥಾ ಥಿ ಥೀ ಥು ಥೂ ಥೃ ಥೆ ಥೇ ಥೈ ಥೊ ಥೋ ಥೌ ಥಂ ಥಃ

ದ ದಾ ದಿ ದೀ ದು ದೂ ದೃ ದೆ ದೇ ದೈ ದೊ ದೋ ದೌ ದಂ ದಃ

ಧ ಧಾ ಧಿ ಧೀ ಧು ಧೂ ಧೃ ಧೆ ಧೇ ಧೈ ಧೊ ಧೋ ಧೌ ಧಂ ಧಃ

ನ ನಾ ನಿ ನೀ ನು ನೂ ನೃ ನೆ ನೇ ನೈ ನೊ ನೋ ನೌ ನಂ ನಃ

ಟ ಟಾ ಟಿ ಟೀ ಟು ಟೂ ಟೃ ಟೆ ಟೇ ಟೈ ಟೊ ಟೋ ಟೌ ಟಂ ಟಃ

ಠ ಠಾ ಠಿ ಠೀ ಠು ಠೂ ಠೃ ಠೆ ಠೇ ಠೈ ಠೊ ಠೋ ಠೌ ಠಂ ಠಃ

ಡ ಡಾ ಡಿ ಡೀ ಡು ಡೂ ಡೃ ಡೆ ಡೇ ಡೈ ಡೊ ಡೋ ಡೌ ಡಂ ಡಃ

ಢ ಢಾ ಢಿ ಢೀ ಢು ಢೂ ಢೃ ಢೆ ಢೇ ಢೈ ಢೊ ಢೋ ಢೌ ಢಂ ಢಃ

ಣ ಣಾ ಣಿ ಣೀ ಣು ಣೂ ಣೃ ಣೆ ಣೇ ಣೈ ಣೊ ಣೋ ಣೌ ಣಂ ಣಃ

ಪ ಪಾ ಪಿ ಪೀ ಪು ಪೂ ಪೃ ಪೆ ಪೇ ಪೈ ಪೊ ಪೋ ಪೌ ಪಂ ಪಃ

ಫ ಫಾ ಫಿ ಫೀ ಫು ಫೂ ಫೃ ಫೆ ಫೇ ಫೈ ಫೊ ಫೋ ಫೌ ಫಂ ಫಃ

ಬ ಬಾ ಬಿ ಬೀ ಬು ಬೂ ಬೃ ಬೆ ಬೇ ಬೈ ಬೊ ಬೋ ಬೌ ಬಂ ಬಃ

ಭ ಭಾ ಭಿ ಭೀ ಭು ಭೂ ಭೃ ಭೆ ಭೇ ಭೈ ಭೊ ಭೋ ಭೌ ಭಂ ಭಃ

ಮ ಮಾ ಮಿ ಮೀ ಮು ಮೂ ಮೃ ಮೆ ಮೇ ಮೈ ಮೊ ಮೋ ಮೌ ಮಂ ಮಃ

ಯ ಯಾ ಯಿ ಯೀ ಯು ಯೂ ಯೃ ಯೆ ಯೇ ಯೈ ಯೊ ಯೋ ಯೌ ಯಂ ಯಃ

ರ ರಾ ರಿ ರೀ ರು ರೂ ರೃ ರೆ ರೇ ರೈ ರೊ ರೋ ರೌ ರಂ ರಃ

ಲ ಲಾ ಲಿ ಲೀ ಲು ಲೂ ಲೃ ಲೆ ಲೇ ಲೈ ಲೊ ಲೋ ಲೌ ಲಂ ಲಃ

ವ ವಾ ವಿ ವೀ ವು ವೂ ವೃ ವೆ ವೇ ವೈ ವೊ ವೋ ವೌ ವಂ ವಃ

ಶ ಶಾ ಶಿ ಶೀ ಶು ಶೂ ಶೃ ಶೆ ಶೇ ಶೈ ಶೊ ಶೋ ಶೌ ಶಂ ಶಃ

ಷ ಷಾ ಷಿ ಷೀ ಷು ಷೂ ಷೃ ಷೆ ಷೇ ಷೈ ಷೊ ಷೋ ಷೌ ಷಂ ಷಃ

ಸ ಸಾ ಸಿ ಸೀ ಸು ಸೂ ಸೃ ಸೆ ಸೇ ಸೈ ಸೊ ಸೋ ಸೌ ಸಂ ಸಃ

ಹ ಹಾ ಹಿ ಹೀ ಹು ಹೂ ಹೃ ಹೆ ಹೇ ಹೈ ಹೊ ಹೋ ಹೌ ಹಂ ಹಃ

ಳ ಳಾ ಳಿ ಳೀ ಳು ಳೂ ಳೃ ಳೆ ಳೇ ಳೈ ಳೊ ಳೋ ಳೌ ಳಂ ಳಃ
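The Kagunitha rows above can be generated mechanically. The Python sketch below is a minimal illustration under one assumption about storage: in Unicode the consonant letter already carries its inherent vowel, so each form is the consonant letter followed by one vowel sign (or by anusvara or visarga). The names MATRAS and kagunitha are hypothetical.

```python
# Signs in the order used in the rows above. The empty string stands for
# the inherent vowel, which has no visible sign of its own.
MATRAS = [
    "",        # inherent vowel A
    "\u0CBE",  # AA
    "\u0CBF",  # I
    "\u0CC0",  # II
    "\u0CC1",  # U
    "\u0CC2",  # UU
    "\u0CC3",  # vocalic R
    "\u0CC6",  # E
    "\u0CC7",  # EE
    "\u0CC8",  # AI
    "\u0CCA",  # O
    "\u0CCB",  # OO
    "\u0CCC",  # AU
    "\u0C82",  # anusvara
    "\u0C83",  # visarga
]


def kagunitha(consonant: str) -> list[str]:
    """Return the 15 forms of one consonant letter, as in the rows above."""
    if len(consonant) != 1 or not ("\u0C95" <= consonant <= "\u0CB9"):
        raise ValueError("expected a single Kannada consonant letter")
    return [consonant + sign for sign in MATRAS]


if __name__ == "__main__":
    # Row for KA (U+0C95).
    print(" ".join(kagunitha("\u0C95")))
```

Generating the rows from a single list of signs keeps every row complete and in the same order.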

 

2. Technical Characteristics

Note: From this section onwards, the document follows the conventions of Chapter 9 (South and Southeast Asian Scripts) of the Unicode Standard, which might not agree with traditional Kannada grammar.

2.1 Kannada Alphabet Characteristic

2.1.1 Consonant Letters

Each consonant letter represents a single consonantal sound but also carries an inherent vowel, generally the short vowel ಅ (U+0C85). Thus, U+0C95 KANNADA LETTER KA represents not just K (ಕ್) but KA (ಕ). In the presence of a dependent vowel, however, the inherent vowel associated with the consonant letter is overridden by the dependent vowel. The different consonants in Kannada are:

2.1.3 Dependent Vowel Signs (Matras)

The dependent vowels, also known as Swaras in Kannada, serve as the common manner of writing non-inherent vowels and are generally referred to as Swara Chinhas in Kannada or Matras in Sanskrit. The dependent vowels do not appear stand-alone; rather, they are visibly depicted in combination with a base letterform (generally a consonant). A single consonant or a consonant cluster may have a dependent vowel applied to it to indicate the vowel quality of the syllable when it is different from the inherent vowel. The explicit appearance of a dependent vowel in a syllable overrides the inherent vowel (U+0C85) of a single consonant letter.

The dependent vowels are applied to the base letterforms in several ways. Most of them appear as non-spacing dependent vowel signs above or to the right of a consonant letter or a consonant cluster. The following are the exceptions and variations to this rule:

·         The two dependent vowel signs U+0CC3 and U+0CC4 appear one level below and to the right of the consonant or the consonant cluster, separated by a small white space.

·         Each of the five dependent vowels (U+0CC0, U+0CC7, U+0CC8, U+0CCA and U+0CCB) is depicted by two or three glyph components (two-part or three-part vowel signs), with one component appearing with a space to the right of the consonant or the consonant cluster.

i) In the case of three of the above-mentioned two-part or three-part dependent vowels (U+0CC0, U+0CC7 and U+0CCB), the non-spacing component or components are the same as the vowel sign or signs of the corresponding preceding short vowel. The spacing component of each of these dependent vowels is the same 'length mark' (U+0CD5) given in Unicode version 3. The logic for this is that these dependent vowels are nothing but the long forms (independent and phonetically distinct) of the preceding short vowels.

ii) The first component of the dependent vowel U+0CC8 mentioned above is the same as the dependent vowel ೆ (U+0CC6); its second component (U+0CD6) is defined independently in Unicode version 3. The second component appears slightly below and to the right of the consonant or the consonant cluster.

·         In view of this, it is important to note that the two glyphs (the length mark and the second component of ೈ, U+0CC8), represented by the codes U+0CD5 and U+0CD6 in Unicode version 3, have no independent existence and do not play any part as independent codes in the collation algorithm.

·         Unlike Devanagari, the Kannada script does not have any character with a left-side dependent vowel sign.

·         A one-to-one correspondence exists between independent vowels and dependent vowel signs.

The Matras are-

ಾ ಿ ೀ ು ೂ ೃ ೆ ೇ ೈ ೊ ೋ ೌ ಂ ಃ
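The two-part and three-part vowel signs described above correspond to canonical decompositions in the Unicode character database. The Python sketch below uses the standard unicodedata module to list them. It is offered only as an illustration, and the helper name components is hypothetical.

```python
import unicodedata

# The five dependent vowel signs described above as two-part or three-part.
MULTI_PART_SIGNS = ["\u0CC0", "\u0CC7", "\u0CC8", "\u0CCA", "\u0CCB"]


def components(sign: str) -> list[str]:
    """Return the code points of the full canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", sign)
    return [f"U+{ord(ch):04X}" for ch in decomposed]


if __name__ == "__main__":
    for sign in MULTI_PART_SIGNS:
        print(f"U+{ord(sign):04X} = " + " + ".join(components(sign)))
```

Text that may contain either the single code or its parts can be brought to one form with unicodedata.normalize before comparison or sorting, which fits the remark above that U+0CD5 and U+0CD6 play no part as independent codes in collation.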

2.1.4 Virama (Halant)

Like Devanagari, the Kannada script employs a sign known as the halant, or vowel omission sign. The halant sign (್, U+0CCD) nominally serves to cancel (or kill) the inherent vowel of the consonant to which it is applied. It functions as a combining character. A consonant that has lost its inherent vowel through the application of a halant is known as a dead consonant. The dead consonants are the presentation forms used to depict consonants without an inherent vowel. Their rendered forms in Kannada resemble the full consonant with the vertical stem replaced by the halant sign, which marks a character core. The stem glyph (U+0CBB) is graphically and historically related to the sign denoting the inherent /a/ vowel (ಅ, U+0C85). In contrast, a live consonant is a consonant that retains its inherent vowel or is written with an explicit dependent vowel sign. A dead consonant is defined as a sequence consisting of a consonant letter followed by a halant. The default rendering of a dead consonant positions the halant as a combining mark bound to the consonant letterform. The halant in Kannada is ್ (U+0CCD).
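The definition of a dead consonant as a consonant letter followed by a halant can be expressed directly in code. The Python sketch below is a minimal illustration. The function names are hypothetical, and the consonant test is a simple range check over U+0C95 to U+0CB9, which also spans two unassigned code points.

```python
VIRAMA = "\u0CCD"  # KANNADA SIGN VIRAMA (halant)


def is_consonant(ch: str) -> bool:
    """Rough test for a Kannada consonant letter (range check only)."""
    return "\u0C95" <= ch <= "\u0CB9"


def make_dead(consonant: str) -> str:
    """Return the dead form: the consonant letter followed by the halant."""
    if len(consonant) != 1 or not is_consonant(consonant):
        raise ValueError("expected a single Kannada consonant letter")
    return consonant + VIRAMA


def is_dead_consonant(text: str) -> bool:
    """True if text is exactly one consonant letter followed by a halant."""
    return len(text) == 2 and is_consonant(text[0]) and text[1] == VIRAMA


if __name__ == "__main__":
    k = make_dead("\u0C95")               # K without the inherent vowel
    print(k, is_dead_consonant(k))        # dead consonant
    print(is_dead_consonant("\u0C95"))    # live consonant KA
```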

2.1.5 Consonant Conjuncts

Like any other Indian script, Kannada is also noted for a large number of consonant conjunct forms that serve as orthographic abbreviations (ligatures) of two or more adjacent forms. This abbreviation takes place only in the context of a consonant cluster. An orthographic consonant cluster is defined as a sequence of characters that represent one or more dead consonants (denoted by Cd) followed by a normal live consonant (denoted by Cl).

Corresponding to each Kannada consonant, there exists a separate and unique glyph, which is specially used to represent the corresponding consonant in a consonant cluster. Most of these conjunct consonant glyphs resemble their original consonant forms (many without the implicit vowel sign, wherever applicable).

In Kannada, there is only one type of conjunct formation (consonant cluster) and it is depicted as follows:

·         The first consonant of the consonant cluster is rendered with the implicit or a different dependent vowel appearing as the terminal element of the consonant cluster.

·         The remaining consonants (consonants in between the first consonant and the terminal vowel element) appear in conjunct consonant glyph forms in the phonetic order. They are generally depicted directly below or sometimes below but to the right of the first consonant.

Thus, a systematically designed Kannada script font contains the conjunct glyph components, but they are not encoded as Unicode characters because they result from the ligation of distinct letters. Kannada script rendering software must be able to map appropriate combinations of characters in context to the appropriate conjunct glyphs in fonts.
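The first step of such a mapping is to locate the consonant clusters, that is, one or more dead consonants (Cd) followed by a live consonant (Cl). The Python sketch below finds the clusters in stored text and separates the first consonant from the consonants that would take conjunct glyph forms. The function names are hypothetical, and the special rules for the consonant RA and for ligatures (sections 2.5.2 and 2.5.3) are not handled.

```python
import re

CONSONANT = "[\u0C95-\u0CB9]"
VIRAMA = "\u0CCD"

# One or more dead consonants followed by a live consonant: Cd+ Cl
CLUSTER = re.compile(f"(?:{CONSONANT}{VIRAMA})+{CONSONANT}")


def find_clusters(text: str) -> list[str]:
    """Return every orthographic consonant cluster found in text."""
    return CLUSTER.findall(text)


def split_cluster(cluster: str) -> tuple[str, list[str]]:
    """Split a cluster into (first consonant, remaining consonants).

    The first consonant is the one rendered in full; the remaining ones
    are candidates for conjunct glyph forms, in phonetic order.
    """
    consonants = cluster[::2]  # every second code point skips the halants
    return consonants[0], list(consonants[1:])


if __name__ == "__main__":
    word = "\u0CB0\u0CBE\u0CB7\u0CCD\u0C9F\u0CCD\u0CB0"  # RA AA SSA TTA RA
    for cluster in find_clusters(word):
        print(split_cluster(cluster))
```

A rendering engine would then look up a conjunct glyph in the font for each of the remaining consonants; that lookup depends on the font and is outside the scope of this sketch.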

2.1.6 Visarga

The visarga (ಃ, U+0C83) comes after a vowel sound and represents a sound similar to /h/.

2.1.7 Avagraha

The avagraha sign (ಽ, U+0CBD) is a spacing mark used when rendering Sanskrit text.

2.1.8 Numerals