ICU 4.8.1.1
C API: Unicode Script Information.
#include "unicode/utypes.h"
Definition in file uscript.h.
typedef enum UScriptCode UScriptCode
Constants for ISO 15924 script codes.
Many of these script codes - those from Unicode's ScriptNames.txt - are character property values for Unicode's Script property. See UAX #24 Script Names (http://www.unicode.org/reports/tr24/).
Starting with ICU 3.6, constants for most ISO 15924 script codes are included (currently excluding private-use codes Qaaa..Qabx). Scripts that have ISO 15924 codes but are not used in the Unicode Character Database (UCD) have no Unicode characters associated with them.
For example, there are no characters that have a UCD script code of Hans or Hant. All Han ideographs have the Hani script code. The Hans and Hant script codes are used with CLDR data.
ISO 15924 script codes are included for use with CLDR and similar data.
enum UScriptCode
USCRIPT_INVALID_CODE
USCRIPT_COMMON
USCRIPT_INHERITED
USCRIPT_ARABIC
USCRIPT_ARMENIAN
USCRIPT_BENGALI
USCRIPT_BOPOMOFO
USCRIPT_CHEROKEE
USCRIPT_COPTIC
USCRIPT_CYRILLIC
USCRIPT_DESERET
USCRIPT_DEVANAGARI
USCRIPT_ETHIOPIC
USCRIPT_GEORGIAN
USCRIPT_GOTHIC
USCRIPT_GREEK
USCRIPT_GUJARATI
USCRIPT_GURMUKHI
USCRIPT_HAN
USCRIPT_HANGUL
USCRIPT_HEBREW
USCRIPT_HIRAGANA
USCRIPT_KANNADA
USCRIPT_KATAKANA
USCRIPT_KHMER
USCRIPT_LAO
USCRIPT_LATIN
USCRIPT_MALAYALAM
USCRIPT_MONGOLIAN
USCRIPT_MYANMAR
USCRIPT_OGHAM
USCRIPT_OLD_ITALIC
USCRIPT_ORIYA
USCRIPT_RUNIC
USCRIPT_SINHALA
USCRIPT_SYRIAC
USCRIPT_TAMIL
USCRIPT_TELUGU
USCRIPT_THAANA
USCRIPT_THAI
USCRIPT_TIBETAN
USCRIPT_CANADIAN_ABORIGINAL
    Canadian_Aboriginal script.
USCRIPT_UCAS
    Canadian_Aboriginal script (alias).
USCRIPT_YI
USCRIPT_TAGALOG
USCRIPT_HANUNOO
USCRIPT_BUHID
USCRIPT_TAGBANWA
USCRIPT_BRAILLE
USCRIPT_CYPRIOT
USCRIPT_LIMBU
USCRIPT_LINEAR_B
USCRIPT_OSMANYA
USCRIPT_SHAVIAN
USCRIPT_TAI_LE
USCRIPT_UGARITIC
USCRIPT_KATAKANA_OR_HIRAGANA
    New script code in Unicode 4.0.1.
USCRIPT_BUGINESE
USCRIPT_GLAGOLITIC
USCRIPT_KHAROSHTHI
USCRIPT_SYLOTI_NAGRI
USCRIPT_NEW_TAI_LUE
USCRIPT_TIFINAGH
USCRIPT_OLD_PERSIAN
USCRIPT_BALINESE
USCRIPT_BATAK
USCRIPT_BLISSYMBOLS
USCRIPT_BRAHMI
USCRIPT_CHAM
USCRIPT_CIRTH
USCRIPT_OLD_CHURCH_SLAVONIC_CYRILLIC
USCRIPT_DEMOTIC_EGYPTIAN
USCRIPT_HIERATIC_EGYPTIAN
USCRIPT_EGYPTIAN_HIEROGLYPHS
USCRIPT_KHUTSURI
USCRIPT_SIMPLIFIED_HAN
USCRIPT_TRADITIONAL_HAN
USCRIPT_PAHAWH_HMONG
USCRIPT_OLD_HUNGARIAN
USCRIPT_HARAPPAN_INDUS
USCRIPT_JAVANESE
USCRIPT_KAYAH_LI
USCRIPT_LATIN_FRAKTUR
USCRIPT_LATIN_GAELIC
USCRIPT_LEPCHA
USCRIPT_LINEAR_A
USCRIPT_MANDAIC
USCRIPT_MANDAEAN
USCRIPT_MAYAN_HIEROGLYPHS
USCRIPT_MEROITIC_HIEROGLYPHS
USCRIPT_MEROITIC
USCRIPT_NKO
USCRIPT_ORKHON
USCRIPT_OLD_PERMIC
USCRIPT_PHAGS_PA
USCRIPT_PHOENICIAN
USCRIPT_PHONETIC_POLLARD
USCRIPT_RONGORONGO
USCRIPT_SARATI
USCRIPT_ESTRANGELO_SYRIAC
USCRIPT_WESTERN_SYRIAC
USCRIPT_EASTERN_SYRIAC
USCRIPT_TENGWAR
USCRIPT_VAI
USCRIPT_VISIBLE_SPEECH
USCRIPT_CUNEIFORM
USCRIPT_UNWRITTEN_LANGUAGES
USCRIPT_UNKNOWN
USCRIPT_CARIAN
USCRIPT_JAPANESE
USCRIPT_LANNA
USCRIPT_LYCIAN
USCRIPT_LYDIAN
USCRIPT_OL_CHIKI
USCRIPT_REJANG
USCRIPT_SAURASHTRA
USCRIPT_SIGN_WRITING
USCRIPT_SUNDANESE
USCRIPT_MOON
USCRIPT_MEITEI_MAYEK
USCRIPT_IMPERIAL_ARAMAIC
USCRIPT_AVESTAN
USCRIPT_CHAKMA
USCRIPT_KOREAN
USCRIPT_KAITHI
USCRIPT_MANICHAEAN
USCRIPT_INSCRIPTIONAL_PAHLAVI
USCRIPT_PSALTER_PAHLAVI
USCRIPT_BOOK_PAHLAVI
USCRIPT_INSCRIPTIONAL_PARTHIAN
USCRIPT_SAMARITAN
USCRIPT_TAI_VIET
USCRIPT_MATHEMATICAL_NOTATION
USCRIPT_SYMBOLS
USCRIPT_BAMUM
USCRIPT_LISU
USCRIPT_NAKHI_GEBA
USCRIPT_OLD_SOUTH_ARABIAN
USCRIPT_BASSA_VAH
USCRIPT_DUPLOYAN_SHORTAND
USCRIPT_ELBASAN
USCRIPT_GRANTHA
USCRIPT_KPELLE
USCRIPT_LOMA
USCRIPT_MENDE
USCRIPT_MEROITIC_CURSIVE
USCRIPT_OLD_NORTH_ARABIAN
USCRIPT_NABATAEAN
USCRIPT_PALMYRENE
USCRIPT_SINDHI
USCRIPT_WARANG_CITI
USCRIPT_AFAKA
USCRIPT_JURCHEN
USCRIPT_MRO
USCRIPT_NUSHU
USCRIPT_SHARADA
USCRIPT_SORA_SOMPENG
USCRIPT_TAKRI
USCRIPT_TANGUT
USCRIPT_WOLEAI
USCRIPT_CODE_LIMIT
int32_t uscript_getCode(const char *nameOrAbbrOrLocale, UScriptCode *fillIn, int32_t capacity, UErrorCode *err)
Gets script codes associated with the given locale or ISO 15924 abbreviation or name.
Fills in USCRIPT_MALAYALAM given "Malayalam" or "Mlym". Fills in USCRIPT_LATIN given "en" or "en_US". If the required capacity is greater than the capacity of the destination buffer, the error code is set to U_BUFFER_OVERFLOW_ERROR and the required capacity is returned.
Note: To search by short or long script alias only, use u_getPropertyValueEnum(UCHAR_SCRIPT, alias) instead. That function does a fast lookup without accessing the locale data.
nameOrAbbrOrLocale | name of the script, as given in PropertyValueAliases.txt, or ISO 15924 code or locale
fillIn | the UScriptCode buffer that receives the script code(s)
capacity | the capacity (size) of the UScriptCode buffer passed in
err | the error status code
const char* uscript_getName(UScriptCode scriptCode)
Gets a script name associated with the given script code.
Returns "Malayalam" given USCRIPT_MALAYALAM.
scriptCode | UScriptCode enum
UScriptCode uscript_getScript(UChar32 codepoint, UErrorCode *err)
Gets the script code associated with the given codepoint.
Returns USCRIPT_MALAYALAM given 0x0D02.
codepoint | UChar32 codepoint
err | the error status code
int32_t uscript_getScriptExtensions(UChar32 c, UScriptCode *scripts, int32_t capacity, UErrorCode *errorCode)
Writes code point c's Script_Extensions as a list of UScriptCode values to the output scripts array.
Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
If there are more than capacity script codes to be written, then U_BUFFER_OVERFLOW_ERROR is set and the number of Script_Extensions is returned. (Usual ICU buffer handling behavior.)
The Script_Extensions property is provisional. It may be modified or removed in future versions of the Unicode Standard, and thus in ICU.
c | code point
scripts | output script code array
capacity | capacity of the scripts array
errorCode | Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.)
const char* uscript_getShortName(UScriptCode scriptCode)
Gets the short script name (the four-letter abbreviation) associated with the given script code.
Returns "Mlym" given USCRIPT_MALAYALAM.
scriptCode | UScriptCode enum
UBool uscript_hasScript(UChar32 c, UScriptCode sc)
Is code point c used in script sc? That is, does code point c have the Script property value sc, or do code point c's Script_Extensions include script code sc?
Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
The Script_Extensions property is provisional. It may be modified or removed in future versions of the Unicode Standard, and thus in ICU.
c | code point
sc | script code