HumHub Documentation (unofficial)

Namespace: voku\helper

ASCII (final class, in package Application)

## 🇷🇺 To Russian citizens

A war is going on in Ukraine right now. Russian forces are striking civilian infrastructure in [Kharkiv][1], [Kyiv][2], [Chernihiv][3], [Sumy][4], [Irpin][5] and dozens of other cities. People are dying, both civilians and soldiers, including Russian conscripts who were thrown into the fighting. To deprive its own people of access to information, the Russian government has banned calling the war a war, shut down independent media, and is now passing a series of dictatorial laws. These laws are meant to silence everyone who is against the war. A simple call for peace can now get you several years in prison.

Do not stay silent! Silence is a sign of your agreement with the Russian government's policy. You can choose NOT TO STAY SILENT.

🇺🇸 To people of Russia

There is a war in Ukraine right now. The forces of the Russian Federation are attacking civilian infrastructure in [Kharkiv][1], [Kyiv][2], [Chernihiv][3], [Sumy][4], [Irpin][5] and dozens of other cities. People are dying – both civilians and military servicemen, including Russian conscripts who were thrown into the fighting. In order to deprive its own people of access to information, the government of the Russian Federation has forbidden calling a war a war, shut down independent media and is passing a number of dictatorial laws. These laws are meant to silence all those who are against war. You can be jailed for multiple years for simply calling for peace. Do not be silent! Silence is a sign that you accept the Russian government's policy. You can choose NOT TO BE SILENT.

[1] https://cloudfront-us-east-2.images.arcpublishing.com/reuters/P7K2MSZDGFMIJPDD7CI2GIROJI.jpg "Kharkiv under attack"
[2] https://gdb.voanews.com/01bd0000-0aff-0242-fad0-08d9fc92c5b3_cx0_cy5_cw0_w1023_r1_s.jpg "Kyiv under attack"
[3] https://ichef.bbci.co.uk/news/976/cpsprodpb/163DD/production/_123510119_hi074310744.jpg "Chernihiv under attack"
[4] https://www.youtube.com/watch?v=8K-bkqKKf2A "Sumy under attack"
[5] https://cloudfront-us-east-2.images.arcpublishing.com/reuters/K4MTMLEHTRKGFK3GSKAT4GR3NE.jpg "Irpin under attack"

Constants

AMHARIC_LANGUAGE_CODE = 'am'
ARABIC_LANGUAGE_CODE = 'ar'
ARMENIAN_LANGUAGE_CODE = 'hy'
AZERBAIJANI_LANGUAGE_CODE = 'az'
BELARUSIAN_LANGUAGE_CODE = 'be'
BENGALI_LANGUAGE_CODE = 'bn'
BULGARIAN_LANGUAGE_CODE = 'bg'
CHINESE_LANGUAGE_CODE = 'zh'
CROATIAN_LANGUAGE_CODE = 'hr'
CZECH_LANGUAGE_CODE = 'cs'
DANISH_LANGUAGE_CODE = 'da'
DUTCH_LANGUAGE_CODE = 'nl'
ENGLISH_LANGUAGE_CODE = 'en'
ESPERANTO_LANGUAGE_CODE = 'eo'
ESTONIAN_LANGUAGE_CODE = 'et'
EXTRA_LATIN_CHARS_LANGUAGE_CODE = 'latin'
EXTRA_MSWORD_CHARS_LANGUAGE_CODE = 'msword'
EXTRA_WHITESPACE_CHARS_LANGUAGE_CODE = ' '
FINNISH_LANGUAGE_CODE = 'fi'
FRENCH_AUSTRIAN_LANGUAGE_CODE = 'fr_at'
FRENCH_LANGUAGE_CODE = 'fr'
FRENCH_SWITZERLAND_LANGUAGE_CODE = 'fr_ch'
GEORGIAN_LANGUAGE_CODE = 'ka'
GERMAN_AUSTRIAN_LANGUAGE_CODE = 'de_at'
GERMAN_LANGUAGE_CODE = 'de'
GERMAN_SWITZERLAND_LANGUAGE_CODE = 'de_ch'
GREEK_LANGUAGE_CODE = 'el'
GREEKLISH_LANGUAGE_CODE = 'el__greeklish'
HINDI_LANGUAGE_CODE = 'hi'
HUNGARIAN_LANGUAGE_CODE = 'hu'
ITALIAN_LANGUAGE_CODE = 'it'
JAPANESE_LANGUAGE_CODE = 'ja'
KAZAKH_LANGUAGE_CODE = 'kk'
KIRGHIZ_LANGUAGE_CODE = 'ky'
KOREAN_LANGUAGE_CODE = 'ko'
LATVIAN_LANGUAGE_CODE = 'lv'
LITHUANIAN_LANGUAGE_CODE = 'lt'
MACEDONIAN_LANGUAGE_CODE = 'mk'
MONGOLIAN_LANGUAGE_CODE = 'mn'
MYANMAR_LANGUAGE_CODE = 'my'
NORWEGIAN_LANGUAGE_CODE = 'no'
ORIYA_LANGUAGE_CODE = 'or'
PASHTO_LANGUAGE_CODE = 'ps'
PERSIAN_LANGUAGE_CODE = 'fa'
POLISH_LANGUAGE_CODE = 'pl'
PORTUGUESE_LANGUAGE_CODE = 'pt'
ROMANIAN_LANGUAGE_CODE = 'ro'
RUSSIAN_GOST_2000_B_LANGUAGE_CODE = 'ru__gost_2000_b'
RUSSIAN_LANGUAGE_CODE = 'ru'
RUSSIAN_PASSPORT_2013_LANGUAGE_CODE = 'ru__passport_2013'
SERBIAN_CYRILLIC_LANGUAGE_CODE = 'sr__cyr'
SERBIAN_LANGUAGE_CODE = 'sr'
SERBIAN_LATIN_LANGUAGE_CODE = 'sr__lat'
SLOVAK_LANGUAGE_CODE = 'sk'
SWEDISH_LANGUAGE_CODE = 'sv'
THAI_LANGUAGE_CODE = 'th'
TURKISH_LANGUAGE_CODE = 'tr'
TURKMEN_LANGUAGE_CODE = 'tk'
UKRAINIAN_LANGUAGE_CODE = 'uk'
UZBEK_LANGUAGE_CODE = 'uz'
VIETNAMESE_LANGUAGE_CODE = 'vi'

Properties

$ASCII_EXTRAS : array<string, array<string, string>>|null
$ASCII_MAPS : array<string, array<string, string>>|null
$ASCII_MAPS_AND_EXTRAS : array<string, array<string, string>>|null
$BIDI_UNI_CODE_CONTROLS_TABLE : array<int, string>: bidirectional text control characters
$LANGUAGE_MAX_KEY : array<string, int>|null
$ORD : array<string, int>|null
$REGEX_ASCII : string: regex character class matching characters outside the allowed (printable) ASCII set; see https://en.wikipedia.org/wiki/Wikipedia:ASCII#ASCII_printable_characters

Methods

charsArray() : array<string|int, mixed>: Returns a replacement array for ASCII methods.
charsArrayWithMultiLanguageValues() : array<string|int, mixed>: Returns a replacement array for ASCII methods with a mix of multiple languages.
charsArrayWithOneLanguage() : array<string|int, mixed>: Returns a replacement array for ASCII methods with one language.
charsArrayWithSingleLanguageValues() : array<string|int, mixed>: Returns a replacement array for ASCII methods with multiple languages.
clean() : string: Accepts a string and removes all non-UTF-8 characters from it and, if requested, applies extra cleanup (whitespace, MS Word characters, invisible characters).
getAllLanguages() : array<string|int, string>: Get all languages from the constants "ASCII::.*LANGUAGE_CODE".
is_ascii() : bool: Checks if a string is 7-bit ASCII.
normalize_msword() : string: Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.
normalize_whitespace() : string: Normalize the whitespace.
remove_invisible_characters() : string: Remove invisible characters from a string.
to_ascii() : string: Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed by default. The language or locale of the source string can be supplied for language-specific transliteration in any of the following formats: en, en_GB, or en-GB. For example, passing "de" results in "äöü" mapping to "aeoeue" rather than "aou" as in other languages.
to_ascii_remap() : array<string|int, string>: WARNING: This method will return broken characters and is only for special cases.
to_filename() : string: Converts the given string to a safe filename (preserving case).
to_slugify() : string: Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.
to_transliterate() : string: Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.
get_language() : string: Get the language from a string.
getData() : array<string|int, mixed>: Get data from "/data/*.php".
getDataIfExists() : array<string|int, mixed>: Get data from "/data/*.php".
prepareAsciiAndExtrasMaps() : void
prepareAsciiExtras() : void
prepareAsciiMaps() : void
to_ascii_remap_intern() : string: WARNING: This method will return broken characters and is only for special cases.

AMHARIC_LANGUAGE_CODE

    public mixed AMHARIC_LANGUAGE_CODE = 'am'

ARABIC_LANGUAGE_CODE

    public mixed ARABIC_LANGUAGE_CODE = 'ar'

ARMENIAN_LANGUAGE_CODE

    public mixed ARMENIAN_LANGUAGE_CODE = 'hy'

AZERBAIJANI_LANGUAGE_CODE

    public mixed AZERBAIJANI_LANGUAGE_CODE = 'az'

BELARUSIAN_LANGUAGE_CODE

    public mixed BELARUSIAN_LANGUAGE_CODE = 'be'

BENGALI_LANGUAGE_CODE

    public mixed BENGALI_LANGUAGE_CODE = 'bn'

BULGARIAN_LANGUAGE_CODE

    public mixed BULGARIAN_LANGUAGE_CODE = 'bg'

CHINESE_LANGUAGE_CODE

    public mixed CHINESE_LANGUAGE_CODE = 'zh'

CROATIAN_LANGUAGE_CODE

    public mixed CROATIAN_LANGUAGE_CODE = 'hr'

CZECH_LANGUAGE_CODE

    public mixed CZECH_LANGUAGE_CODE = 'cs'

DANISH_LANGUAGE_CODE

    public mixed DANISH_LANGUAGE_CODE = 'da'

DUTCH_LANGUAGE_CODE

    public mixed DUTCH_LANGUAGE_CODE = 'nl'

ENGLISH_LANGUAGE_CODE

    public mixed ENGLISH_LANGUAGE_CODE = 'en'

ESPERANTO_LANGUAGE_CODE

    public mixed ESPERANTO_LANGUAGE_CODE = 'eo'

ESTONIAN_LANGUAGE_CODE

    public mixed ESTONIAN_LANGUAGE_CODE = 'et'

EXTRA_LATIN_CHARS_LANGUAGE_CODE

    public mixed EXTRA_LATIN_CHARS_LANGUAGE_CODE = 'latin'

EXTRA_MSWORD_CHARS_LANGUAGE_CODE

    public mixed EXTRA_MSWORD_CHARS_LANGUAGE_CODE = 'msword'

EXTRA_WHITESPACE_CHARS_LANGUAGE_CODE

    public mixed EXTRA_WHITESPACE_CHARS_LANGUAGE_CODE = ' '

FINNISH_LANGUAGE_CODE

    public mixed FINNISH_LANGUAGE_CODE = 'fi'

FRENCH_AUSTRIAN_LANGUAGE_CODE

    public mixed FRENCH_AUSTRIAN_LANGUAGE_CODE = 'fr_at'

FRENCH_LANGUAGE_CODE

    public mixed FRENCH_LANGUAGE_CODE = 'fr'

FRENCH_SWITZERLAND_LANGUAGE_CODE

    public mixed FRENCH_SWITZERLAND_LANGUAGE_CODE = 'fr_ch'

GEORGIAN_LANGUAGE_CODE

    public mixed GEORGIAN_LANGUAGE_CODE = 'ka'

GERMAN_AUSTRIAN_LANGUAGE_CODE

    public mixed GERMAN_AUSTRIAN_LANGUAGE_CODE = 'de_at'

GERMAN_LANGUAGE_CODE

    public mixed GERMAN_LANGUAGE_CODE = 'de'

GERMAN_SWITZERLAND_LANGUAGE_CODE

    public mixed GERMAN_SWITZERLAND_LANGUAGE_CODE = 'de_ch'

GREEK_LANGUAGE_CODE

    public mixed GREEK_LANGUAGE_CODE = 'el'

GREEKLISH_LANGUAGE_CODE

    public mixed GREEKLISH_LANGUAGE_CODE = 'el__greeklish'

HINDI_LANGUAGE_CODE

    public mixed HINDI_LANGUAGE_CODE = 'hi'

HUNGARIAN_LANGUAGE_CODE

    public mixed HUNGARIAN_LANGUAGE_CODE = 'hu'

ITALIAN_LANGUAGE_CODE

    public mixed ITALIAN_LANGUAGE_CODE = 'it'

JAPANESE_LANGUAGE_CODE

    public mixed JAPANESE_LANGUAGE_CODE = 'ja'

KAZAKH_LANGUAGE_CODE

    public mixed KAZAKH_LANGUAGE_CODE = 'kk'

KIRGHIZ_LANGUAGE_CODE

    public mixed KIRGHIZ_LANGUAGE_CODE = 'ky'

KOREAN_LANGUAGE_CODE

    public mixed KOREAN_LANGUAGE_CODE = 'ko'

LATVIAN_LANGUAGE_CODE

    public mixed LATVIAN_LANGUAGE_CODE = 'lv'

LITHUANIAN_LANGUAGE_CODE

    public mixed LITHUANIAN_LANGUAGE_CODE = 'lt'

MACEDONIAN_LANGUAGE_CODE

    public mixed MACEDONIAN_LANGUAGE_CODE = 'mk'

MONGOLIAN_LANGUAGE_CODE

    public mixed MONGOLIAN_LANGUAGE_CODE = 'mn'

MYANMAR_LANGUAGE_CODE

    public mixed MYANMAR_LANGUAGE_CODE = 'my'

NORWEGIAN_LANGUAGE_CODE

    public mixed NORWEGIAN_LANGUAGE_CODE = 'no'

ORIYA_LANGUAGE_CODE

    public mixed ORIYA_LANGUAGE_CODE = 'or'

PASHTO_LANGUAGE_CODE

    public mixed PASHTO_LANGUAGE_CODE = 'ps'

PERSIAN_LANGUAGE_CODE

    public mixed PERSIAN_LANGUAGE_CODE = 'fa'

POLISH_LANGUAGE_CODE

    public mixed POLISH_LANGUAGE_CODE = 'pl'

PORTUGUESE_LANGUAGE_CODE

    public mixed PORTUGUESE_LANGUAGE_CODE = 'pt'

ROMANIAN_LANGUAGE_CODE

    public mixed ROMANIAN_LANGUAGE_CODE = 'ro'

RUSSIAN_GOST_2000_B_LANGUAGE_CODE

    public mixed RUSSIAN_GOST_2000_B_LANGUAGE_CODE = 'ru__gost_2000_b'

RUSSIAN_LANGUAGE_CODE

    public mixed RUSSIAN_LANGUAGE_CODE = 'ru'

RUSSIAN_PASSPORT_2013_LANGUAGE_CODE

    public mixed RUSSIAN_PASSPORT_2013_LANGUAGE_CODE = 'ru__passport_2013'

SERBIAN_CYRILLIC_LANGUAGE_CODE

    public mixed SERBIAN_CYRILLIC_LANGUAGE_CODE = 'sr__cyr'

SERBIAN_LANGUAGE_CODE

    public mixed SERBIAN_LANGUAGE_CODE = 'sr'

SERBIAN_LATIN_LANGUAGE_CODE

    public mixed SERBIAN_LATIN_LANGUAGE_CODE = 'sr__lat'

SLOVAK_LANGUAGE_CODE

    public mixed SLOVAK_LANGUAGE_CODE = 'sk'

SWEDISH_LANGUAGE_CODE

    public mixed SWEDISH_LANGUAGE_CODE = 'sv'

THAI_LANGUAGE_CODE

    public mixed THAI_LANGUAGE_CODE = 'th'

TURKISH_LANGUAGE_CODE

    public mixed TURKISH_LANGUAGE_CODE = 'tr'

TURKMEN_LANGUAGE_CODE

    public mixed TURKMEN_LANGUAGE_CODE = 'tk'

UKRAINIAN_LANGUAGE_CODE

    public mixed UKRAINIAN_LANGUAGE_CODE = 'uk'

UZBEK_LANGUAGE_CODE

    public mixed UZBEK_LANGUAGE_CODE = 'uz'

VIETNAMESE_LANGUAGE_CODE

    public mixed VIETNAMESE_LANGUAGE_CODE = 'vi'

$ASCII_EXTRAS

    private static array<string, array<string, string>>|null $ASCII_EXTRAS

$ASCII_MAPS

    private static array<string, array<string, string>>|null $ASCII_MAPS

$ASCII_MAPS_AND_EXTRAS

    private static array<string, array<string, string>>|null $ASCII_MAPS_AND_EXTRAS

$BIDI_UNI_CODE_CONTROLS_TABLE

Bidirectional text control characters.

    private static array<int, string> $BIDI_UNI_CODE_CONTROLS_TABLE = [
        // LEFT-TO-RIGHT EMBEDDING (use -> dir = "ltr")
        8234 => "\u{202A}",
        // RIGHT-TO-LEFT EMBEDDING (use -> dir = "rtl")
        8235 => "\u{202B}",
        // POP DIRECTIONAL FORMATTING // (use -> </bdo>)
        8236 => "\u{202C}",
        // LEFT-TO-RIGHT OVERRIDE // (use -> <bdo dir = "ltr">)
        8237 => "\u{202D}",
        // RIGHT-TO-LEFT OVERRIDE // (use -> <bdo dir = "rtl">)
        8238 => "\u{202E}",
        // LEFT-TO-RIGHT ISOLATE // (use -> dir = "ltr")
        8294 => "\u{2066}",
        // RIGHT-TO-LEFT ISOLATE // (use -> dir = "rtl")
        8295 => "\u{2067}",
        // FIRST STRONG ISOLATE // (use -> dir = "auto")
        8296 => "\u{2068}",
        // POP DIRECTIONAL ISOLATE
        8297 => "\u{2069}",
    ]

See: https://www.w3.org/International/questions/qa-bidi-unicode-controls
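A minimal sketch of how the controls in this table can be stripped with PCRE, assuming the `u` modifier is available. `strip_bidi_controls()` is a hypothetical helper, not a method of `ASCII`; the code points U+202A to U+202E and U+2066 to U+2069 correspond to the decimal keys 8234 to 8238 and 8294 to 8297 listed above.

```php
<?php

// Hypothetical helper (not part of voku\helper\ASCII).
// U+202A..U+202E are the embedding/override controls (keys 8234..8238),
// U+2066..U+2069 are the isolate controls (keys 8294..8297).
function strip_bidi_controls(string $str): string
{
    $result = preg_replace('/[\x{202A}-\x{202E}\x{2066}-\x{2069}]/u', '', $str);

    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    return $result ?? $str;
}

echo strip_bidi_controls("abc\u{202E}def\u{2069}"), PHP_EOL; // abcdef
```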

$LANGUAGE_MAX_KEY

    private static array<string, int>|null $LANGUAGE_MAX_KEY

$ORD

    private static array<string, int>|null $ORD

$REGEX_ASCII

See: https://en.wikipedia.org/wiki/Wikipedia:ASCII#ASCII_printable_characters

    private static string $REGEX_ASCII = "[^\t\x10\x13\n\r -~]"

charsArray()

Returns a replacement array for ASCII methods.


    public static charsArray([bool $replace_extra_symbols = false ]) : array<string|int, mixed>

EXAMPLE: $array = ASCII::charsArray(); var_dump($array['ru']['б']); // 'b'

Parameters

$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".

Return values

array<string|int, mixed>

charsArrayWithMultiLanguageValues()

Returns a replacement array for ASCII methods with a mix of multiple languages.


    public static charsArrayWithMultiLanguageValues([bool $replace_extra_symbols = false ]) : array<string|int, mixed>

EXAMPLE: $array = ASCII::charsArrayWithMultiLanguageValues(); var_dump($array['b']); // ['β', 'б', 'ဗ', 'ბ', 'ب']

Parameters

$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".

Return values

array<string|int, mixed> —

An array of replacements.

charsArrayWithOneLanguage()

Returns a replacement array for ASCII methods with one language.


    public static charsArrayWithOneLanguage([string $language = self::ENGLISH_LANGUAGE_CODE ][, bool $replace_extra_symbols = false ][, bool $asOrigReplaceArray = true ]) : array<string|int, mixed>

For example, German will map 'ä' to 'ae', while other languages will simply return e.g. 'a'.

EXAMPLE: $array = ASCII::charsArrayWithOneLanguage('ru'); $tmpKey = \array_search('yo', $array['replace']); echo $array['orig'][$tmpKey]; // 'ё'

Parameters

$language : string = self::ENGLISH_LANGUAGE_CODE: [optional]
Language of the source string, e.g. en, de_at, or de-ch (default: 'en'); see the ASCII::*_LANGUAGE_CODE constants.
$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".
$asOrigReplaceArray : bool = true: [optional]
If true, return an array of the form {orig: string[], replace: string[]}.

Return values

array<string|int, mixed> —

An array of replacements.
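To illustrate the `{orig, replace}` shape returned when `$asOrigReplaceArray` is true, here is a standalone sketch with a tiny, hypothetical Russian map (the real maps are much larger and are loaded from the library's data files). It performs the same reverse lookup as the EXAMPLE above and shows that the two parallel lists can be passed straight to `str_replace()`. `sample_ru_map()` and `to_orig_replace()` are illustrative names only.

```php
<?php

// Hypothetical excerpt of a one-language map (original char => ASCII replacement).
function sample_ru_map(): array
{
    return ['ё' => 'yo', 'б' => 'b', 'ж' => 'zh'];
}

// Split an associative map into two parallel lists, matching the documented
// {orig: string[], replace: string[]} return format.
function to_orig_replace(array $map): array
{
    return [
        'orig'    => array_keys($map),
        'replace' => array_values($map),
    ];
}

$array = to_orig_replace(sample_ru_map());

// Reverse lookup: which original character maps to 'yo'?
$tmpKey = array_search('yo', $array['replace'], true);
echo $array['orig'][$tmpKey], PHP_EOL; // ё

// The parallel lists work directly with str_replace().
echo str_replace($array['orig'], $array['replace'], 'ёж'), PHP_EOL; // yozh
```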

charsArrayWithSingleLanguageValues()

Returns a replacement array for ASCII methods with multiple languages.


    public static charsArrayWithSingleLanguageValues([bool $replace_extra_symbols = false ][, bool $asOrigReplaceArray = true ]) : array<string|int, mixed>

EXAMPLE: $array = ASCII::charsArrayWithSingleLanguageValues(); $tmpKey = \array_search('hnaik', $array['replace']); echo $array['orig'][$tmpKey]; // '၌'

Parameters

$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".
$asOrigReplaceArray : bool = true: [optional]
If true, return an array of the form {orig: string[], replace: string[]}.

Return values

array<string|int, mixed> —

An array of replacements.

clean()

Accepts a string and removes all non-UTF-8 characters from it and, if requested, applies extra cleanup (whitespace, MS Word characters, invisible characters).


    public static clean(string $str[, bool $normalize_whitespace = true ][, bool $keep_non_breaking_space = false ][, bool $normalize_msword = true ][, bool $remove_invisible_characters = true ]) : string

Parameters

$str : string: The string to be sanitized.
$normalize_whitespace : bool = true: [optional]
Set to true to normalize whitespace.
$keep_non_breaking_space : bool = false: [optional]
Set to true to keep non-breaking spaces when $normalize_whitespace is enabled.
$normalize_msword : bool = true: [optional]
Set to true to normalize MS Word characters, e.g. "…" => "...".
$remove_invisible_characters : bool = true: [optional]
Set to false to keep invisible characters such as "\0".

Return values

string —

A clean UTF-8 string.
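The first step of `clean()`, dropping byte sequences that are not valid UTF-8, can be sketched with a byte-level PCRE pattern that keeps well-formed 1- to 4-byte sequences and discards any other byte. This is a widely used approximation, not the library's implementation: it does not reject overlong encodings or surrogates, and `strip_invalid_utf8()` is a hypothetical name.

```php
<?php

// Hypothetical helper: keep runs of well-formed UTF-8 sequences, drop other bytes.
function strip_invalid_utf8(string $str): string
{
    $regex = '/
        (
            (?: [\x00-\x7F]                 # 1-byte sequence: 0xxxxxxx
            |   [\xC0-\xDF][\x80-\xBF]      # 2-byte sequence: 110xxxxx 10xxxxxx
            |   [\xE0-\xEF][\x80-\xBF]{2}   # 3-byte sequence: 1110xxxx + 2 continuation bytes
            |   [\xF0-\xF7][\x80-\xBF]{3}   # 4-byte sequence: 11110xxx + 3 continuation bytes
            ){1,100}                        # a run of valid sequences is kept as group 1
        )
        | .                                 # any other single byte is matched and dropped
    /xs';

    // Group 1 is empty for the "any other byte" branch, so that byte disappears.
    $result = preg_replace($regex, '$1', $str);

    return $result ?? $str;
}

var_dump(strip_invalid_utf8("abc\xFFdef")); // string(6) "abcdef"
```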

getAllLanguages()

Get all languages from the constants "ASCII::.*LANGUAGE_CODE".


    public static getAllLanguages() : array<string|int, string>

Return values

array<string|int, string>
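One way to collect the constants whose names match `.*LANGUAGE_CODE` is PHP's Reflection API (`ReflectionClass::getConstants()`). The sketch runs against a small, hypothetical stand-in class rather than `ASCII` itself and returns constant name => value pairs; the key format of the real return value is not documented here, so treat that shape as an assumption.

```php
<?php

// Hypothetical stand-in class with a few constants in the same naming style.
final class LanguageCodesDemo
{
    public const ENGLISH_LANGUAGE_CODE = 'en';
    public const GERMAN_LANGUAGE_CODE = 'de';
    public const SOME_OTHER_SETTING = 'x';
}

// Collect every class constant whose name ends in "LANGUAGE_CODE".
function collect_language_codes(string $className): array
{
    $codes = [];
    foreach ((new ReflectionClass($className))->getConstants() as $name => $value) {
        if (preg_match('/LANGUAGE_CODE$/', $name) === 1) {
            $codes[$name] = $value;
        }
    }

    return $codes;
}

print_r(collect_language_codes(LanguageCodesDemo::class));
```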

is_ascii()

Checks if a string is 7-bit ASCII.


    public static is_ascii(string $str) : bool

EXAMPLE: ASCII::is_ascii('白'); // false

Parameters

$str : string: The string to check.

Return values

bool —

true if it is ASCII
false otherwise
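A standalone approximation of this check, assuming it is done with the character class shown for `$REGEX_ASCII` (tab, `\x10`, `\x13`, LF, CR and the printable range from space to `~`). `looks_like_ascii()` is a hypothetical name. Note that, as written, the class also rejects most other C0 control bytes such as `\0`, so it is stricter than a plain "every byte below 0x80" test.

```php
<?php

// Hypothetical helper using the character class shown for $REGEX_ASCII:
// tab, \x10, \x13, LF, CR and the printable range from space to "~".
// Any other byte (including every byte of a multibyte UTF-8 character)
// means the string is not treated as ASCII.
function looks_like_ascii(string $str): bool
{
    return preg_match('/[^\t\x10\x13\n\r -~]/', $str) !== 1;
}

var_dump(looks_like_ascii('Hello, world!')); // bool(true)
var_dump(looks_like_ascii('白'));            // bool(false)
```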

normalize_msword()

Returns a string with smart quotes, ellipsis characters, and dashes from Windows-1252 (commonly used in Word documents) replaced by their ASCII equivalents.


    public static normalize_msword(string $str) : string

EXAMPLE: ASCII::normalize_msword('„Abcdef…”'); // '"Abcdef..."'

Parameters

$str : string: The string to be normalized.

Return values

string —

A string with normalized characters for commonly used chars in Word documents.
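A minimal sketch of this kind of normalization using `strtr()`. The mapping table is an assumed subset chosen for illustration, not the library's actual table; it does cover the characters in the EXAMPLE above („ is U+201E, ” is U+201D and … is U+2026). `normalize_smart_punctuation()` is a hypothetical name.

```php
<?php

// Hypothetical helper covering a small, assumed subset of Word-style punctuation.
function normalize_smart_punctuation(string $str): string
{
    return strtr($str, [
        "\u{201C}" => '"',   // LEFT DOUBLE QUOTATION MARK
        "\u{201D}" => '"',   // RIGHT DOUBLE QUOTATION MARK
        "\u{201E}" => '"',   // DOUBLE LOW-9 QUOTATION MARK
        "\u{2018}" => "'",   // LEFT SINGLE QUOTATION MARK
        "\u{2019}" => "'",   // RIGHT SINGLE QUOTATION MARK
        "\u{201A}" => "'",   // SINGLE LOW-9 QUOTATION MARK
        "\u{2026}" => '...', // HORIZONTAL ELLIPSIS
        "\u{2013}" => '-',   // EN DASH
        "\u{2014}" => '-',   // EM DASH (mapping to "-" is an assumption)
    ]);
}

echo normalize_smart_punctuation("\u{201E}Abcdef\u{2026}\u{201D}"), PHP_EOL; // "Abcdef..."
```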

normalize_whitespace()

Normalize the whitespace.


    public static normalize_whitespace(string $str[, bool $keepNonBreakingSpace = false ][, bool $keepBidiUnicodeControls = false ][, bool $normalize_control_characters = false ]) : string

EXAMPLE: ASCII::normalize_whitespace("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true); // "abc-\xc2\xa0-öäü- -"

Parameters

$str : string: The string to be normalized.
$keepNonBreakingSpace : bool = false: [optional]
Set to true to keep non-breaking spaces.
$keepBidiUnicodeControls : bool = false: [optional]
Set to true to keep bidirectional text control characters (which are not printable on the web).
$normalize_control_characters : bool = false: [optional]
Set to true to convert control characters, e.g. LINE SEPARATOR and PARAGRAPH SEPARATOR to "\n" and LINE TABULATION to "\t".

Return values

string —

A string with normalized whitespace.
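A reduced sketch of the two behaviours visible in the EXAMPLE above: Unicode space characters become a plain space (optionally keeping U+00A0 NO-BREAK SPACE) and bidirectional controls are dropped. The character sets and the name `normalize_spaces()` are assumptions; `$normalize_control_characters` is not modeled.

```php
<?php

// Hypothetical helper; the character sets below are an assumed subset.
function normalize_spaces(string $str, bool $keepNonBreakingSpace = false): string
{
    // U+2000..U+200A (en quad .. hair space), U+202F (narrow no-break space),
    // U+205F (medium mathematical space), U+3000 (ideographic space).
    $spaces = '\x{2000}-\x{200A}\x{202F}\x{205F}\x{3000}';
    if (!$keepNonBreakingSpace) {
        $spaces .= '\x{00A0}'; // NO-BREAK SPACE
    }

    $str = preg_replace('/[' . $spaces . ']/u', ' ', $str) ?? $str;

    // Drop the bidirectional control characters (U+202A..U+202E, U+2066..U+2069).
    return preg_replace('/[\x{202A}-\x{202E}\x{2066}-\x{2069}]/u', '', $str) ?? $str;
}

// Same input as the EXAMPLE above, keeping the non-breaking space.
var_dump(normalize_spaces("abc-\xc2\xa0-öäü-\xe2\x80\xaf-\xE2\x80\xAC", true));
```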

remove_invisible_characters()

Remove invisible characters from a string.


    public static remove_invisible_characters(string $str[, bool $url_encoded = false ][, string $replacement = '' ][, bool $keep_basic_control_characters = true ]) : string

This prevents, for example, null characters from being sandwiched between ASCII characters, as in Java\0script.

Copied from CodeIgniter: https://github.com/bcit-ci/CodeIgniter/blob/develop/system/core/Common.php

Parameters

$str : string
$url_encoded : bool = false
$replacement : string = ''
$keep_basic_control_characters : bool = true

Return values

string
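A standalone sketch of the CodeIgniter approach referenced above: strip control bytes 0x00 to 0x08, 0x0B, 0x0C, 0x0E to 0x1F and 0x7F (keeping tab, LF and CR) and, optionally, their URL-encoded forms, repeating until a pass removes nothing. The name `strip_invisible()` is hypothetical, and the `$replacement` and `$keep_basic_control_characters` options of the library method are not modeled.

```php
<?php

// Hypothetical helper modeled on CodeIgniter's remove_invisible_characters().
function strip_invisible(string $str, bool $urlEncoded = false): string
{
    $patterns = [];
    if ($urlEncoded) {
        $patterns[] = '/%0[0-8bcef]/i'; // URL-encoded 0x00-0x08, 0x0B, 0x0C, 0x0E, 0x0F
        $patterns[] = '/%1[0-9a-f]/i';  // URL-encoded 0x10-0x1F
        $patterns[] = '/%7f/i';         // URL-encoded 0x7F
    }
    // Raw control bytes; tab (0x09), LF (0x0A) and CR (0x0D) are kept.
    $patterns[] = '/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]+/';

    // Repeat until a pass removes nothing, in case a removal creates a new match.
    do {
        $str = preg_replace($patterns, '', $str, -1, $count);
    } while ($count > 0);

    return $str;
}

var_dump(strip_invisible("Java\0script"));         // string(10) "Javascript"
var_dump(strip_invisible('Java%00script', true));  // string(10) "Javascript"
```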

to_ascii()

Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed by default. The language or locale of the source string can be supplied for language-specific transliteration in any of the following formats: en, en_GB, or en-GB. For example, passing "de" results in "äöü" mapping to "aeoeue" rather than "aou" as in other languages.


    public static to_ascii(string $str[, string $language = self::ENGLISH_LANGUAGE_CODE ][, bool $remove_unsupported_chars = true ][, bool $replace_extra_symbols = false ][, bool $use_transliterate = false ][, bool|null $replace_single_chars_only = null ]) : string

EXAMPLE: ASCII::to_ascii('�Düsseldorf�', 'en'); // Dusseldorf

Parameters

$str : string: The input string.
$language : string = self::ENGLISH_LANGUAGE_CODE: [optional]
Language of the source string (default: 'en'); see the ASCII::*_LANGUAGE_CODE constants.
$remove_unsupported_chars : bool = true: [optional]
Whether or not to remove the unsupported characters.
$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".
$use_transliterate : bool = false: [optional]
Use ASCII::to_transliterate() for unknown chars.
$replace_single_chars_only : bool|null = null: [optional]
Replacing single characters only is faster, but some languages need to replace more than one character at a time. NULL selects this automatically, depending on the language.

Return values

string —

A string that contains only ASCII characters.
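To illustrate the language-specific behaviour described above, here is a sketch with two tiny, hypothetical maps: German maps "ä" to "ae", while the English fallback maps it to "a", and any remaining non-ASCII bytes are removed (mirroring `$remove_unsupported_chars = true`). `to_ascii_demo()` and `DEMO_ASCII_MAPS` are illustrative names; the real maps are much larger, and `$use_transliterate` and `$replace_single_chars_only` are not modeled.

```php
<?php

// Hypothetical, heavily reduced per-language maps (the real maps are far larger).
const DEMO_ASCII_MAPS = [
    'de' => ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'],
    'en' => ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 'ss'],
];

function to_ascii_demo(string $str, string $language = 'en', bool $removeUnsupported = true): string
{
    // Fall back to the English map for languages this demo does not know.
    $map = DEMO_ASCII_MAPS[$language] ?? DEMO_ASCII_MAPS['en'];
    $str = strtr($str, $map);

    if ($removeUnsupported) {
        // Drop every byte that is still outside 7-bit ASCII.
        $str = preg_replace('/[\x80-\xFF]/', '', $str) ?? $str;
    }

    return $str;
}

echo to_ascii_demo('äöü', 'de'), PHP_EOL;                        // aeoeue
echo to_ascii_demo('äöü'), PHP_EOL;                              // aou
echo to_ascii_demo("\u{FFFD}Düsseldorf\u{FFFD}", 'en'), PHP_EOL; // Dusseldorf
```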

to_ascii_remap()

WARNING: This method will return broken characters and is only for special cases.


    public static to_ascii_remap(string $str1, string $str2) : array<string|int, string>

Converts two UTF-8 encoded strings to single-byte strings, suitable for functions that need the string length to stay the same after the conversion (one byte per character).

The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ASCII characters are remapped to the range [128-255] in order of appearance.

Parameters

$str1 : string
$str2 : string

Return values

array<string|int, string>
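The remapping idea can be sketched as follows, modeled on the KEINOS/mb_levenshtein approach cited for `to_ascii_remap_intern()`: each distinct multibyte character is assigned one byte from 128 to 255 in order of first appearance, using a map shared by both strings, so a byte-based function such as PHP's `levenshtein()` then counts characters instead of bytes. `remap_to_single_bytes()` and `utf8_levenshtein()` are hypothetical names.

```php
<?php

// Hypothetical helper: remap each distinct multibyte UTF-8 character to a single
// byte in the range 128-255, in order of first appearance. $map is shared
// between calls so that both strings use the same encoding.
function remap_to_single_bytes(string $str, array &$map): string
{
    // A lead byte followed by continuation bytes is one multibyte character.
    if (preg_match_all('/[\xC0-\xF7][\x80-\xBF]+/', $str, $matches) === 0) {
        return $str; // plain ASCII, nothing to remap
    }

    foreach ($matches[0] as $char) {
        if (!isset($map[$char])) {
            // Only 128 slots (chr(128) .. chr(255)) are available.
            if (count($map) >= 128) {
                throw new OverflowException('More than 128 distinct multibyte characters.');
            }
            $map[$char] = chr(128 + count($map));
        }
    }

    return strtr($str, $map);
}

// Character-based Levenshtein distance built on PHP's byte-based levenshtein().
function utf8_levenshtein(string $a, string $b): int
{
    $map = [];

    return levenshtein(remap_to_single_bytes($a, $map), remap_to_single_bytes($b, $map));
}

echo levenshtein('Düsseldorf', 'Dusseldorf'), PHP_EOL;      // 2 (bytes)
echo utf8_levenshtein('Düsseldorf', 'Dusseldorf'), PHP_EOL; // 1 (characters)
```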

to_filename()

Convert given string to safe filename (and keep string case).


    public static to_filename(string $str[, bool $use_transliterate = true ][, string $fallback_char = '-' ]) : string

EXAMPLE: ASCII::to_filename('שדגשדג.png', true); // 'shdgshdg.png'

Parameters

$str : string
$use_transliterate : bool = true: If true (the default), ASCII::to_transliterate() is used; otherwise, unsafe characters are simply replaced with a hyphen.
$fallback_char : string = '-'

Return values

string —

A string that contains only safe characters for a filename.
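A sketch of the `$use_transliterate = false` path described above, where unsafe characters are replaced instead of transliterated. The set of "safe" characters (`A-Z`, `a-z`, `0-9`, `.`, `_`, `-`) and the name `safe_filename()` are assumptions. The second call shows why transliteration is the default: without it, the all-Hebrew name from the EXAMPLE collapses to its extension.

```php
<?php

// Hypothetical helper: the "safe" character set below is this sketch's own assumption.
// Assumes a single-character fallback that is not "$" or "\".
function safe_filename(string $str, string $fallbackChar = '-'): string
{
    // Replace each run of characters outside [A-Za-z0-9._-] with the fallback;
    // letter case is left untouched.
    $str = preg_replace('/[^A-Za-z0-9._-]+/', $fallbackChar, $str) ?? $str;

    // Avoid leading/trailing fallback characters.
    return trim($str, $fallbackChar);
}

echo safe_filename('Quarterly Report/Q3.PDF'), PHP_EOL; // Quarterly-Report-Q3.PDF
echo safe_filename('שדגשדג.png'), PHP_EOL;              // .png
```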

to_slugify()

Converts the string into a URL slug. This includes replacing non-ASCII characters with their closest ASCII equivalents, removing remaining non-ASCII and non-alphanumeric characters, and replacing whitespace with $separator. The separator defaults to a single dash, and the string is also converted to lowercase. The language of the source string can also be supplied for language-specific transliteration.


    public static to_slugify(string $str[, string $separator = '-' ][, string $language = self::ENGLISH_LANGUAGE_CODE ][, array<string, string> $replacements = [] ][, bool $replace_extra_symbols = false ][, bool $use_str_to_lower = true ][, bool $use_transliterate = false ]) : string

Parameters

$str : string
$separator : string = '-': [optional]
The string used to replace whitespace.
$language : string = self::ENGLISH_LANGUAGE_CODE: [optional]
Language of the source string (default: 'en'); see the ASCII::*_LANGUAGE_CODE constants.
$replacements : array<string, string> = []: [optional]
A map of replaceable strings.
$replace_extra_symbols : bool = false: [optional]
Add extra replacements, e.g. replace "£" with " pound ".
$use_str_to_lower : bool = true: [optional]
Convert the input to lowercase.
$use_transliterate : bool = false: [optional]
Use ASCII::to_transliterate() for unknown chars.

Return values

string —

A string that has been converted to a URL slug.
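A sketch of the slug steps described above (custom replacements, dropping non-ASCII, lowercasing, collapsing non-alphanumeric runs into `$separator`, trimming). It assumes the input has already been transliterated, for example with `ASCII::to_ascii()`, and `slugify_demo()` is a hypothetical name; `$language`, `$replace_extra_symbols` and `$use_transliterate` are not modeled.

```php
<?php

// Hypothetical helper; it assumes the input has already been transliterated to
// ASCII and simply drops any remaining bytes >= 0x80.
function slugify_demo(string $str, string $separator = '-', array $replacements = []): string
{
    // Custom replacements first, e.g. ['&' => 'and'].
    if ($replacements !== []) {
        $str = strtr($str, $replacements);
    }

    $str = preg_replace('/[\x80-\xFF]/', '', $str) ?? $str;
    $str = strtolower($str);

    // Collapse every run of non-alphanumeric characters into one separator.
    // The separator is used as a replacement string, so avoid "$" and "\" in it.
    $str = preg_replace('/[^a-z0-9]+/', $separator, $str) ?? $str;

    return trim($str, $separator);
}

echo slugify_demo('Hello World!'), PHP_EOL;                      // hello-world
echo slugify_demo('Fish & Chips', '_', ['&' => 'and']), PHP_EOL; // fish_and_chips
```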

to_transliterate()

Returns an ASCII version of the string. A set of non-ASCII characters are replaced with their closest ASCII counterparts, and the rest are removed unless instructed otherwise.


    public static to_transliterate(string $str[, string|null $unknown = '?' ][, bool $strict = false ]) : string

EXAMPLE: ASCII::to_transliterate('déjà σσς iıii'); // 'deja sss iiii'

Parameters

$str : string: The input string.
$unknown : string|null = '?': [optional]
Character used for unknown characters (default: '?'). Pass NULL to keep unknown characters unchanged.
$strict : bool = false: [optional]
Use transliterator_transliterate() from the PHP intl extension.

Return values

string —

A string that contains only ASCII characters.
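`$strict` refers to `transliterator_transliterate()` from the intl extension. The sketch below shows that call with the ICU transliterator ID `Any-Latin; Latin-ASCII` (an assumption; the library may use a different ID) and falls back to replacing each non-ASCII character with `$unknown` when intl is not installed. `transliterate_demo()` is a hypothetical name.

```php
<?php

// Hypothetical helper. If the intl extension is loaded, ICU's
// "Any-Latin; Latin-ASCII" transliterator is applied first; any character that
// is still non-ASCII is then replaced with $unknown (or kept when it is null).
function transliterate_demo(string $str, ?string $unknown = '?'): string
{
    if (function_exists('transliterator_transliterate')) {
        $result = transliterator_transliterate('Any-Latin; Latin-ASCII', $str);
        if ($result !== false) {
            $str = $result;
        }
    }

    if ($unknown === null) {
        return $str;
    }

    // With the "u" modifier the class matches whole code points, not bytes.
    return preg_replace('/[^\x00-\x7F]/u', $unknown, $str) ?? $str;
}

echo transliterate_demo('déjà'), PHP_EOL; // "deja" with intl, "d?j?" without
```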

get_language()

Get the language from a string.


    private static get_language(string $language) : string

Examples:
de_at -> de_at
de_DE -> de
DE_DE -> de
de-de -> de

Parameters

$language : string

Return values

string
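A sketch of the normalization shown in the examples above, assuming the rule is: lowercase, treat `-` like `_`, keep the code if it is a known variant such as `de_at`, otherwise keep only the primary subtag. The helper name `normalize_language_code()` and its list of known codes (a small subset of the `ASCII::*_LANGUAGE_CODE` values) are assumptions.

```php
<?php

// Hypothetical helper; the list of known codes is a small assumed subset of the
// ASCII::*_LANGUAGE_CODE values.
function normalize_language_code(
    string $language,
    array $known = ['de', 'de_at', 'de_ch', 'en', 'fr', 'fr_at', 'fr_ch']
): string {
    // Case-insensitive, and "-" is treated like "_".
    $language = str_replace('-', '_', strtolower($language));

    if (in_array($language, $known, true)) {
        return $language; // e.g. "de_at" is a supported variant
    }

    // Otherwise keep only the primary language subtag: "de_de" -> "de".
    return explode('_', $language, 2)[0];
}

foreach (['de_at', 'de_DE', 'DE_DE', 'de-de'] as $code) {
    echo $code, ' -> ', normalize_language_code($code), PHP_EOL;
}
```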

getData()

Get data from "/data/*.php".


    private static getData(string $file) : array<string|int, mixed>

Parameters

$file : string

Return values

array<string|int, mixed>

getDataIfExists()

Get data from "/data/*.php".


    private static getDataIfExists(string $file) : array<string|int, mixed>

Parameters

$file : string

Return values

array<string|int, mixed>

prepareAsciiAndExtrasMaps()


    private static prepareAsciiAndExtrasMaps() : void

prepareAsciiExtras()


    private static prepareAsciiExtras() : void

prepareAsciiMaps()


    private static prepareAsciiMaps() : void

to_ascii_remap_intern()

WARNING: This method will return broken characters and is only for special cases.


    private static to_ascii_remap_intern(string $str, array<string|int, mixed> &$map) : string

Convert a UTF-8 encoded string to a single-byte string suitable for functions that need the same string length after the conversion.

The function simply uses (and updates) a tailored dynamic encoding (in/out map parameter) where non-ASCII characters are remapped to the range [128-255] in order of appearance.

Thus, it supports at most 128 distinct multibyte code points across the whole set of strings sharing this encoding.

Source: https://github.com/KEINOS/mb_levenshtein

Parameters

$str : string: UTF-8 string to be converted to extended ASCII.
$map : array<string|int, mixed>: Internal map of code points to ASCII characters.

Return values

string —

The mapped (broken) string.
