HumHub Documentation (unofficial)

EmailLexer extends AbstractLexer
in package Application

Base class for writing simple lexers, i.e. for creating small DSLs.

Constants

AMPERSAND = 38
ASCII_INVALID_FROM = 127
ASCII_INVALID_TO = 199
ASTERISK = 42
C_DEL = 127
C_NUL = 0
CARET = 94
CATCHABLE_PATTERNS = ['[a-zA-Z]+[46]?' /* ASCII and domain literal */, '[^\x00-\x7F]' /* UTF-8 */, '[0-9]+', '\r\n', '::', '\s+?', '.']
CRLF = 1310
DOLLAR = 36
EXCLAMATION = 33
GENERIC = 300
INVALID = 302
INVALID_CHARS_REGEX = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"
INVERT_EXCLAMATION = 173
INVERT_QUESTIONMARK = 168
MODIFIERS = 'iu'
NON_CATCHABLE_PATTERNS = ['[\xA0-\xff]+']
NUMBER_SIGN = 35
PERCENTAGE = 37
QUESTIONMARK = 63
S_AT = 64
S_BACKSLASH = 92
S_BACKTICK = 96
S_CLOSEBRACKET = 93
S_CLOSECURLYBRACES = 125
S_CLOSEPARENTHESIS = 41
S_COLON = 58
S_COMMA = 44
S_CR = 13
S_DOT = 46
S_DOUBLECOLON = 5858
S_DQUOTE = 34
S_EMPTY = null
S_EQUAL = 61
S_GREATERTHAN = 62
S_HTAB = 9
S_HYPHEN = 45
S_IPV6TAG = 301
S_LF = 10
S_LOWERTHAN = 60
S_OPENBRACKET = 91
S_OPENCURLYBRACES = 123
S_OPENPARENTHESIS = 40
S_PIPE = 124
S_PLUS = 43
S_SEMICOLON = 59
S_SLASH = 47
S_SP = 32
S_SQUOTE = 39
S_TILDE = 126
S_UNDERSCORE = 95
VALID_UTF8_REGEX = '/\p{Cc}+/u'

Properties

$lookahead : array<string|int, mixed>|Token|null: The next token in the input.
$token : array<string|int, mixed>|Token: The last matched/seen token.
$charValue : array<string|int, mixed>: US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)
$hasInvalidTokens : bool
$previous : array<string|int, mixed>
$accumulator : string
$hasToRecord : bool
$input : string: Lexer original input string.
$nullToken : mixed
$peek : int: Current peek of current lexer position.
$position : int: Current lexer position in input string.
$regex : non-empty-string|null: Composed regex for input parsing.
$tokens : array<int, Token<T, V>>: Array of scanned tokens.

Methods

__construct() : mixed
clearRecorded() : void
find() : bool
getAccumulatedValues() : string
getInputUntilPosition() : string: Retrieve the original lexer's input until a given position.
getLiteral() : int|string: Gets the literal for a given token.
getPrevious() : array<string|int, mixed>: Returns the token that preceded the current one.
glimpse() : Token<T, V>|null: Peeks at the next token, returns it and immediately resets the peek.
hasInvalidTokens() : bool
isA() : bool: Checks if given value is identical to the given token.
isNextToken() : bool: Checks whether a given token matches the current lookahead.
isNextTokenAny() : bool: Checks whether any of the given tokens matches the current lookahead.
moveNext() : bool: Moves to the next token in the input string.
peek() : Token<T, V>|null: Moves the lookahead token forward.
reset() : void: Resets the lexer.
resetPeek() : void: Resets the peek pointer to 0.
resetPosition() : void: Resets the lexer position on the input to the given position.
setInput() : void: Sets the input data to be tokenized.
skipUntil() : void: Tells the lexer to skip input tokens until it sees a token of the given type.
startRecording() : void
stopRecording() : void
getCatchablePatterns() : array<string|int, string>: Lexical catchable patterns.
getModifiers() : string: Regex modifiers
getNonCatchablePatterns() : array<string|int, string>: Lexical non-catchable patterns.
getType() : int: Retrieve token type. Also processes the token value if necessary.
isInvalidChar() : bool
isNullType() : bool
isUTF8Invalid() : bool
isValid() : bool
scan() : void: Scans the input string for tokens.

AMPERSAND

    public mixed AMPERSAND = 38

ASCII_INVALID_FROM

    public mixed ASCII_INVALID_FROM = 127

ASCII_INVALID_TO

    public mixed ASCII_INVALID_TO = 199

ASTERISK

    public mixed ASTERISK = 42

C_DEL

    public mixed C_DEL = 127

C_NUL

    public mixed C_NUL = 0

CARET

    public mixed CARET = 94

CATCHABLE_PATTERNS

    public mixed CATCHABLE_PATTERNS = [
        '[a-zA-Z]+[46]?', // ASCII and domain literal
        '[^\x00-\x7F]',   // UTF-8
        '[0-9]+',
        '\r\n',
        '::',
        '\s+?',
        '.',
    ]
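
The catchable patterns only become useful once they are combined into a single regular expression. The sketch below shows one plausible way to do that with preg_split(); it is not taken from the library. sketchTokenize() is a hypothetical helper, and the way the regex is assembled (one capture group per catchable pattern, non-catchable patterns left uncaptured) is an assumption about how a regex-driven lexer of this kind typically works.

```php
<?php
declare(strict_types=1);

// Patterns copied from CATCHABLE_PATTERNS, NON_CATCHABLE_PATTERNS and MODIFIERS.
const CATCHABLE = ['[a-zA-Z]+[46]?', '[^\x00-\x7F]', '[0-9]+', '\r\n', '::', '\s+?', '.'];
const NON_CATCHABLE = ['[\xA0-\xff]+'];
const MODIFIERS = 'iu';

/**
 * Hypothetical helper: splits $input into raw token values.
 *
 * @return list<string>
 */
function sketchTokenize(string $input): array
{
    // Assumption: each catchable pattern gets its own capture group, while
    // non-catchable patterns are matched without being captured.
    $regex = sprintf(
        '/(%s)|%s/%s',
        implode(')|(', CATCHABLE),
        implode('|', NON_CATCHABLE),
        MODIFIERS
    );

    // DELIM_CAPTURE returns the captured groups; NO_EMPTY drops empty pieces.
    $parts = preg_split($regex, $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

    return $parts === false ? [] : $parts;
}

print_r(sketchTokenize('user@example.com'));  // user, @, example, ., com
print_r(sketchTokenize('[IPv6:2001::1]'));    // [, IPv6, :, 2001, ::, 1, ]
```

The second call shows why the first pattern ends in [46]? and why '::' is listed before '.': "IPv6" and "::" each come out as a single token instead of being split into smaller pieces.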

CRLF

    public mixed CRLF = 1310

DOLLAR

    public mixed DOLLAR = 36

EXCLAMATION

    public mixed EXCLAMATION = 33

GENERIC

    public mixed GENERIC = 300

INVALID

    public mixed INVALID = 302

INVALID_CHARS_REGEX

    public mixed INVALID_CHARS_REGEX = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"

INVERT_EXCLAMATION

    public mixed INVERT_EXCLAMATION = 173

INVERT_QUESTIONMARK

    public mixed INVERT_QUESTIONMARK = 168

MODIFIERS

    public mixed MODIFIERS = 'iu'

NON_CATCHABLE_PATTERNS

    public mixed NON_CATCHABLE_PATTERNS = ['[\xA0-\xff]+']

NUMBER_SIGN

    public mixed NUMBER_SIGN = 35

PERCENTAGE

    public mixed PERCENTAGE = 37

QUESTIONMARK

    public mixed QUESTIONMARK = 63

S_AT

    public mixed S_AT = 64

S_BACKSLASH

    public mixed S_BACKSLASH = 92

S_BACKTICK

    public mixed S_BACKTICK = 96

S_CLOSEBRACKET

    public mixed S_CLOSEBRACKET = 93

S_CLOSECURLYBRACES

    public mixed S_CLOSECURLYBRACES = 125

S_CLOSEPARENTHESIS

    public mixed S_CLOSEPARENTHESIS = 41

S_COLON

    public mixed S_COLON = 58

S_COMMA

    public mixed S_COMMA = 44

S_CR

    public mixed S_CR = 13

S_DOT

    public mixed S_DOT = 46

S_DOUBLECOLON

    public mixed S_DOUBLECOLON = 5858
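
Many of the numeric values listed so far coincide with ASCII codes, and the two-character tokens CRLF and S_DOUBLECOLON look like two codes written side by side. The snippet below only checks that observation against the documented values; whether the values were chosen this way on purpose is an assumption, and not every constant follows the pattern (GENERIC, S_IPV6TAG and INVALID, for example, are plain identifiers above the ASCII range).

```php
<?php
declare(strict_types=1);

// Single-character token types: the value equals the character's ASCII code.
$at  = ord('@');   // compare with S_AT = 64
$dot = ord('.');   // compare with S_DOT = 46

// Two-character token types: the two codes concatenated as decimal digits.
$crlf        = (int) (ord("\r") . ord("\n")); // compare with CRLF = 1310
$doubleColon = (int) (ord(':') . ord(':'));   // compare with S_DOUBLECOLON = 5858

var_dump($at === 64, $dot === 46, $crlf === 1310, $doubleColon === 5858);
```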

S_DQUOTE

    public mixed S_DQUOTE = 34

S_EMPTY

    public mixed S_EMPTY = null

S_EQUAL

    public mixed S_EQUAL = 61

S_GREATERTHAN

    public mixed S_GREATERTHAN = 62

S_HTAB

    public mixed S_HTAB = 9

S_HYPHEN

    public mixed S_HYPHEN = 45

S_IPV6TAG

    public mixed S_IPV6TAG = 301

S_LF

    public mixed S_LF = 10

S_LOWERTHAN

    public mixed S_LOWERTHAN = 60

S_OPENBRACKET

    public mixed S_OPENBRACKET = 91

S_OPENCURLYBRACES

    public mixed S_OPENCURLYBRACES = 123

S_OPENPARENTHESIS

    public mixed S_OPENPARENTHESIS = 40

S_PIPE

    public mixed S_PIPE = 124

S_PLUS

    public mixed S_PLUS = 43

S_SEMICOLON

    public mixed S_SEMICOLON = 59

S_SLASH

    public mixed S_SLASH = 47

S_SP

    public mixed S_SP = 32

S_SQUOTE

    public mixed S_SQUOTE = 39

S_TILDE

    public mixed S_TILDE = 126

S_UNDERSCORE

    public mixed S_UNDERSCORE = 95

VALID_UTF8_REGEX

    public mixed VALID_UTF8_REGEX = '/\p{Cc}+/u'
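
The names of the two regex constants do not make it obvious what each one matches. The sketch below runs both patterns against three sample strings and records the result. It shows only what the patterns themselves match; how isInvalidChar() and isUTF8Invalid() interpret those matches is not documented here, so no claim is made about that. The sample strings and array keys are illustrative.

```php
<?php
declare(strict_types=1);

// Regexes copied from the constants documented above.
const INVALID_CHARS_REGEX = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu";
const VALID_UTF8_REGEX = '/\p{Cc}+/u';

$samples = [
    'letters' => 'abc',
    'bell'    => "\x07",     // control character (Unicode category Cc)
    'euro'    => "\u{20AC}", // currency symbol (category Sc, part of S)
];

$results = [];
foreach ($samples as $label => $value) {
    $results[$label] = [
        // True when the string contains a character that is neither a
        // symbol nor in the "other" (control, format, ...) categories.
        'matchesInvalidCharsRegex' => preg_match(INVALID_CHARS_REGEX, $value) === 1,
        // True when the string contains at least one control character.
        'matchesValidUtf8Regex' => preg_match(VALID_UTF8_REGEX, $value) === 1,
    ];
}

var_dump($results);
```

Plain letters match only the first pattern, a control character matches only the second, and a symbol such as the euro sign matches neither.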

$lookahead

The next token in the input.

    public array<string|int, mixed>|Token|null $lookahead

$token

The last matched/seen token.

    public array<string|int, mixed>|Token $token

$charValue

US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)

    protected array<string|int, mixed> $charValue = [
        '{' => self::S_OPENCURLYBRACES,
        '}' => self::S_CLOSECURLYBRACES,
        '(' => self::S_OPENPARENTHESIS,
        ')' => self::S_CLOSEPARENTHESIS,
        '<' => self::S_LOWERTHAN,
        '>' => self::S_GREATERTHAN,
        '[' => self::S_OPENBRACKET,
        ']' => self::S_CLOSEBRACKET,
        ':' => self::S_COLON,
        ';' => self::S_SEMICOLON,
        '@' => self::S_AT,
        '\\' => self::S_BACKSLASH,
        '/' => self::S_SLASH,
        ',' => self::S_COMMA,
        '.' => self::S_DOT,
        "'" => self::S_SQUOTE,
        "`" => self::S_BACKTICK,
        '"' => self::S_DQUOTE,
        '-' => self::S_HYPHEN,
        '::' => self::S_DOUBLECOLON,
        ' ' => self::S_SP,
        "\t" => self::S_HTAB,
        "\r" => self::S_CR,
        "\n" => self::S_LF,
        "\r\n" => self::CRLF,
        'IPv6' => self::S_IPV6TAG,
        '' => self::S_EMPTY,
        '\0' => self::C_NUL,
        '*' => self::ASTERISK,
        '!' => self::EXCLAMATION,
        '&' => self::AMPERSAND,
        '^' => self::CARET,
        '$' => self::DOLLAR,
        '%' => self::PERCENTAGE,
        '~' => self::S_TILDE,
        '|' => self::S_PIPE,
        '_' => self::S_UNDERSCORE,
        '=' => self::S_EQUAL,
        '+' => self::S_PLUS,
        '¿' => self::INVERT_QUESTIONMARK,
        '?' => self::QUESTIONMARK,
        '#' => self::NUMBER_SIGN,
        '¡' => self::INVERT_EXCLAMATION,
    ]

$hasInvalidTokens

    protected bool $hasInvalidTokens = false

$previous

    protected array<string|int, mixed> $previous = []

$accumulator

    private string $accumulator = ''

$hasToRecord

    private bool $hasToRecord = false

$input

Lexer original input string.

    private string $input

$nullToken

    private static mixed $nullToken = ['value' => '', 'type' => null, 'position' => 0]

$peek

Current peek of current lexer position.

    private int $peek = 0

$position

Current lexer position in input string.

    private int $position = 0

$regex

Composed regex for input parsing.

    private non-empty-string|null $regex

$tokens

Array of scanned tokens.

    private array<int, Token<T, V>> $tokens = []

__construct()


    public __construct() : mixed

clearRecorded()


    public clearRecorded() : void

find()


    public find(int $type) : bool

Parameters

$type : int

Return values

bool

getAccumulatedValues()


    public getAccumulatedValues() : string

Return values

string

getInputUntilPosition()

Retrieve the original lexer's input until a given position.


    public getInputUntilPosition(int $position) : string

Parameters

$position : int

Return values

string

getLiteral()

Gets the literal for a given token.


    public getLiteral(T $token) : int|string

Parameters

$token : T

Return values

int|string

getPrevious()

Returns the token that preceded the current one.


    public getPrevious() : array<string|int, mixed>

Return values

array<string|int, mixed>

glimpse()

Peeks at the next token, returns it and immediately resets the peek.


    public glimpse() : Token<T, V>|null

Return values

Token<T, V>|null: The next token or NULL if there are no more tokens ahead.

hasInvalidTokens()


    public hasInvalidTokens() : bool

Return values

bool

isA()

Checks if given value is identical to the given token.


    public isA(string $value, int|string $token) : bool

Parameters

$value : string
$token : int|string

Return values

bool

isNextToken()

Checks whether a given token matches the current lookahead.


    public isNextToken(T $type) : bool

Parameters

$type : T

Return values

bool

isNextTokenAny()

Checks whether any of the given tokens matches the current lookahead.


    public isNextTokenAny(array<int, T> $types) : bool

Parameters

$types : array<int, T>

Return values

bool

moveNext()

Moves to the next token in the input string.


    public moveNext() : bool

Return values

bool

peek()

Moves the lookahead token forward.


    public peek() : Token<T, V>|null

Return values

Token<T, V>|null: The next token or NULL if there are no more tokens ahead.

reset()

Resets the lexer.


    public reset() : void

resetPeek()

Resets the peek pointer to 0.


    public resetPeek() : void
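
The difference between moveNext(), peek(), glimpse() and resetPeek() is easiest to see on a toy cursor. The class below is a hypothetical, simplified model written for this page (ToyCursor is not part of the library, and it stores plain strings instead of Token objects). It assumes PHP 8.0 or later and follows the behaviour described in the summaries above: peek() advances a separate peek pointer, glimpse() peeks once and resets, and moveNext() consumes a token and resets the peek pointer.

```php
<?php
declare(strict_types=1);

/** Hypothetical toy cursor over pre-scanned tokens; not the library class. */
final class ToyCursor
{
    public ?string $token = null;      // last token consumed
    public ?string $lookahead = null;  // next token to be consumed
    private int $position = 0;         // index of the token after $lookahead
    private int $peek = 0;             // offset used by peek()

    /** @param list<string> $tokens */
    public function __construct(private array $tokens)
    {
    }

    public function moveNext(): bool
    {
        $this->peek = 0;
        $this->token = $this->lookahead;
        $this->lookahead = $this->tokens[$this->position] ?? null;
        if ($this->lookahead !== null) {
            $this->position++;
        }

        return $this->lookahead !== null;
    }

    public function peek(): ?string
    {
        $next = $this->tokens[$this->position + $this->peek] ?? null;
        if ($next !== null) {
            $this->peek++;
        }

        return $next;
    }

    public function glimpse(): ?string
    {
        $next = $this->peek();
        $this->peek = 0;

        return $next;
    }

    public function resetPeek(): void
    {
        $this->peek = 0;
    }
}

$cursor = new ToyCursor(['user', '@', 'example']);
$cursor->moveNext();              // lookahead is now 'user'
$first = $cursor->peek();         // '@'
$second = $cursor->peek();        // 'example' (peek pointer advanced)
$cursor->resetPeek();
$again = $cursor->glimpse();      // '@'
$stillAgain = $cursor->glimpse(); // '@' again, glimpse never advances
```

Note that the documented $lookahead and $token properties of EmailLexer may also hold arrays, so real code should not assume plain strings as this sketch does.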

resetPosition()

Resets the lexer position on the input to the given position.


    public resetPosition([int $position = 0 ]) : void

Parameters

$position : int = 0: Position to place the lexical scanner.

setInput()

Sets the input data to be tokenized.


    public setInput(string $input) : void

The Lexer is immediately reset and the new input tokenized. Any unprocessed tokens from any previous input are lost.

Parameters

$input : string: The input to be tokenized.

skipUntil()

Tells the lexer to skip input tokens until it sees a token of the given type.


    public skipUntil(T $type) : void

Parameters

$type : T: The token type to skip until.

startRecording()


    public startRecording() : void

stopRecording()


    public stopRecording() : void
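
startRecording(), stopRecording(), getAccumulatedValues() and clearRecorded() carry no descriptions, but together with the $accumulator and $hasToRecord properties they suggest a simple recording mechanism. The sketch below is a guess at that mechanism, not the library's code: ToyRecorder and its consume() method are hypothetical, and the assumption is that token values are appended to the accumulator only while recording is switched on.

```php
<?php
declare(strict_types=1);

/** Hypothetical toy recorder; it mirrors the documented method names only. */
final class ToyRecorder
{
    private string $accumulator = '';
    private bool $hasToRecord = false;

    public function startRecording(): void
    {
        $this->hasToRecord = true;
    }

    public function stopRecording(): void
    {
        $this->hasToRecord = false;
    }

    public function clearRecorded(): void
    {
        $this->accumulator = '';
    }

    public function getAccumulatedValues(): string
    {
        return $this->accumulator;
    }

    /** Stand-in for consuming one token: appends its value while recording. */
    public function consume(string $value): void
    {
        if ($this->hasToRecord) {
            $this->accumulator .= $value;
        }
    }
}

$recorder = new ToyRecorder();
$recorder->consume('user');                      // ignored, not recording yet
$recorder->startRecording();
foreach (['@', 'example', '.', 'com'] as $value) {
    $recorder->consume($value);
}
$recorder->stopRecording();
$recorder->consume('>');                         // ignored again
$recorded = $recorder->getAccumulatedValues();   // '@example.com'
$recorder->clearRecorded();
$afterClear = $recorder->getAccumulatedValues(); // ''
```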

getCatchablePatterns()

Lexical catchable patterns.


    protected getCatchablePatterns() : array<string|int, string>

Return values

array<string|int, string>

getModifiers()

Regex modifiers


    protected getModifiers() : string

Return values

string

getNonCatchablePatterns()

Lexical non-catchable patterns.


    protected getNonCatchablePatterns() : array<string|int, string>

Return values

array<string|int, string>

getType()

Retrieve token type. Also processes the token value if necessary.


    protected getType(string &$value) : int

Parameters

$value : string

Return values

int
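
getType() turns a raw token value into one of the integer constants. The documentation does not spell out the rules, so the following is only a simplified sketch under stated assumptions: sketchGetType() and CHAR_VALUE_EXCERPT are hypothetical names, the map is a three-entry excerpt of $charValue, and every unknown value falls back to GENERIC. The real method presumably also distinguishes null and invalid input (see C_NUL, INVALID, isNullType() and isInvalidChar()), which this sketch leaves out.

```php
<?php
declare(strict_types=1);

// Values copied from the constants documented above.
const S_AT = 64;
const S_DOT = 46;
const S_IPV6TAG = 301;
const GENERIC = 300;

// A small excerpt of the $charValue map; the documented map has many more entries.
const CHAR_VALUE_EXCERPT = ['@' => S_AT, '.' => S_DOT, 'IPv6' => S_IPV6TAG];

/** Hypothetical simplification: known values map to their constant, the rest to GENERIC. */
function sketchGetType(string $value): int
{
    return CHAR_VALUE_EXCERPT[$value] ?? GENERIC;
}

var_dump(sketchGetType('@'));       // int(64)
var_dump(sketchGetType('IPv6'));    // int(301)
var_dump(sketchGetType('example')); // int(300)
```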

isInvalidChar()


    protected isInvalidChar(string $value) : bool

Parameters

$value : string

Return values

bool

isNullType()


    protected isNullType(string $value) : bool

Parameters

$value : string

Return values

bool

isUTF8Invalid()


    protected isUTF8Invalid(string $value) : bool

Parameters

$value : string

Return values

bool

isValid()


    protected isValid(string $value) : bool

Parameters

$value : string

Return values

bool

scan()

Scans the input string for tokens.


    protected scan(string $input) : void

Parameters

$input : string: A query string.
