HumHub Documentation (unofficial)

EmailLexer extends AbstractLexer

Base class for writing simple lexers, i.e. for creating small DSLs.

Tags
extends

AbstractLexer<int, string>

Table of Contents

Constants

AMPERSAND  = 38
ASCII_INVALID_FROM  = 127
ASCII_INVALID_TO  = 199
ASTERISK  = 42
C_DEL  = 127
C_NUL  = 0
CARET  = 94
CATCHABLE_PATTERNS  = [
    '[a-zA-Z]+[46]?', //ASCII and domain literal
    '[^\x00-\x7F]', //UTF-8
    '[0-9]+',
    '\r\n',
    '::',
    '\s+?',
    '.',
]
CRLF  = 1310
DOLLAR  = 36
EXCLAMATION  = 33
GENERIC  = 300
INVALID  = 302
INVALID_CHARS_REGEX  = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"
INVERT_EXCLAMATION  = 173
INVERT_QUESTIONMARK  = 168
MODIFIERS  = 'iu'
NON_CATCHABLE_PATTERNS  = ['[\xA0-\xff]+']
NUMBER_SIGN  = 35
PERCENTAGE  = 37
QUESTIONMARK  = 63
S_AT  = 64
S_BACKSLASH  = 92
S_BACKTICK  = 96
S_CLOSEBRACKET  = 93
S_CLOSECURLYBRACES  = 125
S_CLOSEPARENTHESIS  = 41
S_COLON  = 58
S_COMMA  = 44
S_CR  = 13
S_DOT  = 46
S_DOUBLECOLON  = 5858
S_DQUOTE  = 34
S_EMPTY  = null
S_EQUAL  = 61
S_GREATERTHAN  = 62
S_HTAB  = 9
S_HYPHEN  = 45
S_IPV6TAG  = 301
S_LF  = 10
S_LOWERTHAN  = 60
S_OPENBRACKET  = 91
S_OPENCURLYBRACES  = 123
S_OPENPARENTHESIS  = 40
S_PIPE  = 124
S_PLUS  = 43
S_SEMICOLON  = 59
S_SLASH  = 47
S_SP  = 32
S_SQUOTE  = 39
S_TILDE  = 126
S_UNDERSCORE  = 95
VALID_UTF8_REGEX  = '/\p{Cc}+/u'

Properties

$lookahead  : array<string|int, mixed>|Token|null
The next token in the input.
$token  : array<string|int, mixed>|Token
The last matched/seen token.
$charValue  : array<string|int, mixed>
US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)
$hasInvalidTokens  : bool
$previous  : array<string|int, mixed>
$accumulator  : string
$hasToRecord  : bool
$input  : string
Lexer original input string.
$nullToken  : mixed
$peek  : int
Current peek of current lexer position.
$position  : int
Current lexer position in input string.
$regex  : non-empty-string|null
Composed regex for input parsing.
$tokens  : array<int, Token<T, V>>
Array of scanned tokens.

Methods

__construct()  : mixed
clearRecorded()  : void
find()  : bool
getAccumulatedValues()  : string
getInputUntilPosition()  : string
Retrieve the original lexer's input until a given position.
getLiteral()  : int|string
Gets the literal for a given token.
getPrevious()  : array<string|int, mixed>
Returns the previous token.
glimpse()  : Token<T, V>|null
Peeks at the next token, returns it and immediately resets the peek.
hasInvalidTokens()  : bool
isA()  : bool
Checks if given value is identical to the given token.
isNextToken()  : bool
Checks whether a given token matches the current lookahead.
isNextTokenAny()  : bool
Checks whether any of the given tokens matches the current lookahead.
moveNext()  : bool
Moves to the next token in the input string.
peek()  : Token<T, V>|null
Moves the lookahead token forward.
reset()  : void
Resets the lexer.
resetPeek()  : void
Resets the peek pointer to 0.
resetPosition()  : void
Resets the lexer position on the input to the given position.
setInput()  : void
Sets the input data to be tokenized.
skipUntil()  : void
Tells the lexer to skip input tokens until it sees a token with the given value.
startRecording()  : void
stopRecording()  : void
getCatchablePatterns()  : array<string|int, string>
Lexical catchable patterns.
getModifiers()  : string
Regex modifiers
getNonCatchablePatterns()  : array<string|int, string>
Lexical non-catchable patterns.
getType()  : int
Retrieve token type. Also processes the token value if necessary.
isInvalidChar()  : bool
isNullType()  : bool
isUTF8Invalid()  : bool
isValid()  : bool
scan()  : void
Scans the input string for tokens.

Constants

ASCII_INVALID_FROM

public mixed ASCII_INVALID_FROM = 127

ASCII_INVALID_TO

public mixed ASCII_INVALID_TO = 199

CATCHABLE_PATTERNS

public mixed CATCHABLE_PATTERNS = [
    '[a-zA-Z]+[46]?', //ASCII and domain literal
    '[^\x00-\x7F]', //UTF-8
    '[0-9]+',
    '\r\n',
    '::',
    '\s+?',
    '.',
]

INVALID_CHARS_REGEX

public mixed INVALID_CHARS_REGEX = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"

INVERT_EXCLAMATION

public mixed INVERT_EXCLAMATION = 173

INVERT_QUESTIONMARK

public mixed INVERT_QUESTIONMARK = 168

NON_CATCHABLE_PATTERNS

public mixed NON_CATCHABLE_PATTERNS = ['[\xA0-\xff]+']
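
How the pattern constants fit together can be sketched without the library. The snippet below assumes that the parent lexer wraps each catchable pattern in its own capture group, appends the non-catchable patterns as an uncaptured alternative, adds the modifiers, and splits the input with preg_split(). The function name sketchTokenize is hypothetical, and token types and offsets are left out.

```php
<?php
// Pattern lists copied from the constants documented above.
const CATCHABLE = [
    '[a-zA-Z]+[46]?', // ASCII and domain literal
    '[^\x00-\x7F]',   // UTF-8
    '[0-9]+',
    '\r\n',
    '::',
    '\s+?',
    '.',
];
const NON_CATCHABLE = ['[\xA0-\xff]+'];
const MODIFIERS = 'iu';

/**
 * Hypothetical helper: splits $input into raw token strings.
 * Assumption: the real lexer builds a comparable regex internally.
 *
 * @return array<int, string>
 */
function sketchTokenize(string $input): array
{
    // Each catchable pattern becomes its own capture group; the
    // non-catchable patterns match but are not captured.
    $regex = sprintf(
        '/(%s)|%s/%s',
        implode(')|(', CATCHABLE),
        implode('|', NON_CATCHABLE),
        MODIFIERS
    );

    // DELIM_CAPTURE returns the captured delimiters, which are the tokens;
    // NO_EMPTY drops the empty pieces between them.
    $flags = PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE;
    $parts = preg_split($regex, $input, -1, $flags);

    return $parts === false ? [] : $parts;
}

print_r(sketchTokenize('user@example.com'));
```

Because the patterns are tried in order, a run of letters is captured as one token before the single-character fallback '.' gets a chance to match.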

S_CLOSECURLYBRACES

public mixed S_CLOSECURLYBRACES = 125

S_CLOSEPARENTHESIS

public mixed S_CLOSEPARENTHESIS = 41

S_OPENCURLYBRACES

public mixed S_OPENCURLYBRACES = 123

S_OPENPARENTHESIS

public mixed S_OPENPARENTHESIS = 40

VALID_UTF8_REGEX

public mixed VALID_UTF8_REGEX = '/\p{Cc}+/u'

Properties

$lookahead

The next token in the input.

public array<string|int, mixed>|Token|null $lookahead
Tags
psalm-suppress

NonInvariantDocblockPropertyType

psalm-var

array{position: int, type: int|null|string, value: int|string}|Token<int, string>|null

$token

The last matched/seen token.

public array<string|int, mixed>|Token $token
Tags
psalm-suppress

NonInvariantDocblockPropertyType

psalm-var

array{value:string, type:null|int, position:int}|Token<int, string>

$charValue

US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)

protected array<string|int, mixed> $charValue = [
    '{' => self::S_OPENCURLYBRACES,
    '}' => self::S_CLOSECURLYBRACES,
    '(' => self::S_OPENPARENTHESIS,
    ')' => self::S_CLOSEPARENTHESIS,
    '<' => self::S_LOWERTHAN,
    '>' => self::S_GREATERTHAN,
    '[' => self::S_OPENBRACKET,
    ']' => self::S_CLOSEBRACKET,
    ':' => self::S_COLON,
    ';' => self::S_SEMICOLON,
    '@' => self::S_AT,
    '\\' => self::S_BACKSLASH,
    '/' => self::S_SLASH,
    ',' => self::S_COMMA,
    '.' => self::S_DOT,
    "'" => self::S_SQUOTE,
    "`" => self::S_BACKTICK,
    '"' => self::S_DQUOTE,
    '-' => self::S_HYPHEN,
    '::' => self::S_DOUBLECOLON,
    ' ' => self::S_SP,
    "\t" => self::S_HTAB,
    "\r" => self::S_CR,
    "\n" => self::S_LF,
    "\r\n" => self::CRLF,
    'IPv6' => self::S_IPV6TAG,
    '' => self::S_EMPTY,
    '\0' => self::C_NUL,
    '*' => self::ASTERISK,
    '!' => self::EXCLAMATION,
    '&' => self::AMPERSAND,
    '^' => self::CARET,
    '$' => self::DOLLAR,
    '%' => self::PERCENTAGE,
    '~' => self::S_TILDE,
    '|' => self::S_PIPE,
    '_' => self::S_UNDERSCORE,
    '=' => self::S_EQUAL,
    '+' => self::S_PLUS,
    '¿' => self::INVERT_QUESTIONMARK,
    '?' => self::QUESTIONMARK,
    '#' => self::NUMBER_SIGN,
    '¡' => self::INVERT_EXCLAMATION,
]
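
The values in this table follow a recognisable pattern: most single-character constants coincide with the character's ASCII code (S_AT = 64), while the two-character entries coincide with the concatenated codes (CRLF = 1310, S_DOUBLECOLON = 5858). The sketch below illustrates a lookup against a reduced copy of the table. The helper sketchType is hypothetical, and the fallback to GENERIC is an assumption; the real getType() also deals with NUL and invalid characters, which is not shown.

```php
<?php
// Subset of the documented constants, redeclared so the sketch runs alone.
const S_AT = 64;
const S_DOT = 46;
const S_HYPHEN = 45;
const CRLF = 1310;
const S_DOUBLECOLON = 5858;
const GENERIC = 300;

// Reduced copy of the $charValue table shown above.
const CHAR_VALUE = [
    '@'    => S_AT,
    '.'    => S_DOT,
    '-'    => S_HYPHEN,
    '::'   => S_DOUBLECOLON,
    "\r\n" => CRLF,
];

/**
 * Hypothetical helper: known characters map to their constant,
 * anything else is treated as GENERIC (an assumption of this sketch).
 */
function sketchType(string $value): int
{
    return CHAR_VALUE[$value] ?? GENERIC;
}

// Single characters line up with ord(); CRLF lines up with "13" . "10".
var_dump(sketchType('@') === ord('@'));
var_dump(sketchType("\r\n") === (int) (ord("\r") . ord("\n")));
```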

$hasInvalidTokens

protected bool $hasInvalidTokens = false

$previous

protected array<string|int, mixed> $previous = []
Tags
psalm-var

array{value:string, type:null|int, position:int}|array<empty, empty>

$nullToken

private static mixed $nullToken = ['value' => '', 'type' => null, 'position' => 0]
Tags
psalm-var

array{value:'', type:null, position:0}

$peek

Current peek of current lexer position.

private int $peek = 0

$position

Current lexer position in input string.

private int $position = 0

$regex

Composed regex for input parsing.

private non-empty-string|null $regex

Methods

find()

public find(int $type) : bool
Parameters
$type : int
Tags
throws
UnexpectedValueException
psalm-suppress

InvalidScalarArgument

Return values
bool

getAccumulatedValues()

public getAccumulatedValues() : string
Return values
string

getInputUntilPosition()

Retrieve the original lexer's input until a given position.

public getInputUntilPosition(int $position) : string
Parameters
$position : int
Return values
string

getLiteral()

Gets the literal for a given token.

public getLiteral(T $token) : int|string
Parameters
$token : T
Return values
int|string

getPrevious()

Returns the previous token.

public getPrevious() : array<string|int, mixed>
Return values
array<string|int, mixed>

glimpse()

Peeks at the next token, returns it and immediately resets the peek.

public glimpse() : Token<T, V>|null
Return values
Token<T, V>|null

The next token or NULL if there are no more tokens ahead.

hasInvalidTokens()

public hasInvalidTokens() : bool
Return values
bool

isA()

Checks if given value is identical to the given token.

public isA(string $value, int|string $token) : bool
Parameters
$value : string
$token : int|string
Return values
bool

isNextToken()

Checks whether a given token matches the current lookahead.

public isNextToken(T $type) : bool
Parameters
$type : T
Tags
psalm-assert-if-true

!=null $this->lookahead

Return values
bool

isNextTokenAny()

Checks whether any of the given tokens matches the current lookahead.

public isNextTokenAny(array<int, T> $types) : bool
Parameters
$types : array<int, T>
Tags
psalm-assert-if-true

!=null $this->lookahead

Return values
bool

moveNext()

Moves to the next token in the input string.

public moveNext() : bool
Return values
bool

peek()

Moves the lookahead token forward.

public peek() : Token<T, V>|null
Return values
Token<T, V>|null

The next token or NULL if there are no more tokens ahead.
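
The difference between moveNext(), peek() and glimpse() is easiest to see on a small model. The class below is a hypothetical stand-in, not the library's implementation: it keeps tokens as plain strings and assumes the cursor behaviour suggested by the summaries above, where peek() advances a separate peek pointer and glimpse() peeks once and then resets that pointer.

```php
<?php
/**
 * Hypothetical stand-in for the lexer's cursor logic; tokens are plain
 * strings here, whereas the real lexer stores Token objects.
 */
final class MiniLexer
{
    /** @var string|null Last matched/seen token. */
    public $token = null;
    /** @var string|null Next token in the input. */
    public $lookahead = null;
    /** @var array<int, string> */
    private $tokens;
    /** @var int */
    private $position = 0;
    /** @var int */
    private $peek = 0;

    /** @param array<int, string> $tokens */
    public function __construct(array $tokens)
    {
        $this->tokens = array_values($tokens);
    }

    // Consumes one token: the old lookahead becomes the current token.
    public function moveNext(): bool
    {
        $this->peek = 0;
        $this->token = $this->lookahead;
        $this->lookahead = $this->tokens[$this->position] ?? null;
        if ($this->lookahead !== null) {
            $this->position++;
        }
        return $this->lookahead !== null;
    }

    // Looks further ahead on every call without consuming anything.
    public function peek(): ?string
    {
        $next = $this->tokens[$this->position + $this->peek] ?? null;
        if ($next !== null) {
            $this->peek++;
        }
        return $next;
    }

    // Looks one token ahead and immediately resets the peek pointer.
    public function glimpse(): ?string
    {
        $next = $this->peek();
        $this->peek = 0;
        return $next;
    }
}

$lexer = new MiniLexer(['user', '@', 'example']);
$lexer->moveNext();           // lookahead is now 'user'
$first  = $lexer->glimpse();  // '@', peek pointer is reset afterwards
$second = $lexer->peek();     // '@' again, peek pointer advances
$third  = $lexer->peek();     // 'example'
```

In this model glimpse() can be called repeatedly and keeps returning the same token, whereas consecutive peek() calls walk further ahead until moveNext() or a reset brings the peek pointer back to 0.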

reset()

Resets the lexer.

public reset() : void

resetPeek()

Resets the peek pointer to 0.

public resetPeek() : void

resetPosition()

Resets the lexer position on the input to the given position.

public resetPosition([int $position = 0 ]) : void
Parameters
$position : int = 0

Position to place the lexical scanner.

setInput()

Sets the input data to be tokenized.

public setInput(string $input) : void

The Lexer is immediately reset and the new input tokenized. Any unprocessed tokens from any previous input are lost.

Parameters
$input : string

The input to be tokenized.

skipUntil()

Tells the lexer to skip input tokens until it sees a token with the given value.

public skipUntil(T $type) : void
Parameters
$type : T

The token type to skip until.
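
As a rough illustration of the skipping behaviour, the sketch below works on a plain list of tokens rather than on lexer state. The helper sketchSkipUntil and the array shape with 'value' and 'type' keys are assumptions made for this example; following the parameter description, it compares token types.

```php
<?php
/**
 * Hypothetical helper mirroring skipUntil() on a plain token list.
 *
 * @param array<int, array{value: string, type: int}> $tokens
 * @return int Index of the first token of $type at or after $start,
 *             or count($tokens) if no such token exists.
 */
function sketchSkipUntil(array $tokens, int $start, int $type): int
{
    $i = $start;
    // Advance while tokens remain and the current one has another type.
    while ($i < count($tokens) && $tokens[$i]['type'] !== $type) {
        $i++;
    }
    return $i;
}

$tokens = [
    ['value' => 'user',    'type' => 300], // GENERIC
    ['value' => '@',       'type' => 64],  // S_AT
    ['value' => 'example', 'type' => 300], // GENERIC
];
$atIndex = sketchSkipUntil($tokens, 0, 64);
```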

startRecording()

public startRecording() : void

getCatchablePatterns()

Lexical catchable patterns.

protected getCatchablePatterns() : array<string|int, string>
Return values
array<string|int, string>

getModifiers()

Regex modifiers

protected getModifiers() : string
Return values
string

getNonCatchablePatterns()

Lexical non-catchable patterns.

protected getNonCatchablePatterns() : array<string|int, string>
Return values
array<string|int, string>

getType()

Retrieve token type. Also processes the token value if necessary.

protected getType(string &$value) : int
Parameters
$value : string
Tags
throws
InvalidArgumentException
Return values
int

isInvalidChar()

protected isInvalidChar(string $value) : bool
Parameters
$value : string
Return values
bool

isNullType()

protected isNullType(string $value) : bool
Parameters
$value : string
Return values
bool

isUTF8Invalid()

protected isUTF8Invalid(string $value) : bool
Parameters
$value : string
Return values
bool

isValid()

protected isValid(string $value) : bool
Parameters
$value : string
Return values
bool

scan()

Scans the input string for tokens.

protected scan(string $input) : void
Parameters
$input : string

The input string to scan.


        