EmailLexer
extends AbstractLexer
Base class for writing simple lexers, i.e. for creating small DSLs.
Table of Contents
Constants
- AMPERSAND = 38
- ASCII_INVALID_FROM = 127
- ASCII_INVALID_TO = 199
- ASTERISK = 42
- C_DEL = 127
- C_NUL = 0
- CARET = 94
- CATCHABLE_PATTERNS = ['[a-zA-Z]+[46]?' /* ASCII and domain literal */, '[^\x00-\x7F]' /* UTF-8 */, '[0-9]+', '\r\n', '::', '\s+?', '.']
- CRLF = 1310
- DOLLAR = 36
- EXCLAMATION = 33
- GENERIC = 300
- INVALID = 302
- INVALID_CHARS_REGEX = "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"
- INVERT_EXCLAMATION = 173
- INVERT_QUESTIONMARK = 168
- MODIFIERS = 'iu'
- NON_CATCHABLE_PATTERNS = ['[\xA0-\xff]+']
- NUMBER_SIGN = 35
- PERCENTAGE = 37
- QUESTIONMARK = 63
- S_AT = 64
- S_BACKSLASH = 92
- S_BACKTICK = 96
- S_CLOSEBRACKET = 93
- S_CLOSECURLYBRACES = 125
- S_CLOSEPARENTHESIS = 41
- S_COLON = 58
- S_COMMA = 44
- S_CR = 13
- S_DOT = 46
- S_DOUBLECOLON = 5858
- S_DQUOTE = 34
- S_EMPTY = null
- S_EQUAL = 61
- S_GREATERTHAN = 62
- S_HTAB = 9
- S_HYPHEN = 45
- S_IPV6TAG = 301
- S_LF = 10
- S_LOWERTHAN = 60
- S_OPENBRACKET = 91
- S_OPENCURLYBRACES = 123
- S_OPENPARENTHESIS = 40
- S_PIPE = 124
- S_PLUS = 43
- S_SEMICOLON = 59
- S_SLASH = 47
- S_SP = 32
- S_SQUOTE = 39
- S_TILDE = 126
- S_UNDERSCORE = 95
- VALID_UTF8_REGEX = '/\p{Cc}+/u'
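Most of the single-character constants above appear to be the character's ASCII code, while the two-character tokens (CRLF, S_DOUBLECOLON) look like the decimal codes of their characters written side by side. A minimal sketch that checks this for a few values copied from the list; the `$expected` table and variable names are illustrative and not part of the library:

```php
<?php
// Values copied from the constants list above (a small, hand-picked subset).
$expected = [
    '@'  => 64,  // S_AT
    '.'  => 46,  // S_DOT
    "\t" => 9,   // S_HTAB
    '~'  => 126, // S_TILDE
];

$allMatch = true;
foreach ($expected as $char => $code) {
    // ord() returns the byte value of the first character of the string.
    if (ord($char) !== $code) {
        $allMatch = false;
    }
}

// Two-character tokens: the decimal codes concatenated as text, then read as an int.
$crlf = (int) (ord("\r") . ord("\n"));      // 13 and 10 give 1310
$doubleColon = (int) (ord(':') . ord(':')); // 58 and 58 give 5858
```

The values above 255 (GENERIC, S_IPV6TAG, INVALID) do not follow this scheme and are plain type identifiers.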
Properties
- $lookahead : array<string|int, mixed>|Token|null
- The next token in the input.
- $token : array<string|int, mixed>|Token
- The last matched/seen token.
- $charValue : array<string|int, mixed>
- US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)
- $hasInvalidTokens : bool
- $previous : array<string|int, mixed>
- $accumulator : string
- $hasToRecord : bool
- $input : string
- Lexer original input string.
- $nullToken : mixed
- $peek : int
- Current peek of current lexer position.
- $position : int
- Current lexer position in input string.
- $regex : non-empty-string|null
- Composed regex for input parsing.
- $tokens : array<int, Token<T, V>>
- Array of scanned tokens.
Methods
- __construct() : mixed
- clearRecorded() : void
- find() : bool
- getAccumulatedValues() : string
- getInputUntilPosition() : string
- Retrieve the original lexer's input until a given position.
- getLiteral() : int|string
- Gets the literal for a given token.
- getPrevious() : array<string|int, mixed>
- Returns the previous token (the $previous property).
- glimpse() : Token<T, V>|null
- Peeks at the next token, returns it and immediately resets the peek.
- hasInvalidTokens() : bool
- isA() : bool
- Checks if given value is identical to the given token.
- isNextToken() : bool
- Checks whether a given token matches the current lookahead.
- isNextTokenAny() : bool
- Checks whether any of the given tokens matches the current lookahead.
- moveNext() : bool
- Moves to the next token in the input string.
- peek() : Token<T, V>|null
- Moves the lookahead token forward.
- reset() : void
- Resets the lexer.
- resetPeek() : void
- Resets the peek pointer to 0.
- resetPosition() : void
- Resets the lexer position on the input to the given position.
- setInput() : void
- Sets the input data to be tokenized.
- skipUntil() : void
- Tells the lexer to skip input tokens until it sees a token of the given type.
- startRecording() : void
- stopRecording() : void
- getCatchablePatterns() : array<string|int, string>
- Lexical catchable patterns.
- getModifiers() : string
- Regex modifiers
- getNonCatchablePatterns() : array<string|int, string>
- Lexical non-catchable patterns.
- getType() : int
- Retrieve token type. Also processes the token value if necessary.
- isInvalidChar() : bool
- isNullType() : bool
- isUTF8Invalid() : bool
- isValid() : bool
- scan() : void
- Scans the input string for tokens.
Constants
AMPERSAND
public
mixed
AMPERSAND
= 38
ASCII_INVALID_FROM
public
mixed
ASCII_INVALID_FROM
= 127
ASCII_INVALID_TO
public
mixed
ASCII_INVALID_TO
= 199
ASTERISK
public
mixed
ASTERISK
= 42
C_DEL
public
mixed
C_DEL
= 127
C_NUL
public
mixed
C_NUL
= 0
CARET
public
mixed
CARET
= 94
CATCHABLE_PATTERNS
public
mixed
CATCHABLE_PATTERNS
= [
'[a-zA-Z]+[46]?',
//ASCII and domain literal
'[^\x00-\x7F]',
//UTF-8
'[0-9]+',
'\r\n',
'::',
'\s+?',
'.',
]
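A rough sketch of how a lexer could combine these patterns into one splitting regex. It assumes the composition scheme used by Doctrine's AbstractLexer (catchable patterns become capture groups, non-catchable ones plain alternatives, and the result is passed to `preg_split`). The function name `tokenizeSketch` is hypothetical, and the real class also attaches a type and a position to every token, which is omitted here.

```php
<?php
// Patterns copied from CATCHABLE_PATTERNS, NON_CATCHABLE_PATTERNS and MODIFIERS.
const CATCHABLE = ['[a-zA-Z]+[46]?', '[^\x00-\x7F]', '[0-9]+', '\r\n', '::', '\s+?', '.'];
const NON_CATCHABLE = ['[\xA0-\xff]+'];
const MODIFIERS = 'iu';

/**
 * Hypothetical helper: splits $input into raw token strings.
 *
 * @return list<string>
 */
function tokenizeSketch(string $input): array
{
    // Each catchable pattern gets its own capture group so that
    // PREG_SPLIT_DELIM_CAPTURE returns the matched text as a token.
    $regex = sprintf(
        '/(%s)|%s/%s',
        implode(')|(', CATCHABLE),
        implode('|', NON_CATCHABLE),
        MODIFIERS
    );

    $parts = preg_split($regex, $input, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);

    return $parts === false ? [] : $parts;
}

$tokens = tokenizeSketch('a@b.com');
// $tokens is ['a', '@', 'b', '.', 'com']
```

Because of the optional `[46]` suffix on the first pattern, `tokenizeSketch('IPv6')` yields the single token `'IPv6'` rather than `'IPv'` followed by `'6'`.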
CRLF
public
mixed
CRLF
= 1310
DOLLAR
public
mixed
DOLLAR
= 36
EXCLAMATION
public
mixed
EXCLAMATION
= 33
GENERIC
public
mixed
GENERIC
= 300
INVALID
public
mixed
INVALID
= 302
INVALID_CHARS_REGEX
public
mixed
INVALID_CHARS_REGEX
= "/[^\\p{S}\\p{C}\\p{Cc}]+/iu"
INVERT_EXCLAMATION
public
mixed
INVERT_EXCLAMATION
= 173
INVERT_QUESTIONMARK
public
mixed
INVERT_QUESTIONMARK
= 168
MODIFIERS
public
mixed
MODIFIERS
= 'iu'
NON_CATCHABLE_PATTERNS
public
mixed
NON_CATCHABLE_PATTERNS
= ['[\xA0-\xff]+']
NUMBER_SIGN
public
mixed
NUMBER_SIGN
= 35
PERCENTAGE
public
mixed
PERCENTAGE
= 37
QUESTIONMARK
public
mixed
QUESTIONMARK
= 63
S_AT
public
mixed
S_AT
= 64
S_BACKSLASH
public
mixed
S_BACKSLASH
= 92
S_BACKTICK
public
mixed
S_BACKTICK
= 96
S_CLOSEBRACKET
public
mixed
S_CLOSEBRACKET
= 93
S_CLOSECURLYBRACES
public
mixed
S_CLOSECURLYBRACES
= 125
S_CLOSEPARENTHESIS
public
mixed
S_CLOSEPARENTHESIS
= 41
S_COLON
public
mixed
S_COLON
= 58
S_COMMA
public
mixed
S_COMMA
= 44
S_CR
public
mixed
S_CR
= 13
S_DOT
public
mixed
S_DOT
= 46
S_DOUBLECOLON
public
mixed
S_DOUBLECOLON
= 5858
S_DQUOTE
public
mixed
S_DQUOTE
= 34
S_EMPTY
public
mixed
S_EMPTY
= null
S_EQUAL
public
mixed
S_EQUAL
= 61
S_GREATERTHAN
public
mixed
S_GREATERTHAN
= 62
S_HTAB
public
mixed
S_HTAB
= 9
S_HYPHEN
public
mixed
S_HYPHEN
= 45
S_IPV6TAG
public
mixed
S_IPV6TAG
= 301
S_LF
public
mixed
S_LF
= 10
S_LOWERTHAN
public
mixed
S_LOWERTHAN
= 60
S_OPENBRACKET
public
mixed
S_OPENBRACKET
= 91
S_OPENCURLYBRACES
public
mixed
S_OPENCURLYBRACES
= 123
S_OPENPARENTHESIS
public
mixed
S_OPENPARENTHESIS
= 40
S_PIPE
public
mixed
S_PIPE
= 124
S_PLUS
public
mixed
S_PLUS
= 43
S_SEMICOLON
public
mixed
S_SEMICOLON
= 59
S_SLASH
public
mixed
S_SLASH
= 47
S_SP
public
mixed
S_SP
= 32
S_SQUOTE
public
mixed
S_SQUOTE
= 39
S_TILDE
public
mixed
S_TILDE
= 126
S_UNDERSCORE
public
mixed
S_UNDERSCORE
= 95
VALID_UTF8_REGEX
public
mixed
VALID_UTF8_REGEX
= '/\p{Cc}+/u'
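To see what the two regex constants match on their own, the sketch below runs them against a few sample strings. It only demonstrates the patterns themselves; how isInvalidChar() and isUTF8Invalid() interpret the match results is not documented on this page and is not assumed here. The variable names are illustrative.

```php
<?php
// Patterns copied from INVALID_CHARS_REGEX and VALID_UTF8_REGEX.
$invalidCharsRegex = '/[^\p{S}\p{C}\p{Cc}]+/iu';
$validUtf8Regex = '/\p{Cc}+/u';

// \p{Cc} is the Unicode "control" category, for example BEL (0x07).
$bellIsControl = preg_match($validUtf8Regex, "\x07");  // 1
$letterIsControl = preg_match($validUtf8Regex, 'a');   // 0

// The negated class matches characters that are neither symbols (\p{S})
// nor "other" characters (\p{C}, which already includes controls).
$letterMatches = preg_match($invalidCharsRegex, 'a');  // 1: a letter
$dollarMatches = preg_match($invalidCharsRegex, '$');  // 0: currency symbol (Sc)
$bellMatches = preg_match($invalidCharsRegex, "\x07"); // 0: control character
```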
Properties
$lookahead
The next token in the input.
public
array<string|int, mixed>|Token|null
$lookahead
$token
The last matched/seen token.
public
array<string|int, mixed>|Token
$token
$charValue
US-ASCII visible characters not valid for atext (see http://tools.ietf.org/html/rfc5322#section-3.2.3)
protected
array<string|int, mixed>
$charValue
= [
'{' => self::S_OPENCURLYBRACES,
'}' => self::S_CLOSECURLYBRACES,
'(' => self::S_OPENPARENTHESIS,
')' => self::S_CLOSEPARENTHESIS,
'<' => self::S_LOWERTHAN,
'>' => self::S_GREATERTHAN,
'[' => self::S_OPENBRACKET,
']' => self::S_CLOSEBRACKET,
':' => self::S_COLON,
';' => self::S_SEMICOLON,
'@' => self::S_AT,
'\\' => self::S_BACKSLASH,
'/' => self::S_SLASH,
',' => self::S_COMMA,
'.' => self::S_DOT,
"'" => self::S_SQUOTE,
"`" => self::S_BACKTICK,
'"' => self::S_DQUOTE,
'-' => self::S_HYPHEN,
'::' => self::S_DOUBLECOLON,
' ' => self::S_SP,
"\t" => self::S_HTAB,
"\r" => self::S_CR,
"\n" => self::S_LF,
"\r\n" => self::CRLF,
'IPv6' => self::S_IPV6TAG,
'' => self::S_EMPTY,
'\0' => self::C_NUL,
'*' => self::ASTERISK,
'!' => self::EXCLAMATION,
'&' => self::AMPERSAND,
'^' => self::CARET,
'$' => self::DOLLAR,
'%' => self::PERCENTAGE,
'~' => self::S_TILDE,
'|' => self::S_PIPE,
'_' => self::S_UNDERSCORE,
'=' => self::S_EQUAL,
'+' => self::S_PLUS,
'¿' => self::INVERT_QUESTIONMARK,
'?' => self::QUESTIONMARK,
'#' => self::NUMBER_SIGN,
'¡' => self::INVERT_EXCLAMATION,
]
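The table above maps a token's text to an integer type. A minimal sketch of such a lookup, assuming that text not found in the table falls back to GENERIC. The function `typeOfSketch` and the trimmed-down `CHAR_VALUE` table are hypothetical; the documented getType() may also process the value and distinguish NUL and invalid characters, which this sketch does not attempt.

```php
<?php
// Values copied from the constants section.
const S_AT = 64;
const S_DOT = 46;
const S_IPV6TAG = 301;
const GENERIC = 300;

// A three-entry excerpt of $charValue, for illustration only.
const CHAR_VALUE = ['@' => S_AT, '.' => S_DOT, 'IPv6' => S_IPV6TAG];

/** Hypothetical helper: looks up the type, defaulting to GENERIC (an assumption). */
function typeOfSketch(string $value): int
{
    return CHAR_VALUE[$value] ?? GENERIC;
}

$atType = typeOfSketch('@');         // 64
$tagType = typeOfSketch('IPv6');     // 301
$wordType = typeOfSketch('example'); // 300, not in the table
```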
$hasInvalidTokens
protected
bool
$hasInvalidTokens
= false
$previous
protected
array<string|int, mixed>
$previous
= []
$accumulator
private
string
$accumulator
= ''
$hasToRecord
private
bool
$hasToRecord
= false
$input
Lexer original input string.
private
string
$input
$nullToken
private
static mixed
$nullToken
= ['value' => '', 'type' => null, 'position' => 0]
$peek
Current peek of current lexer position.
private
int
$peek
= 0
$position
Current lexer position in input string.
private
int
$position
= 0
$regex
Composed regex for input parsing.
private
non-empty-string|null
$regex
$tokens
Array of scanned tokens.
private
array<int, Token<T, V>>
$tokens
= []
Methods
__construct()
public
__construct() : mixed
clearRecorded()
public
clearRecorded() : void
find()
public
find(int $type) : bool
Parameters
- $type : int
Return values
bool
getAccumulatedValues()
public
getAccumulatedValues() : string
Return values
string
getInputUntilPosition()
Retrieve the original lexer's input until a given position.
public
getInputUntilPosition(int $position) : string
Parameters
- $position : int
Return values
string
getLiteral()
Gets the literal for a given token.
public
getLiteral(T $token) : int|string
Parameters
- $token : T
Return values
int|string
getPrevious()
Returns the previous token (the $previous property).
public
getPrevious() : array<string|int, mixed>
Return values
array<string|int, mixed>
glimpse()
Peeks at the next token, returns it and immediately resets the peek.
public
glimpse() : Token<T, V>|null
Return values
Token<T, V>|null: The next token, or NULL if there are no more tokens ahead.
hasInvalidTokens()
public
hasInvalidTokens() : bool
Return values
bool
isA()
Checks if given value is identical to the given token.
public
isA(string $value, int|string $token) : bool
Parameters
- $value : string
- $token : int|string
Return values
bool
isNextToken()
Checks whether a given token matches the current lookahead.
public
isNextToken(T $type) : bool
Parameters
- $type : T
Return values
bool
isNextTokenAny()
Checks whether any of the given tokens matches the current lookahead.
public
isNextTokenAny(array<int, T> $types) : bool
Parameters
- $types : array<int, T>
Return values
bool
moveNext()
Moves to the next token in the input string.
public
moveNext() : bool
Return values
bool
peek()
Moves the lookahead token forward.
public
peek() : Token<T, V>|null
Return values
Token<T, V>|null: The next token, or NULL if there are no more tokens ahead.
reset()
Resets the lexer.
public
reset() : void
resetPeek()
Resets the peek pointer to 0.
public
resetPeek() : void
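The difference between peek(), glimpse() and resetPeek() is easiest to see on a toy token list. The class below is a hypothetical stand-in written for illustration: it follows the behaviour described above (a peek offset relative to the current position that glimpse() and resetPeek() set back to 0), but it is not the library's code and it uses plain strings instead of Token objects.

```php
<?php
final class PeekSketch
{
    /** @var list<string> */
    private array $tokens;
    private int $position = 0; // index of the next unconsumed token
    private int $peek = 0;     // offset of the token the next peek() returns

    /** @param list<string> $tokens */
    public function __construct(array $tokens)
    {
        $this->tokens = $tokens;
    }

    public function peek(): ?string
    {
        $index = $this->position + $this->peek;
        if (!isset($this->tokens[$index])) {
            return null; // nothing further ahead
        }
        $this->peek++; // the next call looks one token further
        return $this->tokens[$index];
    }

    public function glimpse(): ?string
    {
        $next = $this->peek();
        $this->peek = 0; // immediately undo the peek
        return $next;
    }

    public function resetPeek(): void
    {
        $this->peek = 0;
    }
}

$lexer = new PeekSketch(['a', '@', 'b']);
$first = $lexer->peek();  // 'a'
$second = $lexer->peek(); // '@'
$lexer->resetPeek();
$again = $lexer->peek();  // 'a' again, because the offset was reset
$lexer->resetPeek();
$glimpsed = [$lexer->glimpse(), $lexer->glimpse()]; // ['a', 'a']
```

Repeated peek() calls walk further ahead, whereas repeated glimpse() calls keep returning the same token.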
resetPosition()
Resets the lexer position on the input to the given position.
public
resetPosition([int $position = 0 ]) : void
Parameters
- $position : int = 0
  Position to place the lexical scanner.
setInput()
Sets the input data to be tokenized.
public
setInput(string $input) : void
The Lexer is immediately reset and the new input tokenized. Any unprocessed tokens from any previous input are lost.
Parameters
- $input : string
  The input to be tokenized.
skipUntil()
Tells the lexer to skip input tokens until it sees a token of the given type.
public
skipUntil(T $type) : void
Parameters
- $type : T
  The token type to skip until.
startRecording()
public
startRecording() : void
stopRecording()
public
stopRecording() : void
getCatchablePatterns()
Lexical catchable patterns.
protected
getCatchablePatterns() : array<string|int, string>
Return values
array<string|int, string>
getModifiers()
Regex modifiers
protected
getModifiers() : string
Return values
string
getNonCatchablePatterns()
Lexical non-catchable patterns.
protected
getNonCatchablePatterns() : array<string|int, string>
Return values
array<string|int, string>
getType()
Retrieve token type. Also processes the token value if necessary.
protected
getType(string &$value) : int
Parameters
- $value : string
Return values
int
isInvalidChar()
protected
isInvalidChar(string $value) : bool
Parameters
- $value : string
Return values
bool
isNullType()
protected
isNullType(string $value) : bool
Parameters
- $value : string
Return values
bool
isUTF8Invalid()
protected
isUTF8Invalid(string $value) : bool
Parameters
- $value : string
Return values
bool
isValid()
protected
isValid(string $value) : bool
Parameters
- $value : string
Return values
bool
scan()
Scans the input string for tokens.
protected
scan(string $input) : void
Parameters
- $input : string
  A query string.