HTMLPurifier_Lexer_DirectLex
extends HTMLPurifier_Lexer
Our in-house implementation of a parser.
A pure PHP parser, DirectLex has no dependencies at all, which makes it a reasonably good default for PHP 4. It was written with efficiency in mind and can be up to four times faster than HTMLPurifier_Lexer_PEARSax3, although it is still considerably slower than HTMLPurifier_Lexer_DOMLex.
Table of Contents
Properties
- $tracksLineNumbers : mixed
- Whether or not this lexer implements line-number/column-number tracking.
- $_special_entity2str : mixed
- Most common entity to raw value conversion table for special entities.
- $_whitespace : mixed
- Whitespace characters for str(c)spn.
- $_entity_parser : mixed
Methods
- __construct() : mixed
- create() : HTMLPurifier_Lexer
- Retrieves or sets the default Lexer as a Prototype Factory.
- extractBody() : mixed
- Takes a string of HTML (fragment or document) and returns the content
- normalize() : string
- Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
- parseAttr() : mixed
- parseAttributeString() : array<string|int, mixed>
- Takes the inside of an HTML tag and makes an assoc array of attributes.
- parseData() : string
- Parses special entities into the proper characters.
- parseText() : mixed
- tokenizeHTML() : array<string|int, mixed>|array<string|int, HTMLPurifier_Token>
- Lexes an HTML string into tokens.
- CDATACallback() : string
- Callback function for escapeCDATA() that does the work.
- escapeCDATA() : string
- Translates CDATA sections into regular sections (through escaping).
- escapeCommentedCDATA() : string
- Special CDATA case that is especially convoluted for <script>
- removeIEConditional() : string
- Special Internet Explorer conditional comments should be removed.
- scriptCallback() : string
- Callback function for script CDATA fudge
- substrCount() : int
- PHP 5.0.x compatible substr_count that implements offset and length
Properties
$tracksLineNumbers
Whether or not this lexer implements line-number/column-number tracking.
public
mixed
$tracksLineNumbers
= \true
$_special_entity2str
Most common entity to raw value conversion table for special entities.
protected
mixed
$_special_entity2str
= array('&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&#39;' => "'", '&#039;' => "'", '&#x27;' => "'")
$_whitespace
Whitespace characters for str(c)spn.
protected
mixed
$_whitespace
= " \t\r\n"
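As a rough illustration of how a whitespace set like this pairs with strspn() and strcspn(), the sketch below splits a tag's interior into its name and the remainder. The variable names and the sample input are hypothetical and are not taken from DirectLex itself.

```php
<?php
// Same character set as the $_whitespace default shown above.
$whitespace = " \t\r\n";

// Hypothetical input: the inside of a tag, after the opening "<".
$segment = "a   href=\"x\"";

// strcspn(): length of the leading run that contains NO whitespace (the tag name).
$nameLength = strcspn($segment, $whitespace);
$tagName = substr($segment, 0, $nameLength);

// strspn(): length of the whitespace run that follows the name.
$gapLength = strspn($segment, $whitespace, $nameLength);

// Whatever remains is the attribute string.
$rest = substr($segment, $nameLength + $gapLength);
```

Both functions scan without building intermediate arrays, which is presumably why a hand-written lexer would prefer them to regular expressions for this step.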
$_entity_parser
private
mixed
$_entity_parser
Methods
__construct()
public
__construct() : mixed
create()
Retrieves or sets the default Lexer as a Prototype Factory.
public
static create(HTMLPurifier_Config $config) : HTMLPurifier_Lexer
By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.
Parameters
- $config : HTMLPurifier_Config
Return values
HTMLPurifier_Lexer
extractBody()
Takes a string of HTML (fragment or document) and returns the content
public
extractBody(mixed $html) : mixed
Parameters
- $html : mixed
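The summary above does not say which content is returned. Assuming it means the contents of the body element when the input is a full document, a minimal sketch could look like this. The function name sketch_extract_body() is hypothetical and the real method may handle more edge cases (for example a body tag inside a comment).

```php
<?php
/**
 * Hypothetical helper: return the inside of <body>...</body> if present,
 * otherwise return the input unchanged (it is treated as a fragment).
 */
function sketch_extract_body(string $html): string
{
    // "i": case-insensitive, "s": dot also matches newlines.
    // The greedy (.*) keeps everything up to the last </body>.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $html, $matches)) {
        return $matches[1];
    }
    return $html;
}
```

With this sketch, a fragment such as `<p>Hi</p>` passes through untouched, while a complete document is reduced to what sits between its body tags.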
normalize()
Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
public
normalize(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context) : string
Parameters
- $html : string
- HTML.
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
Return values
string
parseAttr()
public
parseAttr(mixed $string, mixed $config) : mixed
Parameters
- $string : mixed
- $config : mixed
parseAttributeString()
Takes the inside of an HTML tag and makes an assoc array of attributes.
public
parseAttributeString(string $string, HTMLPurifier_Config $config, HTMLPurifier_Context $context) : array<string|int, mixed>
Parameters
- $string : string
- Inside of tag excluding name.
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
Return values
array<string|int, mixed> —Assoc array of attributes.
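To make the input and output shapes concrete, here is a simplified, regex-based sketch of turning the inside of a tag into an associative array. It is not the DirectLex algorithm: sketch_parse_attributes() is a hypothetical name, it assumes reasonably well-formed input, and it assumes a valueless attribute should map to its own name.

```php
<?php
/**
 * Hypothetical helper: parse 'name="value" other=value flag' into an array.
 */
function sketch_parse_attributes(string $inside): array
{
    $attributes = [];
    // Group 1: name; group 2: double-quoted value;
    // group 3: single-quoted value; group 4: unquoted value.
    $pattern = '/([^\s=]+)(?:\s*=\s*(?:"([^"]*)"|\'([^\']*)\'|([^\s"\']+)))?/';
    preg_match_all($pattern, $inside, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name = $m[1];
        if (($m[4] ?? '') !== '') {
            $value = $m[4];          // unquoted value
        } elseif (($m[3] ?? '') !== '') {
            $value = $m[3];          // single-quoted value
        } elseif (isset($m[2])) {
            $value = $m[2];          // double-quoted (or empty quoted) value
        } else {
            $value = $name;          // assumption: bare attribute maps to its name
        }
        $attributes[$name] = $value;
    }
    return $attributes;
}

$parsed = sketch_parse_attributes('href="a b" title=\'y\' width=5 disabled');
```

Here $parsed holds four entries keyed by attribute name. A real lexer also has to cope with malformed quoting, which this sketch deliberately ignores.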
parseData()
Parses special entities into the proper characters.
public
parseData(string $string, mixed $is_attr, mixed $config) : string
This method translates escaped versions of the special characters into the correct ones.
Parameters
- $string : string
- String character data to be parsed.
- $is_attr : mixed
- $config : mixed
Return values
string —Parsed character data.
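The core of such a translation can be sketched with strtr() and a lookup table shaped like $_special_entity2str. The function name and the exact table entries below are assumptions for illustration. The real method also takes $is_attr and $config, which this sketch leaves out.

```php
<?php
/**
 * Hypothetical helper: replace the most common special entities
 * with their raw characters in a single pass.
 */
function sketch_parse_special_entities(string $data): string
{
    // Fast path: nothing to do without an ampersand.
    if ($data === '' || strpos($data, '&') === false) {
        return $data;
    }
    // Assumed table, mirroring the shape of $_special_entity2str.
    $table = [
        '&quot;' => '"',
        '&amp;'  => '&',
        '&lt;'   => '<',
        '&gt;'   => '>',
        '&#39;'  => "'",
        '&#039;' => "'",
        '&#x27;' => "'",
    ];
    // strtr() with an array never rescans its own output,
    // so "&amp;lt;" becomes "&lt;" rather than "<".
    return strtr($data, $table);
}
```

The no-rescan behaviour of strtr() matters here: chained str_replace() calls could decode doubly escaped input one level too far.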
parseText()
public
parseText(mixed $string, mixed $config) : mixed
Parameters
- $string : mixed
- $config : mixed
tokenizeHTML()
Lexes an HTML string into tokens.
public
tokenizeHTML(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context) : array<string|int, mixed>|array<string|int, HTMLPurifier_Token>
Parameters
- $html : string
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
Return values
array<string|int, mixed>|array<string|int, HTMLPurifier_Token>
CDATACallback()
Callback function for escapeCDATA() that does the work.
protected
static CDATACallback(array<string|int, mixed> $matches) : string
Parameters
- $matches : array<string|int, mixed>
- PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.
Return values
string —Escaped internals of the CDATA section.
escapeCDATA()
Translates CDATA sections into regular sections (through escaping).
protected
static escapeCDATA(string $string) : string
Parameters
- $string : string
- HTML string to process.
Return values
string —HTML with CDATA sections escaped.
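The pairing of escapeCDATA() with its callback can be sketched as a preg_replace_callback() call whose callback escapes the captured interior. The function names and the exact pattern below are assumptions made for illustration, not the library's code.

```php
<?php
/**
 * Hypothetical callback: $matches[0] is the whole CDATA section,
 * $matches[1] is its interior, which gets HTML-escaped.
 */
function sketch_cdata_callback(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

/**
 * Hypothetical helper: replace every CDATA section with its escaped interior.
 */
function sketch_escape_cdata(string $html): string
{
    // "s" lets the lazy (.+?) span newlines inside the section.
    $result = preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s',
        'sketch_cdata_callback',
        $html
    );
    // preg_replace_callback() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

After this step the markup inside a CDATA section is plain text as far as a tokenizer is concerned, since every `<` and `&` has been turned into an entity.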
escapeCommentedCDATA()
Special CDATA case that is especially convoluted for <script>
protected
static escapeCommentedCDATA(string $string) : string
Parameters
- $string : string
- HTML string to process.
Return values
string —HTML with CDATA sections escaped.
removeIEConditional()
Special Internet Explorer conditional comments should be removed.
protected
static removeIEConditional(string $string) : string
Parameters
- $string : string
- HTML string to process.
Return values
string —HTML with conditional comments removed.
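A minimal sketch of stripping conditional comments with a single regular expression follows. The function name and the pattern are assumptions. It only targets the "downlevel-hidden" form `<!--[if ...]> ... <![endif]-->`.

```php
<?php
/**
 * Hypothetical helper: drop Internet Explorer conditional comments,
 * including everything between the opening and closing markers.
 */
function sketch_remove_ie_conditional(string $html): string
{
    // "s": dot matches newlines; "i": case-insensitive; ".*?" stops at the first endif.
    $result = preg_replace('#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si', '', $html);
    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}
```

Ordinary comments such as `<!-- note -->` do not match the pattern and are left for later stages to handle.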
scriptCallback()
Callback function for script CDATA fudge
protected
scriptCallback(array<string|int, mixed> $matches) : string
Parameters
- $matches : array<string|int, mixed>
- PCRE matches array, in the form array(opening tag, contents, closing tag).
Return values
string
substrCount()
PHP 5.0.x compatible substr_count that implements offset and length
protected
substrCount(string $haystack, string $needle, int $offset, int $length) : int
Parameters
- $haystack : string
- $needle : string
- $offset : int
- $length : int
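On PHP versions where substr_count() does not accept offset and length, the same result can be had by cutting the window out first. The sketch below shows that fallback. sketch_substr_count() is a hypothetical name, and using it to count newlines for line tracking is an assumption about how such a helper might be applied.

```php
<?php
/**
 * Hypothetical helper: count occurrences of $needle inside the window
 * of $haystack that starts at $offset and spans $length bytes.
 */
function sketch_substr_count(string $haystack, string $needle, int $offset, int $length): int
{
    // Extract the window, then count within it.
    $window = substr($haystack, $offset, $length);
    return substr_count($window, $needle);
}

// Example: count the newlines in a three-byte window of a four-line string.
$newlines = sketch_substr_count("a\nb\nc\nd", "\n", 2, 3);
```

The trade-off is one extra string copy per call, which the four-argument form of substr_count() avoids on versions that support it.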