HTMLPurifier_Lexer_PH5P
extends HTMLPurifier_Lexer_DOMLex
Experimental HTML5-based parser using Jeroen van der Meer's PH5P library.
Occupies space in the HTML5 pseudo-namespace, which may cause conflicts.
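As a rough usage sketch, the snippet below shows one way this lexer might be selected and run. It assumes HTML Purifier is installed and autoloadable (for example through Composer) and that the `Core.LexerImpl` directive accepts the string 'PH5P'. The helper name `tokenizeWithPh5p` is hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: tokenizes $html with the PH5P lexer.
// Assumes the HTML Purifier classes can be autoloaded.
function tokenizeWithPh5p(string $html): array
{
    $config = HTMLPurifier_Config::createDefault();
    // Assumption: 'PH5P' is an accepted value for Core.LexerImpl.
    $config->set('Core.LexerImpl', 'PH5P');
    $context = new HTMLPurifier_Context();

    // create() is the static factory documented on this page.
    $lexer = HTMLPurifier_Lexer::create($config);
    return $lexer->tokenizeHTML($html, $config, $context);
}

// Only run the demo when the library is actually available.
if (class_exists('HTMLPurifier_Config')) {
    foreach (tokenizeWithPh5p('<p>Hello <b>world</b></p>') as $token) {
        echo get_class($token), "\n";
    }
}
```

Because the lexer is experimental, it is probably wise to compare its output against the default lexer before relying on it.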
Table of Contents
Properties
- $tracksLineNumbers : mixed
- Whether or not this lexer implements line-number/column-number tracking.
- $_special_entity2str : mixed
- Most common entity to raw value conversion table for special entities.
- $_entity_parser : mixed
- $factory : mixed
Methods
- __construct() : mixed
- callbackArmorCommentEntities() : string
- Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them
- callbackUndoCommentSubst() : string
- Callback function for undoing escaping of stray angle brackets in comments
- create() : HTMLPurifier_Lexer
- Retrieves or sets the default Lexer as a Prototype Factory.
- extractBody() : mixed
- Takes a string of HTML (fragment or document) and returns the content
- muteErrorHandler() : mixed
- An error handler that mutes all errors
- normalize() : string
- Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
- parseAttr() : mixed
- parseData() : string
- Parses special entities into the proper characters.
- parseText() : mixed
- tokenizeHTML() : array<string|int, HTMLPurifier_Token>
- Lexes an HTML string into tokens.
- CDATACallback() : string
- Callback function for escapeCDATA() that does the work.
- createEndNode() : mixed
- createStartNode() : bool
- escapeCDATA() : string
- Translates CDATA sections into regular sections (through escaping).
- escapeCommentedCDATA() : string
- Special CDATA case that is especially convoluted for <script>
- getData() : mixed
- Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6
- getTagName() : mixed
- Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6
- removeIEConditional() : string
- Special Internet Explorer conditional comments should be removed.
- tokenizeDOM() : mixed
- Iterative function that tokenizes a node, putting it into an accumulator.
- transformAttrToAssoc() : array<string|int, mixed>
- Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
- wrapHTML() : string
- Wraps an HTML fragment in the necessary HTML
Properties
$tracksLineNumbers
Whether or not this lexer implements line-number/column-number tracking.
public
mixed $tracksLineNumbers = false
If it does, set to true.
$_special_entity2str
Most common entity to raw value conversion table for special entities.
protected
mixed $_special_entity2str = array('&quot;' => '"', '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&#39;' => "'", '&#039;' => "'", '&#x27;' => "'")
$_entity_parser
private
mixed $_entity_parser
$factory
private
mixed $factory
Methods
__construct()
public
__construct() : mixed
callbackArmorCommentEntities()
Callback function that entity-izes ampersands in comments so that callbackUndoCommentSubst doesn't clobber them
public
callbackArmorCommentEntities(array<string|int, mixed> $matches) : string
Parameters
- $matches : array<string|int, mixed>
Return values
string
callbackUndoCommentSubst()
Callback function for undoing escaping of stray angle brackets in comments
public
callbackUndoCommentSubst(array<string|int, mixed> $matches) : string
Parameters
- $matches : array<string|int, mixed>
Return values
string
create()
Retrieves or sets the default Lexer as a Prototype Factory.
public
static create(HTMLPurifier_Config $config) : HTMLPurifier_Lexer
By default HTMLPurifier_Lexer_DOMLex will be returned. There are a few exceptions involving special features that only DirectLex implements.
Parameters
- $config : HTMLPurifier_Config
Return values
HTMLPurifier_Lexer
extractBody()
Takes a string of HTML (fragment or document) and returns the content
public
extractBody(mixed $html) : mixed
Parameters
- $html : mixed
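To make the fragment-versus-document distinction concrete, here is a minimal standalone sketch of body extraction. It is not the library's implementation: the function name `extractBodySketch` and the regular expression are assumptions chosen for illustration.

```php
<?php
// Illustrative only: returns the inner content of <body> when present,
// otherwise returns the input unchanged (treating it as a fragment).
function extractBodySketch(string $html): string
{
    // 'i' ignores tag case, 's' lets '.' span newlines.
    if (preg_match('~<body[^>]*>(.*)</body>~is', $html, $m)) {
        return $m[1];
    }
    return $html;
}

$fromDocument = extractBodySketch('<html><body class="x"><p>Hi</p></body></html>');
$fromFragment = extractBodySketch('<p>Hi</p>');
```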
muteErrorHandler()
An error handler that mutes all errors
public
muteErrorHandler(int $errno, string $errstr) : mixed
Parameters
- $errno : int
- $errstr : string
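A handler with this signature is typically installed with `set_error_handler()` around a noisy call. The sketch below shows that general pattern in plain PHP; `muteAll` and `runMuted` are hypothetical names, and this is not a copy of the library's code.

```php
<?php
// Hypothetical stand-in for a "mute everything" error handler.
function muteAll(int $errno, string $errstr): bool
{
    // Returning true tells PHP the error was handled, so nothing is reported.
    return true;
}

// Runs $fn with the muting handler installed, then restores the old handler.
function runMuted(callable $fn)
{
    set_error_handler('muteAll');
    try {
        return $fn();
    } finally {
        restore_error_handler();
    }
}

$value = runMuted(function () {
    trigger_error('noisy warning', E_USER_WARNING); // silenced by muteAll
    return 'done';
});
```

Restoring the previous handler in a `finally` block keeps the muting scoped to the one call that needs it.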
normalize()
Takes a piece of HTML and normalizes it by converting entities, fixing encoding, extracting bits, and other good stuff.
public
normalize(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context) : string
Parameters
- $html : string
  HTML.
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
Return values
string
parseAttr()
public
parseAttr(mixed $string, mixed $config) : mixed
Parameters
- $string : mixed
- $config : mixed
parseData()
Parses special entities into the proper characters.
public
parseData(string $string, mixed $is_attr, mixed $config) : string
This string will translate escaped versions of the special characters into the correct ones.
Parameters
- $string : string
  String character data to be parsed.
- $is_attr : mixed
- $config : mixed
Return values
string —Parsed character data.
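The translation described here can be pictured as a table lookup. The following standalone sketch decodes only the special entities using `strtr()`; the table is my reading of the `$_special_entity2str` default, and `decodeSpecialEntities` is a hypothetical name rather than the library's method.

```php
<?php
// Assumed contents of the special-entity table (entity => raw character).
const SPECIAL_ENTITY_TO_STR = [
    '&quot;' => '"',
    '&amp;'  => '&',
    '&lt;'   => '<',
    '&gt;'   => '>',
    '&#39;'  => "'",
    '&#039;' => "'",
    '&#x27;' => "'",
];

function decodeSpecialEntities(string $string): string
{
    // Fast path: nothing to decode without an ampersand.
    if (strpos($string, '&') === false) {
        return $string;
    }
    // strtr() never rescans its own output, so '&amp;lt;' becomes '&lt;', not '<'.
    return strtr($string, SPECIAL_ENTITY_TO_STR);
}

$decoded = decodeSpecialEntities('&lt;b&gt; &amp; &#39;x&#039;');
$single  = decodeSpecialEntities('&amp;lt;');
```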
parseText()
public
parseText(mixed $string, mixed $config) : mixed
Parameters
- $string : mixed
- $config : mixed
tokenizeHTML()
Lexes an HTML string into tokens.
public
tokenizeHTML(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context) : array<string|int, HTMLPurifier_Token>
Parameters
- $html : string
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
Return values
array<string|int, HTMLPurifier_Token>
CDATACallback()
Callback function for escapeCDATA() that does the work.
protected
static CDATACallback(array<string|int, mixed> $matches) : string
Parameters
- $matches : array<string|int, mixed>
  PCRE matches array, with index 0 the entire match and 1 the inside of the CDATA section.
Return values
string —Escaped internals of the CDATA section.
createEndNode()
protected
createEndNode(DOMNode $node, array<string|int, HTMLPurifier_Token> &$tokens) : mixed
Parameters
- $node : DOMNode
- $tokens : array<string|int, HTMLPurifier_Token>
createStartNode()
protected
createStartNode(DOMNode $node, array<string|int, HTMLPurifier_Token> &$tokens, bool $collect, mixed $config) : bool
Parameters
- $node : DOMNode
  DOMNode to be tokenized.
- $tokens : array<string|int, HTMLPurifier_Token>
  Array-list of already tokenized tokens.
- $collect : bool
  Says whether or not the start and close tokens are collected; set to false at the first recursion because that node is the implicit DIV tag.
- $config : mixed
Return values
bool —if the token needs an endtoken
escapeCDATA()
Translates CDATA sections into regular sections (through escaping).
protected
static escapeCDATA(string $string) : string
Parameters
- $string : string
  HTML string to process.
Return values
string —HTML with CDATA sections escaped.
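The pairing of escapeCDATA() with CDATACallback() follows the usual `preg_replace_callback()` pattern: a regex finds each CDATA section and a callback escapes its contents. The sketch below is a standalone illustration under that assumption; the function names and the exact pattern are mine, not taken from the library.

```php
<?php
// Callback: receives the PCRE match array, index 1 is the CDATA interior.
function cdataCallbackSketch(array $matches): string
{
    return htmlspecialchars($matches[1], ENT_COMPAT, 'UTF-8');
}

// Replaces every <![CDATA[...]]> section with its escaped interior.
function escapeCdataSketch(string $html): string
{
    return preg_replace_callback(
        '/<!\[CDATA\[(.+?)\]\]>/s', // non-greedy so adjacent sections stay separate
        'cdataCallbackSketch',
        $html
    );
}

$escaped = escapeCdataSketch('a<![CDATA[<b> & c]]>z');
```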
escapeCommentedCDATA()
Special CDATA case that is especially convoluted for <script>
protected
static escapeCommentedCDATA(string $string) : string
Parameters
- $string : string
  HTML string to process.
Return values
string —HTML with CDATA sections escaped.
getData()
Portably retrieve the data of a node; deals with older versions of libxml like 2.7.6
protected
getData(DOMNode $node) : mixed
Parameters
- $node : DOMNode
getTagName()
Portably retrieve the tag name of a node; deals with older versions of libxml like 2.7.6
protected
getTagName(DOMNode $node) : mixed
Parameters
- $node : DOMNode
removeIEConditional()
Special Internet Explorer conditional comments should be removed.
protected
static removeIEConditional(string $string) : string
Parameters
- $string : string
  HTML string to process.
Return values
string —HTML with conditional comments removed.
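For readers unfamiliar with conditional comments, the sketch below shows what stripping them might look like. It is a standalone approximation: the pattern and the name `removeIeConditionalSketch` are assumptions and may not match the library's behavior in edge cases.

```php
<?php
// Removes downlevel-hidden IE conditional comments such as
// <!--[if IE]> ... <![endif]--> together with their contents.
function removeIeConditionalSketch(string $html): string
{
    return preg_replace(
        '#<!--\[if [^>]+\]>.*?<!\[endif\]-->#si',
        '',
        $html
    );
}

$cleaned = removeIeConditionalSketch('a<!--[if IE]><p>x</p><![endif]-->b');
```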
tokenizeDOM()
Iterative function that tokenizes a node, putting it into an accumulator.
protected
tokenizeDOM(DOMNode $node, array<string|int, HTMLPurifier_Token> &$tokens, mixed $config) : mixed
To iterate is human, to recurse divine - L. Peter Deutsch
Parameters
- $node : DOMNode
  DOMNode to be tokenized.
- $tokens : array<string|int, HTMLPurifier_Token>
  Array-list of already tokenized tokens.
- $config : mixed
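One way to picture an iterative tokenizer with an accumulator is an explicit stack in place of recursion. The sketch below flattens a DOM subtree into start, text, and end entries. It assumes PHP's DOM extension is available, uses plain arrays instead of HTMLPurifier_Token objects, and `flattenDom` is a hypothetical name; the library's own traversal may differ.

```php
<?php
// Flattens $root into [type, value] pairs in document order.
function flattenDom(DOMNode $root): array
{
    $tokens = [];
    // Each stack entry is [node, closing]; closing entries emit the end tag.
    $stack = [[$root, false]];
    while ($stack) {
        [$node, $closing] = array_pop($stack);
        if ($node instanceof DOMText) {
            $tokens[] = ['text', $node->data];
            continue;
        }
        if (!$node instanceof DOMElement) {
            continue; // comments and other node types are skipped in this sketch
        }
        if ($closing) {
            $tokens[] = ['end', $node->tagName];
            continue;
        }
        $tokens[] = ['start', $node->tagName];
        $stack[] = [$node, true]; // revisit after the children to emit the end tag
        // Push children in reverse so they pop in document order.
        for ($i = $node->childNodes->length - 1; $i >= 0; $i--) {
            $stack[] = [$node->childNodes->item($i), false];
        }
    }
    return $tokens;
}

$doc = new DOMDocument();
$div = $doc->createElement('div');
$div->appendChild($doc->createTextNode('Hi '));
$b = $doc->createElement('b');
$b->appendChild($doc->createTextNode('there'));
$div->appendChild($b);

$tokens = flattenDom($div);
```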
transformAttrToAssoc()
Converts a DOMNamedNodeMap of DOMAttr objects into an assoc array.
protected
transformAttrToAssoc(DOMNamedNodeMap $node_map) : array<string|int, mixed>
Parameters
- $node_map : DOMNamedNodeMap
  DOMNamedNodeMap of DOMAttr objects.
Return values
array<string|int, mixed> —Associative array of attributes.
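The conversion itself is small enough to sketch in plain PHP. The example assumes the DOM extension is loaded; `attrsToAssoc` is a hypothetical name used for illustration only.

```php
<?php
// Converts a DOMNamedNodeMap of DOMAttr objects into name => value pairs.
function attrsToAssoc(DOMNamedNodeMap $nodeMap): array
{
    $assoc = [];
    foreach ($nodeMap as $attr) {
        $assoc[$attr->name] = $attr->value;
    }
    return $assoc;
}

$doc = new DOMDocument();
$link = $doc->createElement('a');
$link->setAttribute('href', '/x');
$link->setAttribute('title', 'T');

$attributes = attrsToAssoc($link->attributes);
```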
wrapHTML()
Wraps an HTML fragment in the necessary HTML
protected
wrapHTML(string $html, HTMLPurifier_Config $config, HTMLPurifier_Context $context[, mixed $use_div = true ]) : string
Parameters
- $html : string
- $config : HTMLPurifier_Config
- $context : HTMLPurifier_Context
- $use_div : mixed = true
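As a rough picture of what wrapping a fragment involves, the sketch below builds a minimal document shell around the input and optionally adds a DIV wrapper, mirroring the `$use_div` flag. The exact markup, including the meta tag, is an assumption, and `wrapHtmlSketch` is a hypothetical name; the real method also takes the config and context into account.

```php
<?php
// Wraps $fragment in a minimal HTML document so a DOM parser sees a full page.
function wrapHtmlSketch(string $fragment, bool $useDiv = true): string
{
    $out = '<html><head>'
        // Assumed: declare UTF-8 so the parser does not guess the encoding.
        . '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
        . '</head><body>';
    $out .= $useDiv ? '<div>' . $fragment . '</div>' : $fragment;
    return $out . '</body></html>';
}

$withDiv    = wrapHtmlSketch('<p>x</p>');
$withoutDiv = wrapHtmlSketch('<p>x</p>', false);
```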