The lexer is responsible for providing tokens to the parser. The project comes with two lexers: PHPParser_Lexer and
PHPParser_Lexer_Emulative. The latter is an extension of the former, which adds the ability to emulate tokens of
newer PHP versions and thus allows parsing of new code on older versions.
A lexer has to define the following public interface:
startLexing($code);
getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null);
handleHaltCompiler();
The startLexing method is invoked when the parse() method of the parser is called. It's argument will be whatever
was passed to the parse() method.
Even though startLexing is meant to accept a source code string, you could for example overwrite it to accept a file:
<?php
class FileLexer extends PHPParser_Lexer {
    public function startLexing($fileName) {
        if (!file_exists($fileName)) {
            throw new InvalidArgumentException(sprintf('File "%s" does not exist', $fileName));
        }
        parent::startLexing(file_get_contents($fileName));
    }
}
$parser = new PHPParser_Parser(new FileLexer);
var_dump($parser->parse('someFile.php'));
var_dump($parser->parse('someOtherFile.php'));
getNextToken returns the ID of the next token and sets some additional information in the three variables which it
accepts by-ref. If no more tokens are available it has to return 0, which is the ID of the EOF token.
The first by-ref variable $value should contain the textual content of the token. It is what will be available as $1
etc in the parser.
The other two by-ref variables $startAttributes and $endAttributes define which attributes will eventually be
assigned to the generated nodes: The parser will take the $startAttributes from the first token which is part of the
node and the $endAttributes from the last token that is part of the node.
E.g. if the tokens T_FUNCTION T_STRING ... '{' ... '}' constitute a node, then the $startAttributes from the
T_FUNCTION token will be taken and the $endAttributes from the '}' token.
By default the lexer creates the attributes startLine, comments (both part of $startAttributes) and endLine
(part of $endAttributes).
If you don't want all these attributes to be added (to reduce memory usage of the AST) you can simply remove them by overriding the method:
<?php
class LessAttributesLexer extends PHPParser_Lexer {
    public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
        $tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
        // only keep startLine attribute
        unset($startAttributes['comments']);
        unset($endAttributes['endLine']);
        return $tokenId;
    }
}
You can obviously also add additional attributes. E.g. in conjunction with the above FileLexer you might want to add
a fileName attribute to all nodes:
<?php
class FileLexer extends PHPParser_Lexer {
    protected $fileName;
    public function startLexing($fileName) {
        if (!file_exists($fileName)) {
            throw new InvalidArgumentException(sprintf('File "%s" does not exist', $fileName));
        }
        $this->fileName = $fileName;
        parent::startLexing(file_get_contents($fileName));
    }
    public function getNextToken(&$value = null, &$startAttributes = null, &$endAttributes = null) {
        $tokenId = parent::getNextToken($value, $startAttributes, $endAttributes);
        // we could use either $startAttributes or $endAttributes here, because the fileName is always the same
        // (regardless of whether it is the start or end token). We choose $endAttributes, because it is slightly
        // more efficient (as the parser has to keep a stack for the $startAttributes).
        $endAttributes['fileName'] = $fileName;
        return $tokenId;
    }
}
The method is invoked whenever a T_HALT_COMPILER token is encountered. It has to return the remaining string after the
construct (not including ();).