I was reminded, this past week, of how cool the [url=http://php.net/tokenizer]tokenizer[/url] is.
One of the guys who works in the same office as I do had what seemed to be a simple problem: he had a PHP file that contained ~50 functions, and wanted to summarize its API without manually reading through the file and copying out each function declaration.
We introduced him to [url=http://www.phpdoc.org/]in-line phpdoc blocks[/url] (he works in the same office, as a Jr.-level PHP developer, but for a different company, so he doesn't have to follow our coding standards; but I digress), but the 50-function library in question didn't have docblocks.
Sure, he could (and did) pull up a list of function NAMES with [url=http://php.net/get_defined_functions]get_defined_functions[/url] (I assume by using [url=http://php.net/array_diff]array_diff[/url] against a before-and-after capture), but that didn't give him the argument names, or even the number of arguments for a given function, so I broke out some old tokenizer code I'd written.
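For the curious, the before-and-after trick looks roughly like this. It's a minimal sketch, not his actual code: `list_new_functions()` and the throwaway library file are made up for the demo. Note that get_defined_functions() hands back names only (in lowercase), which is exactly the limitation that sent me to the tokenizer.

```php
<?php
// Hypothetical helper: snapshot the user-defined functions, include the
// library, snapshot again, and diff the two lists.
function list_new_functions($file)
{
    $before = get_defined_functions();
    include_once $file;
    $after = get_defined_functions();
    // Only the 'user' key holds functions defined in PHP code;
    // 'internal' lists the built-ins, which don't change here.
    return array_values(array_diff($after['user'], $before['user']));
}

// Demo: write a tiny library to a temporary file and inspect it.
$lib = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents($lib, '<?php function foo($a, &$b) {} function bar() {}');
$names = list_new_functions($lib);
print_r($names); // names only: no parameters, no argument counts
unlink($lib);
```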
In case you aren't familiar with the tokenizer, the PHP manual defines it as:
[quote][an interface to let you write] your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.[/quote]
The extension (which has been part of the PHP core distribution since 4.3.0) consists of only two functions, [url=http://php.net/token_get_all]token_get_all[/url] and [url=http://php.net/token_name]token_name[/url], plus a [url=http://php.net/tokens]boatload of constants[/url].
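To get a feel for what token_get_all() hands back, here's a tiny sketch that tokenizes a one-line function and prints each token using token_name(). The snippet being tokenized is just an illustration.

```php
<?php
// Tokenize a one-line function and print each token. Array tokens are
// [token id, source text, line number]; single-character tokens such as
// '(' or ';' come back as plain strings.
$source = '<?php function add($a, $b = 1) { return $a + $b; }';
$tokens = token_get_all($source);

foreach ($tokens as $token)
{
    if (is_array($token))
    {
        // token_name() turns the numeric id into e.g. "T_FUNCTION".
        echo token_name($token[0]), ' ', var_export($token[1], true), "\n";
    }
    else
    {
        echo 'char ', $token, "\n";
    }
}
```

The first lines of output are T_OPEN_TAG, T_FUNCTION, T_WHITESPACE and T_STRING ('add'), followed by a bare '('. That pattern (T_FUNCTION, then a name, then a parenthesized list of T_VARIABLE tokens) is what the script below looks for.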
Enough babble, though; let's get to the meat. Here's some code I wrote for PEARClops (on EFNet #PEAR) that parses a PHP source file and works out which classes, functions/methods and associated parameters it defines.
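[php]<?php
// Walk the token stream of a PHP file (or a string of PHP source) and
// collect every named function/method along with its parameters.
function get_protos($in)
{
    if (is_file(realpath($in)))
    {
        $in = file_get_contents($in);
    }
    $tokens = token_get_all($in);
    $count = count($tokens);
    $funcs = array();
    $currClass = '';
    $classDepth = 0;
    for ($i = 0; $i < $count; ++$i)
    {
        if (is_array($tokens[$i]) && $tokens[$i][0] == T_CLASS)
        {
            // The class name is the next non-whitespace token; "Foo::class"
            // and anonymous "new class" have none, so skip those.
            $j = $i + 1;
            while (is_array($tokens[$j]) && $tokens[$j][0] == T_WHITESPACE)
            {
                ++$j;
            }
            if (!is_array($tokens[$j]) || $tokens[$j][0] != T_STRING)
            {
                continue;
            }
            $currClass = $tokens[$j][1];
            // Skip any extends/implements clause up to the class body.
            $i = $j;
            while ($tokens[++$i] !== '{') {}
            // That '{' is counted here; the for loop steps past it.
            $classDepth = 1;
            continue;
        }
        elseif (is_array($tokens[$i]) && $tokens[$i][0] == T_FUNCTION)
        {
            $nextByRef = FALSE;
            $thisFunc = array();
            // Between "function" and "(" there is only the name and an
            // optional '&' (return by reference). Known quirk: that '&'
            // is carried over onto the first parameter.
            while ($tokens[++$i] !== '(')
            {
                $text = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
                if ($text === '&')
                {
                    $nextByRef = TRUE;
                }
                elseif (is_array($tokens[$i])
                    && !in_array($tokens[$i][0], array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT)))
                {
                    $thisFunc = array(
                        'name'   => $text,
                        'class'  => $currClass,
                        'params' => array(),
                    );
                }
            }
            if (!$thisFunc)
            {
                // Anonymous function (closure): no name to report.
                continue;
            }
            while ($tokens[++$i] !== ')')
            {
                $text = is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
                if (is_array($tokens[$i]) && $tokens[$i][0] == T_VARIABLE)
                {
                    // Only T_VARIABLE tokens are parameters; type hints are skipped.
                    $thisFunc['params'][] = array(
                        'byRef' => $nextByRef,
                        'name'  => $text,
                    );
                    $nextByRef = FALSE;
                }
                elseif ($text === '&')
                {
                    // Compare the text: PHP 8.1+ returns this '&' as an array token.
                    $nextByRef = TRUE;
                }
                elseif ($text === '=')
                {
                    // Collect the default value up to the next top-level ','
                    // or ')', so defaults like array(1, 2) stay whole.
                    $default = '';
                    $depth = 0;
                    for (++$i; $depth > 0 || ($tokens[$i] !== ',' && $tokens[$i] !== ')'); ++$i)
                    {
                        if ($tokens[$i] === '(' || $tokens[$i] === '[')
                        {
                            ++$depth;
                        }
                        elseif ($tokens[$i] === ')' || $tokens[$i] === ']')
                        {
                            --$depth;
                        }
                        $default .= is_array($tokens[$i]) ? $tokens[$i][1] : $tokens[$i];
                    }
                    --$i; // let the outer loop see the ',' or ')'
                    $last = count($thisFunc['params']) - 1;
                    $thisFunc['params'][$last]['default'] = trim($default);
                }
            }
            $funcs[] = $thisFunc;
        }
        elseif ($tokens[$i] === '{' || (is_array($tokens[$i])
            && ($tokens[$i][0] == T_CURLY_OPEN || $tokens[$i][0] == T_DOLLAR_OPEN_CURLY_BRACES)))
        {
            // "{$var}" and "${var}" inside strings are closed by a plain '}'.
            ++$classDepth;
        }
        elseif ($tokens[$i] === '}')
        {
            --$classDepth;
        }
        if ($classDepth == 0)
        {
            $currClass = '';
        }
    }
    return $funcs;
}

// Turn get_protos() output into "Class::method(&$a, $b)" strings.
function parse_protos($funcs)
{
    $protos = array();
    foreach ($funcs AS $funcData)
    {
        $proto = '';
        if ($funcData['class'])
        {
            $proto .= $funcData['class'] . '::';
        }
        $proto .= $funcData['name'] . '(';
        $isFirst = TRUE;
        foreach ($funcData['params'] AS $param)
        {
            if (!$isFirst)
            {
                $proto .= ', ';
            }
            $isFirst = FALSE;
            if ($param['byRef'])
            {
                $proto .= '&';
            }
            $proto .= $param['name'];
        }
        $proto .= ')';
        $protos[] = $proto;
    }
    return $protos;
}

if (!isset($_SERVER['argv'][1]))
{
    fwrite(STDERR, "Usage: php {$_SERVER['argv'][0]} /path/to/php_file\n");
    exit(1);
}
echo "Functions in {$_SERVER['argv'][1]}:\n";
foreach (parse_protos(get_protos($_SERVER['argv'][1])) AS $proto)
{
    echo " $proto\n";
}
?>[/php]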
Save it as "parse_funcs.php" (or whatever you like) and call it like so:
[code]
php parse_funcs.php /path/to/php_file
[/code]
For instance:
[code]
sean@iconoclast:~/php/scripts$ php token_funcs_cli.php ~/php/cvs/Mail_Mime/mime.php
Functions in /home/sean/php/cvs/Mail_Mime/mime.php:
Mail_mime::Mail_mime($crlf)
Mail_mime::__wakeup()
Mail_mime::setTXTBody($data, $isfile, $append)
Mail_mime::setHTMLBody($data, $isfile)
Mail_mime::addHTMLImage($file, $c_type, $name, $isfilename)
Mail_mime::addAttachment($file, $c_type, $name, $isfilename, $encoding)
Mail_mime::_file2str(&$file_name)
Mail_mime::_addTextPart(&$obj, $text)
Mail_mime::_addHtmlPart(&$obj)
Mail_mime::_addMixedPart()
Mail_mime::_addAlternativePart(&$obj)
Mail_mime::_addRelatedPart(&$obj)
Mail_mime::_addHtmlImagePart(&$obj, $value)
Mail_mime::_addAttachmentPart(&$obj, $value)
Mail_mime::get(&$build_params)
Mail_mime::headers(&$xtra_headers)
Mail_mime::txtHeaders($xtra_headers)
Mail_mime::setSubject($subject)
Mail_mime::setFrom($email)
Mail_mime::addCc($email)
Mail_mime::addBcc($email)
Mail_mime::_encodeHeaders($input)
Mail_mime::_setEOL($eol)
[/code]
Not bad, huh?
There are some not-so-obvious bugs (inheritance, mostly), but for a relatively short script, it does a pretty good job.
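As a possible starting point for the inheritance problem, here's a hedged sketch of how the same token-walking approach could record each class's parent via T_EXTENDS. `find_class_parents()` is a hypothetical helper, not part of the script above, and it assumes plain, non-namespaced class names: PHP 8 tokenizes a name like `\Foo\Bar` as T_NAME_FULLY_QUALIFIED rather than T_STRING, so those would be missed. With a class-to-parent map in hand, you could then merge each parent's methods into the child's listing.

```php
<?php
// Hypothetical helper: map each named class in $source to the class it
// extends, or null. Assumes plain, non-namespaced class names.
function find_class_parents($source)
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $parents = array();
    for ($i = 0; $i < $count; ++$i)
    {
        if (!is_array($tokens[$i]) || $tokens[$i][0] != T_CLASS)
        {
            continue;
        }
        // The class name must be the next non-whitespace token; this skips
        // "Foo::class" and anonymous "new class" expressions.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] == T_WHITESPACE)
        {
            ++$j;
        }
        if ($j >= $count || !is_array($tokens[$j]) || $tokens[$j][0] != T_STRING)
        {
            continue;
        }
        $name = $tokens[$j][1];
        $parent = null;
        $sawExtends = false;
        // Scan the class header up to the opening brace; the first T_STRING
        // after T_EXTENDS is the parent (later ones belong to "implements").
        for (++$j; $j < $count && $tokens[$j] !== '{'; ++$j)
        {
            if (!is_array($tokens[$j]))
            {
                continue;
            }
            if ($tokens[$j][0] == T_EXTENDS)
            {
                $sawExtends = true;
            }
            elseif ($sawExtends && $parent === null && $tokens[$j][0] == T_STRING)
            {
                $parent = $tokens[$j][1];
            }
        }
        $parents[$name] = $parent;
        $i = $j;
    }
    return $parents;
}

$src = '<?php class Base {} class Child extends Base implements Countable { function count(): int { return 0; } }';
$map = find_class_parents($src);
print_r($map);
```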
S