Fun with the tokenizer...

I was reminded, this past week, of how cool the tokenizer is.

One of the guys who works in the same office as I do had what seemed to be a simple problem: he had a PHP file that contained ~50 functions, and wanted to summarize the API without manually reading through the file and cutting out the function declarations.

We introduced him to inline PHPDoc blocks (he's a junior PHP developer who works in the same office but for a different company, so he doesn't have to follow our coding standards, but I digress), but the 50-function library in question didn't have docblocks.

Sure, he could (and did) pull up a list of function NAMES with get_defined_functions (I assume by using array_diff against a before-and-after capture), but this didn't give him the argument names, or even the number of arguments for a given function, so I broke out some old tokenizer code I'd written.
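That before-and-after capture might look something like the sketch below. The two demo_* functions are made up for illustration and stand in for the real library, which would normally be pulled in with require; eval is used here only so the example fits in one file.

```php
<?php
// Snapshot the user-defined functions before loading anything.
$before = get_defined_functions()['user'];

// Stand-in for: require '/path/to/library.php';
// (demo_add and demo_greet are hypothetical, for illustration only.)
eval('
    function demo_add($a, $b) { return $a + $b; }
    function demo_greet(&$name) { return "Hello, $name"; }
');

// Snapshot again and keep only the names that are new.
$after = get_defined_functions()['user'];
$added = array_values(array_diff($after, $before));

// Prints the new function names, and nothing about their parameters.
print_r($added);
```

The result is a flat list of names, which is exactly the limitation described above: there is no hint of how many arguments each function takes or whether any are passed by reference.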

In case you aren't familiar with the tokenizer, the PHP manual defines it as:

“[an interface to let you write] your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.”

The extension (which has been part of the PHP core distribution since 4.3.0) consists of only two functions, token_get_all and token_name, plus a boatload of constants.
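To get a feel for what those two functions return, here is a small sketch that tokenizes a one-line snippet and prints each token. The snippet in $source is made up for illustration; token_get_all and token_name are the extension's own functions.

```php
<?php
// A made-up one-liner to tokenize (illustration only).
$source = '<?php function add($a, &$b) { return $a + $b; }';

foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Most tokens are arrays: [token ID, matched text, line number].
        printf("%-12s %s\n", token_name($token[0]), var_export($token[1], true));
    } else {
        // Some single-character tokens, such as "(" or ";", are plain strings.
        printf("%-12s %s\n", '(string)', var_export($token, true));
    }
}
```

The mix of array tokens and plain-string tokens is why tokenizer code tends to be full of is_array checks and comparisons against both T_* constants and literal characters.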

Enough babble, though, let's get to the meat. I pulled out this code I'd written for PEARClops (on EFNet #PEAR) that parses PHP source files and figures out what classes, functions/methods and associated parameters are included.

[php]
// [...] (the opening of get_protos() is missing; the listing resumes mid-statement)
                            $tokens[$i][1],
                            'class' => $currClass,
                        );
                    } else {
                        $thisFunc['params'][] = array(
                            'byRef' => $nextByRef,
                            'name' => $tokens[$i][1],
                        );
                        $nextByRef = FALSE;
                    }
                } elseif ($tokens[$i] == '&') {
                    $nextByRef = TRUE;
                } elseif ($tokens[$i] == '=') {
                    while (!in_array($tokens[++$i], array(')',','))) {
                        if ($tokens[$i][0] != T_WHITESPACE) {
                            break;
                        }
                    }
                    $thisFunc['params'][count($thisFunc['params']) - 1]['default'] = $tokens[$i][1];
                }
            }
            $funcs[] = $thisFunc;
        } elseif ($tokens[$i] == '{') {
            ++$classDepth;
        } elseif ($tokens[$i] == '}') {
            --$classDepth;
        }

        if ($classDepth == 0) {
            $currClass = '';
        }
    }

    return $funcs;
}

function parse_protos($funcs)
{
    $protos = array();
    foreach ($funcs AS $funcData) {
        $proto = '';
        if ($funcData['class']) {
            $proto .= $funcData['class'];
            $proto .= '::';
        }
        $proto .= $funcData['name'];
        $proto .= '(';
        if ($funcData['params']) {
            $isFirst = TRUE;
            foreach ($funcData['params'] AS $param) {
                if ($isFirst) {
                    $isFirst = FALSE;
                } else {
                    $proto .= ', ';
                }
                if ($param['byRef']) {
                    $proto .= '&';
                }
                $proto .= $param['name'];
            }
        }
        $proto .= ")";
        $protos[] = $proto;
    }
    return $protos;
}

echo "Functions in {$_SERVER['argv'][1]}:\n";
foreach (parse_protos(get_protos($_SERVER['argv'][1])) AS $proto) {
    echo " $proto\n";
}
?>
[/php]
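For comparison, here is a stripped-down, self-contained sketch of the same token-walking idea. It is not the PEARClops code: list_signatures() is a hypothetical name, it assumes PHP 7 or later, and it deliberately ignores class names, default values and closures. It reads the text of each token instead of comparing against '&' directly, since PHP 8.1 and later return ampersands as array tokens rather than plain strings.

```php
<?php
// Hypothetical helper (not from the original script): returns strings
// like 'name($a, &$b)' for every named function found in $source.
function list_signatures(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $signatures = array();

    for ($i = 0; $i < $count; $i++) {
        // Only array tokens carry an ID; look for the "function" keyword.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        $name = null;
        $params = array();
        $byRef = false;
        $depth = 0; // parenthesis nesting, so "= array(1, 2)" doesn't end the list early

        while (++$i < $count) {
            $tok = $tokens[$i];
            $text = is_array($tok) ? $tok[1] : $tok;

            if ($tok === '(') {
                $depth++;
            } elseif ($tok === ')') {
                if (--$depth === 0) {
                    break; // end of the parameter list
                }
            } elseif (is_array($tok) && $tok[0] === T_STRING && $depth === 0 && $name === null) {
                $name = $text; // the function name comes before the first "("
            } elseif (is_array($tok) && $tok[0] === T_VARIABLE) {
                $params[] = ($byRef ? '&' : '') . $text;
                $byRef = false;
            } elseif ($text === '&' && $depth > 0) {
                $byRef = true; // the next variable is passed by reference
            }
        }

        // Closures have no name and are skipped.
        if ($name !== null) {
            $signatures[] = $name . '(' . implode(', ', $params) . ')';
        }
    }

    return $signatures;
}

// Usage on a made-up snippet:
foreach (list_signatures('<?php function add($a, &$b) {}') as $sig) {
    echo $sig, "\n";
}
```

To run it against a real file, pass file_get_contents('/path/to/php_file') instead of the inline string. Like the longer script, this is a rough tool rather than a parser: it does not handle things such as `use function` imports or methods named after reserved words.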

Save it as "parse_funcs.php" (or whatever you like) and call it like so:

[code] php parse_funcs.php /path/to/php_file [/code]

For instance:

[code]
sean@iconoclast:~/php/scripts$ php token_funcs_cli.php ~/php/cvs/Mail_Mime/mime.php
Functions in /home/sean/php/cvs/Mail_Mime/mime.php:
 Mail_mime::Mail_mime($crlf)
 Mail_mime::__wakeup()
 Mail_mime::setTXTBody($data, $isfile, $append)
 Mail_mime::setHTMLBody($data, $isfile)
 Mail_mime::addHTMLImage($file, $c_type, $name, $isfilename)
 Mail_mime::addAttachment($file, $c_type, $name, $isfilename, $encoding)
 Mail_mime::_file2str(&$file_name)
 Mail_mime::_addTextPart(&$obj, $text)
 Mail_mime::_addHtmlPart(&$obj)
 Mail_mime::_addMixedPart()
 Mail_mime::_addAlternativePart(&$obj)
 Mail_mime::_addRelatedPart(&$obj)
 Mail_mime::_addHtmlImagePart(&$obj, $value)
 Mail_mime::_addAttachmentPart(&$obj, $value)
 Mail_mime::get(&$build_params)
 Mail_mime::headers(&$xtra_headers)
 Mail_mime::txtHeaders($xtra_headers)
 Mail_mime::setSubject($subject)
 Mail_mime::setFrom($email)
 Mail_mime::addCc($email)
 Mail_mime::addBcc($email)
 Mail_mime::_encodeHeaders($input)
 Mail_mime::_setEOL($eol)
[/code]

Not bad, huh?

There are some not-so-obvious bugs (inheritance, mostly), but for a relatively short script, it does a pretty good job.
