PHP Pie?

I've often had to manipulate large blobs of text—no, make that many files containing large blobs of text.

Of course, my IDE can usually handle simple search-and-replace operations, but on most occasions I appreciate the simplicity of the command line interface.

That's one of the reasons I love working in a unixy environment, I think. There's a bunch of utilities that embrace the command line and take simple input and deliver equally simple output. I've employed sed and awk, in the past, and I still use them to perform some very simple parsing. For example, I can often be found doing something like ps auxwww | grep ssh | awk '{print $2}' to get a list of ssh process IDs.

But almost anyone who's ever been enlightened to perl pie delights in its power. In a nutshell, I can do something like perl -p -i -e 's/foo/oof/g' somefile from the command line, and perl will digest every line of somefile and perform the substitution. Perl is very well suited to this type of operation, what with its contextual variables and all.

I updated the code a little, below. You now must explicitly set $_.

Read on for my PHP-based solution (lest planet-php truncate my post).
I've often found myself looking for a PHP equivalent. Not to do simple substitutions, of course, but complex ones. And since I'm most comfortable with PHP, and I have a huge library of snippets that I can dig out to quell a problem that I may have solved years ago, I've been meaning to fill this void for a while.

Tonight, I had to come home from a dinner party, early, because my daughter was sick. Too bad, it looked like it was going to be an amazing feast, but I digress. The home-on-a-Saturday-night time left me with a bit of free time to solve one of the problems that's been floating around in my head for who-knows-how-long.

Thus, I'm happy to present my—at least mostly—working PHP pie script.

#!/usr/bin/php
<?php

// Change the shebang line above to point at your actual PHP interpreter

$interpreter = array_shift($_SERVER['argv']);
$script = array_shift($_SERVER['argv']);
$files = array_filter($_SERVER['argv']);

if (!$script) {
        fwrite(STDERR, "Usage: $interpreter <script> [files]\n");
        fwrite(STDERR, "  Iterates script over every line of every file.\n");
        fwrite(STDERR, "  \$_ contains data from the current line.\n");
        fwrite(STDERR, "  If files are not provided, STDIN/STDOUT will be used.\n");
        fwrite(STDERR, "\n");
        fwrite(STDERR, "  Example: ./pie.php '\$_ = preg_replace(\"/foo/\",\"oof\",\$_);' testfile\n");
        fwrite(STDERR, "    Replaces every instance of 'foo' with 'oof' in testfile\n");
        fwrite(STDERR, "\n");
        exit(1);
}

// set up function: wrap the user's code in a closure that returns $_
// (create_function() was removed in PHP 8, so eval a closure instead)
$func = eval('return function ($_) {' . $script . ';return $_;};');

if (!$files) {
        // no files, use STDIN
        $buf = '';
        while (($line = fgets(STDIN)) !== false) {
                $buf .= $func($line);
        }
        echo $buf;
} else {
        foreach ($files as $f) {
                // skip anything that isn't a regular, writable file
                if (!is_file($f) or !is_writable($f)) {
                        fwrite(STDERR, "Can't write to $f (or it's not a file)\n");
                        continue;
                }

                // run the script over every line, then write the file back in place
                $buf = '';
                foreach (file($f) as $l) {
                        $buf .= $func($l);
                }
                file_put_contents($f, $buf);
        }
}

?>
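The heart of the script is small enough to try on its own. Here's a minimal sketch of the same per-line idea, using a regular closure instead of a code string from the command line. The helper name pie_lines() is made up for this example; it isn't part of pie.php.

```php
<?php
// Hypothetical helper (not part of pie.php): run $fn over every line
// and glue the results back together, just like the script's main loop.
function pie_lines(callable $fn, array $lines): string
{
    $out = '';
    foreach ($lines as $line) {
        $out .= $fn($line);
    }
    return $out;
}

// Same substitution as the usage example: foo becomes oof on every line.
$lines = ["foo bar\n", "no match\n", "food\n"];
$result = pie_lines(function ($_) {
    return preg_replace('/foo/', 'oof', $_);
}, $lines);

echo $result;
```

In pie.php the closure body comes from the first command line argument, which is why the script has to build the function at runtime.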

Hope it helps someone out there.

Update: I've had some people ask me why I'm reinventing the wheel. I did cover this above—I have plenty of existing PHP code snippets, and almost no perl. I also am very comfortable in PHP, but it's been years since I've been comfortable in perl.

Here's an example of something I hacked up, today. I can (relatively) easily turn this:

dmesg | tail -n5

... which returns this:

[17214721.004000] sdc: assuming drive cache: write through
[17214721.004000]  sdc: sdc1
[17214721.024000] sd 7:0:0:0: Attached scsi disk sdc
[17214721.024000] sd 7:0:0:0: Attached scsi generic sg1 type 0
[17214722.464000] FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!

(the first field is the time since boot... useless for my feeble human brain)

into:

dmesg | ./pie.php 'static $prev = false; static $boot = false; if (!$boot) {
list($boot) = explode(" ", file_get_contents("/proc/uptime"));
$boot = time() - (int) $boot;} if (!$_) return; list($ts, $log) = explode(" ", $_, 2);
$ts = str_replace(array("[","]"), array("",""), $ts); $_ = date("H:i:s", $boot + $ts);
if ($prev && ($diff = round($boot + $ts - $prev, 2))) $_ .= " (+". $diff .")";
$_ .= " ".$log; $prev = $boot + $ts;' | tail -n 5

(line breaks added for easier reading)
... which returns:

17:07:44 sdc: assuming drive cache: write through
17:07:44  sdc: sdc1
17:07:44 (+0.02) sd 7:0:0:0: Attached scsi disk sdc
17:07:44 sd 7:0:0:0: Attached scsi generic sg1 type 0
17:07:45 (+1.44) FAT: utf8 is not a recommended IO charset for FAT filesystems, filesystem will be case sensitive!

That's the sort of thing I wouldn't be comfortable doing in perl, but I hacked up on the command line in PHP.
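If the one-liner is hard to follow, here's a rough sketch of just the timestamp conversion as a standalone function. The name dmesg_clock() is invented for this example, and it assumes you already know the boot time as a Unix timestamp (the one-liner derives it from /proc/uptime). It uses gmdate() so the result doesn't depend on the local timezone, where the one-liner uses date().

```php
<?php
// Hypothetical helper: $bootTime is the Unix timestamp at which the
// machine booted; $line is one line of dmesg output.
function dmesg_clock(string $line, int $bootTime): string
{
    // Match the leading "[seconds.micros]" field and keep the rest of the line.
    if (!preg_match('/^\[\s*(\d+(?:\.\d+)?)\]\s?(.*)$/s', $line, $m)) {
        return $line; // no timestamp field, leave the line untouched
    }
    // Seconds since boot plus boot time gives wall-clock time (whole seconds).
    return gmdate('H:i:s', $bootTime + (int) $m[1]) . ' ' . $m[2];
}

$converted = dmesg_clock("[3661.500000] sdc: sdc1\n", 0);
echo $converted;
```

The "(+0.02)" deltas in the real thing need state between lines, which is what the static variables in the one-liner are for.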

S

($var == TRUE) or (TRUE == $var)?

Interesting little trick I picked up a while back, been meaning to blog about it. If you're already in the loop, run along.

Prior to enlightenment, I used to write conditionals something like this:
[php]
if ($var == SOME_CONSTANT_CONDITION) {
// do something
}
[/php]

... more specifically:
[php]
if ($var == TRUE) {
// do the true thing
}
[/php]

That's how I'd "say" it, so that's how I wrote it.
But is it the best way? I now don't think so.
When reviewing other people's code (often from C programmers), I've seen "backwards" conditionals... something like:
[php]
if (TRUE == $var) {
// ...
}
[/php]

Which just sounds weird. Why would you compare a constant to a variable? (You'd normally compare a variable to a constant.)

So, what's the big deal?

Well, a few months back, I stumbled on [url=http://www.theregister.co.uk/2003/11/07/linux_kernel_backdoor_blocked/]an old article about a backdoor almost sneaking into Linux[/url].

Here's the almost-break:
[code]
if ((options == (__WCLONE|__WALL)) && (current->uid = 0))
retval = -EINVAL;
[/code]

Ignore the constants, I don't know what they mean either. The interesting part is "current->uid = 0"

See, unless you had your eyes peeled, here, it might look like you're trying to ensure that current->uid is equal to 0 (uid 0 = root on Linux). So, if options blah blah, AND the user is root, then do something.

But wait. There's only a single equals sign. The comparison is "==". "=" is for assignment!

Fortunately, someone with good eyes noticed, and Linux is safe (if this had made it into a release, it would've been trivial to escalate your privileges to the root level).. but how many times have you had this happen to you? I'm guilty of accidentally using "=" when I mean "==". And it's hard to track down this bug.. it doesn't LOOK wrong, and the syntax is right, so...

This is nothing new. Everyone knows the = vs == problem. Everyone is over it (most of the time). But how can we reduce this problem?

A simple coding style adjustment can help enormously here.

Consider changing "$var == TRUE" to "TRUE == $var".

Why?
Simple:
[code]
sean@iconoclast:~$ php -r '$a = 0; if (FALSE = $a) $b = TRUE;'
Parse error: parse error in Command line code on line 1
[/code]

Of course, you can't ASSIGN $a to the constant FALSE. The same style applied above would've caused a similar error in the C Linux kernel code:
[code]
if ((options == (__WCLONE|__WALL)) && (0 = current->uid ))
[/code]

Obviously, "0" is a constant value--you cannot assign a value to it. The missing "=" would've popped up right away.
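To see how quiet the variable-first typo is, here's a small sketch you can run. The variable names are made up for the example; the point is only that the first branch runs without any complaint from PHP.

```php
<?php
$isRoot = false;

// Typo: "=" instead of "==". This ASSIGNS true to $isRoot, and the
// value of the assignment (true) is what the if statement tests.
if ($isRoot = true) {
    $granted = 'yes';
} else {
    $granted = 'no';
}
echo "$granted\n"; // $isRoot was silently overwritten along the way

// Constant-first form of the comparison that was actually intended.
$isRoot = false;
$grantedSafely = (true == $isRoot) ? 'yes' : 'no';
echo "$grantedSafely\n";
// Drop one "=" from that line, as in (true = $isRoot), and PHP refuses
// to parse the file at all instead of running the wrong branch.
```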

Cool.
Seems a little awkward at first, but in practice, it makes sense.

HTH.

S

mail() replacement -- a better hack

This morning, I read Davey's [url=http://pixelated-dreams.com/archives/154-mail-without-sendmail-on-linux-a-hack.html]post[/url] about how to compile PHP in a way that allows you to specify your own mail() function. This is kind of a cool hack, but I've been using a different approach for a while, now, that allows much better control. Read on if you're interested.
Davey's hack, if you didn't read his post, yet, centers around defining your OWN mail function, after you have instructed PHP not to build the default one.

[i]My[/i] hack doesn't require editing of the PHP source, or even a recompile. It doesn't require an auto-prepend, either, but it does require a small change to php.ini.

So, where's the magic? It lies in the [url=http://php.net/ref.mail#ini.sendmail-path]sendmail_path directive[/url].

When it comes to mail() (as well as many other things), PHP prefers to delegate the heavy lifting to another piece of software: sendmail (or a sendmail compatible command-line mail transport agent). By default, PHP will call your sendmail binary, and pass it the entire message, after composing it from the headers and body supplied by the developer.

One of the side-benefits to this system is the ability to override PHP's default, and seamlessly hook in your own sendmailesque binary or script.

Here's an example from one of my development environments:
[code]
sendmail_path=/usr/local/bin/logmail
sean@sarcosm:~$ cat /usr/local/bin/logmail
cat >> /tmp/logmail.log
[/code]

This little bit of config & code is [b]extremely[/b] useful in a non-production environment. How many of us have accidentally sent emails to actual customers from the development server? This little bit of trickery avoids this, and instead of sending the email (as PHP normally would), mail is instead logged to the /tmp/logmail.log file. Disaster avoided.

But, that file gets pretty big over time... it becomes unmanageable very quickly. So, in a different environment, I have an alternative:
[code]
sendmail_path=/usr/local/bin/trapmail
sean@sarcosm:~$ cat /usr/local/bin/trapmail
formail -R cc X-original-cc \
-R to X-original-to \
-R bcc X-original-bcc \
-f -A"To: devteam@example.com" \
| /usr/sbin/sendmail -t -i
[/code]

And what does this do? It traps all mail that would normally go OUT (say, to a customer), and instead, delivers it to devteam@example.com (with the original fields renamed for debugging purposes).
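If you don't have formail handy, the renaming step is easy enough to picture in PHP. This is only a sketch of the idea, not a drop-in replacement for the formail pipeline: the function name trap_message() is invented here, and it ignores details such as existing X-original-* headers.

```php
<?php
// Hypothetical helper: rename the recipient headers and point the
// message at a single trap address instead.
function trap_message(string $message, string $trapTo): string
{
    // Headers end at the first blank line; everything after it is the body.
    $parts = preg_split('/\r?\n\r?\n/', $message, 2);
    $headers = $parts[0];
    $body = $parts[1] ?? '';

    // To/Cc/Bcc become X-original-To/Cc/Bcc so they stay visible but inert.
    $headers = preg_replace('/^(to|cc|bcc):/mi', 'X-original-$1:', $headers);

    // Add the one recipient that should really receive the mail.
    return $headers . "\nTo: " . $trapTo . "\n\n" . $body;
}

$trapped = trap_message(
    "To: customer@example.org\nSubject: Hi\n\nBody text\n",
    'devteam@example.com'
);
echo $trapped;
```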

So, how does all of this solve Davey's problem?
This is something I whipped up after work, today, so it's pretty new code that likely has a few bugs lurking in it, but it's a good start:
sendmail_path=/usr/local/bin/mail_proxy.php
[php]
#!/usr/local/bin/php5
<?php

//---CONFIG
$config = array(
'host' => 'localhost',
'port' => 25,
'auth' => FALSE,
);
$logDir = '/www/logs/mail';
$logFile = 'mail_proxy.log';
$failPrefix = 'fail_';
$EOL = "\n"; // change to \r\n if you send broken mail
$defaultFrom = '"example.net Webserver" ';
//---END CONFIG

if (!$log = fopen("{$logDir}/{$logFile}", 'a')) {
die("ERROR: cannot open log file!\n");
}

require('Mail.php'); // PEAR::Mail
if (PEAR::isError($Mailer = Mail::factory('SMTP', $config))) {
fwrite($log, ts() . "Failed to create PEAR::Mail object\n");
fclose($log);
die();
}

// get headers/body
$stdin = fopen('php://stdin', 'r');
$in = '';
while (!feof($stdin)) {
$in .= fread($stdin, 1024); // read 1kB at a time
}

list ($headers, $body) = explode("$EOL$EOL", $in, 2);

$recipients = array();
$headers = explode($EOL, $headers);
$mailHdrs = array();
$lastHdr = false;
$recipFields = array('to','cc','bcc');
foreach ($headers AS $h) {
if (!preg_match('/^[a-z]/i', $h)) {
if ($lastHdr) {
$lastHdr .= "\n$h";
}
// skip this line, doesn't start with a letter
continue;
}
list($field, $val) = explode(': ', $h, 2);
if (isset($mailHdrs[$field])) {
$mailHdrs[$field] = (array) $mailHdrs[$field];
$mailHdrs[$field][] = $val;
} else {
$mailHdrs[$field] = $val;
}
if (in_array(strtolower($field), $recipFields)) {
if (preg_match_all('/[^ ;,]+@[^ ;,]+/', $val, $m)) {
$recipients = array_merge($recipients, $m[0]);
}
}
}
if (!isset($mailHdrs['From'])) {
$mailHdrs['From'] = $defaultFrom;
}

$recipients = array_unique($recipients); // remove dupes

// send
if (PEAR::isError($send = $Mailer->send($recipients, $mailHdrs, $body))) {
$fn = uniqid($failPrefix);
file_put_contents("{$logDir}/{$fn}", $in);
fwrite($log, ts() ."Error sending mail: $fn (". $send->getMessage() .")\n");
$ret = 1; // fail
} else {
fwrite($log, ts() ."Mail sent to ". count($recipients) ." recipients.\n");
$ret = 0; // success
}
fclose($log);
exit($ret); // hand the result back to PHP as the exit status

//////////////////////////////

function ts()
{
return '['. date('y.m.d H:i:s') .'] ';
}

?>[/php]

Voila. SMTP mail from a unix box that may or may not have a MTA (like sendmail) installed.

Don't forget to change the CONFIG block. Oh yeah, and if you see a stray " ?php" ahead of the shebang, ignore it; it's my blog software trying to be "smart". The first line should begin with "#!".
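One of the bugs likely lurking in there: the header loop skips any line that doesn't start with a letter, so folded (multi-line) headers lose their continuation lines. Here's a rough sketch of how unfolding could work. The function unfold_headers() is made up for this example, and it keeps only the last value when a field repeats, which the proxy script handles differently.

```php
<?php
// Hypothetical helper: parse a raw header block into field => value,
// joining continuation lines (lines that start with a space or tab).
function unfold_headers(string $rawHeaders): array
{
    $headers = [];
    $last = null; // name of the most recently seen field

    foreach (preg_split('/\r?\n/', $rawHeaders) as $line) {
        if ($line === '') {
            continue;
        }
        // A continuation line belongs to the previous field.
        if (($line[0] === ' ' || $line[0] === "\t") && $last !== null) {
            $headers[$last] .= ' ' . trim($line);
            continue;
        }
        if (strpos($line, ':') === false) {
            continue; // not a header line at all
        }
        [$field, $value] = explode(':', $line, 2);
        $last = $field;
        $headers[$field] = ltrim($value);
    }

    return $headers;
}

$parsed = unfold_headers("Subject: a long\n subject line\nTo: someone@example.com");
print_r($parsed);
```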

S

Fun with the tokenizer...

I was reminded, this past week, of how cool the [url=http://php.net/tokenizer]tokenizer[/url] is.

One of the guys who works in the same office as I do had what seemed to be a simple problem: he had a php file that contained ~50 functions, and wanted to summarize the API without parsing through the file, manually, and cutting out the function declarations.

We introduced him to [url=http://www.phpdoc.org/]in-line phpdoc blocks[/url] (he works (as a Jr.-level PHP developer) in the same office, but for a different company, so he doesn't have to follow our coding standards, but I digress..), but the 50-function library in question didn't have docblocks.

Sure, he could (and did) pull up a list of function NAMES with [url=http://php.net/get_defined_functions]get_defined_functions[/url] (I assume by using [url=http://php.net/array_diff]array_diff[/url] against a before-and-after capture), but this didn't give him the argument names, or even the number of arguments for a given function, so I broke out some old tokenizer code I'd written.
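For the curious, the before-and-after capture probably looked something like this. It's my guess at the approach, not his actual code; the eval() call stands in for including the library file, and the function name is made up.

```php
<?php
// Snapshot the user-defined functions before loading the library.
$before = get_defined_functions()['user'];

// Stand-in for include 'library.php'; defines one new function.
eval('function demo_added_later($x, $y) { return $x + $y; }');

// Snapshot again; the difference is what the "library" added.
$after = get_defined_functions()['user'];
$new = array_values(array_diff($after, $before));

print_r($new); // names only: no parameters, no argument counts
```

That gets you names (lowercased, at that), which is exactly where it stops being useful for summarizing an API.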

In case you aren't familiar with the tokenizer, the PHP manual defines it as:
[quote][an interface to let you write] your own PHP source analyzing or modification tools without having to deal with the language specification at the lexical level.[/quote]

The extension (which has been part of the PHP core distribution since 4.3.0) consists only of two functions: [url=http://php.net/token_get_all]token_get_all[/url] and [url=http://php.net/token_name]token_name[/url], and a [url=http://php.net/tokens]boatload of constants[/url].
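A quick taste of those two functions, as a minimal sketch: tokenize a one-line script and collect the names of the tokens. The source string is made up for the example. Single-character tokens such as "(" or "{" come back as plain strings rather than arrays, so they're skipped here.

```php
<?php
// A tiny script to tokenize (made up for this example).
$source = '<?php function add($a, $b) { return $a + $b; }';

$names = [];
foreach (token_get_all($source) as $token) {
    // Array tokens are [id, text, line]; plain strings are punctuation.
    if (is_array($token)) {
        $names[] = token_name($token[0]);
    }
}

echo implode(' ', $names), "\n";
```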

Enough babble, though, let's get to the meat. I pulled out this code I'd written for PEARClops (on EFNet #PEAR) that parses PHP source files and figures out what classes, functions/methods and associated parameters are included.

[php]
<?php

function get_protos($in)
{
if (is_file(realpath($in)))
{
$in = file_get_contents($in);
}
$tokens = token_get_all($in);
$funcs = array();
$currClass = '';
$classDepth = 0;

for ($i=0; $i<count($tokens); $i++)
{
if (is_array($tokens[$i]) && $tokens[$i][0] == T_CLASS)
{
++$i; // whitespace;
$currClass = $tokens[++$i][1];
while ($tokens[++$i] != '{') {}
++$i;
$classDepth = 1;
continue;
}
elseif (is_array($tokens[$i]) && $tokens[$i][0] == T_FUNCTION)
{
$nextByRef = FALSE;
$thisFunc = array();

while ($tokens[++$i] != ')')
{
if (is_array($tokens[$i]) && $tokens[$i][0] != T_WHITESPACE)
{
if (!$thisFunc)
{
$thisFunc = array(
'name' => $tokens[$i][1],
'class' => $currClass,
);
}
else
{
$thisFunc['params'][] = array(
'byRef' => $nextByRef,
'name' => $tokens[$i][1],
);
$nextByRef = FALSE;
}
}
elseif ($tokens[$i] == '&')
{
$nextByRef = TRUE;
}
elseif ($tokens[$i] == '=')
{
while (!in_array($tokens[++$i], array(')',',')))
{
if ($tokens[$i][0] != T_WHITESPACE)
{
break;
}
}
$thisFunc['params'][count($thisFunc['params']) - 1]['default'] = $tokens[$i][1];
}
}
$funcs[] = $thisFunc;
}
elseif ($tokens[$i] == '{')
{
++$classDepth;
}
elseif ($tokens[$i] == '}')
{
--$classDepth;
}

if ($classDepth == 0)
{
$currClass = '';
}
}

return $funcs;
}

function parse_protos($funcs)
{
$protos = array();
foreach ($funcs AS $funcData)
{
$proto = '';
if ($funcData['class'])
{
$proto .= $funcData['class'];
$proto .= '::';
}
$proto .= $funcData['name'];
$proto .= '(';
if (!empty($funcData['params']))
{
$isFirst = TRUE;
foreach ($funcData['params'] AS $param)
{
if ($isFirst)
{
$isFirst = FALSE;
}
else
{
$proto .= ', ';
}

if ($param['byRef'])
{
$proto .= '&';
}
$proto .= $param['name'];
}
}
$proto .= ")";
$protos[] = $proto;
}

return $protos;
}

echo "Functions in {$_SERVER['argv'][1]}:\n";
foreach (parse_protos(get_protos($_SERVER['argv'][1])) AS $proto)
{
echo " $proto\n";
}

?>[/php]

Save it as "parse_funcs.php" (or whatever you like) and call it like so:
php parse_funcs.php /path/to/php_file

For instance:
[code]
sean@iconoclast:~/php/scripts$ php token_funcs_cli.php ~/php/cvs/Mail_Mime/mime.php
Functions in /home/sean/php/cvs/Mail_Mime/mime.php:
Mail_mime::Mail_mime($crlf)
Mail_mime::__wakeup()
Mail_mime::setTXTBody($data, $isfile, $append)
Mail_mime::setHTMLBody($data, $isfile)
Mail_mime::addHTMLImage($file, $c_type, $name, $isfilename)
Mail_mime::addAttachment($file, $c_type, $name, $isfilename, $encoding)
Mail_mime::_file2str(&$file_name)
Mail_mime::_addTextPart(&$obj, $text)
Mail_mime::_addHtmlPart(&$obj)
Mail_mime::_addMixedPart()
Mail_mime::_addAlternativePart(&$obj)
Mail_mime::_addRelatedPart(&$obj)
Mail_mime::_addHtmlImagePart(&$obj, $value)
Mail_mime::_addAttachmentPart(&$obj, $value)
Mail_mime::get(&$build_params)
Mail_mime::headers(&$xtra_headers)
Mail_mime::txtHeaders($xtra_headers)
Mail_mime::setSubject($subject)
Mail_mime::setFrom($email)
Mail_mime::addCc($email)
Mail_mime::addBcc($email)
Mail_mime::_encodeHeaders($input)
Mail_mime::_setEOL($eol)
[/code]

Not bad, huh?

There are some not-so-obvious bugs (inheritance, mostly), but for a relatively short script, it does a pretty good job.

S

Schizophrenic Methods

Occasionally, it is useful for a developer to determine if a method is being called statically (not in an object context -- Class::method() ), or "not statically" (in an object context -- $object->method()).

This is normally (but incorrectly) done by checking $this:
[php]
class Foo
{
function bar()
{
echo "bar() called: " . (isset($this) ? 'non-statically' : 'statically');
}
}
[/php]

Why the "but incorrectly", you might ask?

A few weeks ago, I started maintaining PEAR::Mail_Mime -- it had a lot of reported bugs, and nobody was really taking care of the package. (I'm going to release 1.3.0RC1 within the next couple weeks)

Anyway, without getting too far off-topic, one of the bugs was "Fatal error: Using $this when not in object context."

Basically, the code was checking for $this->mailMimeDecode, but when called statically, $this was unset.

My fix was to check if $this was set, but once committed, Jan Schneider sent me mail telling me that my patch would not work if the method was called statically from within another object.

This hadn't even occurred to me, so I did some testing (and eventually updated the manual).

Here's the scenario:
[php]
class A
{
function foo()
{
if (isset($this)) {
echo '$this is defined (';
echo get_class($this);
echo ")\n";
} else {
echo "\$this is not defined.\n";
}
}
}

class B
{
function bar()
{
A::foo();
}
}

$a = new A();
$a->foo();
A::foo();
$b = new B();
$b->bar();
B::bar();
[/php]

The output:
[code]
$this is defined (a)
$this is not defined.
$this is defined (b)
$this is not defined.
[/code]

As you can see (if you have the human parser module installed (-: ), $this is defined, and is the calling object, even when a method is called statically (but from the context of another object).

So (and here's my point), how does a developer determine if a given method is called statically? Here's what I came up with (and is in Mail_Mime - CVS):

[php]
$isStatic = !(isset($this) && get_class($this) == __CLASS__);
[/php]

It seems a little hackish to be using __CLASS__, but nothing else came to mind, and it works in every test I came up with.

S

Side note: When I stuck this stuff in the manual, its place in the oop4 docs is pretty good, but in the oop5 docs, I don't like that it's in The Basics but I don't know where else to put it. So, if anyone has a good suggestion, let me know.
