Recent Happenings

I've got a bunch of stuff that I haven't found/made time to blog about, so just dropping some quick notes here:

  • I've been invited to speak at PHP Quebec 2009. I've been to this conference a few times (though not for a couple of years now), and I'm really looking forward to getting back into the conference circuit (as a speaker, not an organizer... think of all the free time I'll have! (-; ). Anyway, I'll be giving a talk entitled "Stupid Browser Tricks" in which I'll talk (at a high level) about Firebug and Selenium IDE, and possibly a few other things like granular browser security, Komodo macros/extensions (like a browser!) and maybe Greasemonkey.
  • This year, I was once again invited back to the Microsoft Web Developers Summit (couldn't think of a better URL). This is a yearly event where Microsoft invites selected members of the PHP community to Redmond to discuss PHP and Microsoft's offerings. This year was definitely the best one yet: it was better organized, and it felt much less like they were trying to sell us things. Their candor was especially appreciated, as I think many of the attendees felt like Microsoft was asking us for our opinions instead of trying to give them to us. I wrote about this last year, and I think what I wrote still rings true today. Thanks to the organizers... we got some great information, made our opinions clear, and had a LOT of fun (great people!).
  • I tweeted about this, but never posted it on my blog. My colleague Luke Welling is a funny guy.
  • Over the holiday weekend (I got days off, but in Canadia, we celebrate Thanksgiving in October), I found some time to work on a bunch of pet projects, including fale.ca, which is nothing special, but kind of fun. See?
  • Today, I was extended an invitation to join the Habari Cabal, which I quickly accepted. So, if you use Habari and your blog breaks in the future, it's probably my fault.
  • ... and last, but not least, Chris and I—with the help of many other people—managed to almost get the 2008 PHP Advent calendar launched in time. Word on the street is that Jon Tan is going to show the design some love, and we have a feed. The 2007 edition was a success, but was a lot of work, so I offered to pitch in this year. Thanks to everyone who's already submitted... and the rest of you slackers: get to it! (-;
S

PHP-Aware Diff

UPDATE (and intentionally reinserted into the feed):

I've made a bunch of changes to this code, and updated it.

It's quite a bit slower, but I really don't care (-:

It uses my new pet project, the tokalizer.

You'll probably want to grab the newly-compiled diff-php as this is the one I'll be "maintaining" (ie, when someone complains, or when it breaks for me).

(end update)

I've told a few people that I'd blog about this "soon" and that was a while ago, so I figured I'd better get on the ball.

I tweeted this almost two weeks ago:

Derick responded saying that diff -p does this for C. I tried it with PHP, and it gave me the outermost block where the change occurred (ie, the class, not the function). The interesting thing, though, is that it changed the @@ line:

@@ -32,7 +32,7 @@ class Foo2 {

Almost what I was looking for, but not quite. I really wanted a php-aware diff that could tell me context.
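For reference, the @@ line in a unified diff carries the starting line and length of the hunk on each side, plus an optional trailing label, which is the part diff -p fills in. Here's a minimal sketch of pulling those pieces apart. parseHunkHeader() is a hypothetical helper written for illustration; it is not part of diff-php.

```php
<?php
// Hypothetical helper (not part of diff-php): split a unified-diff hunk
// header into its numeric ranges and the optional trailing section label.
function parseHunkHeader($header) {
    $pattern = '/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$/';
    if (!preg_match($pattern, $header, $m)) {
        return null; // not a hunk header
    }
    return array(
        'leftStart'  => (int) $m[1],
        // a missing length means the hunk covers exactly one line
        'leftCount'  => $m[2] === '' ? 1 : (int) $m[2],
        'rightStart' => (int) $m[3],
        'rightCount' => $m[4] === '' ? 1 : (int) $m[4],
        'section'    => $m[5], // whatever follows the second @@
    );
}

$parts = parseHunkHeader('@@ -32,7 +32,7 @@ class Foo2 {');
echo $parts['leftStart'], ' | ', $parts['section'], "\n";
```

The starting line on the left side is the interesting number: it is what lets a tool look up which class or function the hunk begins in.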

So, what's a developer with almost no spare time on his hands (but an idea of how to actually accomplish this pet project) to do? Write it himself, of course! (-:

So, I did. Here's an example of the output:

--- tmp/left.php
+++ tmp/right.php
@@ -1,7 +1,7 @@ (root)
 <?php
 class Foo {
     function bar() {
-        // baz!
+        // bax!
     }
 }
 
@@ -32,7 +32,7 @@ (root):Foo2(class)
 // k
 // l
     function bar2() {
-        // baz2!
+        // bax2!
     }
 }
 
@@ -63,7 +63,7 @@ (root):Foo3(class):bar3(function)
 // k
 // l
         $test = "foo {$test}";
-        // baz2!
+        // bax2!
     }
 
     function bar4() {
@@ -93,7 +93,7 @@ (root):Foo3(class):bar4(function):bar5(function)
 // k
 // l
             $test = "foo {$test}";
-            //baz5
+            //bax5
 // a
 // b
 // c
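The labels after each @@ come from a map of source line numbers to the class/function that encloses each line, built by walking PHP's tokenizer output and counting braces. Here's a stripped-down sketch of that idea, assuming the tokenizer extension is available. buildLineMap() is a hypothetical name, and this is a simplification for illustration rather than the script's actual code: it ignores cases such as Foo::class and does no syntax checking.

```php
<?php
// Simplified sketch (hypothetical function): map each source line to the
// chain of classes/functions that encloses the start of that line.
function buildLineMap($source) {
    $context = array('(root)'); // stack of enclosing scopes
    $depth   = array(0);        // open-brace count for each scope
    $expect  = null;            // 'class'/'function' while waiting for a name
    $pending = null;            // named scope waiting for its opening brace
    $line    = 1;
    $map     = array(1 => '(root)');

    foreach (token_get_all($source) as $tok) {
        $id   = is_array($tok) ? $tok[0] : null;
        $text = is_array($tok) ? $tok[1] : $tok;

        if ($id === T_CLASS || $id === T_FUNCTION) {
            $expect = ($id === T_CLASS) ? 'class' : 'function';
        } elseif ($id === T_STRING && $expect !== null) {
            $pending = "{$text}({$expect})";
            $expect  = null;
        } elseif ($id === null && $text === '(') {
            $expect = null;            // anonymous function: nothing to name
        } elseif ($id === null && $text === ';') {
            $expect = $pending = null; // declaration without a body
        } elseif (($id === null && $text === '{')
            || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $expect = null;
            if ($pending !== null) {   // this brace opens the named scope
                $context[] = $pending;
                $depth[]   = 1;
                $pending   = null;
            } else {                   // ordinary block or string interpolation
                $depth[count($depth) - 1]++;
            }
        } elseif ($id === null && $text === '}') {
            $top = count($depth) - 1;
            $depth[$top]--;
            if ($depth[$top] === 0 && $top > 0) { // scope closed
                array_pop($context);
                array_pop($depth);
            }
        }

        // every newline inside the token starts a new line in the current scope
        for ($i = substr_count($text, "\n"); $i > 0; $i--) {
            $map[++$line] = implode(':', $context);
        }
    }
    return $map;
}

$map = buildLineMap("<?php\nclass Foo {\n    function bar() {\n        // baz\n    }\n}\n");
echo $map[4], "\n";
```

Looking up a hunk's starting line in the returned map gives the label to append to its @@ line.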

Here's the code for my php-aware diff. I use it as my default svn diff command now (see comments). I hope you find it useful; I sure do.

#!/usr/bin/php

<?php

/// PHP-Aware diff



/// Copyright 2008, Sean Coates

///   Usage of the works is permitted provided that this instrument is retained

///   with the works, so that any entity that uses the works is notified of this

///   instrument.

///   DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.

/// (Fair License - http://www.opensource.org/licenses/fair.php )

/// Short license: do whatever you like with this.





//// save this file as diff-php

////    and make sure /path/to/diff-php is chmod +x



//// TO USE from cli:

////    /path/to/diff-php leftfile rightfile   # (compares files, as diff does)



////

//// TO USE from svn:

////    in ~/.subversion/config, add: diff-cmd = /path/to/diff-php



//// You might need to adjust DIFF_PATH, below



// the tokenizer scares me a bit (-:



class DiffPHP {

   

    const DEBUG_SYNTAX = false; // set to true to get syntax error data (== broken diffs)

   

    const DIFF_PATH = '/usr/bin/diff';

    const DIFF_OPTS = '-u';

   

    /**

     * The "left" file, as passed by svn (or cli)

     */


    protected $left;



    /**

     * The "right" file, as passed by svn (or cli)

     */


    protected $right;



    /**

     * A "nice" version of the left file.

     *

     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php

     */


    protected $niceLeft;



    /**

     * A "nice" version of the right file.

     *

     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php

     */


    protected $niceRight;



    /**

     * Captured file contents (prevents reading the file twice + diff)

     */


    protected $fileContents;

   

    /**

     * The output from the diff executable

     */


    protected $diff;

   

    /**

     * Each chunk of the diff goes in here (begins with a @@ identifier line)

     */


    protected $chunks;

   

    /**

     * Array of tokens from the Left file

     */


    protected $tokens;

   

    /**

     * Mapping of source lines to source class/functions

     */


    protected $lineMap;

   

    /**

     * Current context (used to construct line map)

     */


    protected $context;

   

    /**

     * Brace depth (used to determine if we're still in the current context)

     */


    protected $braceDepth;

   

    /**

     * Bool flag to indicate that syntax is somehow broken

     */


    protected $isBroken;

   

    /**

     * Object-wide index to keep track of the current token number

     */


    protected $tokenIndex;

   

    /**

     * Currently parsing token value

     */


    protected $currentValue;

   

    /**

     * Constructor. The magic happens here. Once instantiated, the entire

     * process runs

     */


    public function __construct() {

        $this->parseArgs();

       

        $this->fileContents = file_get_contents($this->left);



        $this->doDiff();

       

        // subject (probably) IS a PHP file:

        if (!isset($_ENV['NODIFFPHP']) && stripos($this->fileContents, '<?') !== false) {

            $this->splitDiff();

            $this->determineHierarchy();

            $this->reconstructDiff();

        } else {

            // not a PHP file; return regular diff:

            echo $this->diff;

        }

    }

   

    /**

     * Parses the passed arguments.

     *

     * Determines if it's svn (7 args) or cli (2 args), and stores the parsed

     * arguments.

     */


    protected function parseArgs() {

        // if this is being called from svn, we'll get 7 arguments

        //   (8th is argv 0 == this script)

        if (8 == $_SERVER['argc']) {

            $this->niceLeft = $_SERVER['argv'][3];

            $this->niceRight = $_SERVER['argv'][5];

            $this->left = $_SERVER['argv'][6];

            $this->right = $_SERVER['argv'][7];

        } else if (3 == $_SERVER['argc']) {

            // 2 arguments means a regular diff

            $this->niceLeft = $_SERVER['argv'][1];

            $this->niceRight = $_SERVER['argv'][2];

            $this->left = $this->niceLeft;

            $this->right = $this->niceRight;

        } else {

            die("See " . __FILE__ . " for details on how to use this script\n");

        }

    }

   

    /**

     * Calls the external diff program to get the base diff

     */


    protected function doDiff() {

        if (is_readable($this->left) && is_readable($this->right)) {

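            // escape both paths so spaces or shell metacharacters can't break the command

            $diffCmd = self::DIFF_PATH . ' ' . self::DIFF_OPTS . ' ' . escapeshellarg($this->left) . ' ' . escapeshellarg($this->right);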

            $this->diff = `$diffCmd`;

        } else {

            die("{$this->left} or {$this->right} is not readable\n");

        }

    }

   

    /**

     * Takes an identifier line (looks like: @@ -30,23 +30,79 @@) and returns

     * the begin line number

     */


    protected function parseLineNum($identifier) {

        list(,$from) = explode(" ", $identifier);

        list($from) = explode(',', $from);

        return (int) substr($from, 1);

    }

   

    /**

     * Sanitizes CRLF or CR into just LF

     */


    protected function sanitizeLineEndings($data) {

        // first, sanitize line endings:

        $data = str_replace("\r\n", "\n", $data);

        $data = str_replace("\r",   "\n", $data);

        return $data;

    }    

   

    /**

     * Actually splits the diff into chunks and stores chunks + line numbers

     */


    protected function splitDiff() {

        // now split:

        $this->diff = explode("\n", $this->sanitizeLineEndings($this->diff));

       

        // array to return:

        $this->chunks = array();

       

        // line counter

        $line = 0;

       

        // outer loop: file(s)

        $maxLine = count($this->diff);

   

        // skip first 2 lines as left, right files

        $line += 2;

   

        // descend into data chunks

        while ($line < $maxLine) {

            // next line is the chunk identifier

            $dataChunk = array();

            $dataChunk['identifier'] = $this->diff[$line++];

            $dataChunk['line'] = $this->parseLineNum($dataChunk['identifier']);

            $dataChunk['data'] = array();

            while ($line < $maxLine && !(substr($this->diff[$line], 0, 2) == '@@' && substr($this->diff[$line], -2) == '@@')) {

                $dataChunk['data'][] = $this->diff[$line++];

            }

            $this->chunks[] = $dataChunk;

        }

    }

   

    /**

     * Reconstructs the diff (with adjusted identifier lines, and outputs the

     * result)

     */


    protected function reconstructDiff() {

        $out = "--- {$this->niceLeft}\n+++ {$this->niceRight}\n";

        foreach ($this->chunks as $chunk) {

            $out .= $chunk['identifier'] . "\n";

            $out .= implode("\n", $chunk['data']) ."\n";

        }

        echo $out;

    }

   

    /**

     * Descends into a deeper context

     *

     * @param string $type friendly name, either class or function

     */


    protected function enterContext($type) {

        // next comes whitespace:

        if (is_array($this->tokens[++$this->tokenIndex])) {

            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];

        } else {

            $token = null;

            $this->currentValue = $this->tokens[$this->tokenIndex];

        }

        if ($token != T_WHITESPACE) {

            // syntax is broken, let's get out of here

            if (self::DEBUG_SYNTAX) {

                die("Syntax broken in whitespace assertion, " . $this->context[count($this->context) - 1] . "\n");

            }

            $this->isBroken = true;

            return;

        }

        $this->checkLineBreak();

       

        // next comes the name:

        if (is_array($this->tokens[++$this->tokenIndex])) {

            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];

        } else {

            $token = null;

            $this->currentValue = $this->tokens[$this->tokenIndex];

        }

        $this->context[] = $this->currentValue . "({$type})";

       

        // chew through the next few tokens until we get a "{"

        while ($this->currentValue != '{' && $this->tokenIndex < count($this->tokens) - 1) {

            if (is_array($this->tokens[++$this->tokenIndex])) {

                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];

            } else {

                $token = null;

                $this->currentValue = $this->tokens[$this->tokenIndex];

            }

            $this->checkLineBreak();

            switch ($token) {

                // these are all valid before the brace:

                case null:

                case T_WHITESPACE:

                case T_VARIABLE:

                case T_EXTENDS:

                case T_IMPLEMENTS:

                case T_STRING:

                case T_ARRAY:

                case T_CONSTANT_ENCAPSED_STRING:

                case T_LNUMBER:

                case '=':

                    break;

               

                // if another token is found, then there's a syntax error

                // (this was added to prevent really deep looping)

                default:

                    if (self::DEBUG_SYNTAX) {

                        die("Syntax broken in token assertion, " . $this->context[count($this->context) - 1] . "," . token_name($token) . "\n");

                    }

                    $this->isBroken = true;

                    return;

            }

        }

       

        // found the starting brace

        $this->braceDepth[count($this->context) - 1] = 1;

    }    

   

    /**

     * Tokenizes the code and creates a line map

     */


    protected function tokenizeHierarchy() {

        $this->context = array('(root)');

        $this->lineMap = array('');

        $this->tokens = token_get_all($this->sanitizeLineEndings($this->fileContents));

        $this->isBroken = false;

        for ($this->tokenIndex=0; $this->tokenIndex<count($this->tokens); $this->tokenIndex++) {

            if ($this->isBroken) {

                // syntax is somehow broken; return progress, but don't go further

                return;

            }

            if (is_array($this->tokens[$this->tokenIndex])) {

                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];

            } else {

                $token = null;

                $this->currentValue = $this->tokens[$this->tokenIndex];

                //change here

            }

           

            switch ($token) {

                // check for class

                case T_CLASS:

                    // found "class"

                    $this->enterContext('class');

                    break;

               

                case T_FUNCTION:

                    // found "function"

                    $this->enterContext('function');

                    break;

               

                default:

                    $idx = count($this->context) - 1;

                    switch ($this->currentValue) {

                        case '{':

                        // (currentValue holds token text; T_CURLY_OPEN's text is also '{')

                        case '${': // token text of T_DOLLAR_OPEN_CURLY_BRACES

                            ++$this->braceDepth[$idx];

                            break;

                       

                        case '}':

                            --$this->braceDepth[$idx];

                            if ($this->braceDepth[$idx] == 0) {

                                // we're out of this context

                                array_pop($this->context);

                            } else if ($this->braceDepth[$idx] < 0) {

                                // bad stuff!

                                if (self::DEBUG_SYNTAX) {

                                    die("Syntax broken in brace close assertion, " . $this->context[count($this->context) - 1] . "\n");

                                }

                                $this->isBroken = true;

                            }

                            break;

                       

                        default:

                            $this->checkLineBreak();

                    }

            }

        }

    }

   

    /**

     * Determines if the currently processing token contains line breaks, and

     * if so, adjusts the lineMap accordingly

     */


    protected function checkLineBreak() {

        // check for new line:

        if (strpos($this->currentValue, "\n") !== false) {

            for ($j=1; $j<=substr_count($this->currentValue, "\n"); $j++) {

                $this->lineMap[] = implode(':', $this->context);

            }

        }

    }

   

    /**

     * Matches the chunk map to the line map

     */


    protected function determineHierarchy() {

        $this->tokenizeHierarchy();

        for ($chunknum=0; $chunknum < count($this->chunks); $chunknum++) {

            $this->chunks[$chunknum]['identifier'] .= ' ' . $this->lineMap[$this->chunks[$chunknum]['line']];

        }

    }

}



new DiffPHP;



// komode: le=unix language=php codepage=utf8 tab=4 notabs indent=4

The most up-to-date version of this file can also be found in my personal svn repository: https://svn.caedmon.net/svn/public/diff-php/diff-php.

Please let me know if you run into any bugs. I'm sure there are a few, but it works pretty well for me.
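One note on the svn side, since parseArgs() depends on it: Subversion runs its diff-cmd with GNU-diff-style arguments, roughly diff-cmd -u -L <left label> -L <right label> <left file> <right file>, which is where the seven arguments come from. Here's a stand-alone sketch of that layout. parseDiffArgs() is a hypothetical function, and the labels and paths in the example call are made up for illustration.

```php
<?php
// Hypothetical stand-alone version of the argument handling in parseArgs().
// Assumed svn layout: script -u -L <left label> -L <right label> <left> <right>
function parseDiffArgs(array $argv) {
    $args = array_slice($argv, 1); // drop the script name
    if (count($args) === 7) {
        // called from svn: labels are the "nice" names, files come last
        return array(
            'niceLeft'  => $args[2],
            'niceRight' => $args[4],
            'left'      => $args[5],
            'right'     => $args[6],
        );
    }
    if (count($args) === 2) {
        // called from the cli: the file names double as labels
        return array(
            'niceLeft'  => $args[0],
            'niceRight' => $args[1],
            'left'      => $args[0],
            'right'     => $args[1],
        );
    }
    return null; // unknown calling convention
}

// Example with made-up labels and temp paths:
$parsed = parseDiffArgs(array(
    'diff-php', '-u',
    '-L', 'foo.php (revision 12)',
    '-L', 'foo.php (working copy)',
    '/tmp/left-copy', 'foo.php',
));
echo $parsed['niceLeft'], ' <- ', $parsed['left'], "\n";
```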

S
