PHP-Aware Diff

UPDATE (and intentionally reinserted into the feed):

I've made a bunch of changes to this code, and updated it.

It's quite a bit slower, but I really don't care (-:

It uses my new pet project, the tokalizer.

You'll probably want to grab the newly-compiled diff-php, as this is the one I'll be "maintaining" (i.e., when someone complains, or when it breaks for me).

(end update)

I've told a few people that I'd blog about this "soon" and that was a while ago, so I figured I'd better get on the ball.

I tweeted this almost two weeks ago:

Derick responded saying that diff -p does this for C. I tried it with PHP, and it gave me the outermost block where the change occurred (i.e., the class, not the function). The interesting thing, though, is that it changed the @@ line:

@@ -32,7 +32,7 @@ class Foo2 {

Almost what I was looking for, but not quite. I really wanted a PHP-aware diff that could tell me the context of each change.
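To make the format of that @@ line concrete, here is a minimal sketch of how a unified-diff hunk header can be picked apart. The function name parseHunkHeader and the array keys are made up for illustration and are not part of diff-php; the sketch assumes PHP 7.1 or later.

```php
<?php
// Hypothetical helper (not part of diff-php): split a unified-diff hunk
// header such as "@@ -32,7 +32,7 @@ class Foo2 {" into its parts.
function parseHunkHeader(string $line): ?array {
    // "-start[,count] +start[,count]" followed by optional trailing text
    $pattern = '/^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@ ?(.*)$/';
    if (!preg_match($pattern, $line, $m)) {
        return null; // not a hunk header
    }
    return [
        'oldStart' => (int) $m[1],
        // an omitted count means a single line
        'oldCount' => $m[2] === '' ? 1 : (int) $m[2],
        'newStart' => (int) $m[3],
        'newCount' => $m[4] === '' ? 1 : (int) $m[4],
        // whatever follows the second "@@" (diff -p puts its context here)
        'section'  => $m[5],
    ];
}

$hunk = parseHunkHeader('@@ -32,7 +32,7 @@ class Foo2 {');
echo $hunk['oldStart'], ' ', $hunk['section'], "\n"; // prints "32 class Foo2 {"
```

The trailing text after the second @@ is the only part a context-aware diff needs to rewrite; the line numbers stay exactly as diff produced them.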

So, what's a developer with almost no spare time on his hands (but an idea of how to actually accomplish this pet project) to do? Write it himself, of course! (-:

So, I did. Here's an example of the output:

--- tmp/left.php
+++ tmp/right.php
@@ -1,7 +1,7 @@ (root)
 <?php
 class Foo {
     function bar() {
-        // baz!
+        // bax!
     }
 }
 
@@ -32,7 +32,7 @@ (root):Foo2(class)
 // k
 // l
     function bar2() {
-        // baz2!
+        // bax2!
     }
 }
 
@@ -63,7 +63,7 @@ (root):Foo3(class):bar3(function)
 // k
 // l
         $test = "foo {$test}";
-        // baz2!
+        // bax2!
     }
 
     function bar4() {
@@ -93,7 +93,7 @@ (root):Foo3(class):bar4(function):bar5(function)
 // k
 // l
             $test = "foo {$test}";
-            //baz5
+            //bax5
 // a
 // b
 // c

Here's the code for my php-aware diff. I use it as my default svn diff command now (see comments). Hope you find it useful, I sure do.

#!/usr/bin/php
<?php
/// PHP-Aware diff

/// Copyright 2008, Sean Coates
///   Usage of the works is permitted provided that this instrument is retained
///   with the works, so that any entity that uses the works is notified of this
///   instrument.
///   DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
/// (Fair License - http://www.opensource.org/licenses/fair.php )
/// Short license: do whatever you like with this.


//// save this file as diff-php
////    and make sure /path/to/diff-php is chmod +x

//// TO USE from cli:
////    /path/to/diff-php leftfile rightfile   # (compares files, as diff does)

////
//// TO USE from svn:
////    in ~/.subversion/config, add: diff-cmd = /path/to/diff-php

//// You might need to adjust DIFF_PATH, below

// the tokenizer scares me a bit (-:

class DiffPHP {

    const DEBUG_SYNTAX = false; // set to true to get syntax error data (== broken diffs)

    const DIFF_PATH = '/usr/bin/diff';
    const DIFF_OPTS = '-u';

    /**
     * The "left" file, as passed by svn (or cli)
     */
    protected $left;

    /**
     * The "right" file, as passed by svn (or cli)
     */
    protected $right;

    /**
     * A "nice" version of the left file.
     *
     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
     */
    protected $niceLeft;

    /**
     * A "nice" version of the right file.
     *
     * Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
     */
    protected $niceRight;

    /**
     * Captured file contents (prevents reading the file twice + diff)
     */
    protected $fileContents;

    /**
     * The output from the diff executable
     */
    protected $diff;

    /**
     * Each chunk of the diff goes in here (begins with a @@ identifier line)
     */
    protected $chunks;

    /**
     * Array of tokens from the Left file
     */
    protected $tokens;

    /**
     * Mapping of source lines to source class/functions
     */
    protected $lineMap;

    /**
     * Current context (used to construct line map)
     */
    protected $context;

    /**
     * Brace depth (used to determine if we're still in the current context)
     */
    protected $braceDepth;

    /**
     * Bool flag to indicate that syntax is somehow broken
     */
    protected $isBroken;

    /**
     * Object-wide index to keep track of the current token number
     */
    protected $tokenIndex;

    /**
     * Currently parsing token value
     */
    protected $currentValue;

    /**
     * Constructor. The magic happens here. Once instantiated, the entire
     * process runs
     */
    public function __construct() {
        $this->parseArgs();

        $this->fileContents = file_get_contents($this->left);

        $this->doDiff();

        // subject (probably) IS a PHP file:
        if (!isset($_ENV['NODIFFPHP']) && stripos($this->fileContents, '<?') !== false) {
            $this->splitDiff();
            $this->determineHierarchy();
            $this->reconstructDiff();
        } else {
            // not a PHP file; return regular diff:
            echo $this->diff;
        }
    }

    /**
     * Parses the passed arguments.
     *
     * Determines if it's svn (7 args) or cli (2 args), and stores the parsed
     * arguments.
     */
    protected function parseArgs() {
        // if this is being called from svn, we'll get 7 arguments
        //   (argc is 8 because argv[0] == this script)
        if (8 == $_SERVER['argc']) {
            $this->niceLeft = $_SERVER['argv'][3];
            $this->niceRight = $_SERVER['argv'][5];
            $this->left = $_SERVER['argv'][6];
            $this->right = $_SERVER['argv'][7];
        } else if (3 == $_SERVER['argc']) {
            // 2 arguments means a regular diff
            $this->niceLeft = $_SERVER['argv'][1];
            $this->niceRight = $_SERVER['argv'][2];
            $this->left = $this->niceLeft;
            $this->right = $this->niceRight;
        } else {
            die("See " . __FILE__ . " for details on how to use this script\n");
        }
    }

    /**
     * Calls the external diff program to get the base diff
     */
    protected function doDiff() {
        if (is_readable($this->left) && is_readable($this->right)) {
            // escape the paths so spaces or shell metacharacters can't break the command
            $diffCmd = self::DIFF_PATH . ' ' . self::DIFF_OPTS
                . ' ' . escapeshellarg($this->left)
                . ' ' . escapeshellarg($this->right);
            $this->diff = `$diffCmd`;
        } else {
            die("{$this->left} or {$this->right} is not readable\n");
        }
    }

    /**
     * Takes an identifier line (looks like: @@ -30,23 +30,79 @@) and returns
     * the begin line number
     */
    protected function parseLineNum($identifier) {
        list(,$from) = explode(" ", $identifier);
        list($from) = explode(',', $from);
        return (int) substr($from, 1);
    }

    /**
     * Sanitizes CRLF or CR into just LF
     */
    protected function sanitizeLineEndings($data) {
        // first, sanitize line endings:
        $data = str_replace("\r\n", "\n", $data);
        $data = str_replace("\r",   "\n", $data);
        return $data;
    }

    /**
     * Actually splits the diff into chunks and stores chunks + line numbers
     */
    protected function splitDiff() {
        // now split:
        $this->diff = explode("\n", $this->sanitizeLineEndings($this->diff));

        // array to return:
        $this->chunks = array();

        // line counter
        $line = 0;

        // outer loop: file(s)
        $maxLine = count($this->diff);

        // skip first 2 lines as left, right files
        $line += 2;

        // descend into data chunks
        while ($line < $maxLine) {
            // next line is the chunk identifier
            $dataChunk = array();
            $dataChunk['identifier'] = $this->diff[$line++];
            $dataChunk['line'] = $this->parseLineNum($dataChunk['identifier']);
            $dataChunk['data'] = array();
            while ($line < $maxLine && !(substr($this->diff[$line], 0, 2) == '@@' && substr($this->diff[$line], -2) == '@@')) {
                $dataChunk['data'][] = $this->diff[$line++];
            }
            $this->chunks[] = $dataChunk;
        }
    }

    /**
     * Reconstructs the diff (with adjusted identifier lines, and outputs the
     * result)
     */
    protected function reconstructDiff() {
        $out = "--- {$this->niceLeft}\n+++ {$this->niceRight}\n";
        foreach ($this->chunks as $chunk) {
            $out .= $chunk['identifier'] . "\n";
            $out .= implode("\n", $chunk['data']) ."\n";
        }
        echo $out;
    }

    /**
     * Descends into a deeper context
     *
     * @param string $type friendly name, either class or function
     */
    protected function enterContext($type) {
        // next comes whitespace:
        if (is_array($this->tokens[++$this->tokenIndex])) {
            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
        } else {
            $token = null;
            $this->currentValue = $this->tokens[$this->tokenIndex];
        }
        if ($token != T_WHITESPACE) {
            // syntax is broken, let's get out of here
            if (self::DEBUG_SYNTAX) {
                die("Syntax broken in whitespace assertion, " . $this->context[count($this->context) - 1] . "\n");
            }
            $this->isBroken = true;
            // "break" is a fatal error outside of a loop or switch; return instead
            return;
        }
        $this->checkLineBreak();

        // next comes the name:
        if (is_array($this->tokens[++$this->tokenIndex])) {
            list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
        } else {
            $token = null;
            $this->currentValue = $this->tokens[$this->tokenIndex];
        }
        $this->context[] = $this->currentValue . "({$type})";

        // chew through the next few tokens until we get a "{"
        while ($this->currentValue != '{' && $this->tokenIndex < count($this->tokens)) {
            if (is_array($this->tokens[++$this->tokenIndex])) {
                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
            } else {
                $token = null;
                $this->currentValue = $this->tokens[$this->tokenIndex];
            }
            $this->checkLineBreak();
            switch ($token) {
                // these are all valid before the brace:
                case null:
                case T_WHITESPACE:
                case T_VARIABLE:
                case T_EXTENDS:
                case T_IMPLEMENTS:
                case T_STRING:
                case T_ARRAY:
                case T_CONSTANT_ENCAPSED_STRING:
                case T_LNUMBER:
                case '=':
                    break;

                // if another token is found, then there's a syntax error
                // (this was added to prevent really deep looping)
                default:
                    if (self::DEBUG_SYNTAX) {
                        die("Syntax broken in token assertion, " . $this->context[count($this->context) - 1] . "," . token_name($token) . "\n");
                    }
                    $this->isBroken = true;
                    return;
            }
        }

        // found the starting brace
        $this->braceDepth[count($this->context) - 1] = 1;
    }

    /**
     * Tokenizes the code and creates a line map
     */
    protected function tokenizeHierarchy() {
        $this->context = array('(root)');
        $this->lineMap = array('');
        $this->tokens = token_get_all($this->sanitizeLineEndings($this->fileContents));
        $this->isBroken = false;
        for ($this->tokenIndex=0; $this->tokenIndex<count($this->tokens); $this->tokenIndex++) {
            if ($this->isBroken) {
                // syntax is somehow broken; return progress, but don't go further
                return;
            }
            if (is_array($this->tokens[$this->tokenIndex])) {
                list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
            } else {
                $token = null;
                $this->currentValue = $this->tokens[$this->tokenIndex];
                //change here
            }

            switch ($token) {
                // check for class
                case T_CLASS:
                    // found "class"
                    $this->enterContext('class');
                    break;

                case T_FUNCTION:
                    // found "function"
                    $this->enterContext('function');
                    break;

                default:
                    $idx = count($this->context) - 1;
                    switch ($this->currentValue) {
                        // match on the token text, not the token id: the text of
                        // T_CURLY_OPEN is "{" and of T_DOLLAR_OPEN_CURLY_BRACES is "${"
                        case '{':
                        case '${':
                            ++$this->braceDepth[$idx];
                            break;

                        case '}':
                            --$this->braceDepth[$idx];
                            if ($this->braceDepth[$idx] == 0) {
                                // we're out of this context
                                array_pop($this->context);
                            } else if ($this->braceDepth[$idx] < 0) {
                                // bad stuff!
                                if (self::DEBUG_SYNTAX) {
                                    die("Syntax broken in brace close assertion, " . $this->context[count($this->context) - 1] . "\n");
                                }
                                $this->isBroken = true;
                            }
                            break;

                        default:
                            $this->checkLineBreak();
                    }
            }
        }
    }

    /**
     * Determines if the currently processing token contains line breaks, and
     * if so, adjusts the lineMap accordingly
     */
    protected function checkLineBreak() {
        // check for new line:
        if (strpos($this->currentValue, "\n") !== false) {
            for ($j=1; $j<=substr_count($this->currentValue, "\n"); $j++) {
                $this->lineMap[] = implode(':', $this->context);
            }
        }
    }

    /**
     * Matches the chunk map to the line map
     */
    protected function determineHierarchy() {
        $this->tokenizeHierarchy();
        for ($chunknum=0; $chunknum < count($this->chunks); $chunknum++) {
            $this->chunks[$chunknum]['identifier'] .= ' ' . $this->lineMap[$this->chunks[$chunknum]['line']];
        }
    }
}

new DiffPHP;

// komode: le=unix language=php codepage=utf8 tab=4 notabs indent=4
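
For anyone who just wants the gist of the line map without reading the whole class, here is a much smaller sketch of the same idea. It is a simplification written for illustration, not the script's actual logic: buildLineMap and its variable names are hypothetical, it assumes PHP 7.1 or later, and it will get confused by things like Foo::class, which also produces a T_CLASS token.

```php
<?php
// Hypothetical, simplified sketch (not the diff-php code itself): label each
// source line with the class/function it sits in, using PHP's tokenizer.
function buildLineMap(string $source): array {
    $stack   = [];          // open contexts: [label, brace depth of its body]
    $expect  = null;        // 'class' or 'function' while waiting for a name
    $pending = null;        // label waiting for its opening "{"
    $depth   = 0;
    $map     = [0 => ''];   // index 0 is unused so indexes match line numbers

    foreach (token_get_all($source) as $tok) {
        [$id, $text] = is_array($tok) ? $tok : [null, $tok];

        if ($id === T_CLASS || $id === T_FUNCTION) {
            $expect = ($id === T_CLASS) ? 'class' : 'function';
        } elseif ($expect !== null && $id === T_STRING) {
            $pending = "{$text}({$expect})";
            $expect = null;
        } elseif ($id === null && $text === '(') {
            $expect = null;     // anonymous function: no name to record
        } elseif ($id === null && $text === ';') {
            $pending = null;    // abstract/interface method: no body follows
        } elseif (($id === null && $text === '{')
            || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++;
            $expect = null;     // anonymous class: no name to record
            if ($pending !== null) {
                $stack[] = [$pending, $depth];
                $pending = null;
            }
        } elseif ($id === null && $text === '}') {
            if ($stack && end($stack)[1] === $depth) {
                array_pop($stack);  // this brace closes the innermost context
            }
            $depth--;
        }

        // every newline inside the token ends one source line
        $label = implode(':', array_merge(['(root)'], array_column($stack, 0)));
        for ($i = substr_count($text, "\n"); $i > 0; $i--) {
            $map[] = $label;
        }
    }
    return $map;
}

$src = "<?php\nclass Foo {\n    function bar() {\n        // baz!\n    }\n}\n";
$map = buildLineMap($src);
echo $map[4], "\n"; // prints "(root):Foo(class):bar(function)"
```

As in the full script, a line is labelled with whatever context is open when its newline is reached, so the line holding "class Foo {" already counts as inside Foo.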

The most up-to-date version of this file can also be found in my personal svn repository: https://svn.caedmon.net/svn/public/diff-php/diff-php.

Please let me know if you run into any bugs. I'm sure there are a few, but it works pretty well for me.

S

