UPDATE (and intentionally reinserted into the feed):
I've made a bunch of changes to this code, and updated it.
It's quite a bit slower, but I really don't care (-:
It uses my new pet project, the tokalizer.
You'll probably want to grab the newly-compiled diff-php as this is the one I'll be "maintaining" (ie, when someone complains, or when it breaks for me).
(end update)
I've told a few people that I'd blog about this "soon" and that was a while ago, so I figured I'd better get on the ball.
I tweeted this almost two weeks ago:
Derick responded saying that diff -p does this for C. I tried it with PHP, and it gave me the outermost block where the change occurred (ie, the class, not the function). The
interesting thing, though, is that it changed the @@ line:
Almost what I was looking for, not not quite. I really wanted a php-aware diff that could tell me context.
So, what's a developer with almost no spare time on his hands (but an idea of how to actually accomplish this pet project) to do? Write it himself, of course! (-:
So, I did. Here's an example of the output:
+++ tmp/right.php
@@ -1,7 +1,7 @@ (root)
<?php
class Foo {
function bar() {
- // baz!
+ // bax!
}
}
@@ -32,7 +32,7 @@ (root):Foo2(class)
// k
// l
function bar2() {
- // baz2!
+ // bax2!
}
}
@@ -63,7 +63,7 @@ (root):Foo3(class):bar3(function)
// k
// l
$test = "foo {$test}";
- // baz2!
+ // bax2!
}
function bar4() {
@@ -93,7 +93,7 @@ (root):Foo3(class):bar4(function):bar5(function)
// k
// l
$test = "foo {$test}";
- //baz5
+ //bax5
// a
// b
// c
Here's the code for my php-aware diff. I use it as my default svn diff command now (see comments). Hope you find it useful, I sure do.
<?php
/// PHP-Aware diff
/// Copyright 2008, Sean Coates
/// Usage of the works is permitted provided that this instrument is retained
/// with the works, so that any entity that uses the works is notified of this
/// instrument.
/// DISCLAIMER: THE WORKS ARE WITHOUT WARRANTY.
/// (Fair License - http://www.opensource.org/licenses/fair.php )
/// Short license: do whatever you like with this.
//// save this file as diff-php
//// and make sure /path/to/diff-php is chmod +x
//// TO USE from cli:
//// /path/to/diff-php leftfile rightfile # (compares files, as diff does)
////
//// TO USE from svn:
//// in ~/.subversion/config, add: diff-cmd = /path/to/diff-php
//// You might need to adjust DIFF_PATH, below
// the tokenizer scares me a bit (-:
class DiffPHP {
const DEBUG_SYNTAX = false; // set to true to get syntax error data (== broken diffs)
const DIFF_PATH = '/usr/bin/diff';
const DIFF_OPTS = '-u';
/**
* The "left" file, as passed by svn (or cli)
*/
protected $left;
/**
* The "right" file, as passed by svn (or cli)
*/
protected $right;
/**
* A "nice" version of the left file.
*
* Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
*/
protected $niceLeft;
/**
* A "nice" version of the right file.
*
* Instead of foo/bar/.svn/base/whatever.php, it would just be whatever.php
*/
protected $niceRight;
/**
* Captured file contents (prevents reading the file twice + diff)
*/
protected $fileContents;
/**
* The output from the diff executable
*/
protected $diff;
/**
* Each chunk of the diff goes in here (begins with a @@ identifier line)
*/
protected $chunks;
/**
* Array of tokens from the Left file
*/
protected $tokens;
/**
* Mapping of source lines to source class/functions
*/
protected $lineMap;
/**
* Current context (used to construct line map)
*/
protected $context;
/**
* Brace depth (used to determine if we're still in the current context)
*/
protected $braceDepth;
/**
* Bool flag to indicate that syntax is somehow broken
*/
protected $isBroken;
/**
* Object-wide index to keep track of the current token number
*/
protected $tokenIndex;
/**
* Currently parsing token value
*/
protected $currentValue;
/**
* Constructor. The magic happens here. Once instantiated, the entire
* process runs
*/
public function __construct() {
$this->parseArgs();
$this->fileContents = file_get_contents($this->left);
$this->doDiff();
// subject (probably) IS a PHP file:
if (!isset($_ENV['NODIFFPHP']) && stripos($this->fileContents, '<?') !== false) {
$this->splitDiff();
$this->determineHierarchy();
$this->reconstructDiff();
} else {
// not a PHP file; return regular diff:
echo $this->diff;
}
}
/**
* Parses the passed arguments.
*
* Determines if it's svn (7 args) or cli (2 args), and stores the parsed
* arguments.
*/
protected function parseArgs() {
// if this is being called from svn, we'll get 4 arguments
// (8th is argv 0 == this script)
if (8 == $_SERVER['argc']) {
$this->niceLeft = $_SERVER['argv'][3];
$this->niceRight = $_SERVER['argv'][5];
$this->left = $_SERVER['argv'][6];
$this->right = $_SERVER['argv'][7];
} else if (3 == $_SERVER['argc']) {
// 2 arguments means a regular diff
$this->niceLeft = $_SERVER['argv'][1];
$this->niceRight = $_SERVER['argv'][2];
$this->left = $this->niceLeft;
$this->right = $this->niceRight;
} else {
die("See " . __FILE__ . " for details on how to use this script\n");
}
}
/**
* Calls the external diff program to get the base diff
*/
protected function doDiff() {
if (is_readable($this->left) && is_readable($this->right)) {
$diffCmd = self::DIFF_PATH . ' ' . self::DIFF_OPTS . " {$this->left} {$this->right}";
$this->diff = `$diffCmd`;
} else {
die("{$this->left} or {$this->right} is not readable\n");
}
}
/**
* Takes an identifier line (looks like: @@ -30,23 +30,79 @@) and returns
* the begin line number
*/
protected function parseLineNum($identifier) {
list(,$from) = explode(" ", $identifier);
list($from) = explode(',', $from);
return (int) substr($from, 1);
}
/**
* Sanitizes CRLF or CR into just LF
*/
protected function sanitizeLineEndings($data) {
// first, sanitize line endings:
$data = str_replace("\r\n", "\n", $data);
$data = str_replace("\r", "\n", $data);
return $data;
}
/**
* Actually splits the diff into chunks and stores chunks + line numbers
*/
protected function splitDiff() {
// now split:
$this->diff = explode("\n", $this->sanitizeLineEndings($this->diff));
// array to return:
$this->chunks = array();
// line counter
$line = 0;
// outer loop: file(s)
$maxLine = count($this->diff);
// skip first 2 lines as left, right files
$line += 2;
// descend into data chunks
while ($line < $maxLine) {
// next line is the chunk identifier
$dataChunk = array();
$dataChunk['identifier'] = $this->diff[$line++];
$dataChunk['line'] = $this->parseLineNum($dataChunk['identifier']);
$dataChunk['data'] = array();
while ($line < $maxLine && !(substr($this->diff[$line], 0, 2) == '@@' && substr($this->diff[$line], -2) == '@@')) {
$dataChunk['data'][] = $this->diff[$line++];
}
$this->chunks[] = $dataChunk;
}
}
/**
* Reconstructs the diff (with adjusted identifier lines, and outputs the
* result)
*/
protected function reconstructDiff() {
$out = "--- {$this->niceLeft}\n+++ {$this->niceRight}\n";
foreach ($this->chunks as $chunk) {
$out .= $chunk['identifier'] . "\n";
$out .= implode("\n", $chunk['data']) ."\n";
}
echo $out;
}
/**
* Descends into a deeper context
*
* @param string $type friendly name, either class or function
*/
protected function enterContext($type) {
// next comes whitespace:
if (is_array($this->tokens[++$this->tokenIndex])) {
list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
} else {
$token = null;
$this->currentValue = $this->tokens[$this->tokenIndex];
}
if ($token != T_WHITESPACE) {
// syntax is broken, let's get out of here
if (self::DEBUG_SYNTAX) {
die("Syntax broken in whitespace assertion, " . $this->context[count($this->context) - 1] . "\n");
}
$this->isBroken = true;
break;
}
$this->checkLineBreak();
// next comes the name:
if (is_array($this->tokens[++$this->tokenIndex])) {
list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
} else {
$token = null;
$this->currentValue = $this->tokens[$this->tokenIndex];
}
$this->context[] = $this->currentValue . "({$type})";
// chew through the next few tokens until we get a "{"
while ($this->currentValue != '{' && $this->tokenIndex < count($this->tokens)) {
if (is_array($this->tokens[++$this->tokenIndex])) {
list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
} else {
$token = null;
$this->currentValue = $this->tokens[$this->tokenIndex];
}
$this->checkLineBreak();
switch ($token) {
// these are all valid before the brace:
case null:
case T_WHITESPACE:
case T_VARIABLE:
case T_EXTENDS:
case T_IMPLEMENTS:
case T_STRING:
case T_ARRAY:
case T_CONSTANT_ENCAPSED_STRING:
case T_LNUMBER:
case '=':
break;
// if another token is found, then there's a syntax error
// (this was added to prevent really deep looping)
default:
if (self::DEBUG_SYNTAX) {
die("Syntax broken in token assertion, " . $this->context[count($this->context) - 1] . "," . token_name($token) . "\n");
}
$this->isBroken = true;
return;
}
}
// found the starting brace
$this->braceDepth[count($this->context) - 1] = 1;
}
/**
* Tokenizes the code and creates a line map
*/
protected function tokenizeHierarchy() {
$this->context = array('(root)');
$this->lineMap = array('');
$this->tokens = token_get_all($this->sanitizeLineEndings($this->fileContents));
$this->isBroken = false;
for ($this->tokenIndex=0; $this->tokenIndex<count($this->tokens); $this->tokenIndex++) {
if ($this->isBroken) {
// syntax is somehow broken; return progress, but don't go further
return;
}
if (is_array($this->tokens[$this->tokenIndex])) {
list($token, $this->currentValue) = $this->tokens[$this->tokenIndex];
} else {
$token = null;
$this->currentValue = $this->tokens[$this->tokenIndex];
//change here
}
switch ($token) {
// check for class
case T_CLASS:
// found "class"
$this->enterContext('class');
break;
case T_FUNCTION:
// found "function"
$this->enterContext('function');
break;
default:
$idx = count($this->context) - 1;
switch ($this->currentValue) {
case '{':
case T_CURLY_OPEN:
case T_DOLLAR_OPEN_CURLY_BRACES:
++$this->braceDepth[$idx];
break;
case '}':
--$this->braceDepth[$idx];
if ($this->braceDepth[$idx] == 0) {
// we're out of this context
array_pop($this->context);
} else if ($this->braceDepth[$idx] < 0) {
// bad stuff!
if (self::DEBUG_SYNTAX) {
die("Syntax broken in brace close assertion, " . $this->context[count($this->context) - 1] . "\n");
}
$this->isBroken = true;
}
break;
default:
$this->checkLineBreak();
}
}
}
}
/**
* Determines if the currently processing token contains line breaks, and
* if so, adjusts the lineMap accordingly
*/
protected function checkLineBreak() {
// check for new line:
if (strpos($this->currentValue, "\n") !== false) {
for ($j=1; $j<=substr_count($this->currentValue, "\n"); $j++) {
$this->lineMap[] = implode(':', $this->context);
}
}
}
/**
* Matches the chunk map to the line map
*/
protected function determineHierarchy() {
$this->tokenizeHierarchy();
for ($chunknum=0; $chunknum < count($this->chunks); $chunknum++) {
$this->chunks[$chunknum]['identifier'] .= ' ' . $this->lineMap[$this->chunks[$chunknum]['line']];
}
}
}
new DiffPhp;
// komode: le=unix language=php codepage=utf8 tab=4 notabs indent=4
The most up-to-date version of this file can also be found in my personal svn repostory: https://svn.caedmon.net/svn/public/diff-php/diff-php.
Please let me know if you run into any bugs.. I'm sure there are a few, but it works pretty well for me.
S
