Earlier this week, I spent most of a day tracing through code in search of the source of a bug that was causing part of our application to fail in strange ways.
In the back end, we have models that connect to CouchDB. These models implement the Iterator pattern to allow easy traversal of a record’s keys.
When I wrote the code to implement Iterator several months
ago, I dutifully checked the PHP
Manual and adapted the reference example that I found there:
<?php
class Record implements Iterator
{
// (partial class, showing the iterator implementation only)
public $_data = array();
public function rewind()
{
reset($this->_data);
}
public function current()
{
return current($this->_data);
}
public function key()
{
return key($this->_data);
}
public function next()
{
return next($this->_data);
}
public function valid()
{
return (current($this->_data) !== false);
}
}Little did I realize that this implementation is very broken. I’ll explain why, below.
Over the past few years, I’ve implemented many iterators in
this way, using PHP’s implicit array manipulation functions
(reset(), current(), key(),
next()). These functions are very convenient because PHP
arrays are so powerful — arrays in PHP work like ordered hash tables in other
languages.
PHP’s implicit management of an array’s iteration index
(the value that is incremented by next() and
referenced by key()) is indeed convenient, but the convenience
can sometimes be offset by its very implicitness — the value is hidden from
you, the PHP programmer.
In PHP, generic array iteration (without the implicit iterator) isn’t actually as simple as it sounds. Remember that arrays aren’t arrays in the traditional sense, but ordered hash tables. Consider this:
$data = array('zero','one','two','three');
for ($i=0; $i<count($data); $i++) {
// yeah, don't calculate count() on every iteration
echo "{$data[$i]}\n";
}Output:
zero
one
two
three
This first example is easy to iterate — the array contains sequential, numeric, zero-based keys. It gets more complicated when using non-sequential, and non-numeric keys:
$data = array(
'apple',
'cow' => 'moo',
'pig' => 'oink',
'orange'
);
for ($i=0; $i<count($data); $i++) {
echo "{$data[$i]}\n";
}Output:
apple
orange
Notice: Undefined offset: 2 in - on line 10
Notice: Undefined offset: 3 in - on line 10
I could use foreach, but because a numeric loop illustrates the point more clearly, here’s how I might implement the above code so that it works:
$data = array(
'apple',
'cow' => 'moo',
'pig' => 'oink',
'orange'
);
$k = array_keys($data);
for ($i=0; $i<count($data); $i++) {
echo "{$data[$k[$i]]}\n";
}Output:
apple
moo
oink
orange
This brings us back to the Iterator implementation. Why
isn’t the code above correct? Take a closer look at this:
public function valid()
{
return (current($this->_data) !== false);
}A value of false in the array is indistinguishable from a
false value returned by current(). Using the
above implementation with the following array would cause it to bail after
orange (and subsequently might cause you to waste a day
tracking down the cause):
array(
'apple',
'orange',
false,
'banana',
);On Tuesday night, I
updated the manual
to use an
improved Iterator implementation.
It’s probably a bit slower (so you
can use the internal-indexing implementation if you’re sure your arrays
will never contain false), but my implementation is more
robust.
<?php
/**
* A mixed-key iterator implementation
*
* Note: these array_keys() calls are slow. The array keys could be cached
* as long as the cache value is invalidated when $_data is changed.
*/
class It implements Iterator
{
public $_data = array();
protected $_index = 0;
public function rewind()
{
$this->_index = 0;
}
public function current()
{
$k = array_keys($this->_data);
return $this->_data[$k[$this->_index]];
}
public function key()
{
$k = array_keys($this->_data);
return $k[$this->_index];
}
public function next()
{
$k = array_keys($this->_data);
if (isset($k[++$this->_index])) {
return $this->_data[$k[$this->_index]];
} else {
return false;
}
}
public function valid()
{
$k = array_keys($this->_data);
return isset($k[$this->_index]);
}
}To use it:
$it = new It;
$it->_data = array(
'one' => 1,
'two' => 2,
false,
'four' => 4
);
foreach ($it as $k => $v) {
echo "{$k}: {$v}\n";
}Output:
one: 1
two: 2
0:
four: 4
Sam Shull
2010 Jul 29 11:58
I have always just used:
public function valid () {
$key = $this->key();
return $key !== false ν
}
**FYI: key() is incorrect in the second example.
Sean Coates
2010 Jul 29 12:02
Can you elaborate, Sam? Which example is incorrect?
S
Darcy Clarke
2010 Jul 29 12:05
Very nice workaround to a problem I'm sure a lot of people have run in to. I, personally, have had to write numerous lines of code to check values in my array's and it's nice to see other's are coming up with ways to achieve similar results with better scalability.
Nate Abele
2011 May 08 22:49
While going over some older code today, I came up with a solution which (I think) involves slightly less overhead. At the least, it's less complicated, as you don't have to manually track the index.
When checking if current($this->_data) === false (or next($this->_data) === false), also check to see if key($this->_data) === null. A minor optimization, but the simplicity makes me happy.
Besides, you'd have to be a dummy to use a null array key. :-)