1. Wikis are not for documentation

    Yesterday, I posted a rant to the PHP documentation list about Docbook, wikis, and how fundamentally different they are.

    Since the audience of that list is fairly limited, and its participants are generally of the same mind concerning Docbook's virtues, I thought I'd re-post it here for all to see.

    I'd like to know your opinion, especially if you're proficient in Docbook but disagree with me.

    ----

    There is a recurring idea in the PEAR community that they'll set up a wiki and fix all of the PEARdoc problems. "Don't worry," they say, "we'll write a Docbook exporter for our wiki markup."

    I am of the opinion that this is simply impossible without compromising either 1) Docbook's robustness or 2) the simplicity of wikis.

    In (1), I think we phpdoc'ers would agree that Docbook is robust. Its structure allows very detailed parsing, which lets us generate output such as PDF and CHM. It also lets us walk the document tree and determine which extensions exist, which constants and functions each extension defines, and the prototypes for those functions. Docbook is primarily focused on meta-data, and while presentation is taken into account, Docbook prefers to remain output-ignorant. Docbook is _not_ what-you-think-is-what-you-get, as wikis often claim to be. Docbook is extremely structured, by nature.

    In (2), wikis are known (and consequently popular) for their simplicity. Anyone who's tried to create a *||Multi|Column|Table||* would agree that they're meant for simple markup only (though they often have certain "complicated" functionality). Sure, it would be possible to implement wiki-specific markup for things like <classname> and &entities;, but for each one, additional wiki syntax would need to be added. Once enough new syntax had been added to accomplish similar goals (robust wiki->docbook conversion), you'd have a toolbox that's no simpler than Docbook. Wikis are unstructured, by nature.

    I believe that anyone who REALLY thinks this should be done has never written a <methodsynopsis> block.

    Now, don't get me wrong. I think wikis have a purpose. I even run one for the doc-folk, but IMO, that purpose is NOT documentation.

    Docbook isn't magic; it's just XML. Once you get past the mental block of "this is hard", it really isn't.

    </rant>

  2. XSS Woes

    A prominent PHP developer (whose name I didn't get permission to drop, so I won't, but many of you know who I mean) has been doing a bunch of research related to Cross Site Scripting (XSS) lately. It's really opened my eyes to how much I take user input for granted.

    Don't get me wrong. I write by the "never trust users" mantra. The issue, in this case, is something abusable that completely slipped under my radar.

    Most developers worth their paycheque, I'm sure, know the common rules of "never trust the user", such as "escape all user-supplied data on output," "always validate user input," and "don't rely on something not in your control to do so (i.e. Javascript cannot be trusted)." "Don't output unescaped input" goes without saying, in most cases. Only a fool would "echo $_GET['param'];" (and we're all foolish sometimes, aren't we?).

    The problem that was demonstrated to me exploited something I considered to be safe: the filename portion of the request URI. Now I know just how wrong I was.

    Consider this: you build a simple script; let's call it simple.php but that doesn't really matter. simple.php looks something like this:

    <html>
     <body>
      <?php
      if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
        echo "Form submitted!";
      }
      ?>
      <form action="<?php echo $_SERVER['PHP_SELF']; ?>">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    Alright. Let's put this script at: http://example.com/tests/simple.php. On a properly-configured web server, you would expect a plain request for the script to always render this:

    <html>
     <body>
      <form action="/tests/simple.php">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    Right? No.

    What I forgot about, as I suspect some of you have, too (or maybe I'm the only loser who didn't think of this (-; ), is that $_SERVER['PHP_SELF'] can be manipulated by the user.

    How's that? If I put a script at /tests/simple.php, $_SERVER['PHP_SELF'] should always be "/tests/simple.php", right?

    Wrong, again.

    See, there's a feature of Apache (I think it's Apache, anyway) that you may have used for things like short URLs, or to make your query-string-heavy website search-engine friendly: $_SERVER['PATH_INFO']-based URLs.

    Quickly, this is when scripts are able to receive data in the URL path itself: after the script's file name, but before the question mark that introduces the query-string parameters. In a URL like http://www.example.com/download.php/path/to/file, download.php would be executed, and /path/to/file would (usually, depending on config) be available to the script via $_SERVER['PATH_INFO'].

    The quirk is that $_SERVER['PHP_SELF'] contains this extra data, opening up the door to potential attack. Even something as simple as the code above is vulnerable to such exploits.
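
    A toy model may make the quirk concrete. The function below is hypothetical: it does not query a real server, it only mimics the Apache/mod_php behaviour described above, where everything after the script's own path becomes PATH_INFO and PHP_SELF is the script name with that suffix appended. CGI/FastCGI setups can populate these variables differently, so treat this as a sketch under that assumption.

```php
<?php
/**
 * Hypothetical helper (illustration only): model how a server splits a
 * decoded request path into SCRIPT_NAME and PATH_INFO once it has found the
 * script file, and how PHP_SELF ends up as the two concatenated.
 */
function split_request_path(string $path, string $scriptName): array
{
    // Whatever follows the script's own path is treated as PATH_INFO.
    $rest = substr($path, strlen($scriptName));

    // The path must start with the script name, followed by "/" or nothing.
    if (strpos($path, $scriptName) !== 0 || ($rest !== '' && $rest[0] !== '/')) {
        throw new InvalidArgumentException("$path is not handled by $scriptName");
    }

    return [
        'SCRIPT_NAME' => $scriptName,
        'PATH_INFO'   => $rest,               // real servers often leave this unset when empty
        'PHP_SELF'    => $scriptName . $rest, // includes the user-controlled suffix
    ];
}

$server = split_request_path('/tests/simple.php/extra_data_here', '/tests/simple.php');
echo $server['PHP_SELF'], "\n"; // prints /tests/simple.php/extra_data_here
```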

    Let's look at our simple.php script, again, but requested in a slightly different manner: http://example.com/tests/simple.php/extra_data_here

    It would still "work"--the output, in this case, would be:

    <html>
     <body>
      <form action="/tests/simple.php/extra_data_here">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    I hope that the problem is now obvious. Consider: http://example.com/tests/simple.php/%22%3E%3Cscript%3Ealert('xss')%3C/script%3E%3Cfoo

    The output suddenly becomes very alarming:

    <html>
     <body>
      <form action="/tests/simple.php/"><script>alert('xss')</script><foo">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    If you ignore the obviously-incorrect <foo"> tag, you'll see what's happening. The would-be attacker has successfully exploited a critical (if you consider XSS critical) flaw in your logic, and, by getting a user to click the link (even through a redirect script), he has executed the Javascript of his choice on your user's client (obviously, this requires the user to have Javascript enabled). My alert() example is non-malicious, but it's trivial to write similarly-invoked Javascript that changes the action of a form, or usurps cookies (and submits them in a hidden iframe, or through an image tag's URL, to a server that records this personal data).
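 tag,">

    To see why the output above comes out the way it does, here is a minimal sketch that decodes the extra path segment from the attack URL and pastes it into the form tag, as simple.php effectively does. It assumes the server URL-decodes PATH_INFO before PHP sees it (consistent with the output shown above); render_form_unsafe() is a hypothetical stand-in for the echo in simple.php.

```php
<?php
// The extra path segment from the attack URL, exactly as it appears in the link.
$encoded = '/%22%3E%3Cscript%3Ealert(\'xss\')%3C/script%3E%3Cfoo';

// Assumed server behaviour: the path is URL-decoded before PHP receives it.
$pathInfo = rawurldecode($encoded);
$phpSelf  = '/tests/simple.php' . $pathInfo;

// Hypothetical stand-in for simple.php: the raw value goes straight into a
// double-quoted attribute, so a '"' in it closes the attribute early.
function render_form_unsafe(string $action): string
{
    return '<form action="' . $action . '">';
}

echo render_form_unsafe($phpSelf), "\n";
// prints <form action="/tests/simple.php/"><script>alert('xss')</script><foo">
```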

    The solution should also be obvious. Convert the user-supplied data to entities. The code becomes:

    <html>
     <body>
      <?php
      if (isset($_REQUEST['submitted']) && $_REQUEST['submitted'] == '1') {
        echo "Form submitted!";
      }
      ?>
      <form action="<?php echo htmlentities($_SERVER['PHP_SELF']); ?>">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    And an attack, as above, would be rendered:

    <html>
     <body>
      <form action="/tests/simple.php/&amp;quot;&amp;gt;&amp;lt;script&amp;gt;alert('xss')&amp;lt;/script&amp;gt;&amp;lt;foo">
       <input type="hidden" name="submitted" value="1" />
       <input type="submit" value="Submit!" />
      </form>
     </body>
    </html>

    This still violates the assumption that the script name and path are the only data in $_SERVER['PHP_SELF'], but the payload has been neutralized.
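
    A slightly more defensive variant is sketched below. It assumes that $_SERVER['SCRIPT_NAME'] excludes PATH_INFO, which holds for a typical Apache/mod_php setup but depends on server configuration, so the value is escaped anyway. self_action() is a hypothetical helper, not part of PHP; it takes the server array as a parameter so it can be exercised without a web server.

```php
<?php
/**
 * Hypothetical helper: build a form action that points back at the current
 * script. Prefers SCRIPT_NAME (normally free of PATH_INFO), falls back to
 * PHP_SELF, and escapes the result in either case.
 */
function self_action(array $server): string
{
    $target = $server['SCRIPT_NAME'] ?? $server['PHP_SELF'] ?? '';

    // ENT_QUOTES escapes both quote styles, so the value is safe inside
    // single- or double-quoted attributes.
    return htmlspecialchars($target, ENT_QUOTES, 'UTF-8');
}

$server = [
    'SCRIPT_NAME' => '/tests/simple.php',
    'PHP_SELF'    => '/tests/simple.php/"><script>alert(\'xss\')</script><foo',
];
echo '<form action="', self_action($server), '">', "\n";
// prints <form action="/tests/simple.php">
```

    htmlspecialchars() converts only the characters that are special in HTML (including both quote styles with ENT_QUOTES), which is enough for an attribute value; htmlentities(), as used above, works too.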

    Needless to say, I felt silly for not thinking of such a simple exploit earlier. As the aforementioned PHP developer said at the time (to paraphrase): if guys who consider themselves experts in PHP development don't notice these things, there's little hope for the unwashed masses who have just written their first 'echo "hello world!\n";'. He's working on a generic user-input filtering mechanism that can be applied globally to all user input. Hopefully we'll see it in PECL soon. Don't forget about the other data in $_SERVER, either.

    ... ...

    Upon experimenting with this exploit on my own server (and conveniently watching the raw data in my superglobals via phpinfo()), I noticed something very interesting that reminded me that even though trusting this data was a stupid mistake on my part, I'm not the only one to make it. A fun (and by fun, I mean nauseating) little game to play: create a file called "info.php" (or whatever name you like). In it, place only "<?php phpinfo(); ?>". Now request it like this: http://your-server/path/to/info.php/%22%3E%3Cimg%20src=http://www.perl.com/images/75-logo.jpg%3E%3Cblah

    Nice huh? A little less nauseating: it's fixed in CVS.

  3. php|a April 2005 Released

    So, as I alluded to last week: my other-other-work has come to fruition. I'm the new Editor-in-Chief of php|architect (in addition to my regular development job).

    It's been a lot of work, but it feels really good to see my first month come together. Can't wait to see it on the newsstands.

    ...and that's all the writing I have time for, right now (-:

  4. Busy

    Since the last week of March, I've been SO busy...

    Some of it work-related stuff, some of it other-work-related stuff (which some of you already know about, but I'll talk about this more in a week (or so))...

    I managed to actually release Mail_Mime (as you may have read about in a previous entry). I attended PHP Quebec (as you may read about in the second part of this entry), and unless the building inspection turns up something unexpected, I'll soon be the owner of my first house.

    Anyway, I've been meaning to write a little bit about Conference PHP Quebec, but because of the aforementioned busy-ness (and business), I haven't had time. So, tonight, now that my daughter has (finally!) fallen asleep and my wife is well on her way, I've pulled out the laptop and, from the comfort of my side of the bed, am finally jotting down a few things about this wonderful conference.

    Now, I'm no conference connoisseur, but the 2005 Conference PHP Quebec was the best conference I've ever been to. Admittedly, I've only ever been to the 2003 and 2004 editions of the same, so I may just be a naive conference-goer.

    By a huge margin, my favourite part of the conference was simply getting to spend some time, face-to-face, with guys who I've only ever been able to talk to on IRC and mailing lists. I spent a great deal of non-workshop time with Derick, Ilia, Toby, Marcus(helly) and John(coogle). I also had the opportunity to spend some time chatting with Shane Caraveo (Activestate), Chris Shiflett (Brainbulb), Daniel Kushner (Zend), and Rasmus (everyone knows Rasmus, right?).

    I got to take the Zend Cert exam, which was, on one hand, easier than expected (so glad I didn't have to know the parameter order of the twisted string/array libraries), and on the other hand, harder than expected (three+ levels of reference tracing within the code I was expected to head-parse).

    Damien invited me to the speakers/sponsors dinner, which was really a highlight of the conference for me. As I said, I got to spend some time with various community members, and joking around/talking shop with these guys while eating a VERY good gourmet French meal (smoked duck, blueberry bison, and the best crème brûlée I've ever had), and drinking good wine, was just an all-around great time.

    Chris Shiflett's session tied Marcus Boerger's in my mind; both of them were great, and lots of great info was shared. Toby's, while interesting, covered mostly stuff I'm already well-versed in (I even got to speak for a few minutes, near the end, when someone asked about .phpt).

    The little bit of time I got to spend talking to Rasmus really opened my eyes (Rasmus, if you're reading this, please don't take it the wrong way): I gained much respect for him. See, before this year, I'd never really talked to him one-on-one. I'd only attended his sessions (the first year's was interesting, because I was (shamefully, now) somewhat starstruck, and the second year's only moderately interesting). What bugged me most were the Rasmus fanboys who seem to worship the guy (yes, I said starstruck, but I didn't WORSHIP him, ok? (-: ). It seemed like his time had passed, and the proverbial reins of our favourite language had been taken up by the likes of Andi and Zeev. It seemed to me that Rasmus was now (then) riding the fame-wave.

    Well, I'm happy to admit that I was completely wrong. (-: (And if you're still reading, I'm sorry for doubting you.) This time, every time I heard him speak in smaller groups, he spoke intelligently of things like character encodings and APC, and when he spoke candidly about PHP internals, filtering (because we'd had a brief spat on internals), and even Yahoo!, it was with wisdom, not arrogance. It's really hard to explain, but he went from "hack" to "hacker" in my mind. I feel somewhat stupid for writing off his intelligence on the strength of one Rasmussing talk after another.

    That said, I had the privilege of talking, eating, learning and drinking with some VERY intelligent people, those two days. I can't wait until we can get together again (Toby tried to convince me to submit a paper for the conference in Frankfurt, this fall.. we'll see (-: ).

    Last thought: I was making stupid faces in every photo that was taken, it seems.. yay me.

  5. Mail_Mime 1.3.0-stable released

    Very quick note to say that I just found a few minutes to release PEAR::Mail_Mime 1.3.0-stable.

    I'm at Conference PHP Quebec (sitting beside Toby, actually), more on that later.