[FX.php List] Odd browser bug or?

Erik Andreas Cayré erik at cayre.dk
Mon Apr 30 23:08:43 MDT 2007


It seems I may not have provided enough background for my question...

First: I'm fully aware that the ampersand has special meaning in both
HTML and URLs.
Most browsers tolerate unencoded ampersands in URLs inside HTML and
recover gracefully when they stumble on one, unless (as Kevin pointed
out) the ampersand and the characters after it form a valid entity.
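
As a minimal sketch of the encoding rule described above: the helper
link_href() below is hypothetical (it is not part of the site's code) and
assumes PHP 7 or later. It builds a query string and then escapes it for
an HTML attribute, so the HTML source carries &amp; while the decoded URL
carries a plain ampersand.

```php
<?php
// Hypothetical helper, not taken from the site's code: build a link URL
// and escape it for use inside an HTML attribute.
function link_href(string $path, array $params): string
{
    // http_build_query() URL-encodes keys and values and joins them with '&'.
    $url = $path . '?' . http_build_query($params, '', '&');
    // htmlspecialchars() turns the raw '&' into '&amp;' for the HTML source.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$href = link_href('/', ['p' => 'assoc', 'assoc' => 58]);

// The HTML source contains /?p=assoc&amp;assoc=58
echo '<a href="' . $href . '">Association 58</a>', "\n";

// Decoding the attribute value once, as an HTML parser does, gives the
// URL that should actually be requested: /?p=assoc&assoc=58
echo html_entity_decode($href, ENT_QUOTES, 'UTF-8'), "\n";
```

Escaping happens exactly once, at the point where the URL is written into
HTML; the URL itself is kept unescaped everywhere else.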

After getting the site working in July 2006 (this is the first PHP/HTML
project I've done by myself) I started debugging all the beginner
errors I had made. One method was running pages through
validator.w3.org, which helped me catch many bugs, including
unencoded ampersands in URLs in my HTML code.

I also set up a custom error.php file for Apache, mostly to catch
users' bookmarks to the old site (which had been done in Lasso 3) and
give them a better experience than stumbling on a "page not found" error.
In addition, I call a homegrown error function every time I discover
that something went wrong, so that I can analyze the problem and
possibly fix or enhance the site.

The error I'm asking about is reported by my custom error reporting
at a point in my code where part of the URL has been recognized but the
rest has not.
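
The homegrown error function itself is not shown in this message. As a
rough sketch only, a report line that records the Referer and User-Agent
headers next to the message might look like the following; the name
format_error_report() and the field layout are assumptions, and the code
assumes PHP 7 or later. The Referer, when a client sends one, can point
to the page that contains a bad link.

```php
<?php
// Hypothetical sketch; the real reporterror() is not shown in this post.
function format_error_report(string $message, array $server): string
{
    $fields = [
        'error'   => $message,
        'uri'     => $server['REQUEST_URI'] ?? '(none)',
        'referer' => $server['HTTP_REFERER'] ?? '(none)',
        'agent'   => $server['HTTP_USER_AGENT'] ?? '(none)',
    ];
    $parts = [];
    foreach ($fields as $name => $value) {
        // Strip line breaks so that one report stays on one log line.
        $parts[] = $name . '=' . str_replace(["\r", "\n"], ' ', (string) $value);
    }
    return implode(' | ', $parts);
}

// Possible use inside a request:
// error_log(format_error_report('undefined url', $_SERVER));
```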

From my index.php:

// select page to show
	if (isset($_GET['p'])) {
		$p = $_GET['p'];
	} else {
		$p = '';
		$link = '/?p=';
	}
	if (isset ($p) &&  $p != 'login') {
		$link = '/?p=' . $p;
	} else {
		$link = '/';
	}

	switch ($p) {

		case 'assoc':
			$associd = (isset($_GET['assoc'])) ? intval($_GET['assoc']) : '';
			if ($associd == '') {
				header('Location: http://' . $_SERVER['HTTP_HOST'] . dirname($_SERVER['PHP_SELF']));
				reporterror('undefined url');
				exit;
			} elseif ($associd < 1) {
				header('Location: http://' . $_SERVER['HTTP_HOST'] . dirname($_SERVER['PHP_SELF']));
				reporterror('undefined associd: "' . $_GET['assoc'] . '"');
				exit;
			} else {
				$page = 'pages/assoc.php';

The error being reported is an attempt to access
/?p=assoc&amp;assoc=58 (or another number)

This should only ever happen if some HTML contains
/?p=assoc&amp;amp;assoc=58

I have been unable to find any of my HTML looking like the above.
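
To illustrate the failure, here is a small sketch (not taken from the
site's code; the variable names are made up) of what PHP's query-string
parsing does with such a request, and of how escaping a URL twice
produces the doubly encoded markup. parse_str() is used as a stand-in
for the way $_GET is populated, assuming the default '&' input separator.

```php
<?php
// What PHP sees when the literal text "&amp;" reaches the server.
parse_str('p=assoc&amp;assoc=58', $bad);
var_dump(isset($bad['assoc']));   // false: there is no 'assoc' key
var_dump(array_keys($bad));       // the keys are 'p' and 'amp;assoc'

// The correctly decoded request, for comparison.
parse_str('p=assoc&assoc=58', $good);
var_dump($good['assoc']);         // the string '58'

// One way such a request can arise: escaping the same URL twice.
$once  = htmlspecialchars('/?p=assoc&assoc=58');  // /?p=assoc&amp;assoc=58
$twice = htmlspecialchars($once);                 // /?p=assoc&amp;amp;assoc=58

// An HTML parser removes one level of encoding only, so markup that
// contains $twice leads to a request for $once, with "&amp;" intact.
var_dump(html_entity_decode($twice) === $once);   // true
```

With the missing 'assoc' key, the switch in index.php takes the
'undefined url' branch, which matches the error being logged.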

So my question remains: has anyone ever heard of a bug elsewhere  
which might create this?
(I'm not saying it's not my problem. I just can't think of where it  
might be...)


On 01/05/2007 at 7:06, Kevin Futter wrote:

> On 30/4/07 6:31 PM, "Erik Andreas Cayré" <erik at cayre.dk> wrote:
>
>> I've spent some hours looking through my error log for www.dagkort.dk
>> to fix whatever may be left to fix.
>>
>> One recurring error which I don't understand is this:
>>
>> URL: /?p=assoc&amp;assoc=58
>>
>> To the best of my knowledge no one should be accessing a URL like
>> this, instead accessing:
>>
>> URL: /?p=assoc&assoc=58 (which works fine)
>>
>> I've checked my site (though not completely exhaustively), and I
>> couldn't find any links misspelled to result in the above...
>> I see the error generated by several different User-Agents, both
>> browsers (MSIE 5.0 Win98) and crawlers (e.g. nicebot)
>>
>> Does anyone on the list know of some bug or other plausible
>> explanation for this?
>> I'm guessing certain browsers/crawlers may erroneously attempt to
>> access a URL like the above, but I'm not certain.
>>
>> Any suggestions?
>
> As Dale has already pointed out, &amp; is the HTML entity representing the
> ampersand character. It's actually a requirement of the spec that all
> ampersands in HTML, INCLUDING URLs*, be encoded (either by entity or
> character reference). So, the URL causing the error is actually not only
> legitimate, but matching the spec exactly, and shouldn't be causing an
> error. I'd say that the user agents involved are choking on it. However, if
> you're not doing any manual or automatic encoding yourself, the real
> question becomes how did it get there?
>
> * The reason for this is that compliant browsers treat the ampersand as the
> beginning of an entity, and that's its only valid function in HTML. So,
> query string joins using the ampersand risk being interpreted as entities,
> and if the characters that follow the ampersand actually make up a
> recognisable entity, they'll be parsed as such and the URL will fail (I've
> seen it happen!). If you encode the ampersand as an entity, it's parsed
> properly as an ampersand, not the beginning of an entity. Sounds circular I
> know, but that's how it works.
>
>
> -- 
> Kevin Futter
> Webmaster, St. Bernard's College
> http://www.sbc.melb.catholic.edu.au/



---
Erik Andreas Cayré
Spangsbjerg Møllevej 169
6705 Esbjerg Ø

Home Tel: 75150512
Mobile: 40161183

---
»Interest can produce learning on a scale compared to fear as a
nuclear explosion to a firecracker.«
--Stanley Kubrick

»Only p....d-off people can change the world. Innovation is not created by
'market analysis', but by people who are insanely irritated by the
state of things.«
--Tom Peters

»If you can't explain it simply, you don't understand it well enough.«
-- Albert Einstein

»If you don't have time to do it right, when will you have
time to do it over?«
-- John Wooden, basketball coach



