Page 1 of 2

index.php?id=8,0,0,1,0,0 to 8.0.0.1.0.0.html HOWTO

Posted: Wed 19. Nov 2003, 23:12
by rudeboy
hi all

i am new to phpwcms. downloaded it, installed it, experienced some errors, but found it cool. after thinking about search engine compatibility i read some threads about it and implemented a solution; thought someone else might be interested:

step 1:

open index.php in your root directory

add the following lines at the beginning of your index.php (directly after the GNU licence header):

Code: Select all

ob_start(); // buffer all output so it can be rewritten at the end
// send no-cache headers for this script
header ("Expires: Mon, 01 Jan 1990 01:00:00 GMT");
header ("Content-type: text/html");
header ("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header ("Cache-Control: no-cache, must-revalidate");
header ("Pragma: no-cache");
// map the dots in the incoming URL back to the internal comma format
$_GET['id'] = str_replace(".",",",$_GET['id']);
go to the bottom of index.php and enter the following lines after your closing </body></html>:

Code: Select all

<?php
$wcms = ob_get_contents();  // grab the buffered page
while(@ob_end_clean());     // discard every open buffer level
include_once("./include/inc_rewrite/rewrite_url.php"); // rewrites $wcms
echo $wcms;                 // send the rewritten page
?>
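Taken together, the two snippets above do a simple buffering round trip: convert dots back to commas on the way in, buffer the page, rewrite the links on the way out. A minimal self-contained sketch of the idea (the sample link is made up, and preg_replace_callback stands in for the /e-style replace, which modern PHP no longer supports):

```php
<?php
// capture all page output so it can be rewritten before sending
ob_start();

// as delivered by a rewritten URL like 8.0.0.1.0.0.html (hypothetical sample)
$_GET['id'] = "8.0.0.1.0.0";
$id = str_replace(".", ",", $_GET['id']); // back to the internal comma format

// the CMS would now render the page; it emits internal-style links
echo '<a href="index.php?id=' . $id . '">Home</a>';

// grab the buffer and discard every open buffer level
$page = ob_get_contents();
while (@ob_end_clean());

// rewrite internal links into the static-looking .html form
$page = preg_replace_callback(
    '/index\.php\?id=([0-9,]+)/',
    function ($m) { return str_replace(",", ".", $m[1]) . ".html"; },
    $page
);

echo $page; // <a href="8.0.0.1.0.0.html">Home</a>
```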
step 2:

go to /include/ and create a new directory called 'inc_rewrite'

step 3:

create a new file called 'rewrite_url.php' and insert the following code:

Code: Select all

<?php
// this rewrites links from index.php?id=8,0,0,1,0,0 to 8.0.0.1.0.0.html

function url_search($query)	
{
	$noid = substr($query, 4); // strip the leading "?id="
	$file = str_replace(",", ".", $noid).".html"; // 8,0,0,1,0,0 -> 8.0.0.1.0.0.html
	$link = "<a href=\"".$file."\"";
	return($link);
}

function js_url_search($query)	
{
	$noid = substr($query, 4); // strip the leading "?id="
	$file = str_replace(",", ".", $noid).".html";
	$link = "onClick=\"location.href='".$file."'";
	return($link);
}

// these regexes call the functions above; note that the "?" after "index.php"
// is unescaped, so the captured group 2 starts with "?id=..." and the
// substr($query, 4) in the functions strips exactly that prefix
// (the /e modifier evaluates the replacement as PHP code; PHP 4/5 only)
$allowed_chars_in_url = "[".implode("]|[",array("@",",","\.","+","&","-","_","=","*","#","\/","%","?"))."]";
$wcms = preg_replace("/(<a href=\"index.php?)(([a-z]|[A-Z]|[0-9]|".$allowed_chars_in_url.")*)(\")/e","url_search('\\2')",$wcms);
$wcms = preg_replace("/(onClick=\"location.href='index.php?)(([a-z]|[A-Z]|[0-9]|".$allowed_chars_in_url.")*)(\')/e","js_url_search('\\2')",$wcms);

?>
don't forget to save your file!
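One subtlety in the code above: because the "?" after "index.php" in the patterns is unescaped, the captured group starts with "?id=", and substr($query, 4) strips exactly those four characters. A tiny standalone check of what url_search() produces for a typical match:

```php
<?php
// same logic as url_search() in rewrite_url.php, shown in isolation
function url_search($query)
{
    $noid = substr($query, 4); // strip the leading "?id="
    $file = str_replace(",", ".", $noid) . ".html";
    return '<a href="' . $file . '"';
}

echo url_search('?id=8,0,0,1,0,0'); // <a href="8.0.0.1.0.0.html"
```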

step 4:

see what happened: reload your output page and you will now see the links in the new format. if you click one, you will get a 404 error (the rewrite rule from step 5 is still missing).

step 5:

go to your wcms root directory and create a new file called '.htaccess'. insert this code into the newly created file:

Code: Select all

RewriteEngine on 
RewriteRule ^(.*)\.html$ ../index.php?id=$1 
RewriteRule ^index.html$ ../index.php
check if the ../ is needed on your host! and again, don't forget to save!
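If the ../ variant gives you trouble, a more conventional per-directory .htaccess (an untested assumption about a typical host, with the specific rule first so index.html is not swallowed by the generic one) would look like:

```apache
RewriteEngine on
RewriteBase /
RewriteRule ^index\.html$ index.php [L]
RewriteRule ^(.*)\.html$ index.php?id=$1 [L]
```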

step 6:

look at your page output and browse through it; your links should now be cleanly set, and if you click, a page should be delivered. (without warranty) it works for me ;)

greetings marc

Posted: Wed 19. Nov 2003, 23:56
by Florian
Hey rudeboy,

thanks for your great and very "clean" (ob_start / ob_end_clean) code :).

But I don't think the robots will have a problem with a link, whether it has .html at the end or not.
I built my own CMS as an interim solution until our really new page is done, and Google doesn't have any problems with the URL (e.g. http://www.hyparchiv.com/index.php?site ... ede7f89ce7). Our URL is encrypted with a hash (that's the reason why it's so long). So I think the problem is the commas in the URL. But these commas are used by the navigation (correct me if I'm wrong) to detect the pages and the level where they are stored in the navigation. So maybe we should think about how we can hide the commas by encoding them while serving the pages to the client (the server would keep using the URL as it is now, while the user/client gets an encoded one like the example above).
But before we try to rule the search engines, we should take a look at their references ;) Does anybody have a factsheet or similar where the operating mode of the spiders is described?

Cheers,
Florian

Posted: Thu 20. Nov 2003, 00:08
by rudeboy
hi flo

i have this encrypted style on a private site:

http://www.example.com/index/base64_enc ... erystring)

well, this works fine there, and google is spidering my page like a fool. because every encrypted querystring contains a separate sid, the same page is spidered multiple times; it looks different to google...

well, about the comma: i thought about a solution like 0.0.0.0.html, then the url would be very web-conform, don't you think? give me some minutes, i will look at it; probably it's only a small extend-hack ;) read you
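For the curious, the encrypted-querystring idea can be sketched like this; the function names and the URL-safe base64 alphabet are my own assumptions, not the code from rudeboy's site:

```php
<?php
// encode the internal id on the way out, decode it on the way in;
// plain base64 can contain "+", "/" and "=", so make it URL-safe here
function encode_id($id)
{
    return rtrim(strtr(base64_encode($id), '+/', '-_'), '=');
}

function decode_id($token)
{
    return base64_decode(strtr($token, '-_', '+/'));
}

$token = encode_id('8,0,0,1,0,0');
echo decode_id($token); // 8,0,0,1,0,0
```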

Posted: Thu 20. Nov 2003, 00:16
by rudeboy
hi again

i thought about it and changed something in step 1 and step 3 (in the functions). i edited my post above so you can see the changes! you now have urls like 0.0.0.0.html, for example. ok?

Posted: Thu 20. Nov 2003, 01:42
by sporto
Great clean fix. Nice job.

No reason to think this wouldn't work well for search engines.

Posted: Thu 20. Nov 2003, 08:40
by Oliver Georgi
@rudeboy

Thank you. I will test this at the weekend and implement it into phpwcms. But I think the ob_ things are not necessary. The complete content is already in a variable when delivered to index.php, and only there is rewriting needed.

But I will check.

Oliver

Posted: Thu 20. Nov 2003, 09:28
by rudeboy
hai

for sure you are right, the ob part is not really needed ;), but i thought one never knows what else outside this variable might need to be replaced. happy trying :).

you should probably keep the header information after ob_start, because of damn proxy / caching reasons.

Posted: Thu 20. Nov 2003, 09:38
by Oliver Georgi
yeah I know.

Posted: Thu 20. Nov 2003, 09:51
by Florian
Hi there,

the ob part is very important.
If some people have (at work or privately) a paranoid admin who has turned on every security switch he could enable (like safe mode for PHP && MySQL), you may get a lot of trouble with the extraction of the strings (as I did on "my" system: 'headers already sent' and so on).
So I would advise keeping the ob parts, to provide the highest level of system independence. What we don't need are the '@'s. Since PHP 4.x.x, extraction is handled automatically by the parser.

Cheers,
Florian.

Posted: Thu 20. Nov 2003, 16:26
by photoads
Just been on the PCWorld website and they have a similar URL string:
http://www.pcworld.com/home/index/0,00.asp


spidered by google 16,800 times
http://www.google.co.uk/search?sourceid ... 2C00%2Easp

Can't wait to get going with this one!

Posted: Thu 20. Nov 2003, 16:33
by Oliver Georgi
I have uploaded a new patch package with adding above code.

The URL replace in phpwcms does work - but my mod_rewrite on the project site does not. Sorry, I have no time to search a solution at the moment.

Please test it.

Oliver

Posted: Fri 21. Nov 2003, 03:07
by rudeboy
hi there

i downloaded the patch and "installed" it. it still works great on my demo site :D . if you have time you should look at your apache error log, then we could find out what's not ok with the rewrite .htaccess on your server. probably the ../ is not needed, or something else (more) is needed on your host. other comments on whether it works are welcome :)

Apache's mod_rewrite

Posted: Sat 22. Nov 2003, 12:29
by sunflare
I had some problems with rewrite rules too, maybe this little checklist will help:
1)
Make sure that URL-rewriting is enabled.
Go to your httpd.conf and look for

Code: Select all

LoadModule rewrite_module modules/mod_rewrite.so
AddModule mod_rewrite.c
2)
Maybe URL rewriting in .htaccess doesn't work for some reason. I defined my rules in the virtual host instead:

Code: Select all

<VirtualHost 127.0.0.7>
    ...
    RewriteLog "e:/rewrite.log"
    RewriteLogLevel 2
    RewriteEngine On
    RewriteOptions MaxRedirects=10
    RewriteRule .........
    RewriteRule .........
</VirtualHost>
You can log your requests to mod_rewrite with RewriteLog and RewriteLogLevel (optional).
If your Apache version is higher than 1.3.27 you can use RewriteOptions MaxRedirects to prevent endless loops! Be careful with your rules! They can make trouble with Apache's Alias directive...
And don't forget to restart Apache...
More information about RewriteRules: http://httpd.apache.org/docs/mod/mod_rewrite.html

Posted: Sat 22. Nov 2003, 15:28
by Oliver Georgi
I have checked it on my development server [Windows XP Professional, Apache 2.0.48] and it works, but only with these RewriteRules:

Code: Select all

#RewriteLog "d:/rewrite.log" 
#RewriteLogLevel 2
RewriteEngine On
RewriteOptions MaxRedirects=10
RewriteRule ^\/(.*?).html$ /index.php?id=$1 
RewriteRule ^\/index.html$ /index.php
Oliver

Posted: Sat 22. Nov 2003, 16:41
by sunflare
Yes, of course, the dotted placeholders in my code should be replaced with a valid rewrite rule; I'm sometimes a bit too lazy to type *oops*

I'm currently working on advanced URL rewriting for your CMS. Instead of just rewriting the parameters, I'd like to use 'speaking URLs' like /main/sub/subtext_1/, for example.
It's a bit more difficult because you need some kind of lookup table: which id refers to which speaking URL... like this

1,0,0,1,0,0 => /my/speaking/url/

Furthermore, all the link, image etc. URLs are not set absolutely, that means they don't start with a "/".
Also, all actions with INSERT INTO or UPDATE statements that change IDs must be written/updated into that lookup file or lookup table (let's see what's better...). I just have no clue yet where all these may appear...

But I will post that in a special new thread.
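The lookup-table idea could start as simply as an array mapping ids to slugs; all names and slugs below are hypothetical placeholders:

```php
<?php
// hypothetical id => speaking-url table; in practice this would live in a
// file or database table and be updated together with the page records
$lookup = array(
    '1,0,0,1,0,0' => '/my/speaking/url/',
    '8,0,0,1,0,0' => '/products/widgets/',
);

function id_to_url($id, $lookup)
{
    return isset($lookup[$id]) ? $lookup[$id] : null;
}

function url_to_id($url, $lookup)
{
    $flip = array_flip($lookup); // assumes slugs are unique
    return isset($flip[$url]) ? $flip[$url] : null;
}

echo id_to_url('1,0,0,1,0,0', $lookup) . "\n"; // /my/speaking/url/
echo url_to_id('/products/widgets/', $lookup); // 8,0,0,1,0,0
```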