index.php?id=8,0,0,1,0,0 to 8.0.0.1.0.0.html HOWTO

Poll: was this helpful? yes: 9 votes (90%); no: 1 vote (10%). Total votes: 10

rudeboy
Posts: 16
Joined: Wed 19. Nov 2003, 21:19

Post by rudeboy »

Hi all,

I am new to phpwcms. I downloaded and installed it, ran into some errors, but found it cool. After thinking about search engine compatibility I read some threads on the topic and implemented a solution. I thought someone else might be interested too:

step 1:

Open index.php in your root directory.

Add the following lines at the beginning of your index.php (directly after the GNU license header):

```php
ob_start(); // buffer all output so the links can be rewritten at the end
// headers that keep browsers and proxies from caching the page
header("Expires: Mon, 01 Jan 1990 01:00:00 GMT");
header("Content-type: text/html");
header("Last-Modified: " . gmdate("D, d M Y H:i:s") . " GMT");
header("Cache-Control: no-cache, must-revalidate");
header("Pragma: no-cache");
// turn 8.0.0.1.0.0 (from 8.0.0.1.0.0.html) back into 8,0,0,1,0,0
if (isset($_GET['id'])) {
	$_GET['id'] = str_replace(".", ",", $_GET['id']);
}
```
Go to the bottom of index.php and enter the following lines after the closing </body></html> tags:

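```php
<?php
$wcms = ob_get_contents();   // the whole page, buffered since ob_start()
while(@ob_end_clean());      // discard all output buffers
include_once("./include/inc_rewrite/rewrite_url.php"); // rewrites the links in $wcms
echo $wcms;                  // send the rewritten page
?>
```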
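The buffering pattern of step 1 (start a buffer at the top, grab the page at the bottom, rewrite, echo) can be tried in isolation. This is a minimal sketch, not phpwcms code: the hypothetical `demo_rewrite()` stands in for rewrite_url.php, the page is a one-line dummy, and it assumes PHP 7.4+ for the arrow function. Unlike the loop above, `ob_get_clean()` closes only the buffer this sketch opened.

```php
<?php
// Hypothetical stand-in for rewrite_url.php: index.php?id=8,0,0,1,0,0 -> 8.0.0.1.0.0.html
function demo_rewrite(string $html): string
{
    return preg_replace_callback(
        '/index\.php\?id=([0-9,]+)/',
        fn(array $m): string => str_replace(',', '.', $m[1]) . '.html',
        $html
    );
}

function render_page(): string
{
    ob_start();                                      // top of index.php: start buffering
    echo '<a href="index.php?id=8,0,0,1,0,0">x</a>'; // page output lands in the buffer
    $wcms = ob_get_clean();                          // bottom: take the buffer and close it
    return demo_rewrite($wcms);                      // rewrite the links before sending
}

echo render_page(), "\n";
```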
step 2:

Go to /include/ and create a new directory called 'inc_rewrite'.

step 3:

In that new directory, create a file called 'rewrite_url.php' and insert the following code:

```php
<?php
// Rewrites links such as index.php?id=8,0,0,1,0,0 to 8.0.0.1.0.0.html
// (commas become dots; step 1 turns the dots back into commas).
// preg_replace_callback() replaces the original preg_replace(.../e):
// the /e modifier was deprecated in PHP 5.5 and removed in PHP 7.

// "8,0,0,1,0,0" -> "8.0.0.1.0.0.html"
function id_to_file($id)
{
	return str_replace(",", ".", $id) . ".html";
}

// <a href="index.php?id=8,0,0,1,0,0"  ->  <a href="8.0.0.1.0.0.html"
function url_search($m)
{
	return '<a href="' . id_to_file($m[1]) . '"';
}

// onClick="location.href='index.php?id=8,0,0,1,0,0'  ->  onClick="location.href='8.0.0.1.0.0.html'
function js_url_search($m)
{
	return "onClick=\"location.href='" . id_to_file($m[1]) . "'";
}

// Only plain id links (digits and commas) are rewritten, because the
// .htaccess rule maps X.html back to index.php?id=X and nothing else.
// "." and "?" are escaped so they match literally.
$wcms = preg_replace_callback('/<a href="index\.php\?id=([0-9,]+)"/', 'url_search', $wcms);
$wcms = preg_replace_callback("/onClick=\"location\.href='index\.php\?id=([0-9,]+)'/", 'js_url_search', $wcms);

?>
```
Don't forget to save the file!
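Steps 1 and 3 only fit together because their two str_replace() calls are inverses: step 3 turns commas into dots when the links are written, and step 1 turns dots back into commas when the request comes in. A minimal sketch of that round trip; the helper names `id_to_filename` and `filename_to_id` are hypothetical, not part of phpwcms. It assumes IDs never contain a literal dot, which holds for numeric IDs like 8,0,0,1,0,0.

```php
<?php
// Step 3 direction: "8,0,0,1,0,0" -> "8.0.0.1.0.0.html"
function id_to_filename(string $id): string
{
    return str_replace(',', '.', $id) . '.html';
}

// Step 1 direction: "8.0.0.1.0.0" (what the RewriteRule passes as ?id=) -> "8,0,0,1,0,0"
function filename_to_id(string $file): string
{
    // the RewriteRule already drops ".html"; strip it here too in case it is present
    if (substr($file, -5) === '.html') {
        $file = substr($file, 0, -5);
    }
    return str_replace('.', ',', $file);
}

$file = id_to_filename('8,0,0,1,0,0');
echo $file, ' -> ', filename_to_id($file), "\n";
```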

step 4:

Reload your output page and see what happened: the links are now in the new format. If you click one, you will get an error message (404), because no file with that name exists on the server yet.

step 5:

Go to your phpwcms root directory and create a new file called '.htaccess'. Insert this code into the newly created file:

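```apache
RewriteEngine on
# index.html must be handled first: the general rule below
# would otherwise turn it into index.php?id=index
RewriteRule ^index\.html$ ../index.php [L]
# 8.0.0.1.0.0.html -> index.php?id=8.0.0.1.0.0 (step 1 turns the dots back into commas)
RewriteRule ^(.*)\.html$ ../index.php?id=$1 [L]
```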
Check whether ../ is needed on your host! And again, don't forget to save!
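To see what the two RewriteRules do, and why their order matters, here is a rough PHP model using the same regular expressions. It is only a model under stated assumptions: `route()` is hypothetical, it leaves out the ../ prefix, and it assumes .htaccess context, where the path is matched without a leading slash. Since `^(.*)\.html$` also matches index.html, checking that rule first turns index.html into index.php?id=index.

```php
<?php
// Hypothetical model of the step 5 rules: returns the internal target for a
// request path relative to the phpwcms directory, or null if no rule matches.
function route(string $path, bool $indexRuleFirst = true): ?string
{
    $rules = [
        ['/^index\.html$/', 'index.php'],
        ['/^(.*)\.html$/', 'index.php?id=$1'],
    ];
    if (!$indexRuleFirst) {
        $rules = array_reverse($rules); // order as in the original .htaccess
    }
    foreach ($rules as [$pattern, $target]) {
        if (preg_match($pattern, $path)) {
            // first matching rule wins, like a RewriteRule with the [L] flag
            return preg_replace($pattern, $target, $path);
        }
    }
    return null; // no rule matched: Apache would serve the path unchanged
}

echo route('8.0.0.1.0.0.html'), "\n";  // index.php?id=8.0.0.1.0.0
echo route('index.html'), "\n";        // index.php
echo route('index.html', false), "\n"; // index.php?id=index
```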

step 6:

Look at your page output and browse through it: your links should now be cleanly set, and if you click one, the page should be delivered (no warranty, but it works for me ;))

greetings marc
Last edited by rudeboy on Thu 20. Nov 2003, 00:18, edited 2 times in total.
Florian
Posts: 119
Joined: Wed 19. Nov 2003, 16:50
Location: Hamburg

Post by Florian »

Hey rudeboy,

thanks for your great and very "clean" (ob_start / ob_end_clean) code :).

But I don't think the robots care whether a link ends in .html or not.
I built my own CMS as an interim solution until our really new site is done, and Google doesn't have any problems with its URLs (e.g. http://www.hyparchiv.com/index.php?site ... ede7f89ce7). Our URL is encrypted with a hash (that's why it's so long). So I think the problem is the commas in the URL. But these commas are used by the navigation (correct me if I'm wrong) to identify the pages and the level at which they are stored in the navigation. So maybe we should think about how we can hide the commas by encrypting them while serving the pages to the client (the server keeps using the URL as it is now, while the user/client sees an encrypted one like the one above).
But before we try to rule the search engines, we should take a look at their documentation ;) Does anybody have a fact sheet or something similar that describes how the spiders operate?

Cheers,
Florian
rudeboy
Posts: 16
Joined: Wed 19. Nov 2003, 21:19

Post by rudeboy »

hi flo

I use this encrypted style on a private site:

http://www.example.com/index/base64_enc ... erystring)

It works well there, and Google is spidering my site like crazy. But because every encrypted query string contains a separate session ID (sid), the same page gets spidered multiple times, since each copy looks different to Google...

As for the commas: I thought about a solution like 0.0.0.0.html; then the URL would be very standard-conformant, don't you think? Give me a few minutes to look at it, it's probably only a small extension hack ;) Read you soon.
greetings marc
rudeboy
Posts: 16
Joined: Wed 19. Nov 2003, 21:19

Post by rudeboy »

hi again

I thought it over and changed something in step 1 and in the functions of step 3. I edited my post above so you can see the changes! You now get URLs like 0.0.0.0.html. OK?
greetings marc
sporto
Posts: 160
Joined: Mon 10. Nov 2003, 18:01
Location: USA, Chicago

Post by sporto »

Great clean fix. Nice job.

No reason to think this wouldn't work well for search engines.
Oliver Georgi
Site Admin
Posts: 9892
Joined: Fri 3. Oct 2003, 22:22

Post by Oliver Georgi »

@rudeboy

Thank you. I will test this at the weekend and implement it in phpwcms. But I think the ob_ calls are not necessary: the complete content is already in a variable when it is delivered to index.php, and that is the only place where rewriting is needed.

But I will check.

Oliver
Oliver Georgi | phpwcms Developer | GitHub | LinkedIn | Систрон
rudeboy
Posts: 16
Joined: Wed 19. Nov 2003, 21:19

Post by rudeboy »

hi

Sure, you are right, the ob part is not really needed ;), but I thought one never knows what else outside this variable might need to be replaced. Happy testing :).

You should probably keep the header lines after ob_start(), because of those damn proxy/caching issues.
greetings marc
Oliver Georgi
Site Admin
Posts: 9892
Joined: Fri 3. Oct 2003, 22:22

Post by Oliver Georgi »

yeah I know.
Florian
Posts: 119
Joined: Wed 19. Nov 2003, 16:50
Location: Hamburg

Post by Florian »

Hi there,

the ob parts are very important.
If you have (at work or at home) a paranoid admin who turned on every security switch he could find (like safe mode in PHP and MySQL), you may run into a lot of trouble with the output of the strings (on "my" system I got 'headers already sent' errors and so on).
So I would advise keeping the ob parts to get the highest level of system independence. What we don't need is the '@': since PHP 4.x.x, extraction is handled automatically by the parser.

Cheers,
Florian.
photoads
Posts: 4
Joined: Wed 19. Nov 2003, 16:19

Post by photoads »

I've just been on the PCWorld website and they have a similar URL string:
http://www.pcworld.com/home/index/0,00.asp

Spidered by Google 16,800 times:
http://www.google.co.uk/search?sourceid ... 2C00%2Easp

Can't wait to get going with this one!
Oliver Georgi
Site Admin
Posts: 9892
Joined: Fri 3. Oct 2003, 22:22

Post by Oliver Georgi »

I have uploaded a new patch package that adds the above code.

The URL replacement in phpwcms works, but mod_rewrite on my project site does not. Sorry, I have no time to look for a solution at the moment.

Please test it.

Oliver
rudeboy
Posts: 16
Joined: Wed 19. Nov 2003, 21:19

Post by rudeboy »

hi there

I downloaded the patch and "installed" it. It still works great on my demo site :D. If you have time, look at your Apache error log; then we could find out what's wrong with the rewrite .htaccess on your server. Maybe the ../ is not needed, or something else is needed on your host. Other comments on whether it works are welcome :)
greetings marc
sunflare
Posts: 6
Joined: Fri 21. Nov 2003, 15:35

Apache's mod_rewrite

Post by sunflare »

I had some problems with rewrite rules too, maybe this little checklist will help:
1)
Make sure that URL rewriting is enabled.
Go to your httpd.conf and look for:

```apache
LoadModule rewrite_module modules/mod_rewrite.so
# AddModule exists only in Apache 1.3; Apache 2.x needs just the LoadModule line
AddModule mod_rewrite.c
```
2)
It may be that URL rewriting in .htaccess doesn't work for some reason (for example, because AllowOverride does not permit it). I defined my rules in the virtual host:

```apache
<VirtualHost 127.0.0.7>
    ...
    RewriteLog "e:/rewrite.log"
    RewriteLogLevel 2
    RewriteEngine On
    RewriteOptions MaxRedirects=10
    RewriteRule .........
    RewriteRule .........
</VirtualHost>
```
You can log how mod_rewrite handles your requests with RewriteLog and RewriteLogLevel (optional).
If your Apache version is higher than 1.3.27, you can use RewriteOptions MaxRedirects to prevent endless loops! Be careful with your rules! They can cause trouble with Apache's Alias directive...
And don't forget to restart Apache...
More Information about RewriteRules: http://httpd.apache.org/docs/mod/mod_rewrite.html
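Checklist item 1 can also be checked from PHP, at least when PHP runs as an Apache module: `apache_get_modules()` only exists under that SAPI, so this minimal sketch returns null elsewhere (CGI/FastCGI, CLI), where you still have to look at httpd.conf. The function name `mod_rewrite_loaded` is hypothetical, and a loaded module says nothing about whether AllowOverride lets .htaccess use it.

```php
<?php
// true/false under mod_php; null if PHP cannot tell (e.g. CGI, FastCGI, CLI)
function mod_rewrite_loaded(): ?bool
{
    if (!function_exists('apache_get_modules')) {
        return null;
    }
    return in_array('mod_rewrite', apache_get_modules(), true);
}

$state = mod_rewrite_loaded();
if ($state === null) {
    echo "Cannot tell from PHP; check httpd.conf for LoadModule rewrite_module\n";
} else {
    echo $state ? "mod_rewrite is loaded\n" : "mod_rewrite is NOT loaded\n";
}
```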
Oliver Georgi
Site Admin
Posts: 9892
Joined: Fri 3. Oct 2003, 22:22

Post by Oliver Georgi »

I have checked it on my development server [Windows XP Professional, Apache 2.0.48] and it works, but only with these RewriteRules:

```apache
#RewriteLog "d:/rewrite.log" 
#RewriteLogLevel 2
RewriteEngine On
RewriteOptions MaxRedirects=10
RewriteRule ^\/(.*?).html$ /index.php?id=$1 
RewriteRule ^\/index.html$ /index.php
```
Oliver
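One likely reason these rules differ from the ones in step 5: where the rules live changes the string the pattern sees. In server or virtual-host configuration, mod_rewrite matches against the full URL path, leading slash included; in a .htaccess file the directory prefix, and with it the leading slash, is stripped first. A `^\/` pattern therefore fits httpd.conf, while `^(.*)\.html$` fits .htaccess. The sketch below is a simplified model of that difference; `rewrite_subject` is hypothetical and ignores RewriteBase and other details.

```php
<?php
// Hypothetical model of the string a RewriteRule pattern is matched against.
// $perDirPrefix: null for server/virtual-host config, or the directory the
// .htaccess file applies to (e.g. "/") for per-directory context.
function rewrite_subject(string $urlPath, ?string $perDirPrefix): string
{
    if ($perDirPrefix !== null && strpos($urlPath, $perDirPrefix) === 0) {
        return substr($urlPath, strlen($perDirPrefix)); // .htaccess: prefix stripped
    }
    return $urlPath; // server config: full URL path, leading slash included
}

$serverPattern = '/^\/(.*?).html$/'; // rule from the post above
$htaccessPattern = '/^(.*)\.html$/'; // rule from step 5

$url = '/8.0.0.1.0.0.html';
foreach (['server config' => null, '.htaccess in /' => '/'] as $context => $prefix) {
    $subject = rewrite_subject($url, $prefix);
    printf("%-15s %-18s server rule: %d, .htaccess rule: %d\n", $context, $subject,
        preg_match($serverPattern, $subject), preg_match($htaccessPattern, $subject));
}
```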
sunflare
Posts: 6
Joined: Fri 21. Nov 2003, 15:35

Post by sunflare »

Yes, of course, the dots in my code should be replaced with valid rewrite rules; I'm sometimes a bit too lazy to type *oops*

I'm currently working on advanced URL rewriting for your CMS. Instead of just rewriting the parameters, I'd like to use 'speaking URLs' such as /main/sub/subtext_1/.
It's a bit more difficult because you need some kind of lookup table that records which ID refers to which speaking URL, like this:

1,0,0,1,0,0 => /my/speaking/url/

Furthermore, the URLs of all links, images etc. are not absolute, i.e. they do not start with "/", so under a speaking URL like /my/speaking/url/ they would resolve against the wrong directory.
Also, every INSERT INTO or UPDATE statement that changes IDs must be written into that lookup file or lookup table (let's see which is better...). I don't have a clue yet where all of these may occur...

But I will post that in a special new thread.
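The lookup-table idea can be sketched as a two-way map between phpwcms IDs and speaking URLs. This only illustrates the concept and is not sunflare's implementation: the class `SpeakingUrlMap` and its methods are hypothetical, it assumes PHP 7.4+, and a real version would load its entries from the lookup file or table and update them whenever an INSERT or UPDATE changes an ID.

```php
<?php
// Hypothetical two-way lookup between phpwcms IDs and speaking URLs.
final class SpeakingUrlMap
{
    /** @var array<string, string> id => url */
    private array $byId = [];
    /** @var array<string, string> url => id */
    private array $byUrl = [];

    public function set(string $id, string $url): void
    {
        // drop stale entries so both directions stay consistent after an update
        if (isset($this->byId[$id])) {
            unset($this->byUrl[$this->byId[$id]]);
        }
        if (isset($this->byUrl[$url])) {
            unset($this->byId[$this->byUrl[$url]]);
        }
        $this->byId[$id] = $url;
        $this->byUrl[$url] = $id;
    }

    public function urlFor(string $id): ?string
    {
        return $this->byId[$id] ?? null;
    }

    public function idFor(string $url): ?string
    {
        return $this->byUrl[$url] ?? null;
    }
}

$map = new SpeakingUrlMap();
$map->set('1,0,0,1,0,0', '/my/speaking/url/');
echo $map->idFor('/my/speaking/url/'), "\n";
```

Keeping both directions in one object means an incoming request (speaking URL to ID) and link generation (ID to speaking URL) always read the same data, which is the consistency problem the INSERT/UPDATE remark points at.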