Here is the code:
SOLUTION TO SPIDERING PHPBB??
After a bit of searching and testing, I found:
- A list of Bots to spider your site effectively.
- A way of stopping session IDs.
- Creating pages which can be easily spidered.
OK, let's do this step by step (I recommend backing up all files first):
CHECK IF YOUR SITE HAS SESSIONS
Go to https://www.tools.summitmedia.co.uk/spider/ to check how your site is spidered. You should see the session IDs there, and perhaps some unlinkable pages.
STOPPING SIDS (guests won't be able to post unless registered, but I can't find a better way of stopping SIDs)
#
#-----[ OPEN ]------------------------------------------
#
includes/sessions.php
#
#-----[ FIND ]------------------------------------------
#
$SID = 'sid=' . $session_id;
#
#-----[ REPLACE WITH ]------------------------------------------
#
if ( $userdata['session_user_id'] != ANONYMOUS ){
$SID = 'sid=' . $session_id;
} else {
$SID = '';
}
#---[EOM]----
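To see what this change does to the links guests (and spiders) get, here is a rough sketch. The function append_sid_sketch() is a made-up, simplified stand-in for phpBB's own link builder, not the real code; it only assumes that an empty $SID means nothing gets appended to the URL.

```php
<?php
// Hypothetical, simplified stand-in for phpBB's link builder.
// Assumption: an empty session string means the URL is returned untouched.
function append_sid_sketch($url, $sid)
{
    if ($sid === '') {
        return $url; // guest or spider: no sid in the link
    }
    // Use '&amp;' if the URL already has a query string, '?' otherwise.
    $separator = (strpos($url, '?') !== false) ? '&amp;' : '?';
    return $url . $separator . $sid;
}

// Guest: $SID is '' after the change above.
echo append_sid_sketch('viewtopic.php?t=12', ''), "\n";
// prints: viewtopic.php?t=12

// Logged-in user: $SID is still 'sid=...'.
echo append_sid_sketch('viewtopic.php?t=12', 'sid=abc123'), "\n";
// prints: viewtopic.php?t=12&amp;sid=abc123
```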
GETTING THE SITE SPIDERED
See the post above and add these spiders to the list in sessions.php:
//
// robots array all in lower case (feel free to add more robots)
//
$seRobots = array(
'almaden.ibm.com',
'appie 1.1',
'architext',
'ask jeeves',
'asterias2.0',
'augurfind',
'baiduspider',
'bannana_bot',
'bdcindexer',
'crawler',
'crawler@fast',
'docomo',
'fast-webcrawler',
'fluffy the spider',
'frooglebot',
'geobot',
'googlebot',
'gulliver',
'henrythemiragorobot',
'ia_archiver',
'infoseek',
'kit_fireball',
'lachesis',
'lycos_spider',
'mantraagent',
'mercator',
'moget/1.0',
'muscatferret',
'nationaldirectory-webspider',
'naverrobot',
'ncsa beta',
'netresearchserver',
'ng/1.0',
'osis-project',
'polybot',
'pompos',
'scooter',
'seventwentyfour',
'sidewinder',
'sleek spider',
'slurp/si',
'slurp@inktomi.com',
'steeler/1.3',
'szukacz',
't-h-u-n-d-e-r-s-t-o-n-e',
'teoma',
'turnitinbot',
'ultraseek',
'vagabondo',
'voilabot',
'w3c_validator',
'zao/0',
'zyborg/1.0'
);
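The list is all lower case because the visitor's user agent gets lower-cased before comparing. How the mod in the post above actually uses the array isn't shown here, so the following is only a guess at the idea: is_search_robot() and $sampleRobots are names I made up, and the check is a plain substring match.

```php
<?php
// Hypothetical helper: true if any robot name occurs in the user agent.
// Assumption: matching is a case-insensitive substring test.
function is_search_robot($userAgent, array $robots)
{
    $agent = strtolower($userAgent);
    foreach ($robots as $robot) {
        if ($robot !== '' && strpos($agent, $robot) !== false) {
            return true;
        }
    }
    return false;
}

// A short sample of the list above (already lower case).
$sampleRobots = array('googlebot', 'ia_archiver', 'teoma');

var_dump(is_search_robot('Mozilla/5.0 (compatible; Googlebot/2.1)', $sampleRobots));
// prints: bool(true)
var_dump(is_search_robot('Mozilla/5.0 (Windows NT 10.0) Firefox/120.0', $sampleRobots));
// prints: bool(false)
```

Note that a very generic entry such as 'crawler' will match any user agent containing that word, so it casts a wide net.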
CREATE A ROBOTS.TXT FILE
This file stops spiders from accessing certain areas of your forum. Create a simple robots.txt file and save it in the site root. My forums are in "forums/"; change this to your own directory name. The robots.txt file should contain:
User-agent: *
Disallow: /forums/admin/
Disallow: /forums/attach_mod/
Disallow: /forums/db/
Disallow: /forums/files/
Disallow: /forums/images/
Disallow: /forums/includes/
Disallow: /forums/language/
Disallow: /forums/templates/
Disallow: /forums/common.php
Disallow: /forums/config.php
Disallow: /forums/glance_config.php
Disallow: /forums/groupcp.php
Disallow: /forums/memberlist.php
Disallow: /forums/modcp.php
Disallow: /forums/posting.php
Disallow: /forums/printview.php
Disallow: /forums/privmsg.php
Disallow: /forums/profile.php
Disallow: /forums/ranks.php
Disallow: /forums/search.php
Disallow: /forums/statistics.php
Disallow: /forums/tellafriend.php
Disallow: /forums/viewonline.php
Disallow: /your-forum-folder/sutra*.html$
Disallow: /your-forum-folder/ptopic*.html$
Disallow: /your-forum-folder/ntopic*.html$
Disallow: /your-forum-folder/ftopic*asc*.html$
MODIFY PAGE_HEADER.PHP
This sits in includes/page_header.php.
Before:
//
// Generate logged in/logged out status
//
Add:
ob_start();
function replace_for_mod_rewrite(&$s) {
// get the correct base_url (protocol, host, path) to make sure we rewrite only internal links
if (empty($HTTP_SERVER_VARS['HTTP_HOST'])) {
$server = getenv('HTTP_HOST');
} else {
$server = $HTTP_SERVER_VARS['HTTP_HOST'];
}
// IIS sets HTTPS=off
if (isset($HTTP_SERVER_VARS['HTTPS']) && $HTTP_SERVER_VARS['HTTPS'] != 'off') {
$proto = 'https://';
} else {
$proto = 'http://';
}
// Get the name of this URI
// Start off with REQUEST_URI
if (isset($HTTP_SERVER_VARS['REQUEST_URI'])) {
$path = $HTTP_SERVER_VARS['REQUEST_URI'];
} else {
$path = getenv('REQUEST_URI');
}
if ((empty($path)) || (substr($path, -1, 1) == '/')) {
// REQUEST_URI was empty or pointed to a path
// Try looking at PATH_INFO
$path = getenv('PATH_INFO');
if (empty($path)) {
// No luck there either
// Try SCRIPT_NAME
if (isset($HTTP_SERVER_VARS['SCRIPT_NAME'])) {
$path = $HTTP_SERVER_VARS['SCRIPT_NAME'];
} else {
$path = getenv('SCRIPT_NAME');
}
}
}
$path = preg_replace('/[#\?].*/', '', $path);
$path = dirname($path);
if (preg_match('!^[/\\\]*$!', $path)) {
$path = '';
}
$base_url = "$proto$server$path/";
$prefix = '|"(?:'.$base_url.')?';
// now that we know about the correct $prefix we can start the rewriting
$urlin =
array(
$prefix . '(?<!/)index.php"|',
$prefix . '(?<!/)viewforum.php\?f=([0-9]*)&(?:amp;)topicdays=([0-9]*)&(?:amp;)start=([0-9]*)"|',
$prefix . '(?<!/)viewforum.php\?f=([0-9]*)"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)&(?:amp;)view=previous"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)&(?:amp;)view=next"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)&(?:amp;)postdays=([0-9]*)&(?:amp;)postorder=([a-zA-Z]*)&(?:amp;)start=([0-9]*)"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)&(?:amp;)start=([0-9]*)&(?:amp;)postdays=([0-9]*)&(?:amp;)postorder=([a-zA-Z]*)&(?:amp;)highlight=([a-zA-Z0-9]*)"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)&(?:amp;)start=([0-9]*)"|',
$prefix . '(?<!/)viewtopic.php\?t=([0-9]*)"|',
);
$urlout = array(
'"forums.html"',
'"viewforum\\1-\\2-\\3.html"',
'"forum\\1.html"',
'"ptopic\\1.html"',
'"ntopic\\1.html"',
'"ftopic\\1-\\2-\\3-\\4.html"',
'"ftopic\\1.html"',
'"ftopic\\1-\\2.html"',
'"ftopic\\1.html"',
);
$s = preg_replace($urlin, $urlout, $s);
return $s;
}
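If you want to try the rewriting idea outside the forum first, here is a cut-down sketch. It is not the function above: rewrite_links_sketch() is a made-up name, it uses only three patterns, and it leaves out the base URL prefix handling. The sample HTML is invented too.

```php
<?php
// Hypothetical cut-down version of the link rewriting, for testing only.
// Assumption: links appear as double-quoted relative URLs using &amp;.
function rewrite_links_sketch($html)
{
    // More specific patterns come first so they win over the plain topic link.
    $patterns = array(
        '|"viewtopic\.php\?t=([0-9]+)&amp;start=([0-9]+)"|',
        '|"viewtopic\.php\?t=([0-9]+)"|',
        '|"viewforum\.php\?f=([0-9]+)"|',
    );
    $replacements = array(
        '"ftopic\\1-\\2.html"',
        '"ftopic\\1.html"',
        '"forum\\1.html"',
    );
    return preg_replace($patterns, $replacements, $html);
}

$html = '<a href="viewforum.php?f=3">Forum</a> '
      . '<a href="viewtopic.php?t=12&amp;start=15">Page 2</a> '
      . '<a href="viewtopic.php?t=12">Topic</a>';

echo rewrite_links_sketch($html), "\n";
// prints: <a href="forum3.html">Forum</a> <a href="ftopic12-15.html">Page 2</a> <a href="ftopic12.html">Topic</a>
```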
MODIFY PAGE_TAIL.PHP
This sits in includes/page_tail.php.
After:
$db->sql_close();
Add:
$contents = ob_get_contents();
ob_end_clean();
echo replace_for_mod_rewrite($contents);
global $dbg_starttime;
In the same file, after:
ob_end_clean();
Add:
echo replace_for_mod_rewrite($contents);
global $dbg_starttime;
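The two edits work as a pair: ob_start() in page_header.php holds back the page output, and the lines in page_tail.php collect it, run it through the rewrite function and only then send it. A minimal sketch of that pattern, with made-up helper names (render_filtered_sketch, demo_page, demo_filter) and a trivial stand-in filter instead of replace_for_mod_rewrite():

```php
<?php
// Hypothetical helper showing the buffer -> filter -> output pattern.
function render_filtered_sketch($render, $filter)
{
    ob_start();                       // start holding back output
    call_user_func($render);          // the page "prints" as usual
    $contents = ob_get_contents();    // grab what was printed
    ob_end_clean();                   // discard the buffer without sending it
    return call_user_func($filter, $contents);
}

// Stand-in for the forum page.
function demo_page()
{
    echo '<a href="index.php">Index</a>';
}

// Stand-in for replace_for_mod_rewrite().
function demo_filter($html)
{
    return str_replace('"index.php"', '"forums.html"', $html);
}

echo render_filtered_sketch('demo_page', 'demo_filter'), "\n";
// prints: <a href="forums.html">Index</a>
```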
CREATE A HTACCESS FILE (You may need your host to help you with this)
This file goes in your forum directory (for mine, that is forums/). Create a file called .htaccess containing only the following:
RewriteEngine On
RewriteRule ^forums.*$ index.php
RewriteRule ^forum([0-9]*).*$ viewforum.php?f=$1&mark=topic
RewriteRule ^viewforum([0-9]*)-([0-9]*)-([0-9]*).*$ viewforum.php?f=$1&topicdays=$2&start=$3
RewriteRule ^forum([0-9]*).*$ viewforum.php?f=$1
RewriteRule ^ptopic([0-9]*).*$ viewtopic.php?t=$1&view=previous
RewriteRule ^ntopic([0-9]*).*$ viewtopic.php?t=$1&view=next
RewriteRule ^ftopic([0-9]*)-([0-9]*)-([a-zA-Z]*)-([0-9]*).*$ viewtopic.php?t=$1&postdays=$2&postorder=$3&start=$4
RewriteRule ^ftopic([0-9]*)-([0-9]*).*$ viewtopic.php?t=$1&start=$2
RewriteRule ^ftopic([0-9]*).*$ viewtopic.php?t=$1
RewriteRule ^ftopic([0-9]*).html$ viewtopic.php?t=$1&start=$2&postdays=$3&postorder=$4&highlight=$5
RewriteRule ^sutra([0-9]*).*$ viewtopic.php?p=$1
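To check which script a static-looking URL ends up at without touching the server, you can mimic a few of the rules in PHP. This is only an approximation: map_static_url_sketch() is a made-up name, it covers a subset of the rules above, and it assumes the first matching rule decides the result (once a URL has been rewritten to a .php script, none of the later patterns match it anyway).

```php
<?php
// Hypothetical helper that imitates a subset of the .htaccess rules.
// Assumption: rules are tried top to bottom and the first match wins.
function map_static_url_sketch($path)
{
    $rules = array(
        '/^forums.*$/'                  => 'index.php',
        '/^ptopic([0-9]*).*$/'          => 'viewtopic.php?t=$1&view=previous',
        '/^ntopic([0-9]*).*$/'          => 'viewtopic.php?t=$1&view=next',
        '/^ftopic([0-9]*)-([0-9]*).*$/' => 'viewtopic.php?t=$1&start=$2',
        '/^ftopic([0-9]*).*$/'          => 'viewtopic.php?t=$1',
        '/^sutra([0-9]*).*$/'           => 'viewtopic.php?p=$1',
    );
    foreach ($rules as $pattern => $target) {
        if (preg_match($pattern, $path)) {
            return preg_replace($pattern, $target, $path);
        }
    }
    return $path; // no rule matched: leave the path alone
}

echo map_static_url_sketch('forums.html'), "\n";      // prints: index.php
echo map_static_url_sketch('ftopic12-15.html'), "\n"; // prints: viewtopic.php?t=12&start=15
echo map_static_url_sketch('ftopic12.html'), "\n";    // prints: viewtopic.php?t=12
echo map_static_url_sketch('ptopic7.html'), "\n";     // prints: viewtopic.php?t=7&view=previous
```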
FINISHED?
Then go back to https://www.tools.summitmedia.co.uk/spider/ and see how it's spidered. You should see that the SIDs are gone, HTML pages are created for each forum (so they can be easily indexed) and your topics are linked up.
Does it work for you??
_________________
----------------------------------------
Misohoni - We Love you Long Time
https://www.misohoni.com/forums/