Chris2005 wrote: @Steffi: Ah, OK. Then Google seems to classify the sitemaps.
What do you mean by that? Do you mean the division into URL sitemaps and sitemaps of sitemaps?
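In the sitemap protocol these two kinds differ by their root element. A URL sitemap uses <urlset> and lists pages. A sitemap index uses <sitemapindex> and lists other sitemap files. The following is a minimal PHP sketch that tells them apart. It assumes well-formed XML and an available DOM extension. sitemapKind() is a hypothetical helper and is not part of any tool mentioned in this thread.

```php
<?php
// Hypothetical helper: reports whether an XML document is a URL sitemap
// (root <urlset>) or a sitemap index (root <sitemapindex>).
function sitemapKind(string $xml): string
{
    if (trim($xml) === '') {
        return 'unknown'; // DOMDocument::loadXML() rejects empty input
    }
    $doc = new DOMDocument();
    // @ silences parser warnings; loadXML() returns false on malformed input.
    if (@$doc->loadXML($xml) === false || $doc->documentElement === null) {
        return 'unknown';
    }
    // localName ignores the namespace, so Google's old 0.84 namespace and the
    // sitemaps.org 0.9 namespace are both recognised.
    $root = $doc->documentElement->localName;
    return in_array($root, ['urlset', 'sitemapindex'], true) ? $root : 'unknown';
}

$urlSitemap = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/</loc></url></urlset>';
$indexSitemap = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<sitemap><loc>http://example.com/sitemap1.xml</loc></sitemap></sitemapindex>';

echo sitemapKind($urlSitemap), "\n";   // urlset
echo sitemapKind($indexSitemap), "\n"; // sitemapindex
```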

From the archive of the ABAKUS Online Marketing Forum.
Steffi wrote: What did you use to generate the XML? I still haven't found a good tool...
https://enarion.net/google/
Planned features
* Add a crawler that crawls the site over the web, so that dynamic links can be found - soon (Top1)
* Create a minimal version of phpSitemapNG that can be run as a cron job - soon (Top2)
* Add support for XML sitemap index files, a must for sites with more than 50,000 pages or a sitemap file larger than 10 MB - later (Top3)
* Add gz handling to compress the XML file - later (Top4)
* Add an XSD check against Google's XSD file - maybe
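Sitemap index support (Top3) comes down to two steps. First, split the URL list into chunks. Second, list the resulting files in a <sitemapindex>. The following is a minimal sketch that assumes PHP 7.4 or later and the sitemaps.org 0.9 namespace. buildSitemaps() and the sitemapN.xml file names are hypothetical. The sketch only enforces the URL count per file, not the size limit. It returns XML strings rather than writing files.

```php
<?php
// Hypothetical sketch: split a URL list into several <urlset> files and
// list those files in one <sitemapindex>. Returns XML strings; writing
// them to disk (and gzip-compressing them) is left to the caller.
function buildSitemaps(array $urls, string $baseUrl, int $maxPerFile = 50000): array
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    // Escape &, <, >, ' and " as the protocol requires.
    $esc = fn(string $s): string => htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $header = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";

    $files = [];
    foreach (array_chunk($urls, $maxPerFile) as $i => $chunk) {
        $xml = $header . '<urlset xmlns="' . $ns . '">' . "\n";
        foreach ($chunk as $url) {
            $xml .= '  <url><loc>' . $esc($url) . '</loc></url>' . "\n";
        }
        $files['sitemap' . ($i + 1) . '.xml'] = $xml . '</urlset>' . "\n";
    }

    // The index points to each chunk file by its absolute URL.
    $index = $header . '<sitemapindex xmlns="' . $ns . '">' . "\n";
    foreach (array_keys($files) as $name) {
        $loc = rtrim($baseUrl, '/') . '/' . $name;
        $index .= '  <sitemap><loc>' . $esc($loc) . '</loc></sitemap>' . "\n";
    }
    $index .= '</sitemapindex>' . "\n";

    return ['index' => $index, 'files' => $files];
}

// Usage: five URLs with at most two per file give three sitemap files.
$urls = [
    'http://example.com/?a=1&b=2',
    'http://example.com/b.html',
    'http://example.com/c.html',
    'http://example.com/d.html',
    'http://example.com/e.html',
];
$result = buildSitemaps($urls, 'http://example.com/', 2);
echo count($result['files']), "\n"; // 3
echo $result['index'];
```

In production the chunk size stays at the protocol maximum. The small value here only makes the split visible.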
Code:
<?php
$allow_dir = array();
$disallow_dir = array();
$disallow_file = array();
ini_set("memory_limit", "15M");
/*
This is phpSitemap, a PHP script that creates a Google Sitemap.
It can be downloaded from http://enarion.net/google/
License: LGPL
Requirements:
- create a file called sitemap.xml and make it writable for this script - all information will be stored in this file
Installation:
- see requirements
- nothing else at the moment
@author Tobias Kluge, enarion.net
@contributor Aditya Naik, so1o@so1o.net
@version 1.0, 2005-06-06 11:00 CET
@status working
TODO add a better handling of allowed/disallowed files & directories
TODO create more than one file if there are more than 1000 entries (at the moment there can only be 1000 entries in an XML file)
TODO add gz handling to compress xml file
CHANGELOG
0.5 integrated Aditya Naik's changes
0.5 initial GUI added
*/
if (!isset($_REQUEST['submit']) || $_REQUEST['submit'] == "") {
/* display start page */
$document_root = $_SERVER['DOCUMENT_ROOT'];
$http_host = 'http://' . $_SERVER['HTTP_HOST'];
$script = $_SERVER['SCRIPT_NAME'];
// rtrim() avoids "//sitemap.xml" when the script lives in the document root
$sitemap_url = rtrim(dirname($_SERVER['SCRIPT_NAME']), '/\\') . "/sitemap.xml";
/* list of allowed directories */
$allow_dir[] = "google";
/* list of disallowed directories */
$disallow_dir[] = "admin";
/* list of disallowed file types */
$disallow_file[] = ".inc";
$disallow_file[] = ".old";
$disallow_file[] = ".save";
$disallow_file[] = ".txt";
$disallow_file[] = ".js";
$disallow_file[] = "~";
$disallow_file[] = ".LCK";
$disallow_file[] = ".zip";
$disallow_file[] = ".ZIP";
$disallow_file[] = ".CSV";
$disallow_file[] = ".csv";
$disallow_file[] = ".css";
$disallow_file[] = ".class";
$disallow_file[] = ".jar";
$str_allow_dir = arrToString($allow_dir);
$str_disallow_dir = arrToString($disallow_dir);
$str_disallow_file = arrToString($disallow_file);
$priority = 0.5;
$msg = '
<form action="'.$script.'" method="post">
<fieldset style="padding: 10; width:500; border-color:#000099; border-width:2px; border-style:solid; ">
<legend style="color:#000099;"><b>Adapt this to your site</b></legend>
<table border="0" cellpadding="5" cellspacing="0" width="495">
<tr class="text">
<td width="250" valign="top"><label for="idocument_root" accesskey="D">Document root</label><br />
<font size="-1">path on server</font></td>
<td width="240">
<input class="required" type="Text" name="document_root" id="idocument_root" align="LEFT" size="50" value="'. $document_root .'"/>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="ihttp_host" accesskey="H">HTTP host</label><br />
<font size="-1">the url of your website</font></td>
<td width="240">
<input class="required" type="Text" name="http_host" id="ihttp_host" align="LEFT" size="50" value="'.$http_host.'"/>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="iallow_dir" accesskey="A">Allowed directories</label><br />
<font size="-1">these directories will be searched for files, which are added to the site index; put one entry per line</font></td>
<td width="240">
<textarea name="allow_dir" cols="40" rows="10" id="iallow_dir">'.$str_allow_dir.'</textarea>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="idisallow_dir" accesskey="D">Disallowed directories</label><br />
<font size="-1">these directories will NOT be searched, and their files will not be added to the site index; put one entry per line</font></td>
<td width="240">
<textarea name="disallow_dir" cols="40" rows="10" id="idisallow_dir">'.$str_disallow_dir.'</textarea>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="idisallow_file" accesskey="F">Disallowed file types</label><br />
<font size="-1">files whose names contain one of these strings will not be added to the site index; put one entry per line</font></td>
<td width="240">
<textarea name="disallow_file" cols="40" rows="10" id="idisallow_file">'.$str_disallow_file.'</textarea>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="isitemap_url" accesskey="S">Sitemap url</label><br />
<font size="-1">where to store the sitemap file, relative to your document root; the file must exist, be writable and be accessible to Googlebot!</font></td>
<td width="240">
<input type="Text" name="sitemap_url" id="isitemap_url" align="LEFT" size="50" value="'.$sitemap_url.'"/>
</td>
</tr>
<tr class="text">
<td width="250" valign="top"><label for="ipriority" accesskey="P">Priority</label><br />
<font size="-1">from 0.0 to 1.0, e.g. 0.5</font></td>
<td width="240">
<input type="Text" name="priority" id="ipriority" align="LEFT" size="50" value="'.$priority.'"/>
</td>
</tr>
<tr>
<td> </td>
<td><input type="Submit" value="Start" name="submit"></td>
</tr>
</table>
</fieldset>
</form>';
msg($msg);
} elseif ($_REQUEST['submit'] == "Start") {
// handle xml file creation
// get values from the form
$script = $_SERVER['SCRIPT_NAME']; // used by the "Submit to google" form below
$website = $_REQUEST['http_host'];
$page_root = $_REQUEST['document_root'];
$sitemap_file = $page_root . $_REQUEST['sitemap_url'];
$sitemap_url = $website . $_REQUEST['sitemap_url'];
$filehandle = @fopen($sitemap_file, 'w'); // 'w' truncates, so repeated runs do not append a second <urlset>
$msg = "";
if (!file_exists($sitemap_file) && $filehandle === FALSE) {
$msg = "File $sitemap_file does not exist and cannot be written; create file and set permission with chmod to 0666";
} elseif (!is_writable($sitemap_file) && $filehandle === FALSE) {
$msg = "File $sitemap_file does exist but cannot be written; change permission with chmod to 0666";
} elseif ($filehandle === FALSE) {
$msg = "Error while opening $sitemap_file for write access. Check existence and permission of file!";
}
if ($msg != "") {
msg ($msg);
return;
}
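if (isset($_REQUEST['priority']) && $_REQUEST['priority'] != "") {
$priority = max(0.0, min(1.0, (float) $_REQUEST['priority'])); // clamp to the valid range 0.0 to 1.0
} else {
$priority = 0.8;
}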
if ($_REQUEST['allow_dir'] != "") $allow_dir = toArray($_REQUEST['allow_dir']);
if ($_REQUEST['disallow_dir'] != "") $disallow_dir = toArray($_REQUEST['disallow_dir']);
if ($_REQUEST['disallow_file'] != "") $disallow_file = toArray($_REQUEST['disallow_file']);
$a = getFiles($page_root);
// only when sending to stdout :
// header('Content-type: application/xml; charset="utf-8"',true);
$output = "";
$output .= '<?xml version="1.0" encoding="UTF-8"?>';
$output .= '<urlset xmlns="http://www.google.com/schemas/sitemap/0.84">';
foreach ($a as $file) {
$lastmod = filemtime($page_root.$file); // date of last modification
// set lastmod in W3C datetime format (Y-m-d\TH:i:sP), using the UTC offset valid at $lastmod
$last_modification = date(DATE_W3C, $lastmod);
// set changefreq
$age = time() - $lastmod;
$change_freq = "monthly"; // default value
if ($age < 10) {
$change_freq = "always";
} elseif ($age < 60*60) {
$change_freq = "hourly";
} elseif ($age < 60*60*24) {
$change_freq = "daily";
} elseif ($age < 60*60*24*7) {
$change_freq = "weekly";
} elseif ($age < 60*60*24*31) { // longest month has 31 days
$change_freq = "monthly";
} elseif ($age < 60*60*24*365) {
$change_freq = "yearly";
} else {
$change_freq = "never";
}
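// XML-escape the URL (&, <, >, quotes); assumes UTF-8 file names, invalid bytes are substituted.
// utf8_encode() is dropped: it is deprecated as of PHP 8.2 and double-encodes UTF-8 input.
$output .= '
<url>
<loc>'. htmlspecialchars($website.$file, ENT_QUOTES | ENT_XML1 | ENT_SUBSTITUTE, 'UTF-8') . '</loc>
<lastmod>'. $last_modification .'</lastmod>
<changefreq>'. $change_freq .'</changefreq>
<priority>'. $priority .'</priority>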
</url>';
} // foreach
$output .= '</urlset>';
// write output to file
$res_write = fputs ($filehandle, $output);
fclose($filehandle);
if ($res_write === FALSE) {
msg ("Could not write the sitemap to $sitemap_file.");
return;
}
$msg = '<h1>Export successful</h1>'."\n";
$msg .= '<p><form action="' . $script . '" method="post">' ."\n".
'<input type="hidden" name="sitemap_file" value="'.$sitemap_file.'">' . "\n".
'<input type="hidden" name="sitemap_url" value="'.$sitemap_url.'">' . "\n".
'<input type="Submit" value="Submit to google" name="submit">' . "\n".
'</form></p>' . "\n";
$msg .= '<div align="left"><p>Exported ' . count($a) . ' entries to '.$sitemap_url.'<br><font color="#009900">'."\n";
if (count($a) > 1000) $msg .= '<font color="red">Only 1000 entries are allowed in one file at the moment! Not implemented: split result into files with only 1000 entries</font><br>';
foreach ($a as $file) {
$msg .="added $website$file<br>\n";
}
$msg .= '</font></p></div>';
msg($msg);
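} elseif ($_REQUEST['submit'] == "Submit to google"){
$sitemap_url = $_REQUEST['sitemap_url'];
// Historic ping endpoint; Google has since retired sitemap pings, so this request may fail today.
// Requires allow_url_fopen; @ suppresses the warning so the error branch below reports the failure.
$res = @fopen("http://www.google.com/webmasters/sitemaps/ping?sitemap=".urlencode($sitemap_url),"r");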
if ($res === FALSE) {
msg ("<h1>Error while submitting to google!</h1>");
return;
}
$str = fread($res, 10000);
fclose($res);
msg ("<h1>Successfully sent to Google!</h1>Google has been notified of your sitemap; this does not guarantee that your pages will be indexed.");
}// if
// misc functions
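function toArray($str, $delim = "\n") {
$res = array();
foreach (explode($delim, $str) as $entry) {
$entry = trim($entry); // also removes the \r of Windows line breaks
if ($entry !== "") $res[] = $entry; // skip empty lines; an empty pattern would match every file name
}
return $res;
}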
/* returns a string of all entries of array with delim */
function arrToString($array, $delim = "\n") {
$res = "";
if (is_array($array)) {
for ($i = 0; $i < count($array); $i++) {
$res .= $array[$i];
if ($i < (count($array)-1)) $res .= $delim;
}
}
return $res;
}
/* simple compare function: equals */
function ar_contains($key, $array) {
if (is_array($array) && count($array) > 0) {
foreach ($array as $val) {
if ($key == $val) {
return true;
}
}
}
return false;
}
/* better compare function: contains */
function fl_contains($key, $array) {
if (is_array($array) && count($array) > 0) {
foreach ($array as $val) {
if ($val === "") continue; // in PHP 8, strpos() finds an empty needle in every string
if (strpos($key, $val) !== FALSE) return true;
}
}
return false;
}
/* replaces the substring $old_offset with $offset in every element of the array */
function changeOffset($array, $old_offset, $offset) {
$res = array();
if (is_array($array) && count($array) > 0) {
foreach ($array as $val) {
$res[] = str_replace($old_offset, $offset, $val);
}
}
return $res;
}
/* this walks recursively through all allowed directories starting at page_root and
collects all files that match the filter criteria */
// taken from Lasse Dalegaard, http://php.net/opendir
function getFiles($directory, $directory_orig = "", $directory_offset="") {
global $disallow_dir, $disallow_file, $allow_dir;
if ($directory_orig == "") $directory_orig = $directory;
if($dir = opendir($directory)) {
// Create an array for all files found
$tmp = Array();
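// Add the files
while(($file = readdir($dir)) !== false) { // strict check, so an entry named "0" does not end the loop
// Skip ".", ".." and hidden entries (names starting with a dot)
if($file != "." && $file != ".." && $file[0] != '.' ) {
// If it's a directory, list all files within it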
if(is_dir($directory . "/" . $file)) {
$disallowed_abs = fl_contains($directory."/".$file, $disallow_dir); // handle entries given as paths
$disallowed = ar_contains($file, $disallow_dir); // handle entries given as plain directory names
$allowed_abs = fl_contains($directory."/".$file, $allow_dir);
$allowed = ar_contains($file, $allow_dir);
if ($disallowed || $disallowed_abs) continue;
if ($allowed_abs || $allowed){
$tmp2 = changeOffset(getFiles($directory . "/" . $file, $directory_orig, $directory_offset), $directory_orig, $directory_offset);
if(is_array($tmp2)) {
$tmp = array_merge($tmp, $tmp2);
}
}
} else { // files
if (fl_contains($file, $disallow_file)) continue;
array_push($tmp, str_replace($directory_orig, $directory_offset, $directory."/".$file));
}
}
}
// Finish off the function
closedir($dir);
return $tmp;
}
}
function msg($msg) {
echo '
<html>
<head>
<title>phpSitemap: create a google sitemap file -- powered by enarion.net</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<style type="text/css">
<!--
.required { background-color:#E0E0E0; }
Label {color:#000099; font-weight: bold; }
h1,h2 {color:#000099; }
body {color:#000000; font-family:helvetica; background-color:#ebb150; }
-->
</style>
</head>
<body>
<h1>phpSitemap: create a google sitemap file</h1>
<div align="center">'.$msg.'
</div>
<div align="center"><p>Copyright by enarion.net. This script is licensed under the LGPL and can be downloaded from
<a target="_blank" href="http://enarion.net/google/">enarion.net/google</a></p></div>
</body></html>
';
}
?>
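The per-file work in the Start branch (lastmod, changefreq and the <url> element) can be pulled out into small functions that are easy to test. The following is a minimal sketch that assumes PHP 7 or later. changeFreqForAge() and urlEntry() are hypothetical names and are not part of phpSitemap. The sketch keeps the script's age thresholds and uses the valid value daily. It also XML-escapes the URL and writes lastmod in UTC. Note that deriving changefreq from the age of the last change is the script's own heuristic, not a rule of the protocol.

```php
<?php
// Hypothetical sketch mirroring the script's age thresholds (age in seconds).
function changeFreqForAge(int $age): string
{
    $day = 60 * 60 * 24;
    if ($age < 10)         return 'always';
    if ($age < 60 * 60)    return 'hourly';
    if ($age < $day)       return 'daily';   // the protocol value is "daily", not "dayly"
    if ($age < 7 * $day)   return 'weekly';
    if ($age < 31 * $day)  return 'monthly'; // longest month has 31 days
    if ($age < 365 * $day) return 'yearly';
    return 'never';
}

// Builds one <url> element. gmdate() expresses lastmod in UTC, and DATE_W3C
// produces e.g. 1970-01-01T00:00:00+00:00. Priority is clamped to 0.0..1.0
// and rounded to one decimal place; %F ignores the locale's decimal separator.
function urlEntry(string $loc, int $mtime, float $priority, int $now): string
{
    $esc = htmlspecialchars($loc, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $priority = max(0.0, min(1.0, $priority));
    return "<url>\n"
        . "  <loc>{$esc}</loc>\n"
        . '  <lastmod>' . gmdate(DATE_W3C, $mtime) . "</lastmod>\n"
        . '  <changefreq>' . changeFreqForAge($now - $mtime) . "</changefreq>\n"
        . '  <priority>' . sprintf('%.1F', $priority) . "</priority>\n"
        . "</url>\n";
}

// A file last modified two hours before "now" gets changefreq daily.
echo urlEntry('http://example.com/a.php?x=1&y=2', 0, 0.5, 7200);
```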