
1. The recommended method: PHP code that determines whether a visitor is a search engine spider or a human, excerpted from Discuz! X3.2.

<?php
// Returns true when the User-Agent looks like a search engine spider.
function checkrobot($useragent = "") {
    static $kw_spiders = array("bot", "crawl", "spider", "slurp", "sohu-search", "lycos", "robozilla");
    static $kw_browsers = array("msie", "netscape", "opera", "konqueror", "mozilla");
    $useragent = strtolower(empty($useragent) ? $_SERVER["HTTP_USER_AGENT"] : $useragent);
    // A browser keyword with no URL in the string is treated as a human visitor.
    if (strpos($useragent, "http://") === false && dstrpos($useragent, $kw_browsers)) return false;
    if (dstrpos($useragent, $kw_spiders)) return true;
    return false;
}
// Returns true (or the matched item when $returnvalue is set) if $string contains any item of $arr.
function dstrpos($string, $arr, $returnvalue = false) {
    if (empty($string)) return false;
    foreach ((array) $arr as $v) {
        if (strpos($string, $v) !== false) {
            $return = $returnvalue ? $v : true;
            return $return;
        }
    }
    return false;
}
if (checkrobot()) {
    echo "robot crawler";
} else {
    echo "person";
}
?>

In practice, you can use it like this to run code only when the visitor is not a search engine:

<?php
if (!checkrobot()) {
    // do something
}
?>
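
To see how the two keyword lists interact, the sketch below re-implements the same rule in a standalone helper and runs it on a few sample User-Agent strings. The helper name looks_like_robot and the sample strings are assumptions for illustration (the ExampleBot string is made up); this is not the Discuz code itself.

```php
<?php
// Hypothetical helper mirroring the rule used by checkrobot(); the name is an assumption.
function looks_like_robot(string $useragent): bool
{
    $spiders  = ['bot', 'crawl', 'spider', 'slurp', 'sohu-search', 'lycos', 'robozilla'];
    $browsers = ['msie', 'netscape', 'opera', 'konqueror', 'mozilla'];
    $ua = strtolower($useragent);

    // True if $haystack contains at least one of $needles.
    $contains = function (string $haystack, array $needles): bool {
        foreach ($needles as $needle) {
            if (strpos($haystack, $needle) !== false) {
                return true;
            }
        }
        return false;
    };

    // A browser keyword with no "http://" URL in the string counts as a human visitor.
    if (strpos($ua, 'http://') === false && $contains($ua, $browsers)) {
        return false;
    }
    return $contains($ua, $spiders);
}

$samples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
    'Mozilla/5.0 (compatible; ExampleBot/1.0; +https://example.com/bot)', // made-up bot
    'Wget/1.21',
];
foreach ($samples as $ua) {
    echo (looks_like_robot($ua) ? 'robot ' : 'person') . '  ' . $ua . PHP_EOL;
}
```

Under this rule only the first sample is reported as a robot. The made-up ExampleBot string is treated as a person because it contains "mozilla" and advertises an https:// URL, while only "http://" is checked, and the Wget string matches neither keyword list. Anyone adapting the function may want to extend the lists with that in mind.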

2. The second method:

Use PHP to record spider visits in a log file for access statistics.

$useragent = addslashes(strtolower($_SERVER["HTTP_USER_AGENT"]));
// More specific keywords must come before shorter ones that would also match.
if (strpos($useragent, "googlebot") !== false) {$bot = "google";}
elseif (strpos($useragent, "mediapartners-google") !== false) {$bot = "google adsense";}
elseif (strpos($useragent, "baiduspider") !== false) {$bot = "baidu";}
elseif (strpos($useragent, "sogou spider") !== false) {$bot = "sogou";}
elseif (strpos($useragent, "sogou web") !== false) {$bot = "sogou web";}
elseif (strpos($useragent, "sosospider") !== false) {$bot = "soso";}
elseif (strpos($useragent, "360spider") !== false) {$bot = "360spider";}
elseif (strpos($useragent, "yahoo") !== false) {$bot = "yahoo";}
elseif (strpos($useragent, "msnbot") !== false) {$bot = "msnbot";}
elseif (strpos($useragent, "msn") !== false) {$bot = "msn";}
elseif (strpos($useragent, "sohu") !== false) {$bot = "sohu";}
elseif (strpos($useragent, "yodaobot") !== false) {$bot = "yodao";}
elseif (strpos($useragent, "twiceler") !== false) {$bot = "twiceler";}
elseif (strpos($useragent, "ia_archiver") !== false) {$bot = "alexa";}
elseif (strpos($useragent, "iaarchiver") !== false) {$bot = "alexa";}
elseif (strpos($useragent, "slurp") !== false) {$bot = "yahoo";}
elseif (strpos($useragent, "bot") !== false) {$bot = "Other Spider";}
if (isset($bot)) {
    // Append one tab-separated line per visit: time, IP, spider name, requested URL.
    $fp = @fopen("bot.txt", "a");
    if ($fp) {
        fwrite($fp, date("Y-m-d H:i:s") . "\t" . $_SERVER["REMOTE_ADDR"] . "\t" . $bot . "\t" . "http://" . $_SERVER["SERVER_NAME"] . $_SERVER["REQUEST_URI"] . "\r\n");
        fclose($fp);
    }
}
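
The same idea can be written as an ordered lookup table, which makes the matching order explicit and keeps the log formatting separate from the detection. The following is a minimal sketch under stated assumptions: the function names detect_bot and format_bot_log_line, the shortened keyword table, and the sample IP address and URL are all hypothetical.

```php
<?php
// Hypothetical lookup table: keyword => label. Order matters, so more specific
// keywords ("msnbot") come before shorter ones that would also match ("msn", "bot").
function detect_bot(string $useragent): ?string
{
    $signatures = [
        'googlebot'            => 'google',
        'mediapartners-google' => 'google adsense',
        'sogou spider'         => 'sogou',
        'yodaobot'             => 'yodao',
        'msnbot'               => 'msnbot',
        'msn'                  => 'msn',
        'slurp'                => 'yahoo',
        'bot'                  => 'Other Spider',
    ];
    $ua = strtolower($useragent);
    foreach ($signatures as $needle => $label) {
        if (strpos($ua, $needle) !== false) {
            return $label;
        }
    }
    return null; // no known keyword found
}

// Build one tab-separated log line: time, IP, spider name, URL.
function format_bot_log_line(string $bot, string $ip, string $url, int $timestamp): string
{
    return date('Y-m-d H:i:s', $timestamp) . "\t" . $ip . "\t" . $bot . "\t" . $url . "\r\n";
}

$bot = detect_bot('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)');
if ($bot !== null) {
    // Sample values; a real page would take these from $_SERVER and append the line to a file.
    echo format_bot_log_line($bot, '192.0.2.1', 'http://example.com/index.php', time());
}
```

Returning null for unknown visitors lets the caller skip logging with a single check, much as the isset($bot) test does in the snippet above.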

3. The third method:

We can use HTTP_USER_AGENT to determine whether the visitor is a spider. Each search engine spider has its own distinctive identifier in the User-Agent string; some are listed below.

function is_crawler() {
    $useragent = strtolower($_SERVER["HTTP_USER_AGENT"]);
    $spiders = array(
        "googlebot",    // Google crawler
        "baiduspider",  // Baidu crawler
        "yahoo! slurp", // Yahoo crawler
        "yodaobot",     // Youdao crawler
        "msnbot"        // Bing crawler
        // more crawler keywords
    );
    foreach ($spiders as $spider) {
        $spider = strtolower($spider);
        if (strpos($useragent, $spider) !== false) {
            return true;
        }
    }
    return false;
}

The following PHP code includes more spider identifiers:

function iscrawler() {
    $agent = strtolower($_SERVER["HTTP_USER_AGENT"]);
    if (!empty($agent)) {
        // Some entries, such as "java (often spam bot)", are descriptive labels
        // rather than real User-Agent substrings and are unlikely to match.
        $spidersite = array(
            "tencenttraveler", "baiduspider+", "baidugame", "googlebot", "msnbot",
            "sosospider+", "sogou web spider", "ia_archiver", "yahoo! slurp", "youdaobot",
            "yahoo slurp", "java (often spam bot)", "baiduspider", "voila", "yandex bot",
            "bspider", "twiceler", "sogou spider", "speedy spider", "google adsense",
            "heritrix", "python-urllib", "alexa (ia archiver)", "ask", "exabot",
            "custo", "outfoxbot/yodaobot", "yacy", "surveybot", "legs",
            "lwp-trivial", "nutch", "stackrambler", "the web archive (ia archiver)", "perl tool",
            "mj12bot", "netcraft", "msiecrawler", "wget tools", "larbin", "fish search",
        );
        foreach ($spidersite as $val) {
            $str = strtolower($val);
            if (strpos($agent, $str) !== false) {
                return true;
            }
        }
    }
    // Empty User-Agent or no keyword matched.
    return false;
}
if (iscrawler()) {
    echo "Hello spider!";
} else {
    echo "You are not a spider!";
}
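
With a long keyword list, one alternative to looping over strpos() is to compile the keywords into a single case-insensitive regular expression. This is a sketch of that swapped-in technique, not the original author's approach; the function names build_spider_pattern and matches_spider and the short sample list are assumptions.

```php
<?php
// Hypothetical helper: turn a keyword list into one case-insensitive regex.
// preg_quote() escapes characters such as "+" and "!" so they match literally.
function build_spider_pattern(array $keywords): string
{
    $quoted = array_map(function (string $keyword): string {
        return preg_quote($keyword, '/');
    }, $keywords);
    return '/' . implode('|', $quoted) . '/i';
}

// True when the User-Agent is non-empty and contains any of the keywords.
function matches_spider(string $useragent, string $pattern): bool
{
    return $useragent !== '' && preg_match($pattern, $useragent) === 1;
}

$pattern = build_spider_pattern(['googlebot', 'yahoo! slurp', 'msnbot', 'sosospider+']);

$ua = 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)';
echo matches_spider($ua, $pattern) ? "Hello spider!" : "You are not a spider!";
echo PHP_EOL;
```

Because the pattern carries the /i flag, neither the keywords nor the User-Agent need to be lowercased first.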

4. The fourth method:

<?php
$flag = false;
// Lowercase the User-Agent so it can be compared with the lowercase keywords below.
$tmp = strtolower($_SERVER["HTTP_USER_AGENT"]);
if (strpos($tmp, "googlebot") !== false) {
    $flag = true;
} else if (strpos($tmp, "baiduspider") !== false) {
    $flag = true;
} else if (strpos($tmp, "yahoo! slurp") !== false) {
    $flag = true;
} else if (strpos($tmp, "msnbot") !== false) {
    $flag = true;
} else if (strpos($tmp, "sosospider") !== false) {
    $flag = true;
} else if (strpos($tmp, "yodaobot") !== false || strpos($tmp, "outfoxbot") !== false) {
    $flag = true;
} else if (strpos($tmp, "sogou web spider") !== false || strpos($tmp, "sogou orion spider") !== false) {
    $flag = true;
} else if (strpos($tmp, "fast-webcrawler") !== false) {
    $flag = true;
} else if (strpos($tmp, "gaisbot") !== false) {
    $flag = true;
} else if (strpos($tmp, "ia_archiver") !== false) {
    $flag = true;
} else if (strpos($tmp, "altavista") !== false) {
    $flag = true;
} else if (strpos($tmp, "lycos_spider") !== false) {
    $flag = true;
} else if (strpos($tmp, "inktomi slurp") !== false) {
    $flag = true;
}
if ($flag == false) {
    // Not a spider: stop here.
    // $_SERVER["REQUEST_URI"] is the path after the domain name.
    exit();
}
?>
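
All of the snippets above read the HTTP_USER_AGENT entry directly, but a client is free to omit the User-Agent header, in which case PHP does not set that key. A minimal sketch of a guard follows; the helper name current_user_agent is an assumption, not part of the original code.

```php
<?php
// Hypothetical helper: return the lowercased User-Agent, or an empty string
// when the request did not send one. Pass $_SERVER as the argument.
function current_user_agent(array $server): string
{
    if (!isset($server['HTTP_USER_AGENT'])) {
        return '';
    }
    return strtolower((string) $server['HTTP_USER_AGENT']);
}

$useragent = current_user_agent($_SERVER);
if ($useragent === '') {
    echo "no User-Agent header" . PHP_EOL;
} else {
    echo $useragent . PHP_EOL;
}
```

Whether an empty User-Agent should count as a spider or a person is a policy choice for the site; the detection functions above treat it as not a spider.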