qxyhuiyuan 发表于 2020-6-25 14:20:16

【DiscuzX屏蔽*蜘蛛爬虫】以屏蔽蜘蛛Bytespider为例

大量站长*发现了Bytespider这个新型*蜘蛛爬虫,大量的对*进行抓取,其行为不压于持续的CC攻击,模拟移动手机用户大量抓取信息. GET信息的方式与爬虫类似, 但并未标注任何爬虫信息(站长们已经知道,这是某国内公司的,做新闻头条的). 请求速度经常导致服务器崩溃.
所以,我们就要采取防护措施,如何屏蔽搜索引擎机器人。


以DiscuzX 3.4 R20191201建站程序的来屏蔽*蜘蛛为例子。


首先感谢@dgqjj 提供的思路:robots.txt+程序识别UA屏蔽(php代码屏蔽、nginx代码屏蔽、.htaccess代码屏蔽等等,任选其一)+屏蔽蜘蛛IP段
提供给小白们来解决这个*蜘蛛问题,老白就不用看了,写的不一定好。


打开文件夹logs,查看里面的日志文件,发现了大量的*蜘蛛Bytespider访问爬取,造成大量的流量消耗。
http://www.discuz.net/data/attachment/forum/202001/22/162823f3z93x0ieiy60s0z.jpg.thumb.jpg
http://www.discuz.net/data/attachment/forum/202001/22/162826rf038umc0ixmuurx.jpg.thumb.jpg
这时候我们就要对他采取必要的封禁行动,四步禁止。
第一步:robots.txt规则封禁
文本中复制粘贴以下代码:
http://www.discuz.net/data/attachment/forum/202001/22/163228ky5y5kmyyax2z977.jpg.thumb.jpg
#
# robots.txt for Discuz! X3
#


User-agent: Googlebot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Googlebot-Image
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Baiduspider-news
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Baiduspider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Baiduspider-image
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: baiduboxapp
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sosospider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: bingbot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: 360Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: HaosouSpider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: yisouspider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: YoudaoBot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou Orion spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou News Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou blog
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou Spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou spider2
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou inst spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Sogou web spider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: EasouSpider
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: MSNBot
Disallow: /api/
Disallow: /data/
Disallow: /source/
Disallow: /install/
Disallow: /template/
Disallow: /config/
Disallow: /uc_client/
Disallow: /uc_server/
Disallow: /static/
Disallow: /admin.php
Disallow: /search.php
Disallow: /member.php
Disallow: /api.php
Disallow: /misc.php
Disallow: /connect.php
Disallow: /forum.php?mod=redirect*
Disallow: /forum.php?mod=post*
Disallow: /home.php?mod=spacecp*
Disallow: /userapp.php?mod=app&*
Disallow: /*?mod=misc*
Disallow: /*?mod=attachment*
Disallow: /*mobile=yes*


User-agent: Bytespider
Disallow: /


User-Agent: *
Disallow: /
第二步:程序识别UA屏蔽(.htaccess代码屏蔽)跳转与阻拦
当然*蜘蛛是不会遵守规则的,继续,文本中复制粘贴以下代码:
http://www.discuz.net/data/attachment/forum/202001/22/164953gj40jhd69hzzsyfm.jpg.thumb.jpg
# 将 RewriteEngine 模式打开
RewriteEngine On
# 修改以下语句中的 /discuz 为您的论坛目录地址,如果程序放在根目录中,请将 /discuz 修改为 /
RewriteBase /
# Rewrite 系统规则请勿修改
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^topic-(.+)\.html$ portal.php?mod=topic&topic=$1&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^article-(+)-(+)\.html$ portal.php?mod=view&aid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^forum-(\w+)-(+)\.html$ forum.php?mod=forumdisplay&fid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^thread-(+)-(+)-(+)\.html$ forum.php?mod=viewthread&tid=$1&extra=page\%3D$3&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^group-(+)-(+)\.html$ forum.php?mod=group&fid=$1&page=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^space-(username|uid)-(.+)\.html$ home.php?mod=space&$1=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^blog-(+)-(+)\.html$ home.php?mod=space&uid=$1&do=blog&id=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(fid|tid)-(+)\.html$ archiver/index.php?action=$1&value=$2&%1
RewriteCond %{QUERY_STRING} ^(.*)$
RewriteRule ^(+*)-(+)\.html$ plugin.php?id=$1:(去掉括号内容,文章变表情了)$2&%1
<IfModule mod_rewrite.c>
## 设置同时屏蔽多个*搜索引擎爬虫
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "^.*Bytespider.*|.*stagefright.*|.*ExtLinksBot.*|.*DuckDuckGo.*$"
RewriteRule ^(.*)$ https://weibo.com/u/7430037395?topnav=1&wvr=6&topsug=1&is_all=1
# RewriteRule ^(.*)$ -
</IfModule>

第三步:程序识别UA屏蔽(php代码屏蔽)过滤【这个可不选,优先服务器配置代码】
在入口处加这段php代码,然后引用。感谢@gaoyibobo 提供的代码
<?php
// 屏蔽过滤恶意爬虫蜘蛛机器人
$ua = $_SERVER['HTTP_USER_AGENT'];

$now_ua = array('Bytespider');

if(!$ua) {
      header("Content-type: text/html; charset=utf-8");
      die('请勿采集本站,如有疑问请联系客服!');
}else{
      foreach($now_ua as $value)
      if(strpos($ua,$value) !== false) {
                header("Content-type: text/html; charset=utf-8");
                die('请勿采集本站,如有疑问请联系客服!');
      }
}

?>

在目录/source/class/里建个robots.txt,保存后改成robots.php,之后引用到class_core.php
http://www.discuz.net/data/attachment/forum/202001/22/165944mmlvv9qokkmiizkk.jpg.thumb.jpg
首行加引用php代码:include('robots.php');
如图:
http://www.discuz.net/data/attachment/forum/202001/22/170235w4mwswtqgbeowsoe.jpg.thumb.jpg
第四步:屏蔽蜘蛛IP段(注意是*蜘蛛)禁止访问这个优先用服务器禁止爬虫IP段,其次选论坛
比如:
RewriteEngine On
RewriteCond %{http:X-Forwarded-For}&%{REMOTE_ADDR}&%{http:X-Real-IP} (6.6.6.6)
RewriteRule (.*) -

以下是论坛禁止IP方法:

已知的*蜘蛛Bytespider的IP段为以下(如果有新的,会逐渐更新):
220.243.136.*
220.243.135.*
110.249.202.*
110.249.201.*
111.225.148.*
111.225.149.*
60.8.123.*
http://www.discuz.net/data/attachment/forum/202001/22/170700jwyqoyacqnizw0no.jpg.thumb.jpg
http://www.discuz.net/data/attachment/forum/202001/22/170702n4i6cji4w8cjnnwc.jpg.thumb.jpg
这两个地方禁止*蜘蛛的IP段。


最后,总结一下:
第一步:robots.txt规则屏蔽
第二步:程序识别UA屏蔽(.htaccess代码屏蔽)跳转与阻拦
第三步:程序识别UA屏蔽(php代码屏蔽)过滤
第四步:屏蔽蜘蛛IP段(注意是*蜘蛛)禁止访问
如果有什么新方法、好方法,欢迎大家留言,给予遇到的人帮助,先就这样吧,祝大家大吉大利、万事顺心。


页: [1]
查看完整版本: 【DiscuzX屏蔽*蜘蛛爬虫】以屏蔽蜘蛛Bytespider为例