Googlebot finds robots.txt but cannot download it

Can anyone tell me what is wrong with this robots.txt? http://bizup.cloudapp.net/robots.txt The following is the error I get in Google Webmaster Tools:

Sitemap errors and warnings
Line    Status  Details
Errors  -   
Network unreachable: robots.txt unreachable
We were unable to crawl your Sitemap because we found a robots.txt file at the root of
your site but were unable to download it. Please ensure that it is accessible or remove
it completely.

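In fact, the link above maps to a route that goes to a Robots action. That action fetches the file from storage and returns the content as text/plain. Google says it cannot download the file. Is that the reason?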

4 Answers

藐刚

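It looks like Google is reading robots.txt fine, but your robots.txt then claims that http://bizup.cloudapp.net/robots.txt is also the URL of your XML sitemap, when the sitemap is really at http://bizup.cloudapp.net/sitemap.xml. The error seems to come from Google trying to parse robots.txt as an XML sitemap. You need to change your robots.txt to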

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml
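
A quick way to catch this kind of mistake before Google does is to parse the file and look at its Sitemap lines. The following is only a minimal sketch using Python's standard urllib.robotparser (site_maps() needs Python 3.8 or later); the function name suspicious_sitemaps and the BROKEN/FIXED samples are made up for illustration, and the check is a heuristic, not what Google itself runs.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def suspicious_sitemaps(robots_txt):
    """Return Sitemap URLs that point back at a robots.txt file (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap lines.
    sitemaps = parser.site_maps() or []
    return [url for url in sitemaps
            if urlsplit(url).path.lower().endswith("/robots.txt")]


# The robots.txt as originally served, and the corrected version.
BROKEN = """User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt
"""
FIXED = """User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/sitemap.xml
"""

if __name__ == "__main__":
    print(suspicious_sitemaps(BROKEN))
    print(suspicious_sitemaps(FIXED))
```

The check works on the text of the file only, so it can be run against a local copy without touching the live site.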

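EDIT: It actually goes deeper than this. Googlebot cannot download any page on your site at all. Here is the exception returned when Googlebot requests either robots.txt or the home page: Cookieless Forms Authentication is not supported for this application. Exception Details: System.Web.HttpException: Cookieless Forms Authentication is not supported for this application.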

[HttpException (0x80004005): Cookieless Forms Authentication is not supported for this application.]
AzureBright.MvcApplication.FormsAuthentication_OnAuthenticate(Object sender, FormsAuthenticationEventArgs args) in C:ProjectosAzureBrightWebRoleGlobal.asax.cs:129
System.Web.Security.FormsAuthenticationModule.OnAuthenticate(FormsAuthenticationEventArgs e) +11336832
System.Web.Security.FormsAuthenticationModule.OnEnter(Object source, EventArgs eventArgs) +88
System.Web.SyncEventExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +80
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +266

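FormsAuthentication is trying to use cookieless mode because it recognizes that Googlebot does not support cookies, but something in your FormsAuthentication_OnAuthenticate method then throws an exception because it does not want to accept cookieless authentication. I think the easiest way around this is to change the following in web.config, which stops FormsAuthentication from ever trying to use cookieless mode...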

<authentication mode="Forms"> 
    <forms cookieless="UseCookies" ...>
    ...

寇剩

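I fixed this problem in a simple way: I just added a robots.txt file (in the same directory as my index.html file) that allows all access. I had left it out, intending that to mean all access was allowed, but maybe Google Webmaster Tools then found another robots.txt controlled by my ISP? So for some ISPs at least, it seems you should have a robots.txt file even if you do not want to exclude any bots, just to prevent this possible glitch.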

暑袜眠退

The script that generates the robots.txt file has a problem. When Googlebot accesses the file, it gets a 500 Internal Server Error. Here are the results of the header check:

REQUEST: http://bizup.cloudapp.net/robots.txt
GET /robots.txt HTTP/1.1
Connection: Keep-Alive
Keep-Alive: 300
Accept: */*
Host: bizup.cloudapp.net
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

SERVER RESPONSE: 500 Internal Server Error
Cache-Control: private
Content-Type: text/html; charset=utf-8
Server: Microsoft-IIS/7.0
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Date: Thu, 19 Aug 2010 16:52:09 GMT
Content-Length: 4228
Final Destination Page

You can test the headers at http://www.seoconsultants.com/tools/headers/#Report
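
If you would rather reproduce the header check yourself, something along these lines should be close. It is a rough sketch with Python's standard urllib, not the tool linked above; build_request and fetch_status are names invented for this example, and the server may of course behave differently today than it did in the capture.

```python
import urllib.error
import urllib.request

# User-Agent string taken from the header capture in this answer.
GOOGLEBOT_UA = ("Mozilla/5.0 (compatible; Googlebot/2.1; "
                "+http://www.google.com/bot.html)")


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Build a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent, "Accept": "*/*"})


def fetch_status(url, user_agent=GOOGLEBOT_UA, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx; the status code is still on the exception.
        return error.code


if __name__ == "__main__":
    url = "http://bizup.cloudapp.net/robots.txt"
    print("Googlebot UA:", fetch_status(url))
    print("Browser-like UA:", fetch_status(url, user_agent="Mozilla/5.0"))
```

Comparing the two status codes is the useful part: if the browser-like request succeeds while the Googlebot one returns 500, the failure depends on the User-Agent rather than on the file itself.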

校勒魏寡

I have no problem getting your robots.txt

User-agent: *
Allow: /
Sitemap: http://bizup.cloudapp.net/robots.txt

But isn't that making a recursive robots.txt call? The Sitemap is supposed to be an XML file, see Wikipedia
