openoffice-dev mailing list archives

From Peter Kovacs <pe...@apache.org>
Subject Re: Critical issue on forum.openoffice.org and Google Search
Date Tue, 12 May 2020 15:41:09 GMT
Okay, I had a short debug session with Dave and Humbedooh.

We are now sure that the crawlers are not blocked. The 301 Response 
comes from the fact that Yandex still defaults to http and not https.

After I added https to the URL, everything worked fine.
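
For anyone who wants to repeat the check, here is a minimal sketch (not what we ran in the session) of how one could tell a plain http-to-https upgrade apart from some other redirect. The function name is made up for illustration; the input values are the ones from the Yandex report and the curl output quoted further down.

```python
from urllib.parse import urlsplit


def is_https_upgrade(requested_url: str, status: int, location: str) -> bool:
    """Return True if the response only redirects http:// to https://
    on the same host and path (hypothetical helper, for illustration)."""
    if status not in (301, 302, 307, 308):
        return False
    src, dst = urlsplit(requested_url), urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
        and (src.path or "/") == (dst.path or "/")
    )


# Status code and Location header as reported for http://forum.openoffice.org/
print(is_https_upgrade(
    "http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"
))
```

If this prints True, the 301 is the ordinary upgrade to https and says nothing about a crawler being blocked.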

Wave also ran a curl request, which worked fine as well.


We have now agreed that I will pass the ball back to Google, with the 
feedback that this looks like a Google-internal issue.

The robots.txt has not been changed for 11 years. Yandex can crawl the 
URL and we can curl the web page. So we think it is a Google issue.
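
As a rough illustration of why we do not suspect the robots.txt, the rules can be evaluated with Python's standard urllib.robotparser. This is only a sketch: it uses a short excerpt of the file quoted further down, the helper name is hypothetical, and Googlebot's own matching logic may differ from this parser.

```python
from urllib.robotparser import RobotFileParser

# Short excerpt of the quoted robots.txt; the full file repeats
# the same pattern for each language subforum.
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/posting.php
Disallow: /en/forum/adm
"""

parser = RobotFileParser()
parser.parse(ROBOTS_EXCERPT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: may `agent` fetch `path` on the forum host?"""
    return parser.can_fetch(agent, "https://forum.openoffice.org" + path)


for path in ("/", "/en/forum/", "/en/forum/search.php"):
    print(path, allowed(path))
print("crawl delay:", parser.crawl_delay("Googlebot"))
```

With these rules the site root and /en/forum/ should come out as allowed, and only the listed scripts and directories as disallowed.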


I very much appreciated the quick session. Thanks.


all the Best

Peter

On 12.05.20 at 17:24, Dave Fisher wrote:
> It’s not an IP Ban. Infra tells me that would not be a 301.
>
> Ah-ha - here is the 301:
>
> % curl -D headers http://forum.openoffice.org/
> <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
> <html><head>
> <title>301 Moved Permanently</title>
> </head><body>
> <h1>Moved Permanently</h1>
> <p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
> </body></html>
>
> Surprising that they cannot shift from HTTP to HTTPS via a 301!
>
> Regards,
> Dave
>
>> On May 12, 2020, at 8:04 AM, Dave Fisher <wave@apache.org> wrote:
>>
>> Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html
>>
>> Please direct the Google engineer to that resource.
>>
>> Regards,
>> Dave
>>
>>> On May 12, 2020, at 7:55 AM, Dave Fisher <wave@apache.org> wrote:
>>>
>>> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
>>>
>>> curl -D headers https://forum.openoffice.org/ does return the correct page.
>>>
>>> The robots.txt is this:
>>>
>>> curl -D headers https://forum.openoffice.org/robots.txt
>>> User-agent: *
>>> Crawl-delay: 1
>>> Disallow: /en/forum/common.php
>>> Disallow: /en/forum/config.php
>>> Disallow: /en/forum/con.php
>>> Disallow: /en/forum/faq.php
>>> Disallow: /en/forum/mcp.php
>>> Disallow: /en/forum/memberlist.php
>>> Disallow: /en/forum/posting.php
>>> Disallow: /en/forum/report.php
>>> Disallow: /en/forum/search.php
>>> Disallow: /en/forum/style.php
>>> Disallow: /en/forum/ucp.php
>>> Disallow: /en/forum/viewonline.php
>>> Disallow: /en/forum/adm
>>> Disallow: /en/forum/cache
>>> Disallow: /en/forum/docs
>>> Disallow: /en/forum/files
>>> Disallow: /en/forum/images
>>> Disallow: /en/forum/includes
>>> Disallow: /en/forum/language
>>> Disallow: /en/forum/store
>>> Disallow: /en/forum/styles
>>> Disallow: /es/forum/common.php
>>> Disallow: /es/forum/config.php
>>> Disallow: /es/forum/con.php
>>> Disallow: /es/forum/faq.php
>>> Disallow: /es/forum/mcp.php
>>> Disallow: /es/forum/memberlist.php
>>> Disallow: /es/forum/posting.php
>>> Disallow: /es/forum/report.php
>>> Disallow: /es/forum/search.php
>>> Disallow: /es/forum/style.php
>>> Disallow: /es/forum/ucp.php
>>> Disallow: /es/forum/viewonline.php
>>> Disallow: /es/forum/adm
>>> Disallow: /es/forum/cache
>>> Disallow: /es/forum/docs
>>> Disallow: /es/forum/files
>>> Disallow: /es/forum/images
>>> Disallow: /es/forum/includes
>>> Disallow: /es/forum/language
>>> Disallow: /es/forum/store
>>> Disallow: /es/forum/styles
>>> Disallow: /fr/forum/common.php
>>> Disallow: /fr/forum/config.php
>>> Disallow: /fr/forum/con.php
>>> Disallow: /fr/forum/faq.php
>>> Disallow: /fr/forum/mcp.php
>>> Disallow: /fr/forum/memberlist.php
>>> Disallow: /fr/forum/posting.php
>>> Disallow: /fr/forum/report.php
>>> Disallow: /fr/forum/search.php
>>> Disallow: /fr/forum/style.php
>>> Disallow: /fr/forum/ucp.php
>>> Disallow: /fr/forum/viewonline.php
>>> Disallow: /fr/forum/adm
>>> Disallow: /fr/forum/cache
>>> Disallow: /fr/forum/docs
>>> Disallow: /fr/forum/files
>>> Disallow: /fr/forum/images
>>> Disallow: /fr/forum/includes
>>> Disallow: /fr/forum/language
>>> Disallow: /fr/forum/store
>>> Disallow: /fr/forum/styles
>>> Disallow: /fr/ci-joint
>>> Disallow: /hu/forum/common.php
>>> Disallow: /hu/forum/config.php
>>> Disallow: /hu/forum/con.php
>>> Disallow: /hu/forum/faq.php
>>> Disallow: /hu/forum/mcp.php
>>> Disallow: /hu/forum/memberlist.php
>>> Disallow: /hu/forum/posting.php
>>> Disallow: /hu/forum/report.php
>>> Disallow: /hu/forum/search.php
>>> Disallow: /hu/forum/style.php
>>> Disallow: /hu/forum/ucp.php
>>> Disallow: /hu/forum/viewonline.php
>>> Disallow: /hu/forum/adm
>>> Disallow: /hu/forum/cache
>>> Disallow: /hu/forum/docs
>>> Disallow: /hu/forum/files
>>> Disallow: /hu/forum/images
>>> Disallow: /hu/forum/includes
>>> Disallow: /hu/forum/language
>>> Disallow: /hu/forum/store
>>> Disallow: /hu/forum/styles
>>> Disallow: /ja/forum/common.php
>>> Disallow: /ja/forum/config.php
>>> Disallow: /ja/forum/con.php
>>> Disallow: /ja/forum/faq.php
>>> Disallow: /ja/forum/mcp.php
>>> Disallow: /ja/forum/memberlist.php
>>> Disallow: /ja/forum/posting.php
>>> Disallow: /ja/forum/report.php
>>> Disallow: /ja/forum/search.php
>>> Disallow: /ja/forum/style.php
>>> Disallow: /ja/forum/ucp.php
>>> Disallow: /ja/forum/viewonline.php
>>> Disallow: /ja/forum/adm
>>> Disallow: /ja/forum/cache
>>> Disallow: /ja/forum/docs
>>> Disallow: /ja/forum/files
>>> Disallow: /ja/forum/images
>>> Disallow: /ja/forum/includes
>>> Disallow: /ja/forum/language
>>> Disallow: /ja/forum/store
>>> Disallow: /ja/forum/styles
>>> Disallow: /test
>>> Disallow: /nl/forum/common.php
>>> Disallow: /nl/forum/config.php
>>> Disallow: /nl/forum/con.php
>>> Disallow: /nl/forum/faq.php
>>> Disallow: /nl/forum/mcp.php
>>> Disallow: /nl/forum/memberlist.php
>>> Disallow: /nl/forum/posting.php
>>> Disallow: /nl/forum/report.php
>>> Disallow: /nl/forum/search.php
>>> Disallow: /nl/forum/style.php
>>> Disallow: /nl/forum/ucp.php
>>> Disallow: /nl/forum/viewonline.php
>>> Disallow: /nl/forum/adm
>>> Disallow: /nl/forum/cache
>>> Disallow: /nl/forum/docs
>>> Disallow: /nl/forum/files
>>> Disallow: /nl/forum/images
>>> Disallow: /nl/forum/includes
>>> Disallow: /nl/forum/language
>>> Disallow: /nl/forum/store
>>> Disallow: /nl/forum/styles
>>> Disallow: /vi/forum/common.php
>>> Disallow: /vi/forum/config.php
>>> Disallow: /vi/forum/con.php
>>> Disallow: /vi/forum/faq.php
>>> Disallow: /vi/forum/mcp.php
>>> Disallow: /vi/forum/memberlist.php
>>> Disallow: /vi/forum/posting.php
>>> Disallow: /vi/forum/report.php
>>> Disallow: /vi/forum/search.php
>>> Disallow: /vi/forum/style.php
>>> Disallow: /vi/forum/ucp.php
>>> Disallow: /vi/forum/viewonline.php
>>> Disallow: /vi/forum/adm
>>> Disallow: /vi/forum/cache
>>> Disallow: /vi/forum/docs
>>> Disallow: /vi/forum/files
>>> Disallow: /vi/forum/images
>>> Disallow: /vi/forum/includes
>>> Disallow: /vi/forum/language
>>> Disallow: /vi/forum/store
>>> Disallow: /vi/forum/styles
>>> Disallow: /zh/forum/common.php
>>> Disallow: /zh/forum/config.php
>>> Disallow: /zh/forum/con.php
>>> Disallow: /zh/forum/faq.php
>>> Disallow: /zh/forum/mcp.php
>>> Disallow: /zh/forum/memberlist.php
>>> Disallow: /zh/forum/posting.php
>>> Disallow: /zh/forum/report.php
>>> Disallow: /zh/forum/search.php
>>> Disallow: /zh/forum/style.php
>>> Disallow: /zh/forum/ucp.php
>>> Disallow: /zh/forum/viewonline.php
>>> Disallow: /zh/forum/adm
>>> Disallow: /zh/forum/cache
>>> Disallow: /zh/forum/docs
>>> Disallow: /zh/forum/files
>>> Disallow: /zh/forum/images
>>> Disallow: /zh/forum/includes
>>> Disallow: /zh/forum/language
>>> Disallow: /zh/forum/store
>>> Disallow: /zh/forum/styles
>>>
>>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
>>>
>>> Forum search uses phpBB
>>>
>>> We haven’t allowed search engines to crawl forum.openoffice.org since before
>>> the Oracle donation to the ASF.
>>>
>>> Crawlers’ IP addresses might be blocked by ASF Infra if their use is excessive.
>>> That could give the 301.
>>>
>>> Regards,
>>> Dave
>>>
>>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <legine@posteo.de> wrote:
>>>>
>>>> Hello all,
>>>>
>>>>
>>>> What I figured out is that, according to the Google search tool, the URL
>>>> forum.openoffice.org is not reachable.
>>>>
>>>> So I checked with DuckDuckGo (my preferred search engine). They don't use
>>>> a crawler and point at the infrastructure of Google, Bing and Yandex.
>>>>
>>>> I then checked with Bing, but could not figure out how to check the bot's
>>>> feedback on a URL, so I moved on.
>>>>
>>>> I checked with Yandex. They have a search URL test page. I entered
>>>> forum.openoffice.org there.
>>>>
>>>> The Response is:
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>>> * Server: Apache/2.4.18 (Ubuntu)
>>>> * Location: https://forum.openoffice.org/
>>>> * Content-Length: 237
>>>> * Keep-Alive: timeout=15, max=100
>>>> * Connection: Keep-Alive
>>>> * Content-Type: text/html; charset=iso-8859-1
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>>
>>>> HTTP status code 	301 Moved Permanently
>>>> Server response time 	133 ms
>>>> IP address 	54.84.201.130
>>>> Encoding 	UTF-8(unicode-1-1-utf-8, UTF8)
>>>> Page size 	237 B
>>>>
>>>>
>>>> I am not sure what that means. The HTTP status code "Moved Permanently" reads
>>>> wrong. I just don't know whether this is the return code from our web server
>>>> or a response code from the crawler.
>>>> I will try to get hold of someone from Infra. Or I'll open a ticket.
>>>>
>>>>
>>>> All the best
>>>> Peter
>>>>
>>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>>> Hi Kay,
>>>>>
>>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>>> Hi Kay,
>>>>>>>
>>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>>> Hi Peter...
>>>>>>>>
>>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
>>>>>>>> ANY work with the Google Search apis on these sites in quite some time.
>>>>>>>>
>>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
>>>>>>>> Search until I saw this.
>>>>>>> I think, I added it to the list when we had a discussion about outdated
>>>>>>> information regarding SourceForge found by Google Search.
>>>>>>>
>>>>>>> But I don't have access to forum.openoffice.org, so I could never
>>>>>>> complete the step.
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>>    Matthias
>>>>>> OK. In the top level of the website source, there is a file called
>>>>>> "skeleton.html" which references the following bit of code --
>>>>>>
>>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>>>
>>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>>>> forgot), but this is an example of the google-analytics code snippet
>>>>>> that is used. Basically, this needs to be included in the site you
>>>>>> want analytics to be used on by putting it in the (header) files that
>>>>>> generate the site. And you might take a look at recent instructions
>>>>>> from Google. Things change.
>>>>>>
>>>>>> https://support.google.com/analytics/answer/1008080
>>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
>>>>> forum...
>>>>> The procedure for the Google Search Console is the same, it needs access
>>>>> to the root directory.
>>>>>
>>>>> Maybe Andrea can help if he is available again?
>>>>>
>>>>> Regards,
>>>>>
>>>>>   Matthias
>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Kay
>>>>>>
>>>>>>>> One of the Google Search admins for forum.openoffice.org could check
>>>>>>>> the current Google search apis that are in use on that site. Changes
>>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
>>>>>>>> robots.txt for that site is causing this. I don't think it requires a
>>>>>>>> response, but maybe some investigation.
>>>>>>>>
>>>>>>>> Just some ideas...
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Kay
>>>>>>>>
>>>>>>>>
>>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have received the following mail, probably because I am listed on the
>>>>>>>>> google-Analytics page.
>>>>>>>>>
>>>>>>>>> Does this have some action items? What can we answer Mr John Mueller?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> All the Best
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -------- Forwarded Message --------
>>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google Search
>>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>>> From:     John Mueller <johnmu@google.com>
>>>>>>>>> To:     morseidel@gmail.com, kay.schenk@gmail.com, leginee@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
>>>>>>>>>
>>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
>>>>>>>>> attention to a critical issue with your website, and how it's
>>>>>>>>> available for Google's web search.
>>>>>>>>>
>>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
>>>>>>>>> out of Google's search results, and will prevent new pages from being
>>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
>>>>>>>>> accidentally blocking these pages from Google Search due to a server
>>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
>>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>>>
>>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
>>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>>>
>>>>>>>>> Should you have any questions, feel free to contact me directly. For
>>>>>>>>> verification purposes, we are sending a copy of this message to your
>>>>>>>>> site's Search Console account.
>>>>>>>>>
>>>>>>>>> Thank you,
>>>>>>>>> John Mueller (johnmu@google.com <mailto:johnmu@google.com>)
>>>>>>>>> Webmaster Trends Analyst
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>>>>>>>> For additional commands, e-mail: dev-help@openoffice.apache.org
>>>>>>>>

