openoffice-dev mailing list archives

From Dave Fisher <w...@apache.org>
Subject Re: Critical issue on forum.openoffice.org and Google Search
Date Tue, 12 May 2020 15:24:57 GMT
It’s not an IP Ban. Infra tells me that would not be a 301.

Ah-ha - here is the 301:

% curl -D headers http://forum.openoffice.org/ 
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://forum.openoffice.org/">here</a>.</p>
</body></html>

Surprising that they cannot shift from HTTP to HTTPS via a 301!
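
To make the point concrete, here is a minimal Python sketch of the check a client could apply to the response above: it decides whether a redirect is nothing more than a scheme upgrade on the same host and path. The function name `is_https_upgrade` is hypothetical and only for illustration. The inputs are the request URL from the curl call and the target URL from the 301 response.

```python
from urllib.parse import urlsplit


def is_https_upgrade(request_url: str, status: int, location: str) -> bool:
    """Return True when a permanent redirect only switches http to https.

    Hypothetical helper for illustration; it does no network I/O.
    """
    if status not in (301, 308):  # permanent redirects only
        return False
    src, dst = urlsplit(request_url), urlsplit(location)
    return (
        src.scheme == "http"
        and dst.scheme == "https"
        and src.hostname == dst.hostname
        and (src.path or "/") == (dst.path or "/")
    )


# Values taken from the curl output above.
print(is_https_upgrade(
    "http://forum.openoffice.org/", 301, "https://forum.openoffice.org/"
))
```

A redirect to a different host, such as forums.openoffice.org, would not count as a plain upgrade under this check.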

Regards,
Dave 

> On May 12, 2020, at 8:04 AM, Dave Fisher <wave@apache.org> wrote:
> 
> Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html
> 
> Please direct the Google engineer to that resource.
> 
> Regards,
> Dave
> 
>> On May 12, 2020, at 7:55 AM, Dave Fisher <wave@apache.org> wrote:
>> 
>> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
>> 
>> curl -D headers https://forum.openoffice.org/ does return the correct page.
>> 
>> The robots.txt is this:
>> 
>> curl -D headers https://forum.openoffice.org/robots.txt
>> User-agent: *
>> Crawl-delay: 1
>> Disallow: /en/forum/common.php
>> Disallow: /en/forum/config.php
>> Disallow: /en/forum/con.php
>> Disallow: /en/forum/faq.php
>> Disallow: /en/forum/mcp.php
>> Disallow: /en/forum/memberlist.php
>> Disallow: /en/forum/posting.php
>> Disallow: /en/forum/report.php
>> Disallow: /en/forum/search.php
>> Disallow: /en/forum/style.php
>> Disallow: /en/forum/ucp.php
>> Disallow: /en/forum/viewonline.php
>> Disallow: /en/forum/adm
>> Disallow: /en/forum/cache
>> Disallow: /en/forum/docs
>> Disallow: /en/forum/files
>> Disallow: /en/forum/images
>> Disallow: /en/forum/includes
>> Disallow: /en/forum/language
>> Disallow: /en/forum/store
>> Disallow: /en/forum/styles
>> Disallow: /es/forum/common.php
>> Disallow: /es/forum/config.php
>> Disallow: /es/forum/con.php
>> Disallow: /es/forum/faq.php
>> Disallow: /es/forum/mcp.php
>> Disallow: /es/forum/memberlist.php
>> Disallow: /es/forum/posting.php
>> Disallow: /es/forum/report.php
>> Disallow: /es/forum/search.php
>> Disallow: /es/forum/style.php
>> Disallow: /es/forum/ucp.php
>> Disallow: /es/forum/viewonline.php
>> Disallow: /es/forum/adm
>> Disallow: /es/forum/cache
>> Disallow: /es/forum/docs
>> Disallow: /es/forum/files
>> Disallow: /es/forum/images
>> Disallow: /es/forum/includes
>> Disallow: /es/forum/language
>> Disallow: /es/forum/store
>> Disallow: /es/forum/styles
>> Disallow: /fr/forum/common.php
>> Disallow: /fr/forum/config.php
>> Disallow: /fr/forum/con.php
>> Disallow: /fr/forum/faq.php
>> Disallow: /fr/forum/mcp.php
>> Disallow: /fr/forum/memberlist.php
>> Disallow: /fr/forum/posting.php
>> Disallow: /fr/forum/report.php
>> Disallow: /fr/forum/search.php
>> Disallow: /fr/forum/style.php
>> Disallow: /fr/forum/ucp.php
>> Disallow: /fr/forum/viewonline.php
>> Disallow: /fr/forum/adm
>> Disallow: /fr/forum/cache
>> Disallow: /fr/forum/docs
>> Disallow: /fr/forum/files
>> Disallow: /fr/forum/images
>> Disallow: /fr/forum/includes
>> Disallow: /fr/forum/language
>> Disallow: /fr/forum/store
>> Disallow: /fr/forum/styles
>> Disallow: /fr/ci-joint
>> Disallow: /hu/forum/common.php
>> Disallow: /hu/forum/config.php
>> Disallow: /hu/forum/con.php
>> Disallow: /hu/forum/faq.php
>> Disallow: /hu/forum/mcp.php
>> Disallow: /hu/forum/memberlist.php
>> Disallow: /hu/forum/posting.php
>> Disallow: /hu/forum/report.php
>> Disallow: /hu/forum/search.php
>> Disallow: /hu/forum/style.php
>> Disallow: /hu/forum/ucp.php
>> Disallow: /hu/forum/viewonline.php
>> Disallow: /hu/forum/adm
>> Disallow: /hu/forum/cache
>> Disallow: /hu/forum/docs
>> Disallow: /hu/forum/files
>> Disallow: /hu/forum/images
>> Disallow: /hu/forum/includes
>> Disallow: /hu/forum/language
>> Disallow: /hu/forum/store
>> Disallow: /hu/forum/styles
>> Disallow: /ja/forum/common.php
>> Disallow: /ja/forum/config.php
>> Disallow: /ja/forum/con.php
>> Disallow: /ja/forum/faq.php
>> Disallow: /ja/forum/mcp.php
>> Disallow: /ja/forum/memberlist.php
>> Disallow: /ja/forum/posting.php
>> Disallow: /ja/forum/report.php
>> Disallow: /ja/forum/search.php
>> Disallow: /ja/forum/style.php
>> Disallow: /ja/forum/ucp.php
>> Disallow: /ja/forum/viewonline.php
>> Disallow: /ja/forum/adm
>> Disallow: /ja/forum/cache
>> Disallow: /ja/forum/docs
>> Disallow: /ja/forum/files
>> Disallow: /ja/forum/images
>> Disallow: /ja/forum/includes
>> Disallow: /ja/forum/language
>> Disallow: /ja/forum/store
>> Disallow: /ja/forum/styles
>> Disallow: /test
>> Disallow: /nl/forum/common.php
>> Disallow: /nl/forum/config.php
>> Disallow: /nl/forum/con.php
>> Disallow: /nl/forum/faq.php
>> Disallow: /nl/forum/mcp.php
>> Disallow: /nl/forum/memberlist.php
>> Disallow: /nl/forum/posting.php
>> Disallow: /nl/forum/report.php
>> Disallow: /nl/forum/search.php
>> Disallow: /nl/forum/style.php
>> Disallow: /nl/forum/ucp.php
>> Disallow: /nl/forum/viewonline.php
>> Disallow: /nl/forum/adm
>> Disallow: /nl/forum/cache
>> Disallow: /nl/forum/docs
>> Disallow: /nl/forum/files
>> Disallow: /nl/forum/images
>> Disallow: /nl/forum/includes
>> Disallow: /nl/forum/language
>> Disallow: /nl/forum/store
>> Disallow: /nl/forum/styles
>> Disallow: /vi/forum/common.php
>> Disallow: /vi/forum/config.php
>> Disallow: /vi/forum/con.php
>> Disallow: /vi/forum/faq.php
>> Disallow: /vi/forum/mcp.php
>> Disallow: /vi/forum/memberlist.php
>> Disallow: /vi/forum/posting.php
>> Disallow: /vi/forum/report.php
>> Disallow: /vi/forum/search.php
>> Disallow: /vi/forum/style.php
>> Disallow: /vi/forum/ucp.php
>> Disallow: /vi/forum/viewonline.php
>> Disallow: /vi/forum/adm
>> Disallow: /vi/forum/cache
>> Disallow: /vi/forum/docs
>> Disallow: /vi/forum/files
>> Disallow: /vi/forum/images
>> Disallow: /vi/forum/includes
>> Disallow: /vi/forum/language
>> Disallow: /vi/forum/store
>> Disallow: /vi/forum/styles
>> Disallow: /zh/forum/common.php
>> Disallow: /zh/forum/config.php
>> Disallow: /zh/forum/con.php
>> Disallow: /zh/forum/faq.php
>> Disallow: /zh/forum/mcp.php
>> Disallow: /zh/forum/memberlist.php
>> Disallow: /zh/forum/posting.php
>> Disallow: /zh/forum/report.php
>> Disallow: /zh/forum/search.php
>> Disallow: /zh/forum/style.php
>> Disallow: /zh/forum/ucp.php
>> Disallow: /zh/forum/viewonline.php
>> Disallow: /zh/forum/adm
>> Disallow: /zh/forum/cache
>> Disallow: /zh/forum/docs
>> Disallow: /zh/forum/files
>> Disallow: /zh/forum/images
>> Disallow: /zh/forum/includes
>> Disallow: /zh/forum/language
>> Disallow: /zh/forum/store
>> Disallow: /zh/forum/styles
>> 
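
One way to see what these rules actually permit is to feed them to a parser. The sketch below uses Python's standard `urllib.robotparser` on a short excerpt of the file above, so it runs offline. The excerpt is shortened for readability, and `/en/forum/viewtopic.php` is an assumption (the usual phpBB topic page) that does not appear in the listing.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt quoted above (English section only).
ROBOTS_EXCERPT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /en/forum/adm
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_EXCERPT)
base = "https://forum.openoffice.org"

# viewtopic.php is an assumed path; it is not listed in the file above.
for path in ("/", "/en/forum/viewtopic.php", "/en/forum/search.php"):
    print(path, parser.can_fetch("Googlebot", base + path))
```

Under these rules only the listed scripts and directories are disallowed; paths that match no `Disallow` prefix remain fetchable.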
>> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
>> 
>> Forum search uses phpBB
>> 
>> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
>> 
>> Crawlers' IP addresses might be blocked by ASF Infra if their use is excessive. That could give the 301.
>> 
>> Regards,
>> Dave
>> 
>>> On May 12, 2020, at 3:55 AM, Peter Kovacs <legine@posteo.de> wrote:
>>> 
>>> Hello all,
>>> 
>>> 
>>> What I found is that, according to the Google search tool, the URL forum.openoffice.org is not reachable.
>>> 
>>> So I checked with DuckDuckGo (my preferred search engine). They don't run their own crawler and point to the infrastructure of Google, Bing and Yandex.
>>> 
>>> I then checked with Bing, but could not figure out how to check the bot's feedback on a URL, so I moved on.
>>> 
>>> I checked with Yandex. They have a search URL test page. I entered forum.openoffice.org there.
>>> 
>>> The Response is:
>>> 
>>> ------------------------------------------------------------------------
>>> 
>>> * Date: Tue, 12 May 2020 10:37:47 GMT
>>> * Server: Apache/2.4.18 (Ubuntu)
>>> * Location: https://forum.openoffice.org/
>>> * Content-Length: 237
>>> * Keep-Alive: timeout=15, max=100
>>> * Connection: Keep-Alive
>>> * Content-Type: text/html; charset=iso-8859-1
>>> 
>>> ------------------------------------------------------------------------
>>> 
>>> 
>>> HTTP status code 	301 Moved Permanently
>>> Server response time 	133 ms
>>> IP address 	54.84.201.130
>>> Encoding 	UTF-8(unicode-1-1-utf-8, UTF8)
>>> Page size 	237 B
>>> 
>>> 
>>> I am not sure what that means. The HTTP status code "Moved Permanently" reads wrong. I just don't know if this is the return code from our web server or a response code from the crawler.
>>> I will try to get someone from Infra, or I'll open a ticket.
>>> 
>>> 
>>> All the best
>>> Peter
>>> 
>>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>>> Hi Kay,
>>>> 
>>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>>> Hi Kay,
>>>>>> 
>>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>>> Hi Peter...
>>>>>>> 
>>>>>>> Since I am a Google Search admin for www.openoffice.org, and
>>>>>>> openoffice.apache.org, I got this also. Disclaimer: I have not done
>>>>>>> ANY work with the Google Search apis on these sites in quite some time.
>>>>>>> 
>>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
>>>>>>> Search until I saw this.
>>>>>> I think, I added it to the list when we had a discussion about outdated
>>>>>> information regarding SourceForge found by Google Search.
>>>>>> 
>>>>>> But I don't have access to forum.openoffice.org, so I could never
>>>>>> complete the step.
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>>   Matthias
>>>>> OK. In the top level of the website source, there is a file called
>>>>> "skeleton.html" which references the following bit of code --
>>>>> 
>>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>>> 
>>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>>> forgot), but this is an example of the google-analytics code snippet
>>>>> that is used. Basically, this needs to be included in the site you
>>>>> want analytics to be used on by putting it in the (header) files that
>>>>> generate the site. And you might take a look at recent instructions
>>>>> from Google. Things change.
>>>>> 
>>>>> https://support.google.com/analytics/answer/1008080
>>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
>>>> forum...
>>>> The procedure for the Google Search Console is the same, it needs access
>>>> to the root directory.
>>>> 
>>>> Maybe Andrea can help if he is available again?
>>>> 
>>>> Regards,
>>>> 
>>>>  Matthias
>>>> 
>>>>> Regards,
>>>>> 
>>>>> Kay
>>>>> 
>>>>>>> One of the Google Search admins for forum.openoffice.org could check
>>>>>>> the current Google search apis that are in use on that site. Changes
>>>>>>> are occasionally made to the calls, and maybe that is the issue, or a
>>>>>>> robots.txt for that site is causing this. I don't think it requires a
>>>>>>> response, but maybe some investigation.
>>>>>>> 
>>>>>>> Just some ideas...
>>>>>>> 
>>>>>>> Regards,
>>>>>>> 
>>>>>>> Kay
>>>>>>> 
>>>>>>> 
>>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I have received the following mail, probably because I am listed in the
>>>>>>>> Google Analytics page.
>>>>>>>> 
>>>>>>>> Does this have any action items? What can we answer Mr John Mueller?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> All the Best
>>>>>>>> 
>>>>>>>> Peter
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> -------- Forwarded Message --------
>>>>>>>> Subject:     Critical issue on forum.openoffice.org and Google Search
>>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>>> From:     John Mueller <johnmu@google.com>
>>>>>>>> To:     morseidel@gmail.com, kay.schenk@gmail.com, leginee@gmail.com
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Dear webmaster of forum.openoffice.org <http://forum.openoffice.org>
>>>>>>>> 
>>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
>>>>>>>> attention to a critical issue with your website, and how it's
>>>>>>>> available for Google's web search.
>>>>>>>> 
>>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
>>>>>>>> out of Google's search results, and will prevent new pages from being
>>>>>>>> picked up for Search. If you're not aware of this issue, you may be
>>>>>>>> accidentally blocking these pages from Google Search due to a server
>>>>>>>> issue. If you need to block Googlebot from crawling pages on your
>>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>>> 
>>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
>>>>>>>> can use a reverse IP lookup to do so:
>>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>>> 
>>>>>>>> Should you have any questions, feel free to contact me directly. For
>>>>>>>> verification purposes, we are sending a copy of this message to your
>>>>>>>> site's Search Console account.
>>>>>>>> 
>>>>>>>> Thank you,
>>>>>>>> John Mueller (johnmu@google.com <mailto:johnmu@google.com>)
>>>>>>>> Webmaster Trends Analyst
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>>>>>>> For additional commands, e-mail: dev-help@openoffice.apache.org
>>>>>>> 