openoffice-dev mailing list archives

From Dave Fisher <w...@apache.org>
Subject Re: Critical issue on forum.openoffice.org and Google Search
Date Tue, 12 May 2020 15:04:15 GMT
Information about Infra IP Bans is here: https://infra.apache.org/infra-ban.html

Please direct the Google engineer to that resource.

Regards,
Dave

> On May 12, 2020, at 7:55 AM, Dave Fisher <wave@apache.org> wrote:
> 
> Are you sure you weren’t using forums.openoffice.org instead of forum.openoffice.org?
> 
> curl -D headers https://forum.openoffice.org/ does return the correct page.
> 
> The robots.txt is this:
> 
> curl -D headers https://forum.openoffice.org/robots.txt
> User-agent: *
> Crawl-delay: 1
> Disallow: /en/forum/common.php
> Disallow: /en/forum/config.php
> Disallow: /en/forum/con.php
> Disallow: /en/forum/faq.php
> Disallow: /en/forum/mcp.php
> Disallow: /en/forum/memberlist.php
> Disallow: /en/forum/posting.php
> Disallow: /en/forum/report.php
> Disallow: /en/forum/search.php
> Disallow: /en/forum/style.php
> Disallow: /en/forum/ucp.php
> Disallow: /en/forum/viewonline.php
> Disallow: /en/forum/adm
> Disallow: /en/forum/cache
> Disallow: /en/forum/docs
> Disallow: /en/forum/files
> Disallow: /en/forum/images
> Disallow: /en/forum/includes
> Disallow: /en/forum/language
> Disallow: /en/forum/store
> Disallow: /en/forum/styles
> Disallow: /es/forum/common.php
> Disallow: /es/forum/config.php
> Disallow: /es/forum/con.php
> Disallow: /es/forum/faq.php
> Disallow: /es/forum/mcp.php
> Disallow: /es/forum/memberlist.php
> Disallow: /es/forum/posting.php
> Disallow: /es/forum/report.php
> Disallow: /es/forum/search.php
> Disallow: /es/forum/style.php
> Disallow: /es/forum/ucp.php
> Disallow: /es/forum/viewonline.php
> Disallow: /es/forum/adm
> Disallow: /es/forum/cache
> Disallow: /es/forum/docs
> Disallow: /es/forum/files
> Disallow: /es/forum/images
> Disallow: /es/forum/includes
> Disallow: /es/forum/language
> Disallow: /es/forum/store
> Disallow: /es/forum/styles
> Disallow: /fr/forum/common.php
> Disallow: /fr/forum/config.php
> Disallow: /fr/forum/con.php
> Disallow: /fr/forum/faq.php
> Disallow: /fr/forum/mcp.php
> Disallow: /fr/forum/memberlist.php
> Disallow: /fr/forum/posting.php
> Disallow: /fr/forum/report.php
> Disallow: /fr/forum/search.php
> Disallow: /fr/forum/style.php
> Disallow: /fr/forum/ucp.php
> Disallow: /fr/forum/viewonline.php
> Disallow: /fr/forum/adm
> Disallow: /fr/forum/cache
> Disallow: /fr/forum/docs
> Disallow: /fr/forum/files
> Disallow: /fr/forum/images
> Disallow: /fr/forum/includes
> Disallow: /fr/forum/language
> Disallow: /fr/forum/store
> Disallow: /fr/forum/styles
> Disallow: /fr/ci-joint
> Disallow: /hu/forum/common.php
> Disallow: /hu/forum/config.php
> Disallow: /hu/forum/con.php
> Disallow: /hu/forum/faq.php
> Disallow: /hu/forum/mcp.php
> Disallow: /hu/forum/memberlist.php
> Disallow: /hu/forum/posting.php
> Disallow: /hu/forum/report.php
> Disallow: /hu/forum/search.php
> Disallow: /hu/forum/style.php
> Disallow: /hu/forum/ucp.php
> Disallow: /hu/forum/viewonline.php
> Disallow: /hu/forum/adm
> Disallow: /hu/forum/cache
> Disallow: /hu/forum/docs
> Disallow: /hu/forum/files
> Disallow: /hu/forum/images
> Disallow: /hu/forum/includes
> Disallow: /hu/forum/language
> Disallow: /hu/forum/store
> Disallow: /hu/forum/styles
> Disallow: /ja/forum/common.php
> Disallow: /ja/forum/config.php
> Disallow: /ja/forum/con.php
> Disallow: /ja/forum/faq.php
> Disallow: /ja/forum/mcp.php
> Disallow: /ja/forum/memberlist.php
> Disallow: /ja/forum/posting.php
> Disallow: /ja/forum/report.php
> Disallow: /ja/forum/search.php
> Disallow: /ja/forum/style.php
> Disallow: /ja/forum/ucp.php
> Disallow: /ja/forum/viewonline.php
> Disallow: /ja/forum/adm
> Disallow: /ja/forum/cache
> Disallow: /ja/forum/docs
> Disallow: /ja/forum/files
> Disallow: /ja/forum/images
> Disallow: /ja/forum/includes
> Disallow: /ja/forum/language
> Disallow: /ja/forum/store
> Disallow: /ja/forum/styles
> Disallow: /test
> Disallow: /nl/forum/common.php
> Disallow: /nl/forum/config.php
> Disallow: /nl/forum/con.php
> Disallow: /nl/forum/faq.php
> Disallow: /nl/forum/mcp.php
> Disallow: /nl/forum/memberlist.php
> Disallow: /nl/forum/posting.php
> Disallow: /nl/forum/report.php
> Disallow: /nl/forum/search.php
> Disallow: /nl/forum/style.php
> Disallow: /nl/forum/ucp.php
> Disallow: /nl/forum/viewonline.php
> Disallow: /nl/forum/adm
> Disallow: /nl/forum/cache
> Disallow: /nl/forum/docs
> Disallow: /nl/forum/files
> Disallow: /nl/forum/images
> Disallow: /nl/forum/includes
> Disallow: /nl/forum/language
> Disallow: /nl/forum/store
> Disallow: /nl/forum/styles
> Disallow: /vi/forum/common.php
> Disallow: /vi/forum/config.php
> Disallow: /vi/forum/con.php
> Disallow: /vi/forum/faq.php
> Disallow: /vi/forum/mcp.php
> Disallow: /vi/forum/memberlist.php
> Disallow: /vi/forum/posting.php
> Disallow: /vi/forum/report.php
> Disallow: /vi/forum/search.php
> Disallow: /vi/forum/style.php
> Disallow: /vi/forum/ucp.php
> Disallow: /vi/forum/viewonline.php
> Disallow: /vi/forum/adm
> Disallow: /vi/forum/cache
> Disallow: /vi/forum/docs
> Disallow: /vi/forum/files
> Disallow: /vi/forum/images
> Disallow: /vi/forum/includes
> Disallow: /vi/forum/language
> Disallow: /vi/forum/store
> Disallow: /vi/forum/styles
> Disallow: /zh/forum/common.php
> Disallow: /zh/forum/config.php
> Disallow: /zh/forum/con.php
> Disallow: /zh/forum/faq.php
> Disallow: /zh/forum/mcp.php
> Disallow: /zh/forum/memberlist.php
> Disallow: /zh/forum/posting.php
> Disallow: /zh/forum/report.php
> Disallow: /zh/forum/search.php
> Disallow: /zh/forum/style.php
> Disallow: /zh/forum/ucp.php
> Disallow: /zh/forum/viewonline.php
> Disallow: /zh/forum/adm
> Disallow: /zh/forum/cache
> Disallow: /zh/forum/docs
> Disallow: /zh/forum/files
> Disallow: /zh/forum/images
> Disallow: /zh/forum/includes
> Disallow: /zh/forum/language
> Disallow: /zh/forum/store
> Disallow: /zh/forum/styles
> 
> This has been the robots.txt file since: Last-Modified: Sat, 06 Jun 2009 23:40:14 GMT
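To see what these rules mean for a particular crawler, you can feed them to Python's standard-library `urllib.robotparser`. The following is a minimal sketch and was not run during this thread. For brevity, it parses only a short excerpt of the rules above, and both the `allowed` helper and the `/en/forum/viewtopic.php` path are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# Excerpt of the forum.openoffice.org robots.txt quoted above; only a few
# /en/ rules are reproduced (the other language prefixes follow the same
# pattern). No blank line may separate User-agent from its rules.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 1
Disallow: /en/forum/posting.php
Disallow: /en/forum/search.php
Disallow: /en/forum/ucp.php
Disallow: /test
"""

parser = RobotFileParser()
# parse() also records a check time, so can_fetch() works without read().
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path: str, agent: str = "Googlebot") -> bool:
    """Hypothetical helper: may `agent` fetch `path` under these rules?"""
    return parser.can_fetch(agent, "https://forum.openoffice.org" + path)


if __name__ == "__main__":
    for path in ("/", "/en/forum/search.php", "/en/forum/viewtopic.php"):
        print(path, allowed(path))
    print("crawl delay:", parser.crawl_delay("Googlebot"))
```

Run as a script, it prints whether each sample path may be fetched, followed by the parsed `Crawl-delay` value.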
> 
> Forum search uses phpBB
> 
> We haven’t allowed search engines to crawl forum.openoffice.org since before the Oracle donation to the ASF.
> 
> Crawlers' IP addresses might be blocked by ASF Infra if their use is excessive. That could produce the 301.
> 
> Regards,
> Dave
> 
>> On May 12, 2020, at 3:55 AM, Peter Kovacs <legine@posteo.de> wrote:
>> 
>> Hello all,
>> 
>> 
>> What I found is that forum.openoffice.org is not reachable from the Google search tool.
>> 
>> So I checked with DuckDuckGo (my preferred search engine); they don't use their own crawler and rely on the infrastructure of Google, Bing and Yandex.
>> 
>> I then checked with Bing, but could not figure out how to check the bot's feedback on a URL, so I moved on.
>> 
>> I checked with Yandex. They have a URL test page, where I entered forum.openoffice.org.
>> 
>> The response is:
>> 
>> ------------------------------------------------------------------------
>> 
>> * Date: Tue, 12 May 2020 10:37:47 GMT
>> * Server: Apache/2.4.18 (Ubuntu)
>> * Location: https://forum.openoffice.org/
>> * Content-Length: 237
>> * Keep-Alive: timeout=15, max=100
>> * Connection: Keep-Alive
>> * Content-Type: text/html; charset=iso-8859-1
>> 
>> ------------------------------------------------------------------------
>> 
>> 
>> HTTP status code       301 Moved Permanently
>> Server response time   133 ms
>> IP address             54.84.201.130
>> Encoding               UTF-8 (unicode-1-1-utf-8, UTF8)
>> Page size              237 B
>> 
>> 
>> I am not sure what that means. The HTTP status code "Moved Permanently" looks wrong to me. I just don't know whether this is the return code from our webserver or a response code from the crawler.
>> I'll try to get someone from Infra, or I'll open a ticket.
>> 
>> 
>> All the best
>> Peter
>> 
>> On 12.05.20 at 10:39, Matthias Seidel wrote:
>>> Hi Kay,
>>> 
>>> On 12.05.20 at 01:21, Kay Schenk wrote:
>>>> On 5/11/20 12:33 PM, Matthias Seidel wrote:
>>>>> Hi Kay,
>>>>> 
>>>>> On 11.05.20 at 21:23, Kay Schenk wrote:
>>>>>> Hi Peter...
>>>>>> 
>>>>>> Since I am a Google Search admin for www.openoffice.org and
>>>>>> openoffice.apache.org, I got this too. Disclaimer: I have not done
>>>>>> ANY work with the Google Search APIs on these sites in quite some time.
>>>>>> 
>>>>>> I actually was NOT aware forum.openoffice.org was set up to use Google
>>>>>> Search until I saw this.
>>>>> I think I added it to the list when we had a discussion about outdated
>>>>> information regarding SourceForge found by Google Search.
>>>>> 
>>>>> But I don't have access to forum.openoffice.org, so I could never
>>>>> complete the step.
>>>>> 
>>>>> Regards,
>>>>> 
>>>>>    Matthias
>>>> OK. In the top level of the website source, there is a file called
>>>> "skeleton.html" which references the following bit of code --
>>>> 
>>>> <!--#include virtual="/scripts/google-analytics.js" -->
>>>> 
>>>> I didn't dig far enough to find how "skeleton.html" is used (I
>>>> forgot), but this is an example of the google-analytics code snippet
>>>> that is used. Basically, this needs to be included in the site you
>>>> want analytics to be used on by putting it in the (header) files that
>>>> generate the site. And you might take a look at recent instructions
>>>> from Google. Things change.
>>>> 
>>>> https://support.google.com/analytics/answer/1008080
>>> Yes, but this is for Google Analytics. I wouldn't want to "analyze" the
>>> forum...
>>> The procedure for the Google Search Console is the same: it needs access
>>> to the root directory.
>>> 
>>> Maybe Andrea can help if he is available again?
>>> 
>>> Regards,
>>> 
>>>   Matthias
>>> 
>>>> Regards,
>>>> 
>>>> Kay
>>>> 
>>>>>> One of the Google Search admins for forum.openoffice.org could check
>>>>>> the current Google Search APIs in use on that site. Changes are
>>>>>> occasionally made to the calls, and maybe that is the issue, or a
>>>>>> robots.txt for that site is causing this. I don't think it requires a
>>>>>> response, but maybe some investigation.
>>>>>> 
>>>>>> Just some ideas...
>>>>>> 
>>>>>> Regards,
>>>>>> 
>>>>>> Kay
>>>>>> 
>>>>>> 
>>>>>> On 5/11/20 6:02 AM, Peter Kovacs wrote:
>>>>>>> Hi all,
>>>>>>> 
>>>>>>> I have received the following mail, probably because I am listed on
>>>>>>> the Google Analytics page.
>>>>>>> 
>>>>>>> Does this require any action? How should we answer Mr. John Mueller?
>>>>>>> 
>>>>>>> 
>>>>>>> All the Best
>>>>>>> 
>>>>>>> Peter
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> -------- Forwarded Message --------
>>>>>>> Subject:  Critical issue on forum.openoffice.org and Google Search
>>>>>>> Date:     Mon, 11 May 2020 13:37:27 +0200
>>>>>>> From:     John Mueller <johnmu@google.com>
>>>>>>> To:       morseidel@gmail.com, kay.schenk@gmail.com, leginee@gmail.com
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Dear webmaster of forum.openoffice.org
>>>>>>> 
>>>>>>> I'm an analyst at Google in Switzerland. We wanted to bring your
>>>>>>> attention to a critical issue with your website, and how it's
>>>>>>> available for Google's web search.
>>>>>>> 
>>>>>>> In particular, Googlebot has been unable to crawl URLs from
>>>>>>> https://forum.openoffice.org/ . This will cause those pages to drop
>>>>>>> out of Google's search results, and will prevent new pages from being
>>>>>>> picked up for Search. If you're not aware of this issue, you may be
>>>>>>> accidentally blocking these pages from Google Search due to a server
>>>>>>> issue. If you need to block Googlebot from crawling pages on your
>>>>>>> website, we'd recommend using the robots.txt file instead.
>>>>>>> 
>>>>>>> Should you need to recognize IP addresses of Googlebot requests, you
>>>>>>> can use a reverse IP lookup to do so:
>>>>>>> https://support.google.com/webmasters/answer/80553
>>>>>>> 
>>>>>>> Should you have any questions, feel free to contact me directly. For
>>>>>>> verification purposes, we are sending a copy of this message to your
>>>>>>> site's Search Console account.
>>>>>>> 
>>>>>>> Thank you,
>>>>>>> John Mueller (johnmu@google.com)
>>>>>>> Webmaster Trends Analyst
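The reverse IP lookup mentioned in the letter is, as Google's documentation describes it, a two-step check. First, resolve the requesting IP address to a host name and make sure that name ends in `googlebot.com` or `google.com`. Then resolve the name forward and confirm it maps back to the same address. The Python sketch below illustrates that procedure under a few assumptions. The function name `is_verified_googlebot` is hypothetical, and only IPv4 forward lookups are handled. The resolvers are injectable, defaulting to `socket.gethostbyaddr` and `socket.gethostbyname_ex`, so the logic can be tried offline. The demo addresses come from the 192.0.2.0/24 documentation range and are not real Googlebot IPs.

```python
import socket
from typing import Callable, Iterable

# Domain suffixes that genuine Googlebot host names end with (assumption:
# the googlebot.com / google.com pair from Google's verification guide).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip: str) -> str:
    # Reverse DNS (PTR) lookup: IP address -> host name.
    return socket.gethostbyaddr(ip)[0]


def _forward(host: str) -> Iterable[str]:
    # Forward DNS lookup: host name -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], Iterable[str]] = _forward,
) -> bool:
    """Hypothetical helper: reverse lookup, check domain, confirm forward."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    if not host.endswith(GOOGLE_SUFFIXES):
        return False
    try:
        # The forward lookup must map back to the original IP; a PTR record
        # alone is controlled by whoever owns the IP range.
        return ip in forward(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Offline demo with stub resolvers; names and IPs are illustrative only.
    fake_ptr = {"192.0.2.10": "crawl-192-0-2-10.googlebot.com",
                "192.0.2.20": "crawl.example.net"}
    fake_a = {"crawl-192-0-2-10.googlebot.com": ["192.0.2.10"]}
    for addr in ("192.0.2.10", "192.0.2.20"):
        ok = is_verified_googlebot(addr, fake_ptr.__getitem__,
                                   lambda h: fake_a.get(h, []))
        print(addr, ok)
```

The forward-lookup step matters because a reverse DNS name by itself can be set to anything by whoever controls the address.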
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@openoffice.apache.org
>>>>>> For additional commands, e-mail: dev-help@openoffice.apache.org
>>>>>> 
> 
> 



