manifoldcf-dev mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: [VOTE] Release Apache ManifoldCF 1.7.1, RC1
Date Thu, 18 Sep 2014 10:20:28 GMT

MCF should tolerate invalid robots.txt files, since we cannot rely on
what people have entered into them. So I think MCF should just ignore
invalid robots.txt files. I guess it already does.

It seems invalid due to the use of the = symbol instead of a #. I'm not
an expert on such files, so I'm not completely sure.
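
To illustrate what "just ignore" could look like, here is a minimal sketch
of a tolerant parser. It is not MCF's actual robots parser. The function
name parse_robots and the KNOWN_FIELDS set are made up for this example,
and it assumes that '#' starts a comment and that unrecognized lines are
skipped rather than treated as fatal.

```python
# Hypothetical sketch, not ManifoldCF code: parse robots.txt leniently.
# Assumption: the set of recognized field names below is illustrative only.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


def parse_robots(text):
    """Return (records, skipped).

    records: list of (field, value) pairs for recognized lines.
    skipped: list of raw lines that were not understood and were ignored.
    """
    records = []
    skipped = []
    for raw in text.splitlines():
        # Assumption: everything after '#' is a comment and is dropped.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank line or comment-only line
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if not sep or field not in KNOWN_FIELDS:
            # Unknown line such as '====': remember it, then carry on.
            skipped.append(raw)
            continue
        records.append((field, value.strip()))
    return records, skipped


if __name__ == "__main__":
    sample = "====\n    some license text\n====\nUser-agent: *\nDisallow: /browse\n"
    records, skipped = parse_robots(sample)
    print(records)
    print(skipped)
```

With this approach the '====' banner and the license text end up in the
skipped list, which could be logged as warnings, while the User-agent and
Disallow rules are still applied.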

E

On 18.09.14 12:04, Karl Wright wrote:
> Hi Erlend,
>
> Your robots file has this at the top:
>
> ====
>      The contents of this file are subject to the license and copyright
>      detailed in the LICENSE and NOTICE files at the root of the source
>      tree and available online at
>
>      http://www.dspace.org/license/
> ====
>
> That's fine except to the best of my knowledge the robots spec does
> not allow for comments at all.
>
> If you have reason to believe that has changed, then please point me
> at a reference and I can change our robots parser.
>
> Thanks,
> Karl
>
>
>
> On Thu, Sep 18, 2014 at 6:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Erlend,
>>
>> MCF caches the robots.txt file in the database, which it considers valid
>> for 1 hour.
>>
>> I'll look at the logs and thread dump and let you know if this is a
>> locking issue or something else.  Please stand by.
>>
>> Karl
>>
>>
>> On Thu, Sep 18, 2014 at 5:24 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
>> wrote:
>>
>>>
>>> I tried to restart the job dealing with www.duo.no on our test server,
>>> but it does not seem to touch the robots.txt file at all. That's the reason
>>> why it's able to continue. Both servers are set up to obey the rules of
>>> such files.
>>>
>>> Erlend
>>>
>>>
>>> On 18.09.14 11:12, Erlend Garåsen wrote:
>>>
>>>>
>>>> I'm facing the same problems with robots.txt files using RC1, so maybe
>>>> this is another issue we have to fix. Can you please try to fetch the
>>>> host below? For some odd reason, it seems that MCF on our test server
>>>> can handle it.
>>>>
>>>> This is exactly the same thing that happened when I started MCF
>>>> (referring to my previous post) after I had deployed RC1:
>>>> 09-18-2014 11:02:14.400     robots parse     https:www.duo.uio.no:443
>>>>       ERRORS     0     3     Unknown robots.txt line: '===='
>>>>
>>>> No activity after this error.
>>>>
>>>> Here's the robots.txt file:
>>>> https://www.duo.uio.no/robots.txt
>>>>
>>>> This is the content of manifoldcf.log after the startup:
>>>>    WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>    WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '    The contents of
>>>> this file are subject to the license and copyright'
>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '    detailed in the
>>>> LICENSE and NOTICE files at the root of the source'
>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '    tree and available
>>>> online at'
>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '
>>>> http://www.dspace.org/license/'
>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown
>>>> robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>
>>>> E
>>>>
>>>>
>>>> On 18.09.14 03:12, Karl Wright wrote:
>>>>
>>>>> Please vote on whether to release Apache ManifoldCF 1.7.1, RC1.
>>>>>
>>>>> This release fixes a number of critical issues, as well as a number
>>>>> of user priorities, most notably:
>>>>>
>>>>> - A bad Zookeeper support issue, which made locking support fail when
>>>>> Zookeeper connections got lost and then restored;
>>>>> - The Alfresco connector, which was nonfunctional in both MCF 1.6
>>>>> and 1.7;
>>>>> - Solr Cloud support, which had ceased working due to changes to SolrJ;
>>>>> - Non-null connector components caused failure;
>>>>> - PostgreSQL queries not performing well.
>>>>>
>>>>> The complete list of included fixes can be found at:
>>>>>
>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1/CHANGES.txt
>>>>>
>>>>>
>>>>> The release candidate can be downloaded from:
>>>>>
>>>>> http://people.apache.org/~kwright/apache-manifoldcf-1.7.1
>>>>>
>>>>> There is a tag at:
>>>>>
>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>
>>>
>>
>

