manifoldcf-dev mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject RE: [VOTE] Release Apache ManifoldCF 1.7.1, RC1
Date Thu, 18 Sep 2014 11:00:12 GMT
Hi Erlend,

Could you please also attach the manifoldcf log?

Thanks,
Karl

Sent from my Windows Phone
------------------------------
From: Karl Wright
Sent: 9/18/2014 6:24 AM
To: dev
Subject: Re: [VOTE] Release Apache ManifoldCF 1.7.1, RC1

Hi Erlend,

MCF does not care if there's garbage in the robots file; it just warns when
it sees it.  That doesn't appear to be the source of the difficulty.

Karl
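
The warn-and-continue behavior described above can be sketched roughly as follows. This is an illustration only, not ManifoldCF's actual parser: the function name `parse_robots`, the `ParsedRobots` container, and the set of recognized directives are all assumptions made for the example. The point is that an unrecognized line produces a warning and is skipped, so parsing never aborts.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("robots")

# Directives this sketch recognizes; the set is an assumption for illustration.
KNOWN_FIELDS = {"user-agent", "allow", "disallow", "crawl-delay", "sitemap"}


@dataclass
class ParsedRobots:
    rules: list = field(default_factory=list)    # (field, value) pairs that parsed
    unknown: list = field(default_factory=list)  # raw lines that did not parse


def parse_robots(text: str, source: str = "") -> ParsedRobots:
    """Parse robots.txt leniently: warn on unrecognized lines, never raise."""
    result = ParsedRobots()
    for raw in text.splitlines():
        # Drop anything after '#', the conventional robots.txt comment marker.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        key = name.strip().lower()
        if sep and key in KNOWN_FIELDS:
            result.rules.append((key, value.strip()))
        else:
            # Mirror the "Unknown robots.txt line" warning, then keep going.
            logger.warning("Unknown robots.txt line from '%s': '%s'", source, raw)
            result.unknown.append(raw)
    return result
```

Fed a file that starts with a `====` license banner, this sketch would log one warning per banner line and still return the `User-agent` and `Disallow` rules that follow it.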


On Thu, Sep 18, 2014 at 6:20 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
wrote:

>
> MCF should handle invalid robots.txt files. We cannot rely on what people
> have entered into such files. So I guess MCF should just ignore invalid
> robots.txt files. I guess it already does.
>
> It seems invalid because it uses the = symbol instead of a #. I'm not an
> expert on such files, so I'm not completely sure.
>
> E
>
>
> On 18.09.14 12:04, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> Your robots file has this at the top:
>>
>> ====
>>      The contents of this file are subject to the license and copyright
>>      detailed in the LICENSE and NOTICE files at the root of the source
>>      tree and available online at
>>
>>      http://www.dspace.org/license/
>> ====
>>
>> That's fine except to the best of my knowledge the robots spec does
>> not allow for comments at all.
>>
>> If you have reason to believe that has changed, then please point me
>> at a reference and I can change our robots parser.
>>
>> Thanks,
>> Karl
>>
>>
>>
>> On Thu, Sep 18, 2014 at 6:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Erlend,
>>>
>>> MCF caches the robots.txt file in the database, which it considers valid
>>> for 1 hour.
>>>
>>> I'll look at the logs and thread dump and let you know if this is a
>>> locking issue or something else.  Please stand by.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Sep 18, 2014 at 5:24 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>
>>> wrote:
>>>
>>>
>>>> I tried to restart the job dealing with www.duo.no on our test server,
>>>> but it does not seem to touch the robots.txt file at all. That's the
>>>> reason why it's able to continue. Both servers are set up to obey the
>>>> rules of such files.
>>>>
>>>> Erlend
>>>>
>>>>
>>>> On 18.09.14 11:12, Erlend Garåsen wrote:
>>>>
>>>>
>>>>> I'm facing the same problems with robots.txt files using RC1, so maybe
>>>>> this is another issue we have to fix. Can you please try to fetch the
>>>>> host below? For some odd reason, it seems that MCF on our test server
>>>>> can handle it.
>>>>>
>>>>> This is exactly what happened when I started MCF (referring to
>>>>> my previous post) after I had deployed RC1:
>>>>> 09-18-2014 11:02:14.400     robots parse     https:www.duo.uio.no:443     ERRORS     0     3     Unknown robots.txt line: '===='
>>>>>
>>>>> No activity after this error.
>>>>>
>>>>> Here's the robots.txt file:
>>>>> https://www.duo.uio.no/robots.txt
>>>>>
>>>>> This is the content of manifoldcf.log after the startup:
>>>>>    WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>>    WARN 2014-09-18 11:02:14,401 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '    The contents of this file are subject to the license and copyright'
>>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '    detailed in the LICENSE and NOTICE files at the root of the source'
>>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '    tree and available online at'
>>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '    http://www.dspace.org/license/'
>>>>>    WARN 2014-09-18 11:02:14,402 (Worker thread '19') - Web: Unknown robots.txt line from 'https:www.duo.uio.no:443': '===='
>>>>>
>>>>> E
>>>>>
>>>>>
>>>>> On 18.09.14 03:12, Karl Wright wrote:
>>>>>
>>>>>> Please vote on whether to release Apache ManifoldCF 1.7.1, RC1.
>>>>>>
>>>>>> This release fixes a number of critical issues, as well as a number of
>>>>>> user priorities, most notably:
>>>>>>
>>>>>> - A bad Zookeeper support issue, which made locking support fail when
>>>>>>   Zookeeper connections got lost and then restored;
>>>>>> - The Alfresco connector, which was nonfunctional in both MCF 1.6 and
>>>>>>   1.7;
>>>>>> - Solr Cloud support, which had ceased working due to changes to
>>>>>>   SolrJ;
>>>>>> - Non-null connector components caused failure;
>>>>>> - PostgreSQL queries not performing well.
>>>>>>
>>>>>> The complete list of included fixes can be found at:
>>>>>>
>>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1/CHANGES.txt
>>>>>>
>>>>>>
>>>>>> The release candidate can be downloaded from:
>>>>>>
>>>>>> http://people.apache.org/~kwright/apache-manifoldcf-1.7.1
>>>>>>
>>>>>> There is a tag at:
>>>>>>
>>>>>> https://svn.apache.org/repos/asf/manifoldcf/tags/release-1.7.1-RC1
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
