manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: regarding crawl parameters
Date Tue, 07 Oct 2014 14:20:53 GMT
Hi Jitu,

I know of no way to crawl only those documents that were created after a
specified date.  SharePoint crawling involves walking a tree, not querying
SharePoint for a list of documents that meet specific criteria.

What this means is that we will need to crawl the entire tree *regardless*
of which documents we decide to index.  We can filter the discovered
documents by looking at their creation and modification dates, and exclude
those last modified prior to 2011-01-01 from being indexed.  That would cut
down on the work that your index needs to do, and the work of actually
fetching the content itself.  But we would still need to crawl all documents.
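
To make the distinction concrete, here is a minimal sketch of "walk everything,
filter afterwards". It is not ManifoldCF or SharePoint code: the Node class,
walk(), and select_for_indexing() are hypothetical names invented for this
illustration, and the tree is a toy stand-in for a SharePoint site.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Iterator, List, Optional, Set, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a site, library, folder, or document."""
    name: str
    modified: Optional[date] = None  # set only for documents
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Visit every node in the tree; there is no date-based shortcut."""
    yield node
    for child in node.children:
        yield from walk(child)


def select_for_indexing(
    root: Node,
    start: date,
    end: date,
    extensions: Optional[Set[str]] = None,
) -> Tuple[int, List[str]]:
    """Return (nodes visited, names of documents that pass the filters)."""
    visited = 0
    selected: List[str] = []
    for node in walk(root):
        visited += 1  # the crawl cost is paid for every node
        if node.modified is None:
            continue  # containers are walked but never indexed
        if not (start <= node.modified <= end):
            continue  # outside the requested date range
        if extensions is not None and not node.name.lower().endswith(
            tuple(sorted(extensions))
        ):
            continue  # wrong file type
        selected.append(node.name)
    return visited, selected


# Toy tree: one folder with two documents, plus one top-level document.
root = Node("site", children=[
    Node("Reports", children=[
        Node("old.doc", date(2009, 5, 1)),
        Node("plan.docx", date(2012, 3, 15)),
    ]),
    Node("notes.pdf", date(2013, 7, 4)),
])

visited, selected = select_for_indexing(
    root, date(2011, 1, 1), date(2013, 12, 31), {".doc", ".docx"}
)
print(visited, selected)
```

Every node is visited no matter how narrow the date range is; only the list
handed on for fetching and indexing shrinks.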

Karl


On Tue, Oct 7, 2014 at 10:11 AM, Jitu <abjitu@gmail.com> wrote:

> Hi Karl,
>
> Here is the requirement:
>
> One of our customers would like to selectively publish documents from
> his SharePoint, which has grown very large over time. Since filtering
> based on folder names is not an easy task, he would like us to crawl all
> the documents created in SharePoint between two dates.
>
>
>
> All documents created/modified from 2011-01-01 through 2013-12-31
> need to be crawled, and if that is possible, then additional filters
> would be added to the date range. Ex: get only the Docx and Doc files
> created between 2011-01-01 and 2013-12-31, etc.
>
>
> Similarly, all documents created/modified in the last 2 months, etc.
>
>
> Thanks,
>
> Jitu
>
> On Mon, Oct 6, 2014 at 5:04 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Jitu,
>>
>> Did you ever figure out what the customer requirement really was here?
>>
>> Thanks,
>> Karl
>>
>>
>> On Fri, Oct 3, 2014 at 6:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Jitu,
>>>
>>> SharePoint does not provide a way to crawl documents by date range, so
>>> all documents will need to be crawled regardless of any date range
>>> requirement, and then filtered.
>>>
>>> So at this point it is important to ask the client if their
>>> requirement's purpose is to save crawling load on the server, because if it
>>> is, you won't get much savings.  But if the client wants this feature for
>>> other reasons, we can support it with some work.
>>>
>>> Please open a ticket if you find that the client has a legitimate reason
>>> for this requirement.
>>>
>>> Karl
>>>
>>> Sent from my Windows Phone
>>> ------------------------------
>>> From: Jitu
>>> Sent: 10/3/2014 4:22 PM
>>> To: user@manifoldcf.apache.org
>>> Subject: regarding crawl parameters
>>>
>>> Hi Karl,
>>>
>>>  Thanks for your continuous support. We have a requirement from our
>>> client to crawl files which were created/modified in the last one or two
>>> months from a SharePoint server, and that parameter should be configurable
>>> in the GUI. We are using ManifoldCF version 1.7. Is there a way to achieve
>>> this? Please help.
>>>
>>> Thanks,
>>> Jitu
>>>
>>
>>
>
