manifoldcf-user mailing list archives

From Jitu <abj...@gmail.com>
Subject Re: regarding crawl parameters
Date Tue, 07 Oct 2014 14:35:23 GMT
Hi Karl,
Thanks for the support. What you said is exactly what we are looking for
too. Crawling everything is fine, but we should not process a document
unless the criteria are met. Here the criterion is that the file was
modified within the last 2 or 3 months, or within a given date range.

It is similar to getDocumentVersions, which checks whether the document
version has changed and processes the file only if it has. So: crawl the
documents, but don't process them unless the criteria match. Is there a
way to achieve this?
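
To make the idea concrete, here is a minimal sketch in plain Java of the
kind of date check we have in mind. DateRangeFilter and its methods are
hypothetical names used only for illustration; they are not part of the
ManifoldCF API. The sketch assumes the connector can obtain each
document's last-modified time as a java.time.Instant, and it treats all
dates as UTC.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, NOT part of ManifoldCF: decides whether a crawled
// document should be processed, based on its last-modified time.
public class DateRangeFilter {
    private final Instant from; // inclusive lower bound
    private final Instant to;   // exclusive upper bound

    public DateRangeFilter(Instant from, Instant to) {
        this.from = from;
        this.to = to;
    }

    // Range covering whole UTC days, with both dates inclusive.
    public static DateRangeFilter between(LocalDate first, LocalDate last) {
        return new DateRangeFilter(
            first.atStartOfDay(ZoneOffset.UTC).toInstant(),
            last.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant());
    }

    // Range covering the last N calendar months up to and including "now".
    public static DateRangeFilter lastMonths(int months, Instant now) {
        Instant start = now.atZone(ZoneOffset.UTC).minusMonths(months).toInstant();
        return new DateRangeFilter(start, now.plusMillis(1));
    }

    // True if the document falls inside the range and should be processed.
    public boolean shouldProcess(Instant lastModified) {
        return !lastModified.isBefore(from) && lastModified.isBefore(to);
    }
}
```

Every document would still be visited during the crawl; only the ones
for which shouldProcess returns true would be fetched and indexed.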

Thanks,
Jitu

On Tue, Oct 7, 2014 at 7:50 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Jitu,
>
> I know of no way to crawl only those documents that were created after a
> specified date.  SharePoint crawling involves walking a tree, not querying
> SharePoint for a list of documents that fulfill specific criteria.
>
> What this means is that we will need to crawl the entire tree *regardless*
> of what documents we decide to index.  We can filter the discovered
> documents by looking at their creation date, and exclude those last
> modified prior to 2011-01-01 from being indexed.  That would cut down on
> the work that your index needs to do, and the work of actually fetching the
> content itself.  But we would still need to crawl all documents.
>
> Karl
>
>
> On Tue, Oct 7, 2014 at 10:11 AM, Jitu <abjitu@gmail.com> wrote:
>
>> Hi Karl,
>>
>> Here is the requirement:
>>
>> One of our customers would like to selectively publish documents from
>> his SharePoint, which has grown very large over time. Since filtering
>> based on folder names is not an easy task, he would like us to crawl all
>> the documents created in SharePoint between two dates.
>>
>>
>>
>> All documents created/modified between 2011-01-01 and 2013-12-31 need to
>> be crawled, and if that is possible, additional filters would be applied
>> on top of the date range. Ex: get only the Docx and Doc files created
>> between 2011-01-01 and 2013-12-31, etc.
>>
>>
>> Similarly, all documents created/modified in the last 2 months, etc.
>>
>>
>> Thanks,
>>
>> Jitu
>>
>> On Mon, Oct 6, 2014 at 5:04 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Jitu,
>>>
>>> Did you ever figure out what the customer requirement really was here?
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>> On Fri, Oct 3, 2014 at 6:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Jitu,
>>>>
>>>> SharePoint does not provide a way to crawl documents by date range, so
>>>> all documents will need to be crawled regardless of any date range
>>>> requirement, and then filtered.
>>>>
>>>> So at this point it is important to ask the client if their
>>>> requirement's purpose is to save crawling load on the server, because if
>>>> it is, you won't get much savings.  But if the client wants this feature for
>>>> other reasons, we can support it with some work.
>>>>
>>>> Please open a ticket if you find that the client has a legitimate
>>>> reason for this requirement.
>>>>
>>>> Karl
>>>>
>>>> Sent from my Windows Phone
>>>> ------------------------------
>>>> From: Jitu
>>>> Sent: 10/3/2014 4:22 PM
>>>> To: user@manifoldcf.apache.org
>>>> Subject: regarding crawl parameters
>>>>
>>>> Hi Karl,
>>>>
>>>> Thanks for your continuous support. We have a requirement from our
>>>> client to crawl files that were created/modified in the last one or two
>>>> months on a SharePoint server, and that parameter should be configurable
>>>> in the GUI. We are using ManifoldCF 1.7. Is there a way to achieve this?
>>>> Please help.
>>>>
>>>> Thanks,
>>>> Jitu
>>>>
>>>
>>>
>>
>
