This code is now complete in both trunk and the dev_1x branch.

Karl


On Tue, Oct 7, 2014 at 11:18 AM, Karl Wright <daddywri@gmail.com> wrote:
Hi Jitu,

I would suggest that we do not try for multiple date ranges, but just an "earliest document date" filtering parameter.  Adding this functionality to the Document Filter transformation connector would be what I'd do.  If necessary, we can also add an IOutputActivities method which will allow a connector to decide whether a document needs to be fetched or not based on its date stamp; this would help prevent unnecessary work opening older documents.
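To make the "earliest document date" idea concrete, here is a minimal sketch of the check involved. The class and method names (EarliestDateFilter, shouldFetch) are hypothetical and are not part of the ManifoldCF API; how such a check would actually be exposed through the Document Filter connector or IOutputActivities is left open above.

```java
import java.time.Instant;

/** Hypothetical sketch only; not a ManifoldCF class. */
final class EarliestDateFilter {
    private final Instant earliestDate;

    EarliestDateFilter(Instant earliestDate) {
        this.earliestDate = earliestDate;
    }

    /** Returns true when the document is new enough to be worth fetching. */
    boolean shouldFetch(Instant lastModified) {
        // Assumption: a document with no known date stamp is fetched anyway.
        if (lastModified == null) {
            return true;
        }
        // Accept documents modified at or after the cutoff.
        return !lastModified.isBefore(earliestDate);
    }
}
```

A connector could consult a check like this before opening a document, which is where the saving on older documents would come from.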

Oddly enough, I think that the work involved would largely be in coming up with a reasonable date selection UI.

If this sounds like it is what you want, please go ahead and create a ticket describing this functionality.

Karl


On Tue, Oct 7, 2014 at 10:35 AM, Jitu <abjitu@gmail.com> wrote:
Hi Karl,
Thanks for the support. What you said is exactly what we are looking for too. Crawling everything is fine, but we should not process the documents unless the criteria are met. Here the criterion is that the file was modified during the last 2 or 3 months, or within a date range.

It is similar to getDocumentVersions, which checks whether a document's version has changed and processes the file only if it has. So: crawl the documents, but don't process them unless the criteria match. Is there a way to achieve this?

Thanks,
Jitu

On Tue, Oct 7, 2014 at 7:50 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Jitu,

I know of no way to crawl only those documents that were created after a specified date.  SharePoint crawling involves walking a tree, not querying SharePoint for a list of documents that meet specific criteria.

What this means is that we will need to crawl the entire tree *regardless* of what documents we decide to index.  We can filter the discovered documents by date, and exclude those last modified prior to 2011-01-01 from being indexed.  That would cut down on the work that your index needs to do, and the work of actually fetching the content itself.  But we would still need to crawl all documents.
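The distinction between crawling and indexing can be illustrated with a small sketch. It uses a made-up in-memory tree (the Node class is hypothetical and unrelated to the SharePoint or ManifoldCF APIs): every node is visited, but only documents modified on or after the cutoff are added to the indexed list.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of "walk everything, index selectively". */
final class TreeCrawlSketch {
    /** A folder (modified == null) or a document (modified != null). */
    static final class Node {
        final String name;
        final LocalDate modified;
        final List<Node> children = new ArrayList<>();

        Node(String name, LocalDate modified) {
            this.name = name;
            this.modified = modified;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    int visited = 0;
    final List<String> indexed = new ArrayList<>();

    void crawl(Node node, LocalDate cutoff) {
        visited++; // every node is visited, whatever its date
        if (node.modified != null && !node.modified.isBefore(cutoff)) {
            indexed.add(node.name); // only recent documents are indexed
        }
        for (Node child : node.children) {
            crawl(child, cutoff);
        }
    }
}
```

The visited counter grows with the size of the tree no matter which cutoff is chosen, which is why a date filter saves indexing and fetching work but not crawling work.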

Karl


On Tue, Oct 7, 2014 at 10:11 AM, Jitu <abjitu@gmail.com> wrote:
Hi Karl,

Here is the requirement:

One of our customers would like to selectively publish documents from his SharePoint, which has grown very large over time. Since filtering based on folder names is not an easy task, he would like us to crawl all the documents created in SharePoint between two dates.

 

All documents created/modified between 2011-01-01 and 2013-12-31 need to be crawled. If that is possible, additional filters would then be combined with the date range. Example: get only the Docx and Doc files created between 2011-01-01 and 2013-12-31, etc.


Similarly, all documents created/modified in the last 2 months, etc.
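As a rough illustration of how these criteria combine, the sketch below checks a file name and modification date against an inclusive date range plus an optional list of extensions, and builds a "last N months" range relative to a given day. All names here (DocumentCriteria, lastMonths, matches) are hypothetical and do not correspond to any existing ManifoldCF class.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the requested selection criteria. */
final class DocumentCriteria {
    private final LocalDate from; // inclusive
    private final LocalDate to;   // inclusive
    private final Set<String> extensions = new HashSet<>(); // empty means any

    DocumentCriteria(LocalDate from, LocalDate to, String... extensions) {
        this.from = from;
        this.to = to;
        for (String e : extensions) {
            this.extensions.add(e.toLowerCase(Locale.ROOT));
        }
    }

    /** Builds a range covering the given number of months up to 'today'. */
    static DocumentCriteria lastMonths(int months, LocalDate today, String... extensions) {
        return new DocumentCriteria(today.minusMonths(months), today, extensions);
    }

    boolean matches(String fileName, LocalDate modified) {
        if (modified.isBefore(from) || modified.isAfter(to)) {
            return false;
        }
        if (extensions.isEmpty()) {
            return true;
        }
        // Compare the extension case-insensitively, without the dot.
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return extensions.contains(ext);
    }
}
```

For the example above, the criteria would be new DocumentCriteria(LocalDate.of(2011, 1, 1), LocalDate.of(2013, 12, 31), "doc", "docx"), and the "last 2 months" case would be DocumentCriteria.lastMonths(2, LocalDate.now()).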


Thanks,

Jitu


On Mon, Oct 6, 2014 at 5:04 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Jitu,

Did you ever figure out what the customer requirement really was here?

Thanks,
Karl


On Fri, Oct 3, 2014 at 6:09 PM, Karl Wright <daddywri@gmail.com> wrote:
Hi Jitu,

SharePoint does not provide a way to crawl documents by date range, so all documents will need to be crawled regardless of any date range requirement, and then filtered.

So at this point it is important to ask the client if their requirement's purpose is to save crawling load on the server, because if it is, you won't get much savings.  But if the client wants this feature for other reasons, we can support it with some work.

Please open a ticket if you find that the client has a legitimate reason for this requirement.

Karl


From: Jitu
Sent: 10/3/2014 4:22 PM
To: user@manifoldcf.apache.org
Subject: regarding crawl parameters

Hi Karl,

Thanks for your continuous support. We have a requirement from our client to crawl files created/modified in the last one or two months from a SharePoint server, and that parameter should be configurable in the GUI. We are using ManifoldCF version 1.7. Is there a way to achieve this? Please help.

Thanks,
Jitu