manifoldcf-user mailing list archives

From Jitu <abj...@gmail.com>
Subject Re: schedule information
Date Tue, 23 Dec 2014 12:31:27 GMT
Hi Karl,

I checked the source code. In IncrementalIngester.java, at line 555 of the
checkFetchDocument() method, we check whether the forced metadata of the
previous run matches that of the current run. If there is a change, the file
is considered updated. So please advise on how to send a parameter that
changes for every job execution from the StartupThread class to the output
connector.
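
To make the effect concrete, here is a minimal sketch of the comparison as I understand it. This is a simplified model, not the actual ManifoldCF code: the class and method names below are hypothetical, and the only assumption taken from the source is that the forced metadata takes part in the "has this document changed?" check.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model, not ManifoldCF code: all names here are hypothetical.
final class ForcedMetadataVersionSketch {

  // Builds a comparison string from the document's own version plus the
  // forced metadata, with keys sorted so ordering cannot cause a mismatch.
  static String versionString(String documentVersion,
                              Map<String, List<String>> forcedMetadata) {
    StringBuilder sb = new StringBuilder(documentVersion);
    for (Map.Entry<String, List<String>> e : new TreeMap<>(forcedMetadata).entrySet()) {
      sb.append('|').append(e.getKey()).append('=')
        .append(String.join(",", e.getValue()));
    }
    return sb.toString();
  }

  // A document is fetched again when the stored string differs from the new one.
  static boolean needsFetch(String previous, String current) {
    return !current.equals(previous);
  }

  public static void main(String[] args) {
    String unchangedDoc = "lastModified=2014-12-01";
    // A per-run id changes the comparison string on every run.
    String run1 = versionString(unchangedDoc, Map.of("JOB_INSTANCE_ID", List.of("run-1")));
    String run2 = versionString(unchangedDoc, Map.of("JOB_INSTANCE_ID", List.of("run-2")));
    // A constant value (for example the job name) leaves it stable.
    String fixed1 = versionString(unchangedDoc, Map.of("JOB_NAME", List.of("nightly")));
    String fixed2 = versionString(unchangedDoc, Map.of("JOB_NAME", List.of("nightly")));
    System.out.println(needsFetch(run1, run2));     // true
    System.out.println(needsFetch(fixed1, fixed2)); // false
  }
}
```

Under this model, any value that changes per run makes every document look updated, even when the file itself is untouched, which matches the re-crawl I am seeing.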

Thanks,
Jitu

On Tue, Dec 23, 2014 at 5:32 PM, Jitu <abjitu@gmail.com> wrote:

> Hi Karl,
>
> Thanks for your support. Here is what I tried. In StartupThread.java,
> inside the run method, I create a unique id called InstanceId and store
> it as part of the forced metadata, which is sent to the output connector.
> That all works fine. But when I re-run the same job again and again, all
> files are crawled again. Is this because the forced metadata is changing?
> Is forced metadata used to check whether a file has been updated or not?
>
> code snippet:
>
>                   final String instanceId = IDFactory.make(threadContext);
>                   // Only now record the fact that we are trying to start the job.
>                   connectionMgr.recordHistory(jobDescription.getConnectionName(),
>                     null,connectionMgr.ACTIVITY_JOBSTART,null,
>                     jobID.toString()+"("+jobDescription.getDescription()+")",null,instanceId,null);
>                   jobDescription.clearForcedMetadata();
>                   jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
>                   jobManager.save(jobDescription);
>
>
> Thanks,
> Jitu
>
> On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Jitu,
>>
>> Your client's needs seem rather unusual, and will potentially be somewhat
>> expensive performance-wise.  So unless I hear from others as well that this
>> is a key feature, there's no point in contributing a patch.
>>
>> You will of course need to keep track of whatever changes you develop so
>> that you can later upgrade to newer versions of MCF.
>>
>> Thanks,
>> Karl
>>
>>
>> On Mon, Dec 22, 2014 at 8:14 AM, Jitu <abjitu@gmail.com> wrote:
>>
>>> Hi Karl,
>>>
>>> Thanks for the quick reply and support. This is exactly what I was
>>> looking for. Thank you so much. If I modify WorkerThread.java, do I need to
>>> submit a patch for it?
>>>
>>> Thanks,
>>> Jitu
>>>
>>> On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Jitu,
>>>>
>>>> I'm sorry for the miscommunication.  What I meant is that without any
>>>> modifications, you can add the job's name as metadata for all documents
>>>> indexed with the job.
>>>>
>>>> If you need to index hard-wired metadata for every job run, you will
>>>> need to modify WorkerThread.java.  The IJobDescription object is readily
>>>> available there, but you will also need to write a SQL query to obtain the
>>>> job's start time.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Mon, Dec 22, 2014 at 4:33 AM, Jitu <abjitu@gmail.com> wrote:
>>>>
>>>>> Hi Karl,
>>>>> Thanks for the quick reply and support. I have gone through the source
>>>>> code of "ForcedMetadataConnector.java" as well as the end user
>>>>> documentation at
>>>>> http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster.
>>>>> It says we can add a string constant for every job run. But my client
>>>>> wants to know which files were crawled in each run of the job. To
>>>>> search for that, I need to send a unique id for every job run as part of
>>>>> the metadata. This unique id changes for every job run, so I cannot use
>>>>> ForcedMetadataConnector. You advised: "It's certainly possible to add the
>>>>> current job's start time field as hard-wired metadata". Please let me know
>>>>> how to achieve it.
>>>>>
>>>>> Thanks,
>>>>> Jitu
>>>>>
>>>>> On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Jitu,
>>>>>>
>>>>>> You can certainly add a unique string associated with a job to every
>>>>>> document using the Metadata Adjuster transformation connector (which of
>>>>>> course can be the job name).  The time of indexing is already sent as a
>>>>>> metadata field (can't remember which one off the top of my head, but I'm
>>>>>> sure you can find it).  What you can't get, mainly because it basically has
>>>>>> little meaning in MCF, is the time the job was started.  It's certainly
>>>>>> possible to add the current job's start time field as hard-wired metadata,
>>>>>> but I bet your client would prefer the actual time of indexing of the
>>>>>> document anyhow.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Dec 19, 2014 at 2:30 AM, Jitu <abjitu@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Karl,
>>>>>>> Thanks for all your support. One of our customers needs the job
>>>>>>> schedule information to be sent to the output connector. Basically, my
>>>>>>> customer wants to find out, using Solr search, which files were indexed
>>>>>>> in a given job run.
>>>>>>>
>>>>>>> For example, if my job ran on 17th Dec 2014 at 11:23 AM, then I will
>>>>>>> send a unique string, say "JobName 17-12-2014 11:23", as part of the file
>>>>>>> metadata to the Solr output connector. A Solr search can then use this
>>>>>>> string to find all files indexed as part of that job run.
>>>>>>>
>>>>>>> Please correct me if I am wrong, or suggest how to achieve it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jitu
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
