manifoldcf-user mailing list archives

From <karl.wri...@nokia.com>
Subject RE: Setting up Solr -- commit, event notifications
Date Wed, 02 Jun 2010 15:05:05 GMT
I don't think LCF *can* necessarily communicate enough information for a downstream handler
to make optimal smart decisions about committing, because effectively that would require LCF
to predict the future.  For example, if you *knew* that a job was coming to an end shortly,
you might delay a commit until that happened - but such certainty requires abilities beyond
mere software.

My concern with this feature is that it will go one of three ways:


(1)    Nobody will use it at all, but will instead configure Solr appropriately, as the initial
design intended.

(2)    People will use it, but will never be satisfied with the amount of info that LCF sends
downstream for decision making - they'll always want more.  For example, they'll start to
want a notification after every X documents have been processed by a job.  Then, they'll want
a notification after a continuous job has been idle for more than Y seconds.  Etc.  And in
the end, the final results will *still* not be adequate for everyone's needs, because you're
still trying to predict the future, and that's impossible.

(3)    People will just use the feature in the dumbest possible way: causing a commit on every
job end, for example, and avoiding the lack of a job end on continuous jobs by never using
continuous jobs.

I am also getting very concerned that so many "requirements" seem to be coming from "initial
evaluation of LCF".  That sounds to me like features that won't really help anyone in the
long run.

Karl


From: ext Jack Krupansky [mailto:jack.krupansky@lucidimagination.com]
Sent: Wednesday, June 02, 2010 10:50 AM
To: connectors-user@incubator.apache.org
Subject: Re: Setting up Solr -- commit, event notifications

Yes, a sophisticated app with lots of complex jobs will have to be quite smart about how it
decides to commit. The goal for LCF would be simply to supply enough job status so that such
a sophisticated app could decide that the job status warrants a commit. As I suggested, the
simplest case would be to see that all non-continuous jobs (at least those that the app cares
about) have completed.

The app end might or might not be Solr itself. It could indeed be a plug-in for Solr, or just
some other app process that has the specified context handler.

And, yes, the "commit at end of job" option is not terribly useful for complex, overlapping
job arrangements. Its primary use case is for initial evaluation of LCF. But it might be
sufficient for some simpler apps. Not all Solr apps are horribly complicated.

Maybe the option should technically be spec'ed as "commit at end of job, but only if no other
jobs are active with the Solr output connector".
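
The rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the shape of the `active_jobs` mapping, and the connection names are hypothetical and are not part of LCF.

```python
from typing import Mapping


def should_commit(finished_connection: str, active_jobs: Mapping[str, str]) -> bool:
    """Decide whether to commit when a job ends.

    finished_connection: name of the output connection the finished job used.
    active_jobs: hypothetical mapping of still-running job name -> output
                 connection name.

    Returns True only if no still-active job writes to the same connection.
    """
    return finished_connection not in active_jobs.values()
```

For example, `should_commit("solr-main", {"crawl-b": "solr-main"})` is False because another job is still feeding the same connection, while an empty mapping yields True.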

In some cases you might only want to commit when a specific job completes. For example, maybe
a series of jobs are scheduled to run in sequence and the commit is only desired on completion
of the final job in that sequence. In that case, the option is desired at the job level rather
than for the Solr output connection itself. Is there any provision for job-specific output
connector options?

-- Jack Krupansky

From: karl.wright@nokia.com<mailto:karl.wright@nokia.com>
Sent: Wednesday, June 02, 2010 10:19 AM
To: connectors-user@incubator.apache.org<mailto:connectors-user@incubator.apache.org>
Subject: RE: Setting up Solr

What about document cleanup on job deletion, etc.?  Overlapping job runs using the same output
connection?  We've had this discussion before; the connector can certainly have hooks added
but unless you intend to construct some kind of data structure on the Solr end that tries
to keep track of all that, you're likely not going to get quite what you are looking for.

Karl


From: ext Jack Krupansky [mailto:jack.krupansky@lucidimagination.com]
Sent: Wednesday, June 02, 2010 10:15 AM
To: connectors-user@incubator.apache.org
Subject: Re: Setting up Solr

It would be nice to have a "commit at end of job" option for the Solr output connector. Granted,
commit policy can be a lot more complicated than that, but it is a simple use case that would
facilitate initial evaluations of LCF with Solr.

Thinking further ahead, it would be very useful to have "job status notification" messages
that could be sent to an app (say, a "/update/lcf-job-status" request handler) that would
note start, end, abort, and periodic status of LCF jobs. Then the app could commit as it desires
with respect to individual job completion and larger collections of jobs for different repositories.
For example, an app might wait for all non-continuous jobs to complete before committing.
That would be a more comprehensive longer-term solution for the commit problem, but the simple
end-of-job commit option would be more user-friendly in the near-term.
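
As a rough sketch of what the receiving app might do with such notifications, the Python below tracks start/end/abort events and signals a commit once every non-continuous job has finished. The event names, the `continuous` flag, and the class itself are assumptions made for illustration; LCF defines no such message format.

```python
from dataclasses import dataclass, field


@dataclass
class JobStatusTracker:
    """Tracks hypothetical LCF job events and decides when a commit is due."""

    active: set = field(default_factory=set)  # non-continuous jobs now running
    dirty: bool = False  # True once a job has finished since the last commit

    def on_event(self, job_id: str, event: str, continuous: bool = False) -> bool:
        """Record one event; return True if the app should commit now."""
        if continuous:
            # Continuous jobs have no job end, so they never trigger a commit here.
            return False
        if event == "start":
            self.active.add(job_id)
        elif event in ("end", "abort"):
            self.active.discard(job_id)
            self.dirty = True
        # Any other event (e.g. a periodic status message) changes nothing.
        if self.dirty and not self.active:
            self.dirty = False
            return True
        return False
```

With two jobs started, the first "end" event returns False and the second returns True, which is the point at which the app would issue its commit.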

-- Jack Krupansky

From: karl.wright@nokia.com<mailto:karl.wright@nokia.com>
Sent: Wednesday, June 02, 2010 9:09 AM
To: connectors-user@incubator.apache.org<mailto:connectors-user@incubator.apache.org>
Subject: RE: Setting up Solr

Solr has autocommit functionality built in.  Google for it and you will find out how to configure
it.

Karl

From: ext Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com> [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 02, 2010 9:08 AM
To: connectors-user@incubator.apache.org<mailto:connectors-user@incubator.apache.org>
Subject: RE: Setting up Solr

Why can't we have a job for this? Or else, is there any other way (on Windows? On Linux there
are cron jobs)?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com>

From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: Wednesday, June 02, 2010 6:32 PM
To: connectors-user@incubator.apache.org
Subject: RE: Setting up Solr

You can send any argument you want by configuring the output connector.  However, the explicit
commit on every post will slow down performance of your crawls.

Karl

From: ext Rohan.GPatil@cognizant.com [mailto:Rohan.GPatil@cognizant.com]
Sent: Wednesday, June 02, 2010 9:00 AM
To: connectors-user@incubator.apache.org
Subject: RE: Setting up Solr

Hi,

Yes, that is where I was stuck: making an explicit commit.

Can I send the argument commit=true while configuring the Repo connector?

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com>

From: Jack Krupansky [mailto:jack.krupansky@lucidimagination.com]
Sent: Wednesday, June 02, 2010 4:42 PM
To: connectors-user@incubator.apache.org
Subject: Re: Setting up Solr

A short Solr tutorial is here:

http://lucene.apache.org/solr/tutorial.html
After running an LCF job that uses a Solr output connection, be sure to manually force a Solr
"commit", for example:

    cd .../apache-solr-1.4.0/example/exampledocs
    java -jar post.jar
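
The same commit can also be requested over HTTP by posting a `<commit/>` message to Solr's XML update handler. The following is a minimal Python sketch, assuming the Solr example server is listening at its default address http://localhost:8983/solr; the helper name is hypothetical, and the function only builds the request without sending it.

```python
import urllib.request

# Assumed: default address of the Solr example server; adjust for your install.
SOLR_BASE = "http://localhost:8983/solr"


def build_commit_request(base_url: str = SOLR_BASE) -> urllib.request.Request:
    """Build (but do not send) a POST of <commit/> to Solr's XML update handler."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/update",
        data=b"<commit/>",  # supplying a body makes urllib issue a POST
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
```

To actually send it against a running Solr instance, pass the result to `urllib.request.urlopen(...)`.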

-- Jack Krupansky

From: Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com>
Sent: Wednesday, June 02, 2010 1:46 AM
To: connectors-user@incubator.apache.org<mailto:connectors-user@incubator.apache.org>
Subject: Setting up Solr

Hi,

I am stuck at setting up the Solr server to be used with LCF.

I am new to Solr.

Thanks & Regards,
Rohan G Patil
Cognizant  Programmer Analyst Trainee,Bangalore || Mob # +91 9535577001
Rohan.GPatil@cognizant.com<mailto:Rohan.GPatil@cognizant.com>

This e-mail and any files transmitted with it are for the sole use of
the intended recipient(s) and may contain confidential and privileged
information.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy all copies of the original message.
Any unauthorized review, use, disclosure, dissemination, forwarding,
printing or copying of this email or any action taken in reliance on this
e-mail is strictly prohibited and may be unlawful.




