lucene-solr-user mailing list archives

From James Thomas <JTho...@Camstar.com>
Subject RE: index merge question
Date Tue, 11 Jun 2013 15:57:38 GMT
FWIW, the Solr included with Cloudera Search, by default, "ignores all but the most recent
document version" during merges.
The conflict resolution is configurable, however.  See the documentation for details:
http://www.cloudera.com/content/support/en/documentation/cloudera-search/cloudera-search-documentation-v1-latest.html
-- see the user guide PDF, "update-conflict-resolver" parameter
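
To illustrate what "keep only the most recent document version" means in practice, here is a minimal Python sketch. It is not Cloudera's implementation, only an in-memory model of the idea; the function name `keep_latest` and the `id` / `version` field names are assumptions made up for the example.

```python
from typing import Dict, Iterable, List


def keep_latest(docs: Iterable[dict], key: str = "id", version: str = "version") -> List[dict]:
    """Keep one document per unique key: the one with the highest version."""
    latest: Dict[str, dict] = {}
    for doc in docs:
        current = latest.get(doc[key])
        # On a version tie, the document seen later in the input wins.
        if current is None or doc[version] >= current[version]:
            latest[doc[key]] = doc
    return list(latest.values())


# Hypothetical contents of two cores that share the key "a".
core1 = [{"id": "a", "version": 2, "body": "new"}]
core2 = [{"id": "a", "version": 1, "body": "old"},
         {"id": "b", "version": 1, "body": "only"}]

merged = keep_latest(core1 + core2)
```

A plain Lucene-level merge does no such filtering, so something like this would have to happen either in the merge job itself or as a cleanup pass afterwards.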

James

-----Original Message-----
From: anirudha81@gmail.com [mailto:anirudha81@gmail.com] On Behalf Of Anirudha Jadhav
Sent: Tuesday, June 11, 2013 10:47 AM
To: solr-user@lucene.apache.org
Subject: Re: index merge question

From my experience, the Lucene mergeTool and the one invoked by CoreAdmin are pure Lucene
implementations and do not understand the concept of a uniqueKey (a Solr-land concept).

  http://wiki.apache.org/solr/MergingSolrIndexes has a cautionary note at the end

We do frequent index merges, for which we externally run map/reduce jobs (Java code using
the Lucene APIs) to merge the indices and validate the merged indices against the sources.
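
As a rough idea of the validation step, the sketch below checks a list of unique-key values read back from a merged index for duplicates. This is a simplified Python stand-in, not the actual map/reduce code; `duplicate_keys` and the sample IDs are hypothetical, and how the IDs are read out of the index is left out.

```python
from collections import Counter
from typing import Iterable, List


def duplicate_keys(ids: Iterable[str]) -> List[str]:
    """Return the unique-key values that occur more than once, sorted."""
    counts = Counter(ids)
    return sorted(k for k, n in counts.items() if n > 1)


# Hypothetical IDs read back from a merged index.
merged_ids = ["doc1", "doc2", "doc1", "doc3"]
dupes = duplicate_keys(merged_ids)
```

An empty result means every key is unique; anything else lists the keys that need to be resolved before the merged index is served.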
-Ani

On Tue, Jun 11, 2013 at 10:38 AM, Mark Miller <markrmiller@gmail.com> wrote:
> Yeah, you have to carefully manage things if you are map/reduce building indexes *and*
> updating documents in other ways.
>
> If your 'source' data for MR index building is the 'truth', you also have the option
> of not doing incremental index merging, and you could simply rebuild the whole thing every
> time - of course, depending on your cluster size, that could be quite expensive.
>
> - Mark
>
> On Jun 10, 2013, at 8:36 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
>> Thanks Mark.  My question stems from the new Cloudera Search stuff.
>> My concern is that if someone updates a doc while the index is being
>> rebuilt, that update could be lost from a Solr perspective.  I guess what
>> would need to happen to ensure the correct information was indexed
>> would be to record the start time and reindex the information that changed
>> since then?
>> On Jun 8, 2013 2:37 PM, "Mark Miller" <markrmiller@gmail.com> wrote:
>>
>>>
>>> On Jun 8, 2013, at 12:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>>
>>>> When merging through the core admin (
>>>> http://wiki.apache.org/solr/MergingSolrIndexes) what is the policy
>>>> for conflicts during the merge?  So for instance if I am merging
>>>> core 1 and core 2 into core 0 (first example), what happens if core
>>>> 1 and core 2 both have a document with the same key, say core 1 has
>>>> a newer version than core 2?  Does the merge fail, does the newer
>>>> document remain?
>>>
>>> You end up with both documents, both with that ID - not generally a
>>> situation you want to end up in. You need to ensure unique IDs in
>>> the input data or replace the index rather than merging into it.
>>>
>>>>
>>>> Also, if using the srcCore method, if a document with key 1 is
>>>> written while an index also with key 1 is being merged, what happens?
>>>
>>> It depends on the order I think - if the doc is written after the 
>>> merge and it's an update, it will update the doc that was just 
>>> merged in. If the merge comes second, you have the doc twice and it's a problem.
>>>
>>> - Mark
>



--
Anirudha P. Jadhav