lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rajesh parab <rajesh_para...@yahoo.com>
Subject Re: ParalleReader and synchronization between indexes
Date Thu, 01 May 2008 02:52:08 GMT
My apologies for quick follow-ups and thanks for
pointers/suggestions Grant and Otis.

I did check various threads on Java user forum around
this topic, but could not find a solution. Some most
relevant topics that end with same question I am
currently having.

http://www.gossamer-threads.com/lists/lucene/java-user/15063?search_string=parallelreader;#15063
http://www.gossamer-threads.com/lists/lucene/java-user/31435?search_string=parallelreader;#31435
http://www.gossamer-threads.com/lists/lucene/java-user/50164?search_string=parallelreader;#50164


Otis,
During incremental indexing, option of re-creating
second index entirely will not work well in our case
as we will be dealing with millions of documents.

I am sorry for creating confusion by referring index
as "small" index. I should have referred to it as
index with less no of fields, which change very often.

So, if first index with large no fields is not
changing and second index with small set of fields
requires constant updates due to frequent changes, is
there a way to keep document ids of both indexes in
sync without either re-creating second index entirely
or modifying both indexes? Can we somehow keep
internal document id same after updating (i.e. delete
and re-insert) index document?

Regards,
Rajesh

--- Otis Gospodnetic <otis_gospodnetic@yahoo.com>
wrote:

> Bravo Grant!
> 
> Rajesh, I believe the following will work:
> - delete your small index
> - optimize your big index  (needed?  Not 100% sure,
> but I think it is)
> - loop through the docs in your "big" index
> - for each document in the big index, add a document
> to the small index
> 
> When you are done you have big+small with docIDs in
> sync.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr -
> Nutch
> 
> ----- Original Message ----
> > From: Grant Ingersoll <gsingers@apache.org>
> > To: java-user@lucene.apache.org
> > Sent: Wednesday, April 30, 2008 5:48:33 PM
> > Subject: Re: ParalleReader and synchronization
> between indexes
> > 
> > Rajesh,
> > 
> > You are asking a fairly complicated question on a
> seldom used piece of  
> > functionality.  Constantly pinging the list is
> just making it less  
> > likely that someone will respond with an answer. 
> The likelihood that  
> > the 1 person who understand that code (and trust
> me, it really is  
> > likely very few people who know how to practically
> employ it) enough  
> > to give practical advice have read it in the time
> period you have  
> > alloted us to respond is next to nil.   We are all
> volunteers with day  
> > jobs.
> > 
> > Have you bothered to search the dev and user
> mailing list for  
> > information on the class in question?  I would
> look for threads from  
> > Doug or Chuck Williams.
> > 
> > -Grant
> > 
> > 
> > On Apr 30, 2008, at 5:00 PM, Rajesh parab wrote:
> > 
> > > Hi Guys,
> > >
> > > Any comments on this?
> > >
> > > I was looking into Lucene archive and came
> across this
> > > thread what asks the same question.
> > >
> > > 
> >
>
http://www.gossamer-threads.com/lists/lucene/java-user/50477?search_string=parallelreader;#50477
> > >
> > > Any pointers will be helpful.
> > >
> > > Regards,
> > > Rajesh
> > >
> > > --- Rajesh parab  wrote:
> > >
> > >> Hi All,
> > >>
> > >> Any suggestions/comments on my questions in
> this
> > >> thread will be really helpful.
> > >>
> > >> We are planning to use Lucene indexes
> throughout the
> > >> application and exploring possibilites of
> > >> partitioning
> > >> data between multiple indexes.
> > >>
> > >> Regards,
> > >> Rajesh
> > >>
> > >> --- Rajesh parab  wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> This is from javadoc of ParallelReader:
> > >>>
> > >>>
> > >>
> > >
>
======================================================
> > >>>
> > >>> An IndexReader which reads multiple, parallel
> > >>> indexes.
> > >>> Each index added must have the same number of
> > >>> documents, but typically each contains
> different
> > >>> fields. Each document contains the union of
> the
> > >>> fields
> > >>> of all documents with the same document
> number.
> > >> When
> > >>> searching, matches for a query term are from
> the
> > >>> first
> > >>> index added that has the field.
> > >>>
> > >>> This is useful, e.g., with collections that
> have
> > >>> large
> > >>> fields which change rarely and small fields
> that
> > >>> change more frequently. The smaller fields may
> be
> > >>> re-indexed in a new index and both indexes may
> be
> > >>> searched together.
> > >>>
> > >>>
> > >>
> > >
>
======================================================
> > >>>
> > >>> I have a similar use case as mentioned above
> and
> > >>> hence
> > >>> would like to use ParallelReader to search
> across
> > >>> multiple indexes.
> > >>>
> > >>> I have an object that has 50 fields. Out of
> these
> > >> 50
> > >>> fields, 45 are relatively static and other 5
> are
> > >>> modified very often. So, I am planning to
> > >> partition
> > >>> this objects data into 2 indexes such that 45
> > >> static
> > >>> fields will be part of one index and remaining
> 5
> > >>> dynamic fields will constitute second index.
> While
> > >>> generating the index for the first time, I can
> > >> make
> > >>> sure that the document order for documents
> inside
> > >>> both
> > >>> these indexes is same and hence ParallelReader
> > >> will
> > >>> work properly with it.
> > >>>
> > >>> The question is -
> > >>> What if the data inside second (smaller) index
> > >>> changes? In order to update index document, I
> will
> > >>> have to delete it and re-insert it again as
> Lucene
> > >>> does not support document update. This action
> (of
> > >>> delete and re-insert) will change internal
> > >> document
> > >>> id
> > >>> for updated document inside second index and
> in
> > >>> order
> > >>> to sync it with first index, I will have to
> also
> > >>> modify first (relatively big and static)
> index. If
> > >>> we
> > >>> will have to update both the indexes, how it
> is
> > >>> different from having a single index with all
> the
> > >>> fields? What is the use case in which
> > >> ParallelReader
> > >>> will get used? As per documentation, I was
> > >> thinking
> > >>> that it will apply for my use case, but
> > >>> synchronizing
> > >>> the indexes seems to be a problem.
> > >>>
> > >>> Please help.
> > >>>
> > >>> Regards,
> > >>> Rajesh
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>
> 
=== message truncated ===



      ____________________________________________________________________________________
Be a better friend, newshound, and 
know-it-all with Yahoo! Mobile.  Try it now.  http://mobile.yahoo.com/;_ylt=Ahu06i62sR8HDtDypao8Wcj9tAcJ

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message