lucene-java-user mailing list archives

From Ramprakash Ramamoorthy <youngestachie...@gmail.com>
Subject Re: Split index and store
Date Tue, 05 Mar 2013 11:12:46 GMT
On Mon, Mar 4, 2013 at 11:26 PM, Emmanuel Espina
<espinaemmanuel@gmail.com> wrote:

> 100 terms in a boolean query is not so costly. You could wrap that query in
> a ConstantScoreQuery to avoid the score calculation.
>

Thank you, Emmanuel. This one sounds good.

>
> Why do you have separate indexes? It would be better to build a single
> document and index+store it on a single index.
>

We are doing a kind of stream processing. The older indices would be
zipped to save disk space, but searching over the zipped indices was
painful. So we decided to split the index and the store: we would compress
only the store part (it already uses Lucene41PostingsFormat, though) and
unzip it as the user paginates. I can get the count and other metadata from
the index itself; the store is needed only on pagination. I hope I was able
to explain this without ambiguity.
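
To make the idea concrete, here is a rough, plain-Java sketch of the split.
It is a toy model, not Lucene code: the class name SplitIndexStoreSketch,
the in-memory maps, the whitespace tokenizer and the use of gzip are all
assumptions made for illustration. It only shows the shape of the design,
where the count is answered from the uncompressed index and only the
documents on the requested page are decompressed from the store.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Toy model (hypothetical, not Lucene): searchable index plus compressed store. */
final class SplitIndexStoreSketch {
    // "Index" side: term -> application-supplied IDs, kept uncompressed.
    private final Map<String, List<String>> index = new HashMap<>();
    // "Store" side: application-supplied ID -> gzip-compressed document body.
    private final Map<String, byte[]> store = new HashMap<>();

    /** Indexes the body under each whitespace-separated term and stores it compressed. */
    public void add(String id, String body) {
        for (String term : body.toLowerCase(Locale.ROOT).split("\\s+")) {
            List<String> ids = index.computeIfAbsent(term, t -> new ArrayList<>());
            if (!ids.contains(id)) {
                ids.add(id);
            }
        }
        store.put(id, gzip(body));
    }

    /** The hit count comes from the index alone; nothing is decompressed. */
    public int count(String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList()).size();
    }

    /** Decompresses only the documents on the requested page (page is 0-based, not negative). */
    public List<String> page(String term, int page, int pageSize) {
        List<String> ids = index.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyList());
        int from = Math.min(page * pageSize, ids.size());
        int to = Math.min(from + pageSize, ids.size());
        List<String> bodies = new ArrayList<>();
        for (String id : ids.subList(from, to)) {
            bodies.add(gunzip(store.get(id)));
        }
        return bodies;
    }

    private static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    private static String gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitIndexStoreSketch s = new SplitIndexStoreSketch();
        s.add("id-1", "disk error on node1");
        s.add("id-2", "network error on node2");
        System.out.println(s.count("error"));
        System.out.println(s.page("error", 0, 1));
    }
}
```

The two sides are joined by an application-supplied ID rather than by
position, so the sketch does not depend on both sides numbering documents
the same way.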

>
> Thanks
> Emmanuel
>
>
>
> 2013/3/1 Ramprakash Ramamoorthy <youngestachiever@gmail.com>
>
> > On Fri, Mar 1, 2013 at 4:46 PM, Ian Lea <ian.lea@gmail.com> wrote:
> >
> > > Never rely on lucene internal doc ids.  Use your own.  Lucene searches
> > > on unique ids are of course very fast.
> > >
> >
> > Point taken, Ian. So if I have 100 matching doc IDs, the next step is
> > either to collate the 100 doc IDs into one query with OR, or to call
> > searcher.search() 100 times.
> >
> > Fine, if it isn't very expensive.
> >
> > On a slightly related note, I stumbled upon this thread as well:
> >
> > http://lucene.472066.n3.nabble.com/App-supplied-docID-in-lucene-possible-td4015797.html
> >
> > Some good discussion on this.
> >
> > >
> > > --
> > > Ian.
> > >
> > >
> > > On Fri, Mar 1, 2013 at 9:51 AM, Ramprakash Ramamoorthy
> > > <youngestachiever@gmail.com> wrote:
> > > > Hello team,
> > > >
> > > >           I have a query and I am explaining it as below.
> > > >
> > > > Objective : To split index and store, and combine them during query
> > > > time
> > > >
> > > > Approach : Have two index writers, one will write a storedField and
> > > > the other will write an indexed Field(Index.TRUE).
> > > >
> > > > The Question : This happens sequentially (store and index a single
> > > > doc, then move to the next one). Does this mean the docIds will be
> > > > the same in both the stored and the indexed index (assuming docIds
> > > > are sequential)? I am interested in this because, when I get the
> > > > docIds from the indexed index at query time, I can simply use
> > > > reader.get(int docId) and retrieve the doc from the stored index.
> > > > Please note, I don't perform any update/delete on the indexes.
> > > >
> > > > Other solution : Can have an app supplied UUID, which will
> > > > additionally be stored in the indexed index and also indexed in the
> > > > stored index. But the problem is when I have fetched the UUIDs from
> > > > the indexed index, I will have to do a
> > > > searcher.search(UUID1 .. UUIDn) on the stored field, which I feel
> > > > is costly.
> > > >
> > > > Hope I am understandable and less ambiguous. Help appreciated.
> > > >
> > > > --
> > > > With Thanks and Regards,
> > > > Ramprakash Ramamoorthy,
> > > > India
> > > > +91 9626975420
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > India,
> > +91 9626975420
> >
>



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
India.
+91 9626975420
