lucene-java-user mailing list archives

From Emmanuel Espina <espinaemman...@gmail.com>
Subject Re: Split index and store
Date Wed, 06 Mar 2013 14:38:03 GMT
I understand and it sounds ok. The "store" index would be like an ordinary
database where you search by value.

Another approach you could consider is to compress the field before
indexing. That is, you compress the value with java.util.zip.GZIPOutputStream
(the counterpart of
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/zip/GZIPInputStream.html,
which you would use to read it back) and
store the result as the contents of a stored but not indexed field.

Then you can do a single query to get the doc ids; from the doc ids you can
retrieve the compressed contents (the ones you gzipped at index time) and
decompress them in your application with GZIPInputStream before showing
them. I don't know whether you would save a lot of disk in your case (that
depends on the data you are compressing), but it should be faster than doing
two queries.
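
As a rough sketch of the compress/decompress step, using only java.util.zip
and no Lucene calls: the class and method names below (GzipFieldCodec,
compress, decompress) are made up for illustration, and UTF-8 text is an
assumption about your data. The byte[] returned by compress() is what you
would put into the stored-only field, and decompress() is what you would call
on the bytes you read back from the matching documents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Lucene: gzips a field value before it is
// stored and gunzips it again after it is loaded.
final class GzipFieldCodec {
    private GzipFieldCodec() {}

    // Compress the text (assumed UTF-8) into a gzip byte array.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer, so the
        // bytes are only complete after the try block ends.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress bytes produced by compress() back into the original text.
    static String decompress(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Whether this pays off depends on the data: short or already-compressed values
can even grow slightly because of the gzip header and trailer, so it is worth
measuring on a sample of your real documents first.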

Thanks
Emmanuel


2013/3/5 Ramprakash Ramamoorthy <youngestachiever@gmail.com>

> On Mon, Mar 4, 2013 at 11:26 PM, Emmanuel Espina
> <espinaemmanuel@gmail.com>wrote:
>
> > 100 terms in a boolean query is not so costly. You could wrap that query in
> > a ConstantScoreQuery to avoid the score calculation.
> >
>
> Thank you Emmanuel. This one sounds good.
>
> >
> > Why do you have separate indexes? It would be better to build a single
> > document and index+store it on a single index.
> >
>
> We are doing some sort of stream processing. The older indices would be
> zipped in order to save disk. But searching over the zipped indices was
> painful. So we decided to split index and store: we would compress only
> the store part (it already uses Lucene41PostingsFormat, though) and then unzip
> it as the user paginates (I could get the count and other metadata from the
> index itself, the store being needed only on pagination). Hope I was able to
> explain without any ambiguity.
>
> >
> > Thanks
> > Emmanuel
> >
> >
> >
> > 2013/3/1 Ramprakash Ramamoorthy <youngestachiever@gmail.com>
> >
> > > On Fri, Mar 1, 2013 at 4:46 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > >
> > > > Never rely on lucene internal doc ids.  Use your own.  Lucene searches
> > > > on unique ids are of course very fast.
> > > >
> > >
> > > Point taken, Ian. So in case I have 100 matching doc IDs, the next
> > > step is either to collate the 100 doc IDs into a query with OR, or to call
> > > searcher.search() 100 times.
> > >
> > > Fine, if it isn't very expensive.
> > >
> > > On a slightly related note, stumbled upon this thread as well:
> > > http://lucene.472066.n3.nabble.com/App-supplied-docID-in-lucene-possible-td4015797.html
> > > Some good discussion on this.
> > >
> > > >
> > > > --
> > > > Ian.
> > > >
> > > >
> > > > On Fri, Mar 1, 2013 at 9:51 AM, Ramprakash Ramamoorthy
> > > > <youngestachiever@gmail.com> wrote:
> > > > > Hello team,
> > > > >
> > > > >           I have a query and I am explaining it as below.
> > > > >
> > > > > > Objective : To split index and store, and combine it during query time
> > > > > >
> > > > > > Approach : Have two index writers, one will write a storedField and the
> > > > > > other will write an indexed Field(Index.TRUE).
> > > > > >
> > > > > > The Question : This happens sequentially (Store and index a single doc, then
> > > > > > move to the next one). Does this mean the docIds will be same in both the
> > > > > > indexes stored and indexed (Assuming docIds are sequential)? Am interested
> > > > > > in this because, when I get the docIds from the indexed index during the
> > > > > > query time, I can simply use reader.get(int docId) and retrieve the doc
> > > > > > from the stored index. Please to note, I don't perform any update/delete on
> > > > > > the indexes.
> > > > > >
> > > > > > Other solution : Can have an app supplied UUID, which will additionally be
> > > > > > stored in the indexed index and also indexed in the stored index. But the
> > > > > > problem is when I have fetched the UUIDs from the indexed index, I will
> > > > > > have to do a searcher.search(UUID1 .. UUIDn) on the stored field, which I
> > > > > > feel is costly.
> > > > >
> > > > > Hope I am understandable and less ambiguous. Help appreciated.
> > > > >
> > > > > --
> > > > > With Thanks and Regards,
> > > > > Ramprakash Ramamoorthy,
> > > > > India
> > > > > +91 9626975420
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >
> > > >
> > >
> > >
> > > --
> > > With Thanks and Regards,
> > > Ramprakash Ramamoorthy,
> > > India,
> > > +91 9626975420
> > >
> >
>
>
>
> --
> With Thanks and Regards,
> Ramprakash Ramamoorthy,
> India.
> +91 9626975420
>
