lucene-java-user mailing list archives

From Ramprakash Ramamoorthy <youngestachie...@gmail.com>
Subject Re: Split index and store
Date Fri, 08 Mar 2013 07:38:07 GMT
On Wed, Mar 6, 2013 at 8:08 PM, Emmanuel Espina <espinaemmanuel@gmail.com> wrote:

> I understand and it sounds ok. The "store" index would be like an ordinary
> database where you search by value.
>
> Another approach you could consider is to compress the field before
> indexing. That is, you compress with
>
> http://docs.oracle.com/javase/1.5.0/docs/api/java/util/zip/GZIPInputStream.html
> and store those results as the contents of a stored but not indexed field.
>

Thank you Emmanuel. Will try this and update here once done.

>
> Then you can do a single query to get the doc ids, from the doc ids you can
> retrieve the compressed contents (that you compressed with gzip
> inputstream) and uncompress it in your application before showing it. I
> don't know if in your case you save a lot of disk space (that depends on the
> data that you are compressing), but it should be faster than doing two queries.
>
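
The compress-then-store idea quoted above could be sketched roughly as follows, using only the JDK. This is an illustrative sketch, not code from the thread: the class name GzipFieldCodec and its two methods are hypothetical, and UTF-8 is an assumed encoding. Note that writing compressed data goes through java.util.zip.GZIPOutputStream, while GZIPInputStream (the class linked above) is the one used to read it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of Lucene): gzip a field value before it is
// stored, and gunzip it after it has been fetched from the index.
final class GzipFieldCodec {
    private GzipFieldCodec() {}

    // Compress the text (assumed UTF-8) into a gzip byte array.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so read the buffer
        // only after the try block has finished.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Decompress a gzip byte array back into the original text.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "2013-03-08 07:38:07 INFO request served";
        byte[] packed = compress(original);
        // The round trip should give back exactly what was put in.
        System.out.println(decompress(packed).equals(original));
    }
}
```

In this sketch the byte array returned by compress() would be the value placed in a stored-only binary field, and decompress() would be applied to the bytes read back from each matching document. Whether this saves disk space depends on the data, as noted above; very short values can even grow because of the gzip header.
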
> Thanks
> Emmanuel
>
>
> 2013/3/5 Ramprakash Ramamoorthy <youngestachiever@gmail.com>
>
> > On Mon, Mar 4, 2013 at 11:26 PM, Emmanuel Espina
> > <espinaemmanuel@gmail.com> wrote:
> >
> > > 100 terms in a boolean query is not so costly. You could wrap that query in
> > > a ConstantScoreQuery to avoid the score calculation.
> > >
> >
> > Thank you Emmanuel. This one sounds good.
> >
> > >
> > > Why do you have separate indexes? It would be better to build a single
> > > document and index+store it on a single index.
> > >
> >
> > We are doing some sort of stream processing. The older indices would be
> > zipped in order to save disk space, but searching over the zipped indices
> > was painful. So we decided to split index and store: we would compress only
> > the store part (it already uses Lucene41PostingsFormat, though) and then
> > unzip it as the user paginates (I could get the count and other metadata
> > from the index itself, the store being needed only on pagination). Hope I
> > was able to explain without ambiguity.
> >
> > >
> > > Thanks
> > > Emmanuel
> > >
> > >
> > >
> > > 2013/3/1 Ramprakash Ramamoorthy <youngestachiever@gmail.com>
> > >
> > > > On Fri, Mar 1, 2013 at 4:46 PM, Ian Lea <ian.lea@gmail.com> wrote:
> > > >
> > > > > Never rely on lucene internal doc ids.  Use your own.  Lucene searches
> > > > > on unique ids are of course very fast.
> > > > >
> > > >
> > > > Point taken, Ian. So if I have 100 matching doc IDs, the next step is
> > > > either to collate the 100 doc IDs into a single query with OR, or to
> > > > call searcher.search() 100 times.
> > > >
> > > > Fine, if it isn't very expensive.
> > > >
> > > > On a slightly related note, stumbled upon this thread
> > > > http://lucene.472066.n3.nabble.com/App-supplied-docID-in-lucene-possible-td4015797.html
> > > > as well. Some good discussion on this.
> > > >
> > > > >
> > > > > --
> > > > > Ian.
> > > > >
> > > > >
> > > > > On Fri, Mar 1, 2013 at 9:51 AM, Ramprakash Ramamoorthy
> > > > > <youngestachiever@gmail.com> wrote:
> > > > > > Hello team,
> > > > > >
> > > > > >           I have a query and I am explaining it as below.
> > > > > >
> > > > > > Objective : To split index and store, and combine it during query time
> > > > > >
> > > > > > Approach : Have two index writers, one will write a storedField and the
> > > > > > other will write an indexed Field(Index.TRUE).
> > > > > >
> > > > > > The Question : This happens sequentially (store and index a single doc,
> > > > > > then move to the next one). Does this mean the docIds will be the same
> > > > > > in both the stored and the indexed index (assuming docIds are
> > > > > > sequential)? I am interested in this because, when I get the docIds
> > > > > > from the indexed index at query time, I can simply use
> > > > > > reader.get(int docId) and retrieve the doc from the stored index.
> > > > > > Please note, I don't perform any update/delete on the indexes.
> > > > > >
> > > > > > Other solution : Can have an app-supplied UUID, which will additionally
> > > > > > be stored in the indexed index and also indexed in the stored index.
> > > > > > But the problem is that once I have fetched the UUIDs from the indexed
> > > > > > index, I will have to do a searcher.search(UUID1 .. UUIDn) on the
> > > > > > stored field, which I feel is costly.
> > > > > >
> > > > > > Hope I am understandable and less ambiguous. Help appreciated.
> > > > > >
> > > > > > --
> > > > > > With Thanks and Regards,
> > > > > > Ramprakash Ramamoorthy,
> > > > > > India
> > > > > > +91 9626975420
> > > > >
> > > > >
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > With Thanks and Regards,
> > > > Ramprakash Ramamoorthy,
> > > > India,
> > > > +91 9626975420
> > > >
> > >
> >
> >
> >
> > --
> > With Thanks and Regards,
> > Ramprakash Ramamoorthy,
> > India.
> > +91 9626975420
> >
>



-- 
With Thanks and Regards,
Ramprakash Ramamoorthy,
Member Technical Staff,
Zoho Corporation.
+91 9626975420
