lucene-java-user mailing list archives

From Shai Erera <ser...@gmail.com>
Subject Re: Questions for facets search
Date Thu, 14 Aug 2014 04:51:11 GMT
Glad it helped, Sheng.

Note that the taxonomy index is not quite what you implemented; I just want
to clarify that. You implemented something like a JOIN between two indexes,
where a document in Index1 can be joined with a document (or set of docs)
in Index2 by some primary key.

The taxonomy index is different. It's an auxiliary index, but the word
'index' is just an implementation detail. Again, think of it as a large Map
from a String to Integer. Every facet in the taxonomy gets a unique ID
(integer), and that integer is encoded in the search index for all
documents that are associated with that facet.
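
To make the Map analogy concrete, here is a minimal sketch in plain Java. It
is only an illustration: ToyTaxonomy and getOrAdd are hypothetical names, not
the facet module's classes, and the real taxonomy also assigns ordinals to
parent paths, which this toy ignores.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for the taxonomy index: one global label -> ordinal map.
// Hypothetical class, NOT the Lucene API.
class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();

    // Returns the existing ordinal for a label, or assigns the next free one.
    int getOrAdd(String label) {
        Integer existing = ordinals.get(label);
        if (existing != null) {
            return existing;
        }
        int next = ordinals.size();
        ordinals.put(label, next);
        return next;
    }

    public static void main(String[] args) {
        ToyTaxonomy taxo = new ToyTaxonomy();
        int lucene = taxo.getOrAdd("Tags/Lucene");
        int shai = taxo.getOrAdd("Author/Shai");
        // A second document with the same facet reuses the same ordinal.
        int luceneAgain = taxo.getOrAdd("Tags/Lucene");
        System.out.println(lucene + " " + shai + " " + luceneAgain);
    }
}
```

The integers returned by getOrAdd are what would be stored with each document
in the search index; the label strings themselves live only in the map.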

Lucene implements a similar feature through SortedSetDocValues (the facet
module supports that one too, without the need for an auxiliary index). The
difference is that SortedSetDocValues implements the mapping per-segment, so
e.g. the facet Tags/Lucene may receive the integer 5 in seg1 and 12 in seg2,
whereas the taxonomy index maps it *once* to an integer (say 4), and that
integer is encoded in a BinaryDocValuesField in all segments of the search
index.
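
The per-segment numbering can be sketched in plain Java as follows. This is a
toy model under stated assumptions, not how SortedSetDocValues is implemented:
OrdinalScopes and perSegmentOrdinals are hypothetical names, and each "segment"
is just a list of the labels it contains, numbered in sorted order.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical illustration of per-segment ordinals; NOT the Lucene API.
class OrdinalScopes {
    // Each segment numbers only the labels it contains, in sorted order,
    // so the same label can get a different number in another segment.
    static Map<String, Integer> perSegmentOrdinals(List<String> labelsInSegment) {
        Map<String, Integer> ords = new HashMap<>();
        int next = 0;
        for (String label : new TreeSet<>(labelsInSegment)) {
            ords.put(label, next++);
        }
        return ords;
    }

    public static void main(String[] args) {
        Map<String, Integer> seg1 = perSegmentOrdinals(
            Arrays.asList("Tags/Lucene", "Author/Shai"));
        Map<String, Integer> seg2 = perSegmentOrdinals(
            Arrays.asList("Tags/Lucene", "Tags/Solr", "Author/Sheng", "Author/Shai"));
        // Same label, different integer per segment.
        System.out.println("seg1: " + seg1.get("Tags/Lucene"));
        System.out.println("seg2: " + seg2.get("Tags/Lucene"));
    }
}
```

Because the integers differ between segments, per-segment counts have to be
reconciled by label; with a single global mapping the same integer can be
counted directly across all segments.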

The only lookup that is done at search time is when you want to label top
facets. Since the search index holds only the integer values of the facets,
the taxonomy index is used to label them (so now it's more of a
bidirectional Map).
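
A rough sketch of that flow in plain Java, assuming a toy in-memory taxonomy:
counting works on integers only, and the label lookup happens once, for the
top result. ToyFacetLabeler and its methods are hypothetical names chosen for
illustration (getPath merely mirrors the name of the taxonomy reader call);
this is not the facet module's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only; hypothetical class, NOT the Lucene API.
class ToyFacetLabeler {
    // ordinal -> label, i.e. the reverse direction of the taxonomy map.
    private final List<String> labels = new ArrayList<>();

    // label -> ordinal, assigning a new ordinal the first time a label is seen.
    int getOrAdd(String label) {
        int existing = labels.indexOf(label);
        if (existing >= 0) {
            return existing;
        }
        labels.add(label);
        return labels.size() - 1;
    }

    String getPath(int ordinal) {
        return labels.get(ordinal);
    }

    int size() {
        return labels.size();
    }

    // Counting touches only integers: one counter per ordinal.
    static int[] count(int[][] ordinalsPerMatchingDoc, int taxonomySize) {
        int[] counts = new int[taxonomySize];
        for (int[] docOrdinals : ordinalsPerMatchingDoc) {
            for (int ord : docOrdinals) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Ordinal with the highest count (the first one wins on ties).
    static int top(int[] counts) {
        int best = 0;
        for (int i = 1; i < counts.length; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ToyFacetLabeler taxo = new ToyFacetLabeler();
        int lucene = taxo.getOrAdd("Tags/Lucene");
        int solr = taxo.getOrAdd("Tags/Solr");
        int[][] hits = { {lucene}, {lucene, solr}, {lucene} };
        int[] counts = count(hits, taxo.size());
        int best = top(counts);
        // The only label lookup: turn the winning ordinal back into a string.
        System.out.println(taxo.getPath(best) + " (" + counts[best] + ")");
    }
}
```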

Just wanted to clarify the differences.

Shai


On Thu, Aug 14, 2014 at 2:56 AM, Sheng <shengcer@gmail.com> wrote:

> Shai,
>
> Thanks a lot for your answers! Sorry, I was distracted by other matters
> during the day and could not try your suggestions until now. What you
> suggested for 1 works like a charm :) As for 2, it is a pity, but I can
> understand. By the way, the way you described the facet index being stored
> like a map is quite similar to how we store payloads :) We use an integer
> as the payload for each token, and store the more complicated information
> in another Lucene index, with the integer payload as the key for each
> document.
>
> Sheng
>
> On Wednesday, August 13, 2014, Shai Erera <serera@gmail.com> wrote:
>
> > Sheng,
> >
> > I assume that you're using the Lucene faceting module, so I'll answer
> > on that basis:
> >
> > (1) A document can be associated with many facet labels, e.g. Tags/lucene
> > and Author/Shai. The way to extract all facet labels for a particular
> > document is this:
> >
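> >   // Assumed imports for Lucene 4.7:
> >   // import org.apache.lucene.facet.taxonomy.DocValuesOrdinalsReader;
> >   // import org.apache.lucene.facet.taxonomy.OrdinalsReader;
> >   // import org.apache.lucene.facet.taxonomy.OrdinalsReader.OrdinalsSegmentReader;
> >   // import org.apache.lucene.util.IntsRef;
> >   OrdinalsReader ordinals = new DocValuesOrdinalsReader();
> >   // we have only one segment, so read the first leaf
> >   OrdinalsSegmentReader ordsSegment =
> >       ordinals.getReader(indexReader.leaves().get(0));
> >   IntsRef scratch = new IntsRef();
> >   ordsSegment.get(0, scratch); // facet ordinals of segment-relative doc 0
> >   for (int i = 0; i < scratch.length; i++) {
> >     // resolve each ordinal to its label through the taxonomy reader
> >     System.out.println(taxoReader.getPath(scratch.ints[scratch.offset + i]));
> >   }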
> >
> > Note that OrdinalsSegmentReader works on an AtomicReader. That means that
> > the doc-id that you pass to it must be relative to the segment. If you
> > have a global doc-id, you can wrap the DirectoryReader with a
> > SlowCompositeReaderWrapper, which presents the DirectoryReader as an
> > AtomicReader.
> >
> > (2) I'm not quite sure I understand what you mean by "facet cache". Do
> > you mean the taxonomy index? If so, the answer is no. Think of the
> > taxonomy index as a large global Map<FacetLabel, Integer>, where each
> > facet label is mapped to an integer, irrespective of the segment it is
> > indexed in. That map is used to encode the facet information in the
> > *Search Index* more efficiently.
> >
> > Therefore the taxonomy index by itself doesn't hold all the information
> > that is needed for faceted search, and rebuilding it alone is not enough.
> >
> > Shai
> >
> >
> > On Wed, Aug 13, 2014 at 8:08 AM, Ralf Heyde <ralf.heyde@gmx.de> wrote:
> >
> > > For the 1st: at the Solr level, I guess you could select (only) the
> > > document by its uniqueid. Then you have the facets for that particular
> > > document. But this results in one additional query per doc.
> > >
> > > Sent from my BlackBerry 10 smartphone.
> > >   Original Message
> > > From: Sheng
> > > Sent: Tuesday, August 12, 2014 23:35
> > > To: java-user@lucene.apache.org
> > > Reply-To: java-user@lucene.apache.org
> > > Subject: Questions for facets search
> > >
> > > I actually have 2 questions:
> > >
> > > 1. Is it possible to get the facet label for a particular document? The
> > > reason we want this is we'd like to allow users to see tags for each
> > > hit in addition to the taxonomy for his/her search.
> > >
> > > 2. Is it possible to re-index the facet cache without reindexing the
> > > whole lucene cache, since they are separated? We have a dynamic list of
> > > faceted fields, being able to quickly rebuild the whole facet lucene
> > > cache would be quite desirable.
> > >
> > > Again, I am using Lucene 4.7. Thanks in advance for your answers!
> > >
> > > Sheng
> > >
