lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Keegan" <peterlkee...@gmail.com>
Subject Re: SpanQuery and database join
Date Mon, 13 Aug 2007 19:26:58 GMT
I suppose it could go under performance or HowTo/Interesting uses of
SpanQuery.

Peter

On 8/13/07, Erick Erickson <erickerickson@gmail.com> wrote:
>
> Thanks for writing this up. Do you think this is an appropriate subject
> for the Wiki performance page?
>
> Erick
>
> On 8/13/07, Peter Keegan <peterlkeegan@gmail.com> wrote:
> >
> > I've been experimenting with using SpanQuery to perform what is
> > essentially
> > a limited type of database 'join'. Each document in the index contains 1
> > or
> > more 'rows' of meta data from another 'table'. The meta data are simple
> > tokens representing a column name/value pair ( e.g. color$red or
> > location$123).  Each row is represented by a span with a maximum token
> > length equal to the maximum number of meta data columns. If a column has
> > multiple values, they are all indexed at the same position ( e.g.
> > color$red,
> > color$blue). All rows are added to a single field. The spans are
> > 'separated'
> > from each other by introducing a position gap between them via '
> > Analyzer.getPositionIncrementGap'. This gap should be greater than the
> > number of columns in each span.
> >
> > At query time, a SpanNearQuery is constructed to represent the meta data
> > to
> > join. The 'slop' value is set to the maximum number of meta data columns
> > (minus 1). Using a simple Antlr parser, boolean span queries with AND,
> OR,
> > NOT can be constructed fairly easily. The SpanQuery is And'd to the main
> > query to build the final query.
> >
> > This approach is flexible and pretty efficient because no stored fields
> or
> > external data are accessed at query time. Span queries are more
> expensive
> > compared than other queries, though. We measure performance via
> throughput
> > (as opposed to the response time for a single query), and the addition
> of
> > a
> > SpanQuery reduced throughput by 5X for ordered spans and 10X for
> unordered
> > spans. Still, this may be acceptable for some applications, especially
> if
> > spans are not used on every query.
> >
> > I thought this might interest some of you.
> >
> > Peter
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message