lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Peter Keegan" <peterlkee...@gmail.com>
Subject Re: SpanQuery and database join
Date Wed, 15 Aug 2007 00:02:09 GMT
I added this under Use Cases. Thanks for the suggestion.

Peter


On 8/13/07, Grant Ingersoll <gsingers@apache.org> wrote:
>
> There is also a Use Cases item on the Wiki...
>
> On Aug 13, 2007, at 3:26 PM, Peter Keegan wrote:
>
> > I suppose it could go under performance or HowTo/Interesting uses of
> > SpanQuery.
> >
> > Peter
> >
> > On 8/13/07, Erick Erickson <erickerickson@gmail.com> wrote:
> >>
> >> Thanks for writing this up. Do you think this is an appropriate
> >> subject
> >> for the Wiki performance page?
> >>
> >> Erick
> >>
> >> On 8/13/07, Peter Keegan <peterlkeegan@gmail.com> wrote:
> >>>
> >>> I've been experimenting with using SpanQuery to perform what is
> >>> essentially
> >>> a limited type of database 'join'. Each document in the index
> >>> contains 1
> >>> or
> >>> more 'rows' of meta data from another 'table'. The meta data are
> >>> simple
> >>> tokens representing a column name/value pair ( e.g. color$red or
> >>> location$123).  Each row is represented by a span with a maximum
> >>> token
> >>> length equal to the maximum number of meta data columns. If a
> >>> column has
> >>> multiple values, they are all indexed at the same position ( e.g.
> >>> color$red,
> >>> color$blue). All rows are added to a single field. The spans are
> >>> 'separated'
> >>> from each other by introducing a position gap between them via '
> >>> Analyzer.getPositionIncrementGap'. This gap should be greater
> >>> than the
> >>> number of columns in each span.
> >>>
> >>> At query time, a SpanNearQuery is constructed to represent the
> >>> meta data
> >>> to
> >>> join. The 'slop' value is set to the maximum number of meta data
> >>> columns
> >>> (minus 1). Using a simple Antlr parser, boolean span queries with
> >>> AND,
> >> OR,
> >>> NOT can be constructed fairly easily. The SpanQuery is And'd to
> >>> the main
> >>> query to build the final query.
> >>>
> >>> This approach is flexible and pretty efficient because no stored
> >>> fields
> >> or
> >>> external data are accessed at query time. Span queries are more
> >> expensive
> >>> compared than other queries, though. We measure performance via
> >> throughput
> >>> (as opposed to the response time for a single query), and the
> >>> addition
> >> of
> >>> a
> >>> SpanQuery reduced throughput by 5X for ordered spans and 10X for
> >> unordered
> >>> spans. Still, this may be acceptable for some applications,
> >>> especially
> >> if
> >>> spans are not used on every query.
> >>>
> >>> I thought this might interest some of you.
> >>>
> >>> Peter
> >>>
> >>
>
> --------------------------
> Grant Ingersoll
> http://lucene.grantingersoll.com
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message