lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Searching over multiple indexes with 1:m relationship
Date Thu, 28 Jun 2007 14:08:56 GMT
I do have an off-the-wall question: why have two indexes? There
are, of course, good reasons, but they're things like size and speed.

Where I'm going here is that Lucene does NOT require that all
documents have the same fields. So it's perfectly reasonable to index
heterogeneous data (or differing forms of the same data) in a single index.
This may not fit your requirements, but I thought I'd mention it.
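To make that concrete, here is a plain-Java model, not Lucene's API, where one "index" is a list of field maps. Two kinds of documents with different field sets live side by side, and a hypothetical `type` field (an assumption for illustration, not something Lucene requires) keeps them apart at query time, much like ANDing a required term on such a field into a Lucene query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java model of one physical index holding two kinds of documents.
// Not Lucene code: a "document" here is just a map of field name -> value.
final class MixedIndexSketch {
    // Hypothetical discriminator field naming the logical index a doc belongs to.
    static final String TYPE = "type";

    // Primary and detail docs carry different fields; nothing forces a shared schema.
    static final List<Map<String, String>> INDEX = List.of(
            Map.of(TYPE, "primary", "pk", "1", "title", "red shirt"),
            Map.of(TYPE, "primary", "pk", "2", "title", "blue shirt"),
            Map.of(TYPE, "detail", "parentPk", "1", "size", "XL"),
            Map.of(TYPE, "detail", "parentPk", "2", "size", "M"));

    // Models "type:<type> AND <field>:<value>"; a doc lacking the field just doesn't match.
    static List<Map<String, String>> search(String type, String field, String value) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : INDEX) {
            if (type.equals(doc.get(TYPE)) && value.equals(doc.get(field))) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Only the detail doc for parent 1 has size XL.
        System.out.println(search("detail", "size", "XL").get(0).get("parentPk")); // prints 1
    }
}
```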

That said, it doesn't really bear on your question, since you'd still have
two logical indexes in the same physical index. Although maybe it does:
if all the data were in one index, perhaps you could get by with a single
search instead of two.
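One common way to get down to a single search is to denormalize at indexing time: copy the relevant detail values into each primary document as a multi-valued field. A minimal plain-Java model of that idea (not Lucene's API; the `title` and `size` fields are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of denormalizing a 1:m relationship at indexing time.
// Not Lucene code; field names are hypothetical.
final class DenormalizeSketch {
    // One flattened doc per primary record: its own fields plus the values of
    // all its detail records, stored as a multi-valued "size" field.
    static List<Map<String, List<String>>> flatten(Map<String, String> titlesByPk,
                                                   Map<String, List<String>> sizesByPk) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : titlesByPk.entrySet()) {
            Map<String, List<String>> doc = new HashMap<>();
            doc.put("pk", List.of(e.getKey()));
            doc.put("title", List.of(e.getValue()));
            // Primary records with no details get an empty list, so they never match a size.
            doc.put("size", sizesByPk.getOrDefault(e.getKey(), List.of()));
            docs.add(doc);
        }
        return docs;
    }

    // A single search: every constraint is checked against the same flattened doc.
    static List<String> search(List<Map<String, List<String>>> docs, String title, String size) {
        List<String> pks = new ArrayList<>();
        for (Map<String, List<String>> doc : docs) {
            if (doc.get("title").contains(title) && doc.get("size").contains(size)) {
                pks.add(doc.get("pk").get(0));
            }
        }
        pks.sort(null); // natural order, so the result doesn't depend on map iteration order
        return pks;
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = flatten(
                Map.of("1", "shirt", "2", "shirt"),
                Map.of("1", List.of("XL", "M"), "2", List.of("S")));
        System.out.println(search(docs, "shirt", "M")); // prints [1]
    }
}
```

One caveat: if you flatten several detail fields this way, a query on two of them can match values that came from different detail records, because the per-record grouping is lost.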

I'm always leery of using an index to mimic what looks like database
functionality. That often means you should either use an actual
database for the database-like parts or get much more clever in your index
so you don't need what are essentially joins.
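For reference, the three-step plan in the quoted question (search the primary index, search the secondary index restricted to those pks, drop primaries with no secondary match) is exactly such a hand-rolled join. Modeled in plain Java, with lists and maps standing in for the two indexes and hypothetical `parentPk`/`size` fields, it looks roughly like this:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the "search primary, then filter via secondary" plan.
// Not Lucene code; each detail doc is a map with a hypothetical "parentPk" field.
final class TwoPhaseJoinSketch {
    static List<String> join(List<String> primaryHits,
                             List<Map<String, String>> detailDocs,
                             String field, String value) {
        // Step 2: only consider detail docs whose parent was a primary hit.
        Set<String> candidates = new HashSet<>(primaryHits);
        Set<String> matched = new HashSet<>();
        for (Map<String, String> d : detailDocs) {
            String parent = d.get("parentPk");
            if (candidates.contains(parent) && value.equals(d.get(field))) {
                matched.add(parent);
            }
        }
        // Step 3: discard primary hits with no matching detail, keeping the
        // primary result order (e.g. relevance ranking).
        List<String> result = new ArrayList<>();
        for (String pk : primaryHits) {
            if (matched.contains(pk)) {
                result.add(pk);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Step 1 is assumed done: these pks came back from the primary search, in rank order.
        List<String> primaryHits = List.of("3", "1", "2");
        List<Map<String, String>> details = List.of(
                Map.of("parentPk", "1", "size", "XL"),
                Map.of("parentPk", "2", "size", "M"),
                Map.of("parentPk", "3", "size", "XL"),
                Map.of("parentPk", "4", "size", "XL"));
        System.out.println(join(primaryHits, details, "size", "XL")); // prints [3, 1]
    }
}
```

The cost of step 2 grows with the number of primary hits, which is why the data set size matters so much here.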

All that said, a lot depends on the data set size. If your first query
returns, say, 100 documents (pks) that you need to use for your second query,
it probably doesn't matter how much manual processing you do. If the
first query returns 1,000,000 pks, then it does...
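One concrete reason the pk count matters: if the second query is built as a BooleanQuery with one term clause per pk, Lucene caps the number of clauses (1024 by default) and throws BooleanQuery.TooManyClauses beyond that. Below is a minimal plain-Java sketch of splitting the pk list into batches that each stay under that cap; the 1024 constant is an assumption mirroring the default, so check `BooleanQuery.getMaxClauseCount()` in your version.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch: split pks into batches so that a per-batch OR query
// never exceeds a clause limit.
final class PkBatchSketch {
    // Assumption: mirrors Lucene's default BooleanQuery clause limit.
    static final int MAX_CLAUSES = 1024;

    static List<List<String>> batches(List<String> pks, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int from = 0; from < pks.size(); from += maxPerBatch) {
            int to = Math.min(from + maxPerBatch, pks.size());
            // Copy so each batch is independent of the source list.
            out.add(new ArrayList<>(pks.subList(from, to)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> pks = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            pks.add("pk" + i);
        }
        for (List<String> batch : batches(pks, MAX_CLAUSES)) {
            System.out.println(batch.size()); // prints 1024, 1024, 452
        }
    }
}
```

With 100 pks that is a single query; with 1,000,000 pks it is 977 separate queries, which is roughly where restructuring the index starts to look worthwhile.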

So how much data are you talking about? Even the single-index idea
depends on whether we're talking about an index of a couple of GB or a
couple of TB...

Best
Erick




On 6/28/07, Michael Böckling <Michael.Boeckling@dmc.de> wrote:
>
> Hi folks!
>
> I know there is a MultiSearcher for searching over multiple indices, but my
> requirement is a bit special.
> I have two indices whose documents have a 1:m relationship. Most queries
> will only use the primary index, but some will have to look for detailed
> information in the secondary index (the index fields are of course
> different).
>
> What I plan to do:
> - first get the results from the primary index
> - then use the pk of the found documents and the additional search
> constraints to search in the secondary index
> - discard any primary results that did not match in the secondary index
>
> Is this ok, or am I completely nuts by doing that? Is there a better
> alternative?
>
> Thanks for any clues!
>
> Michael
>
>
> --
> Michael Böckling
> Java Engineer
> dmc digital media center GmbH
> Rommelstraße 11
> 70376 Stuttgart (Germany)
> Phone: +49 711 601747-0
> Fax: +49 711 601747-141
> E-Mail: Michael.Boeckling@dmc.de
> Internet: www.dmc.de
>
> Commercial register: Amtsgericht (Local Court) Stuttgart, HRB 18974
> Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend
