lucene-java-user mailing list archives

From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Searching over multiple indexes with 1:m relationship
Date Thu, 28 Jun 2007 18:44:40 GMT
Chris is spot-on. Your data set is so small that I wouldn't worry about
speed unless and until you have proof that it's a problem. The complexity
you'll introduce by having multiple indexes just won't be worth it.

In your case, following Chris's advice and de-normalizing the data would
be the first thing you should try.

Erick.
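Chris's message is not part of this archive page, so the following is only a sketch of what de-normalizing a 1:m relationship usually means: each primary record becomes one document that also carries the searchable values of all its detail records as a repeated (multi-valued) field. Plain Java maps stand in for Lucene documents, and the names `Primary`, `Detail`, `Denormalizer` and the `color` field are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical primary record (the "1" side); not a Lucene class.
final class Primary {
    final String pk;
    final String title;
    Primary(String pk, String title) { this.pk = pk; this.title = title; }
}

// Hypothetical detail record (the "m" side), pointing back at its primary pk.
final class Detail {
    final String primaryPk;
    final String color;
    Detail(String primaryPk, String color) { this.primaryPk = primaryPk; this.color = color; }
}

final class Denormalizer {
    /**
     * Builds one flat "document" per primary record. Each field maps to a list
     * of values, mimicking a field added several times to the same document.
     */
    static List<Map<String, List<String>>> flatten(List<Primary> primaries, List<Detail> details) {
        List<Map<String, List<String>>> docs = new ArrayList<>();
        for (Primary p : primaries) {
            Map<String, List<String>> doc = new LinkedHashMap<>();
            doc.put("pk", List.of(p.pk));
            doc.put("title", List.of(p.title));
            List<String> colors = new ArrayList<>();
            for (Detail d : details) {
                if (d.primaryPk.equals(p.pk)) {
                    colors.add(d.color); // copy each child's value onto its parent
                }
            }
            doc.put("color", colors);
            docs.add(doc);
        }
        return docs;
    }

    /** One search over the flattened data: pks of documents having the given color value. */
    static List<String> pksWithColor(List<Map<String, List<String>>> docs, String color) {
        List<String> pks = new ArrayList<>();
        for (Map<String, List<String>> doc : docs) {
            if (doc.get("color").contains(color)) {
                pks.add(doc.get("pk").get(0));
            }
        }
        return pks;
    }
}
```

In Lucene terms, adding a field with the same name several times to one `Document` makes it multi-valued, so a single query on one index can replace the second search. The trade-off is that values from different detail records are merged, so two constraints that must hold for the same detail record cannot be expressed this way without extra encoding.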

On 6/28/07, Michael Böckling <Michael.Boeckling@dmc.de> wrote:
>
> Hi Erickson,
>
> thanks for your reply.
>
> Of course you are right that it's a bit insane to mimic a database schema
> with indices, but that's how it is. The primary index is already in use;
> the extended requirements came later.
>
> The index isn't really that big: the primary one has 2-3 MB of data. I
> don't know yet how big the secondary one will be, but probably less than
> 20 MB. The idea was that most searches will only need the first index;
> the secondary index is queried only through an extended search form.
> Keeping the first index small should help performance there, since it
> handles the main load.
>
> The number of primary results will often be fewer than 200, typically
> around 20 I guess, so it's not that big of a deal to iterate through them.
>
> Regards,
>
> Michael
>
>
> > -----Original Message-----
> > From: Erick Erickson [mailto:erickerickson@gmail.com]
> > Sent: Thursday, June 28, 2007 16:09
> > To: java-user@lucene.apache.org
> > Subject: Re: Searching over multiple indexes with 1:m relationship
> >
> >
> > I do have an off-the-wall question. Why have two indexes? There
> > are, of course, good reasons, but they're things like size and speed.
> >
> > Where I'm going here is that Lucene does NOT require that all
> > documents have the same fields. So it's perfectly reasonable to index
> > heterogeneous data (or differing forms of the same data) in a
> > single index.
> > This may not fit your requirements, but I thought I'd mention it.
> >
> > That said, it really doesn't bear on your question, since you'd really
> > have two logical indexes in the same physical index. Although maybe it
> > does. If all the data were in one index, then perhaps you could do
> > exactly one search instead.
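A rough illustration of two logical indexes in one physical index: both kinds of document live in the same collection, a hypothetical `type` field tells them apart, and each kind carries only the fields it needs. Plain Java maps stand in for Lucene documents; `MixedIndex` and its methods are illustrative names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One "physical index" holding documents with different field sets (hypothetical class).
final class MixedIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();

    /** Adds a document; no field is mandatory, so field sets may differ per document. */
    void add(Map<String, String> fields) {
        docs.add(new HashMap<>(fields));
    }

    /**
     * Returns documents matching ALL constraints (field -> exact value).
     * A document lacking a constrained field simply does not match.
     */
    List<Map<String, String>> search(Map<String, String> constraints) {
        List<Map<String, String>> hits = new ArrayList<>();
        for (Map<String, String> doc : docs) {
            boolean matchesAll = true;
            for (Map.Entry<String, String> c : constraints.entrySet()) {
                if (!c.getValue().equals(doc.get(c.getKey()))) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

With Lucene the `type` restriction would typically be a required clause on an untokenized field inside a `BooleanQuery`, so both logical indexes are queried with one search.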
> >
> > I'm always leery of using an index to mimic what looks like database
> > functionality. That often means that you either should actually use a
> > database for the database-like parts or get much more clever in your
> > index so you don't need what are essentially joins.
> >
> > All that said, a lot depends on the data set size. If your first query
> > results in, say, 100 documents (pks) that you need to use for your second
> > query, it probably doesn't matter whether you do a lot of manual
> > processing. If the first query results in 1,000,000 pks, then it does...
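The size of the first result set matters partly because the pks have to be turned into the second query. In Lucene 2.x a `BooleanQuery` also rejects more than 1024 clauses by default (`BooleanQuery.getMaxClauseCount()`, raising `BooleanQuery.TooManyClauses`). A minimal sketch of splitting the pk list into bounded batches follows; the batch size and the `searchBatch` callback, which stands for one secondary-index query per batch, are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

final class PkBatcher {
    /** Splits pks into consecutive batches of at most batchSize elements. */
    static List<List<String>> batches(List<String> pks, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < pks.size(); i += batchSize) {
            out.add(new ArrayList<>(pks.subList(i, Math.min(i + batchSize, pks.size()))));
        }
        return out;
    }

    /**
     * Runs one (hypothetical) secondary search per batch and unions the pks
     * that matched, keeping first-seen order.
     */
    static Set<String> matchAll(List<String> pks, int batchSize,
                                Function<List<String>, List<String>> searchBatch) {
        Set<String> matched = new LinkedHashSet<>();
        for (List<String> batch : batches(pks, batchSize)) {
            matched.addAll(searchBatch.apply(batch));
        }
        return matched;
    }
}
```

With around 100 pks this is a single batch; with 1,000,000 pks it becomes roughly a thousand secondary queries, which is the point at which a different design starts to pay off.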
> >
> > So how much data are you talking about? Even the single-index idea
> > depends upon whether we're talking about an index size of a couple of GB
> > or a couple of TB...
> >
> > Best
> > Erick
> >
> >
> >
> >
> > On 6/28/07, Michael Böckling <Michael.Boeckling@dmc.de> wrote:
> > >
> > > Hi folks!
> > >
> > > I know there is a MultiSearcher for searching over multiple indices, but
> > > my requirement is a bit special.
> > > I have two indices whose documents have a 1:m relationship. Most queries
> > > will only use the primary index, but some will have to look for detailed
> > > information in the secondary index (the index fields are of course
> > > different).
> > >
> > > What I plan to do:
> > > - first get the results from the primary index
> > > - then use the pk of the found documents and the additional search
> > >   constraints to search in the secondary index
> > > - discard any primary results that did not match in the secondary index
> > >
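The three steps of the plan can be sketched as below. Both indexes are modelled as in-memory data rather than Lucene searchers, and the names `DetailDoc`, `TwoStepSearch.twoStep` and the `Predicate` used for the extra constraints are hypothetical. In a real setup, `primaryHits` would come from a search on the primary index and the loop over `secondary` would be a search on the secondary index restricted to those pks.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical secondary-index entry: several may share one primaryPk (1:m).
final class DetailDoc {
    final String primaryPk;
    final String attribute;
    DetailDoc(String primaryPk, String attribute) { this.primaryPk = primaryPk; this.attribute = attribute; }
}

final class TwoStepSearch {
    /**
     * @param primaryHits pks returned by the primary query, in rank order (step 1)
     * @param secondary   the secondary documents
     * @param extra       additional constraints from the extended search form
     * @return primary hits with at least one matching detail, original order kept
     */
    static List<String> twoStep(List<String> primaryHits, List<DetailDoc> secondary,
                                Predicate<DetailDoc> extra) {
        Set<String> wanted = new HashSet<>(primaryHits);
        // Step 2: secondary search restricted to the primary pks plus the extra constraints.
        Set<String> matched = new HashSet<>();
        for (DetailDoc d : secondary) {
            if (wanted.contains(d.primaryPk) && extra.test(d)) {
                matched.add(d.primaryPk);
            }
        }
        // Step 3: discard primary hits without a secondary match, preserving the ranking.
        List<String> result = new ArrayList<>();
        for (String pk : primaryHits) {
            if (matched.contains(pk)) {
                result.add(pk);
            }
        }
        return result;
    }
}
```

Iterating over `primaryHits` in step 3, instead of over the secondary matches, keeps the primary index's ranking, since the secondary search only answers yes or no per pk.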
> > > Is this OK, or am I completely nuts for doing that? Is there a better
> > > alternative?
> > >
> > > Thanks for any clues!
> > >
> > > Michael
> > >
> > >
> > > --
> > > Michael Böckling
> > > Java Engineer
> > > dmc digital media center GmbH
> > > Rommelstraße 11
> > > 70376 Stuttgart (Germany)
> > > Phone: +49 711 601747-0
> > > Fax: +49 711 601747-141
> > > E-Mail: Michael.Boeckling@dmc.de
> > > Internet: www.dmc.de
> > >
> > > Commercial register: AG Stuttgart HRB 18974
> > > Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend
> > >
> > > ---------------------------------------------
> > > Better e-business.
> > > dmc is the creative interplay of agency, systems house and service.
> > > For more than 10 years we have been developing and delivering
> > > forward-looking, successful e-business solutions. Our long-standing
> > > clients include neckermann.de, Kodak and Telekom Training.
> > >
> > > dmc ranks 8th in the current New Media Service Ranking.
> > > As an owner-managed, network-independent agency with revenue of
> > > EUR 13.50 million, we are among the top 10 most successful new media
> > > service providers in Germany.
> > >
> > >
> > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
