lucene-java-user mailing list archives

From Michael Böckling <Michael.Boeckl...@dmc.de>
Subject AW: Searching over multiple indexes with 1:m relationship
Date Thu, 28 Jun 2007 15:06:07 GMT
Hi Erickson,

thanks for your reply.

Of course you are right that it's a bit insane to mimic a database schema
with indices, but that's how it is. The primary index is already in use; the
extended requirements came later.

The index isn't really that big: the primary one has 2-3 MB of data. I don't
know yet how big the secondary one will be, but probably less than 20 MB.
The idea was that most searches will only need the first index; the
secondary index is queried only from an extended search form.
Keeping the first index small should help performance where the main
load is handled.

The number of primary results will often be less than 200, typically around
20 I guess, so it's not that big of a deal to iterate through them.
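
To make the filtering step concrete, here is a minimal sketch in plain Java. It deliberately does not use the Lucene API: SecondarySearch, inMemory and filterBySecondary are hypothetical names, and the in-memory map only stands in for a real query against the secondary index. It assumes the pks of the primary hits have already been collected as strings.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TwoStepSearch {

    /** Hypothetical stand-in for one query against the secondary index. */
    public interface SecondarySearch {
        // Returns the subset of candidatePks that has at least one secondary
        // document satisfying the extra constraint.
        Set<String> matchingPks(Collection<String> candidatePks, String constraint);
    }

    /** Toy in-memory "secondary index": pk -> detail texts (the m side of 1:m). */
    public static SecondarySearch inMemory(final Map<String, List<String>> detailsByPk) {
        return new SecondarySearch() {
            public Set<String> matchingPks(Collection<String> candidatePks, String constraint) {
                Set<String> matched = new HashSet<String>();
                for (String pk : candidatePks) {
                    List<String> details = detailsByPk.get(pk);
                    if (details == null) {
                        continue; // no secondary documents for this pk
                    }
                    for (String detail : details) {
                        if (detail.contains(constraint)) {
                            matched.add(pk);
                            break; // one matching detail is enough
                        }
                    }
                }
                return matched;
            }
        };
    }

    /** Keeps the primary hits, in their original order, that also match in the secondary index. */
    public static List<String> filterBySecondary(List<String> primaryPks,
                                                 SecondarySearch secondary,
                                                 String constraint) {
        Set<String> matched = secondary.matchingPks(primaryPks, constraint);
        List<String> kept = new ArrayList<String>();
        for (String pk : primaryPks) {
            if (matched.contains(pk)) {
                kept.add(pk);
            }
        }
        return kept;
    }
}
```

With 20 to 200 primary hits the loop itself is negligible; the cost that matters is how the secondary lookup is issued, which is why the sketch passes all candidate pks in one call rather than one call per pk.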

Regards,

Michael


> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, 28 June 2007 16:09
> To: java-user@lucene.apache.org
> Subject: Re: Searching over multiple indexes with 1:m relationship
> 
> 
> I do have an off-the-wall question... Why have two indexes? There
> are, of course, good reasons, but they're things like size and speed.
> 
> Where I'm going here is that Lucene does NOT require that all
> documents have the same fields. So it's perfectly reasonable to index
> heterogeneous data (or differing forms of the same data) in a 
> single index.
> This may not fit your requirements, but I thought I'd mention it.
> 
> That said, it really doesn't bear on your question since you'd really have
> two logical indexes in the same physical index. Although maybe it does.
> If all the data were in one index, then perhaps you could do exactly one
> search instead.
> 
> I'm always leery of using an index to mimic what looks like database
> functionality. That often means that you either should actually use a
> database for the database-like parts or get much more clever in your index
> so you don't need what are essentially joins.
> 
> All that said, a lot depends on the data set size. If your first query
> results in, say, 100 documents (pks) that you need to use for your second
> query, it probably doesn't matter whether you do a lot of manual
> processing. If the first query results in 1,000,000 pks, then it does....
> 
> So how much data are you talking about? Even the single-index idea
> depends upon whether we're talking a couple of G index size or a
> couple of T...
> 
> Best
> Erick
> 
> 
> 
> 
> On 6/28/07, Michael Böckling <Michael.Boeckling@dmc.de> wrote:
> >
> > Hi folks!
> >
> > I know there is a MultiSearcher for searching over multiple indices, but
> > my requirement is a bit special.
> > I have two indices whose documents have a 1:m relationship. Most queries
> > will only use the primary index, but some will have to look for detailed
> > information in the secondary index (the index fields are of course
> > different).
> >
> > What I plan to do:
> > - first get the results from the primary index
> > - then use the pk of the found documents and the additional search
> > constraints to search in the secondary index
> > - discard any primary results that did not match in the secondary index
> >
> > Is this ok, or am I completely nuts by doing that? Is there a better
> > alternative?
> >
> > Thanks for any clues!
> >
> > Michael
> >
> >
> > --
> > Michael Böckling
> > Java Engineer
> > dmc digital media center GmbH
> > Rommelstraße 11
> > 70376 Stuttgart (Germany)
> > Phone: +49 711 601747-0
> > Fax: +49 711 601747-141
> > E-Mail: Michael.Boeckling@dmc.de
> > Internet: www.dmc.de
> >
> > Commercial register: AG Stuttgart HRB 18974
> > Managing directors: Andreas Magg, Daniel Rebhorn, Andreas Schwend
> >
> > ---------------------------------------------
> > Better e-business.
> > dmc is the creative combination of agency, systems house, and service.
> > For more than 10 years we have been developing and implementing
> > forward-looking, successful e-business solutions. Our long-standing
> > customers include neckermann.de, Kodak, and Telekom Training.
> >
> > dmc ranks 8th in the current New Media Service Ranking.
> > As an owner-managed, network-independent agency with revenue of
> > 13.50 million euros, we are among the top 10 most successful new media
> > service providers in Germany.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

