lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: What *is* a lucene document?
Date Mon, 06 Jun 2005 05:04:53 GMT

: There are many other ways to approach this and my recommendation is
: just the simplest one based on the description of your needs.

I'd like to add one thing to Erik's excellent advice, something many
people (especially people use to dealing with rgorously structured data)
tend to overlook:

   Documents in a Lucene index can be heterogenous.

You can have some documents in your index with fields A, B, and C; and you
can have other documents in your index with field X, Y and Z.  And in some
cases you can have a search result of docs from that first set of
documents, in other cases you can have a search result with docs from the
second set, or you can return a mixed bag of both -- it all depends on how
you structure your query, and which fields you search on.

Consider a simplified example of your orriginal question:  What if you had
products with specs, and reviews of produduts.  You could have one
document per review, indexing all the text of the review, and the product
Ids of the products mentioned in it.  You can also have one document per
product, indexing all the spec data.  If a user does a search for "dell"
you can return results containing a mix of products and reviews that
contain the word dell.  By making the product Id field stored and indexed
in all documents, you can even provide links next to reviews to see all
the products mentioned in the review, or next to a product to see all
reviews that mention that product.

There's no need to limit yourself and say "based on my data, I will make
one document per X in my data model." you can make one doc per X, and one
doc per Y, and one doc per Z -- all depending on the desired behavior at
search time.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message