lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: Lucene Seaches VS. Relational database Queries
Date Thu, 13 Apr 2006 20:34:15 GMT

: Also we need to address the Join Between A and B and C, which I don't know
: see how with out taking out values out of the hit list.

When discussing index structure strategies, speaking in generalities like
A, B, and C is hard, because there is no 100% generic solution for how
to "join" X and Y in Lucene. Lucene isn't a relational database, it's
not designed to be a relational database, and you shouldn't try to map
relational data concepts directly to Lucene.

Instead, the questions you should ask yourself are:
  1) what kinds of objects are my users going to want to search for?
  2) how are they going to want to search for each of those types of objects?

The answer to question #1 determines what your Documents should be.  If
you have more than one type of object, you will have more than one type of
document. Whether you put those different types of documents in
one index or in separate indexes is up to you; there are pros and cons
either way.  The answer to question #2 determines what you should put in
the indexed fields for each type of Document.

Let's assume a simple case where you have the following answers...
  1) Movies, and People
  2) Movies: movie's name, movie's description, cast/crew names/titles
     People: person's name, names of movies person has worked on

...then you have two types of documents.

For documents relating to movies, you have a "name" field, a
"description" field, a "crew" field where you put everyone that worked
on the film, and a "cast" field where you put everyone that appeared on
screen. Then for "key" positions you make specific named fields:
"director", "producer", "art_director", "best_boy".  Every movie also gets
another field called "doctype" with the value "movie".

For documents relating to people, you use "doctype":"person". You give
them a "first_name" and a "last_name", and a "movies" field where you list
every movie they have worked on.
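
To make the two document shapes concrete, here is a rough sketch in plain
Python that flattens join-table style rows into one field map per movie and
per person. It does not use the Lucene API at all; the sample records, the
ids, the KEY_ROLES set, and the helper names are all hypothetical, and
treating "crew" as everyone who is not cast is just one possible reading.

```python
# Hypothetical normalized source data (ids and values are illustrative only).
movies = {1: {"name": "The Shining",
              "description": "A family winters in an isolated hotel."}}
people = {10: {"first_name": "Stanley", "last_name": "Kubrick"},
          11: {"first_name": "Jack", "last_name": "Nicholson"}}
# (movie_id, person_id, role) rows, as a relational join table might hold them.
credits = [(1, 10, "director"), (1, 10, "producer"), (1, 11, "cast")]

# "Key" positions that get their own named field on a movie document.
KEY_ROLES = {"director", "producer", "art_director", "best_boy"}


def full_name(person):
    return f"{person['first_name']} {person['last_name']}"


def movie_document(movie_id):
    """Build the flat field map for one movie, copying people's names in."""
    movie = movies[movie_id]
    doc = {"doctype": "movie", "name": movie["name"],
           "description": movie["description"], "cast": [], "crew": []}
    for mid, pid, role in credits:
        if mid != movie_id:
            continue
        name = full_name(people[pid])
        if role == "cast":
            doc["cast"].append(name)
            continue
        if name not in doc["crew"]:
            doc["crew"].append(name)
        if role in KEY_ROLES:
            doc.setdefault(role, []).append(name)
    return doc


def person_document(person_id):
    """Build the flat field map for one person, copying movie names in."""
    person = people[person_id]
    titles = []
    for mid, pid, _role in credits:
        if pid == person_id and movies[mid]["name"] not in titles:
            titles.append(movies[mid]["name"])
    return {"doctype": "person", "first_name": person["first_name"],
            "last_name": person["last_name"], "movies": titles}
```

Note that the same name ends up stored in several places (the person
document, plus the "crew" and "director" fields of the movie document).
That duplication is the point: each document carries everything needed to
match it, so no join is needed at search time.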

When someone wants to search for movies with names like "Shining", you
search for "+doctype:movie +name:shining".  When they want to search for
people named Bob, you search for "+doctype:person +first_name:bob".
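
As a toy illustration of what those two queries select, the sketch below
models "+field:term" as "this term must appear in this field" over plain
Python dicts. It is not Lucene's query parser or scoring; the sample
documents, the helper names, and the lowercase/whitespace tokenizing (a
stand-in for a real analyzer) are all assumptions.

```python
def tokens(value):
    """Lowercase and split on whitespace; a crude stand-in for an analyzer."""
    values = value if isinstance(value, list) else [value]
    return {tok for v in values for tok in v.lower().split()}


def parse_required(query):
    """Split '+field:term +field:term' into (field, term) pairs."""
    clauses = []
    for part in query.split():
        if not part.startswith("+"):
            raise ValueError("this sketch only handles required (+) clauses")
        field, term = part[1:].split(":", 1)
        clauses.append((field, term.lower()))
    return clauses


def search(docs, query):
    """Return the documents that satisfy every required clause."""
    clauses = parse_required(query)
    return [d for d in docs
            if all(term in tokens(d.get(field, "")) for field, term in clauses)]


# Hypothetical documents of both types living in one index.
docs = [
    {"doctype": "movie", "name": "The Shining"},
    {"doctype": "person", "first_name": "Bob", "last_name": "Shining"},
    {"doctype": "person", "first_name": "Alice", "last_name": "Jones"},
]

movie_hits = search(docs, "+doctype:movie +name:shining")
person_hits = search(docs, "+doctype:person +first_name:bob")
```

The "doctype" clause is what keeps the two kinds of documents apart when
they share an index: the person whose last name is "Shining" never shows up
in the movie search.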

	Have you denormalized your data?  Yes.
	Is this bad?  No.

