lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Design Consideration for lucene index
Date Sat, 07 Oct 2006 00:08:42 GMT

The mantra I tell people when they are trying to decide how to index their
"relational" data is to start by asking yourself what you want the results
to be.

Is the primary list of "things" you want to return to your clients a list
of "tags" or a list of "images" ... It's not clear to me what the answer
is based on your question, but whatever it the "things" you care most
about are, make document for each, and denormalize the rest of the data
into those documents, indexing the stuff you want to search on, and
storing the stuff you want to be able to return.

Sometimes you have differnet use cases with differnet primary "things"
(ie: sometimes you want to return a list of movies, and sometimes you want
to return a list of actors) ... so you make differnet types of documents
and flatten the data in both -- you wind up storing the info that Bogart
was in the Maltese Falcon twice, once in the movie document and once in
the actor document, but that's what denormalizing your data for fast
searching is all about.

: Date: Fri, 6 Oct 2006 11:40:37 -0700
: From:
: Reply-To:
: To:
: Subject: Design Consideration for lucene index
: I am a newbie to the lucene search area. I would like to best way to do
: the following using lucene in terms of efficiency and the size of the
: index.
: Question : #1
: I have a table that contains some tags. These tags are tagged against
: multiple images that are in a different table (potentially 20 to 30,000
: images). If I am searching for a tag phrase and get the corresponding
: images, the approach that I was thinking is to join these two tables and
: index the result set.
: For example:
: Tag(abc)- ImageId1, Tag(abc)-ImageId2, Tag(abc)-ImageId3 etc. Hence this
: is a fairly fat joint. Assuming that we are doing like this how is the
: performance on lucene? If it is a bad design, what should be a better
: way of doing this? Looking forward to your valuable suggestions.
: Question : #2
: I need to search the multiple fields from a table. The search phrase
: needs to look for the fields DESCRIPTION1 and DESCRIPTION2 in the table.
: I have done something like this:
: while ( {
:  Document doc = new Document();
:  doc.add(new Field("ID", String.valueOf(rs.getInt("ID")),
: Field.Store.YES, Field.Index.UN_TOKENIZED));
:  doc.add(new Field("Description1", rs.getString("Description1"),
: Field.Store.YES, Field.Index.TOKENIZED));
:  doc.add(new Field("Description2", rs.getString("Description2"),
: Field.Store.YES, Field.Index.TOKENIZED));
:  String content = rs.getString("Description1") + " " +
: rs.getString("Description2")
:  doc.add(new Field("cContent", content, Field.Store.YES,
: Field.Index.TOKENIZED));
:  list[0].add(doc);
:  }
: Do I need to do the cContent part for searching? Is this increasing the
: size of the index? Is it better to create a dynamic query that looks for
: the description1 description2 field or use the cContent?
: Please help me in figuring out these things.
: Thanks
: Mathews
: ---------------------------------------------------------------------
: To unsubscribe, e-mail:
: For additional commands, e-mail:


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message