lucene-java-user mailing list archives

From "Ryan Aslett" <Ryan.Asl...@Qsent.com>
Subject Fields
Date Wed, 12 May 2004 01:39:11 GMT
How much of a performance benefit/impact does "fielding" your data have
in Lucene?

Let's say I have 100 million documents.  I have Name, Phone, and Address
for each document.

I could either index the terms in separate fields, like:
Field.Text("Name","Bob Jones");
Field.Keyword("Phone","5551212");
Field.Text("Address","123 Main");

Or, I could make everything in the same field, prepending a field
designator to the term itself as keywords, like:
Field.Keyword("Universal","nmBob");
Field.Keyword("Universal","nmJones");
Field.Keyword("Universal","ph5551212");
Field.Keyword("Universal","ad123");
Field.Keyword("Universal","adMain");

Then, when I build my queries, I would always search the same field and
prepend the "fieldcode" to the search term.
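
To make the second scheme concrete, here is a minimal sketch in plain Java (no Lucene calls) of the term encoding it implies. The class name, the method name, and the whitespace tokenization are assumptions for illustration only; the field codes "nm", "ph", and "ad" are the ones used in the examples above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: builds the prefixed terms that
// would all be added to the single "Universal" field.
class PrefixedTerms {
    // Field codes taken from the examples in this message.
    static final String NAME = "nm";
    static final String PHONE = "ph";
    static final String ADDRESS = "ad";

    // Splits the text on whitespace (an assumption; a real analyzer may
    // tokenize differently) and prepends the field code to every token.
    static List<String> encode(String fieldCode, String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                terms.add(fieldCode + token);
            }
        }
        return terms;
    }
}
```

The same encode step would have to run on the search side as well, so that a query for the name "Jones" looks up the exact term "nmJones" in the universal field rather than the bare word.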

Let's also assume that these universal fields are only indexed and not
stored, and I store something completely different as the actual stored
data.

Assumptions:
*Indexing/preprocessing speed isn't important, unless it's orders of
magnitude slower.
*10 indexes of 10 million Documents each.

Does anybody have any ideas as to the impact on query performance with
this method? Pros/Cons?

A commercial product that we are using is much slower when "fielding"
data, and has the concept of "unfielded literals". This second method is
how we currently field data in that product, and it seems to give us a
tremendous performance boost. I'm curious whether Lucene works in a
similar fashion...

Ryan Aslett
