From: "Ryan Aslett"
To: "Lucene Users List"
Subject: Fields
Date: Tue, 11 May 2004 18:39:11 -0700
Message-ID: <5238CD8601F3EF4BA3C5553FECBB7D2A027F7BAA@exchange.qsent.com>

How much of a performance benefit or penalty does "fielding" your data have in Lucene?

Let's say I have 100 million documents, each with a Name, Phone, and Address. I could either index the terms in separate fields, like

    Field.Text("Name","Bob Jones");
    Field.Keyword("Phone","5551212");
    Field.Text("Address","123 Main");

Or I could put everything in the same field, prepending a field designator to each term and indexing the terms as keywords, like:

    Field.Keyword("Universal","nmBob");
    Field.Keyword("Universal","nmJones");
    Field.Keyword("Universal","ph5551212");
    Field.Keyword("Universal","ad123");
    Field.Keyword("Universal","adMain");

Then, when I build my queries, I would always search the same field and prepend the "field code" to the search term.

Let's also assume that these universal fields are only indexed and not stored, and that I store something completely different as the actual stored data.

Assumptions:
* Indexing/preprocessing speed isn't important, unless it's orders of magnitude slower.
* 10 indexes of 10 million documents each.
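
To make the second scheme concrete, here is a minimal sketch in plain Java of how the prefixed index terms and query terms could be built. It makes no Lucene calls; the class `UniversalTerms`, its methods, and the whitespace tokenization are assumptions for illustration, and only the field codes `nm`, `ph`, and `ad` come from the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): builds prefixed terms for one "Universal" field. */
final class UniversalTerms {
    // Two-letter field codes, as used in the example above.
    static final String NAME = "nm";
    static final String PHONE = "ph";
    static final String ADDRESS = "ad";

    /** Splits a value on whitespace and prepends the field code to each token. */
    static List<String> encode(String fieldCode, String value) {
        List<String> terms = new ArrayList<>();
        for (String token : value.trim().split("\\s+")) {
            if (!token.isEmpty()) {          // a blank value yields no terms
                terms.add(fieldCode + token);
            }
        }
        return terms;
    }

    /** Collects every universal-field term for one Name/Phone/Address record. */
    static List<String> encodeRecord(String name, String phone, String address) {
        List<String> terms = new ArrayList<>();
        terms.addAll(encode(NAME, name));
        terms.addAll(encode(PHONE, phone));
        terms.addAll(encode(ADDRESS, address));
        return terms;
    }

    /** Builds the term to search for: the same field code prepended to the user's word. */
    static String queryTerm(String fieldCode, String word) {
        return fieldCode + word;
    }

    public static void main(String[] args) {
        // Prints [nmBob, nmJones, ph5551212, ad123, adMain]
        System.out.println(encodeRecord("Bob Jones", "5551212", "123 Main"));
        // Prints nmJones
        System.out.println(queryTerm(NAME, "Jones"));
    }
}
```

Each returned string would then be added to the "Universal" field as its own keyword, and a search on a name would look up `queryTerm(NAME, "Jones")`, that is `nmJones`, in that single field.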

Does anybody have any ideas about the impact of this method on query performance? Pros/cons?

A commercial product that we are using is much slower when "fielding" data, and it has the concept of "unfielded literals". This second method is how we currently field data, and it seems to give us a tremendous performance boost. I'm curious whether Lucene works in a similar fashion...

Ryan Aslett