Subject: Displaying search result data - stored fields vs external source
From: Joel Halbert
To: Lucene Users
Organization: SU3 Analytics
Date: Tue, 15 Sep 2009 09:19:50 +0100

Hi,

When using Lucene I always consider two approaches to displaying search result data to users:

1. Store any fields that we index and display to users in the Lucene Documents themselves.
   When we perform a search, simply retrieve the data to be displayed from the Lucene documents themselves.

or

2. Index fields in Lucene but reference the data to be displayed from another source, such as a database. When searching, I would find the matching documents, then use a (stored) reference key on each document to look up the display fields from that other source.

With regard to the number and size of stored fields: I am looking at indexing and displaying approximately 4 relatively small fields for each document (e.g. name, age, short description, URL; approximately 500 bytes in total). Any query will display about 10 hits to the user. There are approximately 10 million documents to index and search.

I am interested in the differences between the two approaches with regard to:

1) Indexing time performance (how long it might take to index with and without stored fields)
2) Search time performance (total time taken to search for matching documents and then display fields to users)

I am less interested in differences arising from maintainability or increased storage requirements.

I would be interested to see what others think of using each approach.

Cheers,
Joel
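
To make the two approaches concrete, here is a minimal sketch of the two display paths in plain Java. It is only an illustration under stated assumptions: the maps are hypothetical in-memory stand-ins for the index's stored fields and for the external database, and none of the names (DisplayLookupSketch, displayFromStoredFields, displayFromExternalSource, row) are Lucene or database API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins only: plain maps play the role of the index's
// stored fields and of the external database. This is not Lucene API.
class DisplayLookupSketch {

    // Builds one set of display fields (only two of the four, for brevity).
    static Map<String, String> row(String name, String url) {
        Map<String, String> fields = new HashMap<>();
        fields.put("name", name);
        fields.put("url", url);
        return fields;
    }

    // Approach 1: the display fields are stored with the indexed document,
    // so each displayed hit costs one stored-document read.
    static List<Map<String, String>> displayFromStoredFields(
            List<Integer> hitDocIds,
            Map<Integer, Map<String, String>> storedFields) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Integer docId : hitDocIds) {
            rows.add(storedFields.get(docId));
        }
        return rows;
    }

    // Approach 2: only a reference key is stored with the document; the
    // display fields are then fetched from the external source by that key.
    static List<Map<String, String>> displayFromExternalSource(
            List<Integer> hitDocIds,
            Map<Integer, String> storedKeys,
            Map<String, Map<String, String>> database) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (Integer docId : hitDocIds) {
            String key = storedKeys.get(docId);   // read the stored key from the index
            rows.add(database.get(key));          // second lookup, outside the index
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, String>> storedFields = new HashMap<>();
        storedFields.put(0, row("Ada", "http://example.org/ada"));
        storedFields.put(1, row("Bob", "http://example.org/bob"));

        Map<Integer, String> storedKeys = new HashMap<>();
        storedKeys.put(0, "person-0");
        storedKeys.put(1, "person-1");

        Map<String, Map<String, String>> database = new HashMap<>();
        database.put("person-0", row("Ada", "http://example.org/ada"));
        database.put("person-1", row("Bob", "http://example.org/bob"));

        List<Integer> hits = Arrays.asList(1, 0); // doc ids in ranked order

        System.out.println(displayFromStoredFields(hits, storedFields));
        System.out.println(displayFromExternalSource(hits, storedKeys, database));
    }
}
```

In this sketch the second approach still reads one stored value (the key) per hit and then adds a lookup in the external source. At the sizes given above, a page of 10 hits at roughly 500 bytes each is about 5 KB of display data per query, and 10 million documents at roughly 500 bytes each is about 5 GB of display data overall.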