From: Erick Erickson
To: java-user@lucene.apache.org
Date: Tue, 15 Sep 2009 09:55:38 -0400
Subject: Re: Displaying search result data - stored fields vs external source

Categorically, I store everything in the index unless/until I *know* it
doesn't work. With some things, it's easy to know from the outset, like if
I have 20T of data to store.

First, storing fields has minimal impact on search speed: the stored text
isn't interleaved with the search tokens, so the two are pretty much
disjoint.

Second, any scheme that stores data separately is inherently more complex
and harder to maintain. From the eXtreme Programming folks: "Do the
simplest thing that could possibly work".

Third, there isn't much work in trying it and seeing. You have to write the
retrieval code either way, and if you encapsulate fetching the data you can
switch it out later pretty easily if it comes to that.

So you don't lose much at all by "just trying it"...

HTH
Erick

On Tue, Sep 15, 2009 at 4:19 AM, Joel Halbert wrote:

> Hi,
>
> When using Lucene I always consider two approaches to displaying search
> result data to users:
>
> 1. Store any fields that we index and display to users in the Lucene
> Documents themselves. When we perform a search, simply retrieve the data
> to be displayed from the Lucene documents themselves.
>
> or
>
> 2. Index fields in Lucene but reference data to be displayed from
> another source, such as a database. So, when searching, I would search
> for documents, then use a (stored) reference key on the documents to
> look up the display fields from another source, e.g. a database.
>
> With regard to the number and size of stored fields, I am looking at
> indexing and displaying approximately 4 relatively small fields for each
> document (e.g. name, age, short description, URL; approx. 500 bytes in
> total). In any query about 10 hits will be displayed to the user.
> Approximately 10 million documents to index and search.
>
> I am interested in the differences between both approaches with regard to:
>
> 1) Indexing time performance (how long it might take to index with and
> without stored fields)
> 2) Search time performance (total time taken to search for matching
> documents and then display fields to users)
>
> I am less interested in differences arising from
> maintainability/increased storage requirements.
>
> I would be interested to see what others think of using each approach.
>
> Cheers,
> Joel
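
The reply's third point, encapsulating the fetch so it can be switched out later, could look roughly like the sketch below. This is only an illustration under stated assumptions: every name in it (DisplaySourceDemo, DisplayFieldSource, StoredFieldSource, ExternalSource, render) is hypothetical, and plain in-memory maps stand in for both Lucene stored fields and the database, so no real Lucene or JDBC calls are shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative only: all names are hypothetical, and plain maps stand in
// for Lucene stored fields and for the external database.
class DisplaySourceDemo {

    /** Returns the fields to display for one search hit. */
    interface DisplayFieldSource {
        Map<String, String> fetch(int docId);
    }

    /** Approach 1: the display fields are stored with the document itself. */
    static final class StoredFieldSource implements DisplayFieldSource {
        private final Map<Integer, Map<String, String>> storedFields;

        StoredFieldSource(Map<Integer, Map<String, String>> storedFields) {
            this.storedFields = storedFields;
        }

        @Override
        public Map<String, String> fetch(int docId) {
            Map<String, String> fields = storedFields.get(docId);
            return fields != null ? fields : Collections.<String, String>emptyMap();
        }
    }

    /** Approach 2: only a reference key is stored; the fields live elsewhere. */
    static final class ExternalSource implements DisplayFieldSource {
        private final Map<Integer, String> storedKeys;
        private final Map<String, Map<String, String>> database;

        ExternalSource(Map<Integer, String> storedKeys,
                       Map<String, Map<String, String>> database) {
            this.storedKeys = storedKeys;
            this.database = database;
        }

        @Override
        public Map<String, String> fetch(int docId) {
            // Two steps: read the stored key, then look the row up externally.
            String key = storedKeys.get(docId);
            Map<String, String> row = (key == null) ? null : database.get(key);
            return row != null ? row : Collections.<String, String>emptyMap();
        }
    }

    /** Display code depends only on the interface, so the source can be swapped. */
    static String render(DisplayFieldSource source, int docId) {
        Map<String, String> fields = source.fetch(docId);
        return fields.get("name") + " <" + fields.get("url") + ">";
    }

    public static void main(String[] args) {
        Map<String, String> fields = new HashMap<>();
        fields.put("name", "Example");
        fields.put("url", "http://example.com/1");

        Map<Integer, Map<String, String>> stored = new HashMap<>();
        stored.put(42, fields);

        Map<Integer, String> keys = new HashMap<>();
        keys.put(42, "row-1");
        Map<String, Map<String, String>> db = new HashMap<>();
        db.put("row-1", fields);

        // Both sources yield the same rendered line for hit 42.
        System.out.println(render(new StoredFieldSource(stored), 42));
        System.out.println(render(new ExternalSource(keys, db), 42));
    }
}
```

Because render only sees DisplayFieldSource, starting with the stored-field variant and moving to an external lookup later would touch one constructor call rather than the display code.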