From: Erik Hatcher
Subject: Re: retrieve tokens
Date: Wed, 22 Dec 2004 13:49:20 -0500
To: "Lucene Users List"

On Dec 22, 2004, at 12:43 PM, M. Smit wrote:

> Erik Hatcher wrote:
>
> But for the other issue, "store in Lucene" vs. "store in the database":
> can anyone share some field experience on size?
> The system I'm developing will provide searching through some 2000
> PDFs of roughly 200 pages each. I feed the plain text into Lucene as a
> Field.UnStored field. I also store this plain text in the database for
> the sole purpose of presenting a context snippet.
>
> If I were to use the Highlighter with a Field.Text, I would not need the
> database copy of the plain text at all. But I'm still a little worried
> about speed/space issues. Or am I just seeing bears on the road (a Dutch
> saying; in plain English: making a fuss about nothing)?

Consider that you're only highlighting 20 or so entries at a time.
Getting the text from a Lucene index you're already navigating will be
quite quick. But pulling 20 records from a database shouldn't be too bad
either.

There is one other consideration: the new (CVS-only) feature of
capturing term vectors with position information. Mark Harwood, the
author of the Highlighter, posted an update to it not long ago that can
use this position information for highlighting rather than re-analyzing
the original text. The re-analysis of the text, not the database access,
may well be the bottleneck.

	Erik
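To make the "re-analysis vs. stored positions" point concrete, here is a minimal, self-contained sketch of highlighting from precomputed offsets. It is not Lucene's Highlighter or term vector API: the `Span` map is a hypothetical stand-in for per-document term vector data, and it assumes character offsets are available alongside positions, since marking up text needs offsets. The regex tokenizer in `buildOffsets` is an assumed stand-in for the Analyzer and runs once, at "index time"; `highlight` then marks up the stored text without tokenizing it again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OffsetHighlighter {

    /** One occurrence of a term: [start, end) character offsets in the stored text. */
    public record Span(int start, int end) {}

    /**
     * Hypothetical index-time step: tokenize once (lowercased alphanumeric runs,
     * an assumed stand-in for a real Analyzer) and record each term's offsets.
     */
    public static Map<String, List<Span>> buildOffsets(String text) {
        Map<String, List<Span>> index = new HashMap<>();
        Matcher m = Pattern.compile("[A-Za-z0-9]+").matcher(text);
        while (m.find()) {
            index.computeIfAbsent(m.group().toLowerCase(), k -> new ArrayList<>())
                 .add(new Span(m.start(), m.end()));
        }
        return index;
    }

    /**
     * Query-time step: wrap every occurrence of the (already normalized) query
     * terms in <b>...</b> using only the stored offsets; no re-tokenization.
     */
    public static String highlight(String text, Map<String, List<Span>> offsets,
                                   List<String> queryTerms) {
        // Gather the hit spans for all query terms, ordered by start offset.
        TreeMap<Integer, Span> hits = new TreeMap<>();
        for (String term : queryTerms) {
            for (Span s : offsets.getOrDefault(term, List.of())) {
                hits.put(s.start(), s);
            }
        }
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : hits.values()) {
            if (s.start() < pos) continue; // skip a span overlapping one already emitted
            out.append(text, pos, s.start());
            out.append("<b>").append(text, s.start(), s.end()).append("</b>");
            pos = s.end();
        }
        out.append(text.substring(pos)); // copy whatever follows the last hit
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "Lucene stores term vectors. Term positions avoid re-analysis.";
        Map<String, List<Span>> offsets = buildOffsets(text);
        // prints: Lucene stores <b>term</b> vectors. <b>Term</b> positions avoid re-analysis.
        System.out.println(highlight(text, offsets, List.of("term")));
    }
}
```

The design choice mirrors the trade-off in the thread: tokenizing is paid once per document when indexing, so building snippets for the 20 or so hits on a results page is just string slicing over the stored text, whether that text comes from a Lucene stored field or a database.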