From: "Peter M Cipollone"
To: "Lucene Users List"
Subject: Re: Getting a field value from a large indexed document is slow.
Date: Fri, 14 May 2004 11:30:13 -0400

Paul,

It might be worth your while to store the file itself outside Lucene, and
keep only the filename in the stored data. This is generally how relational
databases deal with LOBs, and it will work with Lucene, too. You will also
save yourself hours when it comes time to merge indices or optimize, since
those operations are, in effect, large copy operations.

Regards,

Pete

----- Original Message -----
From: "Paul Williams"
To: "'Lucene Users List'"
Sent: Friday, May 14, 2004 11:22 AM
Subject: Getting a field value from a large indexed document is slow.

> Hi,
>
> I hope someone can help!
> I am using Lucene to make a searchable repository of electronic documents
> (MS Office, PDFs, etc.). Some of these documents can contain a large amount
> of text (about 500K of text in some cases), which is indexed to make it
> searchable.
>
> Doing the search and getting the hits found is not affected by the size of
> the document found.
>
> But when I try to access a field (my document id) in the document
>
> i.e.
>
> // Create Lucene Doc with value
> Document doc = hits.doc(i);
>
> String number = doc.get("Field10");
>
> The creation of the Lucene document can take up to a second per hit.
> I don't actually use any of the other fields apart from getting my ID
> value from Field10.
>
> So my question is:
>
> Is there a smarter way of getting the 'Field10' value without populating
> all the rest of the fields in the Lucene document, and thereby reducing
> the time taken for this action?
>
> Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
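
Pete's suggestion (keep the large text outside the index and store only a pointer to it) can be sketched in plain Java. This is only an illustration under stated assumptions: `ExternalBodyStore`, the `path` key, and the `<id>.txt` naming are hypothetical and not part of Lucene, and the returned map merely stands in for the small set of values one would keep as stored fields next to Field10. It also assumes the id is safe to use as a file name. The text itself would presumably still be indexed for searching; only the stored copy moves to disk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): writes the large text to disk and
// returns only the small values that would be kept as stored fields.
final class ExternalBodyStore {
    private final Path root;

    ExternalBodyStore(Path root) {
        this.root = root;
    }

    // Writes the body to <root>/<id>.txt and returns the small stored values.
    // Assumes the id contains only characters that are valid in a file name.
    Map<String, String> save(String id, String body) {
        try {
            Files.createDirectories(root);
            Path file = root.resolve(id + ".txt");
            Files.write(file, body.getBytes(StandardCharsets.UTF_8));

            Map<String, String> stored = new LinkedHashMap<>();
            stored.put("Field10", id);            // the document id from the question
            stored.put("path", file.toString());  // pointer to the external text
            return stored;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the full text back only when a caller actually needs it.
    String loadBody(Map<String, String> stored) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(stored.get("path")));
            return new String(bytes, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layout, reading the id for a hit touches only two short strings, and the 500K body is loaded from disk only for the hits where the full text is wanted.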