From: Grant Ingersoll
Subject: Re: Storing whole document in the index
Date: Sat, 17 Mar 2007 11:22:27 -0400
To: java-dev@lucene.apache.org

Please ask these types of questions on the user mailing list; you will get much better responses there. The dev list is for developers of Lucene.

To answer your question: yes, you can do this. You may also find the FieldSelector API additions and Lazy Field Loading helpful performance-wise.

-Grant

On Mar 17, 2007, at 8:36 AM, jafarim wrote:

> Hello
> I have been using Lucene for a while and, as most people seemingly do, I
> used to save only some important fields of a document in the index. But
> recently I thought: why not store the whole document's bytes as an
> untokenized field in the index in order to ease the retrieval process?
> For example, serialize the PDF file into a byte[] and then save the bytes
> as a field in the index (some gzip and base64 encoding may be needed as
> glue logic). Then I can delete the original file from the system. Is
> there any reason against this idea? Can Lucene bear this large volume of
> input streamed data?

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ
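
The gzip and base64 "glue logic" mentioned in the quoted question could look roughly like the sketch below. It uses only the JDK and assumes Java 8 or later for java.util.Base64, which postdates this thread. The class name StoredDocumentCodec and its methods are hypothetical, and the sketch deliberately stops short of any Lucene call: the returned string is simply what one might place in a stored, untokenized field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Lucene: turns raw file bytes into a
// compact text value and back again.
public class StoredDocumentCodec {

    /** Gzip the raw bytes, then Base64-encode the result as plain text. */
    public static String encode(byte[] raw) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the gzip stream writes the trailer, so read the buffer
        // only after the try block has finished.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(raw);
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Reverse of encode: Base64-decode, then gunzip back to the original bytes. */
    public static byte[] decode(String stored) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(stored);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[8192];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder bytes standing in for a real PDF file.
        byte[] original = "%PDF-1.4 placeholder content"
                .getBytes(StandardCharsets.US_ASCII);
        String stored = encode(original);
        byte[] restored = decode(stored);
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

Base64 inflates the compressed bytes by about a third, so if the Lucene version in use accepts a binary stored field, skipping the Base64 step may be preferable; that is worth checking against the javadoc rather than assuming.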