From: starz10de
To: java-user@lucene.apache.org
Subject: Re: storing the contents of a document in the lucene index
Date: Wed, 23 Jul 2008 00:42:50 -0700 (PDT)
Message-ID: <18605547.post@talk.nabble.com>
In-Reply-To: <18595855.post@talk.nabble.com>

Hi Erik,

I don't remove the stop words, because I am indexing parallel corpora that are used for learning translations between a pair of languages, so every word is important. I even developed my own analyzer for Arabic, which just removes punctuation and special symbols and returns only the Arabic text.

I guess that in FileDocument.java the whole text is already stored:

    doc.add(Field.Text("contents", IN));

where IN is

    IN = new BufferedReader(new InputStreamReader(new FileInputStream(f)));

If this is not the case, could you please tell me how to store the whole text inside the index? I am new to Lucene and I don't know how to use "Field.Store.YES" to store the whole text.

Best regards,
Farag

starz10de wrote:
>
> Could anyone please tell me how to print the content of a document
> after reading the index?
> For example, if I want to print the index terms, I do:
>
>     IndexReader ir = IndexReader.open(index);
>     TermEnum termEnum = ir.terms();
>     while (termEnum.next()) {
>         TermDocs dok = ir.termDocs();
>         dok.seek(termEnum);
>         while (dok.next()) {
>             System.out.println(termEnum.term().text().trim());
>         }
>     }
>
> I can print the text files before indexing them, but because of encoding
> issues I would like to print them from the index.
> As far as I know, the content of the document (the whole text) is also
> stored in the index; my question is how to print this content.
>
> So in the end I will print the path of the current document, the index
> terms, and the content of the document.
>
> Thanks in advance
>
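
A minimal sketch of the step the question turns on, under stated assumptions. As far as the Lucene Javadoc of that era describes it, a Reader-valued field such as Field.Text("contents", IN) is tokenized and indexed but not stored verbatim, so the text would first have to be read into a String. The class name StoredTextSketch and the helper readAll are hypothetical, not part of Lucene or of FileDocument.java. The Lucene calls appear only in comments, assume the Lucene 2.x API, and are not executed here.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the name is an assumption for illustration.
final class StoredTextSketch {

    // Reads the whole file into a String, decoding bytes with an explicit
    // charset instead of the platform default (relevant for Arabic text).
    static String readAll(File f, Charset charset) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), charset))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: StoredTextSketch <file>");
            return;
        }
        // UTF-8 is an assumption; use the corpus's actual encoding.
        String text = readAll(new File(args[0]), StandardCharsets.UTF_8);

        // Assumed Lucene 2.x usage (not run here): a String-valued field
        // that is both stored and tokenized.
        //   doc.add(new Field("contents", text,
        //                     Field.Store.YES, Field.Index.TOKENIZED));
        // Later, when reading the index:
        //   String stored = ir.document(docId).get("contents");
        System.out.println(text);
    }
}
```

Storing the full text makes the index larger, so whether this is worthwhile depends on how big the corpora are.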