From: "Haipeng Du"
To: lucene-dev@jakarta.apache.org
Subject: RE: I am new to lucene
Date: Tue, 14 Sep 2004 09:50:35 -0600

Thanks, Aviran. But how can I search on the document content if I index it with Field.Text("content", reader)? I do not want to store the document content in the index. Thanks a lot.
Haipeng

>From: "Aviran"
>Reply-To: "Lucene Developers List"
>To: "'Lucene Developers List'"
>Subject: RE: I am new to lucene
>Date: Tue, 14 Sep 2004 11:28:08 -0400
>
>1. Field.Text("content", Reader) constructs a Reader-valued Field that is
>tokenized and indexed but is not stored in the index verbatim, so you
>cannot retrieve the text. You need to use Field.Text("content", String)
>to be able to read back the content.
>2. You can use an open source project called PDFBox, which can extract text
>from a PDF document.
>
>Aviran
>
>-----Original Message-----
>From: Haipeng Du [mailto:flyabovesun@hotmail.com]
>Sent: Tuesday, September 14, 2004 11:18 AM
>To: lucene-dev@jakarta.apache.org
>Subject: I am new to lucene
>
>
>Hi, everyone:
>I am new to Lucene and have a couple of questions.
>(1) When I use Field.Text("content", Reader) to index the file content, I
>cannot retrieve it when I search. Here is part of the code:
>    Analyzer analyzer = new StopAnalyzer();
>    Searcher searcher = new IndexSearcher(indexPath);
>    Query query = QueryParser.parse(queryString, key2, analyzer);
>    Hits hits = searcher.search(query);
>I cannot find the field when I call hits.doc(i).get("content"). It is
>null, but I can get all the other field values the same way. How can I get
>the content?
>(2) Does Lucene have a way to index PDF content? Which API is the easiest
>to use for converting PDF to text?
>Please respond. Thanks a lot.
>Haipeng
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
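
On the follow-up question in this thread: a field that is tokenized and indexed but not stored can still be searched; only reading the original text back is unavailable. The toy class below is a minimal sketch of that distinction, not Lucene's implementation or API. `TinyIndex`, its methods, and the `path` field are hypothetical names introduced for illustration, and the tokenizer (lowercase, split on non-word characters) is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: this is NOT Lucene's implementation or API.
class TinyIndex {
    // Inverted index: "field:term" -> ids of documents containing the term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Stored values, one map per document: field name -> original text.
    private final List<Map<String, String>> stored = new ArrayList<>();

    /** Starts a new empty document and returns its id. */
    int newDocument() {
        stored.add(new HashMap<>());
        return stored.size() - 1;
    }

    /** Tokenizes and indexes the text; keeps the original only if store is true. */
    void addField(int docId, String field, String text, boolean store) {
        for (String term : text.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(field + ":" + term, k -> new TreeSet<>()).add(docId);
            }
        }
        if (store) {
            stored.get(docId).put(field, text);
        }
    }

    /** Returns ids of documents whose field contains the term, in ascending order. */
    List<Integer> search(String field, String term) {
        TreeSet<Integer> ids = postings.get(field + ":" + term.toLowerCase());
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    /** Returns the stored value of a field, or null if it was not stored. */
    String get(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        int doc = index.newDocument();
        index.addField(doc, "path", "/docs/report.txt", true);         // indexed and stored
        index.addField(doc, "content", "Lucene indexes text", false);  // indexed only

        // The content field is searchable even though its text was not kept.
        for (int id : index.search("content", "lucene")) {
            System.out.println("hit: " + index.get(id, "path"));
            System.out.println("content: " + index.get(id, "content"));
        }
    }
}
```

In the terms used in the thread, this corresponds to searching on "content" as usual and identifying each hit through some other stored field, since hits.doc(i).get("content") returns null for a Reader-valued field.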