From: syedfa
To: java-user@lucene.apache.org
Subject: Re: Creating an index from an XML file using Lucene in Java
Date: Mon, 28 Jul 2008 21:40:43 -0700 (PDT)
Message-ID: <18705179.post@talk.nabble.com>
In-Reply-To: <18686430.post@talk.nabble.com>
References: <18678779.post@talk.nabble.com> <18679016.post@talk.nabble.com> <18682150.post@talk.nabble.com> <18686430.post@talk.nabble.com>

Dear Karsten:

Sorry for the multiple posts, but I have made some progress. I think that in order to search multiple fields, I should be using the MultiFieldQueryParser class and simply pass it a String array containing the fields I wish to search over. My follow-up question to you is this: how do I highlight the results returned from the MultiFieldQueryParser? At the moment, my Searcher code looks like this:

    List searchResult = new ArrayList();
    Directory fsDir = FSDirectory.getDirectory(indexDir);
    IndexSearcher is = new IndexSearcher(fsDir);
    String[] fields = {"SCENE-COMMENTARY", "LINES"};
    Analyzer analyser = new StandardAnalyzer();
    Query parser = new MultiFieldQueryParser(fields, analyser).parse(q);
    //parser.setAllowLeadingWildcard(true);
    long start = new Date().getTime();
    Hits hits = is.search(parser);
    long end = new Date().getTime();
    QueryScorer scorer = new QueryScorer(parser);
    SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", "");
    Highlighter highlighter = new Highlighter(formatter, scorer);
    //Highlighter highlighter = new Highlighter(scorer);
    Highlighter high = new Highlighter(formatter, scorer);
    //Highlighter high = new Highlighter(scorer);
    Fragmenter fragmenter = new NullFragmenter();
    Fragmenter fragment = new SimpleFragmenter(250);
    highlighter.setTextFragmenter(fragmenter);
    high.setTextFragmenter(fragment);
    for(int i=0; i
    [...]

> Hi Fayyaz,
>
> again, this is about SAX-Handler not about lucene.
>
> My understanding of what you want:
> 1. one lucene document for each SPEECH-Element (already implemented)
> 2. one lucene document for each SCENE-COMMENTARY-Element (not implemented
> yet).
>
> correct?
>
> If yes, you can write
> if(qName.equals("SPEECH") ||
> qName.equals("SCENE-COMMENTARY")){
> doc=new Document();
> }
> and
>
> public void endElement(String uri, String localName, String qName) throws
> SAXException{
> ...
> else if(qName.equals("SCENE-COMMENTARY")){
> Field lines = new Field(qName, elementBuffer.toString(), Field.Store.YES,
> Field.Index.TOKENIZED, Field.TermVector.YES);
> doc.add(lines);
> }
> ...
> if(qName.equals("SPEECH") || qName.equals("SCENE-COMMENTARY")){
> indexWriter.addDocument(doc);
> }
>
> (instead of "indexWriter.addDocument(doc);" in block of
> if(qName.equals("LINES")){ )
>
>
>
> Best regards
> Karsten
>
> P.S.:
> If you want to learn java:
> I really like
> http://www.java-hamster-modell.de/
> possible there is an english version somewhere?
>
>
> syedfa wrote:
>>
>> I think I understand what you are saying, but I was hoping you could
>> clarify a little further. in the start-element method, I have the
>> following:
>>
>> if(qName.equals("SPEECH")){
>> doc=new Document();
>> }
>>
>> are you saying that I should add an identical block of code for
>> as well, and include a similar clause in the
>> endElement method as well? i.e.
>>
>> else if(qName.equals("SCENE-COMMENTARY")){
>> Field lines = new Field(qName,
>> elementBuffer.toString(), Field.Store.YES, Field.Index.TOKENIZED,
>> Field.TermVector.YES);
>> lines.setBoost(1.0f);
>> doc.add(lines);
>> indexWriter.addDocument(doc);
>> }
>>
>> Does it also matter where in the if/else if clauses I mention the
>> "SCENE-COMMENTARY" tag? ie. should I mention it first? last? or does
>> the order matter?
>>
>> Just wondering.
>> Thanks again for your prompt reply.
>> Sincerely;
>> Fayyaz
>>
>> P.S. This is actually a personal project, as I have developed an
>> interest in Information Retrieval and simply wanted to work on a creative
>> project to help me develop my skills. :-)
>>
>
>

--
View this message in context: http://www.nabble.com/Creating-an-index-from-an-XML-file-using-Lucene-in-Java-tp18678779p18705179.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
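
For reference, the handler structure Karsten describes in the quoted reply (start a new document when a SPEECH or SCENE-COMMENTARY element opens, add it when that same element closes) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the thread: the class name PlayHandler, the parse helper, and the use of a plain Map in place of a Lucene Document are all hypothetical, chosen so that the sketch runs with the JDK alone. In the real handler, the marked lines would be doc.add(new Field(...)) and indexWriter.addDocument(doc).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in: each Map plays the role of one Lucene Document.
class PlayHandler extends DefaultHandler {
    final List<Map<String, String>> docs = new ArrayList<>();
    private final StringBuilder elementBuffer = new StringBuilder();
    private Map<String, String> doc;

    // Elements that each become one document of their own.
    private static boolean isDocumentElement(String qName) {
        return qName.equals("SPEECH") || qName.equals("SCENE-COMMENTARY");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementBuffer.setLength(0); // start collecting text for this element
        if (isDocumentElement(qName)) {
            doc = new LinkedHashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        elementBuffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Store the text of LINES and SCENE-COMMENTARY as a field named after the element.
        if (doc != null && (qName.equals("LINES") || qName.equals("SCENE-COMMENTARY"))) {
            String text = elementBuffer.toString().trim();
            // Real code: doc.add(new Field(qName, text, ...)). Repeated fields are joined here.
            doc.merge(qName, text, (a, b) -> a + "\n" + b);
        }
        // The order of the clauses does not matter, as long as the document is
        // added only once, when its own outer element closes.
        if (isDocumentElement(qName)) {
            docs.add(doc); // real code: indexWriter.addDocument(doc)
            doc = null;
        }
    }

    // Convenience helper (an assumption of this sketch): parse an XML string.
    static List<Map<String, String>> parse(String xml) {
        try {
            PlayHandler handler = new PlayHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.docs;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

Calling PlayHandler.parse on a file fragment containing one SCENE-COMMENTARY element and one SPEECH element with a LINES child should yield two entries, one per element, which mirrors the "one document per element" layout discussed above.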