lucene-java-user mailing list archives

From syedfa <fayyazud...@gmail.com>
Subject Re: Creating an index from an XML file using Lucene in Java
Date Tue, 29 Jul 2008 04:40:43 GMT

Dear Karsten:

Sorry for the multiple posts, but I have made some progress.  I think that in
order to search multiple fields, I should be using the
MultiFieldQueryParser class and simply pass it a String array containing
the fields I wish to search over.  My follow-up question to you is this:
How do I highlight the results returned from a MultiFieldQueryParser query?
At the moment, my Searcher code looks like this:

        List searchResult = new ArrayList();
        Directory fsDir = FSDirectory.getDirectory(indexDir);
        IndexSearcher is = new IndexSearcher(fsDir);

        String[] fields = {"SCENE-COMMENTARY", "LINES"};
        Analyzer analyser = new StandardAnalyzer();
        Query parser = new MultiFieldQueryParser(fields, analyser).parse(q);
        //parser.setAllowLeadingWildcard(true);
        long start = new Date().getTime();
        Hits hits = is.search(parser);
        long end = new Date().getTime();
        QueryScorer scorer = new QueryScorer(parser);
        SimpleHTMLFormatter formatter = new SimpleHTMLFormatter("", "");
        Highlighter highlighter = new Highlighter(formatter, scorer);
        //Highlighter highlighter = new Highlighter(scorer);
        Highlighter high = new Highlighter(formatter, scorer);
        //Highlighter high = new Highlighter(scorer);
        Fragmenter fragmenter = new NullFragmenter();
        Fragmenter fragment = new SimpleFragmenter(250);
        highlighter.setTextFragmenter(fragmenter);
        high.setTextFragmenter(fragment);

        for (int i = 0; i < hits.length(); i++) {
            Document doc = hits.doc(i);
            String com = doc.get("SCENE-COMMENTARY");
            String lns = doc.get("LINES");
            //String spkr = doc.get("SPEAKER");
            TokenStream lines = analyser.tokenStream("LINES", new StringReader(lns));
            CachingTokenFilter filter = new CachingTokenFilter(lines);
            //TokenStream speaker = analyser.tokenStream("SPEAKER", new StringReader(spkr));
            String highlightedLines = highlighter.getBestFragment(filter, lns);
            filter.reset();
            String highlight = high.getBestFragment(filter, lns);
            SearchResult resultBean = new SearchResult();
            resultBean.setReference(hits.doc(i).get("REFERENCE"));
            resultBean.setNarrator(hits.doc(i).get("SPEAKER"));
            resultBean.setHitResult(highlight);
            resultBean.setQuote(highlightedLines);
            searchResult.add(resultBean);
            System.out.println(resultBean.getReference());
            System.out.println(resultBean.getNarrator());
            System.out.println(resultBean.getHitResult());
            System.out.println("");
            System.out.println(resultBean.getQuote());
            System.out.println("");
        }

        System.err.println("Found " + hits.length() + " document(s)(in " + (end-start) + " milliseconds) that matched query '" + q + "':");

        return searchResult;
    }

Thanks again for all of your help, I do sincerely appreciate it.

Take care.
Fayyaz


Karsten F. wrote:
> 
> Hi Fayyaz,
> 
> Again, this is about the SAX handler, not about Lucene.
> 
> My understanding of what you want:
> 1. one Lucene document for each SPEECH element (already implemented)
> 2. one Lucene document for each SCENE-COMMENTARY element (not implemented
> yet).
> 
> correct?
> 
> If yes, you can write
>                 if(qName.equals("SPEECH") || qName.equals("SCENE-COMMENTARY")){
>                         doc=new Document();
>                 }
> and
> 
> public void endElement(String uri, String localName, String qName) throws SAXException{
> ...
> else if(qName.equals("SCENE-COMMENTARY")){
>  Field lines = new Field(qName, elementBuffer.toString(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES);
>  doc.add(lines);
> }
> ...
> if(qName.equals("SPEECH") || qName.equals("SCENE-COMMENTARY")){
>   indexWriter.addDocument(doc);
> }
> 
> (instead of "indexWriter.addDocument(doc);" in the block of if(qName.equals("LINES")){ )
> 
> 
> 
> Best regards
>   Karsten
> 
> P.S.:
> If you want to learn Java: 
> I really like 
> http://www.java-hamster-modell.de/
> Possibly there is an English version somewhere?
> 
> 
> syedfa wrote:
>> 
>> I think I understand what you are saying, but I was hoping you could
>> clarify a little further.  In the startElement method, I have the
>> following:
>> 
>>                 if(qName.equals("SPEECH")){ 
>>                         doc=new Document(); 
>>                 }
>> 
>> Are you saying that I should add an identical block of code for
>> <SCENE-COMMENTARY> as well, and include a similar clause in the
>> endElement method? i.e.
>> 
>>                          else if(qName.equals("SCENE-COMMENTARY")){ 
>>                                 Field lines = new Field(qName, elementBuffer.toString(), Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES); 
>>                                 lines.setBoost(1.0f); 
>>                                 doc.add(lines); 
>>                                 indexWriter.addDocument(doc);
>>                          } 
>> 
>> Does it also matter where in the if/else if clauses I mention the
>> "SCENE-COMMENTARY" tag?  i.e., should I mention it first?  Last?  Or does
>> the order matter?
>> 
>> Just wondering.
>> Thanks again for your prompt reply.
>> Sincerely;
>> Fayyaz
>> 
>> P.S.  This is actually a personal project, as I have developed an
>> interest in Information Retrieval and simply wanted to work on a creative
>> project to help me develop my skills.  :-) 
>> 
> 
> 
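
The handler change Karsten describes in the quoted reply above can be
sketched roughly as follows.  This is only a minimal illustration that runs
on the JDK alone, not the actual indexing code from this thread: a Map stands
in for the Lucene Document and a List stands in for the IndexWriter.  The
class name SceneHandlerSketch, the parse() helper and the sample XML layout
(SPEAKER and LINES inside SPEECH, with SCENE-COMMENTARY as a sibling of
SPEECH) are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in: each Map plays the role of a Lucene Document and the
// "indexed" List plays the role of the IndexWriter.
class SceneHandlerSketch extends DefaultHandler {
    final List<Map<String, String>> indexed = new ArrayList<>();
    private final StringBuilder elementBuffer = new StringBuilder();
    private Map<String, String> doc;

    // Both element types open (and later close) a document of their own.
    // Assumes the two element types are siblings and never nest.
    private static boolean startsDocument(String qName) {
        return qName.equals("SPEECH") || qName.equals("SCENE-COMMENTARY");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementBuffer.setLength(0);           // start collecting text for this element
        if (startsDocument(qName)) {
            doc = new LinkedHashMap<>();      // corresponds to: doc = new Document();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        elementBuffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) {
            return;                           // text outside any document is ignored
        }
        if (qName.equals("SPEAKER") || qName.equals("LINES")
                || qName.equals("SCENE-COMMENTARY")) {
            // corresponds to: doc.add(new Field(qName, elementBuffer.toString(), ...));
            doc.put(qName, elementBuffer.toString().trim());
        }
        if (startsDocument(qName)) {
            indexed.add(doc);                 // corresponds to: indexWriter.addDocument(doc);
            doc = null;
        }
    }

    // Parses an XML string and returns the collected stand-in documents.
    static List<Map<String, String>> parse(String xml) throws Exception {
        SceneHandlerSketch handler = new SceneHandlerSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.indexed;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<SCENE>"
                + "<SCENE-COMMENTARY>Enter two guards</SCENE-COMMENTARY>"
                + "<SPEECH><SPEAKER>FIRST GUARD</SPEAKER>"
                + "<LINES>Who goes there?</LINES></SPEECH>"
                + "</SCENE>";
        for (Map<String, String> d : parse(xml)) {
            System.out.println(d);            // one line per stand-in document
        }
    }
}
```

With the sample input, the sketch yields two separate documents: one holding
only the SCENE-COMMENTARY text and one holding SPEAKER and LINES.  A
commentary document therefore has no LINES field, so search code that reads
doc.get("LINES") would need to allow for a null value on such hits.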


