Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Message-ID: <20051112013220.98811.qmail@web31206.mail.mud.yahoo.com>
Date: Fri, 11 Nov 2005 17:32:20 -0800 (PST)
From: bib_lucene bib <bib_lucene@yahoo.com>
Subject: Re: Document as Paramter (Rephrased)
To: java-user@lucene.apache.org
In-Reply-To: <1E6CEEAC-B893-4284-9652-FF6F8E9B23DF@ehatchersolutions.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="0-5833175-1131759140=:97247"
Content-Transfer-Encoding: 8bit

--0-5833175-1131759140=:97247
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

Thanks for your time.
 
-- The text I want to highlight is stored both in the file system and in the index.
-- I can search and highlight the search terms on the results page (snippets only).
-- I provide a download link next to each snippet, pointing to the file I stored in Tomcat's ROOT webapp.
 
I understand the concept of the NullFragmenter; sorry for repeating myself...
It works like a Google search. In Google, if I enter the search term "highlight" and click Search, I get back results with the word "highlight" in bold. (I can already do that.)
Now, when I click one of the links (e.g. GNU Source-highlight 2.2, which is http://www.gnu.org/software/src-highlite/source-highlight.html), I want the term "highlight" shown in bold when the page is displayed. (This is the part I do not know how to do.)
 
I will index documents such as HTML, PDF, Word, etc.
Since I have already extracted the text (using textminer etc.), the question is: when the user clicks the "show full document" link that I will place below each highlighted snippet on the search results page, how can I pull the full text of that document out of the index, given that there is no unique identifier for each document in the index (?)
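One common way to get an identifier is to add a stored field, such as the file path, to each Lucene Document at index time; any stored field (including the full text, if it was stored) can be read back from a hit with doc.get(...). The sketch below covers only the link side: it builds a "show full document" URL that carries the path and the original query, so the target page knows which document to load and which terms to highlight. The page name showDocument.jsp, the parameter names path and q, and the class name are hypothetical; URLEncoder.encode(String, Charset) requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FullDocumentLink {

    // Builds a hypothetical "show full document" link. `path` identifies the
    // document (e.g. the value of a stored "path" field), and the query is
    // passed along so the full-document page can highlight the same terms.
    static String buildLink(String page, String path, String query) {
        return page
                + "?path=" + URLEncoder.encode(path, StandardCharsets.UTF_8)
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // URLEncoder turns '/' into %2F and a space into '+'
        System.out.println(buildLink("showDocument.jsp", "docs/a b.pdf", "highlight"));
    }
}
```

When writing the link into an href attribute, escape the & as &amp;.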
 
Or do I just have to extract the text from the file stored in the file system and pass it to the highlighter? My only concern here is that if around 300 users do this at the same time, it might fail with an error such as "too many open files".
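On the open-files concern: a file descriptor is held only while the file is being read, and "too many open files" errors typically come from handles that are never closed. A minimal sketch, assuming the extracted text was saved as a UTF-8 .txt file next to the original (the class and method names are hypothetical), uses try-with-resources so the handle is released as soon as reading finishes or fails:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FullTextReader {

    // Reads an already-extracted plain-text file into a String. The
    // try-with-resources block closes the reader (and its file handle)
    // even if an exception is thrown, so each request holds it only briefly.
    static String readText(Path file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, "full document text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readText(tmp));
        Files.delete(tmp);
    }
}
```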
 
Here is the code snippet:

LuceneHitHighlighter highlighter = new LuceneHitHighlighter(queryStr, "snippet", "body");
for (int i = 0; i < hits.size(); i++) {
    Document doc = (Document) hits.get(i);
    highlighter.doHighlight(doc);   // adds the "snippet" field to doc
    out.println("SNIPPET: " + doc.get("snippet"));
    out.println("<hr>");
}
 
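Using StandardAnalyzer:

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;

// _analyzer (a StandardAnalyzer), _formatter, _highlighter, _maxNumFragmentsRequired,
// _textFieldName and _highlightFieldName are fields declared elsewhere in the class.
public LuceneHitHighlighter(String queryText, String highlightFieldName, String textFieldName)
        throws ParseException
{
    Query query = QueryParser.parse(queryText, textFieldName, _analyzer);
    _highlighter = new Highlighter(_formatter, new QueryScorer(query));
    _highlighter.setTextFragmenter(new SimpleFragmenter(100)); // fragments of ~100 chars
    _maxNumFragmentsRequired = 3;
    _textFieldName = textFieldName;
    _highlightFieldName = highlightFieldName;
}

public void doHighlight(Document doc)
{
    String highlightText = "";
    String text = doc.get(_textFieldName); // non-null only if the field was stored
    if (text != null) {
        TokenStream tokenStream = _analyzer.tokenStream(_textFieldName, new StringReader(text));
        try {
            highlightText = _highlighter.getBestFragments(tokenStream, text, _maxNumFragmentsRequired, "...");
            // just store the highlighted text in the document, nothing else
            doc.add(new Field(_highlightFieldName, highlightText, true, false, false));
        } catch (IOException e) {
            // getBestFragments reads the TokenStream and can throw IOException
            throw new RuntimeException(e);
        }
    }
}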

Erik Hatcher <erik@ehatchersolutions.com> wrote:

On 11 Nov 2005, at 12:54, bib_lucene bib wrote:
> My requirement is this: I run a search and the results of the search
> are displayed. I display the results using getBestFragments,
> highlighting the searched terms.
>
> So basically the user can search and see what documents matched his 
> search with snippets of text shown in the result of search.
>
> Now the user needs to select one of the results and view the 
> document completely.
>
> Along with indexing, I stored each document in the ROOT web
> application and gave a link to the file. This worked fine.
>
> However, now I need to display the full document with search terms 
> highlighted.
>
> As I understand it, I can do this in one of two ways:
>
> 1. Read the whole document again and apply some css to highlight 
> terms.
>
> 2. Create a new Lucene Document -> Read the file -> Add the file
> content to the Document -> Use a NullFragmenter -> Apply CSS ->
> Display the result.

There is no need to create a new Lucene Document just to highlight
text. The text you highlight does not even have to be the text you
indexed. The Highlighter takes a String to highlight and a Query
from which to extract the terms to highlight.
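To illustrate the point that highlighting needs only a String and a set of terms, here is a simplified stand-in written in plain Java rather than with Lucene's Highlighter. It assumes tokens are runs of letters and digits (StandardAnalyzer's rules differ), lowercases each token, wraps tokens found in the term set in <b> tags, and HTML-escapes everything else using the original character positions. Because it processes the whole string, the output is the full document with the matches marked. The class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Set;

public class TermHighlighter {

    // Simplified stand-in for analyzer-driven highlighting: tokens are runs of
    // letters or digits, compared lowercased against the query terms. Matching
    // tokens keep their original spelling and are wrapped in <b>...</b>; all
    // other characters are copied through, HTML-escaped.
    static String highlight(String text, Set<String> terms) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (Character.isLetterOrDigit(text.charAt(i))) {
                int start = i;
                while (i < text.length() && Character.isLetterOrDigit(text.charAt(i))) {
                    i++;
                }
                String token = text.substring(start, i);
                if (terms.contains(token.toLowerCase(Locale.ROOT))) {
                    out.append("<b>").append(escape(token)).append("</b>");
                } else {
                    out.append(escape(token));
                }
            } else {
                out.append(escape(String.valueOf(text.charAt(i))));
                i++;
            }
        }
        return out.toString();
    }

    // Escapes characters that would otherwise be read as HTML markup
    // ('&' first, so the entities added afterwards are not re-escaped).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(highlight("GNU Source-highlight: highlight <code> & more", Set.of("highlight")));
    }
}
```

In practice the term set would come from the user's query; with Lucene's Highlighter that role is played by the QueryScorer built from the parsed Query.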

> My question is: is there a way to show the content from the index,
> rather than parsing the document again, when the user wants to see
> the full document from the search results page?

Well, the question really comes back to you: where is the text you
want to highlight? You can store it in a Lucene index, in a
database, on the filesystem, wherever. Neither Lucene nor the
Highlighter has any bearing on that decision. Wherever the text
lives, use the Highlighter on the full text with a NullFragmenter,
which I will commit to contrib/highlighter soon.
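The difference between the SimpleFragmenter(100) used in the code above and a NullFragmenter can be pictured in plain Java. This is a rough approximation, not Lucene's implementation (SimpleFragmenter works on token offsets): fixedSize ends each fragment at the first space at or after size characters past its start, while whole returns the entire text as a single fragment, so highlighting it yields the complete document. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class FragmentDemo {

    // Rough picture of a size-based fragmenter (size must be > 0): each
    // fragment runs from `start` to the first space at or after start + size,
    // so words are not cut in half.
    static List<String> fixedSize(String text, int size) {
        List<String> fragments = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + size, text.length());
            while (end < text.length() && text.charAt(end) != ' ') {
                end++;
            }
            fragments.add(text.substring(start, end));
            start = end;
        }
        return fragments;
    }

    // What a "null" fragmenter amounts to: the whole text is one fragment.
    static List<String> whole(String text) {
        List<String> fragments = new ArrayList<>();
        fragments.add(text);
        return fragments;
    }

    public static void main(String[] args) {
        System.out.println(fixedSize("aaa bbb ccc", 4));
        System.out.println(whole("aaa bbb ccc"));
    }
}
```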

If you're still having issues, send us a (brief, please!) piece of 
code you're using to highlight.

Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


--0-5833175-1131759140=:97247--