Date: Fri, 11 Nov 2005 17:32:20 -0800 (PST)
From: bib_lucene bib
To: java-user@lucene.apache.org
Subject: Re: Document as Paramter (Rephrased)
Thanks for your time.

-- The text I want to highlight is stored in the file system and in the index.
-- I can search and highlight the searched terms on the results page (just snippets).
-- I have put a download link next to the snippets (it points to the file I stored in the ROOT webapp of Tomcat).

I understood the concept of NullFragmenter; sorry for repeating myself...

It is something like a Google search. In Google, if I enter the search term "highlight" and click Search, I get back search results with the word "highlight" in bold. (I can do that.) Now when I click on one of the links (e.g. GNU Source-highlight 2.2, which is http://www.gnu.org/software/src-highlite/source-highlight.html), I want the term "highlight" in bold when the page is displayed. (This I do not know how to do.)

I will index docs like HTML, PDF, Word, etc. I have already extracted the text using textminer etc., so the question is: when I click the "show full document" link that I will put below the highlighted snippet on the search results page, how can I pull the full text of the document out of the index, given that there is no unique identifier for each doc in the index (?). Or do I just have to extract the text from the file stored in the file system and pass it to the highlighter? My only concern there is that if around 300 users do this at once, it might fail with an error like "maximum open files limit exceeded".

Here is the code snippet.
    LuceneHitHighlighter highlighter = new LuceneHitHighlighter(queryStr, "snippet", "body");
    for (int i = 0; i < hits.size(); i++) {
        Document doc = (Document) hits.get(i);
        highlighter.doHighlight(doc);
        out.println("SNIPPET: " + doc.get("snippet"));
        out.println("<br>"); // separator between results; the exact tag is an assumption
    }

Using StandardAnalyzer:

    // _analyzer, _formatter, _highlighter, _maxNumFragmentsRequired, _textFieldName and
    // _highlightFieldName are fields declared elsewhere in the class.
    public LuceneHitHighlighter(String queryText, String highlightFieldName, String textFieldName)
            throws ParseException {
        Query query = QueryParser.parse(queryText, textFieldName, _analyzer);
        _highlighter = new Highlighter(_formatter, new QueryScorer(query));
        _highlighter.setTextFragmenter(new SimpleFragmenter(100));
        _maxNumFragmentsRequired = 3;
        _textFieldName = textFieldName;
        _highlightFieldName = highlightFieldName;
    }

    public void doHighlight(Document doc) {
        String highlightText = "";
        String text = doc.get(_textFieldName);
        if (text != null) {
            TokenStream tokenStream = _analyzer.tokenStream(_textFieldName, new StringReader(text));
            try {
                highlightText = _highlighter.getBestFragments(tokenStream, text,
                        _maxNumFragmentsRequired, "...");
                // just store the highlighted text in the document (stored, not indexed, not tokenized)
                doc.add(new Field(_highlightFieldName, highlightText, true, false, false));
            } catch (IOException e) {
                // getBestFragments reads the token stream and can throw IOException;
                // in that case no highlight field is added
            }
        }
    }

Erik Hatcher wrote:
On 11 Nov 2005, at 12:54, bib_lucene bib wrote:

> My requirement is that I do a search and the results of the search are
> displayed. I am displaying results by using getBestFragments and
> highlighting the searched text.
>
> So basically the user can search and see which documents matched the
> search, with snippets of text shown in the search results.
>
> Now the user needs to select one of the results and view the
> document completely.
>
> What I did is, along with indexing, I stored the document in the ROOT
> web application and gave a link to the file. This worked fine.
>
> However, now I need to display the full document with the search terms
> highlighted.
>
> As I understand it, I can do this...
>
> 1. Read the whole document again and apply some CSS to highlight the
> terms.
>
> 2. Create a new Lucene Document -> Read the file -> Add the file
> content to the Document -> Use NullFragmenter -> Apply CSS -> Display
> the result.

There is no need to create a new Lucene Document just to highlight text. The text you highlight does not even have to be the text you indexed.
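To illustrate the point that the text being highlighted is just a String, here is a minimal stdlib-only sketch of what a whole-document ("NullFragmenter"-style) highlight amounts to: the full text is kept as one piece and every occurrence of a query term is wrapped in <B>...</B> (the tags Lucene's SimpleHTMLFormatter uses by default). This is not Lucene's Highlighter API; the class name WholeTextHighlighter and its tokenization (runs of letters and digits, compared lowercased, as a rough stand-in for StandardAnalyzer) are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: highlight query terms across the whole text, with no fragmenting. */
final class WholeTextHighlighter {
    // Assumed tokenization: runs of letters/digits, compared in lower case.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    private final Set<String> terms = new HashSet<>();

    WholeTextHighlighter(String... queryTerms) {
        for (String t : queryTerms) {
            terms.add(t.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns the full text, HTML-escaped, with every matching token wrapped in <B>...</B>. */
    String highlight(String text) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(text);
        int last = 0;
        while (m.find()) {
            // copy the non-token text between the previous token and this one
            out.append(escape(text.substring(last, m.start())));
            String token = m.group();
            if (terms.contains(token.toLowerCase(Locale.ROOT))) {
                out.append("<B>").append(escape(token)).append("</B>");
            } else {
                out.append(escape(token));
            }
            last = m.end();
        }
        out.append(escape(text.substring(last))); // trailing non-token text
        return out.toString();
    }

    // Minimal HTML escaping so the extracted document text cannot inject markup into the page.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

For example, `new WholeTextHighlighter("highlight").highlight("GNU Source-highlight")` returns `GNU Source-<B>highlight</B>`. With the real Highlighter, the equivalent is passing the full text (wherever it is stored) together with the Query, using a fragmenter that does not split the text.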
The Highlighter takes a String to highlight and a Query from which to extract the terms to highlight.

> The question is: is there a way I can just show the content from the index
> rather than parsing the document again, when the user wants to see the
> full document from the search results page?

Well, the question really comes back to you... where is the text you want to highlight? You can store it in a Lucene index, in a database, on the filesystem, wherever. Neither Lucene nor the Highlighter is tied to that decision in any way.

So use the Highlighter on the full text with a NullFragmenter, which I will commit to contrib/highlighter soon.

If you're still having issues, send us a (brief, please!) piece of the code you're using to highlight.

    Erik
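On the concern about many users (around 300) opening the full document at once and hitting the open-files limit: a file handle is only held while the file is being read, so what matters is closing it promptly on every path, including exceptions. Below is a minimal sketch, assuming the extracted text is stored as UTF-8 files. The class FullTextLoader and its cap on concurrent reads (a java.util.concurrent.Semaphore) are hypothetical additions for illustration, not part of Lucene or of the code above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.Semaphore;

/** Hypothetical sketch: load a stored document's full text without leaking file handles. */
final class FullTextLoader {
    // Assumed tuning knob: maximum number of files this loader keeps open at the same time.
    private final Semaphore openFiles;

    FullTextLoader(int maxConcurrentReads) {
        this.openFiles = new Semaphore(maxConcurrentReads);
    }

    String load(Path file) throws IOException, InterruptedException {
        openFiles.acquire(); // block if too many reads are already in progress
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                text.append(buf, 0, n);
            }
            return text.toString();
        } finally {
            // try-with-resources has already closed the reader when this runs
            openFiles.release();
        }
    }
}
```

The returned String can then be handed to the highlighter over the whole text. The permit count is a guess to tune against the server's actual open-file limit; the essential part is the try-with-resources block, which closes the file even when reading fails.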