From: Mayank Shrivastava
Date: Tue, 8 Jun 2010 17:56:33 +0530
Subject: Problems with lucene highlighter
To: general

Hi,

I am using Lucene Highlighter 2.4.1 in my application. I use the highlighter to get the best-matching fragments and display them. To do this I call a function with the following signature:

    String[] getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query,
            String fieldName, String fieldContents,
            int fragmentsNumber, int fragmentSize)

For example:

    String text = doc.get("MetaData");
    getFragmentsWithHighlightedTerms(analyzer, query, "MetaData", text, 5, 100);

The function getFragmentsWithHighlightedTerms() is defined as follows:

    private static String[] getFragmentsWithHighlightedTerms( argument list here )
    {
        // Tokenize the stored field text with the given analyzer.
        TokenStream stream = TokenSources.getTokenStream(fieldName, fieldContents, analyzer);

        // Score fragments against the query, using a cached copy of the stream.
        SpanScorer scorer = new SpanScorer(query, fieldName, new CachingTokenFilter(stream));
        Fragmenter fragmenter = new SimpleSpanFragmenter(scorer, fragmentSize);

        Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(fragmenter);
        // Analyze the whole field rather than only its beginning.
        highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);

        String[] fragments = highlighter.getBestFragments(stream, fieldContents, fragmentsNumber);
        return fragments;
    }

My trouble is that highlighter.getBestFragments() returns duplicates: if I display, say, the first 5 fragments, nos. 1 and 3 are the same.

I do not understand what is causing this. Is there a problem with the code?
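
As a stopgap while the cause is unknown, the duplicates could be filtered out of the array that getBestFragments() returns. The following is only a sketch of that workaround, using nothing but the JDK: the class and method names (FragmentDeduplicator, removeDuplicateFragments) are made up for illustration and are not part of Lucene, and the helper only hides the symptom rather than explaining it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class FragmentDeduplicator {

    // Hypothetical helper, not part of Lucene: returns the fragments in their
    // original order with exact (String.equals) duplicates removed.
    public static String[] removeDuplicateFragments(String[] fragments) {
        // LinkedHashSet keeps first-seen order and silently ignores repeats.
        Set<String> unique = new LinkedHashSet<String>();
        for (String fragment : fragments) {
            unique.add(fragment);
        }
        return unique.toArray(new String[unique.size()]);
    }
}
```

It would be applied to the result of getFragmentsWithHighlightedTerms() before display. Since removing duplicates can leave fewer than the 5 fragments requested, one would probably have to ask the highlighter for more fragments than are needed and keep the first 5 that remain.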