lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (LUCENE-590) Demo HTML parser gives incorrect summaries when title is repeated as a heading
Date Fri, 05 Nov 2010 08:25:40 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-590.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 4.0
                   3.1

Committed revision 1031467, 1031468 (3x)
Thanks Curtis!

> Demo HTML parser gives incorrect summaries when title is repeated as a heading
> ------------------------------------------------------------------------------
>
>                 Key: LUCENE-590
>                 URL: https://issues.apache.org/jira/browse/LUCENE-590
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.0.0
>            Reporter: Curtis d'Entremont
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-590.patch
>
>
> If you have an HTML document where the title is repeated as a heading at the top of the
> document, the HTMLParser will return the title as the summary, ignoring everything else that
> was added to the summary. Instead, it should keep the rest of the summary and chop off the
> title part at the beginning (essentially the opposite). I don't see any benefit to repeating
> the title in the summary for any case.
> In HTMLParser.jj's getSummary():
>     String sum = summary.toString().trim();
>     String tit = getTitle();
>     if (sum.startsWith(tit) || sum.equals(""))
>       return tit;
>     else
>       return sum;
> change it to: (* denotes a line that has changed)
>     String sum = summary.toString().trim();
>     String tit = getTitle();
> *    if (sum.startsWith(tit))             // don't repeat title in summary
> *      return sum.substring(tit.length()).trim();
>     else
>       return sum;
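
For readers who want to see the effect of the proposed change, here is a minimal standalone sketch of the two versions of the logic quoted above. It is not the actual HTMLParser.jj code: the class name SummaryDemo, the method names summaryBefore and summaryAfter, and the sample strings are hypothetical, and the summary buffer and getTitle() are replaced by plain String parameters.

```java
public class SummaryDemo {

    // Logic as quoted in the report before the change: if the summary
    // starts with the title (or is empty), only the title is returned.
    static String summaryBefore(String rawSummary, String title) {
        String sum = rawSummary.trim();
        if (sum.startsWith(title) || sum.equals("")) {
            return title;
        }
        return sum;
    }

    // Logic as proposed in the report: strip the leading title and keep
    // the rest of the summary.
    static String summaryAfter(String rawSummary, String title) {
        String sum = rawSummary.trim();
        if (sum.startsWith(title)) {
            return sum.substring(title.length()).trim();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical document whose title is repeated as the first heading.
        String title = "Release Notes";
        String raw = "Release Notes This release fixes the demo parser.";

        System.out.println(summaryBefore(raw, title)); // prints "Release Notes"
        System.out.println(summaryAfter(raw, title));  // prints "This release fixes the demo parser."
    }
}
```

Note that, as written in the report, the changed condition no longer falls back to the title when the summary is empty; in that case the sketch returns an empty string.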

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



