lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 25666] New: - Please increase the default size of HTMLParser summaries or make it ignore graphic's Alt text
Date Fri, 19 Dec 2003 21:43:33 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25666>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=25666

           Summary: Please increase the default size of HTMLParser summaries
                    or make it ignore graphic's Alt text
           Product: Lucene
           Version: unspecified
          Platform: PC
        OS/Version: Windows NT/2K
            Status: NEW
          Severity: Enhancement
          Priority: Other
         Component: Other
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: kenkyee@excite.com


At the top of every page, I have some header graphics with Alt text.  The
problem is that the HTMLParser stores this Alt text in the summary, and it
shouldn't (all graphics are supposed to have Alt text according to
accessibility rules).  Since Lucene has always stored Alt text, maybe there
should be an option to disable it.

Even if this is fixed, each of my web pages has a header at the top.  Ideally,
the summary generator should ignore <Hx> tags (H1, H2, etc.) as well.  The
header text is the same as the <title> text for the page, and the title is
already shown as the link, so repeating it in the summary is wasted space.

The end result is that I trim off the first part of each summary that I get
via getParser before storing it in the Lucene index.  In the HTMLParser.java
file in src\demo\org\apache\lucene\demo\html, SUMMARY_LENGTH is set to 200, so
after trimming I effectively get only about 100 characters.  :-(
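
For reference, the trimming workaround described above might look roughly like
the sketch below.  It is only an illustration: SummaryTrimmer and trimLeading
are hypothetical names, not part of Lucene, and it assumes the Alt text and the
page title are known in advance and appear at the very start of the summary
string in that order.

```java
public class SummaryTrimmer {

    /**
     * Hypothetical helper (not a Lucene API): strips each known prefix,
     * in order, from the start of the summary. A prefix that does not
     * match at the current position is skipped.
     */
    public static String trimLeading(String summary, String... prefixes) {
        String rest = summary.trim();
        for (String prefix : prefixes) {
            if (prefix == null) {
                continue;
            }
            String p = prefix.trim();
            if (!p.isEmpty() && rest.startsWith(p)) {
                // Drop the prefix and any whitespace that followed it.
                rest = rest.substring(p.length()).trim();
            }
        }
        return rest;
    }

    public static void main(String[] args) {
        // Assumed example: header graphic Alt text, then the page heading.
        String summary = "Company Logo Product Guide This guide explains installation.";
        System.out.println(trimLeading(summary, "Company Logo", "Product Guide"));
    }
}
```

Because the trimming happens after the parser has already cut the summary at
SUMMARY_LENGTH, it cannot recover the lost characters; it only removes the
repeated text from what is left.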

Just wanted to give you some feedback instead of just grabbing the source and
making my own version of this...

This is in 1.3RC3.

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

