lucene-java-user mailing list archives

From Keith Gunn <kg...@csd.abdn.ac.uk>
Subject Re: problems with HTML Parser
Date Wed, 14 Aug 2002 19:06:41 GMT
If you're parsing HTML files, check in Lucene to see
which terms have been indexed, and see if you can
spot any joined terms.
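
As a rough illustration of what a joined term looks like and how you might
flag one, here is a small standalone sketch. It is not code from Lucene or
from either parser discussed in this thread, and it assumes one possible
cause (tags being stripped without leaving whitespace behind), which the
thread does not confirm. The class name, method names, sample HTML and the
list of known words are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Lucene.
public class JoinedTermCheck {

    // Removes tags without leaving a gap, so text in adjacent elements runs together.
    static String stripTagsNaive(String html) {
        return html.replaceAll("<[^>]*>", "");
    }

    // Replaces each tag with a space, so neighbouring text stays separate.
    static String stripTagsSpaced(String html) {
        return html.replaceAll("<[^>]*>", " ");
    }

    // Lower-cases the text and splits on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Flags terms that are not known words themselves but split into two known words.
    static List<String> suspectJoined(List<String> terms, Set<String> known) {
        List<String> suspects = new ArrayList<>();
        for (String term : terms) {
            if (known.contains(term)) {
                continue;
            }
            for (int i = 1; i < term.length(); i++) {
                if (known.contains(term.substring(0, i))
                        && known.contains(term.substring(i))) {
                    suspects.add(term);
                    break;
                }
            }
        }
        return suspects;
    }

    public static void main(String[] args) {
        // Assumed sample input: two table cells with no whitespace between them.
        String html = "<tr><td>apple</td><td>banana</td></tr>";
        Set<String> known = new HashSet<>(Arrays.asList("apple", "banana"));

        List<String> naive = tokens(stripTagsNaive(html));
        List<String> spaced = tokens(stripTagsSpaced(html));

        System.out.println("naive:    " + naive);
        System.out.println("spaced:   " + spaced);
        System.out.println("suspects: " + suspectJoined(naive, known));
    }
}
```

In practice the list of terms would come from your own index rather than
from a hard-coded string, and the known-word set would need to be much
larger for the check to be useful.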

The PDF parser, as you can see from the other mail, is from
www.pdfbox.org, and I highly recommend it (thanks again, Ben!)




On Wed, 14 Aug 2002, Maurits van Wijland wrote:

> Keith,
>
> I haven't noticed the problem with the Parser...but you trigger me
> by saying that you have a PDFParser!!!
>
> Are you able to contribute this PDFParser??
>
> Maurits.
> ----- Original Message -----
> From: "Keith Gunn" <kgunn@csd.abdn.ac.uk>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, August 14, 2002 9:46 AM
> Subject: problems with HTML Parser
>
>
> > Has anyone noticed that the HTML Parser that comes with
> > Lucene joins terms together when parsing a file?
> > I used to think it was my PDFParser, but after fixing that
> > I found out it was the HTMLParser.
> >
> > I managed to find a replacement parser that doesn't join terms.
> >
> > Just wondered if anyone had come across this problem??
> >
> >
> >
> >
> > --
> > To unsubscribe, e-mail:
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> > For additional commands, e-mail:
> <mailto:lucene-user-help@jakarta.apache.org>
> >



