lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: pdf in Chinese
Date Wed, 08 Sep 2004 11:44:34 GMT

This appears to be more of a PDFBox issue than a lucene issue, please post
an issue to the PDFBox site.

Also note, that because of certain encodings that a PDF writer can use, it
is impossible to extract text from all PDF documents.

Ben

On Wed, 8 Sep 2004, WuDG@infoPro.cn wrote:

> it is not about analyzer ,i  need to read text from pdf file first.
>
> ----- Original Message -----
> From: "Chandan Tamrakar" <chandan@ccnep.com.np>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Wednesday, September 08, 2004 4:15 PM
> Subject: Re: pdf in Chinese
>
>
> > which analyzer you are using to index chinese pdf documents ?
> > I think you should use cjkanalyzer
> > ----- Original Message -----
> > From: "WuDG@infoPro.cn" <wudg@infopro.cn>
> > To: <lucene-user@jakarta.apache.org>
> > Sent: Wednesday, September 08, 2004 11:27 AM
> > Subject: pdf in Chinese
> >
> >
> > > Hi all,
> > >     i use pdfbox to parse pdf file to lucene document.when i parse
> > Chinese
> > > pdf file,pdfbox is not always success.
> > >     Is anyone have some advice?
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message