lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Litchfield <...@csh.rit.edu>
Subject Re: FOP Generated PDF and PDFBox
Date Fri, 21 Jan 2005 18:05:22 GMT


Ya, when calling LucenePDFDocument.getDocument( File ) then it should be
the same as the path.

This is the code that the class uses to set those fields.

document.add( Field.UnIndexed("path", file.getPath() ) );
document.add(Field.UnIndexed("url", file.getPath().replace(FILE_SEPARATOR,
'/')));

I have no idea why an FOP PDF would be any different than another PDF.

You can also run it from the command line, this is just for debugging
purposes like this.

java org.pdfbox.searchengine.lucene.LucenePDFDocument <pdf-document>

and it should print out the fields of the lucene Document object.  Is the
url there and is it correct?

Ben

On Fri, 21 Jan 2005, Luke Shannon wrote:

> That is correct. No difference with how other PDF are handled.
>
> I am looking at the index in Luke now. The FOP generated documents have a
> path but no URL? I would guess that these would be the same?
>
> Thanks for the speedy reply.
>
> Luke
>
>
> ----- Original Message -----
> From: "Ben Litchfield" <ben@csh.rit.edu>
> To: "Lucene Users List" <lucene-user@jakarta.apache.org>
> Sent: Friday, January 21, 2005 12:34 PM
> Subject: Re: FOP Generated PDF and PDFBox
>
>
> >
> >
> > Are you indexing the FOP PDF's differently than other PDF documents?
> >
> > Can I assume that you are using PDFBox's LucenePDFDocument.getDocument()
> > method?
> >
> > Ben
> >
> > On Fri, 21 Jan 2005, Luke Shannon wrote:
> >
> > > Hello;
> > >
> > > Our CMS now allows users to create PDF documents (uses FOP) and than
> search
> > > them.
> > >
> > > I seem to be able to index these documents ok. But when I am generating
> the
> > > results to display I get a Null Pointer Exception while trying to use a
> > > variable that should contain the url keyword for one of these documents
> in
> > > the index:
> > >
> > > Document doc = hits.doc(i);
> > > String path = doc.get("url");
> > >
> > > Path contains null.
> > >
> > > The interesting thing is this only happens with PDF that are generate
> with
> > > FOP. Other PDFs are fine.
> > >
> > > What I find weird is shouldn't the "url" field just contain the path of
> the
> > > file?
> > >
> > > Anyone else seen this before?
> > >
> > > Any ideas?
> > >
> > > Thanks,
> > >
> > > Luke
> > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> >
> >
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message