lucene-dev mailing list archives

From Robert MacMillan <macmill...@rogers.com>
Subject Re: Parsing PDF documents
Date Mon, 18 Feb 2002 20:55:00 GMT
Peter,

    I put up a link to the PJ example source code at
http://www.omniInvestments.ca/gnu_pdf/ . I wasn't sure which you were
referring to: the PDF classes I am currently working on, or the PJ example.

    Time permitting, I'll have a first round of tested classes completed
by next weekend. That said, I wasn't planning on taking it as far as
converting a PDF directly into a Lucene Document, but it's an interesting
thought. The immediate problem I see is that most people don't properly
title their documents, for example. For that reason alone I figured it
would be best to provide only an interface for extracting the data you want
from a PDF document, and to let the developer decide what to do with it.
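A minimal Java sketch of that split, assuming a hypothetical PdfDataSource
interface (it is not part of PJ or of the classes mentioned above) and using a
plain Map as a stand-in for a Lucene Document. The extraction side only exposes
the data; the developer's mapping decides what to index, for example falling
back to the file name when the PDF has no usable title.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PdfFieldMappingSketch {

    // Hypothetical extraction interface: exposes raw data from a PDF
    // without deciding how (or whether) it should be indexed.
    public interface PdfDataSource {
        String getTitle();     // may be null or blank if the author never set one
        String getAuthor();    // may be null
        String getBodyText();
    }

    // Developer-supplied policy: turns extracted data into index fields.
    // A LinkedHashMap stands in for a Lucene Document here.
    public static Map<String, String> toFields(PdfDataSource pdf, String fileName) {
        Map<String, String> fields = new LinkedHashMap<>();
        String title = pdf.getTitle();
        if (title == null || title.trim().isEmpty()) {
            // Many PDFs carry no meaningful title, so fall back to the file name.
            title = fileName;
        }
        fields.put("title", title);
        if (pdf.getAuthor() != null) {
            fields.put("author", pdf.getAuthor());
        }
        fields.put("contents", pdf.getBodyText());
        return fields;
    }

    public static void main(String[] args) {
        // An untitled PDF, as described above.
        PdfDataSource untitled = new PdfDataSource() {
            public String getTitle() { return ""; }
            public String getAuthor() { return null; }
            public String getBodyText() { return "Quarterly results ..."; }
        };
        // prints {title=report-2002.pdf, contents=Quarterly results ...}
        System.out.println(toFields(untitled, "report-2002.pdf"));
    }
}
```

Keeping the title policy in toFields rather than in the extractor means each
application can choose its own fallback (file name, first line of text, etc.)
without changing the PDF classes.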

    Any ideas?

Cheers

Robert MacMillan

On 2/17/02 10:56 PM, "Peter Carlson" <carlson@bookandhammer.com> wrote:

> Robert,
> If you supply your code I'll add it to the contributions area.
> It would be great to have some code that already converts the PDF
> directly to a Lucene Document.
> 
> --Peter


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>

