pdfbox-dev mailing list archives

From Andreas Lehmkühler (JIRA) <j...@apache.org>
Subject [jira] Updated: (PDFBOX-650) Remove dependency on lucene
Date Sun, 07 Mar 2010 14:19:27 GMT

     [ https://issues.apache.org/jira/browse/PDFBOX-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler updated PDFBOX-650:
--------------------------------------

    Description: 
The current pdfbox version extracts all needed data from a PDF document and uses lucene to
create an index for the lucene search engine.

To avoid the dependency on lucene, pdfbox should only extract the data, which can then be used
to create a lucene index outside of pdfbox. That would decrease the number of external jars
and would eliminate another potential issue caused by changing APIs, like those coming with
lucene 3.0.

I've created 2 new classes (one for the extraction and one as an example of how to use that
feature) based on existing code and attached them as a patch.

WDYT?

If that patch is added to the trunk, the existing code will be removed, including both
lucene jars.


  was:
The current pdfbox version extracts all needed data from a PDF document and uses lucene to
create an index for the lucene search engine.

To avoid the dependency to lucene, pdfbox should only extract the data, which can then be used
to create a lucene index outside of pdfbox. That would decrease the number of external jars
and would eliminate another potential issue caused by changing APIs, like those coming with
lucene 3.0.

I've created 2 new classes (one for the extraction and one as an example of how to use that
feature) based on existing code and attached them as a patch.

WDYT?

If that patch is added to the trunk, the existing code will be removed, including both
lucene jars.


        Summary: Remove dependency on lucene  (was: Remove dependency to lucene)

> Remove dependency on lucene
> ---------------------------
>
>                 Key: PDFBOX-650
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-650
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Lucene, Utilities
>    Affects Versions: 1.0.0
>            Reporter: Andreas Lehmkühler
>            Assignee: Andreas Lehmkühler
>         Attachments: removing_lucene_patch.txt
>
>
> The current pdfbox version extracts all needed data from a PDF document and uses lucene
> to create an index for the lucene search engine.
> To avoid the dependency on lucene, pdfbox should only extract the data, which can then be
> used to create a lucene index outside of pdfbox. That would decrease the number of external
> jars and would eliminate another potential issue caused by changing APIs, like those coming
> with lucene 3.0.
> I've created 2 new classes (one for the extraction and one as an example of how to use that
> feature) based on existing code and attached them as a patch.
> WDYT?
> If that patch is added to the trunk, the existing code will be removed, including both
> lucene jars.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

