nutch-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <>
Subject [jira] Closed: (NUTCH-125) OpenOffice Parser plugin
Date Tue, 25 Apr 2006 19:14:03 GMT
Andrzej Bialecki  closed NUTCH-125:

    Fix Version: 0.8-dev
     Resolution: Fixed

Applied, with some changes (to account for Nutch API changes; it also uses the lib-xml plugin now).

> OpenOffice Parser plugin
> ------------------------
>          Key: NUTCH-125
>          URL:
>      Project: Nutch
>         Type: New Feature

>     Reporter: Andrzej Bialecki 
>     Assignee: Andrzej Bialecki 
>      Fix For: 0.8-dev
>  Attachments:
> A simple parser for StarOffice SXW and OpenDocument ODT files. This plugin does not use
> the UNO bridge in OpenOffice, but rather uses standard ZipInputStream, and parses content.xml
> and meta.xml inside these files to extract metadata and plain text.
> This plugin uses dom4j, because of easy XPath node selection, but this dependency could
> be removed.
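
The approach described in the issue (read the document as a zip archive, pick out
content.xml and meta.xml, and pull text and metadata from the XML) can be sketched
roughly as follows. This is not the plugin's code: the class name OdfTextSketch and
all method names are hypothetical, and the sketch assumes the JDK's built-in DOM
parser in place of dom4j, which is one way the dependency could be removed. Treating
elements with local name "p" or "h" as paragraph breaks is also a simplifying
assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not the NUTCH-125 plugin itself.
public class OdfTextSketch {

    // Dublin Core namespace used for elements such as dc:title in meta.xml.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    /** Copies content.xml and meta.xml out of the zip stream into memory. */
    static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> found = new HashMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (!name.equals("content.xml") && !name.equals("meta.xml")) continue;
                // Buffer the entry first: an XML parser may close the stream it
                // is given, which would end the zip iteration early.
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zip.read(chunk)) != -1) buf.write(chunk, 0, n);
                found.put(name, buf.toByteArray());
            }
        }
        return found;
    }

    /** Parses XML bytes into a namespace-aware DOM document. */
    static Document parseXml(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Older SXW files may reference an external DTD; resolve it to an
        // empty source instead of trying to load it.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new ByteArrayInputStream(xml));
    }

    /** Appends all text under node, adding a newline after each paragraph or heading. */
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
            return;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) collectText(children.item(i), out);
        String local = node.getLocalName();
        if ("p".equals(local) || "h".equals(local)) out.append('\n');
    }

    /** Returns the plain text of content.xml, or an empty string if it is missing. */
    static String extractText(Map<String, byte[]> entries) throws Exception {
        byte[] content = entries.get("content.xml");
        if (content == null) return "";
        StringBuilder out = new StringBuilder();
        collectText(parseXml(content).getDocumentElement(), out);
        return out.toString().trim();
    }

    /** Returns the dc:title from meta.xml, or null if it is missing. */
    static String extractTitle(Map<String, byte[]> entries) throws Exception {
        byte[] meta = entries.get("meta.xml");
        if (meta == null) return null;
        NodeList titles = parseXml(meta).getElementsByTagNameNS(DC_NS, "title");
        return titles.getLength() > 0 ? titles.item(0).getTextContent() : null;
    }
}
```

A caller would pass an open stream for an .odt or .sxw file to readEntries and then
hand the resulting map to extractText and extractTitle. Buffering each entry costs
memory for large documents, but keeps the zip iteration independent of the XML parser.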

