lucene-dev mailing list archives

From mark harwood <>
Subject Re: Lius into apache incubator
Date Wed, 28 Feb 2007 18:06:16 GMT
Hi Rida,
I've been talking with Jukka Zitting (involved in Nutch) about parsing/Tika and we started
to sketch out some project objectives on the Wiki over there which may be of interest:

I recently did a round-up of the main open source projects that maintain their own custom
document parsing framework and counted over 17. There was a fair mix of approaches and parser
choices, but a lot of commonality, suggesting a common project is possible/useful. The above
wiki sketches were an attempt to outline the requirements for such a common project and
also to ask where best to host it.

>>Tika is a really good project and I'm really interested in joining it.
I suspect one of the main differences between Lius and Tika's current objectives is that Tika
aims to be independent of any application which consumes the parsed data (e.g. not tied to
Lucene indexing classes). That said, I don't imagine it is too hard to decouple Lius's parser
logic from its indexing logic.
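
To illustrate the kind of decoupling meant here, below is a minimal sketch in Python (chosen only for brevity). Every name in it (ParsedDocument, PlainTextParser, to_index_fields) is hypothetical and is not taken from Lius, Tika or Lucene. The parser returns a plain result object, and a separate adapter is the only piece that knows about the indexer.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class ParsedDocument:
    """Plain parse result; deliberately has no dependency on any indexing library."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class Parser(Protocol):
    """Hypothetical parser contract: raw bytes in, ParsedDocument out."""

    def parse(self, data: bytes) -> ParsedDocument: ...


class PlainTextParser:
    """Toy parser for UTF-8 text, standing in for a real format parser."""

    def parse(self, data: bytes) -> ParsedDocument:
        text = data.decode("utf-8")
        return ParsedDocument(text=text, metadata={"content-type": "text/plain"})


def to_index_fields(doc: ParsedDocument) -> Dict[str, str]:
    """Hypothetical adapter: the only code that would touch the indexing side."""
    fields = {"body": doc.text}
    fields.update(doc.metadata)
    return fields


doc = PlainTextParser().parse(b"hello")
print(to_index_fields(doc))
```

With this split, an application that does not index at all could still reuse the parser, and swapping the indexer would only mean replacing the adapter.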


----- Original Message ----
From: Rida Benjelloun <>
Sent: Wednesday, 28 February, 2007 4:46:36 PM
Subject: Re: Lius into apache incubator

Hi Otis,
Many thanks for your comments, and I'm sorry for this late answer. I will add
Lius as a Lucene contrib and I will change the licence to ASL.
There are some developers contributing to Lius but there are not very
For the question: this is a Laval University project, right?  But you work
at DocuLibre?
I developed Lius during my studies at Laval University and I am still the
copyright owner for this project, so I can change the licence to ASL without any
problem. Lius has been used in several projects at Laval University and I
decided to host it at Laval.
I work at Laval and at DocuLibre.

Tika is a really good project and I'm really interested in joining it.


On 1/31/07, Otis Gospodnetic <> wrote:
> Hi Rida,
> Some comments in no particular order:
> - Looks useful
> - This looks like a more expanded version of what Erik and I wrote for
> LIA, and I know people often ask and use that code, so I know there is a
> need for a framework that knows how to parse various document formats
> - Nutch has some of the document parsing code written in form of
> plugins.  A few people wanted to decouple that from Nutch in a Tika project:
> .  Not sure what the status is, I think
> only Jukka Zitting did any work there, but I think the initial idea was
> never fully finished.  If LIUS joins Lucene, I think some of this
> duplication should be cleaned up, so we have only one framework for parsing
> various types of document formats.
> - Going through the Incubator is one way to go.  Perhaps another way to
> get LIUS under Lucene is to just place it under contrib/, say contrib/lius.
> - Licensing would have to change to ASL and you would probably also have
> to send in your ASF CLA.
> - Any dependencies on GPL or LGPL or code released under other licenses
> would have to either be removed, or you'd have to fetch the required Jars at
> compile/build time.  A few projects under Lucene contrib/ already do that, I
> believe
> - Are there developers who are actively working on LIUS?  Fixing bugs,
> adding features, keeping up with new versions of dependencies, etc.
> Otis
> P.S.
> Out of curiosity - this is a Laval University project, right?  But you
> work at DocuLibre?
> ----- Original Message ----
> From: Rida Benjelloun <>
> To:;
> Sent: Tuesday, January 30, 2007 7:27:28 PM
> Subject: Lius into apache incubator
> Hi,
> I would like to add the Lius framework (
> to the Apache Incubator. Are there any volunteers to do this job and to
> contribute to the development of this project?
> Thanks.
> Rida Benjelloun.
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
