lucene-dev mailing list archives

From Otis Gospodnetic <>
Subject Re: GData index html documents
Date Mon, 31 Jul 2006 02:14:59 GMT
I can confirm that.  Nutch includes it, for instance.


----- Original Message ----
From: Simon Willnauer <>
Sent: Sunday, July 30, 2006 6:41:08 PM
Subject: Re: GData index html documents

I got in touch with Andy; he told me that it would be totally all right
to include it.
Other projects are already using it.

regards Simon

On 7/30/06, Simon Willnauer <> wrote:
> Hello all,
> I'm at a point where I have to retrieve data from entry elements which
> could contain text, html, xhtml or even xml. There is no problem
> so far. Detecting which format the element contains is also pretty
> easy, as each element has a "type" attribute. If there is no such type
> attribute, I treat it like html and remove all html tags.
> My problem is a licence problem. I'd like to use the CyberNeko
> HTML parser, but its licence looks different from the Apache licence,
> although it has this sentence at the very bottom:
> "This license is based on the Apache Software License, version 1.1."
> I know that any software, lib, jar or whatever distributed with an Apache
> project must be Apache licenced. I'm not familiar with all the licence
> stuff, so some help would be greatly appreciated.
> So can I add the cyberneko jar to the gdata project?
> I might send Andy Clark an email to ask whether he grants me a licence...
> regards simon
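
The type-based handling described in the quoted message could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names are hypothetical, the regex-based tag stripping is a naive stand-in for a real HTML parser such as CyberNeko (whose API is not shown here), and stripping markup from xhtml and xml content is an assumption, since the message only spells out the missing-type case.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the gdata code base.
final class EntryTextExtractor {

    // Naive tag matcher; a real HTML parser would handle comments,
    // scripts and malformed markup far more robustly.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Replaces every tag with a space, then collapses runs of whitespace.
    static String stripTags(String markup) {
        String withoutTags = TAG.matcher(markup).replaceAll(" ");
        return withoutTags.replaceAll("\\s+", " ").trim();
    }

    // 'type' is the value of the element's "type" attribute, or null if absent.
    static String extractText(String type, String content) {
        if (type == null) {
            // As described in the message: no type attribute means treat as html.
            return stripTags(content);
        }
        switch (type) {
            case "html":
            case "xhtml": // assumption: handled like html
            case "xml":   // assumption: handled like html
                return stripTags(content);
            default:
                // "text" and anything unrecognised is returned unchanged.
                return content;
        }
    }
}
```

For example, `EntryTextExtractor.extractText(null, "<p>Hello <b>world</b></p>")` yields `Hello world`, while content with type `text` is passed through untouched.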

To unsubscribe, e-mail:
For additional commands, e-mail:
