lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel Noll <dan...@nuix.com.au>
Subject Re: Highligher Example
Date Sun, 10 Sep 2006 23:49:48 GMT
Dejan Nenov wrote:
> Second that - I was a client of Stellent - the libs work great but are
> expensive. To see Stellent in action - get a copy of the free X1 desktop
> search or the X1 server (Lucene based).

I would say that the libs work great but are slow.

One problem is that they don't provide a Java API.  The "Java" API they 
provide is sample code which calls a native executable, not even a JNI 
library.  So you pay the penalty of that native app starting up every 
time you extract a document.

If all you want is the plain text, for many document types it's actually 
fairly fast, and beats having to write code for every document type 
yourself (or locating libraries to do it for you.)  But as soon as you 
want the marked up text, it becomes a completely different story.  We 
benchmarked it to be something like 10 times slower to handle markup 
than handling raw text and metadata.  Most of this extra time was spent 
parsing the XML it outputs, which is often far more verbose than it 
needs to be for the amount of formatting it actually contains.

Daniel


-- 
Daniel Noll

Nuix Pty Ltd
Suite 79, 89 Jones St, Ultimo NSW 2007, Australia    Ph: +61 2 9280 0699
Web: http://www.nuix.com.au/                        Fax: +61 2 9212 6902

This message is intended only for the named recipient. If you are not
the intended recipient you are notified that disclosing, copying,
distributing or taking any action in reliance on the contents of this
message or attachment is strictly prohibited.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message