From: Daniel Noll
Date: Mon, 11 Sep 2006 09:49:48 +1000
To: java-user@lucene.apache.org
Subject: Re: Highligher Example

Dejan Nenov wrote:
> Second that - I was a client of Stellent - the libs work great but are
> expensive. To see Stellent in action - get a copy of the free X1 desktop
> search or the X1 server (Lucene based).

I would say that the libs work great but are slow.

One problem is that they don't provide a Java API. The "Java" API they provide is sample code which calls a native executable, not even a JNI library. So you pay the penalty of that native app starting up every time you extract a document.

If all you want is the plain text, for many document types it's actually fairly fast, and beats having to write code for every document type yourself (or locating libraries to do it for you).

But as soon as you want the marked-up text, it becomes a completely different story. We benchmarked it to be something like 10 times slower at handling markup than at handling raw text and metadata. Most of this extra time was spent parsing the XML it outputs, which is often far more verbose than it needs to be for the amount of formatting it actually contains.

Daniel
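
To make the startup penalty described above concrete, here is a rough sketch of what "one native process per document" costs. It is not the vendor's sample code: the class name, the method names, and the stand-in command are all assumptions for illustration. Since the real extractor executable isn't available here, the sketch launches the current JVM's own `java -version` as a placeholder child process and times how long each launch takes.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

public class PerDocumentProcessCost {

    // Stand-in for the extractor executable (hypothetical): we launch the
    // current JVM's own "java -version" because it exists wherever this runs.
    static List<String> standInCommand() {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        return Arrays.asList(java, "-version");
    }

    // Starts one child process, drains its output, waits for it to exit,
    // and returns the elapsed wall-clock time in nanoseconds.
    static long timeOneInvocation(List<String> command)
            throws IOException, InterruptedException {
        long start = System.nanoTime();
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        try (InputStream in = process.getInputStream()) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Discard the output; we only care about the launch cost.
            }
        }
        process.waitFor();
        return System.nanoTime() - start;
    }

    // Simulates extracting `documents` files with a fresh process for each.
    static long totalNanos(List<String> command, int documents)
            throws IOException, InterruptedException {
        long total = 0;
        for (int i = 0; i < documents; i++) {
            total += timeOneInvocation(command);
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        int documents = 5;
        long total = totalNanos(standInCommand(), documents);
        System.out.printf("%d launches: %.1f ms total, %.1f ms per document%n",
                documents, total / 1e6, total / 1e6 / documents);
    }
}
```

The numbers will vary by machine and say nothing about the actual product; the point is only that the per-launch cost is paid again for every document, whereas an in-process library (JNI or pure Java) would pay its initialisation once.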