Date: Fri, 10 Feb 2006 13:18:30 -0800
From: jian chen
To: java-user@lucene.apache.org
Subject: Re: Build vs. Buy?

For reading Word documents as text, you can try AntiWord.

I have written a simplified version of Lucene that does "max words" matching: documents are ranked by how many of the query words they contain. For example, if you are searching for aa, bb, cc, then a document that contains all three words (aa, bb, cc) will always be ranked higher than documents that contain only two of them (aa and bb, aa and cc, or bb and cc).

I am going to release the code as open source. If you are interested, you can email me directly.

Jian

On 2/9/06, P. Alex. Salamanca R. wrote:
>
> On the other hand, if you want to be the cheapest, why not give the
> Google Search Appliance a chance?
>
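Jian's code is not included in the message, so the following is only a minimal Java sketch of the "max words" ranking rule described above, under stated assumptions: documents are plain strings split on whitespace, the primary sort key is the number of distinct query words a document contains, and the total number of query-word occurrences only breaks ties. `MaxWordsRanker` and `rank` are hypothetical names, not part of Lucene or of Jian's library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of "max words" ranking; not Jian's code and not a Lucene API.
public class MaxWordsRanker {

    // Per-document score: distinct query words matched, then total occurrences.
    record Score(String docId, int distinctMatches, int totalHits) {}

    public static List<String> rank(Map<String, String> docs, List<String> queryTerms) {
        Set<String> query = new HashSet<>();
        for (String t : queryTerms) {
            query.add(t.toLowerCase());
        }

        List<Score> scores = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            Set<String> matched = new HashSet<>();
            int hits = 0;
            // Assumption: naive whitespace tokenizer; a real index would use an Analyzer.
            for (String token : e.getValue().toLowerCase().split("\\s+")) {
                if (query.contains(token)) {
                    matched.add(token);
                    hits++;
                }
            }
            if (!matched.isEmpty()) { // documents with no query word are dropped
                scores.add(new Score(e.getKey(), matched.size(), hits));
            }
        }

        // Sort descending on (distinctMatches, totalHits): more distinct words always wins,
        // occurrence count only decides between documents with the same number of words.
        Comparator<Score> byMatches = Comparator.comparingInt(Score::distinctMatches);
        Comparator<Score> byHits = Comparator.comparingInt(Score::totalHits);
        scores.sort(byMatches.thenComparing(byHits).reversed());

        List<String> ranked = new ArrayList<>();
        for (Score s : scores) {
            ranked.add(s.docId());
        }
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("d1", "aa bb bb bb bb"); // 2 distinct words, 5 occurrences
        docs.put("d2", "aa cc");          // 2 distinct words, 2 occurrences
        docs.put("d3", "aa bb cc");       // all 3 words, 3 occurrences
        docs.put("d4", "dd ee");          // no query word, excluded
        System.out.println(rank(docs, List.of("aa", "bb", "cc"))); // prints [d3, d1, d2]
    }
}
```

Sorting on the distinct-word count first is what makes the guarantee strict: d1 repeats bb four times but still ranks below d3, which contains all three words. Lucene's default scoring instead multiplies in a coordination factor (matched terms divided by query terms), so there a document matching fewer terms can still outscore one that matches all of them.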