From: Alex Steward
Date: Tue, 19 May 2009 07:14:52 -0700 (PDT)
Subject: Re: lucene code changes
To: java-user@lucene.apache.org

I need to implement a custom inverted index in Lucene.
I have files like the ones I have attached here. Each file contains words and a score assigned to each word. There will be hundreds of such files, and each file will have at least 50,000 such name-value pairs.
Note: the attached files show only tens of such name-value pairs, but my real production data will have more than 50,000 name-value pairs per file.

Currently I index the data using Lucene's inverted index. The query executed against the index contains 100 words, and the results come back in about 100 milliseconds.

Problem: once I have the query results, I have to go through each matching file (for example, the first attached file) and, for each word in the user's query, look up that word's score and add it to the file's total. Doing this across hundreds of files and hundreds of keywords makes score computation slow: about 3 to 5 seconds.
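As a sketch of the per-file computation described above, assuming each file's word/score pairs have already been read into an in-memory map (the class `FileScore` and method `totalScore` are hypothetical, not part of Lucene), the total for one file is a single hash lookup per query word. Loading each file once and keeping the map around avoids rescanning the file for every keyword.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileScore {
    // Sums the scores of the query words that appear in one file.
    // wordScores is a hypothetical in-memory copy of one file's word/score pairs,
    // loaded once instead of re-reading the file for every query.
    static float totalScore(Map<String, Float> wordScores, List<String> queryWords) {
        float total = 0f;
        for (String word : queryWords) {
            Float score = wordScores.get(word); // average O(1) lookup
            if (score != null) {
                total += score;                 // words absent from the file add nothing
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Float> file1 = new HashMap<>();
        file1.put("lucene", 0.5f);
        file1.put("index", 1.25f);
        file1.put("score", 2.0f);
        float total = totalScore(file1, List.of("lucene", "score", "missing"));
        System.out.println(total); // prints 2.5
    }
}
```

With this layout the cost per file is proportional to the number of query words rather than to the 50,000+ pairs in the file.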


I need help solving this problem so that score computation takes roughly 200 milliseconds or less.

One solution I was considering is modifying the Lucene source code that builds the inverted index so that the score is stored in the index itself. When the query results are returned, we would get the scores along with the file names, thereby eliminating the need to search each file for the keyword and its score. I then need to compute the total of all scores that belong to a single file.
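The idea of storing the score in the index can be sketched in plain Java, independent of Lucene's actual classes: each term maps to a list of postings that carry both the file name and the term's score in that file, so summing scores per file at query time touches only the posting lists of the query words. This is a hypothetical in-memory illustration (the names `ScoredInvertedIndex`, `Posting`, `add` and `totalScores` are made up), not a patch to Lucene's source.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java sketch (not Lucene's API) of an inverted index
// whose postings store each word's score alongside the file name.
public class ScoredInvertedIndex {
    // One posting: a file containing the term, plus the term's score in that file.
    record Posting(String fileName, float score) {}

    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Called once per (word, score) pair while reading a file at indexing time.
    void add(String fileName, String word, float score) {
        postings.computeIfAbsent(word, k -> new ArrayList<>()).add(new Posting(fileName, score));
    }

    // For each query word, walk its posting list and accumulate the score per file.
    // No file is reopened at query time.
    Map<String, Float> totalScores(List<String> queryWords) {
        Map<String, Float> totals = new TreeMap<>(); // sorted only for readable output
        for (String word : queryWords) {
            for (Posting p : postings.getOrDefault(word, List.of())) {
                totals.merge(p.fileName(), p.score(), Float::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        ScoredInvertedIndex index = new ScoredInvertedIndex();
        index.add("file1.txt", "lucene", 0.5f);
        index.add("file1.txt", "score", 2.0f);
        index.add("file2.txt", "lucene", 1.5f);
        System.out.println(index.totalScores(List.of("lucene", "score")));
    }
}
```

With the sample data in `main`, this prints `{file1.txt=2.5, file2.txt=1.5}`. The trade-off is that scores are fixed at indexing time, so changing a file's scores means reindexing that file.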
I am also open to any other ideas you may have. Any suggestions would be very helpful.

a.

