From: Alex Steward
Date: Tue, 19 May 2009 07:12:27 -0700 (PDT)
To: java-user@lucene.apache.org
Subject: lucene source code changes

Hello,

I need to implement a custom inverted index in Lucene. I have files like the ones I have attached here. Each file contains words and a score assigned to each word. There will be hundreds of such files, and each file will have at least 50,000 of these name/value pairs.

Note: the attached files show only tens of name/value pairs, but my real production data will have more than 50,000 name/value pairs per file.

Currently I index the data using Lucene's inverted index. The query executed against the index contains 100 words, and it returns results in about 100 milliseconds.

Problem: once I have the query results, I have to go through each file (for example, the first attached file). Then, for each word in the user's query, I have to find that word's score in the file and compute the total score. Doing this for hundreds of files and hundreds of keywords makes the score computation slow: about 3 to 5 seconds.
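For reference, the per-file step described above can be sketched in plain Java roughly as follows. This is a minimal sketch, not code from the original post: it assumes each file holds one whitespace-separated "word score" pair per line and that a file is parsed into memory once, and the names `FileScorer`, `parse`, and `totalScore` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FileScorer {

    // Parses lines of the (assumed) form "word score" into a word -> score map.
    // Lines that do not have exactly two whitespace-separated fields are skipped.
    static Map<String, Double> parse(List<String> lines) {
        Map<String, Double> scores = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 2) {
                continue;
            }
            scores.put(parts[0], Double.parseDouble(parts[1]));
        }
        return scores;
    }

    // Sums the scores of the query words that occur in one file.
    // Words missing from the file contribute 0.
    static double totalScore(Map<String, Double> wordScores, List<String> queryWords) {
        double total = 0.0;
        for (String word : queryWords) {
            total += wordScores.getOrDefault(word, 0.0);
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Double> file = parse(Arrays.asList("apple 0.5", "banana 1.25", "cherry 2.0"));
        double total = totalScore(file, Arrays.asList("apple", "cherry", "durian"));
        System.out.println(total); // prints 2.5
    }
}
```

With the map built once per file, each query word costs one hash lookup, so the work per file grows with the number of query words rather than with the 50,000-plus entries in the file.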

I need help resolving this problem so that the score computation takes about 200 milliseconds or less.

One solution I was considering is modifying the Lucene source code that creates the inverted index, so that the score is stored in the index itself. When the query results are returned, we would get the scores along with the file names, thereby eliminating the need to search each file for the keyword and its corresponding score. I then need to compute the total of all scores that belong to a single file.

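The idea of keeping the score inside the index can be illustrated with a small in-memory model. This is a hypothetical sketch in plain Java, not Lucene's API and not a patch to Lucene's source: each posting records the file name together with the word's score in that file, so the per-file totals come straight from the postings. The names `ScoredInvertedIndex`, `Posting`, `addFile`, and `totalScores` are illustrative only. Within Lucene itself, the payload feature (arbitrary bytes stored with each term position) may be worth evaluating for the same purpose before changing the source code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoredInvertedIndex {

    // One posting: a file that contains the term, plus the term's score in that file.
    static final class Posting {
        final String fileName;
        final double score;

        Posting(String fileName, double score) {
            this.fileName = fileName;
            this.score = score;
        }
    }

    // term -> list of (file, score) postings
    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Indexes one file's word -> score pairs, storing each score in its posting.
    void addFile(String fileName, Map<String, Double> wordScores) {
        for (Map.Entry<String, Double> e : wordScores.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new Posting(fileName, e.getValue()));
        }
    }

    // Walks the postings of each query word and accumulates a total per file,
    // so no file has to be reopened after the query runs.
    Map<String, Double> totalScores(List<String> queryWords) {
        Map<String, Double> totals = new HashMap<>();
        for (String word : queryWords) {
            for (Posting p : postings.getOrDefault(word, new ArrayList<>())) {
                totals.merge(p.fileName, p.score, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        ScoredInvertedIndex index = new ScoredInvertedIndex();
        Map<String, Double> fileA = new HashMap<>();
        fileA.put("apple", 0.5);
        fileA.put("cherry", 2.0);
        Map<String, Double> fileB = new HashMap<>();
        fileB.put("apple", 1.0);
        fileB.put("banana", 3.0);
        index.addFile("fileA.txt", fileA);
        index.addFile("fileB.txt", fileB);

        Map<String, Double> totals = index.totalScores(Arrays.asList("apple", "cherry"));
        System.out.println(totals.get("fileA.txt")); // prints 2.5
        System.out.println(totals.get("fileB.txt")); // prints 1.0
    }
}
```

Because the totals are accumulated while walking the postings of the query words, the files themselves are never read at query time; the trade-off is a larger index, since every posting carries a score.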
I am also open to any other ideas you may have. Any suggestions regarding this would be very helpful.

Thanks,
Abhilasha

