Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: local policy)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
  s=s1024; d=yahoo.com;
  h=Message-ID:X-YMail-OSG:Received:X-Mailer:Date:From:Subject:To:MIME-Version:Content-Type;
  b=cJQ3EArv/R4c0vLxJ4US9Rg6DSgT87DqTYTm2oiiHbgM2dXJrkz5zJQ78sNYDMJbau/qV7uQKNYvHfznzNHYHvr3wv9bId3Mlu1ak1fUiRa9h/iviIYWGU68CHvgJB0reN7UKa1yNyRoU8crnkKsoW3QaxM7bONaTkyRYxcMd84=;
Message-ID: <987174.25620.qm@web111810.mail.gq1.yahoo.com>
Date: Tue, 19 May 2009 07:12:27 -0700 (PDT)
From: Alex Steward <alex_lucene@yahoo.com>
Subject: lucene source code changes
To: java-user@lucene.apache.org
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="0-268752970-1242742347=:25620"

--0-268752970-1242742347=:25620
Content-Type: multipart/alternative; boundary="0-456037786-1242742347=:25620"

--0-456037786-1242742347=:25620
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: quoted-printable

Hello,

I need to implement a custom inverted index in Lucene. I have files like
the ones attached here. Each file contains words and the score assigned
to each word. There will be hundreds of such files, and each file will
have at least 50,000 such name-value pairs.

Note: the attached files show only tens of name-value pairs, but my real
production data will have more than 50,000 name-value pairs per file.

Currently I index the data using Lucene's inverted index. The query that
is executed against the index has 100 words, and the result is returned
in about 100 milliseconds.

Problem: once I have the results of the query, I have to go through each
file (for example, attached file one) and, for each word in the user's
input query, look up its score and add it to the total score. Doing this
against hundreds of files and hundreds of keywords makes the score
computation slow: about 3-5 seconds.

I need help resolving this problem so that the score computation takes
less than about 200 milliseconds.

One solution I was considering is to modify the Lucene source code that
creates the inverted index so that the score is stored in the index
itself. When the results of the query are returned, we would get the
scores along with the file names, thereby eliminating the need to search
each file for the keyword and its corresponding score. I would then only
need to compute the total of all scores that belong to a single file.
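
As an illustration only, the idea above can be sketched in plain Java,
without any Lucene APIs. The names ScoredIndex, Posting, add and
totalScores are hypothetical, and the sketch assumes that all word/score
pairs fit in memory and that each query word is listed once:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory inverted index: word -> (file, score) postings.
// Plain Java for illustration; none of this is Lucene API.
class ScoredIndex {

    // One posting: the file a word occurs in and its score there.
    static final class Posting {
        final String file;
        final double score;

        Posting(String file, double score) {
            this.file = file;
            this.score = score;
        }
    }

    private final Map<String, List<Posting>> postings = new HashMap<>();

    // Index time: record the score of one word in one file.
    void add(String file, String word, double score) {
        postings.computeIfAbsent(word, w -> new ArrayList<>())
                .add(new Posting(file, score));
    }

    // Query time: sum, per file, the scores of the query words.
    // Only the postings of the query words are visited; the source
    // files are never reopened.
    Map<String, Double> totalScores(List<String> queryWords) {
        Map<String, Double> totals = new TreeMap<>();
        for (String word : queryWords) {
            List<Posting> list = postings.get(word);
            if (list == null) {
                continue; // word occurs in no file
            }
            for (Posting p : list) {
                totals.merge(p.file, p.score, Double::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        ScoredIndex index = new ScoredIndex();
        index.add("file1.txt", "apple", 0.5);
        index.add("file1.txt", "pear", 2.0);
        index.add("file2.txt", "apple", 0.25);

        // Per-file totals for a three-word query ("kiwi" is unknown).
        Map<String, Double> totals =
                index.totalScores(Arrays.asList("apple", "pear", "kiwi"));
        System.out.println(totals);
    }
}
```

In this sketch the query-time work depends on the number of postings for
the query words, not on the size of the original files.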

I am also open to any other ideas you may have. Any suggestions would be
very helpful.

Thanks,
Abhilasha

--0-456037786-1242742347=:25620--

--0-268752970-1242742347=:25620
Content-Type: text/plain; charset=us-ascii


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
--0-268752970-1242742347=:25620--