lucene-java-user mailing list archives

From Alexandre Patry <alexandre.pa...@keatext.com>
Subject Re: NewBie To Lucene || Perfect configuration on a 64 bit server
Date Mon, 26 May 2014 13:08:59 GMT
On 26/05/2014 05:40, Shruthi wrote:
> Hi All,
>
> Thanks for the suggestions. But there is a slight difference in the requirements.
> 1. We don't index/search 10 million documents for a keyword; instead we do it
> on only 500 documents, because the final result must come only from that set
> of 500 documents.
> 2. We have already filtered the 500 documents out of the 10M+ documents with a
> DB stored procedure, which has nothing to do with any search keywords.
> 3. Our search algorithm plays a vital role on this new set of 500 documents.
> 4. We can't avoid on-the-fly indexing because the document set to be indexed
> is random and ever-changing.
> 	Although we could index the existing 10M+ docs beforehand and keep the
> indexes ready, we don’t want to search the complete document store. Instead we
> only want to search the 500 documents obtained above.
>
> Is there a better alternative for this requirement?
You could index all 10 million documents and use a custom filter[1] with 
your queries to specify which 500 documents to look at.
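
As a rough, library-free illustration of what such a filter does (this is not
Lucene's API; the class and method names below are made up for the sketch):
every document is indexed once, and at query time the term's matches are
intersected with a bit set holding the document ids that the stored procedure
returned. In Lucene itself the equivalent restriction would be expressed as a
Filter passed along with the query.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a tiny inverted index searched through an "allowed docs" filter.
final class FilteredSearchSketch {
    // term -> bit set of the ids of all documents containing that term
    private final Map<String, BitSet> postings = new HashMap<>();

    // Index one document up front; this is done once for the whole store.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new BitSet()).set(docId);
            }
        }
    }

    // Return the ids of documents that contain the term AND are in allowedDocs.
    List<Integer> search(String term, BitSet allowedDocs) {
        BitSet hits = new BitSet();
        BitSet matches = postings.get(term.toLowerCase(Locale.ROOT));
        if (matches != null) {
            hits.or(matches);       // copy, so the index itself is not modified
            hits.and(allowedDocs);  // keep only documents the filter allows
        }
        List<Integer> result = new ArrayList<>();
        for (int id = hits.nextSetBit(0); id >= 0; id = hits.nextSetBit(id + 1)) {
            result.add(id);
        }
        return result;
    }

    public static void main(String[] args) {
        FilteredSearchSketch index = new FilteredSearchSketch();
        index.addDocument(0, "chest pain and fever");
        index.addDocument(1, "fever for three days");
        index.addDocument(2, "no fever reported");

        // Pretend the DB stored procedure returned documents 1 and 2.
        BitSet allowed = new BitSet();
        allowed.set(1);
        allowed.set(2);

        System.out.println(index.search("fever", allowed)); // prints [1, 2]
    }
}
```

The point of the design is that the expensive step (indexing) happens once,
while the per-request step is only a cheap intersection with the 500 allowed
ids.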

Hope this helps,

Alexandre

[1] http://lucene.apache.org/core/4_8_1/core/org/apache/lucene/search/Filter.html

>
> Thanks,
>
> Shruthi Sethi
> SR. SOFTWARE ENGINEER
> iMedX
> OFFICE:
> 033-4001-5789 ext. N/A
> MOBILE:
> 91-9903957546
> EMAIL:
> ssethi@imedx.com
> WEB:
> www.imedx.com
>
>
>
> -----Original Message-----
> From: shashi.mit@gmail.com [mailto:shashi.mit@gmail.com] On Behalf Of Shashi Kant
> Sent: Saturday, May 24, 2014 5:55 AM
> To: java-user@lucene.apache.org
> Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server
>
> I second Vitaly's suggestion. You should consider using Apache Solr
> instead; it handles such issues out of the box.
>
>
> On Fri, May 23, 2014 at 7:52 PM, Vitaly Funstein <vfunstein@gmail.com> wrote:
>> At the risk of sounding overly critical here, I would say you need to scrap
>> your entire approach of building one small index per request, and just
>> build your entire searchable data store in Lucene/Solr. This is the
>> simplest and probably most maintainable and scalable solution. Even if your
>> index contains 10M+ documents, returning at most 500 search results should
>> be lightning fast compared to the latencies you're seeing right now. To
>> facilitate data export from the DB, take a look at this:
>> http://wiki.apache.org/solr/DataImportHandler
>>
>>
>> On Tue, May 20, 2014 at 7:36 AM, Shruthi <ssethi@imedx.com> wrote:
>>
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
>>> Sent: Tuesday, May 20, 2014 3:48 PM
>>> To: java-user@lucene.apache.org
>>> Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server
>>>
>>> On Tue, 2014-05-20 at 11:56 +0200, Shruthi wrote:
>>>
>>> Toke:
>>>> Is 20 seconds an acceptable response time for your users?
>>>>
>>>> Shruthi: It's definitely not acceptable. PFA the piece of code that we
>>>> are using. It's taking 20 seconds. That’s why I drafted this ticket to
>>>> see where I was going wrong.
>>> Indexing 1000 documents/sec in Lucene is quite common, so even taking
>>> into account large documents, 20 seconds sounds like quite a bit.
>>> Shruthi: I had attached the code snippet in the previous mail. Do you
>>> suspect something is wrong there?
>>>
>>>> Shruthi: Well, it's a two-stage process: the client is looking at
>>>> historical data based on parameters like names, dates, MRN, fields
>>>> etc. So the query actually gets the data set fulfilling the
>>>> requirements.
>>>>
>>>> If client is interested in doing a text search then he would pass the
>>>> search phrase on the result set.
>>> So it is not possible for a client to perform a broad phrase search to
>>> start with. And it sounds like your DB-queries are all simple matching?
>>> No complex joins and such? If so, this calls even more for a full
>>> Lucene-index solution, which handles all aspects of the search process.
>>> Shruthi: We call a DB stored procedure to get the result set to work
>>> with.
>>> We will be using the highlighter API, and I don’t think a memory index
>>> can be used with the highlighter.
>>>
>>> - Toke Eskildsen, State and University Library, Denmark
>>>
>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>
>


-- 
Alexandre Patry, Ph.D
Chercheur / Researcher
http://KeaText.com



