lucene-java-user mailing list archives

From Shruthi <sse...@imedx.com>
Subject RE: NewBie To Lucene || Perfect configuration on a 64 bit server
Date Tue, 20 May 2014 09:56:31 GMT
-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk]
Sent: Tuesday, May 20, 2014 3:01 PM
To: java-user@lucene.apache.org
Subject: Re: NewBie To Lucene || Perfect configuration on a 64 bit server



On Tue, 2014-05-20 at 10:40 +0200, Shruthi wrote:

> Just the indexing took 20 seconds :(



That's more than I expected, but it leaves the same question:

Is 20 seconds an acceptable response time for your users?

Shruthi: It's definitely not acceptable. Please find attached the piece of code that we are
using. It takes 20 seconds. That's why I drafted this ticket to see where I was going wrong.



I don't know your document size, but unless they are very large, the

response times from a full 10M document index will be way better than 20

seconds. Even on a low-RAM machine with spinning drives.



> We are yet to try on 64 bit server to check if that would change

> drastically.



I doubt it will.



Toke:

> RAMDirectory seems a better choice.

>

> Shruthi: But RAMDirectory has bad concurrency in multithreaded

> environments.



I assumed you would be creating a dedicated index for each request,

thereby effectively having single threaded usage for each separate

index.

Shruthi: Yes, we are creating a dedicated index for each request. OK, so RAMDirectory holds
good for our use case then. By the way, we will also be using the Highlighter API. We just
found out that using that API increased the index size by a factor of 4.



I just remembered that Lucene has an implementation dedicated to fast

indexing. Take a look at

http://lucene.apache.org/core/4_8_0/memory/org/apache/lucene/index/memory/MemoryIndex.html

It seems like just the thing for your use case.

Shruthi: Thank you, we will definitely try this.
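
For reference, the MemoryIndex suggestion above could look roughly like the sketch below. It is only an illustration written against the Lucene 4.8 API from the linked javadoc: the class name MemoryIndexSketch, the field name "content", the sample transcript text, and the choice of StandardAnalyzer with the classic QueryParser are all assumptions, not the code attached to this thread. It also needs the lucene-core, lucene-analyzers-common, lucene-queryparser and lucene-memory jars on the classpath.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class MemoryIndexSketch {

    // Hypothetical field name used only for this illustration.
    private static final String FIELD = "content";

    /**
     * Builds a throwaway single-document in-memory index for `text` and
     * returns the relevance score of `phrase` against it.
     * A score of 0.0f means the document does not match.
     */
    static float score(String text, String phrase) throws ParseException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_48);

        // One MemoryIndex per document: nothing is written to disk.
        MemoryIndex index = new MemoryIndex();
        index.addField(FIELD, text, analyzer);

        // Parse the user's search phrase with the same analyzer.
        Query query = new QueryParser(Version.LUCENE_48, FIELD, analyzer).parse(phrase);
        return index.search(query);
    }

    public static void main(String[] args) throws ParseException {
        // Assumed sample text standing in for one medical transcript.
        String transcript = "Patient was prescribed aspirin 81 mg daily.";

        System.out.println("phrase match score: " + score(transcript, "\"aspirin 81 mg\""));
        System.out.println("no-match score:     " + score(transcript, "warfarin"));
    }
}
```

In the scenario described in this thread, score() would be called once for each of the 500-1000 documents returned by the DB query, keeping those with a score above zero. Whether that is fast enough for the documents in question would have to be measured.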





> Shruthi : The same user from the same client will not be searching for

> same phrase again unless he has amnesia. This was already discussed

> with our architects.



If your architects base their decisions on observed user behaviour, then

fine. At our library, many users refine their queries, meaning that a

common pattern is 2-4 queries that are very much alike.

Shruthi: I will put forward this approach. We search medical transcripts, and most of the
time users search for drug names. I'm not sure if we can generalize this kind of query.



> Shruthi: Actually, we have a DB query that runs prior to indexing

> which fetches max. 500 docs from 10 million+ docs in NASSHARE. We then

> have to apply the search phrase only to the resulting set. So this way

> the set is limited to just 500-1000.



Frankly, the combination of a pre-selection with a DB query and the

addon of heavy index + search with Lucene seems like the absolute worst

of both worlds.



Does the DB-selector do anything that cannot easily be replicated in

Lucene?

Shruthi: Well, it's a two-stage process. The client looks at historical data based on parameters
like names, dates, MRN, fields, etc., so the query actually gets the data set fulfilling those
requirements.

If the client is interested in doing a text search, then he would apply the search phrase to
the result set.



- Toke Eskildsen, State and University Library, Denmark








