lucene-solr-user mailing list archives

From "Norskog, Lance" <la...@divvio.com>
Subject RE: Cache use
Date Thu, 06 Dec 2007 21:35:47 GMT
There are query caches and a document (field) cache. A query cache holds
the list of records that match a query. The document cache holds the
actual fields. Results fetched from your query cache still have to be
assembled from the indexed data. If the RAM-based index is paging, that
would be one answer.

Note that Lucene stores different fields of the same record, and the
index output, in different areas of the index. In my case, with very
small records of maybe 20 fields, there was a 5% difference between
fetching one field and fetching all fields. This could be very different
with your index.
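
To make the distinction between the two kinds of cache concrete, here is
a toy sketch in Python. It is not Solr's implementation: the LRUCache
class, the STORED_FIELDS data, and the search/fetch/run helpers are all
hypothetical names made up for illustration. It only shows that a query
cache hit yields record ids, and that the fields still have to come from
the document cache or the index.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache; a stand-in, not Solr's LRUCache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used


# Hypothetical stored fields, keyed by record id (assumed data).
STORED_FIELDS = {
    1: {"title": "tmpfs notes", "body": "ram disk"},
    2: {"title": "cache use", "body": "ram and caches"},
}

query_cache = LRUCache(capacity=100)     # query string -> list of record ids
document_cache = LRUCache(capacity=100)  # record id -> stored fields
index_reads = []                         # one entry per simulated index fetch


def search(term):
    """Return matching record ids, reusing a cached id list when present."""
    ids = query_cache.get(term)
    if ids is None:
        ids = [i for i, doc in sorted(STORED_FIELDS.items())
               if term in doc["body"]]
        query_cache.put(term, ids)
    return ids


def fetch(doc_id):
    """Return stored fields, touching the 'index' only on a cache miss."""
    doc = document_cache.get(doc_id)
    if doc is None:
        index_reads.append(doc_id)  # simulated read from the index
        doc = STORED_FIELDS[doc_id]
        document_cache.put(doc_id, doc)
    return doc


def run(term):
    """Assemble result titles: ids from search(), fields from fetch()."""
    return [fetch(i)["title"] for i in search(term)]
```

Calling run("ram") twice returns the same titles both times, and the
second call adds nothing to index_reads because both caches are warm.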

Lance 

-----Original Message-----
From: sfox [mailto:sfox@carleton.edu] 
Sent: Thursday, December 06, 2007 1:24 PM
To: solr-user@lucene.apache.org
Subject: Re: Cache use

One possible explanation is that the OS's native file system caching is
succeeding in keeping these files mostly in RAM most of the time, and so
the performance benefits of 'forcing' the files into RAM by using tmpfs
aren't significant.

So the slowness of the queries is the result of being CPU bound, rather
than IO bound.  The cache within Solr is faster because it is saving and
returning the information for which the CPU-bound work has already been
done.
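
One rough way to tell the two cases apart is to compare wall-clock time
with CPU time for the same piece of work. The Python sketch below is
only an illustration under assumptions: cpu_fraction, busy, and waiting
are hypothetical names, and it measures the current process only, so for
a Solr server you would look at the JVM process with your usual OS tools
instead.

```python
import time


def cpu_fraction(work):
    """Run work() once; return (wall_seconds, cpu_seconds / wall_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu / max(wall, 1e-9)  # guard against a zero wall time


def busy():
    # Pure computation: stands in for CPU-bound query work.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def waiting():
    # Sleeping stands in for time spent blocked on disk or network.
    time.sleep(0.2)
```

A fraction close to 1 suggests the work is CPU bound; a fraction close
to 0 suggests it spends most of its time waiting. Here waiting() should
report a fraction near zero, while busy() should report a much higher
one.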

Just one possible explanation.

Sean Fox

Matthew Phillips wrote:
> No one has a suggestion? I must be missing something because as I
> understand it from Dennis' email, all of his queries are very quick
> (cached-type response times) whereas mine are not. I can clearly see
> time differences between queries that are cached (things that have
> been auto-warmed) and queries that are not. This seems odd as my whole
> index is loaded on a tmpfs memory-based file system. Thanks for the
> help.
> 
> Matt
> 
> On Dec 4, 2007, at 3:55 PM, Matthew Phillips wrote:
> 
>> Thanks for the suggestion, Dennis. I decided to implement this as you
>> described on my collection of about 400,000 documents, but I did not
>> receive the results I expected.
>>
>> Prior to putting the indexes on a tmpfs, I did a bit of benchmarking
>> and found that it usually takes a little under two seconds for each
>> facet query. After moving my indexes from disk to a tmpfs file
>> system, I seem to get about the same result from facet queries: about
>> two seconds.
>>
>> Does anyone have any insight into this? Doesn't it seem odd that my 
>> response times are about the same? Thanks for the help.
>>
>> Matt Phillips
>>
>> Dennis Kubes wrote:
>>> One way to do this, if you are running on Linux, is to create a tmpfs
>>> (which is RAM) and then mount that filesystem at the index path. Then
>>> your index acts normally to the application but is essentially served
>>> from RAM. This is how we serve the Nutch Lucene indexes on our web
>>> search engine (www.visvo.com), which is ~100M pages. Below is how
>>> you can achieve this, assuming your indexes are in /path/to/indexes:
>>>
>>>   mv /path/to/indexes /path/to/indexes.dist
>>>   mkdir /path/to/indexes
>>>   cd /path/to
>>>   mount -t tmpfs -o size=2684354560 none /path/to/indexes
>>>   rsync --progress -aptv indexes.dist/* indexes/
>>>   chown -R user:group indexes
>>>
>>> This would of course be limited by the amount of RAM you have on the
>>> machine. But with this approach most searches are sub-second.
>>> Dennis Kubes
>>> Evgeniy Strokin wrote:
>>>> Hello,...
>>>> we have a 110M-record index under Solr. Some queries take a while,
>>>> but we need sub-second results. I guess the only solution is caching
>>>> (something else?)...
>>>> We use the standard LRUCache. The docs say (as far as I understood)
>>>> that it loads a view of the index into memory and the next time works
>>>> with memory instead of the hard drive.
>>>> So, my question: hypothetically, we could have the whole index in
>>>> memory if we had enough memory, right? In that case the results
>>>> should come back very fast. We have very rare updates, so I think
>>>> this could be a solution.
>>>> How should I configure the cache to achieve this?
>>>> Thanks for any advice.
>>>> Gene
> 
