lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anshum <ansh...@gmail.com>
Subject Re: swapping Lucene's on RAM drive
Date Wed, 11 Jun 2008 04:09:24 GMT
Hi Tom,

Just a suggestion, why don't you try explicitly using tmpfs. (considering
you use a unix/linux server. I tried both, RAMDirectory implementation and
tmpfs and found the latter more appealing. Firstly, in this case you'll have
more control of cleaning up/retaining as per your needs. Also, using
RAMDirectory depends on the architecture i.e. In case you run a 32 bit
machine with a 32 bit OS and a 32 bit JVM, you wouldnt be able to allocate
more than 2 Gigs of memory to any process(in this case JVM) and hence the
total memory available for utilization by the program (includes both, the
index as well as program size) would be restricted by this limit.
In case of tmpfs, there is no such limitation. All you need to do is mount a
tmpfs type partion (which uses your ram, and the total RAM available is the
only limit here) and copy the index to that mount. The just open readers to
read from that partition.

All that I said above was just a suggestion, so you might want to try.
As about automatic GC, you could also look at the options available with
JVM. :)

Hope this helps.

--
Anshum Gupta
Naukri Labs

On Tue, Jun 10, 2008 at 7:15 PM, Tom <tbee@tbee.org> wrote:

> Thanks for reacting Anshum,
>
> I also posted this on the Compass forum, but I was not sure where the
> responsibility was. But your information did give me a good start. From the
> Compass' RAMDirectoryStore source code: "uses
> org.apache.lucene.store.RAMDirectory"
>
> So appearantly I use two separate Lucene RamDirectory instances, one for
> each Compass/Lucene engine, which would mean that the data is garbage
> collected. Good!
>
> I also have the sheer impression that the id I postfix to the "ram://" is
> quite useless. I cannot find any location where that information is actually
> used to configure separate filesystems on a ram drive. As a comparison, the
> FSDirectoryStore has a "configure" method which actually uses the part
> behind the ://, but the RAMDirectoryStore only is created. In fact the
> RAMDirectoryStore does nothing else aside from being created; no deleteIndex
> is implemented, no cleanIndex, no nothing. It seems a miracle it works at
> all ;-) The only thing that is present is a delete-files loop in
> beforeCopyFrom.
>
> So I think I dare to assume that the memory is garbage collected.
>
> Thanks again.
>
> Tom
>
>
>
>
>
>
> Anshum wrote:
>
>> Hi Tom, perhaps this is regarding Compass and you would have wanted to
>> post
>> this query there.
>> In case you are looking and asking about running 2 instances of lucene on
>> the machine with 2 separate indexes in RAM, could you specify :
>> * Do you use tmpfs or do you load it using the RamDirectory class?
>> * In case you use tmpfs, you'd have to clean it manually after releasing
>> the
>> hold of the engine (by stopping or closing the old index reader). The
>> cleaning is done straight off by deleting the index or moving the index
>> off
>> the tmpfs after closing the index reader. Incase you happen to do it
>> without
>> closing the reader, though the dir won't be listed under the directory
>> listing but it would still be using all of that space (which you would
>> only
>> realise if you run out of space i.e. have space constraints).
>> * In the other case, if you are just using the OS cache mechanism or
>> something like it, it would not get cleaned up but gradually be cleaned as
>> and when the OS needs to allocate more memory (that is how caching works).
>>
>> --
>> Anshum
>> Naukri Labs!
>>
>> On Tue, Jun 10, 2008 at 2:53 PM, Tom <tbee@tbee.org> wrote:
>>
>>
>>
>>> I have not been able to find much information about this, hence this
>>> question.
>>>
>>> Currently I use Lucene through Compass with the data stored in RAM. The
>>> indexed information is updated daily and therefore I create a new
>>> Compass/Lucene combination every day, let it load the new data and then
>>> swap
>>> the active search engine (simply assigning a 'global' variable). So, all
>>> in
>>> memory.
>>>
>>> I'm not confident what this will do with the memory. In my setupo each
>>> Compass/Lucene engine gets a separate ram store based on a different id
>>> per
>>> engine:
>>>
>>>  ...setSetting(CompassEnvironment.CONNECTION, "ram://" + id)
>>>
>>> And this seems to be working okay. However:
>>> 1. Is this the correct way to have 2 engines running next to each other
>>> (while the second one is loading the new data)?
>>> 2. After the engines are swapped, is the RAM store of the now not used
>>> engine automatically cleaned up? If not, how do I?
>>>
>>> Thanks for any help!
>>>
>>> Tom
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>


-- 
--
The facts expressed here belong to everybody, the opinions to me.
The distinction is yours to draw............

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message