lucene-java-user mailing list archives

From Wenbo Zhao <zha...@gmail.com>
Subject Re: Can Lucene unite multiple instances run as one ?
Date Mon, 16 Nov 2009 07:13:55 GMT
Yes, exactly: 'distributed'.
From a maintenance point of view, being able to expand 'horizontally' is
very important.
In my case, the data files are a kind of 'history' file, categorized
by date.  Once a data file is indexed, its index will not change unless
the search fields change.
Say I index a whole ten years of data, producing a 400G index that
requires 8G of RAM.  Every time I do a backup, I have to back up the
entire 400G, and I need another 8G machine for backup.  And 8G will not
stay enough, because the index grows every day.
Compare that to a distributed solution, where I can split the index by
year or by season.  Say I have 10 x 40G indexes.  I can easily run 10 JVM
processes, each with a 1G heap, on 3-5 low-cost, non-dedicated x86
machines.  As for backup, 9 of the 10 indexes are old and won't change,
so they only need to be backed up once.  Only the 1 hot index changes
every day, so each backup is at most 40G.  The spare machine is also very
cheap.  And because the machines are so cheap, I can run this on VMs,
which is more flexible for resource management.  As time goes by, I just
add a new JVM instance when needed.  I don't have to worry about RAM and
search speed anymore.
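To make the backup comparison concrete, here is a rough sketch of the arithmetic.  It assumes every backup is a full copy (no incremental or deduplicating backup tool) and that the hot index stays at the 40G upper bound; the class and method names are made up for illustration.

```java
// Hypothetical helper; the figures below are the ones quoted in this mail.
class BackupSizingSketch {

    /** GB copied over `days` daily backups when the whole index is copied each time. */
    static long monolithicBackupGb(long totalIndexGb, int days) {
        return totalIndexGb * days;
    }

    /**
     * GB copied over `days` daily backups when the index is split into
     * `shardCount` equal parts: the old parts are copied once, and only
     * the single hot part is copied every day.
     */
    static long shardedBackupGb(int shardCount, long shardGb, int days) {
        long frozenOnce = (long) (shardCount - 1) * shardGb; // old indexes, copied once
        long hotDaily = shardGb * days;                      // hot index, copied daily
        return frozenOnce + hotDaily;
    }
}
```

Under those assumptions, 30 days of backups copy 400 x 30 = 12000G for the single index, against 9 x 40 + 40 x 30 = 1560G for the split one.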
I do think there must be bigger cases out there just like mine.
A general distributed Lucene would be very useful.  It would bring
Lucene to more enterprise applications, and to bigger, industrial-scale
applications.
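
The approach discussed in this thread (run each query concurrently on every sub-index, then merge the results) could look roughly like the sketch below.  It is plain Java with no Lucene classes: `Shard`, `Hit` and `search` are hypothetical names, and each `Shard` stands in for whatever remote call reaches the JVM that owns one sub-index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of scatter-gather search; not a Lucene API.
class ScatterGatherSketch {

    /** One result: which sub-index it came from, its doc id there, and its score. */
    static final class Hit {
        final String shard;
        final int docId;
        final float score;

        Hit(String shard, int docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Stand-in for a searcher that owns one sub-index (for example, one year of data). */
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    /** Sends the query to every shard concurrently, then keeps the overall top N by score. */
    static List<Hit> search(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Scatter: one task per shard, each asking for that shard's own top N.
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            // Gather: wait for every shard and pool the hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get());
            }
            // Merge: highest score first; the sort is stable, so ties keep shard order.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a shard failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

One caveat with merging by raw score: each sub-index computes its scores from its own term statistics, so scores from different sub-indexes are not strictly comparable unless those statistics are shared or the data is distributed evenly.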


2009/11/16 Jacob Rhoden <jrhoden@unimelb.edu.au>:
> Sounds like you may need some sort of distributed system. I just
> wanted to make sure you were aware of the costs/benefits of just buying a big
> 64-bit/8GB RAM machine, vs. having to not only maintain and power several
> 32-bit machines, but also maintain and support your now more complicated code.
>
> Too many times I have seen developers/companies spend so much money, not
> just on the initial development but on long-term support and maintenance,
> that could have been saved by just buying a bigger, more powerful
> machine in the first place.
>
> I am interested to see what other people have to say about how to solve your
> problem.
>
> Best regards,
> Jacob
>
> On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote:
>
>> My data is categorized by date.  About 14M+ docs per month, 37M+ terms.
>> When I search a 10-month index with a 1G heap, I get an OOM.
>> The problem is that I can't easily increase the heap size.
>> I have several machines, all 32-bit Windows with 4G RAM.
>> And my goal is to index 10 years of data, plus more data every day!
>> If I put all of it together, I will need 8G+ of RAM to run searches,
>> and maybe another 8G+ of RAM to run IndexWriter.
>>
>> I think splitting the large index into smaller indexes and using a group
>> of machines that work as one is more flexible and faster compared to one
>> machine with huge RAM.
>> Any suggestions, besides more RAM?
>>
>>
>> 2009/11/16 Jacob Rhoden <jrhoden@unimelb.edu.au>:
>>>
>>> Not sure how large your index is, but it might be easier to increase
>>> your memory (if possible) than to develop a fairly complicated
>>> alternative strategy.
>>>
>>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
>>>
>>>> Hi, all
>>>> I'm facing a large index on an x86 Windows platform, which may not have
>>>> a big enough JVM heap to hold the entire index.
>>>> So I think it's possible to split the index into several smaller
>>>> indexes and run them in different JVM instances on different machines.
>>>> Then, for each query, I can run it concurrently on every index and
>>>> merge the results together.
>>>> This can be a workaround for the OutOfMemory issue.
>>>> But before I start on this, I want to ask whether Lucene already has a
>>>> solution for things like this.
>>>> Thanks.
>>>>
>>>> --
>>>>
>>>> Best Regards,
>>>> ZHAO, Wenbo
>>>>
>>>> =======================
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>>
>>>
>>> ____________________________________
>>> Information Technology Services,
>>> The University of Melbourne
>>>
>>> Email: jrhoden@unimelb.edu.au
>>> Phone: +61 3 8344 2884
>>> Mobile: +61 4 1095 7575
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>>
>> Best Regards,
>> ZHAO, Wenbo
>>
>> =======================
>>
>>
>
>
>



-- 

Best Regards,
ZHAO, Wenbo

=======================


