lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Wenbo Zhao <zha...@gmail.com>
Subject Re: Can Lucene unite multiple instances run as one ?
Date Mon, 16 Nov 2009 14:35:25 GMT
I just checked 2.9.1 doc from
http://lucene.apache.org/java/2_9_1/api/core/index.html
I can't find the RemoteSearchable you mentioned.
I don't know SOLR yet, look at it tomorrow.
Thanks

2009/11/16 Erick Erickson <erickerickson@gmail.com>:
> I should have read more carefully.
>
> Look at the Searchable definition. One of the concrete realizations
> of that interface is a RemoteSearchable, which is what you're
> asking for I think.
>
> Have you thought about SOLR? It's built on top of Lucene and has
> lots of stuff built in for handling distributed indexes....
>
> Best
> Erick
>
>
> On Mon, Nov 16, 2009 at 9:06 AM, Wenbo Zhao <zhaowb@gmail.com> wrote:
>
>> About the ParallelMultiSearcher, I don't really know that yet, just a
>> quick look at jdoc.  It seems to be a searcher searches other
>> searchables.   If all searchables are in same jvm, it won't help.   If
>> there is some searchable implementation can work as proxy for a
>> 'remote' lucene instance, then it might be what I'm looking for.  Is
>> there such a class ?
>>
>> 2009/11/16 Erick Erickson <erickerickson@gmail.com>:
>> > I confess that I've just skimmed your e-mail, but there's absolutely
>> > no requirement that the entire index fit in RAM. The fact that your
>> > index is larger than available RAM isn't the reason you're hitting OOM.
>> >
>> > Typical reasons for this are:
>> > 1> you're sorting on a field with many, many, many unique values. If
>> > you're sorting on a fine-grained timestamp, this is quite possible.
>> > 2> You've bumped MAX_BOOLEAN_CLAUSES and are searching
>> > on, say, one-letter wildcards.
>> > 3> many other reasons.
>> >
>> > I agree with Jacob, jumping into a multi-machine solution without
>> > understanding the problem in detail may not be your best course.
>> >
>> > So, can you tell us more about the conditions under which you hit
>> > OOM? Maybe with more details we can come up with better solutions.
>> >
>> > If you absolutely *must* implement a multi-machine solution, have
>> > you seen ParallelMultiSearcher?
>> >
>> > Best
>> > Erick
>> >
>> > On Mon, Nov 16, 2009 at 2:13 AM, Wenbo Zhao <zhaowb@gmail.com> wrote:
>> >
>> >> Yes, exactly 'distributed'...
>> >> From maintenance point of view, the 'horizontal' expandable is very
>> >> important.
>> >> For my case, the data file is a kind of 'history' file, categorized
>> >> by date.  Once the data file is indexed, it will not change, unless
>> >> the searching fields changed.
>> >> Say I make whole ten years data indexed, generated 400G index,
>> >> requiring 8G ram.  When I do backup, I have to backup the entire 400G
>> >> every time.  I need another 8G machine for backup.  And 8G is not
>> >> enough, the index is increasing everyday.
>> >> Compare to distributed solution, I can split the index by year or by
>> >> seasons.  Say I have 10x40G index.  I can easily run 10 jvm process
>> >> each with 1G heap space, in 3-5 low cost not dedicated x86 machines.
>> >> Consider the backup, 9 of 10 indexes are old, only need backup once,
>> >> they won't change.  only 1 hot index is changing everyday, so I just
>> >> backup up to 40G.  The spare machine is also very cheap.  And the
>> >> machines are so cheap, I can use VMs to run this, it's more flexible
>> >> in resource management.  As time goes by, I just install new jvm
>> >> instance when needed.  I don't worry about ram and search speed
>> >> anymore.
>> >> I do think there should be more bigger cases out there just like mine.
>> >>  The general distributed Lucene will be very useful.  It will bring
>> >> Lucene to more enterprise applications, or more bigger, industry
>> >> applications.
>> >>
>> >>
>> >> 2009/11/16 Jacob Rhoden <jrhoden@unimelb.edu.au>:
>> >> > Sounds like you may need to have some sort of distributed system, I
>> just
>> >> > wanted to make sure you were aware of the cost/benifits of just buying
>> a
>> >> big
>> >> > 62bit/8Gb ram machine, vs having to not only maintain and power
>> several
>> >> 32
>> >> > bit machines, but also maintain and support your now more complicated
>> >> code.
>> >> >
>> >> > I have seen it too many times developers/companies spend so much money
>> in
>> >> > not just the initial development, but long term support and
>> maintenance
>> >> that
>> >> > could have been simplified by just buying a bigger/better more
>> powerful
>> >> > machine in the first place.
>> >> >
>> >> > I am interested to see what other people have to say about how to
>> solve
>> >> your
>> >> > problem.
>> >> >
>> >> > Best regards,
>> >> > Jacob
>> >> >
>> >> > On 16/11/2009, at 3:39 PM, Wenbo Zhao wrote:
>> >> >
>> >> >> My data is categorized by date.  About 14M+ docs per month, 37M+
>> terms.
>> >> >> When I use 1G heap size to do search of 10 month index, I got OOM.
>> >> >> The problem is I can't increase heap size in an easy way.
>> >> >> I have several machines, all 32bit windows, 4G ram.
>> >> >> And my goal is to index 10 year's data, plus more data every day
!
>> >> >> If I put all of them together, I will need 8G+ ram to run search.
>> >> >> Maybe another 8G+ ram to run indexwriter.
>> >> >>
>> >> >> I think to split large index into smaller indexes and use a group
of
>> >> >> machines to work as one is more flexible and faster compare to
one
>> >> >> huge ram machine.
>> >> >> Any suggestions ?  beside more rams.
>> >> >>
>> >> >>
>> >> >> 2009/11/16 Jacob Rhoden <jrhoden@unimelb.edu.au>:
>> >> >>>
>> >> >>> Not sure how large your index is,  but it might be easier
(if
>> possible
>> >> to
>> >> >>> increase your memory) than to develop a fairly complicated
>> alternative
>> >> >>> strategy.
>> >> >>>
>> >> >>> On 16/11/2009, at 2:12 PM, Wenbo Zhao wrote:
>> >> >>>
>> >> >>>> Hi, all
>> >> >>>> I'm facing a large index, on a x86 win platform which may
not have
>> big
>> >> >>>> enough jvm heap space to hold the entire index.
>> >> >>>> So, I think it's possible to split the index into several
smaller
>> >> >>>> indexes, run them in different jvm instances on different
machine.
>> >> >>>> Then for each query, I can concurrently run it one every
indexes
>> and
>> >> >>>> merge the result together.
>> >> >>>> This can be a workaround of OutOfMemory issue.
>> >> >>>> But before I start to do this, I want to ask if Lucene
already have
>> a
>> >> >>>> solution for things like this.
>> >> >>>> Thanks.
>> >> >>>>
>> >> >>>> --
>> >> >>>>
>> >> >>>> Best Regards,
>> >> >>>> ZHAO, Wenbo
>> >> >>>>
>> >> >>>> =======================
>> >> >>>>
>> >> >>>>
>> ---------------------------------------------------------------------
>> >> >>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>>>
>> >> >>>
>> >> >>> ____________________________________
>> >> >>> Information Technology Services,
>> >> >>> The University of Melbourne
>> >> >>>
>> >> >>> Email: jrhoden@unimelb.edu.au
>> >> >>> Phone: +61 3 8344 2884
>> >> >>> Mobile: +61 4 1095 7575
>> >> >>>
>> >> >>>
>> >> >>>
>> ---------------------------------------------------------------------
>> >> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >>
>> >> >> Best Regards,
>> >> >> ZHAO, Wenbo
>> >> >>
>> >> >> =======================
>> >> >>
>> >> >> ---------------------------------------------------------------------
>> >> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >>
>> >> >
>> >> > ____________________________________
>> >> > Information Technology Services,
>> >> > The University of Melbourne
>> >> >
>> >> > Email: jrhoden@unimelb.edu.au
>> >> > Phone: +61 3 8344 2884
>> >> > Mobile: +61 4 1095 7575
>> >> >
>> >> >
>> >> > ---------------------------------------------------------------------
>> >> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >>
>> >> Best Regards,
>> >> ZHAO, Wenbo
>> >>
>> >> =======================
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> >>
>> >>
>> >
>>
>>
>>
>> --
>>
>> Best Regards,
>> ZHAO, Wenbo
>>
>> =======================
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>



-- 

Best Regards,
ZHAO, Wenbo

=======================

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message