incubator-blur-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ravikumar Govindarajan <ravikumar.govindara...@gmail.com>
Subject Re: General guidance on blur-shard server
Date Mon, 02 Mar 2015 06:01:51 GMT
Many thanks Aaron…

Ok that sounds about right.  This will greatly depend on your data, how
> many fields, how many terms, etc.


Fields will be around 20-25.. But majority of searches happen on only 5-6
fields. Term count ~= 2.5 million.

I would recommend using the latest Java7
> (perhaps Java8 but I haven't tested with it), and use the G1 garbage
> collector if you plan on running larger heaps


We are using latest version of 1.7. Actually we imported around 2TB of data
as dry-run with just 16GB heap without major GC issues using good old CMS.
There is no sorting/faceting during searches also. My reluctance stems from
the fact that I am not quite familiar with G1 :)

I am favouring more for write-thru cache [at-least around 50Gb] rather than
read-cache because we have a lot of free RAM available & I feel read cache
is going to use very less RAM. But I am not sure about this. Any pointers
will greatly help.

I would also recommend that you increase the
> block cache buffer and file buffer sizes from 8K to 64K


This is one issue we faced during dry-run. Marking-up from 8K to 16K solved
the issue. I thought 32K must be a good fit for us. Will surely explore
this…

Thanks again for helping out

On Sat, Feb 28, 2015 at 3:04 AM, Aaron McCurry <amccurry@gmail.com> wrote:

> On Fri, Feb 27, 2015 at 1:29 AM, Ravikumar Govindarajan <
> ravikumar.govindarajan@gmail.com> wrote:
>
> > Hi,
> >
> > I need a general guidance on number of machines/shards required for our
> > first blur set-up
> >
> > Some data as follows
> >
> > 1. Shard-Server Config : 128GB RAM, 16-core dual socket with
> > hyper-threading. 32 procs
> > 2. Total dataset size: 10TB. With rep-factor=3, total cluster-size=30TB.
> > Pre-populated via
> >     MR or Thrift...
> > 3. We receive very less queries per minute [600-900 queries]. But the
> > response times for
> >     every query must be <=150 ms
> >
> > Initially we thought we can create 500 shards each of 20GB size with
> around
> > 20 machines.
> >
>
> Ok that sounds about right.  This will greatly depend on your data, how
> many fields, how many terms, etc.
>
>
> >
> > Can each shard-server machine with above specs handle 25 shards? Is such
> a
> > configuration over-utilized/under-utilized?
> >
>
> Again, it likely depends.  I would recommend using the latest Java7
> (perhaps Java8 but I haven't tested with it), and use the G1 garbage
> collector if you plan on running larger heaps.  Whatever is leftover can be
> allocated to the block cache.  I would also recommend that you increase the
> block cache buffer and file buffer sizes from 8K to 64K.  This will also
> decrease heap pressure for the number of entries in the block cache lru
> map.
>
>
> >
> > How do folks run in production. Any numbers/pointers will be really
> helpful
> >
>
> Generally large shard servers (like the ones you are suggesting) with a few
> controllers (6-12) are typical setups.
>
> If I can provide more details I will follow up.
>
> Aaron
>
>
> >
> > --
> > Ravi
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message