lucene-solr-user mailing list archives

From Nitin Sharma <nitin.sha...@bloomreach.com>
Subject Re: Solr Hot Cpu and high load
Date Thu, 20 Feb 2014 04:41:44 GMT
Thanks, Erick. I will try that

On Sun, Feb 16, 2014 at 5:07 PM, Erick Erickson <erickerickson@gmail.com> wrote:

> Stored fields are what the Solr DocumentCache in solrconfig.xml
> is all about.
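For reference, the documentCache (and the lazy field loading mentioned below) lives in the <query> section of solrconfig.xml, along these lines; the sizes here are illustrative, not recommendations:

```xml
<!-- Caches stored fields for documents fetched during a request,
     so repeated hits on the same docs skip the disk read.
     Sizes are illustrative; tune to your result-page needs. -->
<documentCache class="solr.LRUCache"
               size="512"
               initialSize="512"
               autowarmCount="0"/>

<!-- Only load the stored fields a request actually asks for. -->
<enableLazyFieldLoading>true</enableLazyFieldLoading>
```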
>
> My general feeling is that stored fields are mostly irrelevant for
> search speed, especially if lazy-loading is enabled. The only time
> stored fields come into play is when assembling the final result
> list, i.e. the 10 or 20 documents that you return. That does imply
> disk I/O, and if you have massive fields there's also decompression
> to add to the CPU load.
>
> So, as usual, "it depends". Try one set of tests where you restrict the
> returned fields to whatever your <uniqueKey> field is, then another
> where you return _everything_?
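The A/B test Erick suggests can be sketched as two query URLs that differ only in the `fl` parameter. This is just a sketch: the host, the collection name, and the uniqueKey field `id` are all assumptions to be replaced with your own values.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; substitute your own host and collection.
SOLR_SELECT = "http://localhost:8983/solr/bigcollection/select"

def query_url(q, fl, rows=10):
    """Build a Solr select URL that returns only the fields listed in fl."""
    return SOLR_SELECT + "?" + urlencode({"q": q, "fl": fl, "rows": rows})

# Test 1: return only the uniqueKey (assumed here to be "id").
key_only = query_url("*:*", "id")
# Test 2: return every stored field.
everything = query_url("*:*", "*")
```

Timing a batch of each against the same node should show how much of the latency is stored-field retrieval and decompression rather than the search itself.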
>
> Best,
> Erick
>
>
> On Sun, Feb 16, 2014 at 12:18 PM, Nitin Sharma
> <nitin.sharma@bloomreach.com> wrote:
>
> > Thanks Tri
> >
> >
> > *a. Are your docs distributed evenly across shards: number of docs and
> > size of the shards?*
> > >> Yes, the size of all the shards is equal (an ignorable delta on the
> > order of KB), and so are the # of docs
> >
> > *b. Is your test client querying all nodes, or do all the queries go to
> > those 2 busy nodes?*
> > >> Yes, all nodes are receiving exactly the same number of queries
> >
> >
> > I have one more question. Do stored fields have a significant impact on
> > the performance of Solr queries? Is having 50% of the fields stored (out
> > of 100 fields) significantly worse than having 20% of the fields stored?
> > (significantly == on the order of hundreds of milliseconds, assuming all
> > fields are of the same size and type)
> >
> > How are stored fields retrieved in general (always from disk, or loaded
> > into memory on the first query and read from memory thereafter)?
> >
> > Thanks
> > Nitin
> >
> >
> >
> > On Fri, Feb 14, 2014 at 11:45 AM, Tri Cao <tmcao@me.com> wrote:
> >
> > > 1. Yes, that's the right way to go, well, in theory at least :)
> > > 2. Yes, queries are always fanned out to all shards and will be as
> > > slow as the slowest shard. When I looked into Solr's distributed
> > > querying implementation a few months back, the support for graceful
> > > degradation for things like network failures and slow shards was not
> > > there yet.
> > > 3. I doubt mmap settings would impact your read-only load, and it
> > > seems you can easily fit your index in RAM. You could try to warm the
> > > file cache to make sure, with "cat $solr_dir/* > /dev/null".
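The cache-warming tip above can also be sketched in Python: walk the index directory and read every file once so the OS page cache holds it. The index path is whatever your core's data/index directory actually is; the function below is a sketch, not part of Solr.

```python
import pathlib

def warm_page_cache(index_dir, chunk_size=1 << 20):
    """Read every file under index_dir once, in chunks, so the OS page
    cache is populated. Returns total bytes read, which you can compare
    against free RAM to see whether the index fits."""
    total = 0
    for path in pathlib.Path(index_dir).rglob("*"):
        if not path.is_file():
            continue
        with path.open("rb") as fh:
            while True:
                buf = fh.read(chunk_size)
                if not buf:
                    break
                total += len(buf)
    return total
```

If the returned total comfortably fits in free RAM, subsequent reads of the index should come from memory rather than disk.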
> > >
> > > It's odd that only 2 nodes are at 100% in your setup. I would check a
> > > couple of things:
> > > a. Are your docs distributed evenly across shards: number of docs and
> > > size of the shards?
> > > b. Is your test client querying all nodes, or do all the queries go to
> > > those 2 busy nodes?
> > >
> > > Regards,
> > > Tri
> > >
> > > On Feb 14, 2014, at 10:52 AM, Nitin Sharma
> > > <nitin.sharma@bloomreach.com> wrote:
> > >
> > > Hello folks
> > >
> > > We are currently using SolrCloud 4.3.1. We have an 8-node SolrCloud
> > > cluster with 32 cores, 60 GB of RAM, and SSDs. We are using ZooKeeper
> > > to manage the solrconfig used by our collections.
> > >
> > > We have many collections, and some of them are relatively very large
> > > compared to the others. The shards of these big collections are on the
> > > order of gigabytes. We decided to split the bigger collection evenly
> > > across all nodes (8 shards and 2 replicas) with maxShardsPerNode > 1.
> > >
> > > We did a test with a read-only load on one big collection, and we
> > > still see only 2 nodes running at 100% CPU while the rest blaze
> > > through the queries much faster (under 30% CPU), despite the
> > > collection being sharded across all nodes.
> > >
> > > I checked the JVM usage and found that none of the pools have high
> > > utilization (except Survivor space, which is at 100%). The GC cycles
> > > are on the order of milliseconds, mostly scavenges; a mark-and-sweep
> > > occurs once every 30 minutes.
> > >
> > > A few questions:
> > >
> > > 1. Sharding all collections (small and large) evenly across all nodes
> > > distributes the load and makes the system characteristics of all
> > > machines similar. Is this a recommended way to do this?
> > > 2. SolrCloud does a distributed query by default. So if a node is at
> > > 100% CPU, does it slow down the response time for the other nodes
> > > waiting for this query? (Or does it have a timeout if it cannot get a
> > > response from a node within x seconds?)
> > > 3. Our collections use MMapDirectory, but I specifically haven't
> > > enabled anything related to mmap (locked pages under ulimit). Does
> > > that adversely affect performance, or can it lock pages even without
> > > this?
> > >
> > > Thanks a lot in advance.
> > > Nitin
> > >
> > >
> >
> >
> > --
> > - N
> >
>



-- 
- N
