lucene-solr-user mailing list archives

From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Solr performance on EC2 linux
Date Tue, 02 May 2017 04:22:08 GMT
Yes, that’s the Xenial I tried. Ubuntu 16.04.2 LTS.

On 5/1/17, 7:22 PM, "Will Martin" <wmartinusa@outlook.com> wrote:

    Ubuntu 16.04 LTS - Xenial (HVM)
    
    Is this your Xenial version?
    
    
    
    
    On 5/1/2017 6:37 PM, Jeff Wartes wrote:
    > I tried several variations before we found and tried that linux/EC2 tuning page, including:
    >    - EC2 instance type: r4, c4, and i3
    >    - Ubuntu version: Xenial and Trusty
    >    - EBS vs local storage
    >    - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of the issues with early java8 versions and I’m not using G1)
    >
    > Most of those attempts were to help reduce differences between the data center and the EC2 cluster. In all cases I re-indexed from scratch. I got the same very high system-time symptom in all cases. With the linux changes in place, we settled on r4/Xenial/EBS/Stock.
    >
    > Again, this was a slightly modified Solr 5.4. (I added backup requests and two memory allocation rate tweaks that have long since been merged into mainline, released in 6.2 I think. I can dig up the jira numbers if anyone’s interested.) I’ve never used Solr 6.x in production, though.
    > The only reason I mentioned 6.x at all is because I’m aware that ES 5.x is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his ES setup, although I think he did try G1.
    >
    > I definitely do want to binary-search those settings until I understand better what exactly did the trick.
    > The problem is the long cycle time per test, but hopefully I can do it in the next couple of weeks.
    >
    >
    >
    > On 5/1/17, 7:26 AM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:
    >
    >      It's also very important to consider the type of EC2 instance you are
    >      using...
    >      
    >      We settled on the R4.2XL...  The R series is labeled "High-Memory"
    >      
    >      Which instance type did you end up using?
    >      
    >      On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <apache@elyograg.org> wrote:
    >      
    >      > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
    >      > > tldr: Recently, I tried moving an existing solrcloud configuration from
    >      > > a local datacenter to EC2. Performance was roughly 1/10th what I’d
    >      > > expected, until I applied a bunch of linux tweaks.
    >      >
    >      > How very strange.  I knew virtualization would have overhead, possibly
    >      > even measurable overhead, but that's insane.  Running on bare metal is
    >      > always better if you can do it.  I would be curious what would happen on
    >      > your original install if you applied similar tuning to that.  Would you
    >      > see a speedup there?
    >      >
    >      > > Interestingly, a coworker playing with an ElasticSearch (ES 5.x, so a
    >      > > much more recent release) alternate implementation of the same index was
    >      > > not seeing this high-system-time behavior on EC2, and was getting
    >      > > throughput consistent with our general expectations.
    >      >
    >      > That's even weirder.  ES 5.x will likely be using Points field types for
    >      > numeric fields, and although those are faster than what Solr currently
    >      > uses, I doubt it could explain that difference.  The implication here is
    >      > that the ES systems are running with stock EC2 settings, not the tuned
    >      > settings ... but I'd like you to confirm that.  Same Java version as
    >      > with Solr?  IMHO, Java itself is more likely to cause issues like you
    >      > saw than Solr.
    >      >
    >      > > I’m writing this for a few reasons:
    >      > >
    >      > > 1.       The performance difference was so crazy I really feel like this
    >      > > should be broader knowledge.
    >      >
    >      > Definitely agree!  I would be very interested in learning which of the
    >      > tunables you changed were major contributors to the improvement.  If it
    >      > turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
    >      >
    >      > > 2.       Is anyone aware of anything that changed in Lucene between
    >      > > 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from
    >      > > this? If it’s the clocksource that’s the issue, the implication is that
    >      > > Solr was using tons more system calls like gettimeofday that the EC2 (xen)
    >      > > hypervisor doesn’t allow in userspace.
    >      >
    >      > I had not considered the performance regression in 6.4.0 and 6.4.1 that
    >      > Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x version?
    >      >
    >      > =============
    >      >
    >      > Specific thoughts on the tuning:
    >      >
    >      > The noatime option is very good to use.  I also use nodiratime on my
    >      > systems.  Turning these off can have *massive* impacts on disk
    >      > performance.  If these are the source of the speedup, then the machine
    >      > doesn't have enough spare memory.
    >      >
    >      > I'd be wary of the "nobarrier" mount option.  If the underlying storage
    >      > has battery-backed write caches, or is SSD without write caching, it
    >      > wouldn't be a problem.  Here's info about the "discard" mount option; I
    >      > don't know whether it applies to your amazon storage:
    >      >
    >      >        discard/nodiscard
    >      >               Controls whether ext4 should issue discard/TRIM commands to the
    >      >               underlying block device when blocks are freed.  This is useful
    >      >               for SSD devices and sparse/thinly-provisioned LUNs, but it is
    >      >               off by default until sufficient testing has been done.
    >      >
    >      > The network tunables would have more of an effect in a distributed
    >      > environment like EC2 than they would on a LAN.
    >      >
    >      > Thanks,
    >      > Shawn
    >      >
    >      >
    >      
    >
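
For anyone wanting to test the clocksource hypothesis quoted above, a minimal Python sketch along these lines reads the kernel's active and available clocksources from the standard Linux sysfs location. The function names are made up for illustration. Treating "xen" as the value to flag is an assumption taken from the hypothesis in this thread, not a confirmed diagnosis.

```python
from pathlib import Path

# Standard Linux sysfs directory describing the system clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksource(base_dir=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    base_dir = Path(base_dir)
    try:
        current = (base_dir / "current_clocksource").read_text().strip()
        available = (base_dir / "available_clocksource").read_text().split()
    except OSError:
        # Not Linux, or sysfs is not mounted where we expect it.
        return None, []
    return current, available


def is_xen_clocksource(current):
    """Flag the clocksource this thread suspects of causing high system time.

    Assumption (from the thread, not verified here): with the "xen"
    clocksource, time lookups such as gettimeofday become real system calls.
    """
    return current == "xen"


if __name__ == "__main__":
    current, available = read_clocksource()
    print("current:", current)
    print("available:", " ".join(available))
    if is_xen_clocksource(current):
        print("xen clocksource is active; worth including in a binary search of the tunables")
```

Running this on both the data center hosts and the EC2 hosts would show whether the two environments differ on this setting at all, which is a cheap first step before a full re-test.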
    
    

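The mount options discussed in the quoted message (noatime, nodiratime, nobarrier, discard) can be audited per filesystem by parsing /proc/mounts. The sketch below is one way to do that in Python. The helper names and the /var/solr mount point in the usage line are hypothetical, chosen only for illustration.

```python
# Mount options mentioned in the thread that we want to report on.
OPTIONS_OF_INTEREST = ("noatime", "nodiratime", "nobarrier", "discard")


def parse_mounts(text):
    """Map each mount point to its set of options, given /proc/mounts-style text."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, _fs_type, options = fields[:4]
        mounts[mount_point] = set(options.split(","))
    return mounts


def option_report(mounts, mount_point, wanted=OPTIONS_OF_INTEREST):
    """Return {option: bool} saying which of the wanted options are set."""
    opts = mounts.get(mount_point, set())
    return {name: name in opts for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as handle:
            mounts = parse_mounts(handle.read())
    except OSError:
        mounts = {}  # not a Linux host
    # "/var/solr" is a hypothetical index location; substitute your own.
    for mount_point in ("/", "/var/solr"):
        print(mount_point, option_report(mounts, mount_point))
```

Comparing this report between the tuned and untuned hosts gives a record of which mount options were actually in effect for each test run.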