lucene-solr-user mailing list archives

From Will Martin <wmartin...@outlook.com>
Subject Re: Solr performance on EC2 linux
Date Tue, 02 May 2017 02:22:16 GMT
Ubuntu 16.04 LTS - Xenial (HVM)

Is this your Xenial version?




On 5/1/2017 6:37 PM, Jeff Wartes wrote:
> I tried a few variations of various things before we found and tried that linux/EC2 tuning page, including:
>    - EC2 instance type: r4, c4, and i3
>    - Ubuntu version: Xenial and Trusty
>    - EBS vs local storage
>    - Stock openjdk vs Zulu openjdk (Recent java8 in both cases - I’m aware of the issues with early java8 versions and I’m not using G1)
>
> Most of those attempts were to help reduce differences between the data center and the EC2 cluster. In all cases I re-indexed from scratch. I got the same very high system-time symptom in all cases. With the linux changes in place, we settled on r4/Xenial/EBS/Stock.
>
> Again, this was a slightly modified Solr 5.4 (I added backup requests, and two memory allocation rate tweaks that have long since been merged into mainline, released in 6.2 I think; I can dig up the jira numbers if anyone’s interested). I’ve never used Solr 6.x in production though.
> The only reason I mentioned 6.x at all is because I’m aware that ES 5.x is based on Lucene 6.2. I don’t believe my coworker spent any time on tuning his ES setup, although I think he did try G1.
>
> I definitely do want to binary-search those settings until I understand better what exactly did the trick.
> The problem is the long cycle time per test, but hopefully in the next couple of weeks.
>
>
>
> On 5/1/17, 7:26 AM, "John Bickerstaff" <john@johnbickerstaff.com> wrote:
>
>      It's also very important to consider the type of EC2 instance you are
>      using...
>      
>      We settled on the R4.2XL...  The R series is labeled "High-Memory"
>      
>      Which instance type did you end up using?
>      
>      On Mon, May 1, 2017 at 8:22 AM, Shawn Heisey <apache@elyograg.org> wrote:
>      
>      > On 4/28/2017 10:09 AM, Jeff Wartes wrote:
>      > > tldr: Recently, I tried moving an existing solrcloud configuration from
>      > a local datacenter to EC2. Performance was roughly 1/10th what I’d
>      > expected, until I applied a bunch of linux tweaks.
>      >
>      > How very strange.  I knew virtualization would have overhead, possibly
>      > even measurable overhead, but that's insane.  Running on bare metal is
>      > always better if you can do it.  I would be curious what would happen on
>      > your original install if you applied similar tuning to that.  Would you
>      > see a speedup there?
>      >
>      > > Interestingly, a coworker playing with an Elasticsearch (ES 5.x, so a
>      > much more recent release) alternate implementation of the same index was
>      > not seeing this high-system-time behavior on EC2, and was getting
>      > throughput consistent with our general expectations.
>      >
>      > That's even weirder.  ES 5.x will likely be using Points field types for
>      > numeric fields, and although those are faster than what Solr currently
>      > uses, I doubt it could explain that difference.  The implication here is
>      > that the ES systems are running with stock EC2 settings, not the tuned
>      > settings ... but I'd like you to confirm that.  Same Java version as
>      > with Solr?  IMHO, Java itself is more likely to cause issues like you
>      > saw than Solr.
>      >
>      > > I’m writing this for a few reasons:
>      > >
>      > > 1.       The performance difference was so crazy I really feel like this
>      > should be broader knowledge.
>      >
>      > Definitely agree!  I would be very interested in learning which of the
>      > tunables you changed were major contributors to the improvement.  If it
>      > turns out that Solr's code is sub-optimal in some way, maybe we can fix it.
>      >
>      > > 2.       If anyone is aware of anything that changed in Lucene between
>      > 5.4 and 6.x that could explain why Elasticsearch wasn’t suffering from
>      > this? If it’s the clocksource that’s the issue, there’s an implication that
>      > Solr was using tons more system calls like gettimeofday that the EC2 (xen)
>      > hypervisor doesn’t allow in userspace.
>      >
>      > I had not considered the performance regression in 6.4.0 and 6.4.1 that
>      > Erick mentioned.  Were you still running Solr 5.4, or was it a 6.x version?
>      >
>      > =============
>      >
>      > Specific thoughts on the tuning:
>      >
>      > The noatime option is very good to use.  I also use nodiratime on my
>      > systems.  Turning these off can have *massive* impacts on disk
>      > performance.  If these are the source of the speedup, then the machine
>      > doesn't have enough spare memory.
>      >
>      > I'd be wary of the "nobarrier" mount option.  If the underlying storage
>      > has battery-backed write caches, or is SSD without write caching, it
>      > wouldn't be a problem.  Here's info about the "discard" mount option, I
>      > don't know whether it applies to your amazon storage:
>      >
>      >        discard/nodiscard
>      >               Controls whether ext4 should issue discard/TRIM commands to the
>      >               underlying block device when blocks are freed.  This is useful
>      >               for SSD devices and sparse/thinly-provisioned LUNs, but it is
>      >               off by default until sufficient testing has been done.
>      >
>      > The network tunables would have more of an effect in a distributed
>      > environment like EC2 than they would on a LAN.
>      >
>      > Thanks,
>      > Shawn
>      >
>      >
>      
>
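For anyone wanting to compare a data center host against an EC2 instance on the points raised in this thread (the kernel clocksource, and the noatime/nodiratime/nobarrier/discard mount options), here is a minimal Python sketch. It is not the tuning that was applied, which this message does not list. It only reports what a host is currently using. The sysfs directory and /proc/mounts are the standard Linux locations; the function names and the list of watched options are assumptions made for illustration.

```python
from pathlib import Path

# Standard Linux sysfs directory describing the kernel clocksource.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")

# Mount options mentioned in this thread (an illustrative watch list).
WATCHED_OPTIONS = ("noatime", "nodiratime", "nobarrier", "discard")


def read_clocksource(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksources, or (None, []) if unreadable."""
    try:
        current = (Path(base) / "current_clocksource").read_text().strip()
        available = (Path(base) / "available_clocksource").read_text().split()
    except OSError:
        return None, []
    return current, available


def watched_mount_options(mounts_text, watched=WATCHED_OPTIONS):
    """Map each mount point to the watched options it is mounted with.

    mounts_text is the content of /proc/mounts: one mount per line, with
    whitespace-separated fields (device, mount point, fs type, options, ...).
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        options = set(fields[3].split(","))
        result[fields[1]] = [opt for opt in watched if opt in options]
    return result


if __name__ == "__main__":
    current, available = read_clocksource()
    print("clocksource:", current, "available:", available)
    try:
        mounts = Path("/proc/mounts").read_text()
    except OSError:
        mounts = ""
    for mount_point, opts in watched_mount_options(mounts).items():
        if opts:
            print(mount_point, ",".join(opts))
```

Running it on both environments and diffing the output is one cheap way to narrow down which settings differ before starting a long binary search over the tunables.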
