lucene-dev mailing list archives

From "David Johnson (JIRA)" <>
Subject [jira] [Commented] (SOLR-7393) HDFS poor indexing performance
Date Fri, 09 Sep 2016 19:26:21 GMT


David Johnson commented on SOLR-7393:

Does Hadoop have the native library configuration set appropriately?  

> HDFS poor indexing performance
> ------------------------------
>                 Key: SOLR-7393
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: Hadoop Integration, hdfs, SolrCloud
>    Affects Versions: 4.7.2, 4.10.3
>         Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>            Reporter: Hari Sekhon
>            Priority: Critical
> When switching SolrCloud from a local dataDir to the HDFS directory factory, indexing performance falls through the floor.
> I've also observed very high latency on both QTime and code timer on HDFS writes compared to local dataDir writes (using from
> Single test document write latency jumps from a few dozen milliseconds to 700-1700 ms, exceeding 2000 ms on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2 hours for 620M rows ended up with a projected duration of 20+ hours and never completed, usually breaking around the 16-17 hour mark when left running overnight.
> It's worth noting that I had to disable the HDFS write cache, which was causing index corruption (SOLR-7255), on the advice of Mark Miller, who tells me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting the HDFS replication factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that alone doesn't account for the massive performance drop going from vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
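
As a rough back-of-the-envelope check on the bulk-load numbers quoted above, the sketch below converts the reported durations into average indexing rates. It assumes a uniform rate over the whole job and treats the "20+ hours" projection as exactly 20 hours, so the HDFS rate is an upper bound and the slowdown a lower bound.

```python
# Back-of-the-envelope throughput check for the Hive -> SolrCloud bulk load
# described in SOLR-7393. Assumes a uniform indexing rate over the whole job.

ROWS = 620_000_000          # rows in the bulk job (from the report)
LOCAL_HOURS = 2             # duration with a local dataDir
HDFS_PROJECTED_HOURS = 20   # projected duration on HDFS ("20+ hours", so a lower bound)


def rows_per_second(rows: int, hours: float) -> float:
    """Average indexing rate in rows per second."""
    return rows / (hours * 3600)


local_rate = rows_per_second(ROWS, LOCAL_HOURS)
hdfs_rate = rows_per_second(ROWS, HDFS_PROJECTED_HOURS)  # upper bound on the HDFS rate

print(f"local dataDir: {local_rate:,.0f} rows/s")
print(f"HDFS (upper bound): {hdfs_rate:,.0f} rows/s")
print(f"slowdown: at least {local_rate / hdfs_rate:.0f}x")
```

That works out to roughly 86,000 rows/s versus at most about 8,600 rows/s, i.e. at least a tenfold drop, which is far more than the doubled write volume from SOLR-6528 alone would suggest.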

This message was sent by Atlassian JIRA
