lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Solr+HDFS
Date Fri, 05 Feb 2016 17:46:55 GMT
bq: I assume this would go along with also increasing autoCommit?

Not necessarily; the two have much different consequences if
openSearcher is set to false for autoCommit. Essentially, all a hard
commit is doing in that case is flushing the current segments to disk
and opening new ones; no autowarming etc. is being done.
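A sketch of what that looks like in solrconfig.xml (the interval values here are illustrative; Joe's actual settings appear further down the thread):

```xml
<!-- Hard commit: flush in-memory segments to stable storage and
     truncate the transaction log, but do NOT open a new searcher,
     so no autowarming happens. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: make newly indexed documents visible to searchers. -->
<autoSoftCommit>
  <maxTime>15000</maxTime>
</autoSoftCommit>
```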

Here's the long form:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Whether the Solr index is on HDFS or not, the process is the same.

Best,
Erick

On Fri, Feb 5, 2016 at 9:23 AM, Joseph Obernberger
<joseph.obernberger@gmail.com> wrote:
> Thank you Shawn.  Sounds like increasing the autoSoftCommit maxTime would
> be a good idea.  I assume this would go along with also increasing
> autoCommit?
> All of our collections (just 2 at the moment) have the same settings.  The
> data directory is in HDFS and is the same data directory for every shard.
> The two cores have different directories.
> ----------------
> [root@hades logs]# hadoop fs -ls /solr5.2
> Found 2 items
> drwxr-xr-x   - solr hadoop          0 2015-10-05 12:54 /solr5.2/IMAGEDATA
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:54 /solr5.2/DOCUMENTS
>
> [root@hades logs]# hadoop fs -ls /solr5.2/DOCUMENTS
> Found 27 items
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node1
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node10
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node11
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node12
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node13
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node14
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node15
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node16
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node17
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node18
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node19
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node2
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node20
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node21
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node22
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node23
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node24
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:36 /solr5.2/DOCUMENTS/core_node25
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:44 /solr5.2/DOCUMENTS/core_node26
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:54 /solr5.2/DOCUMENTS/core_node27
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:08 /solr5.2/DOCUMENTS/core_node3
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:21 /solr5.2/DOCUMENTS/core_node4
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:34 /solr5.2/DOCUMENTS/core_node5
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:34 /solr5.2/DOCUMENTS/core_node6
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node7
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node8
> drwxr-xr-x   - solr hadoop          0 2015-06-09 15:35 /solr5.2/DOCUMENTS/core_node9
> -----------------
>
> Right now we are not running any replicas.
>
> -Joe
>
> On Fri, Feb 5, 2016 at 10:43 AM, Shawn Heisey <apache@elyograg.org> wrote:
>
>> On 2/5/2016 8:11 AM, Joseph Obernberger wrote:
>> > Thank you for the reply Scott - we have the commit settings as:
>> > <autoCommit>
>> >       <maxTime>60000</maxTime>
>> >       <openSearcher>false</openSearcher>
>> > </autoCommit>
>> > <autoSoftCommit>
>> >         <maxTime>15000</maxTime>
>> > </autoSoftCommit>
>> >
>> > Is that 50% disk space rule across the entire HDFS cluster or on an
>> > individual spindle?
>>
>> That autoSoftCommit maxTime is pretty small.  Frequent commits can be a
>> source of problems, if the actual commits take anywhere near (or longer
>> than) the maxTime value to complete.  If your commits are taking
>> significantly less than 15 seconds to complete, then it probably isn't
>> anything to worry about.
>>
>> The rule with disk space and Solr/Lucene is that you must have enough
>> free disk space for your largest index to triple in size temporarily,
>> and it's actually recommended to have three times the disk space of
>> *all* your indexes, not just the largest.  Most of the time the largest
>> merge you'll see will double the disk space, but in some unusual edge
>> cases, it can triple.
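
The rule above can be sketched as back-of-the-envelope arithmetic (the `required_free_gb` helper and the example sizes are illustrative, not anything from Solr):

```python
def required_free_gb(index_sizes_gb, factor=3):
    # Rule of thumb from the thread: a merge usually doubles an index's
    # on-disk size temporarily, but in rare edge cases it can triple it,
    # so the conservative plan is 3x the total size of all indexes on
    # the volume (factor=2 covers the common doubling case).
    return factor * sum(index_sizes_gb)

# e.g. two cores of 200 GB and 50 GB sharing one volume:
print(required_free_gb([200, 50]))  # 750 GB in the worst case
```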
>>
>> I have no idea how disk space works with HDFS when individual data nodes
>> become full.  Someone else will have to tackle that question, and it
>> might need to be answered by the Hadoop project rather than here.
>>
>> With autoCommit at 60 seconds, your transaction logs should remain small
>> and there shouldn't be very many of them, so I really have no idea what
>> might be happening with those.  Do you have this same
>> autoCommit/autoSoftCommit config on every Solr collection?
>>
>> Erick's note about AlreadyBeingCreatedException may be relevant.  Are
>> you possibly sharing a data directory between two or more Solr cores?
>> This can't normally be done, and even if you configure the locking
>> mechanism to allow it, it's NOT recommended, especially with SolrCloud.
>> In SolrCloud, all replicas will write to the index.  If two replicas try
>> to write to the same index, then that index will become corrupted and
>> unusable.
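
For reference, the index-lock side of this is normally configured in solrconfig.xml roughly as below when the index lives in HDFS (a sketch of the stock HDFS setup; the namenode URI is a placeholder, and note the lock only makes a second core fail fast when it tries to open the same index, it does not make sharing a directory safe):

```xml
<directoryFactory name="DirectoryFactory"
                  class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr5.2</str>
</directoryFactory>

<indexConfig>
  <!-- The hdfs lock type writes the lock file into HDFS itself, so a
       second core opening the same index directory gets a lock error
       instead of silently corrupting the index. -->
  <lockType>hdfs</lockType>
</indexConfig>
```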
>>
>> Thanks,
>> Shawn
>>
>>
