lucene-solr-user mailing list archives

From Robert Haschart <rh...@virginia.edu>
Subject Re: Solr in NAS or Network Shared Drive
Date Fri, 26 May 2017 18:48:37 GMT
When the indexing Solr instance finishes, it fast-copies the newly built 
core to a new directory on the network storage, and then issues the 
CREATE, SWAP, and UNLOAD requests to the read-only servers.
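
For anyone wanting a concrete picture, the signal step amounts to three 
CoreAdmin calls against each read-only server.  A rough Python sketch 
(host names, directory paths, and the instanceDir/dataDir layout below 
are placeholders, not our actual setup):

# Illustrative sketch only; hosts, paths, and core names are made up.
import requests

READ_ONLY_SERVERS = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]
NEW_INDEX_DIR = "/mnt/nas/solr-indexes/core-20170526"   # freshly copied index

def signal_servers(live_core="core", staging_core="corebak"):
    for base in READ_ONLY_SERVERS:
        admin = base + "/solr/admin/cores"
        # CREATE a core named corebak whose dataDir is the new index directory
        requests.get(admin, params={"action": "CREATE", "name": staging_core,
                                    "instanceDir": live_core,
                                    "dataDir": NEW_INDEX_DIR}).raise_for_status()
        # SWAP it with the live core, so queries start hitting the new index
        requests.get(admin, params={"action": "SWAP", "core": live_core,
                                    "other": staging_core}).raise_for_status()
        # UNLOAD corebak, which after the swap points at the previous index
        requests.get(admin, params={"action": "UNLOAD",
                                    "core": staging_core}).raise_for_status()

The exact parameters, in particular how instanceDir and dataDir are laid 
out, depend on the local setup.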
Just before starting this message, I needed to update some records and 
re-deploy to production; the process took less time than it took me to 
write this message.

-Bob Haschart
University of Virginia Library

On 5/26/2017 2:11 PM, David Hastings wrote:
> So are "core" and "corebak" pointing to the same dataDir, or do you have the
> indexing Solr instance keep writing to a new directory?
>
> On Fri, May 26, 2017 at 1:53 PM, Robert Haschart <rh9ec@virginia.edu> wrote:
>
>> The process we use to signal the read-only servers is to submit a CREATE
>> request pointing to the newly created index, with a name like corebak, then
>> do a SWAP request between core and corebak, and then submit an UNLOAD
>> request for corebak, which is now pointing at the previous version.
>>
>> The individual servers cannot do a merge on their own, since they mount
>> the NAS read-only.   Nothing they can do will affect the index.  I believe
>> this allows each machine to cache much of the index in memory, with no fear
>> that its cache will be invalidated by one of the others.
>>
>> -Bob Haschart
>> University of Virginia Library
>>
>>
>>
>> On 5/26/2017 12:52 PM, David Hastings wrote:
>>
>>> I'm curious about this.  When you say "and signal the three Solr servers
>>> when the updated index is available," how does it send the signal, i.e.
>>> what command, just a reload?  Also, what prevents them from doing a merge
>>> on their own?  Thanks
>>>
>>> On Fri, May 26, 2017 at 12:09 PM, Robert Haschart <rh9ec@virginia.edu>
>>> wrote:
>>>
>>>> We have run using this exact scenario for several years.  We have three
>>>> Solr servers sitting behind a load balancer, with all three accessing
>>>> the same Solr index stored on read-only network addressable storage.  A
>>>> fourth machine is used to update the index (typically daily) and signal
>>>> the three Solr servers when the updated index is available.  Our index
>>>> is primarily bibliographic information and it contains about 8 million
>>>> documents and is about 30GB in size.  We've used this configuration
>>>> since before ZooKeeper and Cloud-based Solr, or even Java-based
>>>> master/slave replication, were available.  I cannot say whether this
>>>> configuration has any benefits over the current accepted way of
>>>> load-balancing, but it has worked well for us for several years and
>>>> we've never had a corrupted index problem.
>>>>
>>>>
>>>> -Bob Haschart
>>>> University of Virginia Library
>>>>
>>>>
>>>>
>>>> On 5/23/2017 10:05 PM, Shawn Heisey wrote:
>>>>
>>>> On 5/19/2017 8:33 AM, Ravi Kumar Taminidi wrote:
>>>>>> Hello.  Scenario: Currently we have 2 Solr servers running on 2
>>>>>> different servers (Linux).  Is there any way we can make the core be
>>>>>> located on a NAS or network shared drive so that both Solrs use the
>>>>>> same index?
>>>>>>
>>>>>> Let me know if there are any performance issues; our index is approx
>>>>>> 1GB in size.
>>>>>>
>>>>> I think it's a very bad idea to try to share indexes between multiple
>>>>> Solr instances.  You can override the locking and get it to work, and
>>>>> you may be able to find advice on the Internet about how to do it.  I
>>>>> can tell you that it's outside the design intent for both Lucene and
>>>>> Solr.  Lucene works aggressively to *prevent* multiple processes from
>>>>> sharing an index.
>>>>>
>>>>> In general, network storage is not a good idea for Solr.  There's added
>>>>> latency for accessing any data, and frequently the filesystem won't
>>>>> support the kind of locking that Lucene wants to use, but the biggest
>>>>> potential problem is disk caching.  Solr/Lucene is absolutely reliant on
>>>>> disk caching in the Solr server's local memory for good performance.  If
>>>>> the network filesystem cannot be cached by the client that has mounted
>>>>> the storage, which I believe is the case for most network filesystem
>>>>> types, then you're reliant on disk caching in the network server(s).
>>>>> For VERY large indexes, which is really the only viable use case I can
>>>>> imagine for network storage, it is highly unlikely that the network
>>>>> server(s) will have enough memory to effectively cache the data.
>>>>>
>>>>> Solr has explicit support for HDFS storage, but as I understand it, HDFS
>>>>> includes the ability for a client to allocate memory that gets used
>>>>> exclusively for caching on the client side, which allows HDFS to
>>>>> function like a local filesystem in ways that I don't think NFS can.
>>>>> Getting back to my advice about not sharing indexes -- even with
>>>>> SolrCloud on HDFS, multiple replicas generally do NOT share an index.
>>>>>
>>>>> A 1GB index is very small, so there's no good reason I can think of to
>>>>> involve network storage.  I would strongly recommend local storage, and
>>>>> you should abandon any attempt to share the same index data between more
>>>>> than one Solr instance.
>>>>>
>>>>> Thanks,
>>>>> Shawn
>>>>>
>>>>>
>>>>>

