From: Jibo John
To: solr-user@lucene.apache.org
Subject: Re: Solr 1.4 Replication scheme
Date: Fri, 14 Aug 2009 08:53:51 -0700

Slightly off topic... one question on the index file transfer
mechanism used in the new 1.4 Replication scheme.

Is my understanding correct that the transfer is over http?
(vs. rsync in the script-based snappuller)
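(For reference, the kind of slave-side replication config I have in mind
is roughly the following -- the host, port and pollInterval values are
just placeholders, not anything from a real setup:

   <requestHandler name="/replication" class="solr.ReplicationHandler">
     <lst name="slave">
       <str name="masterUrl">http://master_host:8983/solr/replication</str>
       <str name="pollInterval">00:00:60</str>
     </lst>
   </requestHandler>

Since the slave is pointed at an http URL on the master, I'm assuming the
index files themselves are also pulled over that http connection.)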
Thanks,
-Jibo


On Aug 14, 2009, at 6:34 AM, Yonik Seeley wrote:

> Longer term, it might be nice to enable clients to specify what
> version of the index they were searching against. This could be used
> to prevent consistency issues across different slaves, even if they
> commit at different times. It could also be used in distributed
> search to make sure the index didn't change between phases.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> 2009/8/14 Noble Paul നോബിള്‍ नोब्ळ् :
>> On Fri, Aug 14, 2009 at 2:28 PM, KaktuChakarabati wrote:
>>>
>>> Hey Noble,
>>> you are right in that this will solve the problem, however it
>>> implicitly assumes that commits to the master are infrequent enough
>>> (so that most polling operations yield no update and only every few
>>> polls lead to an actual commit).
>>> This is a relatively safe assumption in most cases, but one that
>>> couples the master update policy with the performance of the
>>> slaves - if the master gets updated (and committed to) frequently,
>>> the slaves might face a commit on every 1-2 polls, much more often
>>> than is feasible given new searcher warmup times.
>>> In effect, what this comes down to is that I must make the master
>>> commit frequency the same as the one I'd want the slaves to use -
>>> and this is markedly different from the previous behaviour, with
>>> which I could have the master get updated (+committed to) at one
>>> rate and the slaves committing those updates at a different rate.
>>
>> I see the argument. But isn't it better to keep both the master and
>> slave as consistent as possible? There is no use in committing on
>> the master if you do not plan to search on those docs. So the best
>> thing to do is to commit only as frequently as you wish to commit on
>> a slave.
>>
>> On a different track: would it be worth having an option to disable
>> the commit after replication, so that the user can trigger a commit
>> explicitly?
>>
>>>
>>> Noble Paul നോബിള്‍ नोब्ळ्-2 wrote:
>>>>
>>>> usually the pollInterval is kept to a small value like 10 secs.
>>>> there is no harm in polling more frequently. This can ensure that
>>>> the replication happens at almost the same time.
>>>>
>>>>
>>>> On Fri, Aug 14, 2009 at 1:58 PM, KaktuChakarabati wrote:
>>>>>
>>>>> Hey Shalin,
>>>>> thanks for your prompt reply.
>>>>> To clarify:
>>>>> With the old script-based replication, I would snappull every x
>>>>> minutes (say, on the order of 5 minutes).
>>>>> Assuming no index optimize occurred (I optimize 1-2 times a day,
>>>>> so we can disregard it for the sake of argument), the snappull
>>>>> would take a few seconds to run on each iteration.
>>>>> I then have a crontab on all slaves that runs snapinstall at a
>>>>> fixed time, let's say every 15 minutes from the start of a round
>>>>> hour, inclusive. (Slave machine times are synced, e.g. via ntp.)
>>>>> So essentially all slaves will begin a snapinstall at exactly the
>>>>> same time - and since they all have the same snapshot at that
>>>>> point in time (because I snappull frequently), and assuming
>>>>> uniform load, this leads to fairly synchronized replication
>>>>> across the board.
>>>>>
>>>>> With the new replication, however, it seems that by binding the
>>>>> pulling and installing together, and by specifying the timing
>>>>> only as deltas (as opposed to "absolute-time" based like in
>>>>> crontab), we've essentially made it impossible to keep multiple
>>>>> slaves up to date and synchronized; e.g. if we set the poll
>>>>> interval to 15 minutes, a slight offset in the startup times of
>>>>> the slaves (which can very much happen after arbitrary
>>>>> restarts/maintenance operations) can lead to deviations in
>>>>> snappull(+install) times.
>>>>> This in turn is made worse by the fact that the pollInterval is
>>>>> then computed from the time the last commit *finished* - and that
>>>>> number seems to have a higher variance, e.g. due to warmup, which
>>>>> can differ across machines based on the queries they've handled
>>>>> previously.
>>>>>
>>>>> To summarize, it seems to me it might be beneficial to introduce
>>>>> a second parameter that acts more like a crontab-style, absolute
>>>>> time schedule, in that it would let a user specify when an actual
>>>>> commit should occur - so we could have the pollInterval set to a
>>>>> low value (e.g. 60 seconds) but specify that a commit should only
>>>>> be performed at minutes 0, 15, 30 and 45 of every hour. That
>>>>> makes the commit times on the slaves fairly deterministic.
>>>>>
>>>>> Does this make sense, or am I missing something about the current
>>>>> in-process replication?
>>>>>
>>>>> Thanks,
>>>>> -Chak
>>>>>
>>>>>
>>>>> Shalin Shekhar Mangar wrote:
>>>>>>
>>>>>> On Fri, Aug 14, 2009 at 8:39 AM, KaktuChakarabati wrote:
>>>>>>
>>>>>>>
>>>>>>> In the old replication, I could snappull with multiple slaves
>>>>>>> asynchronously but perform the snapinstall on each at the same
>>>>>>> time (+- epsilon seconds), so that production load-balanced
>>>>>>> query serving will always be consistent.
>>>>>>>
>>>>>>> With the new system it seems that I have no control over
>>>>>>> syncing them; rather, it polls every few minutes and then
>>>>>>> decides the next cycle based on the last time it *finished*
>>>>>>> updating, so in any case I lose control over the
>>>>>>> synchronization of snap installation across multiple slaves.
>>>>>>
>>>>>> That is true. How did you synchronize them with the script-based
>>>>>> solution? Assuming network bandwidth is equally distributed and
>>>>>> all slaves are equal in hardware/configuration, the time
>>>>>> difference between new searcher registration on any slave should
>>>>>> not be more than pollInterval, no?
>>>>>>
>>>>>>>
>>>>>>> Also, I noticed the default poll interval is 60 seconds. It
>>>>>>> would seem that for such a rapid interval, what I mentioned
>>>>>>> above is a non-issue; however, I am not clear how this works
>>>>>>> vis-a-vis the new searcher warmup. For a considerable index
>>>>>>> size (20 million docs+) the warmup itself is an expensive and
>>>>>>> somewhat lengthy process, and if a new searcher opens and warms
>>>>>>> up every minute, I am not at all sure I'll be able to serve
>>>>>>> queries with reasonable QTimes.
>>>>>>
>>>>>> If the pollInterval is 60 seconds, it does not mean that a new
>>>>>> index is fetched every 60 seconds. A new index is downloaded and
>>>>>> installed on the slave only if a commit happened on the master
>>>>>> (i.e. the index actually changed on the master).
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> Shalin Shekhar Mangar.
>>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968105.html
>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>
>>>>
>>>> --
>>>> -----------------------------------------------------
>>>> Noble Paul | Principal Engineer | AOL | http://aol.com
>>>>
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Solr-1.4-Replication-scheme-tp24965590p24968460.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>
>> --
>> -----------------------------------------------------
>> Noble Paul | Principal Engineer | AOL | http://aol.com
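
P.S. Somewhat related to the synchronization discussion above: if I'm
reading the ReplicationHandler docs right, a slave can also be driven
manually over http, which might help with the "install at a fixed
wall-clock time" use case. A rough sketch of the idea (host/port are
placeholders, and I haven't tested this):

   # stop the slave from polling on its own
   http://slave_host:8983/solr/replication?command=disablepoll

   # from cron at :00/:15/:30/:45, pull and install the latest index
   http://slave_host:8983/solr/replication?command=fetchindex

   # re-enable automatic polling if desired
   http://slave_host:8983/solr/replication?command=enablepoll

With something like that, the slaves would only fetch when told to, so
the poll-offset drift Chak describes shouldn't matter.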