Return-Path: X-Original-To: apmail-accumulo-dev-archive@www.apache.org Delivered-To: apmail-accumulo-dev-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 7D1FF9150 for ; Wed, 10 Dec 2014 18:36:25 +0000 (UTC) Received: (qmail 3553 invoked by uid 500); 10 Dec 2014 18:36:25 -0000 Delivered-To: apmail-accumulo-dev-archive@accumulo.apache.org Received: (qmail 3514 invoked by uid 500); 10 Dec 2014 18:36:25 -0000 Mailing-List: contact dev-help@accumulo.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@accumulo.apache.org Delivered-To: mailing list dev@accumulo.apache.org Received: (qmail 3490 invoked by uid 99); 10 Dec 2014 18:36:24 -0000 Received: from reviews-vm.apache.org (HELO reviews.apache.org) (140.211.11.40) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Dec 2014 18:36:24 +0000 Received: from reviews.apache.org (localhost [127.0.0.1]) by reviews.apache.org (Postfix) with ESMTP id A9CEF1D22A2; Wed, 10 Dec 2014 18:36:21 +0000 (UTC) Content-Type: multipart/alternative; boundary="===============6221628116500929249==" MIME-Version: 1.0 Subject: Re: Review Request 28873: ACCUMULO-3393 Follow-on work for per-table volume chooser. From: "Christopher Tubbs" To: "Jenna Huston" Cc: "Mike Drob" , "accumulo" , "Sean Busbey" , "Christopher Tubbs" , "Josh Elser" Date: Wed, 10 Dec 2014 18:36:21 -0000 Message-ID: <20141210183621.5766.55524@reviews.apache.org> X-ReviewBoard-URL: https://reviews.apache.org Auto-Submitted: auto-generated Sender: "Christopher Tubbs" X-ReviewGroup: accumulo X-ReviewRequest-URL: https://reviews.apache.org/r/28873/ X-Sender: "Christopher Tubbs" References: <20141210075450.26419.35187@reviews.apache.org> In-Reply-To: <20141210075450.26419.35187@reviews.apache.org> Reply-To: "Christopher Tubbs" X-ReviewRequest-Repository: accumulo --===============6221628116500929249== MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: 7bit > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/pom.xml, lines 79-82 > > > > > > New dependency? Does it add significant value to warrant additional dependency conflict resolution and packaging complexity? > > Sean Busbey wrote: > It's generally better for us to rely on an external dependency that focuses on a task outside of our main concerns since they'll give more focus on things. I didn't see any of our current dependencies providing an object pool. If you know of one, please let me know and I'll switch to it. > > As for commons-pool in particular, it's a ASF project and it does exactly the task needed and no more. It'll also be removable once the TODO above its use is done. This is fine, as far as libraries go. I just didn't see the need for the object pool in the first place. We can shift discussion to that. (For what it's worth, Guava does have pools, but not in any released version, as far as I can tell.) > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PerTableVolumeChooser.java, line 39 > > > > > > The volatile for lazy-initialization is not needed. We don't suffer from having this assigned twice, which this code doesn't prevent. Also, it should be initialized to null here before it is assigned to localConf. > > > > It should also be ServerConfigurationFactory. The ServerConfiguration type was only retained internally after Havanki's refactor to avoid changing the balancer API. > > > > Ideally, we'd have an init method and be able to pass in the AccumuloServerContext, but this was problematic last I looked, because it created a dependency cycle between the VolumeManager and ServerConfigurationFactory, because this is called by the VolumeManager code and that is needed by the ServerConfigurationFactory (so it can't depend on the ServerConfigurationFactory). > > Sean Busbey wrote: > The volatile is needed to ensure that Object initialization is properly handled before the Object is visible to other threads. > > The default value is already null, what's the advantage of repeating it? > > If ServerConfiguration isn't intended to be used as a placeholder for the more general case, can you please file a follow on to mark it @Deprecated with a note? > > The TODO above the variable calls out the preference for an init and the reason it can't currently happen. Will filing that ticketa address that part of your comment? The object (ServerConfigurationFactory) is fully initialized before it is assigned. I'm not sure what you're referring to. The use of volatile in this context can help (in newer JVMs, certainly in 1.6 and later) in the double-check idiom for lazy initialization. However, this is not the idiom being employed here. Nothing in this code prevents the object from being assigned twice (by another thread), which is what the double-check idiom is for. All this volatile does is prevent the JVM from caching the current reference. The suggestion for null is just a nice to have. It makes it clear what the default value is. It's not important. ServerConfiguration predates the switch to ServerConfigurationFactory. Yes, more follow-on work needs to be done to remove it completely. That work was discussed and deferred in https://issues.apache.org/jira/browse/ACCUMULO-2615 and its sub-tasks. In the meantime, I'm simply suggesting to avoid it, as the factory more explicitly conveys its function (it is used *as* a factory, not as a configuration). I'm not too concerned about the timeline for adding an init method. If you want to create a JIRA, go ahead. I'd have to put more thought into that. The main point of that comment was to empathize/agree that an init method might be helpful. On the other hand, if we don't guarantee that we can maintain state, it may not be any better than passing stuff in with the environment object on the choose method. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PerTableVolumeChooser.java, line 53 > > > > > > This change unnecessarily adds a restriction that the initial volume chooser for a table will be fixed for the lifetime of the TabletServer. This prevents dynamic changes to a per-table configuration property without justification. This is problematic by itself, but especially so given that a user is unable to set an initial VolumeChooser for the table because of your objection to ACCUMULO-3176. > > Sean Busbey wrote: > that isn't true. The second if block is used if there already was chooser. In that case it checks to see if the property has changed, using hte same idiom the per table balancer users. You're right. I missed that. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java, line 44 > > > > > > This is incorrect. This is not an experimental property. It is a custom per-table property, specific to the PreferredVolumeChooser. > > Sean Busbey wrote: > how would you prefer I convey the information that operators can't assume any compatibility guarantees on its use? > > Josh Elser wrote: > I don't get this either. The property is defined to take a comma-separated list of Volumes (URIs). If this fact changes, we'd have to make a new property to support whatever we needed it to become. Experimental has been used to define things that are incompletely implemented -- anything else has been handled by documentation. > > Sean Busbey wrote: > I'm trying to convey a similar level of reliability as we do with @Experimental on GENERAL_VOLUME_CHOOSER and TABLE_VOLUME_CHOOSER. That is, we might change the variable name or what it holds. Would describing it as a "table property specific to the current implementation of PreferredVolumeChooser" work? > > Josh Elser wrote: > Hrm, I hadn't noticed that GENERAL_VOLUME_CHOOSER was also Experimental so that makes sense for it to also trickle down to TABLE_VOLUME_CHOOSER. I don't think I mind experimental moving down the chain to keep all of these related. The PREFERRED_VOLUMES_CUSTOM_KEY by itself may not be experimental (by the previous definition), but since it relies on other things which are still marked as such, I could see the reason behind also treating it as experimental. Sean: Yeah, something along the lines of indicating that it's tied to this implementation I think would be better. I just don't want it to convey the false idea that custom per-table properties are experimental. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java, line 99 > > > > > > Do we really need an object pool for a map which holds 0 or 1 element for the duration of a single, infrequently called method? A better optimization for this case would be to add an optimized get(String key) method to AccumuloConfiguration / TableConfiguration, so we don't have to use a map at all. > > Sean Busbey wrote: > This code path is in the critical section of tablet splitting, the frequency of which varies greatly based on system deployment. There's already a TODO for doing the refactoring you mention, which will allow us to later remove the pooling entirely. > > Josh Elser wrote: > I really don't see the creation of a new HashMap as being critical WRT the cost of waiting for a tablet to close, performing multiple metadata table updates and then waiting for the tablets to come back online. > > Sean Busbey wrote: > Am I correct that we currently don't have any microbenchmarks that would tell us one way or the other? > > Josh Elser wrote: > ditto what I said down below. As far as I know, we have not done any microbenchmarks on this. Which is why I suggested it was premature optimization. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java, lines 116-122 > > > > > > This change unnecessarily prevents dynamic changes to preferredVolumes, and forces the initial property to be the only one possible to read. I would say that this is a problem by itself, but given that you've objected to ACCUMULO-3176, which would give users the ability to set initial properties, this is even more problematic. > > Sean Busbey wrote: > This change only caches the parsed version of particular volume strings to avoid calling split on every invocation if the configured volumes haven't changed. How does that prevent the dynamic configuration? Again, you're right. I mis-read the code, as a result of it jumping in complexity. I do think avoiding String.split is premature optimization, though. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java, lines 125-141 > > > > > > How is this implementation better than the simple, readable, higher-level collections-based implementation? > > Sean Busbey wrote: > VC.choose is in the critical path for tablet splitting. The collections based approach does more allocations of much heavier weight objects. Additionally, since all of these things only live in the scope of the method those allocations will cause gc pressure during heavy splitting. > > Josh Elser wrote: > bq. will cause gc pressure during heavy splitting. > > *might* cause gc pressure :). I don't think you can make that assertion without actually quantifying how much "gc pressure" is introduced WRT the amount of time the other known slow operations with comprise splitting a tablet actually take. > > Sean Busbey wrote: > Am I correct that we currently don't have any microbenchmarks that would tell us one way or the other? > > Josh Elser wrote: > I imagine that you could easily look at how long it takes a split to happen with and without ingest load via the traces. If we don't have that instrumented already, maybe Eric or Keith could chime in as I imagine they would be the two who know the most off the top of their heads. I don't want to force you to start writing microbenchmarks -- my gut reaction was that back of the envelope calculations would hint that this optimization might not be with much total improvement. At the very least, I'd like to see this code split into a separate method, with full unit test coverage. I have very little confidence by looking at it that it does the set intersection that the two lines it replaces did. It's not clear to me at all how it's supposed to work. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java, line 150 > > > > > > If you get here, the modifiedOptions had better not be null. Shouldn't it have numValid entries in it? > > Sean Busbey wrote: > If all of the options were valid then we wouldn't have copied anything. ref the comment "avoid copying if a prefix of the options are all fine". > > Should I expand the comment to explain the optimization? Maybe. I guess my main concern is that it's hard to grok what the previous block of String array manipulation is actually doing. A comment could help. Moving it to a private method might also. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeChooser.java, line 21 > > > > > > N.B.? > > Sean Busbey wrote: > It's a phrase that means the reader should pay attention to the bit that follows. > > Mike Drob wrote: > https://en.wikipedia.org/wiki/Nota_bene Interesting. If you could replace this jargon with plain, clear English, I think it would be helpful. > On Dec. 10, 2014, 2:54 a.m., Christopher Tubbs wrote: > > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java, lines 581-586 > > > > > > +1 for this check, but this should be a separate issue, and also would benefit 1.6 as a sanity-check/bug-prevention measure. > > Sean Busbey wrote: > It's not strictly needed in 1.6 because only someone with control of accumulo-site.xml can set the volume chooser, but I'm happy to file a follow on. We also allow overriding general system properties in ZooKeeper, so I'm not sure that's exactly true. I'd have to check which properties we permit to be changed that way. In any case, to use a particular VolumeChooser, we already require it to be on the classpath, which requires the same level of privilege as modifying the accumulo-site.xml file, so I'm not sure this is distinctly different between 1.6 and master. - Christopher ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/28873/#review64511 ----------------------------------------------------------- On Dec. 10, 2014, 9:58 a.m., Sean Busbey wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/28873/ > ----------------------------------------------------------- > > (Updated Dec. 10, 2014, 9:58 a.m.) > > > Review request for accumulo and Jenna Huston. > > > Bugs: ACCUMULO-3393 > https://issues.apache.org/jira/browse/ACCUMULO-3393 > > > Repository: accumulo > > > Description > ------- > > ACCUMULO-3393 Follow-on work for per-table volume chooser. > > Still has TODOs for additional tickets I need to file. I'll update the review to remove once I have them all filed. I think everything marked TODO can wait for a later ticket. Please comment on relevant section if you think something needs to be done now. > > * docs clean up + code guideline compliance. > * ensure RandomVolumeChoosers are independent when used per-table. > * make sure that per-table choosers can keep state the way that global choosers can > * make sure that a chooser can only pick from the options it is presented. > * avoid object creation in critical path for Tablet.split. > > > Diffs > ----- > > core/src/main/java/org/apache/accumulo/core/conf/Property.java 4c2d0b41b3bce32449861da3ac42fa27deb2b182 > pom.xml 31601a1bf84e19b861e4f48b50824eeb77987b52 > server/base/pom.xml c21a168dec2092692f1ee6877c4703ee2d3e3977 > server/base/src/main/java/org/apache/accumulo/server/fs/PerTableVolumeChooser.java 7a825c796eb5a25de8c748e2aba642f483697b9a > server/base/src/main/java/org/apache/accumulo/server/fs/PreferredVolumeChooser.java 7ed7bba809a4e5e4b2d472c3327b15adb37251a7 > server/base/src/main/java/org/apache/accumulo/server/fs/RandomVolumeChooser.java f2eb2113cb848ed58ac5f41573c6ff2cde9b0a77 > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeChooser.java f523057b11a2dc42e82010675bb1ac8e3802f96d > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManager.java 890651e92f4c34514cb839b7b9ee9d23ad55070a > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeManagerImpl.java dc1be739b634d91992894f8f27c2d9c184bd98cd > server/base/src/main/java/org/apache/accumulo/server/fs/VolumeUtil.java 877d01c2233cca086c1ac1539eb81cc282a7ceb4 > server/base/src/main/java/org/apache/accumulo/server/util/TabletOperations.java c0e1a9b991d61a4dbb127c74ae297f171434e7d5 > server/base/src/test/java/org/apache/accumulo/server/fs/VolumeManagerImplTest.java 582822a8ccbc398925a2184eed0b9d7fa853f9b4 > > Diff: https://reviews.apache.org/r/28873/diff/ > > > Testing > ------- > > Ran through VolumeManagerImplTest and VolumeChooserIT > > > Thanks, > > Sean Busbey > > --===============6221628116500929249==--