Date: Thu, 21 Apr 2016 17:34:25 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: notifications@accumulo.apache.org
Subject: [jira] [Commented] (ACCUMULO-4195) Generalized configuration object for Accumulo rfile interaction

[ https://issues.apache.org/jira/browse/ACCUMULO-4195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252285#comment-15252285 ]

ASF GitHub Bot commented on ACCUMULO-4195:
------------------------------------------

Github user ShawnWalker commented on a diff in the pull request:

https://github.com/apache/accumulo/pull/95#discussion_r60622912

--- Diff: core/src/main/java/org/apache/accumulo/core/file/FileOperations.java ---
@@ -48,38 
+48,295 @@ public static FileOperations getInstance() { return new DispatchingFileFactory(); } + // + // Abstract methods (to be implemented by subclasses) + // + + protected abstract long getFileSize(GetFileSizeOperation options) throws IOException; + + protected abstract FileSKVWriter openWriter(OpenWriterOperation options) throws IOException; + + protected abstract FileSKVIterator openIndex(OpenIndexOperation options) throws IOException; + + protected abstract FileSKVIterator openScanReader(OpenScanReaderOperation options) throws IOException; + + protected abstract FileSKVIterator openReader(OpenReaderOperation options) throws IOException; + + // + // File operations + // + /** - * Open a reader that will not be seeked giving an initial seek location. This is useful for file operations that only need to scan data within a range and do - * not need to seek. Therefore file metadata such as indexes does not need to be kept in memory while the file is scanned. Also seek optimizations like bloom - * filters do not need to be loaded. + * Construct an operation object allowing one to query the size of a file.
+ * Syntax: * + *
    +   * long size = fileOperations.getFileSize().ofFile(filename, fileSystem, fsConfiguration).withTableConfiguration(tableConf).execute();
    +   * 
*/ + public GetFileSizeOperation getFileSize() { + return new GetFileSizeOperation(); + } - public abstract FileSKVIterator openReader(String file, Range range, Set columnFamilies, boolean inclusive, FileSystem fs, Configuration conf, - RateLimiter readLimiter, AccumuloConfiguration tableConf) throws IOException; + /** + * Construct an operation object allowing one to create a writer for a file.
+ * Syntax: + * + *
    +   * FileSKVWriter writer = fileOperations.openWriter()
    +   *     .ofFile(...)
    +   *     .withTableConfiguration(...)
    +   *     .withRateLimiter(...) // optional
    +   *     .withCompression(...) // optional
    +   *     .execute();
    +   * 
+ */ + public OpenWriterOperation openWriter() { + return new OpenWriterOperation(); + } + + /** + * Construct an operation object allowing one to create an index iterator for a file.
+ * Syntax: + * + *
    +   * FileSKVIterator iterator = fileOperations.openIndex()
    +   *     .ofFile(...)
    +   *     .withTableConfiguration(...)
    +   *     .withRateLimiter(...) // optional
    +   *     .withBlockCache(...) // optional
    +   *     .execute();
    +   * 
+ */ + public OpenIndexOperation openIndex() { + return new OpenIndexOperation(); + } - public abstract FileSKVIterator openReader(String file, Range range, Set columnFamilies, boolean inclusive, FileSystem fs, Configuration conf, - RateLimiter readLimiter, AccumuloConfiguration tableConf, BlockCache dataCache, BlockCache indexCache) throws IOException; + /** + * Construct an operation object allowing one to create a "scan" reader for a file. Scan readers do not have any optimizations for seeking beyond their + * initial position. This is useful for file operations that only need to scan data within a range and do not need to seek. Therefore file metadata such as + * indexes does not need to be kept in memory while the file is scanned. Also seek optimizations like bloom filters do not need to be loaded.
+ * Syntax: + * + *
    +   * FileSKVIterator scanner = fileOperations.openScanReader()
    +   *     .ofFile(...)
    +   *     .overRange(...)
    +   *     .withTableConfiguration(...)
    +   *     .withRateLimiter(...) // optional
    +   *     .withBlockCache(...) // optional
    +   *     .execute();
    +   * 
+ */ + public OpenScanReaderOperation openScanReader() { + return new OpenScanReaderOperation(); + } /** - * Open a reader that fully support seeking and also enable any optimizations related to seeking, like bloom filters. + * Construct an operation object allowing one to create a reader for a file. A reader constructed in this manner fully supports seeking, and also enables any + * optimizations related to seeking (e.g. Bloom filters).
+ * Syntax: * + *
    +   * FileSKVIterator scanner = fileOperations.openReader()
    +   *     .ofFile(...)
    +   *     .withTableConfiguration(...)
    +   *     .withRateLimiter(...) // optional
    +   *     .withBlockCache(...) // optional
    +   *     .seekToBeginning(...) // optional
    +   *     .execute();
    +   * 
+ */ + public OpenReaderOperation openReader() { + return new OpenReaderOperation(); + } + + // + // Operation objects. + // + + /** + * Options common to all FileOperations. + */ + protected static class FileAccessOperation<SubclassType extends FileAccessOperation<SubclassType>> { + private AccumuloConfiguration tableConfiguration; + + private String filename; + private FileSystem fs; + private Configuration fsConf; + + /** Specify the table configuration defining access to this file. */ + @SuppressWarnings("unchecked") + public SubclassType withTableConfiguration(AccumuloConfiguration tableConfiguration) { + this.tableConfiguration = tableConfiguration; + return (SubclassType) this; + } + + /** Specify the file this operation should apply to. */ + @SuppressWarnings("unchecked") + public SubclassType ofFile(String filename, FileSystem fs, Configuration fsConf) { + this.filename = filename; + this.fs = fs; + this.fsConf = fsConf; + return (SubclassType) this; + } + + public String getFilename() { + return filename; + } + + public FileSystem getFileSystem() { + return fs; + } + + public Configuration getConfiguration() { + return fsConf; + } + + public AccumuloConfiguration getTableConfiguration() {
--- End diff --

I considered both of those ideas. The second idea seems both sensible and dangerous to me. Forgetting to specify a table configuration when there's one available is an error, but one that might be hard to spot if a default table configuration were supplied. Adding `Objects.requireNonNull(...)` to the various execute methods so that they fail fast makes sense. I was just unsure, at the time I put this together, which of the parameters were actually required. After doing more work on it, I believe only the rate limiter (for all I/O operations) and the block caches (for read operations) are optional. Can anyone verify that none of the remaining parameters are ever optional?
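For illustration, here is a minimal sketch of the fail-fast idea, not the code in the pull request. It assumes the file name and table configuration are required and the rate limiter is optional, as suggested above. The class names `FileAccessSketch`, `GetFileSizeSketch`, and `FailFastSketchDemo` are hypothetical, and plain `String`/`Object` fields stand in for `FileSystem`, `Configuration`, `AccumuloConfiguration`, and `RateLimiter`.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in for FileAccessOperation from the diff.
// The self-referential type parameter lets setters return the concrete subclass.
class FileAccessSketch<T extends FileAccessSketch<T>> {
  private String filename;           // assumed required
  private String tableConfiguration; // assumed required (stand-in for AccumuloConfiguration)
  private Object rateLimiter;        // assumed optional; null means "no rate limiting"

  @SuppressWarnings("unchecked")
  public T ofFile(String filename) {
    this.filename = filename;
    return (T) this;
  }

  @SuppressWarnings("unchecked")
  public T withTableConfiguration(String tableConfiguration) {
    this.tableConfiguration = tableConfiguration;
    return (T) this;
  }

  @SuppressWarnings("unchecked")
  public T withRateLimiter(Object rateLimiter) {
    this.rateLimiter = rateLimiter;
    return (T) this;
  }

  public String getFilename() {
    return filename;
  }

  // Called at the top of every execute(): fail immediately, with a message naming
  // the missing setter, instead of silently falling back to a default.
  protected void validate() {
    Objects.requireNonNull(filename, "filename is required; call ofFile(...)");
    Objects.requireNonNull(tableConfiguration,
        "table configuration is required; call withTableConfiguration(...)");
    // rateLimiter is deliberately not checked because it is optional.
  }
}

// Hypothetical counterpart of GetFileSizeOperation.
class GetFileSizeSketch extends FileAccessSketch<GetFileSizeSketch> {
  public long execute() {
    validate();
    // Placeholder result: a real implementation would ask the file system.
    return getFilename().length();
  }
}

class FailFastSketchDemo {
  public static void main(String[] args) {
    long size = new GetFileSizeSketch()
        .ofFile("f.rf")
        .withTableConfiguration("table-conf")
        .execute();
    System.out.println("placeholder size: " + size);

    try {
      // Table configuration forgotten: execute() throws before doing any work.
      new GetFileSizeSketch().ofFile("f.rf").execute();
    } catch (NullPointerException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Validating in `execute()` rather than in the setters keeps the call order of the fluent methods free, while a forgotten table configuration still surfaces at the call site instead of being masked by a default.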
> Generalized configuration object for Accumulo rfile interaction
> ---------------------------------------------------------------
>
>                 Key: ACCUMULO-4195
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-4195
>             Project: Accumulo
>          Issue Type: Improvement
>            Reporter: Josh Elser
>            Assignee: Shawn Walker
>             Fix For: 1.8.0
>
>
> Taken from https://github.com/apache/accumulo/pull/90/files#r59489073
> On [~ShawnWalker]'s PR for ACCUMULO-4187 which adds rate-limiting on major compactions, we noted that many of the changes were related to passing an extra argument (RateLimiter) around through all of the code which is related to file interaction.
> It would be nice to move to a centralized configuration object instead of having to add a new argument every time some new feature is added to the file-path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)