hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1398) Add in-memory caching of data
Date Fri, 18 Jan 2008 23:41:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560607#action_12560607 ]

stack commented on HADOOP-1398:

(... continuing IRC discussion).

I didn't realize HColumnDescriptor was versioned.  It doesn't seem to have been added by either
Jim or me.  Someone smarter, no doubt.  So my comment that this change is incompatible doesn't
hold, since I see you have code to make HCD migrate itself.  Nice.

In the snippet below from HStoreFile, the blockCacheEnabled method argument is not passed on to
the MapFile constructors.

+  public synchronized MapFile.Reader getReader(final FileSystem fs,
+      final Filter bloomFilter, final boolean blockCacheEnabled)
+  throws IOException {
+    if (isReference()) {
+      return new HStoreFile.HalfMapFileReader(fs,
+          getMapFilePath(reference).toString(), conf, 
+          reference.getFileRegion(), reference.getMidkey(), bloomFilter);
+    }
+    return new BloomFilterMapFile.Reader(fs, getMapFilePath().toString(),
+        conf, bloomFilter);
+  }
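The fix pattern is just to thread the flag through to whichever reader gets constructed. A minimal, self-contained toy of that pattern (all class and constructor shapes here are hypothetical stand-ins, not HBase's real API):

```java
// Toy sketch: the blockCacheEnabled flag must reach the reader that is
// actually constructed, on both branches. Names are illustrative only.
class ToyReader {
    final String path;
    final boolean blockCacheEnabled;

    ToyReader(String path, boolean blockCacheEnabled) {
        this.path = path;
        this.blockCacheEnabled = blockCacheEnabled;
    }
}

class ToyStoreFile {
    private final boolean isReference;

    ToyStoreFile(boolean isReference) {
        this.isReference = isReference;
    }

    // The flag is passed on instead of being dropped on the floor.
    ToyReader getReader(boolean blockCacheEnabled) {
        if (isReference) {
            return new ToyReader("half-mapfile-path", blockCacheEnabled);
        }
        return new ToyReader("mapfile-path", blockCacheEnabled);
    }
}
```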

Out of interest, did you regenerate the thrift or hand-edit it?  Changes look right -- just

The default ReferenceMap constructor makes for hard keys and soft values.  If a value has been
let go by the GC, does the corresponding key just stay in the Map?
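For context, the hard-key/soft-value pattern in question can be sketched with the JDK's own java.lang.ref.SoftReference. This is an illustrative simplification, not ReferenceMap's actual implementation: keys are held strongly, values can be reclaimed under memory pressure, and a stale key is only purged lazily, when it is next touched.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of hard-key/soft-value map semantics: the key object is held
// strongly; only the value is eligible for reclamation by the GC.
class SoftValueMap<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    // An entry whose value has been collected is purged lazily, on access;
    // until then the (now useless) key does stay in the backing map.
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key);  // lazy purge of the stale key
        }
        return value;
    }

    public int size() {
        return map.size();
    }
}
```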

Otherwise, patch looks great Tom.

> Add in-memory caching of data
> -----------------------------
>                 Key: HADOOP-1398
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1398
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: contrib/hbase
>            Reporter: Jim Kellerman
>            Priority: Trivial
>         Attachments: commons-collections-3.2.jar, hadoop-blockcache-v2.patch, hadoop-blockcache-v3.patch,
> Bigtable provides two in-memory caches: one for row/column data and one for disk block
> The size of each cache should be configurable, data should be loaded lazily, and the
> cache managed by an LRU mechanism.
> One complication of the block cache is that all data is read through a SequenceFile.Reader
> which ultimately reads data off of disk via an RPC proxy for ClientProtocol. This would imply
> that the block caching would have to be pushed down to either the DFSClient or SequenceFile.Reader
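An LRU-managed, size-bounded cache of the kind the description calls for can be sketched with the JDK's LinkedHashMap in access order; this is a sketch of the eviction mechanism only, not the actual patch's block cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Fixed-capacity LRU cache: an access-order LinkedHashMap evicts the
// least-recently-used entry once the configured capacity is exceeded.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        super(16, 0.75f, true);  // true = iterate in access order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each put(); returning true drops the eldest entry.
        return size() > capacity;
    }
}
```

With capacity 2, touching "a" before inserting "c" makes "b" the least-recently-used entry, so "b" is the one evicted.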

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
