accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Change column family
Date Thu, 28 May 2015 05:34:58 GMT
I believe the typical case would be to set it at the scan and major 
compaction scopes for the table. This would ensure that queries for data 
would see the transformed result and, eventually, all of the data would 
be rewritten to the new schema (or you could force a major compaction 
and know definitively).

Also, since it hasn't been otherwise stated, using the
TransformingIterator is on the fringes of "normal". Your life may be
much simpler if you write a MapReduce job to rewrite your data.
Implementing the iterator correctly is a little obscure (as you're
noticing) and is not at all straightforward to debug. If it's reasonable
to rewrite your data, that may be the easier solution IMO.
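
As a rough illustration of why this is hard to get right, the sketch below replays Eric's example from further down this thread (renaming column family "a" to "b" when "aa" is also present). It is plain Java, not Accumulo code: the Entry class, naiveRename, and isSorted are hypothetical stand-ins for a Key and an iterator, and only the row and column family are modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SortOrderDemo {
    /** Hypothetical stand-in for an Accumulo key: row and column family only. */
    static final class Entry implements Comparable<Entry> {
        final String row;
        final String cf;

        Entry(String row, String cf) {
            this.row = row;
            this.cf = cf;
        }

        @Override
        public int compareTo(Entry other) {
            int byRow = row.compareTo(other.row);
            return byRow != 0 ? byRow : cf.compareTo(other.cf);
        }

        @Override
        public String toString() {
            return row + ":" + cf;
        }
    }

    /** Renames column family `from` to `to`, emitting entries in input order. */
    static List<Entry> naiveRename(List<Entry> sorted, String from, String to) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : sorted) {
            out.add(e.cf.equals(from) ? new Entry(e.row, to) : e);
        }
        return out;
    }

    /** True if every entry is less than or equal to the one after it. */
    static boolean isSorted(List<Entry> entries) {
        for (int i = 1; i < entries.size(); i++) {
            if (entries.get(i - 1).compareTo(entries.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Entry> input = Arrays.asList(new Entry("row", "a"), new Entry("row", "aa"));
        List<Entry> renamed = naiveRename(input, "a", "b");
        // prints: [row:b, row:aa] sorted? false
        System.out.println(renamed + " sorted? " + isSorted(renamed));
    }
}
```

The renamed entry now sorts after "aa" but is emitted before it, which is the ordering problem that has to be solved by buffering or merging.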

madhvi wrote:
> Hi All,
>
> If anyone has worked with the TransformingIterator, can you tell me
> whether the iterator also writes the transformed changes to the Accumulo
> table, or whether it only returns the transformed result at scan time?
> Can you also provide details on how to implement its abstract methods,
> and on their use and the workflow of the iterator?
>
> Thanks
> Madhvi
> On Wednesday 27 May 2015 05:38 PM, Andrew Wells wrote:
>> To implement that iterator, it looks like you will only need to
>> override replaceColumnFamily.
>>
>> It looks like it returns the new column family via the argument, so
>> manipulate the Text object provided.
>>
>> On Wed, May 27, 2015 at 8:06 AM, Andrew Wells <awells@clearedgeit.com
>> <mailto:awells@clearedgeit.com>> wrote:
>>
>>     Looks like you want to override these methods:
>>
>>     protected Key replaceColumnFamily(Key originalKey,
>>             org.apache.hadoop.io.Text newColFam)
>>         Make a new key with all parts (including delete flag)
>>         coming from originalKey but use newColFam as the column family.
>>
>>     protected Key replaceColumnQualifier(Key originalKey,
>>             org.apache.hadoop.io.Text newColQual)
>>         Make a new key with all parts (including delete flag)
>>         coming from originalKey but use newColQual as the column
>>         qualifier.
>>
>>     protected Key replaceColumnVisibility(Key originalKey,
>>             org.apache.hadoop.io.Text newColVis)
>>         Make a new key with all parts (including delete flag)
>>         coming from originalKey but use newColVis as the column
>>         visibility.
>>
>>     protected Key replaceKeyParts(Key originalKey,
>>             org.apache.hadoop.io.Text newColQual,
>>             org.apache.hadoop.io.Text newColVis)
>>         Make a new key with a column qualifier, and column
>>         visibility.
>>
>>     protected Key replaceKeyParts(Key originalKey,
>>             org.apache.hadoop.io.Text newColFam,
>>             org.apache.hadoop.io.Text newColQual,
>>             org.apache.hadoop.io.Text newColVis)
>>         Make a new key with a column family, column qualifier,
>>         and column visibility.
>>
>>     Javadoc:
>>     <http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/TransformingIterator.html>
>>     <http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/data/Key.html>
>>
>>
>>
>>
>>
>>     On Wed, May 27, 2015 at 7:40 AM, shweta.agrawal
>>     <shweta.agrawal@orkash.com <mailto:shweta.agrawal@orkash.com>> wrote:
>>
>>         Thanks for all the suggestions.
>>
>>         I read about TransformingIterator and started implementing
>>         it. I extended this class and tried to override its abstract
>>         method, but I cannot work out where and what to write to
>>         change the column family.
>>
>>         So please provide your suggestions.
>>
>>         Thanks
>>         Shweta
>>
>>
>>
>>         On Tuesday 26 May 2015 08:33 PM, Adam Fuchs wrote:
>>>         This can also be done with a row-doesn't-fit-into-memory
>>>         constraint. You won't need to hold the second column
>>>         in-memory if your iterator tree deep copies, filters,
>>>         transforms and merges. Exhibit A:
>>>
>>>         [HeapIterator-derivative]
>>>            |_________________________
>>>            |                         \
>>>         [transform-graph1-to-graph2]  \
>>>            |                           \
>>>         [column-family-graph1][all-but-column-family-graph1]
>>>
>>>         With this design, you can subclass the HeapIterator, deep
>>>         copy the source in the init method, wrap one copy in a custom
>>>         transform iterator, and create an appropriate seek method.
>>>         This is probably more on the advanced side of Accumulo
>>>         programming, but can be done.
>>>
>>>         Adam
>>>
>>>
>>>         On Tue, May 26, 2015 at 8:59 AM, Eric Newton
>>>         <eric.newton@gmail.com <mailto:eric.newton@gmail.com>> wrote:
>>>
>>>             Short answer: no.
>>>
>>>             Long answer: maybe.
>>>
>>>             You can write an iterator which will transform:
>>>
>>>             row, cf1, cq, vis -> value
>>>
>>>             into:
>>>
>>>             row, cf2, cq, vis -> value
>>>
>>>             And if you can do this while maintaining sort order, you
>>>             can get your new ColumnFamily transformed during scans
>>>             and compactions.
>>>
>>>             But this bit about maintaining the sort order is more
>>>             complex than it sounds.
>>>
>>>             If you have the following:
>>>
>>>             row, a, cq, vis -> value
>>>             row, aa, cq, vis -> value
>>>
>>>
>>>             And you want to transform cf "a" into cf "b":
>>>
>>>             row, aa, cq, vis -> value
>>>             row, b, cq, vis -> value
>>>
>>>
>>>             Your iterator needs to hold the second column in memory,
>>>             after transforming the first column.  Tablet server
>>>             memory for holding Key/Values is not infinite.
>>>
>>>             -Eric
>>>
>>>             On Tue, May 26, 2015 at 8:44 AM, shweta.agrawal
>>>             <shweta.agrawal@orkash.com
>>>             <mailto:shweta.agrawal@orkash.com>> wrote:
>>>
>>>                 Hi,
>>>
>>>                 I want to ask, is it possible in Accumulo to change
>>>                 the column family without rewriting all of the data?
>>>
>>>                 Suppose my column family is graph1, and now I want to
>>>                 rename this column family to graph2.
>>>                 Is it possible?
>>>
>>>                 Thanks
>>>                 Shweta
>>>
>>>
>>>
>>
>>
>>
>>
>>     --
>>     *Andrew George Wells*
>>     *Software Engineer*
>>     *awells@clearedgeit.com <mailto:awells@clearedgeit.com>*
>>
>
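
Adam's suggestion quoted above (deep copy the source, transform one copy, and merge the two with a heap) can be sketched in plain Java. This is only an illustration of the merge idea under simplifying assumptions: it uses no Accumulo classes, the names Entry, Source, and renameByMerging are hypothetical, keys carry only a row and a column family, and two lazy passes over an in-memory list stand in for the deep-copied sources. A real implementation would also need the seek handling Adam mentions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class MergeRenameDemo {
    /** Hypothetical stand-in for an Accumulo key: row and column family only. */
    static final class Entry implements Comparable<Entry> {
        final String row;
        final String cf;

        Entry(String row, String cf) {
            this.row = row;
            this.cf = cf;
        }

        @Override
        public int compareTo(Entry other) {
            int byRow = row.compareTo(other.row);
            return byRow != 0 ? byRow : cf.compareTo(other.cf);
        }

        @Override
        public String toString() {
            return row + ":" + cf;
        }
    }

    /** A sorted source together with its current head; ordered by that head. */
    static final class Source implements Comparable<Source> {
        final Iterator<Entry> it;
        Entry head;

        Source(Iterator<Entry> it) {
            this.it = it;
            this.head = it.next();
        }

        /** Moves to the next entry; returns false once the source is exhausted. */
        boolean advance() {
            if (!it.hasNext()) {
                return false;
            }
            head = it.next();
            return true;
        }

        @Override
        public int compareTo(Source other) {
            return head.compareTo(other.head);
        }
    }

    static void addIfNonEmpty(PriorityQueue<Source> heap, Iterator<Entry> it) {
        if (it.hasNext()) {
            heap.add(new Source(it));
        }
    }

    /** Renames `from` to `to` by merging two sorted streams; only the heads are buffered. */
    static List<Entry> renameByMerging(List<Entry> sorted, String from, String to) {
        PriorityQueue<Source> heap = new PriorityQueue<>();
        // Stream 1: only the `from` family, renamed lazily. It stays sorted
        // because every entry in it ends up with the same column family.
        addIfNonEmpty(heap, sorted.stream()
                .filter(e -> e.cf.equals(from))
                .map(e -> new Entry(e.row, to))
                .iterator());
        // Stream 2: everything except the `from` family, unchanged.
        addIfNonEmpty(heap, sorted.stream()
                .filter(e -> !e.cf.equals(from))
                .iterator());

        List<Entry> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Source smallest = heap.poll();
            out.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest);
            }
        }
        return out;
    }
}
```

Because each stream is already sorted on its own, the merge never has to hold more than one pending entry per stream, which is what avoids the memory concern Eric raises.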
