accumulo-dev mailing list archives

From "Keith Turner (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-456) Need utility for exporting and importing tables
Date Mon, 14 May 2012 18:15:52 GMT


Keith Turner commented on ACCUMULO-456:

The procedure I pointed out earlier w/ compacting the table is something that a user could
do now w/ existing code.  For future code changes I think generalizing the chop compaction
used by merge would be a good thing to do.  This way only the files that need to be compacted
are compacted, which minimizes the amount of decompression, deserialization, serialization, and
compression that needs to be done.  I think chop+distcp is a good way to go.  distcp is a
well tested tool that copies bytewise and does not decompress, etc.  The identity map reduce
operation suggested above would be more efficient when all files need to be chopped, but I
am not sure this will be the usual case.  When only a small number of files need to be chopped,
the identity map reduce will result in a lot more CPU load than chop+distcp.  I suppose the
ultimate optimization is a map reduce job that copies bytewise when no chop is needed and
does the chop as part of the map reduce job when needed.  This would be a fairly complex bit
of code that may not get the testing it needs.
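
To make the chop+distcp trade-off concrete, here is a rough sketch of how files could be
split into "needs chop" and "copy bytewise as-is".  It assumes a file needs chopping when it
holds rows outside its tablet's range (prev_end_row exclusive, end_row inclusive).  FileInfo,
needs_chop and plan_export are hypothetical names for illustration, not Accumulo APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical summary of one file: its name and first/last row."""
    name: str
    first_row: str
    last_row: str


def needs_chop(f: FileInfo, prev_end_row: Optional[str], end_row: Optional[str]) -> bool:
    """Return True if the file holds rows outside the range (prev_end_row, end_row].

    None means the tablet is unbounded on that side (assumption of this sketch).
    """
    if prev_end_row is not None and f.first_row <= prev_end_row:
        return True
    if end_row is not None and f.last_row > end_row:
        return True
    return False


def plan_export(files: List[FileInfo], prev_end_row: Optional[str],
                end_row: Optional[str]) -> Tuple[List[str], List[str]]:
    """Split file names into (to_chop, copy_as_is) for one tablet."""
    to_chop = [f.name for f in files if needs_chop(f, prev_end_row, end_row)]
    copy_as_is = [f.name for f in files if not needs_chop(f, prev_end_row, end_row)]
    return to_chop, copy_as_is


if __name__ == "__main__":
    files = [
        FileInfo("in.rf", "d", "g"),    # entirely inside (c, h]
        FileInfo("low.rf", "a", "e"),   # starts at or before prev_end_row
        FileInfo("high.rf", "f", "m"),  # extends past end_row
    ]
    to_chop, copy_as_is = plan_export(files, "c", "h")
    print("chop:", to_chop)
    print("copy bytewise:", copy_as_is)
```

Only the files in the first list would pay the decompress/deserialize/serialize/compress
cost; the rest could go straight to distcp.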

Making bulk import handle multiple dirs would be a nice convenience feature for users.  At
the moment it's fairly easy to work around w/ one hadoop command for anyone trying to do this
w/ the current system:

  hadoop fs -mv <table dir>/*/*.rf <bulk import dir>
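
If the move has to be scripted, a minimal wrapper around the same command might look like
the sketch below.  build_mv_command and move_rfiles are hypothetical helper names, and the
paths in the usage part are placeholders.  The glob is passed through unexpanded so that
the hadoop fs shell does the matching.

```python
import subprocess
from typing import List


def build_mv_command(table_dir: str, bulk_import_dir: str) -> List[str]:
    """Build the argv for: hadoop fs -mv <table dir>/*/*.rf <bulk import dir>."""
    pattern = table_dir.rstrip("/") + "/*/*.rf"
    return ["hadoop", "fs", "-mv", pattern, bulk_import_dir]


def move_rfiles(table_dir: str, bulk_import_dir: str, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False.

    No local shell is involved, so the glob reaches hadoop as written.
    """
    cmd = build_mv_command(table_dir, bulk_import_dir)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute the real table dir and bulk import dir.
    print(" ".join(move_rfiles("/path/to/table_dir", "/path/to/bulk_import_dir")))
```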
> Need utility for exporting and importing tables
> -----------------------------------------------
>                 Key: ACCUMULO-456
>                 URL:
>             Project: Accumulo
>          Issue Type: New Feature
>            Reporter: Keith Turner
>            Assignee: Keith Turner
>             Fix For: 1.5.0
> Need a utility to export and import tables.  A use case would be: export table on cluster
> A, distcp to cluster B, import.
