accumulo-user mailing list archives

From Jeff Kubina <jeff.kub...@gmail.com>
Subject Re: What is the Communication and Time Complexity for Bulk Inserts?
Date Thu, 18 Oct 2012 15:37:29 GMT
A BatchWriter, but I would also be interested in the answer assuming a
pre-sorted RFile.

On Thu, Oct 18, 2012 at 11:20 AM, Josh Elser <josh.elser@gmail.com> wrote:
> Are you referring to "bulk inserts" as importing a pre-sorted rfile of
> Key/Values or using a BatchWriter?
>
> On 10/18/12 10:49 AM, Jeff Kubina wrote:
>>
>> I am deriving the time complexities for an algorithm I implemented in
>> Hadoop using Accumulo and need to know the time complexity of bulk
>> inserting m records evenly distributed across p nodes into an empty
>> table with p tablet servers. Assuming B is the bandwidth of the
>> network, would the communication complexity be O(m/B) and the
>> computation complexity O(m/p * log(m/p))? If the table contained n
>> records would the values be O(m/B) and O(m/p * log(m/p) + n/p)?
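
For readers who want to plug numbers into the expressions conjectured in the question above, here is a minimal Python sketch. It only evaluates those expressions with constant factors dropped; it does not confirm that they describe Accumulo's actual behavior. The function name `conjectured_costs`, its parameter names, and the choice of a base-2 logarithm are assumptions made for illustration.

```python
import math


def conjectured_costs(m, p, bandwidth, n=0):
    """Evaluate the cost expressions conjectured in the question.

    m: number of records inserted, p: number of nodes / tablet servers,
    bandwidth: network bandwidth B, n: records already in the table.
    Constant factors are dropped, so the results are unitless growth terms.
    """
    # Conjectured communication term: O(m/B)
    communication = m / bandwidth
    # Conjectured computation term: O(m/p * log(m/p) + n/p)
    per_node = m / p
    sort_term = per_node * math.log2(per_node) if per_node > 1 else 0.0
    computation = sort_term + n / p
    return communication, computation


if __name__ == "__main__":
    # Empty table (n = 0), then a table that already holds n = 400 records.
    print(conjectured_costs(1024, 4, 256))
    print(conjectured_costs(1024, 4, 256, n=400))
```

Passing `n=0` corresponds to the empty-table case in the question; a nonzero `n` adds the conjectured `n/p` term.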
