accumulo-user mailing list archives

From Josh Elser <josh.el...@gmail.com>
Subject Re: Accumulo Cluster Sizing
Date Fri, 22 May 2015 19:02:44 GMT
You need to factor in the encoding that Accumulo does, as well as the 
compression algorithm you choose. I think we've seen RFile's encoding 
shrink some datasets down to 1/10th of the original size. I'm not sure 
we have a general reduction formula for RFile, since it depends so much 
on your schema.

GZ compresses pretty well. Snappy tends to be a little faster, but its 
output is a little bigger.

You might be able to approximate that for yourself relatively easily if 
you have a sliver of your dataset that you can play with.
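
As a rough sketch of that kind of check, the snippet below compresses a 
sample with Python's zlib (DEFLATE, the same family as GZ) and scales the 
ratio up to the full dataset. The function names, the synthetic sample, and 
the replication factor of 3 are assumptions for illustration. It does not 
model RFile's own key encoding, and Snappy is left out because it is not in 
the Python standard library, so treat the result as a ballpark only. 
Loading the sliver into a real table and checking its size on disk will be 
more accurate.

```python
import zlib


def compression_ratio(sample: bytes, level: int = 6) -> float:
    """Return compressed size / original size for a DEFLATE-compressed sample."""
    if not sample:
        raise ValueError("sample must not be empty")
    return len(zlib.compress(sample, level)) / len(sample)


def estimate_on_disk_tb(raw_tb: float, ratio: float, replication: int = 3) -> float:
    """Scale the raw size by the measured ratio and an assumed HDFS replication factor."""
    return raw_tb * ratio * replication


if __name__ == "__main__":
    # Synthetic stand-in for a slice of real data; replace with bytes read
    # from an actual sample file to get a meaningful number.
    sample = b"row0001 fam:qual [] some value\n" * 10_000
    ratio = compression_ratio(sample)
    print(f"sample compression ratio: {ratio:.3f}")
    print(f"estimated on-disk size for 7 TB raw: {estimate_on_disk_tb(7.0, ratio):.2f} TB")
```

Highly repetitive synthetic data compresses far better than real records, 
so the ratio only means something when measured on a representative slice.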

Jeremy Kepner wrote:
> 7TB -> 21TB (Hadoop replication), perhaps larger if you have index tables, ...
>
> 1M fetches / day ~ 10M entries / day ~ 1000 entries/sec
>
> Typical Accumulo peak is 100K entries/sec/core so you should be fine on query
>
> How fast do you need to insert the data into Accumulo?
>
> On Fri, May 22, 2015 at 03:46:20PM +0000, Fagan, Michael wrote:
>> Josh,
>>
>> Thanks, I would like to use my performance requirements to derive my HW
>> requirements.
>>
>> For example: assume I have a raw 7TB dataset representing 500 million
>> records with the expectation of 500K-1000K key fetches a day.
>>
>> I remember there was a tuning webpage circulating several years
>> back to help figure out the HW sizing to meet performance benchmarks.
>>
>>
>> Regards,
>> Mike Fagan
>>
>>
>>
>> On 5/22/15, 8:55 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>
>>> Hi Mike,
>>>
>>> We have some info in
>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_hardware
>>>
>>> What's missing there? Let us know the types of questions you have and we
>>> can expand on the document.
>>>
>>> - Josh
>>>
>>> Fagan, Michael wrote:
>>>> Hi,
>>>>
>>>> Can someone point me to recommendations regarding cluster sizing?
>>>>
>>>> Regards,
>>>> Mike Fagan
>>>>
>>>>
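
The back-of-the-envelope numbers in Jeremy's reply quoted above can be 
reproduced with a short sketch. The replication factor of 3 and the 10 
entries per fetch are assumptions implied by his figures (7TB -> 21TB, 1M 
fetches ~ 10M entries), and the function names are made up for 
illustration.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def replicated_tb(raw_tb: float, replication: int = 3) -> float:
    """Raw dataset size multiplied by the assumed HDFS replication factor."""
    return raw_tb * replication


def average_entries_per_sec(fetches_per_day: float, entries_per_fetch: float) -> float:
    """Daily query load expressed as an average entries/sec rate."""
    return fetches_per_day * entries_per_fetch / SECONDS_PER_DAY


def cores_needed(entries_per_sec: float, per_core_rate: float = 100_000) -> int:
    """Cores required at the quoted ~100K entries/sec/core peak rate."""
    return max(1, math.ceil(entries_per_sec / per_core_rate))


if __name__ == "__main__":
    avg = average_entries_per_sec(1_000_000, 10)
    print(f"storage with replication: {replicated_tb(7):.0f} TB")
    print(f"average query rate: {avg:.0f} entries/sec")
    print(f"cores needed at 1000 entries/sec: {cores_needed(1000)}")
```

The daily average works out to roughly 116 entries/sec, so the ~1000 
entries/sec figure leaves about an order of magnitude of headroom for 
bursts and is still far below the quoted per-core peak.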
