accumulo-user mailing list archives

From: "Fagan, Michael" <Michael_Fa...@cable.comcast.com>
Subject: Re: Accumulo Cluster Sizing
Date: Wed, 27 May 2015 15:54:41 GMT
Eric,

Thanks. I assume managing something like 280GB per tablet server is feasible given the various
knobs available to tune performance.

Regards,
Mike Fagan

From: Eric Newton <eric.newton@gmail.com>
Reply-To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Date: Wednesday, May 27, 2015 at 9:22 AM
To: "user@accumulo.apache.org" <user@accumulo.apache.org>
Subject: Re: Accumulo Cluster Sizing

You can get decent ingest concurrency when the number of tablets per server is between 20
and 80.

There are so many knobs for tuning this that it's hard to give a simple answer. 0-1
tablets/server is bad. 1000+/server is bad. Usually.

It will take time to tune your system.
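
As a rough illustration of the ranges above, the following sketch estimates tablets per server for a few tablet sizes and labels each result. The helper names `tablets_per_server` and `rate` are hypothetical, the tablet sizes are illustrative only, and the 280GB figure is the per-server estimate from the top of this message. It assumes tablets are roughly uniform in size (in Accumulo this is governed approximately by the table's `table.split.threshold`); it is not a tuning recommendation.

```python
def tablets_per_server(data_per_server_gb, tablet_size_gb):
    """Estimate tablets hosted per server, assuming uniformly sized tablets."""
    return data_per_server_gb / tablet_size_gb


def rate(tablets):
    """Label a tablets-per-server count using the rough ranges quoted above."""
    if tablets <= 1 or tablets >= 1000:
        return "bad (usually)"
    if 20 <= tablets <= 80:
        return "decent ingest concurrency"
    return "outside the quoted ranges; needs testing"


# 280 GB per tablet server is the estimate from this thread;
# the tablet sizes below are hypothetical values for comparison.
for size_gb in (1, 4, 8, 16):
    n = tablets_per_server(280, size_gb)
    print(f"{size_gb:>2} GB tablets -> {n:6.1f} tablets/server: {rate(n)}")
```

Under these assumptions, tablet sizes of roughly 3.5GB to 14GB would put a 280GB server inside the 20 to 80 range; actual behavior still depends on the other knobs mentioned above.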



On Wed, May 27, 2015 at 11:04 AM, Fagan, Michael <Michael_Fagan@cable.comcast.com> wrote:
Thanks everyone for their input.

I estimate I can use 20 tablet servers to support 1M lookups a day.

Are there any good rules of thumb regarding the amount of data/tablets
managed by a tablet server?

Regards,
Mike Fagan


On 5/22/15, 1:33 PM, "Kepner, Jeremy - 0553 - MITLL" <kepner@ll.mit.edu> wrote:

>77M records / 4 hours ~ 1.5B entries / 4 hours ~ 100K entries/sec
>
>On May 22, 2015, at 3:28 PM, Fagan, Michael <Michael_Fagan@cable.comcast.com> wrote:
>
>> Jeremy,
>> ~72 million records.
>>
>> Regards,
>> Mike Fagan
>>
>> On 5/22/15, 1:12 PM, "Jeremy Kepner" <kepner@ll.mit.edu> wrote:
>>
>>> How many records/entries is that?
>>>
>>> On Fri, May 22, 2015 at 07:02:05PM +0000, Fagan, Michael wrote:
>>>> Jeremy,
>>>>
>>>> The data will age off daily so I plan to bulk load ~1TB every 4 hours.
>>>>
>>>> Regards,
>>>> Mike Fagan
>>>>
>>>>
>>>> On 5/22/15, 12:09 PM, "Jeremy Kepner" <kepner@ll.mit.edu> wrote:
>>>>
>>>>> 7TB -> 21TB (Hadoop replication), perhaps larger if you have index
>>>>> tables, ...
>>>>>
>>>>> 1M fetches / day ~ 10M entries / day ~ 1000 entries/sec
>>>>>
>>>>> Typical Accumulo peak is 100K entries/sec/core so you should be fine
>>>>> on query
>>>>>
>>>>> How fast do you need to insert the data into Accumulo?
>>>>>
>>>>> On Fri, May 22, 2015 at 03:46:20PM +0000, Fagan, Michael wrote:
>>>>>> Josh,
>>>>>>
>>>>>> Thanks, I would like to use my performance requirements to derive my
>>>>>> HW requirements.
>>>>>>
>>>>>> For example: assume I have a raw 7TB dataset representing 500 million
>>>>>> records with the expectation of 500K-1000K key fetches a day.
>>>>>>
>>>>>> I remember there was a tuning webpage circulating several years
>>>>>> back to help figure out the HW sizing needed to meet performance benchmarks.
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Mike Fagan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/22/15, 8:55 AM, "Josh Elser" <josh.elser@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Mike,
>>>>>>>
>>>>>>> We have some info in
>>>>>>> http://accumulo.apache.org/1.7/accumulo_user_manual.html#_hardware
>>>>>>>
>>>>>>> What's missing there? Let us know the types of questions you have and we
>>>>>>> can expand on the document.
>>>>>>>
>>>>>>> - Josh
>>>>>>>
>>>>>>> Fagan, Michael wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Can someone point me to recommendations regarding cluster sizing?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Mike Fagan
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
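
The back-of-envelope figures quoted in this thread can be reproduced with a short sketch. The function name `sizing_estimates` and its parameter names are hypothetical; the default values are the thread's own numbers, with a replication factor of 3 implied by "7TB -> 21TB" and 10 entries per fetch implied by "1M fetches / day ~ 10M entries / day". It spreads load evenly over the day and the load window, so it shows averages rather than peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sizing_estimates(raw_tb=7, replication=3,
                     fetches_per_day=1_000_000, entries_per_fetch=10,
                     ingest_entries=1.5e9, ingest_window_hours=4):
    """Rough averages from the thread's figures; index tables are not counted."""
    return {
        # Raw data multiplied by the HDFS replication factor.
        "stored_tb": raw_tb * replication,
        # Entries read per second if queries are spread evenly over a day.
        "avg_query_entries_per_sec":
            fetches_per_day * entries_per_fetch / SECONDS_PER_DAY,
        # Entries written per second if one load fills its whole window.
        "ingest_entries_per_sec":
            ingest_entries / (ingest_window_hours * 3600),
    }


est = sizing_estimates()
for name, value in est.items():
    print(f"{name}: {value:,.0f}")
```

With the defaults this gives 21TB stored, an average of about 116 query entries/sec, and about 104K ingest entries/sec. The query average is lower than the ~1000 entries/sec quoted above, which may include headroom for peaks (that reading is an assumption); either figure is well under the quoted 100K entries/sec/core.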


