accumulo-user mailing list archives

From Christopher <ctubb...@apache.org>
Subject Re: micro compaction
Date Tue, 09 Jun 2015 18:00:48 GMT
The starting point would be to look at where mutations are added to the writer.

There'd be a few tricky parts. For one, mutations aren't sorted on the
client side, and iterators require sorted input. Another pitfall is
getting the API right, so that it gives users enough control over how
much data is buffered for processing before being sent along to the
writer (which queues it for an RPC thread). A third problem is that we
pre-serialize Mutations as they are created, to make the RPC faster. A
client-side iterator over a bucket of mutations would have to
de-serialize them, at some performance cost.
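
To make the sorting and buffering concerns concrete, here is a minimal
sketch in plain Java. It is not existing Accumulo API: the class name
PreAggregatingBuffer, the maxEntries limit, and the sink callback are
all hypothetical, and it assumes the combiner is a simple sum of long
values keyed by a string. In a real client the sink would be the place
to build Mutations and hand them to the writer.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side pre-aggregation buffer (not part of Accumulo).
 * It sums values per key in memory and emits the combined entries in
 * sorted key order once the buffer holds maxEntries distinct keys.
 */
class PreAggregatingBuffer {
    // TreeMap keeps keys sorted, which is what an iterator stack would expect.
    private final TreeMap<String, Long> sums = new TreeMap<>();
    private final int maxEntries;
    private final BiConsumer<String, Long> sink;

    PreAggregatingBuffer(int maxEntries, BiConsumer<String, Long> sink) {
        if (maxEntries < 1) {
            throw new IllegalArgumentException("maxEntries must be >= 1");
        }
        this.maxEntries = maxEntries;
        this.sink = sink;
    }

    /** Combine delta into the running sum for key; flush if the buffer is full. */
    void add(String key, long delta) {
        sums.merge(key, delta, Long::sum);
        if (sums.size() >= maxEntries) {
            flush();
        }
    }

    /** Emit every buffered (key, sum) pair in sorted order, then clear the buffer. */
    void flush() {
        for (Map.Entry<String, Long> e : sums.entrySet()) {
            // Assumption: the caller turns each pair into a Mutation here.
            sink.accept(e.getKey(), e.getValue());
        }
        sums.clear();
    }
}
```

The caller still has to invoke flush() before closing the writer, and
a partial sum emitted by an early flush only stays correct if a
matching combiner is also configured on the table.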

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, Jun 9, 2015 at 1:02 PM, Russ Weeks <rweeks@newbrightidea.com> wrote:
> Having a combiner stack (more generally an iterator stack) run on the
> client-side seems to be the second most popular request on this list. The
> most popular being, "How do I write to Accumulo from inside an iterator?"
>
> Such a thing would be very useful for me, too. I have some cycles to help
> out, if somebody can give me an idea of where to get started and where the
> potential land-mines are.
>
> -Russ
>
> On Tue, Jun 9, 2015 at 9:08 AM roman.drapeko@baesystems.com
> <roman.drapeko@baesystems.com> wrote:
>>
>> Aggregated output is tiny, so if I do the same calculations in memory
>> (instead of sending mutations to Accumulo), I can reduce the overall
>> number of mutations by 1000x or so.
>>
>>
>>
>> -----Original Message-----
>> From: Josh Elser [mailto:josh.elser@gmail.com]
>> Sent: 09 June 2015 16:54
>> To: user@accumulo.apache.org
>> Subject: Re: micro compaction
>>
>> Well, you win the prize for new terminology. I haven't ever heard the term
>> "micro compaction" before.
>>
>> Can you clarify, though? You say hundreds of millions of mutations
>> result in megabytes of data. Is that an increase or a decrease in size?
>> Comparing apples to oranges :)
>>
>> roman.drapeko@baesystems.com wrote:
>> > Hi guys,
>> >
>> > While doing pre-analytics we generate hundreds of millions of
>> > mutations that result in 1-100 megabytes of useful data after major
>> > compaction. We ingest into Accumulo using MR from a Mapper job. We
>> > found that performance degrades significantly as the number of
>> > mutations increases.
>> >
>> > The obvious improvement is to do some calculations in-memory before
>> > sending mutations to Accumulo.
>> >
>> > Of course, at the same time we are looking for a solution to minimize
>> > development effort.
>> >
>> > I guess I am asking about micro compaction/ingest-time iterators on
>> > the client side (before data is sent to Accumulo).
>> >
>> > To my understanding, Accumulo does not support them; is that correct?
>> > And if so, are there any plans to support this functionality in the
>> > future?
>> >
>> > Thanks
>> >
>> > Roman
>> >
>> > Please consider the environment before printing this email. This
>> > message should be regarded as confidential. If you have received this
>> > email in error please notify the sender and destroy it immediately.
>> > Statements of intent shall only become binding when confirmed in hard
>> > copy by an authorised signatory. The contents of this email may relate
>> > to dealings with other companies under the control of BAE Systems
>> > Applied Intelligence Limited, details of which can be found at
>> > http://www.baesystems.com/Businesses/index.htm.
