accumulo-user mailing list archives

From Russ Weeks <rwe...@newbrightidea.com>
Subject Re: micro compaction
Date Tue, 09 Jun 2015 17:02:34 GMT
Having a combiner stack (more generally, an iterator stack) run on the
client side seems to be the second most popular request on this list, the
most popular being "How do I write to Accumulo from inside an iterator?"

Such a thing would be very useful for me, too. I have some cycles to help
out, if somebody can give me an idea of where to get started and where the
potential land-mines are.
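
In the meantime, the in-memory pre-aggregation Roman describes can be
approximated in the client without any new Accumulo feature. The following
is a rough sketch only, not something Accumulo ships: `PreAggregator`,
`maxKeys`, and `sink` are hypothetical names, keys are flattened to plain
strings, and the aggregation is assumed to be a simple sum.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side "micro compaction": buffers partial sums per key
 * in memory and emits one combined value per key when the buffer fills up.
 */
class PreAggregator {
    private final Map<String, Long> sums = new HashMap<>();
    private final int maxKeys;
    private final BiConsumer<String, Long> sink;

    /**
     * @param maxKeys flush once this many distinct keys are buffered
     * @param sink    receives (key, partial sum); in a real Mapper this is
     *                where a Mutation would be built and passed to a
     *                BatchWriter (assumption, not shown here)
     */
    PreAggregator(int maxKeys, BiConsumer<String, Long> sink) {
        this.maxKeys = maxKeys;
        this.sink = sink;
    }

    /** Folds one update into the buffered sum for its key. */
    void add(String key, long delta) {
        sums.merge(key, delta, Long::sum);
        if (sums.size() >= maxKeys) {
            flush();
        }
    }

    /** Emits every buffered partial sum and empties the buffer. */
    void flush() {
        sums.forEach(sink);
        sums.clear();
    }
}
```

The same key can be flushed more than once, so the table would presumably
still need a summing combiner configured to merge the partial sums on the
server side, and `flush()` has to be called once more when the Mapper
finishes.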

-Russ

On Tue, Jun 9, 2015 at 9:08 AM roman.drapeko@baesystems.com <
roman.drapeko@baesystems.com> wrote:

> The aggregated output is tiny, so if I do the same calculations in memory
> (instead of sending mutations to Accumulo), I can reduce the overall number
> of mutations by 1000x or so.
>
>
>
> -----Original Message-----
> From: Josh Elser [mailto:josh.elser@gmail.com]
> Sent: 09 June 2015 16:54
> To: user@accumulo.apache.org
> Subject: Re: micro compaction
>
> Well, you win the prize for new terminology. I haven't ever heard the term
> "micro compaction" before.
>
> Can you clarify, though? You say hundreds of millions of mutations
> result in megabytes of data. Is that an increase or a decrease in size?
> Comparing apples to oranges :)
>
> roman.drapeko@baesystems.com wrote:
> > Hi guys,
> >
> > While doing pre-analytics we generate hundreds of millions of
> > mutations that result in 1-100 megabytes of useful data after major
> > compaction. We ingest into Accumulo from the Mapper of a MapReduce job.
> > We found that performance degrades significantly as the number of
> > mutations increases.
> >
> > The obvious improvement is to do some calculations in-memory before
> > sending mutations to Accumulo.
> >
> > Of course, at the same time we are looking for a solution to minimize
> > development effort.
> >
> > I guess I am asking about micro compaction/ingest-time iterators on
> > the client side (before data is sent to Accumulo).
> >
> > To my understanding, Accumulo does not support them. Is that correct?
> > And if so, are there any plans to support this functionality in the
> > future?
> >
> > Thanks
> >
> > Roman
> >
> > Please consider the environment before printing this email. This
> > message should be regarded as confidential. If you have received this
> > email in error please notify the sender and destroy it immediately.
> > Statements of intent shall only become binding when confirmed in hard
> > copy by an authorised signatory. The contents of this email may relate
> > to dealings with other companies under the control of BAE Systems
> > Applied Intelligence Limited, details of which can be found at
> > http://www.baesystems.com/Businesses/index.htm.
