accumulo-user mailing list archives

From "roman.drapeko@baesystems.com" <roman.drap...@baesystems.com>
Subject micro compaction
Date Tue, 09 Jun 2015 15:28:53 GMT
Hi guys,

While doing pre-analytics we generate hundreds of millions of mutations that result in 1-100
megabytes of useful data after major compaction. We ingest into Accumulo from the mappers of a
MapReduce job. We have found that performance degrades badly as the number of mutations increases.

The obvious improvement is to do some calculations in-memory before sending mutations to Accumulo.
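
A minimal sketch of what such an in-memory step could look like, assuming the values are counts that can simply be summed per key. The class name PreAggregatingBuffer, the string key format and the sink callback are all hypothetical and only for illustration; in a real mapper the sink would presumably build a Mutation and pass it to a BatchWriter, and flush() would also be called from the mapper's cleanup().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side buffer: sums values per key in memory and only
 * hands the combined result to the sink, so fewer updates leave the client.
 */
final class PreAggregatingBuffer {
    private final Map<String, Long> pending = new HashMap<>();
    private final int maxEntries;
    private final BiConsumer<String, Long> sink;

    /**
     * @param maxEntries number of distinct keys to hold before flushing
     * @param sink       receives one (key, summed value) pair per distinct key
     */
    PreAggregatingBuffer(int maxEntries, BiConsumer<String, Long> sink) {
        this.maxEntries = maxEntries;
        this.sink = sink;
    }

    /** Combine the new value with whatever is already buffered for this key. */
    void add(String key, long value) {
        pending.merge(key, value, Long::sum);
        if (pending.size() >= maxEntries) {
            flush();
        }
    }

    /** Emit every buffered entry once, then empty the buffer. */
    void flush() {
        pending.forEach(sink);
        pending.clear();
    }
}
```

Because the buffer is flushed whenever it fills up, the same key can still be emitted more than once over the lifetime of a job, so this only reduces the number of mutations; the final merge would still have to happen on the server side.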

Of course, at the same time we are looking for a solution to minimize development effort.

I guess I am asking about micro compaction/ingest-time iterators on the client side (before
data is sent to Accumulo).

As far as I understand, Accumulo does not support this; is that correct? If so, are there any
plans to support this functionality in the future?

Thanks
Roman

