accumulo-user mailing list archives

From David Medinets <>
Subject Re: How does Accumulo compare to HBase
Date Fri, 11 Jul 2014 12:52:31 GMT
>On Fri, Jul 11, 2014 at 7:25 AM, <> wrote:
>The entirety of both data corpuses were re-loaded every night?


>What did the users do while the data was dropped and reloaded?

The technique of 'dropping and reloading' was not used.

Users were not impacted. For the original system, we used
a combination of Sqoop and the Fair Scheduler in Hadoop to
throttle the export. For Accumulo, we created new tables using
a date-based naming convention. Accumulo queries used a lookup
process to find the current table. When the new table was
ready, it was automatically used.
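A minimal sketch of the date-named table switchover described above. It assumes a hypothetical `corpus_YYYYMMDD` naming scheme and a hypothetical set of tables marked "ready" once their nightly ingest finishes. The message does not describe the actual lookup process, and in a real deployment the table names would come from Accumulo's table listing rather than a Python set.

```python
from datetime import date

# Hypothetical prefix; the original message does not give the naming scheme.
TABLE_PREFIX = "corpus_"

def table_name_for(load_date: date) -> str:
    """Name the table for one nightly load, e.g. corpus_20140711."""
    return f"{TABLE_PREFIX}{load_date:%Y%m%d}"

def current_table(table_names, ready):
    """Return the newest date-named table whose load has completed.

    table_names: every table name known to the cluster.
    ready: hypothetical set of tables whose ingest has finished.
    """
    candidates = []
    for name in table_names:
        suffix = name[len(TABLE_PREFIX):]
        if (name.startswith(TABLE_PREFIX)
                and len(suffix) == 8 and suffix.isdigit()
                and name in ready):
            candidates.append(name)
    # Fixed-width YYYYMMDD suffixes sort lexicographically in date order.
    return max(candidates) if candidates else None

tables = {"corpus_20140709", "corpus_20140710", "corpus_20140711", "other"}
ready = {"corpus_20140709", "corpus_20140710"}
print(current_table(tables, ready))  # tonight's table is still loading

ready.add(table_name_for(date(2014, 7, 11)))  # load finished
print(current_table(tables, ready))  # queries now switch to the new table
```

Because queries resolve the table name at lookup time, the old table keeps serving reads until the new one is marked ready, so no drop-and-reload window is visible to users.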

>What happened in the middle of night if the job failed?

Why has this conversation topic changed to "Is David
competent to design an ingest system"?

>Couldn’t you identify the incremental updates to the two sources
>and incrementally load the new data into the combined target?

Yes, we could. But, for reasons not germane to this conversation,
we pulled the whole corpus.

>This brute force implementation is only applicable to a few use
>cases with lax SLAs.


>>From: David Medinets []
>>Last year, I used Accumulo's rapid ingest ability to join two data
>>silos into one dataset. Every field was fully indexed. Having all
>>of the data in one place allowed cross-referencing queries to be
>>executed. For various reasons, this kind of query was not possible
>>using the existing technology. The rapid ingest was important
>>because a new copy of the data silos was pulled every night.
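The "every field fully indexed" layout from the quoted message can be sketched as an inverted index over both silos. This is an assumption for illustration only: it uses one common layout for key-value stores like Accumulo (row = field value, column family = field name, column qualifier = record id), modeled here with plain Python dictionaries. The silo names, record ids, and field names are hypothetical, and the original message does not say which layout was used.

```python
from collections import defaultdict

def index_records(silo_name, records, index):
    """Add an index entry for every field of every record.

    Each entry mirrors a hypothetical key layout:
    row = field value, column family = field name,
    column qualifier = "<silo>:<record id>".
    """
    for record_id, fields in records.items():
        for field, value in fields.items():
            index[value].add((field, f"{silo_name}:{record_id}"))

def cross_reference(index, value):
    """Return every (field, record) pair, from either silo, holding value."""
    return sorted(index.get(value, set()))

index = defaultdict(set)
index_records("silo_a", {"a1": {"email": "x@example.com", "name": "Ann"}}, index)
index_records("silo_b", {"b7": {"contact": "x@example.com"}}, index)

# A single lookup finds matches in both silos, even under different field names.
print(cross_reference(index, "x@example.com"))
```

Indexing every field lets a single lookup on a value find matching records across both silos, which is the kind of cross-referencing query the message describes.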
