accumulo-notifications mailing list archives

From "Jared R (JIRA)" <>
Subject [jira] [Commented] (ACCUMULO-4749) Need a bulk loading test equivalent to continuous ingest
Date Wed, 03 Jan 2018 14:48:00 GMT


Jared R commented on ACCUMULO-4749:

So in that case, if we were to take the route you suggest, I would still need a class
for the generator (creates a linked list of numbers -> writes it to a file -> hands the set
of files from the directories to the loader) and a class for the loader (ingest), which turns
the files from the directory into RFiles and loads the batch of RFiles into Accumulo.
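
To make the generator half concrete, here is a minimal sketch in Python of the "linked list of numbers -> file" step. It is an illustration only, not the proposed class: the function names, the `row,prev` line format, and the file name are all assumptions made up for this example.

```python
import os
import random
import tempfile


def generate_linked_entries(count, rng, prev=None):
    """Return (row, prev_row) pairs; each random row points at the one before it."""
    entries = []
    for _ in range(count):
        row = rng.getrandbits(63)
        entries.append((row, prev))
        prev = row
    return entries


def write_entries(entries, directory, name):
    """Write one 'row,prev' line per entry; an empty prev marks the list head."""
    path = os.path.join(directory, name)
    with open(path, "w") as out:
        for row, prev in entries:
            prev_text = "" if prev is None else format(prev, "016x")
            out.write(f"{row:016x},{prev_text}\n")
    return path


if __name__ == "__main__":
    # Hypothetical usage: one generator run writing one file to a staging directory.
    with tempfile.TemporaryDirectory() as staging:
        entries = generate_linked_entries(5, random.Random(42))
        with open(write_entries(entries, staging, "batch-0000.txt")) as src:
            print(src.read())
```

A real generator would write to a shared filesystem directory that the loader watches; the temporary directory here only keeps the sketch self-contained.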

When I talked to Chris, we discussed having the generators put files of random numbers in
multiple directories; the files are then sequenced and sent to the loader, which uses a
MapReduce job to turn them into RFiles and then loads them into Accumulo. I wish I had a way
to post the diagram we made.
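
In place of the diagram, a rough Python sketch of the loader side of that flow, under heavy assumptions: a plain dict stands in for the Accumulo table, sorting stands in for the MapReduce job that produces RFiles, and the `row,prev` file format and function names are hypothetical. The second function mirrors the verification idea from the issue description quoted below: any referenced row that never got loaded points at orphaned data.

```python
import os
import tempfile


def load_directories(root):
    """Read 'row,prev' files from each subdirectory of root, in sorted order.

    Returns a dict sorted by row. This is only a stand-in for the real
    sort-into-RFiles and bulk load steps.
    """
    table = {}
    for subdir in sorted(os.listdir(root)):
        dir_path = os.path.join(root, subdir)
        if not os.path.isdir(dir_path):
            continue
        for name in sorted(os.listdir(dir_path)):
            with open(os.path.join(dir_path, name)) as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue
                    row, _, prev = line.partition(",")
                    table[row] = prev or None
    return dict(sorted(table.items()))


def find_missing_references(table):
    """Return referenced rows that are absent from the table (orphaned data)."""
    return sorted({p for p in table.values() if p is not None and p not in table})


if __name__ == "__main__":
    # Hypothetical usage: two generator directories, one file each.
    with tempfile.TemporaryDirectory() as root:
        for name, text in (("gen-0", "a,\nb,a\n"), ("gen-1", "c,b\n")):
            os.makedirs(os.path.join(root, name))
            with open(os.path.join(root, name, "part-0.txt"), "w") as out:
                out.write(text)
        print(find_missing_references(load_directories(root)))
```

If a directory were left behind without being loaded, the rows it contained would show up in the result of `find_missing_references`.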

If this can be done without having to copy files, and can possibly reuse the current
BatchWriter and ingest code with some changes for the new classes, then that's great.

> Need a bulk loading test equivalent to continuous ingest
> --------------------------------------------------------
>                 Key: ACCUMULO-4749
>                 URL:
>             Project: Accumulo
>          Issue Type: Improvement
>          Components: test
>            Reporter: Ivan Bella
>            Assignee: Jared R
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
> There are some known cases at least in past versions where bulk loading may fail leaving
> the ~blip in place but no transaction left to handle it.  This will result in directories
> of files being left around that are not loaded.  We should create a continuous ingest variant
> that uses bulk loading instead.  Then if this is run with agitation, the continuous ingest
> verification can find data that has been essentially orphaned.

This message was sent by Atlassian JIRA
