accumulo-user mailing list archives

From Keith Turner <ke...@deenlo.com>
Subject Re: Bulk ingestion of different locality groups at different times
Date Fri, 28 Oct 2016 14:14:46 GMT
On Fri, Oct 28, 2016 at 10:03 AM, Mario Pastorelli
<mario.pastorelli@teralytics.ch> wrote:
> Thanks for the answers. About the huge major compaction, the question is not
> about when the major compaction will be but more about how big the major
> compaction of two bulk-loaded files will be. The rfiles will already be
> sorted and they will contain two different locality groups, and Accumulo
> stores locality groups separately on disk. The compaction should not do
> much here, just reuse the created groups, right?

Accumulo stores multiple locality groups in a single file.
A compaction makes one pass over its input files for each locality
group and writes what it reads to a new file.  The following is a
sketch of what compactions do.

inputIter = //an iterator over the files being compacted
outputRFile = //the tmp file the compaction is writing to

//one pass per configured locality group
for(localityGroup : localityGroups) {
    //inclusive seek: read only the families in this LG
    inputIter.seek(new Range(), localityGroup.getFamilies(), true)
    outputRFile.startLocalityGroup(localityGroup.getFamilies())

    //write inputIter to outputRFile
}

//write default locality group
//exclusive seek: read all families not in a configured LG
inputIter.seek(new Range(), localityGroups.getAllFamilies(), false)
outputRFile.startDefaultLocalityGroup()

//write inputIter to outputRFile
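
To make the passes above concrete, here is a minimal, self-contained Java
simulation. It is only a sketch and does not use the Accumulo API: Cell,
scan and compact are hypothetical names invented for illustration, an
in-memory list stands in for each rfile, and keys are reduced to row and
column family. It assumes two bulk-loaded files, one per locality group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class CompactionSketch {
    // Hypothetical, simplified key/value; real Accumulo keys carry more fields.
    static final class Cell {
        final String row, family, value;
        Cell(String row, String family, String value) {
            this.row = row; this.family = family; this.value = value;
        }
    }

    // Stand-in for seek(range, families, inclusive) over the full range:
    // inclusive keeps cells whose family is in the set, exclusive keeps the rest.
    static List<Cell> scan(List<Cell> merged, Set<String> families, boolean inclusive) {
        List<Cell> out = new ArrayList<>();
        for (Cell c : merged) {
            if (families.contains(c.family) == inclusive) {
                out.add(c);
            }
        }
        return out;
    }

    // One pass per configured locality group, then one pass for the default group.
    static Map<String, List<Cell>> compact(List<List<Cell>> inputFiles,
                                           Map<String, Set<String>> localityGroups) {
        // Merged, sorted view over all input files (the "inputIter" of the sketch).
        List<Cell> merged = new ArrayList<>();
        for (List<Cell> file : inputFiles) {
            merged.addAll(file);
        }
        merged.sort(Comparator.comparing((Cell c) -> c.row).thenComparing(c -> c.family));

        // The output "file": one section per locality group, in write order.
        Map<String, List<Cell>> output = new LinkedHashMap<>();
        Set<String> allFamilies = new TreeSet<>();
        for (Map.Entry<String, Set<String>> lg : localityGroups.entrySet()) {
            output.put(lg.getKey(), scan(merged, lg.getValue(), true));
            allFamilies.addAll(lg.getValue());
        }
        output.put("<default>", scan(merged, allFamilies, false));
        return output;
    }

    public static void main(String[] args) {
        List<Cell> fileA = List.of(new Cell("r1", "fa", "1"), new Cell("r2", "fa", "2"));
        List<Cell> fileB = List.of(new Cell("r1", "fb", "3"), new Cell("r2", "fb", "4"));
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        groups.put("A", Set.of("fa"));
        groups.put("B", Set.of("fb"));

        Map<String, List<Cell>> out = compact(List.of(fileA, fileB), groups);
        for (Map.Entry<String, List<Cell>> e : out.entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " cells written");
        }
    }
}
```

In this simulation every input cell is read and written once, even though
each input file holds only one locality group.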

>
> On Fri, Oct 28, 2016 at 3:54 PM, <dlmarion@comcast.net> wrote:
>>
>> >>> Is Accumulo able to import these files, considering that they are two
>> >>> different locality groups
>>
>>  Yes.
>>
>> >>> without triggering a huge major compaction?
>>
>> Depends on your table.compaction.major.ratio and table.file.max settings.
>>
>>
>> Sorry, not a real answer, but I think the answer is "it depends"
>>
>> ________________________________
>> From: "Mario Pastorelli" <mario.pastorelli@teralytics.ch>
>> To: user@accumulo.apache.org
>> Sent: Friday, October 28, 2016 9:37:13 AM
>> Subject: Bulk ingestion of different locality groups at different times
>>
>>
>> Hi,
>>
>> I have a question about using bulk ingestion for a rather special case.
>> Let's say that I have the locality groups A and B. The values of each
>> locality group are written to Accumulo at different times, which means
>> that first we ingest all the cells of the group A and then of B. We use
>> Spark to ingest those records. Right now we write all the values with a
>> custom writer but we would like to create the rfiles directly with Spark. In
>> the case above, we would have two jobs creating the rfiles for the two
>> distinct locality groups. Is Accumulo able to import these files,
>> considering that they are two different locality groups, without triggering
>> a huge major compaction?  If not, what strategy would you suggest for the
>> above use case?
>>
>> Thanks,
>> Mario
>>
>> --
>> Mario Pastorelli | TERALYTICS
>>
>> software engineer
>>
>> Teralytics AG | Zollstrasse 62 | 8005 Zurich | Switzerland
>> phone: +41794381682
>> email: mario.pastorelli@teralytics.ch
>> www.teralytics.net
>>
>> Company registration number: CH-020.3.037.709-7 | Trade register Canton
>> Zurich
>> Board of directors: Georg Polzer, Luciano Franceschina, Mark Schmitz, Yann
>> de Vries
>>
>> This e-mail message contains confidential information which is for the
>> sole attention and use of the intended recipient. Please notify us at once
>> if you think that it may not be intended for you and delete it immediately.
>>
>>
>
