accumulo-user mailing list archives

From Dong Zhou <>
Subject Re: Metadata DataFileValue not Matching the Output of rfile-info Command
Date Tue, 13 Feb 2018 19:19:26 GMT
I see. Yes, the file is loaded via bulk import.
I would like to find the most precise count of the entries a table
contains. Would running a compaction and then scanning the metadata table for
the entry counts be a sufficient method?
Also, what would happen if a merge operation runs before the compaction?
Would it try to merge this tablet into other tablets, since the file size
and entry count look fairly small at the time it scans the metadata table?
Or would it compact the table before running the merge?
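
To make the counting idea concrete, here is a minimal sketch of how the per-file values from the metadata table could be summed. It assumes each file value has already been read out as a string of the form "size,numEntries" (optionally followed by ",time"); the sample paths and numbers are hypothetical, and the helper names are made up for illustration, not part of Accumulo. Bulk-imported "I" files that still report 0 entries are listed separately, since their counts are not known until they are compacted.

```python
def parse_data_file_value(value):
    """Split a 'size,numEntries[,time]' string into (size_bytes, num_entries)."""
    parts = value.split(",")
    return int(parts[0]), int(parts[1])


def summarize(file_entries):
    """Sum entry counts over (file_path, value_string) pairs.

    Returns (total_entries, uncounted_files), where uncounted_files are
    bulk-imported "I" files with a non-zero size but 0 recorded entries.
    """
    total = 0
    uncounted = []
    for path, value in file_entries:
        size, num_entries = parse_data_file_value(value)
        name = path.rsplit("/", 1)[-1]
        if name.startswith("I") and size > 0 and num_entries == 0:
            uncounted.append(path)
        total += num_entries
    return total, uncounted


# Hypothetical sample values, not taken from a real cluster.
sample = [
    ("hdfs://example/tables/1/t-0001/I0000001.rf", "52428800,0"),
    ("hdfs://example/tables/1/t-0001/C0000002.rf", "1048576,2500"),
    ("hdfs://example/tables/1/t-0002/F0000003.rf", "2048,40"),
]
total, uncounted = summarize(sample)
```

If the list of uncounted files is non-empty, the total is only a lower bound. A file shared by several tablets would also appear once per tablet, so the sum is best treated as an estimate.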

By the way, thanks for the quick reply. :)


On Tue, Feb 13, 2018 at 11:05 AM Michael Wall <> wrote:

> Hi Dong,
> That file is the result of a bulk import.  I can tell because it starts
> with a capital "I", see
> Bulk files are inspected on import to find all the ranges of data they
> contain.  They are then assigned to all the tablets hosting that data.  So
> one "I" file can belong to more than one tablet.  When that file is
> included in a compaction, the data that is not part of the range the tablet
> is hosting is not rewritten to the new files.
> When inspecting "I" files, Accumulo does not keep track of how many keys
> are in each range.  So for "I" files in the metadata table, the number of
> keys is 0 until that file is compacted.
> Mike
> On Tue, Feb 13, 2018 at 1:37 PM Dong Zhou <> wrote:
>> Hi all,
>> We have noticed that the Accumulo metadata table reports that certain RFiles
>> have a file size but no entry count.
>> For example, <tableId>;<tabletEndRow>
>> file:hdfs://apps/accumulo/tables/<tableId>/<folder>/I001ahdz.rf []  
>> From the metadata table's perspective, it looks like the RFile contains zero
>> entries, but if we run the rfile-info command against the same file, the
>> output shows that the RFile has many entries. If we dump the RFile,
>> we can see that it prints the actual data too.
>> We wonder what the reason behind this is.
>> Thanks,
>> -Dong Zhou
