hadoop-hdfs-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Clarification on T file
Date Sun, 29 Apr 2012 12:50:47 GMT
Hey Maninder,

In some ways the TFile is close to SequenceFiles.

On Fri, Apr 20, 2012 at 8:19 PM, maninder batth
<batth.maninder@gmail.com> wrote:
> My requirements are to save variable-sized binary records and to be able to
> query them later on. So I was looking at TFile and had some doubts.
> 1. Is the datablock in the TFile a fixed size or variable size? If it is
> fixed, what happens when a record cannot fit in the datablock? Would you
> normally fill the empty space with zeros or spread the record over 2
> datablocks?
> 2. Is there any downside to having variable-sized datablocks?

A data block is completed only when, at the end of an append, the
current size of the block is >= min-size-of-block.

Hence the data block isn't "fixed" in size. If the block is still below
the minimum, another record is written to it and the condition is
checked again (which would then trigger a block completion once the
minimum is reached). A block can therefore end up larger than the
minimum, and block sizes vary with the records written.
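
To make the rule concrete, here is a rough Python sketch of it. This is
not TFile code: pack_records and its parameters are hypothetical names,
it ignores compression and any per-record overhead, and it assumes that
closing the file completes whatever partial block is left.

```python
def pack_records(record_sizes, min_block_size):
    """Group record sizes into logical blocks using the append-then-check rule."""
    blocks = []    # sizes of completed blocks, in bytes
    current = 0    # bytes in the block currently being filled
    for size in record_sizes:
        current += size                  # the record is appended whole
        if current >= min_block_size:    # the check runs only after the append
            blocks.append(current)       # block completion
            current = 0
    if current > 0:
        # Assumption: closing the file completes the last partial block.
        blocks.append(current)
    return blocks


print(pack_records([40, 70, 30, 30, 30, 200, 10], min_block_size=100))
# prints [110, 290, 10]
```

With a minimum of 100 bytes the blocks come out as 110, 290 and 10
bytes: no block has a fixed size, and the 200-byte record simply makes
its block larger.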

> 3. Are the records synced with the file at the boundary of a datablock, or
> are they just written to the file system? The question is like the write()
> call in Linux vs fsync().

I'm unsure what you mean by a "datablock" here. TFiles don't work at
the FS level, and the "datablocks" in them are logical. Could you
clarify this question given the answers to (1) and (2)?

Harsh J
