impala-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tim Armstrong <>
Subject Re: Parttition_id parameter of HdfsScanNodeRange::AllocateScanRange
Date Tue, 08 Aug 2017 16:25:49 GMT
Based on the comment of AllocateScanRange(), it seems like in some cases
the partition id is ignored so a dummy value can safely be passed in. I'm
honestly not sure why it was optional in those cases - it doesn't seem like
it would be hard plumb through the valid partition ID.

So the change seems reasonable to me.

On Tue, Aug 8, 2017 at 9:10 AM, Gabor Kaszab <>

> Hey,
> I'm currently working on IMPALA-5412
> <> where I enhanced
> FileDescMap to have a key of (partition_id, filename) pair instead of a
> single filename.
> With this I have to extend the HdfsScanNode::GetFileDesc to expect the
> partition_id as well. In turn one of the functions calling this GetFileDesc
> is HdfsScanNode::AllocateScanRange already has a partition_id as an input
> parameter so it seemed obvious to pass that to GetFileDesc.
> Unfortunately, after some debugging of the impala code that crashed during
> the "Loading Kudu TPCH" phase of testing I observed that this
> AllocateScanRange is sometimes given '-1' as a partition_id. (in one
> occasion it is given a scalar_reader->col_idx() as partition_id.)
> As a result no matching file descriptor is found by GetFileDesc where a
> DCHECK fails.
> I naively modified how AllocateScanRange is called to receive valid
> partition IDs instead of '-1' and col_idx(). This fixes the issues I
> encountered and mentioned above, however I'm not sure that this change
> doesn't break some other logic. (Core tests are being run, but even if they
> pass I'm not sure this change is fine)
> Was it intentional to give those non-valid partition_id's to
> AllocateScanRange? Do you see any risk of passing valid IDs instead?
> Cheers,
> Gabor

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message