asterixdb-users mailing list archives

From Pouria Pirzadeh <pouria.pirza...@gmail.com>
Subject Re: Data in AsterixDB skewing towards one node
Date Wed, 04 Nov 2015 19:30:07 GMT
Hi Max,

Can you please explain this part a bit more:
"… When I load the external data it is all saved on a single node"

Are you using "external datasets" or "internal datasets, loaded from files
on HDFS"?
The fact is if you are using "external datasets", then AsterixDB does not
really load anything. It just gets the location of blocks on HDFS and
remembers them. So in this case, if there is any issue with uniform
distribution of data files, that is really related to HDFS and not
AsterixDB. But if you are 'loading' an "internal" dataset by reading
records from files on HDFS and you see issues with uniform distribution of
created on-disk components, then that is another issue and could be related
to AsterixDB.
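
To tell the two cases apart, it can help to put a number on the skew. Below is
a minimal Python sketch, assuming you have already collected a per-node count
of records (or of HDFS blocks) by some means. The function name, node names,
and counts are hypothetical and only for illustration; nothing here calls an
AsterixDB or HDFS API.

```python
def skew_ratio(counts_per_node):
    """Return the largest per-node count divided by the mean count.

    1.0 means a perfectly even spread; a value equal to the number of
    nodes means everything sits on a single node.
    """
    counts = list(counts_per_node.values())
    total = sum(counts)
    if not counts or total == 0:
        raise ValueError("need at least one node holding data")
    mean = total / len(counts)
    return max(counts) / mean


# Hypothetical counts for a 4-node cluster (made-up numbers).
balanced = {"nc1": 1500, "nc2": 1500, "nc3": 1500, "nc4": 1500}
skewed = {"nc1": 6000, "nc2": 0, "nc3": 0, "nc4": 0}

print(skew_ratio(balanced))  # 1.0
print(skew_ratio(skewed))    # 4.0
```

Running this once on HDFS block counts and once on the dataset's on-disk
components would show which layer the imbalance comes from.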

Pouria



On Wed, Nov 4, 2015 at 11:23 AM, <schultze@informatik.hu-berlin.de> wrote:

> Hello,
>
> I have a cluster setup of AsterixDB running 4 nodes with the first being
> the master node and a node controller running on each of them. As a test I
> run TPC-H queries on them, loading the generated TPC-H datasets from a
> Hadoop Distributed File System (HDFS).
>
> When I load the external data it is all saved on a single node. For later
> querying that means that most of the computations are done by that single
> node which slows down the whole query (and makes the distributed
> computation idea obsolete).
>
> By now I have tried to set up the system several times and, interestingly
> enough, two times I was able to get a fully functional system. Unfortunately
> I currently cannot reproduce a functional system state, and whenever I try
> to do a new setup I get the data skewing towards one node.
>
> Has that ever happened before? Do you know the reason for this or how to
> fix that?
>
> Regards, Max
>
>
