impala-dev mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: Fw: Issues with generating testdata for Impala
Date Mon, 25 Jul 2016 15:22:25 GMT
Hi Valencia,
  I have an update on the TPC-H/TPC-DS test data - I'm looking at
automating that part of data generation. I was able to verify that it is
the unmodified output of the TPC-H/TPC-DS data generator utilities (the
versions we have in native-toolchain). The only change is to move each
generated file into a subdirectory.

- Tim
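
To illustrate the "move each generated file into a subdirectory" step described above, here is a minimal Python sketch. It rests on assumptions, not on the actual Impala tooling: that the generator writes one `<table>.tbl` file per table (as TPC-H dbgen does), and that the loader expects a `<dataset>/<table>/` directory containing that file, which is inferred from the `.../impala-data/tpch/lineitem` path in the error quoted further down the thread. The function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def move_into_table_dirs(generated_dir: Path, dataset_root: Path,
                         suffix: str = ".tbl") -> List[Path]:
    """Move every <table><suffix> file into dataset_root/<table>/ (assumed layout)."""
    if not suffix:
        raise ValueError("suffix must be non-empty")
    moved = []
    for src in sorted(generated_dir.glob("*" + suffix)):
        if not src.is_file():
            continue
        table = src.name[: -len(suffix)]      # "lineitem.tbl" -> "lineitem"
        table_dir = dataset_root / table
        table_dir.mkdir(parents=True, exist_ok=True)
        dest = table_dir / src.name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


def tables_without_files(dataset_root: Path, tables: List[str]) -> List[str]:
    """Return the tables whose directory is missing or holds no files."""
    missing = []
    for table in tables:
        table_dir = dataset_root / table
        if not table_dir.is_dir() or not any(p.is_file() for p in table_dir.iterdir()):
            missing.append(table)
    return missing


if __name__ == "__main__":
    # Demo on throwaway files; real generator output would replace these.
    with tempfile.TemporaryDirectory() as tmp:
        generated = Path(tmp) / "generated"
        generated.mkdir()
        for name in ("lineitem.tbl", "orders.tbl"):
            (generated / name).write_text("1|demo|\n")
        root = Path(tmp) / "impala-data" / "tpch"
        print(move_into_table_dirs(generated, root))
        print(tables_without_files(root, ["lineitem", "orders", "nation"]))
```

Running `tables_without_files` before the load step is one way to catch an empty or missing table directory early, rather than hitting a "No files matching path" error from Hive.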



On Tue, Jul 19, 2016 at 9:23 PM, Valencia Serrao <vserrao@us.ibm.com> wrote:

> Hi Tim,
>
> Thanks for the update.
>
> Regards,
> Valencia
>
> From: Tim Armstrong <tarmstrong@cloudera.com>
> To: Valencia Serrao/Austin/Contr/IBM@IBMUS
> Cc: Alex Behm <alex.behm@cloudera.com>, Casey Ching <casey@cloudera.com>,
> dev@impala.incubator.apache.org, Manish Patil/Austin/Contr/IBM@IBMUS,
> Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
> Jagadale/Austin/Contr/IBM@IBMUS
> Date: 07/20/2016 02:35 AM
>
> Subject: Re: Fw: Issues with generating testdata for Impala
> ------------------------------
>
>
>
> Hi Valencia,
>   I wasn't able to get a clear answer, but as far as we know it hasn't
> been modified.
>
> - Tim
>
> On Tue, Jul 12, 2016 at 4:59 AM, Valencia Serrao <vserrao@us.ibm.com> wrote:
>
>    Hi Tim,
>
>    Thank you for responding.
>
>    Please do let me know if any post-processing was done on the data at
>    https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp.
>
>    Regards,
>    Valencia
>
>
>    From: Tim Armstrong <tarmstrong@cloudera.com>
>    To: Valencia Serrao/Austin/Contr/IBM@IBMUS
>    Cc: Casey Ching <casey@cloudera.com>, Alex Behm <alex.behm@cloudera.com>,
>    dev@impala.incubator.apache.org,
>    Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>    Jagadale/Austin/Contr/IBM@IBMUS, Manish Patil/Austin/Contr/IBM@IBMUS
>    Date: 07/08/2016 01:31 AM
>
>
>    Subject: Re: Fw: Issues with generating testdata for Impala
>    ------------------------------
>
>
>
>    Hi Valencia,
>      The data is scale factor 1 for the TPC-H and TPC-DS benchmarks:
>    http://www.tpc.org/tpc_documents_current_versions/current_specifications.asp
>
>    I imagine you could reconstruct it using their data generators.
>
>    I'm unsure if we modified those data generators at all or did any
>    postprocessing. I'm going to check if anyone knows exactly how that data
>    was generated originally.
>
>    On Wed, Jul 6, 2016 at 10:52 PM, Valencia Serrao <vserrao@us.ibm.com> wrote:
>       Hi Casey/Alex/Tim,
>
>          I need to know whether it is possible to generate the tpch and
>          tpcds data without using the tars you provided at
>          https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp.
>          When I tried to load data without using the tpch and tpcds tars,
>          the functional-query data loaded successfully, but I got the following
>          error during the TPC-H data load step:
>
>          Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem (state=42000,code=40000)
>          org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>              at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:235)
>              at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:221)
>              at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:244)
>              at org.apache.hive.beeline.Commands.executeInternal(Commands.java:893)
>              at org.apache.hive.beeline.Commands.execute(Commands.java:1079)
>              at org.apache.hive.beeline.Commands.sql(Commands.java:976)
>              at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1085)
>              at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:917)
>              at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:895)
>              at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:837)
>              at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:482)
>              at org.apache.hive.beeline.BeeLine.main(BeeLine.java:465)
>              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>              at java.lang.reflect.Method.invoke(Method.java:606)
>              at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>              at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
>          Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/ImpalaPPC/testdata/impala-data/tpch/lineitem'': No files matching path file:/ImpalaPPC/testdata/impala-data/tpch/lineitem
>
>
>          Regards,
>          Valencia
>
>          From: Casey Ching <casey@cloudera.com>
>          To: Alex Behm <alex.behm@cloudera.com>, dev@impala.incubator.apache.org
>          Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha
>          Panpaliya/Austin/Contr/IBM@IBMUS, Valencia
>          Serrao/Austin/Contr/IBM@IBMUS
>          Date: 05/04/2016 11:51 AM
>          Subject: Re: Fw: Issues with generating testdata for Impala
>          ------------------------------
>
>
>
>
>          Comment inline below
>
>          On May 3, 2016 at 11:18:06 PM, Alex Behm (alex.behm@cloudera.com) wrote:
>          Hi Valencia,
>
>                                  I'm sorry you are having so much trouble
>                                  with our setup. Let's see what we
>                                  can do.
>
>                                  There was an infra issue with receiving
>                                  the logs you sent me. The
>                                  email/attachment got rejected on our
>                                  side. Maybe you can upload the logs
>                                  somewhere so I can grab them?
>
>                                  See more responses inline below.
>
>                                  On Sat, Apr 30, 2016 at 5:01 AM, Valencia Serrao <vserrao@us.ibm.com> wrote:
>
>                                  > Hi Alex,
>                                  >
>                                  > I was going deeper through the logs. I have some findings and queries:
>                                  >
>                                  > 1. At the "Invalidating Metadata" step (as mentioned in the mail below), I
>                                  > noticed that it is trying to use Kerberos. Perhaps this is preventing the
>                                  > testdata generation from proceeding, as we are not using Kerberos.
>                                  > How can this be done without involving Kerberos support?
>                                  >
>                                  Kerberos is certainly not needed to build and run tests.
>
>                                  >
>                                  > 2. I executed the fe tests despite the incomplete testdata generation;
>                                  > the tests started and, as expected, failed. Many of the failures (null pointer
>                                  > exceptions in AuthorizationTests) have a common cause: "tpch database does
>                                  > not exist."
>                                  > e.g. as shown in
>                                  > .Impala/cluster_logs/query_tests/test-run-workload.log.
>                                  >
>                                  > Does the "tpch" database get created after the current blocker step
>                                  > "Invalidating Metadata"?
>                                  >
>
>                                  Yes, the TPCH database is created and
>                                  loaded as part of that first phase.
>                                  However, the data files are not yet
>                                  publicly accessible. Let me work on
>                                  that from my side, and get back to you
>                                  soon. One way or the other we'll be
>                                  able to provide you with the data.
>
>          The data is at
>          https://github.com/cloudera/Impala-docker-hub/tree/master/prereqs/container_root/tmp.
>          The files are split into 50 MB pieces for git. You can put them back
>          together as is done in
>          https://github.com/cloudera/Impala-docker-hub/blob/master/complete/Dockerfile
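
For readers who cannot run the Dockerfile itself, the general idea of putting split pieces back together can be sketched in Python as below. This is only an illustration of byte-wise concatenation in name order; it is not taken from the Dockerfile, which remains the authoritative reference. The piece names, the prefix, and the `reassemble` helper are hypothetical, and the sketch assumes the pieces sort correctly by name (as `split`-style suffixes such as `aa`, `ab` or zero-padded numbers do).

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def reassemble(pieces_dir: Path, prefix: str, output: Path) -> List[Path]:
    """Concatenate all files named <prefix>* in pieces_dir, in sorted name order."""
    pieces = sorted(
        p for p in pieces_dir.glob(prefix + "*") if p.is_file() and p != output
    )
    if not pieces:
        raise FileNotFoundError(f"no pieces matching {prefix}* in {pieces_dir}")
    with output.open("wb") as out:
        for piece in pieces:
            with piece.open("rb") as src:
                shutil.copyfileobj(src, out)   # streams, so large pieces are fine
    return pieces


if __name__ == "__main__":
    # Demo with tiny stand-in pieces; the names below are made up.
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "data.tar.gz.part-aa").write_bytes(b"first-")
        (d / "data.tar.gz.part-ab").write_bytes(b"second")
        out = d / "rebuilt.tar.gz"
        used = reassemble(d, "data.tar.gz.part-", out)
        print([p.name for p in used], out.read_bytes())
```

The rebuilt archive would then be unpacked with the usual tar tooling; comparing a checksum against a known-good copy, if one is available, is a reasonable sanity check.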
>
>                                  >
>                                  > 3. In the fe test console output log, another error is shown:
>                                  > ============================= test session starts ==============================
>                                  > platform linux2 -- Python 2.7.5 -- py-1.4.30 -- pytest-2.7.2
>                                  > rootdir: /work/, inifile:
>                                  > plugins: random, xdist
>                                  > ERROR: file not found:/work/Impala/../Impala-auxiliary-tests/tests/aux_custom_cluster_tests/
>                                  >
>                                  > These are not present/created on my VM. May I know when these get created?
>                                  >
>                                  > 4. Could you also share the total number of fe tests?
>                                  >
>
>                                  I'll privately send you the console
>                                  output from a successful FE run.
>                                  Hopefully that can help.
>
>                                  Cheers,
>
>                                  Alex
>
>                                  >
>                                  >
>                                  > Looking forward to your reply.
>                                  >
>                                  > Regards,
>                                  > Valencia
>                                  >
>                                  >
>                                  >
>                                  > From: Valencia Serrao/Austin/Contr/IBM
>                                  > To: dev@impala.incubator.apache.org, Alex Behm <alex.behm@cloudera.com>
>                                  > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS,
>                                  > Valencia Serrao/Austin/Contr/IBM@IBMUS
>                                  > Date: 04/30/2016 09:05 AM
>                                  > Subject: Fw: Issues with generating testdata for Impala
>                                  > ------------------------------
>                                  >
>                                  >
>                                  >
>                                  > Hi Alex,
>                                  >
>                                  > I've been able to make some progress on testdata generation; however, I
>                                  > still face the following issues:
>                                  >
>                                  >
>                                  >
>                                  *******************************************************************************************************************************************************************
>
>                                  > Invalidating Metadata
>                                  >
>                                  > (load-functional-query-exhaustive-impala-load-generated-parquet-none-none.sql):
>                                  > INSERT OVERWRITE TABLE functional_parquet.alltypes partition (year, month)
>                                  > SELECT id, bool_col, tinyint_col, smallint_col, int_col, bigint_col,
>                                  > float_col, double_col, date_string_col, string_col, timestamp_col, year,
>                                  > month
>                                  > FROM functional.alltypes
>                                  >
>                                  > Data Loading from Impala failed with error: ImpalaBeeswaxException:
>                                  > INNER EXCEPTION: <class 'socket.error'>
>                                  > MESSAGE: [Errno 104] Connection reset by peer
>                                  > Error in /root/nishidha/Impala/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
>                                  > Error in /root/nishidha/Impala/buildall.sh at line 368:
>                                  > ${IMPALA_HOME}/testdata/bin/create-load-data.sh ${CREATE_LOAD_DATA_ARGS} <<< Y
>                                  >
>                                  >
>                                  *************************************************************************************************************************************************************************
>
>                                  >
>                                  > I continued with the fe tests as is. Here is the complete output log.
>                                  > [attachment "fe_test_output.zip" deleted by Valencia Serrao/Austin/Contr/IBM]
>                                  >
>                                  > Cluster logs: [attachment "cluster_logs.7z" deleted by Valencia Serrao/Austin/Contr/IBM]
>                                  >
>                                  > Kindly guide me on the same.
>                                  >
>                                  > Regards,
>                                  > Valencia
>                                  > ----- Forwarded by Valencia Serrao/Austin/Contr/IBM on 04/29/2016 10:57 AM -----
>                                  >
>                                  > From: Sudarshan Jagadale/Austin/Contr/IBM
>                                  > To: Valencia Serrao/Austin/Contr/IBM@IBMUS
>                                  > Date: 04/29/2016 10:49 AM
>                                  > Subject: Fw: Issues with generating testdata for Impala
>                                  > ------------------------------
>                                  >
>                                  >
>                                  > FYI
>                                  > Thanks and Regards
>                                  > Sudarshan Jagadale
>                                  > Power Open Source Solutions
>                                  > ----- Forwarded by Sudarshan Jagadale/Austin/Contr/IBM on 04/29/2016 10:48 AM -----
>                                  >
>                                  > From: Alex Behm <alex.behm@cloudera.com>
>                                  > To: dev@impala.incubator.apache.org
>                                  > Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
>                                  > Date: 04/28/2016 09:34 PM
>                                  > Subject: Re: Issues with generating testdata for Impala
>                                  > ------------------------------
>                                  >
>                                  >
>                                  >
>                                  > Hi Valencia,
>                                  >
>                                  > sorry I did not get the attachment. Would you be able to tar.gz and attach
>                                  > the whole cluster_logs directory?
>                                  >
>                                  > Alex
>                                  >
>                                  > On Thu, Apr 28, 2016 at 6:23 AM, Valencia Serrao <vserrao@us.ibm.com> wrote:
>                                  >
>                                  > Hi Alex,
>                                  >
>                                  > I tried building Impala again with the following:
>                                  > HDFS CDH 5.7.0 (http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_57.html#topic_3)
>                                  > HBASE CDH 5.7.0 SNAPSHOT (http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.7.0.tar.gz)
>                                  > - this required patching in a fix (https://issues.apache.org/jira/secure/attachment/12792536/HBASE-15322-branch-1.2.patch)
>                                  > HIVE CDH 5.8.0 SNAPSHOT
>                                  > HIVE CDH 5.8.0 SNAPSHOT
>                                  >
>                                  > With the above combination, I'm able to move past the exception and
>                                  > also have the RegionServer service up and running. However, it now gives
>                                  > the error below:
>                                  >
>                                  >
>                                  >
>                                  ********************************************************************************************************************
>
>                                  >
>                                  > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                                  > CREATE EXTERNAL TABLE IF NOT EXISTS functional.decimal_tbl (
>                                  > d1 DECIMAL,
>                                  > d2 DECIMAL(10, 0),
>                                  > d3 DECIMAL(20, 10),
>                                  > d4 DECIMAL(38, 38),
>                                  > d5 DECIMAL(10, 5))
>                                  > PARTITIONED BY (d6 DECIMAL(9, 0))
>                                  > ROW FORMAT delimited fields terminated by ','
>                                  > STORED AS TEXTFILE
>                                  > LOCATION '/test-warehouse/decimal_tbl'
>                                  >
>                                  > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                                  > USE functional
>                                  >
>                                  > (load-functional-query-exhaustive-impala-generated-text-none-none.sql):
>                                  > ALTER TABLE decimal_tbl ADD IF NOT EXISTS PARTITION(d6=1)
>                                  >
>                                  > Data Loading from Impala failed with error: ImpalaBeeswaxException:
>                                  > INNER EXCEPTION: <class 'impala._thrift_gen.beeswax.ttypes.BeeswaxException'>
>                                  > MESSAGE:
>                                  > Error: null
>                                  >
>                                  >
>                                  ******************************************************************************************************************
>
>                                  >
>                                  > Here is the complete log for the same. (See attached file:
>                                  > data-load-functional-exhaustive.log)
>                                  >
>                                  > It would be great if you could guide me on this issue, so I could proceed
>                                  > with the fe tests.
>                                  >
>                                  > Still awaiting a link to the source code of HDFS CDH 5.8.0.
>                                  >
>                                  > Regards,
>                                  > Valencia
>                                  >
>                                  >
>                                  >
>                                  >
>
>
>
>
>
