impala-dev mailing list archives

From Tim Armstrong <tarmstr...@cloudera.com>
Subject Re: LeaseExpiredException while test data creation on ppc64le
Date Tue, 08 Mar 2016 16:58:23 GMT
I don't think we've seen that crash before. It looks like dereferencing a
pointer is what causes the crash. Tracing back through the callstack, it
looks like the expression below is somehow constructing a StringValue that
causes a segfault when dereferenced (0x000000ffff9b6aa0).

inline Status HdfsParquetTableWriter::BaseColumnWriter::AppendRow(TupleRow* row) {
  ++num_values_;
  void* value = expr_ctx_->GetValue(row); <==

I'm not sure why it would be returning an invalid pointer. It's not a NULL
pointer and looks possibly valid and 16-byte aligned. If you have a core
dump it would be interesting to know if that pointer is pointing into
invalid memory or if something else is going on.
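
The alignment observation above can be checked mechanically. The following is a minimal sketch, not code from Impala: `IsAligned` is a hypothetical helper name, and it only tests the low bits of the address. It says nothing about whether the address is actually mapped, which is what the core dump would show.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Impala): returns true if `ptr` is a
// multiple of `alignment` bytes. `alignment` must be a power of two.
inline bool IsAligned(const void* ptr, std::size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  // For a power-of-two alignment, the low bits of an aligned address are zero.
  return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

On a 64-bit build, passing the address from the crash (0x000000ffff9b6aa0) with an alignment of 16 returns true, since the last hex digit is 0.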

> Thanks a lot Tim! I did check some of the impalad*. Error, impalad*.info,
> and other logs in cluster_logs/data_loading. Couple of observations-
> 1. Dependency on SSE3, error says exiting as hardware does not support
> SSE3.
I think if you were able to compile, this is OK. We have some inline
assembly and SSE3 intrinsics, but you probably had to work around those
already to build. You could fix this so that the check isn't done if
running on PowerPC.
> 2. One error says to increase num_of_threads_per_disk (something related
> to number of threads, not sure about exact variable name) while starting
> impalad
Hmm, this is probably because its detection of local disks fails. This is
expected to happen if running on a remote filesystem (e.g. a cloud
filesystem like S3, or some specialised disk hardware like Isilon or DSSD).
If it's happening with local disks, it's probably because it assumes it's
running on Linux with specific filesystem nodes for devices.
> 3. A few log files say that bad_alloc
I think this is the exception that gets thrown when malloc() fails (when
called via the C++ new operator). I wonder if the system is low on memory?
How much RAM do you have?
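
For point 1 above, one way to skip the SSE3 check on PowerPC is a compile-time guard. This is only a sketch under assumptions: `HardwareSupportsRequiredSimd` is a hypothetical name, not Impala's actual startup check, and it assumes a GCC or Clang toolchain, where `__builtin_cpu_supports` and the `__powerpc64__` macro are available.

```cpp
// Hypothetical helper (not Impala's real check): reports whether the CPU
// meets the SIMD requirement that applies to the architecture being built.
inline bool HardwareSupportsRequiredSimd() {
#if defined(__powerpc__) || defined(__powerpc64__)
  // PowerPC has no SSE3. Assumption: the build already replaced the
  // SSE3-specific code paths, so there is nothing to verify here.
  return true;
#elif (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
  // GCC/Clang builtin that queries the CPU's feature flags at runtime.
  return __builtin_cpu_supports("sse3") != 0;
#else
  // Unknown architecture or compiler: do not block startup in this sketch.
  return true;
#endif
}
```

The startup code would then exit only when this returns false, which can no longer happen on ppc64le.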

I think how long it will take probably depends on what the end goal is: if
it's just to get it running, probably a couple more weeks, maybe more if
there are any particularly tricky bugs. If you want to do performance
tuning, I feel like there's probably some more work there. I think we
implicitly depend on certain properties of Intel hardware, e.g. recent
Intel processors have reasonably fast unaligned loads and stores and in
some cases we take advantage of that, but I'm not sure if that's also true
of the processors you're targeting.
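
As an illustration of the unaligned-access point, the portable way to read a multi-byte value from an arbitrary address is to copy it with `memcpy` rather than cast the pointer. This is a generic sketch, not code taken from Impala, and `LoadUnaligned64` is a hypothetical name. Compilers typically turn the copy into a single load on targets where unaligned loads are cheap.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of Impala): reads a 64-bit value from an
// address that may not be 8-byte aligned, without relying on the hardware
// tolerating an unaligned pointer dereference.
inline std::uint64_t LoadUnaligned64(const void* src) {
  assert(src != nullptr);
  std::uint64_t value;
  std::memcpy(&value, src, sizeof(value));  // byte-wise copy, alignment-safe
  return value;
}
```

Code that hashes or scans raw buffers 8 bytes at a time could go through a helper like this, so the same source behaves the same way on x86 and on Power.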

On Tue, Mar 8, 2016 at 8:26 AM, Nishidha Panpaliya <nishidha@us.ibm.com>
wrote:

> Hi Tim,
>
> As you suggested, I disabled codegen and also generated core dump.
>
> Core dump has pointed HashUtil::MurmurHash2_64 being problematic. Please
> see attached log file.
> *(See attached file: hs_err_pid15697.log)*
>
> I tested this function individually in a small test app and it worked.
> Maybe the data given to it was simple enough for it to pass. But in the case
> of Impala, there is some issue with the data/arguments passed to this
> function in a particular case. It looks like this function is not called on
> machines where SSE is supported, so on x86 you might not see this crash. Do
> you suspect anything in this function or the functions calling it? I'm still
> debugging this.
> If you have any clue, please point me to it so that I can try to nail down
> the issue in that direction.
>
> Thanks,
> Nishidha
>
> From: nishidha randad <nishidha27@gmail.com>
> To: Tim Armstrong <tarmstrong@cloudera.com>
> Cc: Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
> Jagadale/Austin/Contr/IBM@IBMUS, dev@impala.incubator.apache.org
> Date: 03/07/2016 10:35 PM
>
> Subject: Re: LeaseExpiredException while test data creation on ppc64le
> ------------------------------
>
>
>
> Thanks a lot Tim! I did check some of the impalad*. Error, impalad*.info,
> and other logs in cluster_logs/data_loading. Couple of observations-
> 1. Dependency on SSE3, error says exiting as hardware does not support
> SSE3.
> 2. One error says to increase num_of_threads_per_disk (something related
> to number of threads, not sure about exact variable name) while starting
> impalad
> 3. A few log files say that bad_alloc
>
> I'm analysing all these errors. I'll dig more into this tomorrow and
> update you.
> One more thing I wanted your help with is predicting the amount of work I
> may be left with and the possible challenges ahead. It would be really great
> if you could point those out to me from the logs I had posted.
>
> Also, about the LLVM 3.7 fixes you did, I was wondering if you have completed
> the upgrade, since you have also started encountering crashes.
>
> Thanks again!
>
> Nishidha
>
> On 7 Mar 2016 21:56, "Tim Armstrong" <tarmstrong@cloudera.com> wrote:
>
>    Hi Nishidha,
>      I started working on our next release cycle towards the end of last
>    week, so I've been looking at LLVM 3.7 and have made a bit of progress
>    getting it working on Intel. We're trying to get it working early so we
>    have plenty of chance to test it.
>
>    RE the TTransportException error, that is often because of a crash.
>    Usually to debug I would first look at the /tmp/impalad.ERROR and
>    /tmp/impalad.INFO logs for the cause of the crash. The embedded JVM also
>    generates hs_err_pid*.log files with a crash report that can sometimes be
>    useful. If that doesn't reveal the cause, then I'd look to see if there is
>    a core dump in the Impala directory (I normally run with "ulimit -c
>    unlimited" set so that a crash will generate a core file).
>
>    I already fixed a couple of problems with codegen in LLVM 3.7,
>    including one crash that was an assertion about struct sizes. I'll be
>    posting the patch soon once I've done a bit more testing.
>
>    It might help you make progress if you disable LLVM codegen by default
>    during data loading by setting the following environment variable:
>
>    export START_CLUSTER_ARGS='--impalad_args=-default_query_options="disable_codegen=1"'
>
>    You can also start the test cluster with the same arguments or just
>    set it in the shell with "set disable_codegen=1".
>
>    ./bin/start-impala-cluster.py --impalad_args=-default_query_options="disable_codegen=1"
>
>    On Mon, Mar 7, 2016 at 5:13 AM, Nishidha Panpaliya <nishidha@us.ibm.com> wrote:
>    Hi Tim,
>
>    Yes, I could fix this SnappyError by building snappy-java for Power
>    and adding the native library for Power into the existing
>    snappy-java-1.0.4.1.jar used by hbase, hive, sentry and hadoop.
>    The test data loading has proceeded further and gave a new
>    exception, shown below, which I'm looking into.
>
>    Data Loading from Impala failed with error: ImpalaBeeswaxException:
>    INNER EXCEPTION: <class
>    'thrift.transport.TTransport.TTransportException'>
>    MESSAGE: None
>
>    Also, I've been able to start impala and try just the following query,
>    as given in https://github.com/cloudera/Impala/wiki/How-to-build-Impala:
>    impala-shell.sh -q"SELECT version()"
>
>    And regarding the patch of my work, I'm sorry for the delay. Although it
>    does not need any CLA to be signed, it is under discussion with our IBM
>    legal team, just to ensure we are compliant with the policies. Hoping to
>    update you on this soon. Could you tell me when you are going to start
>    this new release cycle?
>
>    Thanks,
>    Nishidha
>
>    From: Tim Armstrong <tarmstrong@cloudera.com>
>    To: nishidha panpaliya <nishidha27@gmail.com>
>    Cc: Impala Dev <impala-dev@cloudera.org>,
>    Nishidha Panpaliya/Austin/Contr/IBM@IBMUS, Sudarshan
>    Jagadale/Austin/Contr/IBM@IBMUS
>    Date: 03/05/2016 03:14 AM
>    Subject: Re: LeaseExpiredException while test data creation on ppc64le
>    ------------------------------
>
>
>
>
>    It also looks like it got far enough that you should have a bit of
>    data loaded - have you been able to start impala and run queries on some of
>    those tables?
>
>    We're starting a new release cycle so I'm actually about to focus on
>    upgrading our version of LLVM to 3.7 and getting the Intel support working.
>    I think we're going to be putting a bit of effort into reducing LLVM code
>    generation time: it seems like LLVM 3.7 is slightly slower in some cases.
>
>    We should stay in sync, it would be good to make sure that any changes
>    I make will work for your PowerPC work too. If you want to share any
>    patches (even if you're not formally contributing them) it would be helpful
>    for me to understand what you have already done on this path.
>
>    Cheers,
>    Tim
>
>    On Fri, Mar 4, 2016 at 1:40 PM, Tim Armstrong <tarmstrong@cloudera.com> wrote:
>
>          Hi Nishidha,
>            It looks like Hive is maybe missing the native snappy library:
>          I see this in the logs:
>
>          java.lang.Exception: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>              at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
>              at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
>          Caused by: org.xerial.snappy.SnappyError: [FAILED_TO_LOAD_NATIVE_LIBRARY] null
>              at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>              at org.xerial.snappy.Snappy.<clinit>(Snappy.java:44)
>              at org.apache.avro.file.SnappyCodec.compress(SnappyCodec.java:43)
>              at org.apache.avro.file.DataFileStream$DataBlock.compressUsing(DataFileStream.java:361)
>              at org.apache.avro.file.DataFileWriter.writeBlock(DataFileWriter.java:394)
>              at org.apache.avro.file.DataFileWriter.sync(DataFileWriter.java:413)
>
>
>
>          If you want to try making progress without Hive snappy support,
>          I think you could disable some of the file formats by editing
>          testdata/workloads/*/*.csv and removing some of the "snap" file formats.
>          The impala test suite generates data in many different file formats with
>          different compression settings.
>
>
>          On Wed, Mar 2, 2016 at 7:08 AM, nishidha panpaliya <nishidha27@gmail.com> wrote:
>          Hello,
>
>          After building Impala on ppc64le, I'm trying to run all the
>          tests of Impala. In the process, I'm getting an error while test data
>          creation.
>          Command ran -
>             ${IMPALA_HOME}/buildall.sh -testdata -format
>          Output - Attached log (output.txt)
>
>          Also attached are the logs
>          cluster_logs/data_loading/data-load-functional-exhaustive.log and hive.log.
>
>          I tried setting the parameters below in hive-site.xml, but to no avail.
>             hive.exec.max.dynamic.partitions=100000;
>             hive.exec.max.dynamic.partitions.pernode=100000;
>             hive.exec.parallel=false
>
>          I'll be really thankful if you could provide me some help here.
>
>          Thanks in advance,
>          Nishidha
>
