impala-user mailing list archives

From Mostafa Mokhtar <mmokh...@cloudera.com>
Subject Re: Issue in data loading in Impala + Kudu
Date Thu, 10 May 2018 05:03:19 GMT
Can you share the query profile for the successful insert query? 

Thanks 
Mostafa
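
One thing the numbers in this thread suggest (a guess from the counts alone, not from the profile): the number of rows missing from the Kudu table, 3000028242 - 536870912 = 2463157330, wraps to exactly -1831809966 when stored in a signed 32-bit integer, which is the same value the earlier DCHECK reported for log_entry.count. If that reading is right, those rows were likely rejected by the sink (for example, duplicate Kudu primary keys) and the error counter overflowed. The arithmetic:

```python
# Rows present in the source table but missing from the Kudu table,
# using the two count(*) results reported in this thread.
missing = 3000028242 - 536870912
print(missing)  # 2463157330

# If that count is accumulated in a signed 32-bit integer it wraps
# around, reproducing the value from the DCHECK in error-util.cc.
wrapped = (missing + 2**31) % 2**32 - 2**31
print(wrapped)  # -1831809966
```

If so, the rejected rows should show up as row errors in the insert's query profile (which is why the profile is worth a look), and checking the source table for duplicate primary key values would be a natural next step.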

> On May 9, 2018, at 9:55 PM, Geetika Gupta <geetika.gupta@knoldus.in> wrote:
> 
> Thanks, Jeszy.
> 
> We built Impala again with the -release flag and the data load was successful after that.
> 
> But now we are facing another issue: the table into which we loaded the data has fewer rows than the source. We executed the following command:
> 
> insert into LINEITEM select * from PARQUETIMPALA500.LINEITEM
> 
> This query completed successfully, but when we ran count(*) on both tables, the row counts were different:
> 
> 0: jdbc:hive2://slave2:21050/default> select count(*) from lineitem
> . . . . . . . . . . . . . . . . . . > ;
> 536870912
> 
> 0: jdbc:hive2://slave2:21050/default> select count(*) from parquetimpala500.lineitem;
> 3000028242
> 
> Do you have any idea about this issue?
> 
> 
>> On Mon, May 7, 2018 at 12:06 PM, Jeszy <jeszyb@gmail.com> wrote:
>> Impala doesn't store the data itself, so you can switch versions
>> without rewriting data. But you don't have to do that; you would just
>> have to build Impala using the -release flag (of buildall.sh) and run
>> it using the release binaries (versus the debug ones). If you are
>> looking at performance, using the release version is highly
>> recommended anyway.
>> 
>> On 7 May 2018 at 08:30, Geetika Gupta <geetika.gupta@knoldus.in> wrote:
>> > Hi Jeszy,
>> >
>> > Currently, we are using the Apache Impala GitHub master branch code. We
>> > tried using the released version, but we encountered some errors related to
>> > downloading dependencies and could not complete the installation.
>> >
>> > The current version of Impala we are using: 2.12
>> >
>> > We can't try the new release as we have already loaded 500 GB of TPC-H
>> > data onto our cluster.
>> >
>> > On Mon, May 7, 2018 at 11:43 AM, Jeszy <jeszyb@gmail.com> wrote:
>> >>
>> >> What version of Impala are you using?
>> >> DCHECKs won't be triggered if you run a release build. Looking at the
>> >> code, it should work with bad values if not for the DCHECK. Can you
>> >> try using a release build?
>> >>
>> >> On 7 May 2018 at 08:04, Geetika Gupta <geetika.gupta@knoldus.in> wrote:
>> >> > Hi community,
>> >> >
>> >> > I was trying to load 500 GB of TPC-H data into a Kudu table using the
>> >> > following query:
>> >> >
>> >> > insert into lineitem select * from PARQUETIMPALA500.LINEITEM
>> >> >
>> >> > After executing for around 17 hours, the query was cancelled because the
>> >> > impalad process on that machine aborted. Here are the logs of the impalad
>> >> > process.
>> >> >
>> >> > impalad.ERROR
>> >> >
>> >> > Log file created at: 2018/05/06 13:40:34
>> >> > Running on machine: slave2
>> >> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> >> > E0506 13:40:34.097759 28730 logging.cc:121] stderr will be logged to
>> >> > this
>> >> > file.
>> >> > SLF4J: Class path contains multiple SLF4J bindings.
>> >> > SLF4J: Found binding in
>> >> >
>> >> > [jar:file:/root/softwares/impala/fe/target/dependency/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> >> > SLF4J: Found binding in
>> >> >
>> >> > [jar:file:/root/softwares/impala/testdata/target/dependency/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> >> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
>> >> > explanation.
>> >> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> >> > 18/05/06 13:40:34 WARN util.NativeCodeLoader: Unable to load
>> >> > native-hadoop
>> >> > library for your platform... using builtin-java classes where applicable
>> >> > 18/05/06 13:40:36 WARN shortcircuit.DomainSocketFactory: The
>> >> > short-circuit
>> >> > local reads feature cannot be used because libhadoop cannot be loaded.
>> >> > tcmalloc: large alloc 1073741824 bytes == 0x484434000 @  0x4135176
>> >> > 0x7fd9e9fc3929
>> >> > tcmalloc: large alloc 2147483648 bytes == 0x7fd540f18000 @  0x4135176
>> >> > 0x7fd9e9fc3929
>> >> > F0507 09:46:12.673912 29258 error-util.cc:148] Check failed:
>> >> > log_entry.count > 0 (-1831809966 vs. 0)
>> >> > *** Check failure stack trace: ***
>> >> >     @          0x3fc0c0d  google::LogMessage::Fail()
>> >> >     @          0x3fc24b2  google::LogMessage::SendToLog()
>> >> >     @          0x3fc05e7  google::LogMessage::Flush()
>> >> >     @          0x3fc3bae  google::LogMessageFatal::~LogMessageFatal()
>> >> >     @          0x1bbcb31  impala::PrintErrorMap()
>> >> >     @          0x1bbcd07  impala::PrintErrorMapToString()
>> >> >     @          0x2decbd7  impala::Coordinator::GetErrorLog()
>> >> >     @          0x1a8d634  impala::ImpalaServer::UnregisterQuery()
>> >> >     @          0x1b29264  impala::ImpalaServer::CloseOperation()
>> >> >     @          0x2c5ce86
>> >> >
>> >> > apache::hive::service::cli::thrift::TCLIServiceProcessor::process_CloseOperation()
>> >> >     @          0x2c56b8c
>> >> > apache::hive::service::cli::thrift::TCLIServiceProcessor::dispatchCall()
>> >> >     @          0x2c2fcb1
>> >> > impala::ImpalaHiveServer2ServiceProcessor::dispatchCall()
>> >> >     @          0x16fdb20  apache::thrift::TDispatchProcessor::process()
>> >> >     @          0x18ea6b3
>> >> > apache::thrift::server::TAcceptQueueServer::Task::run()
>> >> >     @          0x18e2181  impala::ThriftThread::RunRunnable()
>> >> >     @          0x18e3885  boost::_mfi::mf2<>::operator()()
>> >> >     @          0x18e371b  boost::_bi::list3<>::operator()<>()
>> >> >     @          0x18e3467  boost::_bi::bind_t<>::operator()()
>> >> >     @          0x18e337a
>> >> > boost::detail::function::void_function_obj_invoker0<>::invoke()
>> >> >     @          0x192761c  boost::function0<>::operator()()
>> >> >     @          0x1c3ebf7  impala::Thread::SuperviseThread()
>> >> >     @          0x1c470cd  boost::_bi::list5<>::operator()<>()
>> >> >     @          0x1c46ff1  boost::_bi::bind_t<>::operator()()
>> >> >     @          0x1c46fb4  boost::detail::thread_data<>::run()
>> >> >     @          0x2eedb4a  thread_proxy
>> >> >     @     0x7fda1dbb16ba  start_thread
>> >> >     @     0x7fda1d8e741d  clone
>> >> > Wrote minidump to
>> >> > /tmp/minidumps/impalad/a9113d9b-bc3d-488a-1feebf9b-47b42022.dmp
>> >> >
>> >> > impalad.FATAL
>> >> >
>> >> > Log file created at: 2018/05/07 09:46:12
>> >> > Running on machine: slave2
>> >> > Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
>> >> > F0507 09:46:12.673912 29258 error-util.cc:148] Check failed:
>> >> > log_entry.count > 0 (-1831809966 vs. 0)
>> >> >
>> >> > Impalad.INFO
>> >> > edentials={real_user=root}} blocked reactor thread for 34288.6us
>> >> > I0507 09:38:14.943245 29882 outbound_call.cc:288] RPC callback for RPC call
>> >> > kudu.tserver.TabletServerService.Write -> {remote=136.243.74.42:7050
>> >> > (slave5), user_credentials={real_user=root}} blocked reactor thread for 35859.8us
>> >> > I0507 09:38:15.942150 29882 outbound_call.cc:288] RPC callback for RPC call
>> >> > kudu.tserver.TabletServerService.Write -> {remote=136.243.74.42:7050
>> >> > (slave5), user_credentials={real_user=root}} blocked reactor thread for 40664.9us
>> >> > I0507 09:38:17.495046 29882 outbound_call.cc:288] RPC callback for RPC call
>> >> > kudu.tserver.TabletServerService.Write -> {remote=136.243.74.42:7050
>> >> > (slave5), user_credentials={real_user=root}} blocked reactor thread for 49514.6us
>> >> > I0507 09:46:12.664149  4507 coordinator.cc:783] Release admission
>> >> > control
>> >> > resources for query_id=3e4a4c646800e1d9:c859bb7f00000000
>> >> > F0507 09:46:12.673912 29258 error-util.cc:148] Check failed:
>> >> > log_entry.count > 0 (-1831809966 vs. 0)
>> >> > Wrote minidump to
>> >> > /tmp/minidumps/impalad/a9113d9b-bc3d-488a-1feebf9b-47b42022.dmp
>> >> >
>> >> > Note:
>> >> > We are executing the queries on an 8-node cluster with the following
>> >> > configuration:
>> >> > Cluster: 8 nodes (48 GB RAM, 8 CPU cores and 2 TB hard disk each,
>> >> > Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz)
>> >> >
>> >> >
>> >> > --
>> >> > Regards,
>> >> > Geetika Gupta
>> >
>> >
>> >
>> >
>> > --
>> > Regards,
>> > Geetika Gupta
> 
> 
> 
> -- 
> Regards,
> Geetika Gupta
