From: ShaoFeng Shi <shaofengshi@apache.org>
To: user@kylin.apache.org
Date: Fri, 11 Aug 2017 20:13:15 +0800
Subject: Re: HFile is empty if kylin.hbase.cluster.fs is set to s3

EMR enables direct output in mapred-site.xml, but in this step those settings don't seem to take effect (although the job's configuration shows they are there). I disabled the direct output, but the behavior did not change. I searched around and found nothing. I need to drop the EMR cluster for now, and may get back to it later.

If you have any ideas or findings, please share them. We'd like to give Kylin better support for the cloud.

Thanks for your feedback!
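
(For context: "direct output" is EMR's committer shortcut that writes reducer output straight to the final S3 path instead of staging it under _temporary. A minimal sketch of turning it off in mapred-site.xml; the property names below are the ones 2017-era EMR releases shipped, so verify them against your cluster's config before relying on this:)

  <!-- Assumed EMR direct-output switches; check your mapred-site.xml. -->
  <property>
    <name>mapred.output.direct.EmrFileSystem</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.output.direct.NativeS3FileSystem</name>
    <value>false</value>
  </property>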

2017-08-11 19:19 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:

> Any ideas how to fix that?
>
> On Fri, Aug 11, 2017 at 2:16 PM, ShaoFeng Shi <shaofengshi@apache.org> wrote:
>
>> I got the same problem as you:
>>
>> 2017-08-11 08:44:16,342 WARN [Job 2c86b4b6-7639-4a97-ba63-63c9dca095f6-2255] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://privatekeybucket-anac5h41523l/kylin/kylin_default_instance/kylin-2c86b4b6-7639-4a97-ba63-63c9dca095f6/kylin_sales_cube_clone3/hfile. Does it contain files in subdirectories that correspond to column family names?
>>
>> In the S3 view, I see the files exist in the "_temporary" folder; it seems they were not moved to the target folder on completion. It seems EMR tries to write directly to the output path, but actually does not.
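
(A quick way to see where the HFiles actually ended up is to list the job's output directory; the bucket, job id, and cube name below are placeholders, not the real ones from this thread:)

  # Healthy output has one subdirectory per column family (e.g. F1/),
  # not just _SUCCESS and _temporary.
  hadoop fs -ls s3://<bucket>/kylin/kylin_metadata/kylin-<job-id>/<cube>/hfile/

  # Check whether the data is still stuck under _temporary.
  hadoop fs -ls -R s3://<bucket>/kylin/kylin_metadata/kylin-<job-id>/<cube>/hfile/_temporary/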

>> 2017-08-11 16:34 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>
>>> No, defaultFs is hdfs.
>>>
>>> I've seen such behavior when I set the working dir to s3 but didn't set cluster-fs at all. Maybe you have a typo in the name of the property. I used the old one, «kylin.hbase.cluster.fs».
>>>
>>> When both working-dir and cluster-fs were set to s3, I got the _temporary dir of the convert job at s3, but no HFiles. I also saw the correct output path for the job in the log. But I didn't check whether the job creates temporary files in s3 and then copies the results to hdfs; I can hardly believe that's what happens.
>>>
>>> Do you see the proper arguments for the step in the log?

>>> On 11 Aug 2017, at 11:17, ShaoFeng Shi <shaofengshi@apache.org> wrote:
>>>
>>> Hi Alexander,
>>>
>>> That makes sense. Using S3 for cube build and storage is required for a cloud Hadoop environment.
>>>
>>> I tried to reproduce this problem. I created an EMR cluster with S3 as the HBase storage, and in kylin.properties I set "kylin.env.hdfs-working-dir" and "kylin.storage.hbase.cluster-fs" to the S3 bucket. But in the "Convert Cuboid Data to HFile" step, Kylin still writes to local HDFS. Did you modify core-site.xml to make S3 the default FS?
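
(For concreteness, the properties being compared in this sub-thread, as they would appear in kylin.properties; the bucket is a placeholder, and the old-vs-new naming follows the Kylin documentation of that era, so double-check it for your version:)

  ## Build working dir and HBase storage on S3
  kylin.env.hdfs-working-dir=s3://<bucket>/kylin
  kylin.storage.hbase.cluster-fs=s3://<bucket>

  ## Pre-2.0 releases used this name for the cluster-fs setting:
  # kylin.hbase.cluster.fs=s3://<bucket>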

>>> 2017-08-10 22:53 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>>
>>>> Yes, I worked around this problem that way, and it works.
>>>>
>>>> One problem with this solution is that I have to use a pretty large hdfs, and that's expensive. I also have to garbage-collect it manually, because the data is not moved to s3 but copied, and the Kylin cleanup job doesn't handle it, because the main metadata folder is on s3. So it would be really nice to put everything on s3.
>>>>
>>>> Another problem is that I had to raise the hbase rpc timeout, because bulk loading from hdfs takes long. That was not trivial. 3 minutes works well, but with the drawback that queries or metadata writes hang for 3 minutes if something bad happens. But that's a rare event.
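
(The timeout in question is a standard HBase setting, hbase.rpc.timeout, in milliseconds; a minimal sketch of raising it to 3 minutes in hbase-site.xml:)

  <property>
    <!-- Allow slow bulk-load RPCs to run up to 3 minutes (default 60000 ms). -->
    <name>hbase.rpc.timeout</name>
    <value>180000</value>
  </property>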

>>>> On 10 Aug 2017 at 17:42, "ShaoFeng Shi" <shaofengshi@apache.org> wrote:
>>>>
>>>>> How about leaving "kylin.hbase.cluster.fs" empty? This property is for a two-cluster deployment (one Hadoop cluster for the cube build, the other for queries).
>>>>>
>>>>> When it is empty, the HFile will be written to the default FS (HDFS in EMR) and then loaded into HBase. I'm not sure whether EMR HBase (using S3 as storage) can bulk-load files from HDFS or not. If it can, that would be great, as the write performance of HDFS would be better than S3's.

>>>>> 2017-08-10 22:29 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>>>>
>>>>>> I also thought about it, but no, it's not consistency.
>>>>>>
>>>>>> Consistent view is enabled. I use the same s3 for my own map-reduce jobs, and it's ok.
>>>>>>
>>>>>> I also checked whether it lost consistency (emrfs diff). No problems.
>>>>>>
>>>>>> In case of s3 inconsistency, files disappear right after they were written and reappear some time later. The HFiles didn't appear after a day, but _temporary is there.
>>>>>>
>>>>>> It's 100% reproducible; I think I'll investigate this problem by running the conversion job manually.
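
(The consistency check mentioned above runs on the EMR master node via the EMRFS CLI; the path is a placeholder:)

  # Compare the EMRFS metadata store with what is actually in S3.
  emrfs diff s3://<bucket>/kylin/kylin_metadata/

  # If they disagree, re-synchronize the metadata.
  emrfs sync s3://<bucket>/kylin/kylin_metadata/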

>>>>>> On 10 Aug 2017 at 17:18, "ShaoFeng Shi" <shaofengshi@apache.org> wrote:
>>>>>>
>>>>>>> Did you enable the Consistent View? This article explains the challenge of using S3 directly for an ETL process:
>>>>>>> https://aws.amazon.com/cn/blogs/big-data/ensuring-consistency-when-using-amazon-s3-and-amazon-elastic-mapreduce-for-etl-workflows/
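
(For completeness, EMRFS consistent view is controlled by the documented fs.s3.consistent property, normally set through the emrfs-site configuration classification when the cluster is created; a minimal sketch:)

  <property>
    <!-- Enable EMRFS consistent view (backed by a DynamoDB metadata table). -->
    <name>fs.s3.consistent</name>
    <value>true</value>
  </property>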

>>>>>>> 2017-08-09 18:19 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>>>>>>
>>>>>>>> Yes, it's empty. I also see these messages in the log:
>>>>>>>>
>>>>>>>> 2017-08-09 09:02:35,947 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:234 : Skipping non-directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_SUCCESS
>>>>>>>> 2017-08-09 09:02:36,009 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:252 : Skipping non-file FileStatusExt{path=s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile/_temporary/1; isDirectory=true; modification_time=0; access_time=0; owner=; group=; permission=rwxrwxrwx; isSymlink=false}
>>>>>>>> 2017-08-09 09:02:36,014 WARN [Job 1e436685-7102-4621-a4cb-6472b866126d-7608] mapreduce.LoadIncrementalHFiles:422 : Bulk load operation did not find any files to load in directory s3://joom.emr.fs/home/production/bi/kylin/kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/main_event_1_main/hfile. Does it contain files in subdirectories that correspond to column family names?
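
(Since running the load by hand comes up in this thread, this is roughly what a manual bulk load looks like; the path and table name are placeholders, and LoadIncrementalHFiles is HBase's standard completebulkload entry point:)

  # Load HFiles from the given directory into an existing HBase table.
  hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      s3://<bucket>/kylin/kylin_metadata/kylin-<job-id>/<cube>/hfile \
      <KYLIN_TABLE_NAME>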

>>>>>>>> On Wed, Aug 9, 2017 at 1:15 PM, ShaoFeng Shi <shaofengshi@apache.org> wrote:
>>>>>>>>
>>>>>>>>> The HFile will be moved to the HBase data folder when the bulk load finishes. Did you check whether the HTable has data?
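
(A quick way to check is from the hbase shell; the table name below is a placeholder — Kylin generates its segment table names, so find the real one first:)

  # From the hbase shell: find the segment's KYLIN_* table, then count its rows.
  hbase shell
  list                        # lists tables; look for the KYLIN_* segment table
  count 'KYLIN_XXXXXXXXXX'    # a non-zero count means the bulk load landed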

>>>>>>>>> 2017-08-09 17:54 GMT+08:00 Alexander Sterligov <sterligovak@joom.it>:
>>>>>>>>>
>>>>>>>>>> Hi!
>>>>>>>>>>
>>>>>>>>>> I set kylin.hbase.cluster.fs to the s3 bucket where hbase lives.
>>>>>>>>>>
>>>>>>>>>> The "Convert Cuboid Data to HFile" step finished without errors. The statistics at the end of the job said that it had written lots of data to s3.
>>>>>>>>>>
>>>>>>>>>> But there are no HFiles in the kylin_metadata folder (kylin_metadata/kylin-1e436685-7102-4621-a4cb-6472b866126d/<table name>/hfile), only a _temporary folder and a _SUCCESS file.
>>>>>>>>>>
>>>>>>>>>> _temporary contains HFiles inside the attempt folders. It looks like they were not copied from _temporary to the result dir, but there are no errors, either in the kylin log or in the reducers' logs.
>>>>>>>>>>
>>>>>>>>>> Then loading the empty HFiles produces empty segments.
>>>>>>>>>>
>>>>>>>>>> Is that a bug, or am I doing something wrong?

--
Best regards,

Shaofeng Shi 史少锋