flume-user mailing list archives

From "Baris Akgun (Garanti Teknoloji)" <BarisA...@garanti.com.tr>
Subject RE: flume problem
Date Wed, 02 Mar 2016 12:25:10 GMT
I have changed the sink serializer to text, but the problem persists. My Flume conf file
is below.
Thanks

# tier1 - HTTP POST source for GNIP - Twitter data.
tier1.sources = source1
tier1.sinks = sink1
tier1.channels = channel1

tier1.sources.source1.type = org.apache.flume.source.http.HTTPSource
tier1.sources.source1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
tier1.sources.source1.port = 35853

tier1.channels.channel1.type = file
tier1.channels.channel1.capacity = 1000000
tier1.channels.channel1.checkpointDir = /tmp/flume_chnl/flm_checkpoint
tier1.channels.channel1.dataDirs = /tmp/flume_chnl/flm_data
tier1.channels.channel1.maxFileSize = 314572800

tier1.sinks.sink1.type=hdfs
tier1.sinks.sink1.hdfs.useLocalTimeStamp=true
tier1.sinks.sink1.hdfs.path = /tmp/Flume
tier1.sinks.sink1.hdfs.filePrefix = events_%Y%m%d
tier1.sinks.sink1.hdfs.fileSuffix = .json
tier1.sinks.sink1.hdfs.inUsePrefix = .
tier1.sinks.sink1.hdfs.inUseSuffix = .incomplete
tier1.sinks.sink1.hdfs.round = true
tier1.sinks.sink1.hdfs.roundValue = 1
tier1.sinks.sink1.hdfs.roundUnit = minute
tier1.sinks.sink1.hdfs.rollCount = 0
tier1.sinks.sink1.hdfs.rollInterval=60
tier1.sinks.sink1.hdfs.rollSize = 268435456
tier1.sinks.sink1.hdfs.batchSize = 10000
tier1.sinks.sink1.hdfs.writeFormat = Text
tier1.sinks.sink1.serializer = text

tier1.sources.source1.channels = channel1
tier1.sinks.sink1.channel = channel1
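
A possible explanation, not confirmed anywhere in this thread: the sink output quoted further down begins with `SEQ` followed by the class names `org.apache.hadoop.io.LongWritable` and `org.apache.hadoop.io.BytesWritable`, which is what a Hadoop SequenceFile header looks like. The HDFS sink writes SequenceFiles unless `hdfs.fileType` says otherwise, and the configuration above never sets that property, so the `serializer = text` line may have no visible effect. A minimal sketch, assuming the goal is plain text files with one JSON post per line:

```properties
# Assumption: write raw event bodies instead of the default SequenceFile container.
tier1.sinks.sink1.hdfs.fileType = DataStream
# The text serializer writes only the event body, not the event headers.
tier1.sinks.sink1.serializer = text
# Append a newline after each event; true is already the default, shown here for clarity.
tier1.sinks.sink1.serializer.appendNewline = true
```

Files written before the change keep their old format; only files created after the agent picks up the new configuration would be affected.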

From: Gonzalo Herreros [mailto:gherreros@gmail.com]
Sent: Wednesday, March 2, 2016 2:03 PM
To: user
Subject: Re: flume problem

That looks like JSON, but it's Avro.
You need to read it using the Avro library or change your sink to serialize text.

On 2 March 2016 at 11:28, Baris Akgun (Garanti Teknoloji) <BarisAkgu@garanti.com.tr> wrote:
I have added a sample of the sink output. Why does Flume add the part highlighted in yellow?
I thought the yellow part was the content type.

Flume should write each separate post on a new line, am I right? In our example, Flume
writes each new post directly after the previous one instead of starting a new line.


Thanks


SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable��mq��I$�����{��US-�\�I{"id":"tag:search.twitter.com,2005:642910625514016769","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:770439973","link":"http://www.twitter.com/CanKurnaz5","displayName":"Can
Kurnaz","postedTime":"2012-08-20T23:40:34.000Z","image":"https://pbs.twimg.com/profile_images/642910076290879492/T4-UBuZE_normal.jpg","summary":null,"links":[{"href":null,"rel":"me"}],"friendsCount":62,"followersCount":40,"listedCount":1,"statusesCount":378,"twitterTimeZone":null,"verified":false,"utcOffset":null,"preferredUsername":"CanKurnaz5","languages":["tr"],"favoritesCount":507},"verb":"post","postedTime":"2015-09-13T04:00:12.000Z","generator":{"displayName":"Twitter
for iPhone","link":"http://twitter.com/download/iphone"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/CanKurnaz5/statuses/642910625514016769","body":"@mbilgehandemir
32 milyon dolar garanti para güzel bende yerim o paraya dayak","object":{"objectType":"note","id":"object:search.twitter.com,2005:642910625514016769","summary":"@mbilgehandemir
32 milyon dolar garanti para güzel bende yerim o paraya dayak","link":"http://twitter.com/CanKurnaz5/statuses/642910625514016769","postedTime":"2015-09-13T04:00:12.000Z"},"inReplyTo":{"link":"http://twitter.com/mbilgehandemir/statuses/642908818138120192"},"favoritesCount":0,"twitter_entities":{"hashtags":[],"trends":[],"urls":[],"user_mentions":[{"screen_name":"mbilgehandemir","name":"Bilgehan
Demir","id":180756706,"id_str":"180756706","indices":[0,15]}],"symbols":[]},"twitter_filter_level":"low","twitter_lang":"tr","retweetCount":0,"gnip":{"matching_rules":[{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) (\"garantı\" OR \"garanti\" OR \"GARANTI\" OR \"GARANTİ\")","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(#29ekimekadartakipteyiz OR #UnfsuzSeriTakipleselim
OR #TatilAskinaTakipleselim OR #takipedenitakipederim OR #DurmaTakipleselim OR #SenSakrakTakipleselimYine
OR #TakipedeneAnindaGeriTakip OR #TakipEdenTakipEdilir OR #Garanti_Takipçiyim OR #GARANTI_TAKIP�?IYIM
OR #BicirBicirTakipleselim OR #geritakip OR #KosKosTakipvar OR #AnindaGeriTakip OR #KesinTakipVar
OR #Hepimiz_Takipleselim OR #Hepimiz_Takipleselim OR #TakipleselimMutluOlalim OR #garantitakip
OR #TakipcininiyisiRTyapar OR #Hepimiz_Takipleselim. OR #takibetakip)","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(\"Garanti süresi\" OR \"Gülmek garanti\" OR \"Puan
Garanti\" OR \"Final Garanti\" OR \"Garanti kupon\" OR \"Unfollow Garanti\" OR \"Garanti ediyorum\"
OR \"Takip Garanti\")","tag":null},{"value":"(bio_lang:tr OR twitter_lang:tr OR lang:tr) garanti
-(\"Gülmek garanti\" OR \"Garanti edebilirim\" OR \"Kilo garanti\" OR \"Cennet garanti\"
OR \"Kupa Garanti\" OR \"Garanti veriyorum\" OR \"Garanti belgesi\")","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(#CenkAkyolYalnizDegildir OR #TakipedeneAnindaGeriTakip
OR #autofollowback OR #ifollowback OR #ff OR #takipedenitakipederim OR #TeamFollowBack OR
#followme OR #takibetakip OR #takipedentakipedilir OR #garantiemakelaars OR #DostlarlaTakipleselim
OR #takipedentakipedilir OR #KizliErkekliTakipleselim OR #TakipleselimMutluOlalim OR #BizSuperizYineTakiplesiyoruz
OR #DurmaTakipleselim OR #takibetakip OR #FigürsüzlerIleGeceTakibi OR #MuratCuvallSayesindeTakiplesiyoruz)","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(\"Garanti Takip\" OR \"Tur garanti\" OR \"Kupa garanti\"
OR \"sampiyonluk garanti\" OR \"Garanti kapsami\" OR \"madalya garanti\" OR \"takipçi garanti\"
OR \"lig garanti\" OR \"ligi garanti\" OR \"kopmak garanti\" OR \"Garanti_Takipciyim\" OR
\"TAKİP EDENİ TAKİP EDERİM\" OR \"takip edeni takip ederim\")","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(#takipcikazan OR #TakipleselimMutluOlalim OR #TakipVarDediler
OR #EtkilesimIleSeriTakip OR #Cumatakipteyiz OR #TakipedeneAnindaGeriTakip OR #Takiplerdeyiz
OR #AnindaGeriTakip OR #garanti_takipçiyim OR #takipçikazan)","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) (contains:garantı OR contains:garanti OR contains:paracard)","tag":null},{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) garanti -(#EtkilesimciTayfaTakiplesiyor OR #BugünPazarKarsilikliTakipleselim
OR #TümHayranGruplariHaftaSonuTakibi OR #Takiplerdeyiz OR #MeteHorozogluHayranlariTakiplesiyor
OR #garantilitakip OR #Garanti_Takipciyim OR #DurmaTakipleselim OR #Hepimiz_Takipleselim OR
#100de100GeriTakip OR #Problemsiz_FullTakipteyiz OR #EtkilesimSeverlerIleSeriTakip)","tag":null}],"language":{"value":"tr"}}}������mq��I$�����{��
?S-�\�
3{"id":"tag:search.twitter.com,2005:642910743302668288","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:347936349","link":"http://www.twitter.com/semagokcee","displayName":"
???? ???","postedTime":"2011-08-03T16:18:41.000Z","image":"https://pbs.twimg.com/profile_images/571609282766098432/ed6KzjNX_normal.jpeg","summary":null,"links":[{"href":null,"rel":"me"}],"friendsCount":654,"followersCount":510,"listedCount":10,"statusesCount":21790,"twitterTimeZone":"Baghdad","verified":false,"utcOffset":"10800","preferredUsername":"semagokcee","languages":["tr"],"location":{"objectType":"place","displayName":"AGD
Eyüpsultan"},"favoritesCount":4531},"verb":"share","postedTime":"2015-09-13T04:00:40.000Z","generator":{"displayName":"Twitter
for Android","link":"http://twitter.com/download/android"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/semagokcee/statuses/642910743302668288","body":"RT
@Hadis_Tweet: \"Kim sabah namazını kılarsa, Allah'ın garantisi altındadır.\" (Kütüb-i
Sitte, c.17, s.541)\n#Hadis","object":{"id":"tag:search.twitter.com,2005:642906134421065728","objectType":"activity","actor":{"objectType":"person","id":"id:twitter.com:266950198","link":"http://www.twitter.com/Hadis_Tweet","displayName":"Hadis-i
Şerif","postedTime":"2011-03-16T02:39:31.000Z","image":"https://pbs.twimg.com/profile_images/378800000408493403/279dd45b0a07d3965afefa59b1245f22_normal.png","summary":"Burada
Hadis-i Şerif Paylaşılır.Gayemiz Hz.Muhammed 'in(a.s.m) Sahih Kaynaklardan Hadis-i
Şerifler'ini paylaşmak ve duyurmaktır.Selam ve Dua ile @Dua_Kardesligi","links":[{"href":"http://hadistweet.blogspot.com.tr/","rel":"me"}],"friendsCount":3387,"followersCount":376308,"listedCount":382,"statusesCount":5025,"twitterTimeZone":"Istanbul","verified":false,"utcOffset":"10800","preferredUsername":"Hadis_Tweet","languages":["tr"],"favoritesCount":623},"verb":"post","postedTime":"2015-09-13T03:42:21.000Z","generator":{"displayName":"Twitter
Web Client","link":"http://twitter.com"},"provider":{"objectType":"service","displayName":"Twitter","link":"http://www.twitter.com"},"link":"http://twitter.com/Hadis_Tweet/statuses/642906134421065728","body":"\"Kim
sabah namazını kılarsa, Allah'ın garantisi altındadır.\" (Kütüb-i Sitte, c.17, s.541)\n#Hadis","object":{"objectType":"note","id":"object:search.twitter.com,2005:642906134421065728","summary":"\"Kim
sabah namazını kılarsa, Allah'ın garantisi altındadır.\" (Kütüb-i Sitte, c.17, s.541)\n#Hadis","link":"http://twitter.com/Hadis_Tweet/statuses/642906134421065728","postedTime":"2015-09-13T03:42:21.000Z"},"favoritesCount":61,"twitter_entities":{"hashtags":[{"text":"Hadis","indices":[90,96]}],"trends":[],"urls":[],"user_mentions":[],"symbols":[]},"twitter_filter_level":"low","twitter_lang":"tr"},"favoritesCount":0,"twitter_entities":{"hashtags":[{"text":"Hadis","indices":[107,113]}],"trends":[],"urls":[],"user_mentions":[{"screen_name":"Hadis_Tweet","name":"Hadis-i
Şerif","id":266950198,"id_str":"266950198","indices":[3,15]}],"symbols":[]},"twitter_filter_level":"low","twitter_lang":"tr","retweetCount":33,"gnip":{"matching_rules":[{"value":"(bio_lang:tr
OR twitter_lang:tr OR lang:tr) (contains:garantı OR contains:garanti OR contains:paracard)","tag":null}],"language":{"value":"tr"}}}


From: Gonzalo Herreros [mailto:gherreros@gmail.com]
Sent: Wednesday, March 2, 2016 12:01 PM
To: user
Subject: Re: flume problem

The channel serializes the Flume event as Avro, including the headers; the HTTP headers become
event headers.
However, the sink should store only the content, not the headers.

On 2 March 2016 at 09:51, Baris Akgun (Garanti Teknoloji) <BarisAkgu@garanti.com.tr> wrote:
No, we send Twitter data as JSON, but in the Flume channel I saw the content type string
attached to each tweet. Is that normal? How can I send just the tweet JSON without any content
type? I get the tweet JSON from GNIP.

Thanks
Sent from my iPhone

On 2 Mar 2016 at 10:56, Gonzalo Herreros <gherreros@gmail.com> wrote:
Could it be that you are serializing Avro instead of JSON?

On 2 March 2016 at 08:25, Baris Akgun (Garanti Teknoloji) <BarisAkgu@garanti.com.tr> wrote:
Hi,

When I send JSON data to Flume using HTTP POST, Flume adds **Content-Typeapplication/json**
to each JSON post.

In my Java HTTP POST code, I set the content type by calling

**con.setRequestProperty("Content-Type", "application/json");**


I am using the blob handler.

**In the Flume conf file**

*tier1.sources.source1.type = org.apache.flume.source.http.HTTPSource
tier1.sources.source1.handler = org.apache.flume.sink.solr.morphline.BlobHandler*

In the Flume channel, Flume adds the content type to each post, as you can see. After the HDFS
sink, the content type string causes a problem when I try to parse the JSON with Spark SQL or
a Hive SerDe.

**The Flume channel log data**

*^LContent-Typeapplication/jsonú{"id":"+ag:_ea_ch.++i++e_.c-
^LContentTypeapplication/json‘{"id":"tag:search.twitter.com,2005:642913165047648*

Does anyone have an idea about this problem?

Thanks a lot.

Barış Akgün
Analytical Data Warehouse and Big Data Management
Specialist

This message and attachments are confidential and intended solely for the individual(s) stated
in this message. If you received this message although you are not the addressee, you are
responsible to keep the message confidential. The sender has no responsibility for the accuracy
or correctness of the information in the message and its attachments. Our company shall have
no liability for any changes or late receiving, loss of integrity and confidentiality, viruses
and any damages caused in anyway to your computer system.
