www-infrastructure-issues mailing list archives

From "Kettler Karl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (INFRA-10633) DNS
Date Tue, 20 Oct 2015 13:49:27 GMT

    [ https://issues.apache.org/jira/browse/INFRA-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965113#comment-14965113 ]

Kettler Karl commented on INFRA-10633:
--------------------------------------

Flume delivers Twitter data in Avro format rather than JSON. Why?
Flume agent configuration:
TwitterAgent.sources = Twitter
TwitterAgent.channels = MemChannel
TwitterAgent.sinks = HDFS

TwitterAgent.sources.Twitter.type = org.apache.flume.source.twitter.TwitterSource
TwitterAgent.sources.Twitter.channels = MemChannel
TwitterAgent.sources.Twitter.consumerKey = xxx
TwitterAgent.sources.Twitter.consumerSecret = xxx 
TwitterAgent.sources.Twitter.accessToken = xxx
TwitterAgent.sources.Twitter.accessTokenSecret = xxx 
TwitterAgent.sources.Twitter.maxBatchSize = 10
TwitterAgent.sources.Twitter.maxBatchDurationMillis = 200
TwitterAgent.sources.Twitter.keywords = United Nations
TwitterAgent.sources.Twitter.deserializer.schemaType = LITERAL
# HDFS Sink
TwitterAgent.sinks.HDFS.channel = MemChannel
TwitterAgent.sinks.HDFS.type = hdfs
TwitterAgent.sinks.HDFS.hdfs.path = /demo/tweets/stream/%y-%m-%d/%H%M%S
TwitterAgent.sinks.HDFS.hdfs.filePrefix = events
TwitterAgent.sinks.HDFS.hdfs.round = true
TwitterAgent.sinks.HDFS.hdfs.roundValue = 5
TwitterAgent.sinks.HDFS.hdfs.roundUnit = minute
TwitterAgent.sinks.HDFS.hdfs.useLocalTimeStamp = true
TwitterAgent.sinks.HDFS.hdfs.fileType = DataStream
TwitterAgent.sinks.HDFS.hdfs.writeFormat = Text

TwitterAgent.channels.MemChannel.type = memory
TwitterAgent.channels.MemChannel.capacity = 1000
TwitterAgent.channels.MemChannel.transactionCapacity = 100

Twitter Data from Flume:
Obj	avro.schema�
{"type":"record","name":"Doc","doc":"adoc","fields":[{"name":"id","type":"string"},{"name":"user_friends_count","type":["int","null"]},{"name":"user_location","type":["string","null"]},{"name":"user_description","type":["string","null"]},{"name":"user_statuses_count","type":["int","null"]},{"name":"user_followers_count","type":["int","null"]},{"name":"user_name","type":["string","null"]},{"name":"user_screen_name","type":["string","null"]},{"name":"created_at","type":["string","null"]},{"name":"text","type":["string","null"]},{"name":"retweet_count","type":["long","null"]},{"name":"retweeted","type":["boolean","null"]},{"name":"in_reply_to_user_id","type":["long","null"]},{"name":"source","type":["string","null"]},{"name":"in_reply_to_status_id","type":["long","null"]},{"name":"media_url_https","type":["string","null"]},{"name":"expanded_url","type":["string","null"]}]}�]3hˊى���|����$656461386520784896�
�お絵描きするショタコン/オタクまっしぐら。論破メインに雑食もぐもぐ/成人済み pixiv:323565 隔離:【@yh_u_】�n�
ユハズ
yhzz_(2015-10-20T13:26:05Z�	はじめた~リセマラめんどくさいし緑茶来たから普通にこのまま進める
https://t.co/ZpfDqw4l9g	�	<a href="http://twitter.com" rel="nofollow">Twitter Web
Client</a>	^https://pbs.twimg.com/media/CRw4Js3UAAAGusn.pngthttp://twitter.com/yhzz_/status/656461386520784896/photo/1$656461390677417984�
<Mundo de las sombras (Cc,Extr)�#RP User de un agente del gobierno |20| Que no me veais
ni noteis mi presencia no quiere decir que no os este observando desde las sombras��	�
JKP® BakasumaUserSinCausa(2015-10-20T13:26:06Z�	RT @NaiiVicious: @Lisi_Hattori @UserSinCausa
https://t.co/M2LTJWwqae		�	<a href="http://twitter.com/download/android" rel="nofollow">Twitter
for Android</a>	^https://pbs.twimg.com/media/CRthC1mWUAIFTF-.jpg�	http://twitter.com/NaiiVicious/status/656224896297529344/photo/1�]3hˊى���|���

After loading this Twitter data into an HDFS table, we cannot convert it to JSON with avro-tools-1.7.7.jar.
We get the error message: "No data"
If we try to read this file, we get the following error message:
"java -jar avro-tools-1.7.7.jar tojson twitter.avro > twitter.json
Exception in thread "main" org.apache.avro.AvroRuntimeException: java.io.EOFException"
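
One way to narrow down an EOFException like this is to check whether the file is still a complete Avro object container. The sketch below is only an illustration, not part of Flume or avro-tools: it assumes the container layout from the Avro specification (magic bytes `Obj\x01`, a metadata map, a 16-byte sync marker, then data blocks that each end with the same sync marker). All function names are hypothetical. It does not decode records; it only walks the block structure.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file
SYNC_SIZE = 16      # length of the sync marker that closes the header and each block


def read_exact(stream, n):
    """Read exactly n bytes, or raise EOFError if the stream is truncated."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag-encoded)."""
    accum, shift = 0, 0
    while True:
        byte = read_exact(stream, 1)[0]
        accum |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (accum >> 1) ^ -(accum & 1)
        shift += 7


def read_header(stream):
    """Return (metadata dict, sync marker) from the container header."""
    if read_exact(stream, len(MAGIC)) != MAGIC:
        raise ValueError("missing Avro magic bytes")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the block's byte size
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_exact(stream, read_long(stream)).decode("utf-8")
            meta[key] = read_exact(stream, read_long(stream))
    return meta, read_exact(stream, SYNC_SIZE)


def count_records(stream, sync):
    """Walk the data blocks; raise on truncation or a sync marker mismatch."""
    total = 0
    while True:
        if not stream.read(1):       # clean end of file between blocks
            return total
        stream.seek(-1, io.SEEK_CUR)
        records = read_long(stream)  # number of records in this block
        size = read_long(stream)     # byte size of the serialized records
        read_exact(stream, size)     # skip the payload without decoding it
        if read_exact(stream, SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch, file looks corrupted")
        total += records


def inspect_container(path):
    """Return (parsed schema, record count) for the file at path."""
    with open(path, "rb") as stream:
        meta, sync = read_header(stream)
        schema = json.loads(meta["avro.schema"].decode("utf-8"))
        return schema, count_records(stream, sync)
```

For the file above this would be called as `inspect_container("twitter.avro")`. An EOFError or a sync marker mismatch would suggest that the file was cut off or altered after the source wrote it, which may be worth ruling out before looking further at avro-tools.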

I hope you can help us.

Kind regards,
Karl 


> DNS
> ---
>
>                 Key: INFRA-10633
>                 URL: https://issues.apache.org/jira/browse/INFRA-10633
>             Project: Infrastructure
>          Issue Type: New TLP - Common Tasks
>            Reporter: Kettler Karl
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
