flume-user mailing list archives

From mardan Khan <mardan8...@gmail.com>
Subject Re: How to upload the SEQ data into hdfs
Date Wed, 25 Jul 2012 03:45:31 GMT
Hi,


I have solved the "hostname expected" problem by adding the hostname to the
/etc/hosts file. Now I am facing a broken pipe problem. From what I have
read, the likely reason for this problem is a mismatched Hadoop version. I
have downloaded Hadoop version 0.20.0 from the hadoop.apache.org website and
configured it. I downloaded CDH4 for Flume but could not configure the CDH4
Hadoop, so I am currently using hadoop-0.20.0.
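
For reference, the line I added to /etc/hosts is along these lines (the IP
is the one from my HDFS URI; the hostname is a placeholder for whatever the
`hostname` command prints on this machine):

134.83.35.24    flume-host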

I have copied hadoop-0.20.0-core.jar into the /usr/lib/flume-ng/lib
directory. After that, when I run the command I no longer get the error
message, but my system gets stuck at the point below:


12/07/25 04:30:16 INFO conf.FlumeConfiguration: Added sinks:
hdfs-Cluster1-sink Agent: agent1
12/07/25 04:30:16 WARN conf.FlumeConfiguration: Configuration empty for:
hdfs-Cluster1-sink.Removed.
12/07/25 04:30:16 INFO conf.FlumeConfiguration: Post-validation flume
configuration contains configuration  for agents: [agent, agent1]
12/07/25 04:30:16 INFO properties.PropertiesFileConfigurationProvider:
Creating channels
12/07/25 04:30:16 INFO properties.PropertiesFileConfigurationProvider:
created channel mem-channel-1
12/07/25 04:30:16 INFO sink.DefaultSinkFactory: Creating instance of sink
hdfs-Cluster1-sink typehdfs


Now, how can I make Hadoop and Flume use the same Hadoop version? I tried to
replace the Hadoop core jar file in the flume-ng/lib directory but could not
find any Hadoop core jar file there. Where should I copy the
hadoop-0.20.0-core.jar file in Flume so that both have the same Hadoop
version?
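
For what it's worth, this is roughly how I am checking which Hadoop jar each
side sees (the Hadoop install path below is a guess based on my setup):

$ hadoop version
$ ls /usr/lib/flume-ng/lib | grep -i hadoop
$ cp /path/to/hadoop-0.20.0/hadoop-0.20.0-core.jar /usr/lib/flume-ng/lib/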

I don't want to use the CDH4 Hadoop as I don't know much about how to configure it.

Any suggestions, please.


Many thanks




On Wed, Jul 25, 2012 at 12:18 AM, mardan Khan <mardan8310@gmail.com> wrote:

> Hi,
>
>
> Many thanks for the help. I have used exactly the same configuration but
> now get the following error. My Hadoop is running properly and I can write
> data to HDFS with the -put command. I am providing the host IP address but
> it still gives me a "hostname expected" error. Any suggestions, please.
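>
> For reference, the -put test I mean is something like this (the local file
> name is just an example):
>
> $ hadoop fs -put test.txt hdfs://134.83.35.24:9000/user/mardan/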
>
>
>
> 12/07/25 00:04:16 INFO conf.FlumeConfiguration: Added sinks:
> hdfs-Cluster1-sink Agent: agent1
> 12/07/25 00:04:17 WARN conf.FlumeConfiguration: Configuration empty for:
> hdfs-Cluster1-sink.Removed.
> 12/07/25 00:04:17 INFO conf.FlumeConfiguration: Post-validation flume
> configuration contains configuration  for agents: [agent, agent1]
> 12/07/25 00:04:17 INFO properties.PropertiesFileConfigurationProvider:
> Creating channels
> 12/07/25 00:04:17 INFO properties.PropertiesFileConfigurationProvider:
> created channel mem-channel-1
> 12/07/25 00:04:17 INFO sink.DefaultSinkFactory: Creating instance of sink
> hdfs-Cluster1-sink typehdfs
> 12/07/25 00:04:17 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
> 12/07/25 00:04:17 INFO nodemanager.DefaultLogicalNodeManager: Starting new
> configuration:{ sourceRunners:{avro-AppSrv-source=PollableSourceRunner: {
> source:org.apache.flume.source.SequenceGeneratorSource@6659fb21 counterGroup:{
> name:null counters:{} } }}
> sinkRunners:{hdfs-Cluster1-sink=SinkRunner: {
> policy:org.apache.flume.sink.DefaultSinkProcessor@1d766806 counterGroup:{
> name:null counters:{} } }}
> channels:{mem-channel-1=org.apache.flume.channel.MemoryChannel@48a77106} }
> 12/07/25 00:04:17 INFO nodemanager.DefaultLogicalNodeManager: Starting
> Channel mem-channel-1
> 12/07/25 00:04:17 INFO nodemanager.DefaultLogicalNodeManager: Waiting for
> channel: mem-channel-1 to start. Sleeping for 500 ms
> 12/07/25 00:04:17 INFO nodemanager.DefaultLogicalNodeManager: Starting
> Sink hdfs-Cluster1-sink
> 12/07/25 00:04:17 INFO nodemanager.DefaultLogicalNodeManager: Starting
> Source avro-AppSrv-source
> 12/07/25 00:04:17 INFO source.SequenceGeneratorSource: Sequence generator
> source starting
> 12/07/25 00:04:18 INFO hdfs.BucketWriter: Creating
> hdfs://134.83.35.24:9000/user/mardan/flume/FlumeData.1343171057851.tmp
> 12/07/25 00:04:18 ERROR hdfs.HDFSEventSink: process failed
> java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected
> hostname at index 7: hdfs://:9000
>     at org.apache.hadoop.net.NetUtils.getCanonicalUri(NetUtils.java:267)
>     at org.apache.hadoop.fs.FileSystem.getCanonicalUri(FileSystem.java:214)
>     at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:526)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:166)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:232)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:75)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:806)
>     at
> org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1059)
>     at
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:269)
>     at
> org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:368)
>     at
> org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:65)
>     at
> org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:49)
>     at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:125)
>     at
> org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:183)
>     at
> org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:432)
>     at
> org.apache.flume.sink.hdfs.HDFSEventSink$1.doCall(HDFSEventSink.java:429)
>     at
> org.apache.flume.sink.hdfs.HDFSEventSink$ProxyCallable.call(HDFSEventSink.java:164)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.URISyntaxException: Expected hostname at index 7:
> hdfs://:9000
>     at java.net.URI$Parser.fail(URI.java:2810)
>     at java.net.URI$Parser.failExpecting(URI.java:2816)
>     at java.net.URI$Parser.parseHostname(URI.java:3352)
>     at java.net.URI$Parser.parseServer(URI.java:3198)
>     at java.net.URI$Parser.parseAuthority(URI.java:3117)
>     at java.net.URI$Parser.parseHierarchical(URI.java:3059)
>     at java.net.URI$Parser.parse(URI.java:3015)
>     at java.net.URI.<init>(URI.java:662)
>     at org.apache.hadoop.net.NetUtils.getCanonicalUri(NetUtils.java:263)
>
>
> The configuration file is as follows:
>
>
>
>
> agent.sources = avro-AppSrv-source
> agent.sinks = hdfs-Cluster1-sink
> agent.channels = mem-channel-1
> # set channel for sources, sinks
> # properties of avro-AppSrv-source
> agent.sources.avro-AppSrv-source.type = SEQ
>
> agent.sources.avro-AppSrv-source.bind = localhost
> agent.sources.avro-AppSrv-source.port = 10000
>
> agent.sources.avro-AppSrv-source.channels = mem-channel-1
>
> # properties of mem-channel-1
> agent.channels.mem-channel-1.type = memory
> agent.channels.mem-channel-1.capacity = 1000
> agent.channels.mem-channel-1.transactionCapacity = 100
> # properties of hdfs-Cluster1-sink
> agent.sinks.hdfs-Cluster1-sink.type = hdfs
>
> agent.sinks.hdfs-Cluster1-sink.channel = mem-channel-1
> agent.sinks.hdfs-Cluster1-sink.hdfs.path =
> hdfs://134.83.35.24:9000/user/mardan/flume
>
>
>
>
>
> Thanks
>
>
>
>
>
>
>
>
>
>
> On Tue, Jul 24, 2012 at 1:23 PM, Brock Noland <brock@cloudera.com> wrote:
>
>> Hi,
>>
>> Your channel is not hooked up to the source and sink. See the additions
>> below.
>>
>> agent.sources = avro-AppSrv-source
>> agent.sinks = hdfs-Cluster1-sink
>> agent.channels = mem-channel-1
>> # set channel for sources, sinks
>> # properties of avro-AppSrv-source
>> agent.sources.avro-AppSrv-source.type = SEQ
>>
>> agent.sources.avro-AppSrv-source.bind = localhost
>> agent.sources.avro-AppSrv-source.port = 10000
>> agent.sources.avro-AppSrv-source.channels = mem-channel-1
>> # properties of mem-channel-1
>> agent.channels.mem-channel-1.type = memory
>> agent.channels.mem-channel-1.capacity = 1000
>> agent.channels.mem-channel-1.transactionCapacity = 100
>> # properties of hdfs-Cluster1-sink
>> agent.sinks.hdfs-Cluster1-sink.type = hdfs
>> agent.sinks.hdfs-Cluster1-sink.channel = mem-channel-1
>> agent.sinks.hdfs-Cluster1-sink.hdfs.path =
>> hdfs://134.83.35.24/user/mukhtaj/flume/
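>>
>> With that saved as e.g. flume.conf, you would start the agent with
>> something like the following (note the -n value must match the agent name
>> used in the properties, here "agent"):
>>
>> $ flume-ng agent -n agent -c conf -f flume.conf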
>>
>>
>> Also, it seems we should give a better error message here:
>> https://issues.apache.org/jira/browse/FLUME-1271
>>
>>
>> Brock
>>
>>
>> On Tue, Jul 24, 2012 at 6:58 AM, mardan Khan <mardan8310@gmail.com> wrote:
>> > Hi Will,
>> >
>> > I did change the configuration file as per your suggestion
>> > (agent.sources.avro-AppSrv-source.type = SEQ) but I am still getting the
>> > same error.
>> >
>> > The configuration file is:
>> >
>> >
>> > agent.sources = avro-AppSrv-source
>> > agent.sinks = hdfs-Cluster1-sink
>> > agent.channels = mem-channel-1
>> > # set channel for sources, sinks
>> > # properties of avro-AppSrv-source
>> > agent.sources.avro-AppSrv-source.type = SEQ
>> >
>> > agent.sources.avro-AppSrv-source.bind = localhost
>> > agent.sources.avro-AppSrv-source.port = 10000
>> > # properties of mem-channel-1
>> > agent.channels.mem-channel-1.type = memory
>> > agent.channels.mem-channel-1.capacity = 1000
>> > agent.channels.mem-channel-1.transactionCapacity = 100
>> > # properties of hdfs-Cluster1-sink
>> > agent.sinks.hdfs-Cluster1-sink.type = hdfs
>> > agent.sinks.hdfs-Cluster1-sink.hdfs.path =
>> > hdfs://134.83.35.24/user/mukhtaj/flume/
>> >
>> >
>> >
>> >
>> > The error as:
>> >
>> >
>> > 12/07/24 12:52:33 ERROR properties.PropertiesFileConfigurationProvider:
>> > Failed to load configuration data. Exception follows.
>> >
>> > java.lang.NullPointerException
>> >  at
>> >
>> org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSources(PropertiesFileConfigurationProvider.java:324)
>> >  at
>> >
>> org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:222)
>> >  at
>> >
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
>> >  at
>> >
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>> >  at
>> >
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
>> >  at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> >  at
>> >
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> >  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> >
>> >
>> > Why am I getting this error? I have been struggling with this problem
>> > for a few days; running any command gives this error.
>> >
>> >
>> > Any suggestions, please.
>> >
>> >
>> > Thanks
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Tue, Jul 24, 2012 at 3:46 AM, Will McQueen <will@cloudera.com> wrote:
>> >>
>> >> Or as Brock said, you can refer to the link he posted and use the
>> >> example from the user guide instead; then you'll need to include this:
>> >>
>> >> agent.sources = avro-AppSrv-source
>> >> agent.sinks = hdfs-Cluster1-sink
>> >> agent.channels = mem-channel-1
>> >>
>> >> ... but that example uses an Avro source, so you'll likely need to
>> >> start an avro-client to test (or use the Flume SDK). Or just change the
>> >> source type to SEQ.
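>> >>
>> >> For example, a minimal sketch of driving that Avro source with the
>> >> avro-client, assuming flume-ng is on your PATH and using the port from
>> >> the example config (the input file is a placeholder):
>> >>
>> >> $ flume-ng avro-client -H localhost -p 10000 -F /path/to/input.txt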
>> >>
>> >> Cheers,
>> >> Will
>> >>
>> >>
>> >> On Mon, Jul 23, 2012 at 6:07 PM, mardan Khan <mardan8310@gmail.com> wrote:
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Thanks Brocks,
>> >>>
>> >>> I have just gone through the posted link, copy-pasted one of the
>> >>> configuration files, and changed the hdfs path as below:
>> >>>
>> >>>
>> >>>
>> >>> # properties of avro-AppSrv-source
>> >>> agent.sources.avro-AppSrv-source.type = avro
>> >>> agent.sources.avro-AppSrv-source.bind = localhost
>> >>> agent.sources.avro-AppSrv-source.port = 10000
>> >>>
>> >>> # properties of mem-channel-1
>> >>> agent.channels.mem-channel-1.type = memory
>> >>> agent.channels.mem-channel-1.capacity = 1000
>> >>> agent.channels.mem-channel-1.transactionCapacity = 100
>> >>>
>> >>> # properties of hdfs-Cluster1-sink
>> >>> agent.sinks.hdfs-Cluster1-sink.type = hdfs
>> >>> agent.sinks.hdfs-Cluster1-sink.hdfs.path =
>> >>> hdfs://134.83.35.24/user/mardan/flume/
>> >>>
>> >>>
>> >>> I applied the following command:
>> >>>
>> >>> $  /usr/bin/flume-ng agent -n agent -c conf -f
>> >>> /usr/lib/flume-ng/conf/flume.conf
>> >>>
>> >>>
>> >>> and got the following error (I get this error most of the time):
>> >>>
>> >>> 12/07/24 01:54:43 ERROR
>> properties.PropertiesFileConfigurationProvider:
>> >>> Failed to load configuration data. Exception follows.
>> >>> java.lang.NullPointerException
>> >>>     at
>> >>>
>> org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.loadSources(PropertiesFileConfigurationProvider.java:324)
>> >>>     at
>> >>>
>> org.apache.flume.conf.properties.PropertiesFileConfigurationProvider.load(PropertiesFileConfigurationProvider.java:222)
>> >>>     at
>> >>>
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider.doLoad(AbstractFileConfigurationProvider.java:123)
>> >>>     at
>> >>>
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider.access$300(AbstractFileConfigurationProvider.java:38)
>> >>>     at
>> >>>
>> org.apache.flume.conf.file.AbstractFileConfigurationProvider$FileWatcherRunnable.run(AbstractFileConfigurationProvider.java:202)
>> >>>     at
>> >>>
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>> >>>     at
>> >>>
>> java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
>> >>>     at
>> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
>> >>>     at
>> >>>
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
>> >>>     at
>> >>>
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
>> >>>     at
>> >>>
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
>> >>>     at
>> >>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >>>     at
>> >>>
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >>>     at java.lang.Thread.run(Thread.java:662)
>> >>>
>> >>> I think something is wrong in the configuration file. I am using the
>> >>> Flume 1.x version, installed in /usr/lib/flume-ng/.
>> >>>
>> >>> Could you please check the command and configuration file.
>> >>>
>> >>> Thanks
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On Tue, Jul 24, 2012 at 1:33 AM, Brock Noland <brock@cloudera.com> wrote:
>> >>>>
>> >>>> Yes, you can do that. In fact that is the most common case. The
>> >>>> documents which should help you do so are here:
>> >>>>
>> >>>>
>> >>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+1.x+Documentation
>> >>>>
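>> >>>> As a rough sketch (the namenode URI below is just an example), the
>> >>>> change is to swap your LOGGER sink for an HDFS sink:
>> >>>>
>> >>>> agent2.sinks.k1.channel = c1
>> >>>> agent2.sinks.k1.type = hdfs
>> >>>> agent2.sinks.k1.hdfs.path = hdfs://namenode:9000/user/mardan/flume
>> >>>>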
>> >>>> Brock
>> >>>>
>> >>>> On Mon, Jul 23, 2012 at 7:26 PM, mardan Khan <mardan8310@gmail.com>
>> >>>> wrote:
>> >>>> > Hi,
>> >>>> >
>> >>>> > I am just doing testing. I am generating a sequence and want to
>> >>>> > upload it into HDFS. My configuration file is:
>> >>>> >
>> >>>> > agent2.channels = c1
>> >>>> > agent2.sources = r1
>> >>>> > agent2.sinks = k1
>> >>>> >
>> >>>> > agent2.channels.c1.type = MEMORY
>> >>>> >
>> >>>> > agent2.sources.r1.channels = c1
>> >>>> > agent2.sources.r1.type = SEQ
>> >>>> >
>> >>>> > agent2.sinks.k1.channel = c1
>> >>>> > agent2.sinks.k1.type = LOGGER
>> >>>> >
>> >>>> >
>> >>>> > Is it possible to upload into HDFS, and if so, how can I make the
>> >>>> > changes in the configuration file?
>> >>>> >
>> >>>> >
>> >>>> > Many thanks
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Apache MRUnit - Unit testing MapReduce -
>> >>>> http://incubator.apache.org/mrunit/
>> >>>
>> >>>
>> >>
>> >
>>
>>
>>
>> --
>> Apache MRUnit - Unit testing MapReduce -
>> http://incubator.apache.org/mrunit/
>>
>
>
