storm-user mailing list archives

From K Zharas <kgzha...@gmail.com>
Subject Re: Storm + HDFS
Date Thu, 04 Feb 2016 16:29:31 GMT
Thank you for your reply; it worked. However, I have run into another problem.

Basically, I'm trying to use HdfsBolt in Storm. I wanted to start with a
basic setup, so I used the TestWordSpout provided by Storm.

I can successfully compile and submit the topology, but it doesn't write
anything to HDFS.

In the Storm UI, I can see that the spout is emitting continuously. The bolt
doesn't process anything, and it reports this error:

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/CanUnbuffer
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.Sec

Here is my topology

public class HdfsFileTopology {
  public static void main(String[] args) throws Exception {
    RecordFormat format = new DelimitedRecordFormat().withFieldDelimiter(",");
    SyncPolicy syncPolicy = new CountSyncPolicy(100);
    FileRotationPolicy rotationPolicy =
        new FileSizeRotationPolicy(10.0f, Units.KB);
    FileNameFormat fileNameFormat =
        new DefaultFileNameFormat().withPath("/user");
    HdfsBolt bolt = new HdfsBolt()
        .withFsUrl("hdfs://localhost:9000")
        .withFileNameFormat(fileNameFormat)
        .withRecordFormat(format)
        .withRotationPolicy(rotationPolicy)
        .withSyncPolicy(syncPolicy);

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("word", new TestWordSpout(), 1);
    builder.setBolt("output", bolt, 1).shuffleGrouping("word");
    Config conf = new Config();
    conf.setDebug(true);
    conf.setNumWorkers(3);
    StormSubmitter.submitTopology("HdfsFileTopology", conf,
        builder.createTopology());
  }
}
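
A possible reading of the error above, offered as a sketch rather than a
confirmed diagnosis: as far as I know, org.apache.hadoop.fs.CanUnbuffer was
added to hadoop-common in the Hadoop 2.7 line, so a NoClassDefFoundError for
it often means an older hadoop-common jar is on the worker classpath next to
newer HDFS client classes. The helper below uses only the JDK to check
whether a class is visible on the classpath. The names ClasspathCheck and
isOnClasspath are hypothetical and are not part of Storm or Hadoop.

```java
// Hypothetical diagnostic helper (not part of Storm or Hadoop).
final class ClasspathCheck {

    // Returns true if the named class can be located by this class's loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies cannot be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class named in the stack trace; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.fs.CanUnbuffer";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running this with the same classpath as the topology jar should show whether
the class is missing there. If it is, aligning the Hadoop dependency versions
in the build is one thing to try.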


On Thu, Feb 4, 2016 at 5:04 AM, P. Taylor Goetz <ptgoetz@gmail.com> wrote:

> Assuming you have git and maven installed:
>
> git clone git@github.com:apache/storm.git
> cd storm
> git checkout -b 1.x origin/1.x-branch
> mvn install -DskipTests
>
> That third step checks out the 1.x-branch branch which is the base for the
> upcoming 1.0 release.
>
> You can then include the storm-hdfs dependency in your project:
>
> <dependency>
> <groupId>org.apache.storm</groupId>
> <artifactId>storm-hdfs</artifactId>
> <version>1.0.0-SNAPSHOT</version>
> </dependency>
>
> You can find more information on using the spout and other HDFS components
> here:
>
>
> https://github.com/apache/storm/tree/1.x-branch/external/storm-hdfs#hdfs-spout
>
> -Taylor
>
> On Feb 3, 2016, at 2:54 PM, K Zharas <kgzharas@gmail.com> wrote:
>
> Oh, OK. Could you please give me an idea of how I can do it manually? I'm
> quite a beginner :)
>
> On Thu, Feb 4, 2016 at 3:43 AM, Parth Brahmbhatt <
> pbrahmbhatt@hortonworks.com> wrote:
>
>> The storm-hdfs spout is not yet published to Maven. You will have to check
>> out Storm locally and build it to make it available for development.
>>
>> From: K Zharas <kgzharas@gmail.com>
>> Reply-To: "user@storm.apache.org" <user@storm.apache.org>
>> Date: Wednesday, February 3, 2016 at 11:41 AM
>> To: "user@storm.apache.org" <user@storm.apache.org>
>> Subject: Re: Storm + HDFS
>>
>> Yes, looks like it is. But, I have added dependencies required by
>> storm-hdfs as stated in a guide.
>>
>> On Thu, Feb 4, 2016 at 3:33 AM, Nick R. Katsipoulakis <
>> nick.katsip@gmail.com> wrote:
>>
>>> Well,
>>>
>>> those errors look like a problem with the way you build your jar file.
>>> Please make sure that you build your jar with the proper Storm Maven
>>> dependency.
>>>
>>> Cheers,
>>> Nick
>>>
>>> On Wed, Feb 3, 2016 at 2:31 PM, K Zharas <kgzharas@gmail.com> wrote:
>>>
>>>> It throws an error saying that the packages do not exist. I have also
>>>> tried changing org.apache to backtype; I still got an error, but only for
>>>> storm.hdfs.spout. By the way, I use Storm 0.10.0 and Hadoop 2.7.1.
>>>>
>>>>    package org.apache.storm does not exist
>>>>    package org.apache.storm does not exist
>>>>    package org.apache.storm.generated does not exist
>>>>    package org.apache.storm.metric does not exist
>>>>    package org.apache.storm.topology does not exist
>>>>    package org.apache.storm.utils does not exist
>>>>    package org.apache.storm.utils does not exist
>>>>    package org.apache.storm.hdfs.spout does not exist
>>>>    package org.apache.storm.hdfs.spout does not exist
>>>>    package org.apache.storm.topology.base does not exist
>>>>    package org.apache.storm.topology does not exist
>>>>    package org.apache.storm.tuple does not exist
>>>>    package org.apache.storm.task does not exist
>>>>
>>>> On Wed, Feb 3, 2016 at 8:57 PM, Matthias J. Sax <mjsax@apache.org>
>>>> wrote:
>>>>
>>>>> Storm does provide HdfsSpout and HdfsBolt already. Just use those,
>>>>> instead of writing your own spout/bolt:
>>>>>
>>>>> https://github.com/apache/storm/tree/master/external/storm-hdfs
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 02/03/2016 12:34 PM, K Zharas wrote:
>>>>> > Can anyone help to create a Spout which reads a file from HDFS?
>>>>> > I have tried with the code below, but it is not working.
>>>>> >
>>>>> > public void nextTuple() {
>>>>> >       Path pt=new Path("hdfs://localhost:50070/user/BCpredict.txt");
>>>>> >       FileSystem fs = FileSystem.get(new Configuration());
>>>>> >       BufferedReader br = new BufferedReader(
>>>>> >           new InputStreamReader(fs.open(pt)));
>>>>> >       String line = br.readLine();
>>>>> >       while (line != null){
>>>>> >          System.out.println(line);
>>>>> >          line=br.readLine();
>>>>> >          _collector.emit(new Values(line));
>>>>> >       }
>>>>> > }
>>>>> >
>>>>> > On Tue, Feb 2, 2016 at 1:19 PM, K Zharas <kgzharas@gmail.com
>>>>> > <mailto:kgzharas@gmail.com>> wrote:
>>>>> >
>>>>> >     Hi.
>>>>> >
>>>>> >     I have a project I'm currently working on. The idea is to
>>>>> >     implement "scikit-learn" in Storm and integrate it with HDFS.
>>>>> >
>>>>> >     I've already implemented "scikit-learn", but currently I'm using
>>>>> >     a text file to read and write. I need to use HDFS instead, and I
>>>>> >     am finding it hard to integrate.
>>>>> >
>>>>> >     Here is the link to GitHub
>>>>> >     <https://github.com/kgzharas/StormTopologyTest>. (I only included
>>>>> >     the files that I used, not the whole project.)
>>>>> >
>>>>> >     Basically, I have a few questions, if you don't mind answering
>>>>> >     them:
>>>>> >     1) How to use HDFS to read and write?
>>>>> >     2) Is my "scikit-learn" implementation correct?
>>>>> >     3) How to create a Storm project? (Currently working in
>>>>> >     "storm-starter")
>>>>> >
>>>>> >     These questions may sound a bit silly, but I really can't find a
>>>>> >     proper solution.
>>>>> >
>>>>> >     Thank you for your attention to this matter.
>>>>> >     Sincerely, Zharas.
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> > Best regards,
>>>>> > Zharas
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>> Zharas
>>>>
>>>
>>>
>>>
>>> --
>>> Nick R. Katsipoulakis,
>>> Department of Computer Science
>>> University of Pittsburgh
>>>
>>
>>
>>
>> --
>> Best regards,
>> Zharas
>>
>
>
>
> --
> Best regards,
> Zharas
>
>
>
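
On the nextTuple() loop quoted earlier in this thread: it advances to the
next line before emitting, so the first line is never emitted and the last
emit receives null. A minimal sketch of the read-then-emit order follows,
using only the JDK. A StringReader stands in for the HDFS input stream and a
List stands in for the collector; the names LineEmitSketch and readAll are
hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the spout's read loop (no Storm or Hadoop classes).
final class LineEmitSketch {

    // Reads every line from the source and "emits" it by adding it to a list.
    static List<String> readAll(Reader source) {
        List<String> emitted = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // Read first, test for end of input, then emit: null is never emitted.
            while ((line = br.readLine()) != null) {
                emitted.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Prints [a, b, c]
        System.out.println(readAll(new StringReader("a\nb\nc\n")));
    }
}
```

In a real spout the reader would normally be opened once rather than on
every nextTuple() call. That part is left out of this sketch.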


-- 
Best regards,
Zharas
