incubator-blur-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: New to Blur
Date Tue, 07 May 2013 06:32:09 GMT
Hi,

I was able to run the code examples for loading data and searching. There
were really no hiccups. I started digging through the code and will let you
know if I have any questions. Thanks

- Rahul


On Thu, May 2, 2013 at 11:33 PM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Hi Aaron,
>
> I greatly appreciate your detailed response. I will go through the notes,
> code and the examples you provided over the weekend and will keep you
> posted on any issues I come across. Once again, thank you.
>
> - Rahul
>
>
> On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>
>> Rahul,
>>
>> I'm glad you were able to get things built and Blur up and running!  Good
>> questions!  Let me see if I can answer them.
>>
>> 1. I am not able to find the 'blur.*.hostname' properties in the
>> blur.properties file, but these are listed in the readme file
>>
>> The blur-site.properties file overrides the blur-default.properties file,
>> which can be found in the src/blur-util/src/main/resources/ directory.
>>
>> 2. There seems to be a lot of code. I greatly appreciate if someone can
>> give me pointers before I dig through the codebase. Something like an
>> architectural overview or a flow explaining how the search query is
>> resolved.
>>
>> Good question.  I will explain how a query is executed, assuming you are
>> running Blur in a clustered environment (controllers + shards).
>>
>> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
>> -2. Client submits the query to one of the controllers by calling the
>> query method on the Blur service.
>>
>> Note that the easiest way to interact with Thrift in the client is by using
>> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
>> You can see it in use here (I just added it, so you might have to pull):
>>
>>
>> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
>>
>> -3. Once the query arrives in the controller, the controller then
>> re-submits the query to all the shard servers that are online.
>>
>> See
>>
>> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
>> query method.
>>
>> -4. Once in the shard server, the query is parsed into a Lucene query.
>> -5. The query is executed in parallel, one thread per index shard in the
>> shard server.
>>
>> See
>> src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
>> query method.
>>
>> -6. Once the results have been found from the query they are merged and
>> the top N are returned to the controller.
>> -7. Once the results from all the shard servers have returned, the top N
>> are returned to the client.
>>
>> I know this is a technical explanation of running a single query, but it
>> should give you some starting points to dig through the code.
>>
>> The project breakdown:
>>
>> blur-core
>> - This project binds most of the other projects together, houses all the
>> thrift service impls, failover logic, server startup, shard and controller
>> management, etc.
>> blur-gui
>> - An http status server that runs in each controller and shard server,
>> needs some work.
>> blur-mapred
>> - The bulk indexing code lives in this project.
>> blur-query
>> - The lucene query classes that blur implements reside here.
>> blur-shell
>> - A basic shell program to interact with blur, needs some more features.
>> blur-store
>> - The lucene directory and block cache code resides here.
>> blur-testsuite
>> - Currently contains a lot of example programs to exercise a blur cluster.
>> blur-thrift
>> - Contains generated Thrift code and client code; the client code has
>> automatic retry logic for when you are running multiple controllers, etc.
>> blur-util
>> - Contains some basic utility classes, metrics, and zookeeper code.
>>
>>
>> 3. How do you guys manage your development workspace with eclipse, git,
>> and maven? This will definitely help me get a kickstart.
>>
>> I run git on the command line, with mvn, and use eclipse as my IDE.  There
>> are some shortcuts for testing a single shard server, or a shard server +
>> controller server, from within eclipse.  Take a look at
>> org.apache.blur.thrift.ThriftBlurShardServer and
>> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
>> can be executed to run various processes.  If you have ZooKeeper running
>> you should be able to run those mains and then step through a query being
>> executed.
>>
>> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
>> steps in actually using it. Where do we start?
>>
>> Take a look at
>>
>> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
>> as well as the blur-testsuite.  That project has some basic programs to
>> create a table, load data, search, etc.  And please follow up with more
>> questions if you need more guidance or help.
>>
>> Thanks for the notes about the initial setup and build!  I will take a
>> look at the errors.
>>
>> Aaron
>>
>>
>> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <
>> challapallirahul@gmail.com> wrote:
>>
>> > Hi,
>> >
>> > I was able to get blur started (shards and controllers). It worked
>> > straight away. Awesome. I have a few more questions. My apologies if
>> > some of the questions are naive.
>> >
>> > 1. I am not able to find the 'blur.*.hostname' properties in the
>> > blur.properties file, but these are listed in the readme file
>> > 2. There seems to be a lot of code. I greatly appreciate if someone can
>> > give me pointers before I dig through the codebase. Something like an
>> > architectural overview or a flow explaining how the search query is
>> > resolved.
>> > 3. How do you guys manage your development workspace with eclipse, git,
>> > and maven? This will definitely help me get a kickstart.
>> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
>> > steps in actually using it. Where do we start?
>> >
>> > I am also outlining the steps that I followed to get blur running, and
>> > listing a couple of errors I got during the build process. The overall
>> > build was successful though.
>> >
>> > Apache Blur Single Node Setup on Mac OS X Lion
>> >
>> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
>> > 2. Get the Blur code from Git using git clone
>> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
>> > 3. Checkout the branch 0.1.5
>> > 4. Run 'mvn clean install' from the 'src' directory as superuser
>> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
>> > convenient location and set BLUR_HOME to this location and add it to
>> > .bash_profile
>> > 6. Go to the extracted folder and configure the
>> > $BLUR_HOME/config/blur-env.sh file.  The two exports that are required:
>> >            export JAVA_HOME=$(/usr/libexec/java_home)
>> >            export HADOOP_HOME=/usr/local/hadoop
>> > 7. Setup the $BLUR_HOME/config/blur.properties file.  The default site
>> > configuration:
>> >            blur.zookeeper.connection=localhost
>> >            blur.cluster.name=default
>> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
>> >
>> > Errors during the build process :
>> >
>> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
>> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
>> >     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
>> >     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
>> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
>> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
>> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
>> >     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
>> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
>> >     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
>> >     at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
>> >     at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
>> >     at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
>> >     at java.util.TimerThread.mainLoop(Timer.java:512)
>> >     at java.util.TimerThread.run(Timer.java:462)
>> >
>> > WARN  20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
>> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
>> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
>> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
>> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
>> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
>> >     at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
>> >     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
>> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
>> >     at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
>> >     at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
>> >     at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
>> >     at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
>> >     at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
>> >     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>> >     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
>> >     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
>> >     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
>> >     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >     at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
>> >     at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
>> >     at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
>> >     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
>> >     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
>> >
>> >
>> > - Rahul
>> >
>> >
>> >
>> >
>> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <
>> > challapallirahul@gmail.com> wrote:
>> >
>> > > Aaron,
>> > >
>> > > Thanks for your reply. I will sure let you know how it goes.
>> > >
>> > > - Rahul
>> > >
>> > >
>> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>> > >
>> > >> Hi Rahul,
>> > >>
>> > >> Welcome!  Blur is a young incubator project and with that there is not a
>> > >> lot of documentation.  Yet.  But we do have a lot of code.  :-)
>> > >>
>> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift
>> > >> for RPC and ZooKeeper for state, and of course Lucene for search.  Yes,
>> > >> Blur can and should run alongside a standard Hadoop install (MapReduce +
>> > >> HDFS).  It currently works with the 1.0.x version or CDH3 from Cloudera.
>> > >> I'm sure we can get it to work with 2.0.x and CDH4, it just hasn't
>> > >> happened yet.  However, the only dependency to run Blur on a single
>> > >> machine is ZooKeeper.  HDFS is required for a cluster.
>> > >>
>> > >> To get you started.
>> > >>
>> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
>> > >>
>> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
>> > >> git checkout 0.1.5
>> > >>
>> > >> In the checkout you will find a README.md that is a bit out of date with
>> > >> the code examples but the general theme is correct.  For more examples
>> > >> take a look at the blur-testsuite project, there are a lot of code
>> > >> examples in there to get you started.
>> > >>
>> > >> To build the project into a tarball that can be extracted and executed,
>> > >> run "mvn install" from the src/ directory.  Once it has successfully
>> > >> executed all the tests and built everything you will find a tar.gz file
>> > >> in the target/ directory in the distribution project.
>> > >>
>> > >> Before you can run Blur, Apache ZooKeeper needs to be running.  A default
>> > >> install will work.
>> > >>
>> > >> After extracting the Blur tar.gz file you should be able to run
>> > >> bin/start-all.sh and it should start a Blur controller and a shard server
>> > >> on your local machine.
>> > >>
>> > >> I would love to hear how your initial compile and install goes, because we
>> > >> could use this thread and any information that is exchanged to create a
>> > >> nice little wiki page for 0.1.5.
>> > >>
>> > >> Thanks!
>> > >>
>> > >> Aaron
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >>
>> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <
>> > >> challapallirahul@gmail.com> wrote:
>> > >>
>> > >> > Hi,
>> > >> >
>> > >> > I am new to blur and even to ASF in terms of contributing back to a
>> > >> > project. I have decent knowledge about hadoop and mapreduce but am
>> > >> > completely new to search. I come from a Java/PHP background. I am looking
>> > >> > for some direction in setting up blur on my local machine. I have a single
>> > >> > node hadoop installation on my Mac OS X Lion. Is it an issue if I have HDFS
>> > >> > and MapReduce daemons running alongside blur on the same machine? I would
>> > >> > greatly appreciate it if you could refer me to some setup document as well
>> > >> > as an insight into the architecture of blur. Thank you.
>> > >> >
>> > >> > - Rahul
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
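
On question 1 in the thread above: the reply says that blur-site.properties
overrides blur-default.properties. A minimal sketch of that kind of layering,
using java.util.Properties with a defaults object, might look like the
following. Whether Blur loads its configuration this way is an assumption; the
thread does not say. The class name, the inline file contents and the key
example.only.in.defaults are hypothetical. blur.cluster.name is one of the
keys from the setup steps.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical illustration only: this is not Blur's configuration loader.
final class LayeredPropertiesSketch {

    /**
     * Builds a Properties object in which entries from siteText win and
     * anything missing there falls back to defaultsText.
     */
    static Properties load(String defaultsText, String siteText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Passing 'defaults' to the constructor makes getProperty()
            // consult it whenever a key is absent from the site layer.
            Properties site = new Properties(defaults);
            site.load(new StringReader(siteText));
            return site;
        } catch (IOException e) {
            // StringReader does not normally fail; rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Properties p = load(
                "blur.cluster.name=default\nexample.only.in.defaults=x\n",
                "blur.cluster.name=dev\n");
        // The site value overrides the default one.
        System.out.println(p.getProperty("blur.cluster.name"));
        // A key absent from the site layer is served from the defaults.
        System.out.println(p.getProperty("example.only.in.defaults"));
    }
}
```

With this layering, a key that is not set in the site file can still resolve
to a value from the defaults, which may be why a property listed in the readme
does not appear in the site file itself.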

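On question 2 in the thread above: steps 3 to 7 of the query flow (controller
fans out to the shard servers, each shard server searches its index shards in
parallel and merges, the controller merges again and returns the top N) can be
sketched roughly as below. This is a simplified model under stated
assumptions, not Blur's code: Hit, ScatterGatherSketch and both search methods
are hypothetical names, each index shard is modelled as an in-memory list of
scored hits instead of a Lucene index, and the controller loop is sequential
for brevity. It needs Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

// Hypothetical illustration only: none of these types are Blur classes.
final class ScatterGatherSketch {

    /** A scored hit; stands in for whatever a real result row carries. */
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    /** Keeps the n highest-scoring hits, best first. */
    static List<Hit> topN(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble((Hit h) -> h.score).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    /** Shard-server side (steps 5 and 6): one task per index shard, then merge. */
    static List<Hit> searchShardServer(List<List<Hit>> indexShards, int n) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, indexShards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (List<Hit> shard : indexShards) {
                // Stand-in for executing the parsed Lucene query on one shard.
                tasks.add(() -> topN(shard, n));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pool.invokeAll(tasks)) {
                merged.addAll(f.get());
            }
            return topN(merged, n);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Controller side (steps 3 and 7): query every shard server, merge top N. */
    static List<Hit> searchController(List<List<List<Hit>>> shardServers, int n) {
        List<Hit> merged = new ArrayList<>();
        for (List<List<Hit>> server : shardServers) {
            merged.addAll(searchShardServer(server, n));
        }
        return topN(merged, n);
    }
}
```

Each level only needs to pass up its own top N, because a hit that does not
make the top N of its shard cannot make the overall top N either.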