incubator-blur-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: New to Blur
Date Fri, 03 May 2013 06:33:44 GMT
Hi Aaron,

I greatly appreciate your detailed response. I will go through the notes,
code and the examples you provided over the weekend and will keep you
posted on any issues I come across. Once again, thank you.

- Rahul


On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> Rahul,
>
> I'm glad you were able to get things built and Blur up and running!  Good
> questions!  Let me see if I can answer them.
>
> 1. I am not able to find the 'blur.*.hostname' properties in the
> blur.properties file, but these are listed in the readme file
>
> The blur-site.properties file overrides the blur-default.properties file,
> which can be found in the src/blur-util/src/main/resources/ directory.
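
A minimal sketch of the "site file overrides default file" behaviour, using
only java.util.Properties. This is not Blur's actual loading code, which is
not shown in this thread. The two keys are the ones from the setup notes
further down; the class name, the method name and the value "dev" are
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only; not a Blur class.
final class LayeredPropertiesSketch {

    // Builds a Properties object in which site values win and the
    // defaults are consulted only when the site text lacks a key.
    static Properties layered(String defaultsText, String siteText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Passing "defaults" to the constructor makes getProperty()
            // fall back to it on a miss.
            Properties site = new Properties(defaults);
            site.load(new StringReader(siteText));
            return site;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for blur-default.properties and blur-site.properties.
        String defaultsText = "blur.zookeeper.connection=localhost\n"
                            + "blur.cluster.name=default\n";
        String siteText = "blur.cluster.name=dev\n"; // assumed override value

        Properties config = layered(defaultsText, siteText);
        System.out.println(config.getProperty("blur.cluster.name"));         // dev
        System.out.println(config.getProperty("blur.zookeeper.connection")); // localhost
    }
}
```

A key that is missing from the site text is therefore still resolved, which
would explain why a property can be documented yet absent from the site file.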
>
> 2. There seems to be a lot of code. I would greatly appreciate it if someone
> could give me pointers before I dig through the codebase. Something like an
> architectural overview or a flow explaining how the search query is
> resolved.
>
> Good question.  I will explain how a query is executed, assuming you are
> running Blur in a clustered environment (controllers + shards).
>
> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
> -2. Client submits the query to one of the controllers by calling the query
> method on the Blur service.
>
> Note: the easiest way to interact with Thrift in the client is to use
> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
> You can see it in use here (I just added it, so you might have to pull):
>
>
> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
>
> -3. Once the query arrives in the controller, the controller then
> re-submits the query to all the shard servers that are online.
>
> See
>
> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
> query method.
>
> -4. Once in the shard server, the query is parsed into a Lucene query.
> -5. The query is executed in parallel, one thread per index shard in the
> shard server.
>
> See src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
> query method.
>
> -6. Once the results have been found from the query, they are merged and the
> top N are returned to the controller.
> -7. Once the results from all the shard servers have returned, the top N
> are returned to the client.
>
> I know this is a technical explanation of running a single query, but it
> should give you some starting points to dig through the code.
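
The flow in steps 3 to 7 (fan out, one thread per shard, local top N, merge)
can be sketched as below. This is a self-contained toy under stated
assumptions, not the code in BlurControllerServer or BlurShardServer: every
name here (ScatterGatherSketch, Hit, topN, search) is hypothetical, and
in-memory lists stand in for index shards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names are Blur classes.
final class ScatterGatherSketch {

    // A scored search hit; the fields are illustrative.
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // Merges several hit lists and keeps the n highest-scoring hits.
    static List<Hit> topN(List<List<Hit>> partials, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> partial : partials) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    // Searches every shard in parallel (one task per shard), then merges
    // the per-shard top N lists into a single top N.
    static List<Hit> search(List<List<Hit>> shards, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Callable<List<Hit>>> tasks = new ArrayList<>();
            for (List<Hit> shard : shards) {
                // Each shard only needs to report its own best n hits.
                tasks.add(() -> topN(Collections.singletonList(shard), n));
            }
            List<List<Hit>> partials = new ArrayList<>();
            for (Future<List<Hit>> future : pool.invokeAll(tasks)) {
                partials.add(future.get());
            }
            return topN(partials, n);
        } catch (Exception e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Hit>> shards = Arrays.asList(
            Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2)),
            Arrays.asList(new Hit("c", 0.7), new Hit("d", 0.5)));
        for (Hit hit : search(shards, 3)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

With the sample data the merged result is the hits a, c and d, in that order.
Returning only the local top N from each shard keeps the amount of data sent
back for the final merge small.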
>
> The project breakdown:
>
> blur-core
> - This project binds most of the other projects together, houses all the
> thrift service impls, failover logic, server startup, shard and controller
> management, etc.
> blur-gui
> - An http status server that runs in each controller and shard server,
> needs some work.
> blur-mapred
> - The bulk indexing code lives in this project.
> blur-query
> - The lucene query classes that blur implements reside here.
> blur-shell
> - A basic shell program to interact with blur, needs some more features.
> blur-store
> - The lucene directory and block cache code resides here.
> blur-testsuite
> - Currently contains a lot of example programs to exercise a blur cluster.
> blur-thrift
> - Contains generated thrift code and client code, the client code has
> automatic retry logic for when you are running multiple controllers, etc.
> blur-util
> - Contains some basic utility classes, metrics, and zookeeper code.
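
The retry behaviour mentioned for the blur-thrift client can be pictured
roughly as below. This is only a sketch of the general failover idea and an
assumption about its shape, not the logic in the blur-thrift project: the
class, the method and the controller addresses are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not the blur-thrift client code.
final class ControllerRetrySketch {

    // Tries each controller address in turn and returns the first
    // successful result; fails only if every controller fails.
    static <T> T callWithFailover(List<String> controllers, Function<String, T> call) {
        RuntimeException last = null;
        for (String address : controllers) {
            try {
                return call.apply(address);
            } catch (RuntimeException e) {
                last = e; // remember the failure and try the next controller
            }
        }
        throw new IllegalStateException("no controller could be reached", last);
    }

    public static void main(String[] args) {
        // Made-up addresses; the first one simulates a controller that is down.
        List<String> controllers = Arrays.asList("controller1:40010", "controller2:40010");
        String answer = callWithFailover(controllers, address -> {
            if (address.startsWith("controller1")) {
                throw new RuntimeException("connection refused");
            }
            return "answered by " + address;
        });
        System.out.println(answer);
    }
}
```

Here the call to the first address fails and the second one answers, so the
caller never sees the failure.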
>
>
> 3. How do you guys manage your development workspace with eclipse, git, and
> maven? This will definitely help me get a kickstart.
>
> I run git on the command line, with mvn and eclipse as my IDE.  There are
> some shortcuts for testing a single shard server, or a shard server +
> controller server, from within eclipse.  Take a look at
> org.apache.blur.thrift.ThriftBlurShardServer and
> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
> can be executed to run the various processes.  If you have ZooKeeper running
> you should be able to run those mains and then step through a query being
> executed.
>
> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> steps in actually using it? Where do we start?
>
> Take a look at
>
> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
> as well as the blur-testsuite.  That project has some basic programs to
> create a table, load data, search, etc.  And please follow up with more
> questions if you need more guidance or help.
>
> Thanks for the notes about the initial setup and build!  I will take a look
> at the errors.
>
> Aaron
>
>
> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Hi,
> >
> > I was able to get blur started (shards and controllers). It worked
> > straight away. Awesome. I have a few more questions. My apologies if some
> > of the questions are naive.
> >
> > 1. I am not able to find the 'blur.*.hostname' properties in the
> > blur.properties file, but these are listed in the readme file
> > 2. There seems to be a lot of code. I would greatly appreciate it if
> > someone could give me pointers before I dig through the codebase.
> > Something like an architectural overview or a flow explaining how the
> > search query is resolved.
> > 3. How do you guys manage your development workspace with eclipse, git,
> > and maven? This will definitely help me get a kickstart.
> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> > steps in actually using it? Where do we start?
> >
> > Below are the steps I followed to get blur running, along with a couple
> > of errors I got during the build process. The overall build was
> > successful though.
> >
> > Apache Blur Single Node Setup on Mac OS X Lion
> >
> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> > 2. Get the Blur code from Git using git clone
> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > 3. Check out the branch 0.1.5
> > 4. Run 'mvn clean install' from the 'src' directory as superuser
> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
> > convenient location and set BLUR_HOME to this location and add it to
> > .bash_profile
> > 6. Go to the extracted folder and configure the
> > $BLUR_HOME/config/blur-env.sh file.  The two exports that are required:
> >            export JAVA_HOME=$(/usr/libexec/java_home)
> >            export HADOOP_HOME=/usr/local/hadoop
> > 7. Set up the $BLUR_HOME/config/blur.properties file.  The default site
> > configuration:
> >            blur.zookeeper.connection=localhost
> >            blur.cluster.name=default
> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
> >
> > Errors during the build process:
> >
> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> >   at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> >   at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
> >   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> >   at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> >   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> >   at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
> >   at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> >   at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> >   at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> >   at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
> >   at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> >   at java.util.TimerThread.mainLoop(Timer.java:512)
> >   at java.util.TimerThread.run(Timer.java:462)
> >
> > WARN  20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
> >   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> >   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
> >   at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> >   at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
> >   at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> >   at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
> >   at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> >   at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
> >   at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> >   at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
> >   at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> >   at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
> >   at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> >   at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> >   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> >   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
> >   at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> >   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> >   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> >   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >   at java.lang.reflect.Method.invoke(Method.java:597)
> >   at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> >   at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> >   at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> >   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
> >   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
> >
> >
> > - Rahul
> >
> >
> >
> >
> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <
> > challapallirahul@gmail.com> wrote:
> >
> > > Aaron,
> > >
> > > Thanks for your reply. I will be sure to let you know how it goes.
> > >
> > > - Rahul
> > >
> > >
> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <amccurry@gmail.com>
> > > wrote:
> > >
> > >> Hi Rahul,
> > >>
> > >> Welcome!  Blur is a young incubator project and with that there is not
> > >> a lot of documentation.  Yet.  But we do have a lot of code.  :-)
> > >>
> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift
> > >> for RPC and ZooKeeper for state, and of course Lucene for search.  Yes,
> > >> Blur can and should run alongside a standard Hadoop install (MapReduce +
> > >> HDFS).  It currently works with the 1.0.x version or CDH3 from Cloudera.
> > >> I'm sure we can get it to work with 2.0.x and CDH4, it just hasn't
> > >> happened yet.  However, the only dependency to run Blur on a single
> > >> machine is ZooKeeper.  HDFS is required for a cluster.
> > >>
> > >> To get you started.
> > >>
> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> > >>
> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
> > >> git checkout 0.1.5
> > >>
> > >> In the checkout you will find a README.md that is a bit out of date with
> > >> the code examples, but the general theme is correct.  For more examples
> > >> take a look at the blur-testsuite project; there are a lot of code
> > >> examples in there to get you started.
> > >>
> > >> To build the project into a tarball that can be extracted and executed,
> > >> run "mvn install" from the src/ directory.  Once it has successfully
> > >> executed all the tests and built everything you will find a tar.gz file
> > >> in the target/ directory in the distribution project.
> > >>
> > >> Before you can run Blur, Apache ZooKeeper needs to be running.  A default
> > >> install will work.
> > >>
> > >> After extracting the Blur tar.gz file you should be able to run
> > >> bin/start-all.sh and it should start a Blur controller and a shard server
> > >> on your local machine.
> > >>
> > >> I would love to hear how your initial compile and install goes, because
> > >> we could use this thread and any information that is exchanged to create
> > >> a nice little wiki page for 0.1.5.
> > >>
> > >> Thanks!
> > >>
> > >> Aaron
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <
> > >> challapallirahul@gmail.com> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I am new to blur and even ASF in terms of contributing back to a
> > >> > project. I have decent knowledge about hadoop and mapreduce but am
> > >> > completely new to search. I come from a Java/PHP background. I am
> > >> > looking for some direction in setting up blur on my local machine. I
> > >> > have a single node hadoop installation on my Mac OS X Lion. Is it an
> > >> > issue if I have HDFS, MapReduce daemons running alongside blur on the
> > >> > same machine? I would greatly appreciate it if you can refer me to some
> > >> > setup document as well as an insight into the architecture of blur.
> > >> > Thank You.
> > >> >
> > >> > - Rahul
> > >> >
> > >>
> > >
> > >
> >
>
