incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: New to Blur
Date Tue, 07 May 2013 21:38:24 GMT
That's good to hear!  Thanks for letting us know.

Aaron


On Tue, May 7, 2013 at 2:32 AM, rahul challapalli <
challapallirahul@gmail.com> wrote:

> Hi,
>
> I was able to run the code examples for loading data and searching. There
> were really no hiccups. I started digging through the code and will let you
> know if I have any questions. Thanks
>
> - Rahul
>
>
> On Thu, May 2, 2013 at 11:33 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Hi Aaron,
> >
> > I greatly appreciate your detailed response. I will go through the notes,
> > code and the examples you provided over the weekend and will keep you
> > posted regarding any issues that I come across. Once again, thank you.
> >
> > - Rahul
> >
> >
> > On Thu, May 2, 2013 at 7:45 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >
> >> Rahul,
> >>
> >> I'm glad you were able to get things built and Blur up and running!  Good
> >> questions!  Let me see if I can answer them.
> >>
> >> 1. I am not able to find the 'blur.*.hostname' properties in the
> >> blur.properties file, but these are listed in the readme file
> >>
> >> The blur-site.properties file overrides the blur-default.properties file
> >> that can be found in the src/blur-util/src/main/resources/ directory.
> >>
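
As a rough illustration of the override behaviour described in the answer above, here is a minimal sketch using java.util.Properties. It is not Blur's own loading code, and whether Blur layers the two files exactly this way is an assumption. The class name, the method name, and the override value zk1:2181 are hypothetical; the two property names are the ones listed in the setup notes further down this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration only; this is not code from the Blur codebase.
final class PropertiesOverrideSketch {

    // Builds a Properties object in which site values win and anything the
    // site text does not set falls back to the defaults.
    static Properties layered(String defaultsText, String siteText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Passing 'defaults' to the constructor makes getProperty()
            // consult it whenever a key is missing from the site values.
            Properties site = new Properties(defaults);
            site.load(new StringReader(siteText));
            return site;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for blur-default.properties (values are assumptions).
        String defaultsText =
            "blur.zookeeper.connection=localhost\n" +
            "blur.cluster.name=default\n";
        // Stand-in for blur-site.properties: overrides a single key.
        String siteText = "blur.zookeeper.connection=zk1:2181\n";

        Properties config = layered(defaultsText, siteText);
        System.out.println(config.getProperty("blur.zookeeper.connection")); // overridden by the site text
        System.out.println(config.getProperty("blur.cluster.name"));         // falls back to the defaults
    }
}
```

With this layering, a property that does not appear in the site file can still have a value, because the lookup falls through to the defaults.
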
> >> 2. There seems to be a lot of code. I greatly appreciate if someone can
> >> give me pointers before I dig through the codebase. Something like an
> >> architectural overview or a flow explaining how the search query is
> >> resolved.
> >>
> >> Good question.  I will explain how a query is executed, assuming you are
> >> running Blur in a clustered environment (controllers + shards).
> >>
> >> -1. Client creates a query (BlurQuery) with the generated Thrift objects.
> >> -2. Client submits the query to one of the controllers by calling the query
> >> method on the Blur service.
> >>
> >> Note that the easiest way to interact with Thrift in the client is by using
> >> BlurClient (org.apache.blur.thrift.BlurClient) in the blur-thrift project.
> >> You can see it in use here (I just added it, so you might have to pull):
> >>
> >> https://git-wip-us.apache.org/repos/asf?p=incubator-blur.git;a=blob;f=src/blur-testsuite/src/main/java/org/apache/blur/testsuite/SimpleQueryExample.java;h=df16bd82522bb9083309ac31f849924ba43318be;hb=ff29884ab60262305b99af49fae26575c808b755
> >>
> >> -3. Once the query arrives in the controller, the controller then
> >> re-submits the query to all the shard servers that are online.
> >>
> >> See
> >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurControllerServer.java
> >> query method.
> >>
> >> -4. Once in the shard server, the query is parsed into a Lucene query.
> >> -5. The query is executed in parallel, one thread per index shard in the
> >> shard server.
> >>
> >> See
> >> src/blur-core/src/main/java/org/apache/blur/thrift/BlurShardServer.java
> >> query method.
> >>
> >> -6. Once the query has produced results, they are merged and the top N are
> >> returned to the controller.
> >> -7. Once the results from all the shard servers have returned, the top N are
> >> returned to the client.
> >>
> >> I know this is a technical explanation of running a single query, but it
> >> should give you some starting points for digging through the code.
> >>
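
The scatter-gather flow in steps 3 through 7 can be sketched roughly as follows. This is an illustration under stated assumptions, not code from BlurControllerServer or BlurShardServer: Hit, topN, and searchInParallel are hypothetical names, the scores are made up, and the sketch assumes the controller merges shard-server results the same way a shard server merges its per-shard results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from the Blur codebase.
final class TopNMergeSketch {

    // A scored hit: a document id plus its relevance score.
    static final class Hit {
        final String id;
        final double score;
        Hit(String id, double score) { this.id = id; this.score = score; }
    }

    // Merges several partial result lists and keeps the n highest-scoring hits.
    static List<Hit> topN(List<List<Hit>> partials, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> partial : partials) {
            all.addAll(partial);
        }
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    // Runs one search task per index shard, each on its own thread, then merges.
    static List<Hit> searchInParallel(List<Callable<List<Hit>>> shardSearches, int n) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shardSearches.size()));
        try {
            List<List<Hit>> partials = new ArrayList<>();
            for (Future<List<Hit>> future : pool.invokeAll(shardSearches)) {
                partials.add(future.get());
            }
            return topN(partials, n);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int n = 2;

        // First simulated shard server, holding two index shards.
        List<Callable<List<Hit>>> server1 = new ArrayList<>();
        server1.add(() -> Arrays.asList(new Hit("a", 0.9), new Hit("b", 0.2)));
        server1.add(() -> Arrays.asList(new Hit("c", 0.7)));

        // Second simulated shard server, holding one index shard.
        List<Callable<List<Hit>>> server2 = new ArrayList<>();
        server2.add(() -> Arrays.asList(new Hit("d", 0.8), new Hit("e", 0.1)));

        // The "controller" gathers each server's top N and merges again.
        List<List<Hit>> fromServers =
            Arrays.asList(searchInParallel(server1, n), searchInParallel(server2, n));
        for (Hit hit : topN(fromServers, n)) {
            System.out.println(hit.id + " " + hit.score);
        }
    }
}
```

Because each level returns only its own top N, the controller never has to look at more than N hits per shard server, which is what keeps the final merge cheap.
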
> >> The project breakdown:
> >>
> >> blur-core
> >> - This project binds most of the other projects together, houses all the
> >> thrift service impls, failover logic, server startup, shard and controller
> >> management, etc.
> >> blur-gui
> >> - An http status server that runs in each controller and shard server,
> >> needs some work.
> >> blur-mapred
> >> - The bulk indexing code lives in this project.
> >> blur-query
> >> - The lucene query classes that blur implements reside here.
> >> blur-shell
> >> - A basic shell program to interact with blur, needs some more features.
> >> blur-store
> >> - The lucene directory and block cache code resides here.
> >> blur-testsuite
> >> - Currently contains a lot of example programs to exercise a blur cluster.
> >> blur-thrift
> >> - Contains generated thrift code and client code, the client code has
> >> automatic retry logic for when you are running multiple controllers, etc.
> >> blur-util
> >> - Contains some basic utility classes, metrics, and zookeeper code.
> >>
> >>
> >> 3. How do you guys manage your development workspace with eclipse, git, and
> >> maven. This will definitely help me get a kickstart.
> >>
> >> I run git on the command line, with mvn and eclipse as my IDE.  There are
> >> some shortcuts for testing a single shard server, or a shard server +
> >> controller server, from within eclipse.  Take a look at
> >> org.apache.blur.thrift.ThriftBlurShardServer and
> >> org.apache.blur.thrift.ThriftBlurControllerServer for the main methods that
> >> can be executed to run various processes.  If you have ZooKeeper running
> >> you should be able to run those mains and then step through a query being
> >> executed.
> >>
> >> 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> >> steps in actually using it. Where do we start?
> >>
> >> Take a look at
> >> http://www.nearinfinity.com/blogs/aaron_mccurry/an_introduction_to_blur.html
> >> as well as the blur-testsuite.  That project has some basic programs to
> >> create a table, load data, search, etc.  And please follow up with more
> >> questions if you need more guidance or help.
> >>
> >> Thanks for the notes about the initial setup and build!  I will take a look
> >> at the errors.
> >>
> >> Aaron
> >>
> >>
> >> On Thu, May 2, 2013 at 1:42 AM, rahul challapalli <
> >> challapallirahul@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > I was able to get blur started (shards and controllers). It worked straight
> >> > away. Awesome. I have a few more questions. My apologies if some of the
> >> > questions are naive.
> >> >
> >> > 1. I am not able to find the 'blur.*.hostname' properties in the
> >> > blur.properties file, but these are listed in the readme file
> >> > 2. There seems to be a lot of code. I greatly appreciate if someone can
> >> > give me pointers before I dig through the codebase. Something like an
> >> > architectural overview or a flow explaining how the search query is
> >> > resolved.
> >> > 3. How do you guys manage your development workspace with eclipse, git, and
> >> > maven. This will definitely help me get a kickstart.
> >> > 4. I started Hadoop (HDFS+MapReduce), Zookeeper, and Blur. What are the
> >> > steps in actually using it. Where do we start?
> >> >
> >> > I am also outlining the steps that I followed to get blur running, along
> >> > with a couple of errors I got during the build process, which are listed
> >> > below. The overall build was successful though.
> >> >
> >> > Apache Blur Single Node Setup on Mac OS X Lion
> >> >
> >> > 1. Environment : Single Node Hadoop-1.0.4 and Zookeeper-3.4.5
> >> > 2. Get the Blur code from Git using git clone
> >> > https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >> > 3. Checkout the branch 0.1.5
> >> > 4. Run 'mvn clean install' from the 'src' directory as superuser
> >> > 5. Extract the Blur tar.gz file from the 'target/' directory into a
> >> > convenient location and set BLUR_HOME to this location and add it to
> >> > .bash_profile
> >> > 6. Go to the extracted folder and configure the
> >> > $BLUR_HOME/config/blur-env.sh file.  The two exports that are required:
> >> >            export JAVA_HOME=$(/usr/libexec/java_home)
> >> >            export HADOOP_HOME=/usr/local/hadoop
> >> > 7. Setup the $BLUR_HOME/config/blur.properties file.  The default site
> >> > configuration:
> >> >            blur.zookeeper.connection=localhost
> >> >            blur.cluster.name=default
> >> > 8. Start blur using $BLUR_HOME/bin/start-all.sh
> >> >
> >> > Errors during the build process :
> >> >
> >> > ERROR 20130430_22:47:42:042_PDT [IndexReader-Refresher] writer.BlurIndexRefresher: Unknown error
> >> > org.apache.lucene.store.AlreadyClosedException: this Directory is closed
> >> >     at org.apache.lucene.store.Directory.ensureOpen(Directory.java:256)
> >> >     at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
> >> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:679)
> >> >     at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
> >> >     at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
> >> >     at org.apache.lucene.index.StandardDirectoryReader.isCurrent(StandardDirectoryReader.java:326)
> >> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenNoWriter(StandardDirectoryReader.java:284)
> >> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:247)
> >> >     at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:235)
> >> >     at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:169)
> >> >     at org.apache.blur.manager.writer.BlurIndexReader.refresh(BlurIndexReader.java:82)
> >> >     at org.apache.blur.manager.writer.BlurIndexRefresher.refreshInternal(BlurIndexRefresher.java:70)
> >> >     at org.apache.blur.manager.writer.BlurIndexRefresher.run(BlurIndexRefresher.java:61)
> >> >     at java.util.TimerThread.mainLoop(Timer.java:512)
> >> >     at java.util.TimerThread.run(Timer.java:462)
> >> > WARN  20130501_20:54:18:018_PDT [main] jmx.MBeanRegistry: Error during unregister
> >> > javax.management.InstanceNotFoundException: org.apache.ZooKeeperService:name0=StandaloneServer_port-1,name1=InMemoryDataTree
> >> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getMBean(DefaultMBeanServerInterceptor.java:1094)
> >> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.exclusiveUnregisterMBean(DefaultMBeanServerInterceptor.java:415)
> >> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.unregisterMBean(DefaultMBeanServerInterceptor.java:403)
> >> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.unregisterMBean(JmxMBeanServer.java:507)
> >> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:115)
> >> >     at org.apache.zookeeper.jmx.MBeanRegistry.unregister(MBeanRegistry.java:132)
> >> >     at org.apache.zookeeper.server.ZooKeeperServer.unregisterJMX(ZooKeeperServer.java:443)
> >> >     at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:436)
> >> >     at org.apache.zookeeper.server.NIOServerCnxnFactory.shutdown(NIOServerCnxnFactory.java:271)
> >> >     at org.apache.zookeeper.server.ZooKeeperServerMain.shutdown(ZooKeeperServerMain.java:127)
> >> >     at org.apache.blur.MiniCluster$ZooKeeperServerMainEmbedded.shutdown(MiniCluster.java:339)
> >> >     at org.apache.blur.MiniCluster.shutdownZooKeeper(MiniCluster.java:427)
> >> >     at org.apache.blur.MiniCluster.shutdownBlurCluster(MiniCluster.java:146)
> >> >     at org.apache.blur.thrift.BlurClusterTest.shutdownCluster(BlurClusterTest.java:81)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
> >> >     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >> >     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
> >> >     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:37)
> >> >     at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
> >> >     at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
> >> >     at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
> >> >     at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >     at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
> >> >     at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
> >> >     at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
> >> >     at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
> >> >     at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
> >> >
> >> >
> >> > - Rahul
> >> >
> >> >
> >> >
> >> >
> >> > On Tue, Apr 30, 2013 at 9:29 PM, rahul challapalli <
> >> > challapallirahul@gmail.com> wrote:
> >> >
> >> > > Aaron,
> >> > >
> >> > > Thanks for your reply. I will be sure to let you know how it goes.
> >> > >
> >> > > - Rahul
> >> > >
> >> > >
> >> > > On Tue, Apr 30, 2013 at 7:33 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> >> > >
> >> > >> Hi Rahul,
> >> > >>
> >> > >> Welcome!  Blur is a young incubator project and with that there is not a
> >> > >> lot of documentation.  Yet.  But we do have a lot of code.  :-)
> >> > >>
> >> > >> Blur uses HDFS for storing indexes, MapReduce for bulk indexing, Thrift for
> >> > >> RPC and ZooKeeper for state, and of course Lucene for search.  Yes, Blur can
> >> > >> and should run alongside a standard Hadoop install (MapReduce + HDFS).  It
> >> > >> currently works with the 1.0.x version or CDH3 from Cloudera.  I'm sure we
> >> > >> can get it to work with 2.0.x and CDH4, it just hasn't happened yet.  However,
> >> > >> the only dependency to run Blur on a single machine is ZooKeeper.  HDFS is
> >> > >> required for a cluster.
> >> > >>
> >> > >> To get you started:
> >> > >>
> >> > >> git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git
> >> > >>
> >> > >> # we are currently focusing on getting 0.1.5 to a releasable state.
> >> > >> git checkout 0.1.5
> >> > >>
> >> > >> In the checkout you will find a README.md that is a bit out of date with
> >> > >> the code examples but the general theme is correct.  For more examples take
> >> > >> a look at the blur-testsuite project, there are a lot of code examples in
> >> > >> there to get you started.
> >> > >>
> >> > >> To build the project into a tarball that can be extracted and executed,
> >> > >> run "mvn install" from the src/ directory.  Once it has successfully
> >> > >> executed all the tests and built everything you will find a tar.gz file in
> >> > >> the target/ directory in the distribution project.
> >> > >>
> >> > >> Before you can run Blur, Apache ZooKeeper needs to be running.  A default
> >> > >> install will work.
> >> > >>
> >> > >> After extracting the Blur tar.gz file you should be able to run
> >> > >> bin/start-all.sh and it should start a Blur controller and a shard server
> >> > >> on your local machine.
> >> > >>
> >> > >> I would love to hear how your initial compile and install goes, because we
> >> > >> could use this thread and any information that is exchanged to create a
> >> > >> nice little wiki page for 0.1.5.
> >> > >>
> >> > >> Thanks!
> >> > >>
> >> > >> Aaron
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >>
> >> > >> On Tue, Apr 30, 2013 at 2:17 PM, rahul challapalli <
> >> > >> challapallirahul@gmail.com> wrote:
> >> > >>
> >> > >> > Hi,
> >> > >> >
> >> > >> > I am new to blur and even ASF in terms of contributing back to a project. I
> >> > >> > have decent knowledge about hadoop and mapreduce but am completely new to
> >> > >> > search. I come from a Java/PHP background. I am looking for some direction
> >> > >> > in setting up blur on my local machine. I have a single node hadoop
> >> > >> > installation on my Mac OS X Lion. Is it an issue if I have HDFS, MapReduce
> >> > >> > daemons running alongside blur on the same machine? I would greatly
> >> > >> > appreciate it if you can refer me to some setup document as well as an
> >> > >> > insight into the architecture of blur. Thank You.
> >> > >> >
> >> > >> > - Rahul
> >> > >> >
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
