hadoop-common-user mailing list archives

From Vinaya Shastrakar <shastrakar.vin...@yahoo.co.in>
Subject New to Hadoop
Date Wed, 18 Apr 2007 08:01:00 GMT
We're interested in using Hadoop in our application to replicate data and distribute
query execution, but I have some questions about whether it's a good fit.  We have essentially
written a search engine using Jena (Semantic Web framework) and its accompanying Lucene interface
called LARQ (Lucene ARQ) to allow for free-text search over the RDF graphs stored in Jena.

We expect the Lucene indexes to get very large, hence the need for Hadoop.  I went through
the documentation provided on the site, but want to clarify some points that we could not
answer from the wiki, FAQ, etc.:

1.  We're not using Nutch, but the documentation seems to reference it frequently.  Is this
a problem?  Can Lucene indexes alone be used with Hadoop without using Nutch?

2.  Are there any best practices for using Hadoop behind such a setup in terms of creating, querying,
and managing the Lucene indexes?  I found this thread
(http://www.mail-archive.com/hadoop-user@lucene.apache.org/msg00573.html),
but could use some clarification on several of the points mentioned.

3.  How does Hadoop access, process, and replicate the Lucene indexes if we generate
them on our local file system rather than in HDFS?

4.  Could you describe the typical flow of execution in Hadoop when Lucene is queried?
