hadoop-common-user mailing list archives

From Lance Norskog <goks...@gmail.com>
Subject Re: Help for the problem of running lucene on Hadoop
Date Mon, 03 Jan 2011 02:29:37 GMT
Getting back to low-level details, you could do this: write a Directory
class that wraps two directories. All indexing activity happens on
the first one. Directory.close() closes the first one, copies its
contents to the second, and then deletes the first.

If the first is a local file-system directory or a RAMDirectory and the
second is an HDFS directory, this would give a very clean way to index
into an HDFS directory.
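
A minimal sketch of that idea, using only java.nio.file so it runs without Lucene or Hadoop on the classpath. The class name StagedDirectory and the method writeFile are hypothetical, and both directories here are ordinary local paths. In a real setup the class would presumably extend Lucene's Directory, and the copy step would go through Hadoop's FileSystem API (for example FileSystem.copyFromLocalFile) rather than Files.copy. It also assumes the staging directory is flat, with no subdirectories.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: all writes go to a staging directory; close()
 * copies the files to the target directory and then deletes the staging copy.
 */
final class StagedDirectory implements Closeable {
    private final Path staging; // fast local directory used while indexing
    private final Path target;  // final location (stand-in for an HDFS path)
    private boolean closed = false;

    StagedDirectory(Path staging, Path target) throws IOException {
        this.staging = Files.createDirectories(staging);
        this.target = target;
    }

    /** Every write during indexing lands in the staging directory only. */
    void writeFile(String name, byte[] data) throws IOException {
        if (closed) {
            throw new IllegalStateException("directory is closed");
        }
        Files.write(staging.resolve(name), data);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;

        // Snapshot the staged files (assumed flat: no subdirectories).
        List<Path> staged = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(staging)) {
            for (Path file : files) {
                staged.add(file);
            }
        }

        // Step 1: copy everything to the target. With HDFS this is where
        // a call such as FileSystem.copyFromLocalFile would go instead.
        Files.createDirectories(target);
        for (Path src : staged) {
            Files.copy(src, target.resolve(src.getFileName().toString()),
                       StandardCopyOption.REPLACE_EXISTING);
        }

        // Step 2: only after the whole copy succeeded, delete the staging copy.
        for (Path src : staged) {
            Files.delete(src);
        }
        Files.delete(staging);
    }
}
```

Copying everything before deleting anything means a failed copy leaves the local index intact, so the publish step can be retried.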

Lance

On Sun, Jan 2, 2011 at 12:56 AM, Ted Dunning <tdunning@maprtech.com> wrote:
> With even a dozen or two servers, it is very easy to flatten a mysql server
> with a hadoop cluster.
>
> Also, mysql is typically a very poor storage system for an inverted index
> because it doesn't allow for compression of the posting vectors.
>
> Better to copy Katta in this regard and create many independent indexes.
>
> On Fri, Dec 31, 2010 at 9:56 PM, Jander g <jandergj@gmail.com> wrote:
>
>> Thanks for all the replies above.
>>
>> Now my idea is: running word segmentation on Hadoop and creating the
>> inverted index in mysql. As we know, Hadoop MR supports writing and reading
>> to mysql.
>>
>> Does this have any problem?
>>
>> On Sat, Jan 1, 2011 at 7:49 AM, James Seigel <james@tynt.com> wrote:
>>
>> > Check out katta for an example
>>
>



-- 
Lance Norskog
goksron@gmail.com
