hadoop-common-user mailing list archives

From James Seigel <ja...@tynt.com>
Subject Re: Help for the problem of running lucene on Hadoop
Date Sat, 01 Jan 2011 16:09:40 GMT
Well. Depending on the size of your cluster, it could be a problem or it might not.

If you have 100 machines running 8 map tasks each, all trying to connect
to MySQL, you might get some hilarity.

If you have three machines, you won't knock your MySQL instance over.

Sent from my mobile. Please excuse the typos.
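For reference, the usual way a MapReduce job writes to MySQL is Hadoop's DBOutputFormat. A minimal, untested sketch follows; the table name ("inverted_index"), column names ("term", "postings"), and JDBC URL are made-up illustrations, and it assumes hadoop-core plus a MySQL JDBC driver on the classpath. Note the small reducer count, which bounds the number of simultaneous database connections, per the point above.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

// A record type the job can emit as a key; DBOutputFormat writes one
// row per record using the PreparedStatement it builds from setOutput().
public class TermRecord implements Writable, DBWritable {
    private String term;
    private String postings;

    public TermRecord() {}

    public TermRecord(String term, String postings) {
        this.term = term;
        this.postings = postings;
    }

    // DBWritable: bind this record's fields to the INSERT statement.
    public void write(PreparedStatement stmt) throws SQLException {
        stmt.setString(1, term);
        stmt.setString(2, postings);
    }

    public void readFields(ResultSet rs) throws SQLException {
        term = rs.getString(1);
        postings = rs.getString(2);
    }

    // Writable: plain Hadoop wire serialization.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(term);
        out.writeUTF(postings);
    }

    public void readFields(DataInput in) throws IOException {
        term = in.readUTF();
        postings = in.readUTF();
    }

    // Hypothetical helper showing the job wiring.
    public static void configure(Job job) {
        Configuration conf = job.getConfiguration();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://dbhost/search", "user", "password");
        DBOutputFormat.setOutput(job, "inverted_index", "term", "postings");
        job.setOutputFormatClass(DBOutputFormat.class);
        // Keep the reducer count small so MySQL sees a handful of
        // connections rather than hundreds.
        job.setNumReduceTasks(4);
    }
}
```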

On 2010-12-31, at 10:56 PM, Jander g <jandergj@gmail.com> wrote:

> Thanks for all the above reply.
>
> Now my idea is: run word segmentation on Hadoop and create the
> inverted index in MySQL. As we know, Hadoop MapReduce supports reading
> from and writing to MySQL.
>
> Does this have any problem?
>
> On Sat, Jan 1, 2011 at 7:49 AM, James Seigel <james@tynt.com> wrote:
>
>> Check out katta for an example
>>
>> J
>>
>> Sent from my mobile. Please excuse the typos.
>>
>> On 2010-12-31, at 4:47 PM, Lance Norskog <goksron@gmail.com> wrote:
>>
>>> This will not work for indexing. Lucene requires random read/write
>>> access to a file, and HDFS does not support this. HDFS only allows
>>> sequential writes: you start at the beginning and copy the file into
>>> block 0, block 1, ... block N.
>>>
>>> For querying, if your HDFS implementation makes a local cache that
>>> appears as a file system (I think FUSE does this?) it might work well.
>>> But, yes, you should copy it down.
>>>
>>> On Fri, Dec 31, 2010 at 4:43 AM, Zhou, Yunqing <azurezyq@gmail.com> wrote:
>>>> You should implement the Directory class yourself.
>>>> Nutch provides one, named HDFSDirectory.
>>>> You can use it to build the index, but searching directly on HDFS is
>>>> relatively slow, especially for phrase queries.
>>>> I recommend downloading the index to local disk before performing a search.
>>>>
>>>> On Fri, Dec 31, 2010 at 5:08 PM, Jander g <jandergj@gmail.com> wrote:
>>>>
>>>>> Hi, all
>>>>>
>>>>> I want to run Lucene on Hadoop. The problem is as follows:
>>>>>
>>>>> IndexWriter writer = new IndexWriter(FSDirectory.open(new
>>>>> File("index")),new StandardAnalyzer(), true,
>>>>> IndexWriter.MaxFieldLength.LIMITED);
>>>>>
>>>>> When using Hadoop, must the first parameter be an HDFS directory? And
>>>>> how do I use it?
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Jander
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goksron@gmail.com
>>
>
>
>
> --
> Thanks,
> Jander
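Putting the thread's advice together: build the index with a Hadoop-aware Directory (e.g. the HDFSDirectory from Nutch, or see Katta), then copy the finished index to local disk and search it with an ordinary FSDirectory. A rough, untested sketch of the search side, assuming a Lucene 3.x-era API and hadoop-core on the classpath; both paths are illustrative:

```java
import java.io.File;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;

public class LocalSearch {
    public static void main(String[] args) throws Exception {
        // 1. Pull the index out of HDFS onto local disk; searching
        //    directly on HDFS is slow, especially for phrase queries.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        fs.copyToLocalFile(new Path("/user/jander/index"),
                           new Path("/tmp/lucene-index"));

        // 2. Open the local copy with a normal filesystem Directory.
        IndexSearcher searcher = new IndexSearcher(
                FSDirectory.open(new File("/tmp/lucene-index")));
        // ... run queries against searcher ...
        searcher.close();
    }
}
```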
