lucene-solr-dev mailing list archives

From Grant Ingersoll <gsing...@apache.org>
Subject Re: [jira] Commented: (SOLR-1301) Solr + Hadoop
Date Fri, 15 Jan 2010 20:09:50 GMT
I can see why that is a win over the existing approach, but I still don't get why it wouldn't
be faster just to index to a suite of Solr master indexers and save all this file slogging
around. But I guess that is a separate patch altogether.



On Jan 15, 2010, at 2:35 PM, Jason Rutherglen wrote:

> Zipping cores/shards is in the latest patch...
> 
> On Fri, Jan 15, 2010 at 11:22 AM, Andrzej Bialecki <ab@getopt.org> wrote:
>> On 2010-01-15 20:13, Ted Dunning wrote:
>>> 
>>> This can also be a big performance win.  Jason Venner reports significant
>>> index and cluster start time improvements by indexing to local disk,
>>> zipping and then uploading the resulting zip file.  Hadoop has significant
>>> file open overhead so moving one zip file wins big over many index
>>> component files.  There is a secondary bandwidth win as well.
>> 
>> Indeed, this one should be easy to add to this patch. Unless Jason & Jason
>> already cooked a patch for this? ;)
>> 
>>> 
>>> On Fri, Jan 15, 2010 at 8:34 AM, Andrzej Bialecki (JIRA) <jira@apache.org> wrote:
>>> 
>>>> 
>>>> HDFS doesn't implement enough of POSIX to allow writing Lucene indexes
>>>> directly to HDFS. For this reason indexes are always created on the local
>>>> storage of each node, and then after closing they are copied to HDFS.
>> 
>> 
>> 
>> 
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>> 
>> 
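
For illustration, the "index to local disk, zip, then upload one file" step described in the quoted thread could be sketched roughly as below. This is not code from the SOLR-1301 patch; the class name IndexZipper and the method zipDirectory are hypothetical, and only the JDK is used. The upload itself is left out: with Hadoop on the classpath it would presumably be a single call such as FileSystem.copyFromLocalFile(src, dst) on the resulting zip.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not part of Solr or Hadoop.
public class IndexZipper {

    /**
     * Packs every regular file under indexDir (a closed, local index
     * directory) into a single zip file, so that one file rather than
     * many index component files has to be moved afterwards.
     * Returns the number of entries written.
     */
    public static int zipDirectory(Path indexDir, Path zipFile) throws IOException {
        // Collect the file list first so the stream over the directory is closed
        // before any output is written.
        List<Path> files;
        try (Stream<Path> walk = Files.walk(indexDir)) {
            files = walk.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }

        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Entry names are relative to the index directory, with '/' separators.
                String name = indexDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }
}
```

The zip file should live outside indexDir; the upload then moves a single file, which is where the thread expects the file-open overhead to be saved.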

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search

