incubator-blur-dev mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: question on default block caching file exclusions
Date Fri, 12 Oct 2012 23:36:04 GMT
Seems like a useful setting, although you're right, I'm not sure how it
would be tuned manually.

That said, the setting should be "blockCacheExcludeFileTypes" (exclude
rather than include), given you're much more likely to want to take a
blacklisting approach here. For example, if the default were to be
fixed, "FDT,FDX" would be easier to specify than the inverse.
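
To make the exclude-list idea concrete, here is a minimal sketch, assuming a
comma-separated setting value such as "FDT,FDX". The class and method names
below (BlockCacheExcludeSketch, parseExcludes, isCachable) are hypothetical
and are not part of Blur's BlockDirectory; they only illustrate how an
exclusion-based check might behave, including the "null means cache
everything" case discussed in this thread.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch only; this is not Blur's actual BlockDirectory code. */
public class BlockCacheExcludeSketch {

  /** Returns the lower-cased text after the last '.', or "" if there is no '.'. */
  public static String extensionOf(String fileName) {
    int dot = fileName.lastIndexOf('.');
    return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
  }

  /** Parses a value like "FDT,FDX" into {"fdt", "fdx"}; null yields an empty set. */
  public static Set<String> parseExcludes(String csv) {
    Set<String> result = new HashSet<>();
    if (csv == null) {
      return result; // no exclusions configured: every file is cachable
    }
    for (String part : csv.split(",")) {
      String type = part.trim().toLowerCase(Locale.ROOT);
      if (!type.isEmpty()) {
        result.add(type);
      }
    }
    return result;
  }

  /** A file is cachable unless its extension is in the exclude set. */
  public static boolean isCachable(String fileName, Set<String> excludes) {
    return !excludes.contains(extensionOf(fileName));
  }

  public static void main(String[] args) {
    Set<String> excludes = parseExcludes("FDT,FDX");
    System.out.println(isCachable("_0.fdt", excludes));            // excluded type
    System.out.println(isCachable("_0.tis", excludes));            // not excluded
    System.out.println(isCachable("_0.fdt", parseExcludes(null))); // no exclusions
  }
}
```

With this shape, leaving the setting unset keeps the behavior as currently
coded (cache all files), while the wiki's described default would be a
two-entry exclude list rather than a long include list.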

Patrick

On Fri, Oct 12, 2012 at 4:31 PM, Aaron McCurry <amccurry@gmail.com> wrote:
> Yes, after reviewing the code, you are correct.  If
> blockCachingFileTypes is passed in as null then it will try to cache
> all files (no exclusions).  I can/will update the wiki to correct this
> error, but this raises the question: should it operate as described in
> the wiki (excluding the FDX and FDT files)?  Or operate as coded?  I
> feel like if we implement a LIRS caching strategy then excluding the
> FDX and FDT files is not needed, and possibly not needed in its
> current implementation either.
>
> Aaron
>
> On Fri, Oct 12, 2012 at 7:16 PM, Patrick Hunt <phunt@apache.org> wrote:
>> Thanks Aaron. I am looking at the master, and I do see line 501. I
>> also see that clusterstatus is coming from ZK, and that information in
>> ZK is originally set from a TableDescriptor.
>>
>> But I don't see anything other than blockCachingFileTypes being set to
>> null here:
>> org.apache.blur.thrift.generated.TableDescriptor.TableDescriptor()
>>
>> And if that's the case it looks to me like BlockDirectory will cache every file
>>
>>     if (_blockCacheFileTypes == null || isCachableFile(name)) {
>>       return new CachedIndexInput(source, _blockSize, name,
>>           getFileCacheName(name), _cache, bufferSize);
>>     }
>>
>> rather than what the wiki page was mentioning: "If you leave this
>> null the default is to cache ALL Lucene file types except for the FDT
>> and FDX file types"
>>
>> Patrick
>>
>> On Fri, Oct 12, 2012 at 3:53 PM, Aaron McCurry <amccurry@gmail.com> wrote:
>>> So if you are looking at the new-api-project you are correct.  In the
>>> master take a look at
>>> o.a.b.manager.indexserver.DistributedIndexServer:501.  Unfortunately
>>> the project is spread between 2 versions of Lucene and that's why
>>> there's no reference in the new api branch code.  Does this help?
>>>
>>> Aaron
>>>
>>>
>>> On Fri, Oct 12, 2012 at 6:34 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>> Hi, I noticed this comment on the wiki:
>>>>
>>>> http://wiki.apache.org/blur/BlockCacheConfiguration
>>>> "To control the blockCachingFileTypes create a set with the given
>>>> Lucene file type extensions that you wish to cache. If you leave this
>>>> null the default is to cache ALL Lucene file types except for the FDT
>>>> and FDX file types which are used for data retrieval only and are not
>>>> accessed during the search itself."
>>>>
>>>> However I don't see in the code where this default is set. afaict all
>>>> file types are cached in the default case. What am I missing?
>>>>
>>>> Patrick
