incubator-blur-dev mailing list archives

From Josh Clum <joshc...@gmail.com>
Subject Re: HDFS Directory
Date Thu, 31 Oct 2013 21:39:11 GMT
Colton - I think the transfer latency from HDFS is to be expected. What was
unexpected was how long it took to return results when everything was
already inside the cache.
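
One way to narrow down where the time goes on a fully cached run is to time each stage of the query separately rather than only the total. The following is a minimal sketch, not anything taken from Blur: the `PhaseTimer` class and the phase names ("open", "search") are hypothetical, and the lambdas in `main` are stand-ins for the real directory and search calls.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: accumulates wall-clock time per named phase.
final class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the phase, and returns its result.
    <T> T time(String phase, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    // Total milliseconds recorded for a phase, or 0 if it never ran.
    long millis(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L) / 1_000_000L;
    }

    // Copy of the recorded totals in nanoseconds, in first-seen order.
    Map<String, Long> snapshotNanos() {
        return new LinkedHashMap<>(nanosByPhase);
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        // Stand-ins only: replace with opening the directory/reader and running the query.
        String reader = timer.time("open", () -> "reader");
        int hits = timer.time("search", () -> 1031);
        System.out.println(reader + " returned " + hits + " hits");
        timer.snapshotNanos().forEach((phase, nanos) ->
                System.out.println(phase + ": " + nanos + " ns"));
    }
}
```

If most of the cached time lands in a single phase, that is the place to look before changing the block size.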


On Thu, Oct 31, 2013 at 5:19 PM, Colton McInroy <colton@dosarrest.com> wrote:

> Are your times taking into account network latency? If you are getting
> latency/transfer time that's causing about a 2000ms delay, then if you took
> your times and subtracted that delay, you would get cache times that are
> much better than your without-cache times. If your latency fluctuates a bit,
> that could account for some of the differences in time.
> I could be wrong, but it depends upon the switch fabric between the code
> executing the queries and the code processing them. Local traffic on a box
> compared to network traffic is much different. A single switch can add
> 200ms delay. This may not even apply to your situation, but this is the
> first thing that came up in my head.
>
> Thanks,
> Colton McInroy
>
>  * Director of Security Engineering
>
>
> Phone
> (Toll Free)
> _US_    (888)-818-1344 Press 2
> _UK_    0-800-635-0551 Press 2
>
> My Extension    101
> 24/7 Support    support@dosarrest.com <mailto:support@dosarrest.com>
> Email   colton@dosarrest.com <mailto:colton@dosarrest.com>
> Website         http://www.dosarrest.com
>
>
> On 10/31/2013 1:50 PM, Josh Clum wrote:
>
>> Hello,
>>
>> I refactored out the HDFS directory implementation from Blur to use in my
>> own project and was surprised to see how it performed. I'm using both
>> the HDFSDirectory class and the
>> BlockCacheDirectoryFactoryV2 class.
>>
>> On my local machine when using the cache there was a significant speed up.
>> The index was small enough (12 docs) that each file making up the Lucene
>> index fit into one block inside the cache.
>>
>> When running it on a multinode cluster on AWS, the performance pulling back
>> 1031 docs with the cache was not that much better than without. According
>> to my log statements, the cache was being hit every time, but the
>> difference between this and my local setup was that there were several
>> blocks per file.
>>
>> When setting up the cache I used the default BlurConfiguration.
>>
>> Any ideas on how to speed up performance? Should I change the block size?
>> Is there something that Blur does to put a wrapper around the cache?
>>
>> ON A MULTI NODE CLUSTER
>> Number of documents in directory[1031]
>> Without Cache ->
>> Try #1 -> Total execution time: 4816
>> Try #2 -> Total execution time: 3137
>> Try #3 -> Total execution time: 2921
>> Try #4 -> Total execution time: 2525
>> Try #5 -> Total execution time: 2698
>> Try #6 -> Total execution time: 2330
>> Try #7 -> Total execution time: 2464
>> Try #8 -> Total execution time: 2568
>> Try #9 -> Total execution time: 2524
>> Try #10 -> Total execution time: 2537
>> With Cache ->
>> Cached try #1 -> Total execution time: 2228
>> Cached try #2 -> Total execution time: 2243
>> Cached try #3 -> Total execution time: 2584
>> Cached try #4 -> Total execution time: 2509
>> Cached try #5 -> Total execution time: 2163
>> Cached try #6 -> Total execution time: 2094
>> Cached try #7 -> Total execution time: 2069
>> Cached try #8 -> Total execution time: 2105
>> Cached try #9 -> Total execution time: 2124
>> Cached try #10 -> Total execution time: 2213
>>
>> ON MY LOCAL
>> Number of documents in directory[12]
>> Without Cache ->
>> Try #1 -> Total execution time: 599
>> Try #2 -> Total execution time: 639
>> Try #3 -> Total execution time: 461
>> Try #4 -> Total execution time: 544
>> Try #5 -> Total execution time: 424
>> Try #6 -> Total execution time: 381
>> Try #7 -> Total execution time: 487
>> Try #8 -> Total execution time: 368
>> Try #9 -> Total execution time: 311
>> Try #10 -> Total execution time: 411
>> With Cache ->
>> Cached try #1 -> Total execution time: 31
>> Cached try #2 -> Total execution time: 32
>> Cached try #3 -> Total execution time: 27
>> Cached try #4 -> Total execution time: 23
>> Cached try #5 -> Total execution time: 21
>> Cached try #6 -> Total execution time: 26
>> Cached try #7 -> Total execution time: 27
>> Cached try #8 -> Total execution time: 28
>> Cached try #9 -> Total execution time: 26
>> Cached try #10 -> Total execution time: 27
>>
>> Thanks,
>> Josh
>>
>>
>
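
To make the comparison in the quoted timings easier to read, the runs can be reduced to a mean, a median, and a cached-versus-uncached speedup. The sketch below is only an illustration: the `TimingSummary` class and its method names are hypothetical, the numbers are copied from the logs above (the unit is not stated there and is presumably milliseconds), and the `fixedOverhead` argument models the suggestion of subtracting a constant delay from every run. The 2000 passed in `main` is that suggested figure, not a measured value.

```java
import java.util.Arrays;

// Hypothetical helper that summarizes the timings posted in this thread.
final class TimingSummary {
    static final long[] CLUSTER_UNCACHED = {4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537};
    static final long[] CLUSTER_CACHED   = {2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213};
    static final long[] LOCAL_UNCACHED   = {599, 639, 461, 544, 424, 381, 487, 368, 311, 411};
    static final long[] LOCAL_CACHED     = {31, 32, 27, 23, 21, 26, 27, 28, 26, 27};

    // Arithmetic mean; NaN for an empty array.
    static double mean(long[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Median of a non-empty array; the input is not modified.
    static double median(long[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("empty input");
        long[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Ratio of mean times after subtracting an assumed constant overhead from both.
    static double speedup(long[] uncached, long[] cached, double fixedOverhead) {
        return (mean(uncached) - fixedOverhead) / (mean(cached) - fixedOverhead);
    }

    public static void main(String[] args) {
        System.out.printf("cluster: mean %.1f -> %.1f, median %.1f -> %.1f%n",
                mean(CLUSTER_UNCACHED), mean(CLUSTER_CACHED),
                median(CLUSTER_UNCACHED), median(CLUSTER_CACHED));
        System.out.printf("cluster speedup, no overhead removed: %.2fx%n",
                speedup(CLUSTER_UNCACHED, CLUSTER_CACHED, 0));
        System.out.printf("cluster speedup, assumed 2000 overhead removed: %.2fx%n",
                speedup(CLUSTER_UNCACHED, CLUSTER_CACHED, 2000));
        System.out.printf("local speedup, no overhead removed: %.2fx%n",
                speedup(LOCAL_UNCACHED, LOCAL_CACHED, 0));
    }
}
```

On the raw numbers the cluster speedup is roughly 1.28x against roughly 17.3x locally. Removing the assumed constant of 2000 raises the cluster figure to roughly 3.65x, which is still well short of the local ratio, so a fixed delay alone would not fully explain the gap.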
