incubator-blur-dev mailing list archives

From Colton McInroy <col...@dosarrest.com>
Subject Re: HDFS Directory
Date Thu, 31 Oct 2013 21:19:43 GMT
Are your times taking network latency into account? If latency and
transfer time are causing about 2000 ms of delay, then subtracting that
delay from your times would give cached times that are much better than
your uncached times. If your latency fluctuates a bit, that could
account for some of the differences in time. I could be wrong, but it
depends on the switch fabric between the code executing the queries and
the code processing them. Local traffic on a box is very different from
network traffic. A single switch can add 200 ms of delay. This may not
even apply to your situation, but it is the first thing that came to
mind.
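
As a rough illustration of that subtraction, here is a minimal Python sketch using the multi-node timings quoted below. The fixed 2000 ms overhead is only the illustrative figure from this message, not a measured value, and the helper names (`mean`, `speedup`) are made up for the example.

```python
# Timings in ms, copied from the quoted multi-node results in this thread.
UNCACHED_MS = [4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537]
CACHED_MS = [2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213]

# ASSUMPTION: a constant per-run network/transfer overhead. 2000 ms is the
# illustrative figure from this message, not something that was measured.
ASSUMED_OVERHEAD_MS = 2000


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def speedup(uncached, cached, overhead_ms=0):
    """Ratio of mean uncached time to mean cached time after
    subtracting a fixed overhead from both means."""
    base = mean(uncached) - overhead_ms
    hit = mean(cached) - overhead_ms
    if hit <= 0:
        raise ValueError("assumed overhead is not smaller than the cached mean")
    return base / hit


if __name__ == "__main__":
    print("raw speedup:", round(speedup(UNCACHED_MS, CACHED_MS), 2))
    print("speedup with overhead removed:",
          round(speedup(UNCACHED_MS, CACHED_MS, ASSUMED_OVERHEAD_MS), 2))
```

If the assumed overhead is anywhere near right, the ratio between the uncached and cached means grows noticeably once it is removed. Measuring the real round-trip time between the nodes would be needed to replace the assumed constant.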

Thanks,
Colton McInroy

Director of Security Engineering

Phone (Toll Free)
  US: (888)-818-1344 Press 2
  UK: 0-800-635-0551 Press 2

My Extension: 101
24/7 Support: support@dosarrest.com
Email: colton@dosarrest.com
Website: http://www.dosarrest.com

On 10/31/2013 1:50 PM, Josh Clum wrote:
> Hello,
>
> I refactored out the HDFS directory implementation from Blur to use in my
> own project and was surprised to see how it performed. I'm using both the
> HDFSDirectory class and the BlockCacheDirectoryFactoryV2 class.
>
> On my local machine, using the cache gave a significant speedup. The index
> was small enough (12 docs) that each file making up the Lucene index fit
> into one block inside the cache.
>
> When running it on a multi-node cluster on AWS, the performance pulling back
> 1031 docs with the cache was not much better than without. According
> to my log statements, the cache was being hit every time, but the
> difference between this and my local run was that there were several
> blocks per file.
>
> When setting up the cache I used the default BlurConfiguration.
>
> Any ideas on how to speed up performance? Should I change the block size?
> Is there something that Blur does to put a wrapper around the cache?
>
> ON A MULTI NODE CLUSTER
> Number of documents in directory[1031]
> Without Cache ->
> Try #1 -> Total execution time: 4816
> Try #2 -> Total execution time: 3137
> Try #3 -> Total execution time: 2921
> Try #4 -> Total execution time: 2525
> Try #5 -> Total execution time: 2698
> Try #6 -> Total execution time: 2330
> Try #7 -> Total execution time: 2464
> Try #8 -> Total execution time: 2568
> Try #9 -> Total execution time: 2524
> Try #10 -> Total execution time: 2537
> With Cache ->
> Cached try #1 -> Total execution time: 2228
> Cached try #2 -> Total execution time: 2243
> Cached try #3 -> Total execution time: 2584
> Cached try #4 -> Total execution time: 2509
> Cached try #5 -> Total execution time: 2163
> Cached try #6 -> Total execution time: 2094
> Cached try #7 -> Total execution time: 2069
> Cached try #8 -> Total execution time: 2105
> Cached try #9 -> Total execution time: 2124
> Cached try #10 -> Total execution time: 2213
>
> ON MY LOCAL
> Number of documents in directory[12]
> Without Cache ->
> Try #1 -> Total execution time: 599
> Try #2 -> Total execution time: 639
> Try #3 -> Total execution time: 461
> Try #4 -> Total execution time: 544
> Try #5 -> Total execution time: 424
> Try #6 -> Total execution time: 381
> Try #7 -> Total execution time: 487
> Try #8 -> Total execution time: 368
> Try #9 -> Total execution time: 311
> Try #10 -> Total execution time: 411
> With Cache ->
> Cached try #1 -> Total execution time: 31
> Cached try #2 -> Total execution time: 32
> Cached try #3 -> Total execution time: 27
> Cached try #4 -> Total execution time: 23
> Cached try #5 -> Total execution time: 21
> Cached try #6 -> Total execution time: 26
> Cached try #7 -> Total execution time: 27
> Cached try #8 -> Total execution time: 28
> Cached try #9 -> Total execution time: 26
> Cached try #10 -> Total execution time: 27
>
> Thanks,
> Josh
>

