incubator-blur-dev mailing list archives

From Josh Clum <joshc...@gmail.com>
Subject HDFS Directory
Date Thu, 31 Oct 2013 20:50:10 GMT
Hello,

I refactored the HDFS directory implementation out of Blur to use in my
own project and was surprised to see how it performed. I'm using both
the HDFSDirectory class and the
BlockCacheDirectoryFactoryV2 class.

On my local machine there was a significant speedup when using the cache.
The index was small enough (12 docs) that each file making up the Lucene
index fit into one block inside the cache.

When running it on a multinode cluster on AWS, the performance pulling back
1031 docs with the cache was not much better than without. According
to my log statements, the cache was being hit every time, but the
difference between this and my local run was that there were several blocks
per file.
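
To illustrate why the larger index spans several blocks per file, here is a minimal sketch of the arithmetic, assuming a fixed-size block cache. The class name BlockMath, the 8192-byte block size, and the file lengths are hypothetical placeholders, not values taken from Blur or from my index.

```java
// Illustrative only: BlockMath is a hypothetical helper, not part of Blur.
final class BlockMath {

    // Number of fixed-size cache blocks needed to cover a file of the given
    // length (ceiling division).
    static long blocksFor(long fileLength, int blockSize) {
        if (fileLength < 0 || blockSize <= 0) {
            throw new IllegalArgumentException("fileLength must be >= 0 and blockSize > 0");
        }
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        int blockSize = 8192; // assumed block size, not necessarily Blur's configured value
        long[] fileLengths = {4_000L, 8_192L, 1_000_000L}; // hypothetical file sizes in bytes
        for (long len : fileLengths) {
            System.out.println(len + " bytes -> " + blocksFor(len, blockSize) + " block(s)");
        }
    }
}
```

With a fixed block size, a sequential read of a large file touches one cache block per blockSize bytes, so any per-block lookup cost scales with file length.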

When setting up the cache I used the default BlurConfiguration.

Any ideas on how to speed up performance? Should I change the block size?
Does Blur wrap the cache in something that I am missing?

ON A MULTI NODE CLUSTER
Number of documents in directory[1031]
Without Cache ->
Try #1 -> Total execution time: 4816
Try #2 -> Total execution time: 3137
Try #3 -> Total execution time: 2921
Try #4 -> Total execution time: 2525
Try #5 -> Total execution time: 2698
Try #6 -> Total execution time: 2330
Try #7 -> Total execution time: 2464
Try #8 -> Total execution time: 2568
Try #9 -> Total execution time: 2524
Try #10 -> Total execution time: 2537
With Cache ->
Cached try #1 -> Total execution time: 2228
Cached try #2 -> Total execution time: 2243
Cached try #3 -> Total execution time: 2584
Cached try #4 -> Total execution time: 2509
Cached try #5 -> Total execution time: 2163
Cached try #6 -> Total execution time: 2094
Cached try #7 -> Total execution time: 2069
Cached try #8 -> Total execution time: 2105
Cached try #9 -> Total execution time: 2124
Cached try #10 -> Total execution time: 2213

ON MY LOCAL
Number of documents in directory[12]
Without Cache ->
Try #1 -> Total execution time: 599
Try #2 -> Total execution time: 639
Try #3 -> Total execution time: 461
Try #4 -> Total execution time: 544
Try #5 -> Total execution time: 424
Try #6 -> Total execution time: 381
Try #7 -> Total execution time: 487
Try #8 -> Total execution time: 368
Try #9 -> Total execution time: 311
Try #10 -> Total execution time: 411
With Cache ->
Cached try #1 -> Total execution time: 31
Cached try #2 -> Total execution time: 32
Cached try #3 -> Total execution time: 27
Cached try #4 -> Total execution time: 23
Cached try #5 -> Total execution time: 21
Cached try #6 -> Total execution time: 26
Cached try #7 -> Total execution time: 27
Cached try #8 -> Total execution time: 28
Cached try #9 -> Total execution time: 26
Cached try #10 -> Total execution time: 27
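
To compare the runs above more directly, here is a small sketch that averages the ten tries in each group and computes the uncached-to-cached ratio. The class name BenchmarkSummary is a hypothetical helper; the numbers are copied from the listings above and are left unitless because the log output does not state a unit.

```java
import java.util.Arrays;

// Illustrative only: BenchmarkSummary is a hypothetical helper for the timings above.
final class BenchmarkSummary {

    // Timings copied from the runs listed above (unit not stated in the logs).
    static final long[] CLUSTER_NO_CACHE =
        {4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537};
    static final long[] CLUSTER_CACHE =
        {2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213};
    static final long[] LOCAL_NO_CACHE =
        {599, 639, 461, 544, 424, 381, 487, 368, 311, 411};
    static final long[] LOCAL_CACHE =
        {31, 32, 27, 23, 21, 26, 27, 28, 26, 27};

    // Arithmetic mean of the samples; rejects an empty array.
    static double mean(long[] samples) {
        return Arrays.stream(samples)
                     .average()
                     .orElseThrow(() -> new IllegalArgumentException("no samples"));
    }

    // Ratio of the uncached mean to the cached mean (higher means the cache helped more).
    static double speedup(long[] withoutCache, long[] withCache) {
        return mean(withoutCache) / mean(withCache);
    }

    public static void main(String[] args) {
        System.out.printf("cluster: %.1f vs %.1f, ratio %.2f%n",
            mean(CLUSTER_NO_CACHE), mean(CLUSTER_CACHE),
            speedup(CLUSTER_NO_CACHE, CLUSTER_CACHE));
        System.out.printf("local:   %.1f vs %.1f, ratio %.2f%n",
            mean(LOCAL_NO_CACHE), mean(LOCAL_CACHE),
            speedup(LOCAL_NO_CACHE, LOCAL_CACHE));
    }
}
```

On these numbers the cache gives roughly a 1.3x improvement on the cluster versus roughly 17x locally. The first uncached cluster try (4816) is a warm-up outlier, so the cluster ratio is, if anything, overstated.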

Thanks,
Josh
