incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: HDFS Directory
Date Fri, 01 Nov 2013 11:40:36 GMT
One other thing: have you set -XX:MaxDirectMemorySize=<> ?

Aaron
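
For anyone checking this on their own node, here is a minimal sketch (not part of Blur) that reports whether the running JVM was started with -XX:MaxDirectMemorySize. It uses only the standard java.lang.management API; the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Blur: reports the -XX:MaxDirectMemorySize
// value the current JVM was launched with, if the flag was given at all.
class DirectMemoryFlagCheck {
    static final String FLAG_PREFIX = "-XX:MaxDirectMemorySize=";

    // Returns the raw value after '=' (e.g. "2g"). If the flag is repeated,
    // the last occurrence in the argument list is reported.
    static Optional<String> findMaxDirectMemorySize(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(arg -> arg.startsWith(FLAG_PREFIX))
                .map(arg -> arg.substring(FLAG_PREFIX.length()))
                .reduce((first, second) -> second);
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the main class.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        Optional<String> value = findMaxDirectMemorySize(jvmArgs);
        if (value.isPresent()) {
            System.out.println("MaxDirectMemorySize is set to " + value.get());
        } else {
            System.out.println("MaxDirectMemorySize is not set; the JVM default applies");
        }
    }
}
```

Run it with the same JVM options as the process that hosts the cache, for example `java -XX:MaxDirectMemorySize=2g DirectMemoryFlagCheck` (the 2g value is only an example).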


On Thu, Oct 31, 2013 at 8:31 PM, Aaron McCurry <amccurry@gmail.com> wrote:

> Josh,
>
> Since you are using the v2 block cache in your own implementation it will
> be hard to really debug the issue.  Assuming the Metrics code is still in
> place then it might point you to where the issue may lie.
>
> http://incubator.apache.org/blur/docs/0.2.0/cluster-setup.html#metrics
>
> Also, by default the fdt files are not cached in the block cache.  So
> unless you have altered the default settings, that may be where your
> issue lies.
>
> Aaron
>
>
> On Thu, Oct 31, 2013 at 5:39 PM, Josh Clum <joshclum@gmail.com> wrote:
>
>> Colton - I think the transfer latency from HDFS is to be expected. What
>> was unexpected was how long it took to return results when everything
>> was already inside the cache.
>>
>>
>> On Thu, Oct 31, 2013 at 5:19 PM, Colton McInroy <colton@dosarrest.com>
>> wrote:
>>
>> > Are your times taking network latency into account? If you are getting
>> > latency/transfer time that causes about a 2000ms delay, then if you
>> > took your times and subtracted that delay, you would get cache times
>> > that are much better than your times without the cache. If your latency
>> > fluctuates a bit, that could account for some of the differences in
>> > time.
>> > I could be wrong, but it depends upon the switch fabric between the code
>> > executing the queries and the code processing them. Local traffic on a
>> > box is much different from network traffic. A single switch can add
>> > 200ms delay. This may not even apply to your situation, but this is the
>> > first thing that came to mind.
>> >
>> > Thanks,
>> > Colton McInroy
>> >
>> > On 10/31/2013 1:50 PM, Josh Clum wrote:
>> >
>> >> Hello,
>> >>
>> >> I refactored the HDFS directory implementation out of Blur to use in my
>> >> own project and was surprised to see how it performed. I'm using both
>> >> the HDFSDirectory class and the BlockCacheDirectoryFactoryV2 class.
>> >>
>> >> On my local machine there was a significant speedup when using the
>> >> cache. The index was small enough (12 docs) that each file making up
>> >> the Lucene index fit into one block inside the cache.
>> >>
>> >> When running it on a multinode cluster on AWS, the performance pulling
>> >> back 1031 docs with the cache was not much better than without.
>> >> According to my log statements, the cache was being hit every time, but
>> >> the difference between this and my local run was that there were
>> >> several blocks per file.
>> >>
>> >> When setting up the cache I used the default BlurConfiguration.
>> >>
>> >> Any ideas on how to speed up performance? Should I change the block
>> >> size? Is there something that Blur does to put a wrapper around the
>> >> cache?
>> >>
>> >> ON A MULTI NODE CLUSTER
>> >> Number of documents in directory[1031]
>> >> Without Cache ->
>> >> Try #1 -> Total execution time: 4816
>> >> Try #2 -> Total execution time: 3137
>> >> Try #3 -> Total execution time: 2921
>> >> Try #4 -> Total execution time: 2525
>> >> Try #5 -> Total execution time: 2698
>> >> Try #6 -> Total execution time: 2330
>> >> Try #7 -> Total execution time: 2464
>> >> Try #8 -> Total execution time: 2568
>> >> Try #9 -> Total execution time: 2524
>> >> Try #10 -> Total execution time: 2537
>> >> With Cache ->
>> >> Cached try #1 -> Total execution time: 2228
>> >> Cached try #2 -> Total execution time: 2243
>> >> Cached try #3 -> Total execution time: 2584
>> >> Cached try #4 -> Total execution time: 2509
>> >> Cached try #5 -> Total execution time: 2163
>> >> Cached try #6 -> Total execution time: 2094
>> >> Cached try #7 -> Total execution time: 2069
>> >> Cached try #8 -> Total execution time: 2105
>> >> Cached try #9 -> Total execution time: 2124
>> >> Cached try #10 -> Total execution time: 2213
>> >>
>> >> ON MY LOCAL
>> >> Number of documents in directory[12]
>> >> Without Cache ->
>> >> Try #1 -> Total execution time: 599
>> >> Try #2 -> Total execution time: 639
>> >> Try #3 -> Total execution time: 461
>> >> Try #4 -> Total execution time: 544
>> >> Try #5 -> Total execution time: 424
>> >> Try #6 -> Total execution time: 381
>> >> Try #7 -> Total execution time: 487
>> >> Try #8 -> Total execution time: 368
>> >> Try #9 -> Total execution time: 311
>> >> Try #10 -> Total execution time: 411
>> >> With Cache ->
>> >> Cached try #1 -> Total execution time: 31
>> >> Cached try #2 -> Total execution time: 32
>> >> Cached try #3 -> Total execution time: 27
>> >> Cached try #4 -> Total execution time: 23
>> >> Cached try #5 -> Total execution time: 21
>> >> Cached try #6 -> Total execution time: 26
>> >> Cached try #7 -> Total execution time: 27
>> >> Cached try #8 -> Total execution time: 28
>> >> Cached try #9 -> Total execution time: 26
>> >> Cached try #10 -> Total execution time: 27
>> >>
>> >> Thanks,
>> >> Josh
>> >>
>> >>
>> >
>>
>
>
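
To make the cached and uncached runs quoted above easier to compare, here is a small sketch that computes the mean and median of each series and the resulting speedup. The numbers are copied from Josh's message. The thread does not state the unit (presumably milliseconds), and the class and method names are illustrative only.

```java
import java.util.Arrays;

// Illustrative helper: summarizes the "Total execution time" series from the
// original message. The unit is not stated in the thread.
class TimingSummary {
    static double mean(long[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    // The median is less sensitive to a slow first (warm-up) run than the mean.
    static double median(long[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    static void report(String label, long[] uncached, long[] cached) {
        System.out.printf("%s: mean %.1f -> %.1f (%.2fx), median %.1f -> %.1f (%.2fx)%n",
                label,
                mean(uncached), mean(cached), mean(uncached) / mean(cached),
                median(uncached), median(cached), median(uncached) / median(cached));
    }

    public static void main(String[] args) {
        long[] clusterUncached = {4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537};
        long[] clusterCached   = {2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213};
        long[] localUncached   = {599, 639, 461, 544, 424, 381, 487, 368, 311, 411};
        long[] localCached     = {31, 32, 27, 23, 21, 26, 27, 28, 26, 27};

        report("Multi-node cluster (1031 docs)", clusterUncached, clusterCached);
        report("Local (12 docs)", localUncached, localCached);
    }
}
```

On these figures the mean drops from 2852.0 to 2233.2 on the cluster (roughly 1.3x) and from 462.5 to 26.8 locally (roughly 17x), which is the gap the thread is trying to explain.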
