incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: HDFS Directory
Date Fri, 01 Nov 2013 00:31:14 GMT
Josh,

Since you are using the v2 block cache in your own implementation, it will
be hard to debug the issue from here.  Assuming the metrics code is still in
place, it may point you to where the problem lies.

http://incubator.apache.org/blur/docs/0.2.0/cluster-setup.html#metrics

Also, by default the .fdt files (Lucene stored fields) are not cached in the
block cache, so unless you have altered the default settings, that may be
where your issue lies.
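The .fdt point can be illustrated with a minimal sketch, assuming the block cache decides per file, by file extension, whether reads are cached. The class `CacheFilterSketch`, its `shouldCache` method, and the exclusion set are hypothetical and are not Blur's actual configuration API.

```java
import java.util.Set;

// Hypothetical sketch: an extension-based filter deciding which Lucene
// index files are eligible for the block cache. Not Blur's actual API.
public class CacheFilterSketch {

    // Assumed default: stored-field data files (.fdt) bypass the cache.
    private static final Set<String> NO_CACHE_EXTENSIONS = Set.of("fdt");

    // Returns the text after the last '.', or "" if there is none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // A file is cached unless its extension is in the exclusion set.
    static boolean shouldCache(String fileName) {
        return !NO_CACHE_EXTENSIONS.contains(extension(fileName));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"_0.fdt", "_0.fdx", "_0.tim", "segments_1"}) {
            System.out.println(name + " -> " + (shouldCache(name) ? "cached" : "not cached"));
        }
    }
}
```

Under that assumption, a query that pulls back many stored documents would still read the .fdt data from HDFS even when every other index file is served from the cache.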

Aaron


On Thu, Oct 31, 2013 at 5:39 PM, Josh Clum <joshclum@gmail.com> wrote:

> Colton, I think the transfer latency from HDFS is to be expected. What was
> unexpected was how long it took to return results when everything was
> already in the cache.
>
>
> On Thu, Oct 31, 2013 at 5:19 PM, Colton McInroy <colton@dosarrest.com> wrote:
>
> > Are your times taking into account network latency? If you are getting
> > latency/transfer time that is causing about a 2000ms delay, then if you
> > took your times and subtracted that delay, you would get cache times that
> > are much better than your without-cache times. If your latency fluctuates
> > a bit, that could account for some of the differences in time.
> > I could be wrong, but it depends upon the switch fabric between the code
> > executing the queries and the code processing them. Local traffic on a box
> > is much different from network traffic. A single switch can add a
> > 200ms delay. This may not even apply to your situation, but this is the
> > first thing that came to mind.
> >
> > Thanks,
> > Colton McInroy
> >
> >
> >
> > On 10/31/2013 1:50 PM, Josh Clum wrote:
> >
> >> Hello,
> >>
> >> I refactored the HDFS directory implementation out of Blur to use in my
> >> own project and was surprised to see how it performed. I'm using both
> >> the HDFSDirectory class and the BlockCacheDirectoryFactoryV2 class.
> >>
> >> On my local machine, using the cache gave a significant speedup. The
> >> index (12 docs) was small enough that each file making up the Lucene
> >> index fit into one block inside the cache.
> >>
> >> When running it on a multinode cluster on AWS, the performance when
> >> pulling back 1031 docs with the cache was not much better than without.
> >> According to my log statements, the cache was being hit every time, but
> >> the difference between this and my local run was that there were several
> >> blocks per file.
> >>
> >> When setting up the cache I used the default BlurConfiguration.
> >>
> >> Any ideas on how to speed up performance? Should I change the block size?
> >> Does Blur put some kind of wrapper around the cache?
> >>
> >> ON A MULTI NODE CLUSTER
> >> Number of documents in directory[1031]
> >> Without Cache ->
> >> Try #1 -> Total execution time: 4816
> >> Try #2 -> Total execution time: 3137
> >> Try #3 -> Total execution time: 2921
> >> Try #4 -> Total execution time: 2525
> >> Try #5 -> Total execution time: 2698
> >> Try #6 -> Total execution time: 2330
> >> Try #7 -> Total execution time: 2464
> >> Try #8 -> Total execution time: 2568
> >> Try #9 -> Total execution time: 2524
> >> Try #10 -> Total execution time: 2537
> >> With Cache ->
> >> Cached try #1 -> Total execution time: 2228
> >> Cached try #2 -> Total execution time: 2243
> >> Cached try #3 -> Total execution time: 2584
> >> Cached try #4 -> Total execution time: 2509
> >> Cached try #5 -> Total execution time: 2163
> >> Cached try #6 -> Total execution time: 2094
> >> Cached try #7 -> Total execution time: 2069
> >> Cached try #8 -> Total execution time: 2105
> >> Cached try #9 -> Total execution time: 2124
> >> Cached try #10 -> Total execution time: 2213
> >>
> >> ON MY LOCAL
> >> Number of documents in directory[12]
> >> Without Cache ->
> >> Try #1 -> Total execution time: 599
> >> Try #2 -> Total execution time: 639
> >> Try #3 -> Total execution time: 461
> >> Try #4 -> Total execution time: 544
> >> Try #5 -> Total execution time: 424
> >> Try #6 -> Total execution time: 381
> >> Try #7 -> Total execution time: 487
> >> Try #8 -> Total execution time: 368
> >> Try #9 -> Total execution time: 311
> >> Try #10 -> Total execution time: 411
> >> With Cache ->
> >> Cached try #1 -> Total execution time: 31
> >> Cached try #2 -> Total execution time: 32
> >> Cached try #3 -> Total execution time: 27
> >> Cached try #4 -> Total execution time: 23
> >> Cached try #5 -> Total execution time: 21
> >> Cached try #6 -> Total execution time: 26
> >> Cached try #7 -> Total execution time: 27
> >> Cached try #8 -> Total execution time: 28
> >> Cached try #9 -> Total execution time: 26
> >> Cached try #10 -> Total execution time: 27
> >>
> >> Thanks,
> >> Josh
> >>
> >>
> >
>
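Colton's suggestion (subtract a fixed network delay before comparing) can be checked against Josh's multinode numbers with a short sketch. The timings are copied from the thread and are presumably milliseconds; the 2000 overhead is Colton's rough figure, treated here as an assumption rather than a measurement, and `TimingSketch` is a hypothetical name.

```java
import java.util.Arrays;

// Summarizes the multinode timings posted in the thread and shows how
// subtracting an assumed fixed overhead changes the apparent speedup.
public class TimingSketch {

    static final long[] NO_CACHE = {4816, 3137, 2921, 2525, 2698, 2330, 2464, 2568, 2524, 2537};
    static final long[] CACHED = {2228, 2243, 2584, 2509, 2163, 2094, 2069, 2105, 2124, 2213};

    // Arithmetic mean of the samples.
    static double mean(long[] xs) {
        return Arrays.stream(xs).average().orElse(Double.NaN);
    }

    // Ratio of uncached to cached mean after removing a fixed overhead
    // that the cache cannot eliminate (e.g. network transfer).
    static double speedup(double overhead) {
        return (mean(NO_CACHE) - overhead) / (mean(CACHED) - overhead);
    }

    public static void main(String[] args) {
        System.out.printf("mean without cache: %.1f%n", mean(NO_CACHE));
        System.out.printf("mean with cache:    %.1f%n", mean(CACHED));
        System.out.printf("raw speedup:        %.2fx%n", speedup(0.0));
        // Assumed overhead taken from Colton's estimate, not measured.
        System.out.printf("speedup excluding 2000 overhead: %.2fx%n", speedup(2000.0));
    }
}
```

The raw means differ by only about 1.28x, but if roughly 2000 of each run really were fixed transfer cost, the remaining work would be about 3.65x faster with the cache. Whether that overhead exists is exactly what Colton's question asks Josh to verify.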
