accumulo-user mailing list archives

From Adam Fuchs <afu...@apache.org>
Subject Re: init method being called multiple times of WrappingIterator.
Date Fri, 03 Apr 2015 20:44:46 GMT
A major compaction also might not be a full major compaction, depending on
how it is initiated. It also would be on a single tablet where a scan might
be over multiple tablets. The implication here is that major compactions
might not process all of the data that the scan processes.

The iterator lifecycle is essentially:
1. init()
2. seek()
3. hasTop()
4. getTopKey()
5. getTopValue()
6. next()

Iterators are run in modular fashion, so another iterator that wraps your
iterator might make repeated calls to steps 2-5 or steps 3-6. The scope
also affects how these calls are repeated. A compaction scope will repeat
steps 3-6 on the top level iterator (which may not be your iterator) until
hasTop() returns false, while a scan scope will only repeat until the scan
batch is full.
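
The difference between the two scopes can be sketched in plain Java. This is
only an illustration: CountingIterator is a hypothetical, simplified stand-in
and not Accumulo's real SortedKeyValueIterator interface, keys are plain ints,
and the re-seek after each batch is an assumption about how a scan resumes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an iterator; NOT the real Accumulo interface.
class CountingIterator {
    static int initCalls = 0;    // init() calls across all instances
    private final int[] data;    // sorted keys held by the "tablet"
    private int pos;

    CountingIterator(int[] data) { this.data = data; }

    void init() { initCalls++; }                    // step 1
    void seek(int startKey) {                       // step 2: first key >= startKey
        pos = 0;
        while (pos < data.length && data[pos] < startKey) pos++;
    }
    boolean hasTop() { return pos < data.length; }  // step 3
    int getTopKey() { return data[pos]; }           // step 4
    void next() { pos++; }                          // step 6
}

class LifecycleSketch {
    // Compaction-style driver: one instance, drained until hasTop() is false.
    static List<Integer> compact(int[] data) {
        CountingIterator it = new CountingIterator(data);
        it.init();
        it.seek(Integer.MIN_VALUE);
        List<Integer> out = new ArrayList<>();
        while (it.hasTop()) { out.add(it.getTopKey()); it.next(); }
        return out;
    }

    // Scan-style driver: stops when the batch is full, then builds a new
    // instance and re-seeks just past the last key returned (assumed behavior).
    static List<Integer> scan(int[] data, int batchSize) {
        List<Integer> out = new ArrayList<>();
        int startKey = Integer.MIN_VALUE;
        while (true) {
            CountingIterator it = new CountingIterator(data);
            it.init();
            it.seek(startKey);
            int inBatch = 0;
            while (it.hasTop() && inBatch < batchSize) {
                out.add(it.getTopKey());
                it.next();
                inBatch++;
            }
            if (!it.hasTop()) return out;            // data exhausted
            startKey = out.get(out.size() - 1) + 1;  // resume after last key
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        CountingIterator.initCalls = 0;
        compact(data);
        System.out.println("compaction init() calls: " + CountingIterator.initCalls);
        CountingIterator.initCalls = 0;
        scan(data, 3);
        System.out.println("scan init() calls: " + CountingIterator.initCalls);
    }
}
```

With seven keys and a batch size of three, the compaction-style driver calls
init() once and the scan-style driver calls it three times, once per batch.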

One thing you might look at is whether there is a VersioningIterator at a
higher priority level than your iterator. Your iterator might be
recomputing the same value multiple times only to have the results omitted
by the VersioningIterator.
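
A rough way to see the wasted work, again in plain Java rather than the
Accumulo API: the Entry type, expensive() and keepNewest() below are
hypothetical stand-ins for your iterator's computation and for a
keep-latest-version filter that runs after it.

```java
import java.util.ArrayList;
import java.util.List;

class VersioningSketch {
    // Hypothetical entry type: row key, timestamp (version), value.
    static final class Entry {
        final String key; final long ts; final int value;
        Entry(String key, long ts, int value) {
            this.key = key; this.ts = ts; this.value = value;
        }
    }

    static int expensiveCalls = 0;

    // Stand-in for the per-entry work a custom iterator does.
    static Entry expensive(Entry e) {
        expensiveCalls++;
        return new Entry(e.key, e.ts, e.value * 2);
    }

    static List<Entry> transformAll(List<Entry> in) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : in) out.add(expensive(e));
        return out;
    }

    // Keeps the first entry per key; input is sorted by key, then newest
    // timestamp first, so the first entry is the latest version.
    static List<Entry> keepNewest(List<Entry> sorted) {
        List<Entry> out = new ArrayList<>();
        String lastKey = null;
        for (Entry e : sorted) {
            if (!e.key.equals(lastKey)) { out.add(e); lastKey = e.key; }
        }
        return out;
    }

    // Two keys: "a" with three versions, "b" with two.
    static List<Entry> sample() {
        List<Entry> s = new ArrayList<>();
        s.add(new Entry("a", 3, 30));
        s.add(new Entry("a", 2, 20));
        s.add(new Entry("a", 1, 10));
        s.add(new Entry("b", 2, 7));
        s.add(new Entry("b", 1, 5));
        return s;
    }

    public static void main(String[] args) {
        expensiveCalls = 0;
        keepNewest(transformAll(sample()));   // compute first, then drop old versions
        System.out.println("compute then filter: " + expensiveCalls + " calls");
        expensiveCalls = 0;
        transformAll(keepNewest(sample()));   // drop old versions first, then compute
        System.out.println("filter then compute: " + expensiveCalls + " calls");
    }
}
```

Here the first ordering does five computations to produce two results, and
the second does two. Reordering only helps if your computation does not need
the older versions.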

Adam


On Fri, Apr 3, 2015 at 4:12 PM, Josh Elser <josh.elser@gmail.com> wrote:

> I think there's only one difference between invocation of an iterator via
> scans and major compactions: the batching of Key Values being returned to
> the clients. A side effect of this is that after a batch of data is
> returned from the server to the client, it's common that a new instance of
> the Iterator will be instantiated. You could see if a lot of instances of
> your iterator are being created.
>
> Anything unique about the distribution of data? Very large values?
>
> Depending on how you did your timings (at the client or within the
> iterator itself), you might have noticed extra time spent in what Thrift is
> doing (extra serialization).
>
> If you issued the major compaction through the client API, there is a
> boolean option that will wait for the compaction to finish. Otherwise,
> compactions are asynchronous.
>
>
> shweta.agrawal wrote:
>
>> On Tuesday 31 March 2015 06:00 PM, shweta.agrawal wrote:
>>
>>> On Monday 30 March 2015 08:03 PM, Josh Elser wrote:
>>>
>>>> Why are you using a print writer to get output from your iterator?
>>>> Just use a logger and look in
>>>> $ACCUMULO_HOME/logs/tserver_$hostname.debug.log (or wherever you
>>>> configured logging). Create a log4j or slf4j Logger and use that
>>>> instead of a print writer. (It's possible that your print writer is
>>>> also what is slowing things down)
>>>>
>>>> In most real deployments, iterators should be faster on the server
>>>> side than your client because you have N servers performing the work
>>>> instead of your one client.
>>>>
>>>> It's not unheard of that a programming error is slowing down your
>>>> iterator. Looking at what your iterator does (via logging) should
>>>> help. Alternatively, you can use a remote debugger, connect to the
>>>> tabletserver, and set breakpoints inside your iterator.
>>>>
>>>> shweta.agrawal wrote:
>>>>
>>>>> On Monday 30 March 2015 09:58 AM, shweta.agrawal wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Actually I am working on an iterator, which I ran on the server side
>>>>>> by making a jar and also on the client side on the same data, but on
>>>>>> the server side the jar which I made runs slower than on the client
>>>>>> side. I am not able to find what went wrong. Is it possible for the
>>>>>> same logic to run faster on the client side than in Accumulo iterators?
>>>>>>
>>>>>> time on client side:8s
>>>>>> time on server side:30s
>>>>>>
>>>>>> And to get the output I am writing it to a text file through a print
>>>>>> writer. To perform my task, I am calling my method in the next method
>>>>>> and I am writing output to a file in the next method. So actually I
>>>>>> want to know the final method which is called, so that I can write my
>>>>>> output to a file after performing all the tasks.
>>>>>>
>>>>>> Thanks and Regards
>>>>>> Shweta
>>>>>>
>>>>>
>>>>>
>>> Hi,
>>>
>>> Even without the print writer it takes the same time. And I am trying
>>> to use a remote debugger as you suggested, but I am facing a problem.
>>>
>>> To enable the remote debugger I changed this in the accumulo-env.sh file:
>>> test -z "$ACCUMULO_TSERVER_OPTS" && export
>>> ACCUMULO_TSERVER_OPTS="${POLICY} -Xmx384m -Xms384m -Xdebug
>>> -Xrunjdwp:transport=dt_socket,server=y,address=50095"
>>>
>>> But after changing this, Accumulo is not working. The terminal shows it
>>> as started, but when I go to the Accumulo shell it says there are no
>>> tablet servers. So please help me out with this. Am I doing something
>>> wrong?
>>>
>>> The monitor and tserver are not starting; their logs are:
>>> Monitor Logs:
>>> 2015-03-31 17:36:09,516 [mortbay.log] INFO : Logging to
>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
>>> org.mortbay.log.Slf4jLog
>>> 2015-03-31 17:36:09,535 [mortbay.log] INFO : jetty-6.1.26
>>> 2015-03-31 17:36:09,607 [mortbay.log] WARN : failed
>>> SocketConnector@shweta:50095: java.net.BindException: Address already
>>> in use
>>> 2015-03-31 17:36:09,608 [mortbay.log] WARN : failed Server@6555694:
>>> java.net.BindException: Address already in use
>>> 2015-03-31 17:36:09,608 [mortbay.log] INFO : Stopped
>>> SocketConnector@shweta:50095
>>>
>>> Tserver Logs:
>>> 2015-03-31 17:28:49,206 [tabletserver.TabletServer] INFO : unloaded
>>> !0;~;!0<
>>> 2015-03-31 17:28:49,298 [tabletserver.TabletServer] INFO : unloaded !0<;~
>>> 2015-03-31 17:28:50,074 [tabletserver.TabletServer] INFO : unloaded
>>> !0;!0<<
>>> 2015-03-31 17:28:50,121 [tabletserver.TabletServer] FATAL: Lost tablet
>>> server lock (reason = LOCK_DELETED), exiting.
>>> 2015-03-31 17:28:50,122 [tabletserver.TabletServer] INFO : Master
>>> requested tablet server halt
>>>
>>>
>>> Thanks and Regards
>>> Shweta
>>>
>> Hi,
>>
>> Thanks for all your help. I got the logs from
>> $ACCUMULO_HOME/logs/tserver_$hostname.debug.log. Upon analysing them and
>> setting the iterator to work at Major compaction scope, I found out that
>> the iterator speeds up and I was able to complete the computation in 887
>> ms. So now I want to ask why there is a difference in execution
>> times when I run the same iterator at major compaction scope and scan
>> scope? Also is there a way to detect the end of a Major Compaction
>> programmatically?
>>
>> Thanks and Regards
>> Shweta
>>
>
