Mailing-List: contact user-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@accumulo.apache.org
MIME-Version: 1.0
In-Reply-To: <97EB0FF1279CC5428640A3FB61B10BD602DC64FB@mx1.Comcept.L-3Com.com>
References: <97EB0FF1279CC5428640A3FB61B10BD602DC640F@mx1.Comcept.L-3Com.com>
	<503CC895.7020307@ccri.com>
	<CAF1jEfCQfKF4ZpmD2qui4s0F8hbNrrzNjjciiWcoX+9ZpEWmDw@mail.gmail.com>
	<97EB0FF1279CC5428640A3FB61B10BD602DC64FB@mx1.Comcept.L-3Com.com>
Date: Tue, 28 Aug 2012 11:04:08 -0700
Message-ID: 
 <CAF1jEfC4K+uOqFVK7R5ttCanKVUSh=F2CX-Z+vvYhZ+-_uY3bw@mail.gmail.com>
Subject: Re: TimeSpan Iterator
From: Billie Rinaldi <billie@apache.org>
To: user@accumulo.apache.org
Content-Type: multipart/alternative; boundary=20cf300faecdc6b7fa04c857448a

--20cf300faecdc6b7fa04c857448a
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

On Tue, Aug 28, 2012 at 9:51 AM, <Bob.Thorman@l-3com.com> wrote:

> Billie
>
> Your comment "Users should be aware that this is not an efficient
> operation, though." may help me decide if my current use of a secondary
> time index is better then.  Right now I maintain a table that has
> timestamps as the rowid whose values are the rowid in a metadata table.
> Therefore I do one range scan based on the timestamp.  Then a second
> lookup of the metadata rowid.  Is this more efficient?
>

It probably depends on what percentage of the data you're bringing back, as
compared to the amount you're scanning over (if that's not the whole
table).  I would hypothesize if you're bringing more than N% of the data
back, you might as well just use the TimestampFilter on the main table.  If
you're bringing a smaller percentage back, it could be better to reduce the
amount of the main table you have to scan over by maintaining a secondary
time index.  I'm not sure what N would be.  You should also make sure that
the secondary index is actually reducing the amount of the main table
you're scanning over, e.g. if each rowid had a full range of timestamps,
you could be pulling a list of all rowids back from the index table and not
reducing the scan over the main table.
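
To make the trade-off concrete, here is a rough in-memory sketch of the
two approaches.  It is only a model: the class and method names are made
up for illustration, the "tables" are plain TreeMaps rather than Accumulo
tables, and each row is assumed to have a single timestamp.  It counts
how many main-table entries each approach has to touch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for the two tables; not Accumulo API.
class TimeIndexSketch {
    // Main table: rowid -> timestamp (one entry per row, for simplicity).
    final TreeMap<String, Long> mainTable = new TreeMap<>();
    // Index table: timestamp -> rowids written at that timestamp.
    final TreeMap<Long, List<String>> timeIndex = new TreeMap<>();

    // Assumes each rowid is written once; rewrites would leave stale index entries.
    void put(String rowId, long ts) {
        mainTable.put(rowId, ts);
        timeIndex.computeIfAbsent(ts, k -> new ArrayList<>()).add(rowId);
    }

    // Approach 1: scan the whole main table and filter on timestamp.
    // Returns the number of main-table entries examined.
    int scanWithFilter(long start, long end, List<String> out) {
        int examined = 0;
        for (Map.Entry<String, Long> e : mainTable.entrySet()) {
            examined++;
            if (e.getValue() >= start && e.getValue() <= end) {
                out.add(e.getKey());
            }
        }
        return examined;
    }

    // Approach 2: range scan on the index, then one lookup per rowid.
    // Returns the number of main-table lookups performed.
    int scanWithIndex(long start, long end, List<String> out) {
        int lookups = 0;
        for (List<String> rowIds : timeIndex.subMap(start, true, end, true).values()) {
            for (String rowId : rowIds) {
                lookups++;
                if (mainTable.containsKey(rowId)) {
                    out.add(rowId);
                }
            }
        }
        return lookups;
    }
}
```

The counts are not directly comparable costs: a point lookup is usually
more expensive than reading the next entry of a sequential scan, which is
why the break-even percentage N has to be measured rather than guessed.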

Also, the TimestampFilter is not optimized.  Filters evaluate each
key/value pair to see if it is accepted (in this case, if it is in a
timestamp range).  If there are a lot of timestamps for each cell (keys
that are identical except for timestamp), it would be better to use seeking
instead.  That would involve writing a new iterator.  If there aren't many
timestamps for each cell, seeking won't help and the TimestampFilter will
be fine.
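
Here is a small sketch of why seeking helps when a cell has many
versions.  Again this is an in-memory model with made-up names, not the
Accumulo iterator API: keys are reduced to (cell, timestamp) and kept in
a TreeSet sorted by cell, then newest timestamp first, which mimics how
versions of a cell are ordered.  A real seeking iterator would do the
equivalent jumps by calling seek on its source.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical model of filtering vs. seeking over versioned keys.
class SeekVsFilterSketch {
    // Simplified key: a cell name plus a timestamp.
    static final class Key {
        final String cell;
        final long ts;
        Key(String cell, long ts) { this.cell = cell; this.ts = ts; }
    }

    // Sort by cell, then by timestamp descending (newest version first).
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.cell.compareTo(b.cell);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    };

    final TreeSet<Key> keys = new TreeSet<>(ORDER);

    void add(String cell, long ts) { keys.add(new Key(cell, ts)); }

    // Filter style: test every key.  Returns {examined, accepted}.
    int[] filter(long start, long end) {
        int examined = 0, accepted = 0;
        for (Key k : keys) {
            examined++;
            if (k.ts >= start && k.ts <= end) accepted++;
        }
        return new int[] {examined, accepted};
    }

    // Seek style: per cell, jump to the first version with ts <= end,
    // read until ts < start, then jump to the next cell.
    int[] seek(long start, long end) {
        int examined = 0, accepted = 0;
        Key cur = keys.isEmpty() ? null : keys.first();
        while (cur != null) {
            examined++; // read the first key of this cell to learn its name
            String cell = cur.cell;
            Key k = keys.ceiling(new Key(cell, end));
            while (k != null) {
                examined++;
                if (!k.cell.equals(cell) || k.ts < start) break;
                accepted++;
                k = keys.higher(k);
            }
            // Skip the remaining older versions of this cell.
            cur = keys.higher(new Key(cell, Long.MIN_VALUE));
        }
        return new int[] {examined, accepted};
    }
}
```

With many versions per cell the seek version examines a handful of keys
per cell instead of all of them.  With only a few versions per cell the
extra jumps buy nothing, which matches the point above that the plain
TimestampFilter is fine in that case.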

Billie


> *From:* Billie Rinaldi [mailto:billie@apache.org]
> *Sent:* Tuesday, August 28, 2012 11:46
>
> *To:* user@accumulo.apache.org; john.armstrong@ccri.com
> *Subject:* Re: TimeSpan Iterator
>
> On Tue, Aug 28, 2012 at 6:33 AM, John Armstrong <jrja@ccri.com> wrote:
>
> On 08/28/2012 09:26 AM, Bob.Thorman@l-3com.com wrote:
>
> Does anyone know of a TimeSpan Iterator that will fetch rows based on
> the accumulo timestamp?
>
> We actually wrote our own TimestampRangeIterator and TimestampSetIterator
> classes.  I don't know if 1.4 has any in the core libraries.  It's not
> very hard though.
>
>
> There's a TimestampFilter in org.apache.accumulo.core.iterators.user in
> 1.4.  It uses a range of timestamps.  Users should be aware that this is
> not an efficient operation, though.
>
> Billie
>

--20cf300faecdc6b7fa04c857448a--