Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@cassandra.apache.org
Received-SPF: pass (athena.apache.org: domain of decker.christian@gmail.com
 designates 209.85.215.44 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:in-reply-to:references:from:date:message-id:subject:to
         :content-type;
        b=Kg3nuVgIczM3jXbF4PDfzhV4xdURw3s5CkSv+Zev26zBjFjPc6W+KZ91iwUAJcia72
         KBTxD5SJOXxS35Fm9Rvi+LLd5BeI/Qw4/E48zoKrC2U2TR9qS4686EfUavgZxcF2mo0Z
         SapquOk0GEYDc+uONSq70kPmVoqvP+YG+8cpM=
MIME-Version: 1.0
In-Reply-To: <41a3af72-1cd7-e9c9-cc51-ffa0e7095435@me.com>
References: <AANLkTim+RYNQb1covJvfevYpn91VsMXHmawwvetpHeUw@mail.gmail.com>
 <41a3af72-1cd7-e9c9-cc51-ffa0e7095435@me.com>
From: Christian Decker <decker.christian@gmail.com>
Date: Thu, 30 Sep 2010 09:10:27 +0200
Message-ID: <AANLkTikUfzqfaJiqJWst7rMXJVG38=Ah9+7U7=Jk3uxz@mail.gmail.com>
Subject: Re: LongType from user input
To: user@cassandra.apache.org
Content-Type: multipart/alternative; boundary=0015174bddc20c6700049174c7f9

--0015174bddc20c6700049174c7f9
Content-Type: text/plain; charset=ISO-8859-1

Apparently I had blanked 0.7 completely out of my memory. I was trying to
implement application-layer indices and ignored the fact that Cassandra 0.7
implements them out of the box. I found ticket CASSANDRA-749 about the
indices and am reading through the code right now, but is there a
higher-level overview or a tutorial on how to get started with these
indices (and maybe something on their inner workings)? This might actually
solve all of the problems I'm having right now :-)

Regards,
Chris


On Mon, Sep 27, 2010 at 3:45 AM, Aaron Morton <aaron@thelastpickle.com> wrote:

> The only thing I can think of is that values need to be in the correct byte
> format when used in indexes in 0.7. Take a look at the types.py module in
> the pycassa client http://github.com/pycassa/pycassa for an example of
> which values need to be byte packed.
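
A minimal sketch of what "byte packed" means for a long value, assuming the
8-byte big-endian signed layout; the helper names pack_long and unpack_long
are made up for illustration and are not taken from pycassa:

```python
import struct

def pack_long(value):
    """Encode an int as 8 bytes, big-endian, signed."""
    return struct.pack(">q", value)

def unpack_long(raw):
    """Decode 8 big-endian bytes back into an int."""
    return struct.unpack(">q", raw)[0]

packed = pack_long(100)
print(packed.hex())         # 0000000000000064
print(unpack_long(packed))  # 100
```

The same text input ("100") therefore ends up as a fixed-width binary value
rather than as the three characters '1', '0', '0'.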
>
> How is your Pig function working against Cassandra? Is it using the
> ColumnFamilyRecordReader? The code in the internal RowIterator for that
> class has an example of calling the cluster to get the comparators.
>
> Aaron
>
>
> On 27 Sep, 2010, at 03:11 AM, Christian Decker <decker.christian@gmail.com>
> wrote:
>
> Hi Aaron,
>
> what changes can I expect in the 0.7 release regarding Comparison and
> Parameters? My problem is mainly that I want to take Strings from stdin (or
> Pig Scripts for that matter) and convert them in such a way that they are
> interpreted correctly and converted to the corresponding byte representation
> to use them in column names and keys.
>
> Regards,
> Chris
>
> On Sun, Sep 26, 2010 at 5:20 AM, Aaron Morton <aaron@thelastpickle.com> wrote:
>
>> Things are changing in v0.7: the row keys are byte arrays.
>>
>> Not sure I understand your other concerns.
>>
>> Aaron
>>
>>
>> On 25 Sep 2010, at 08:10, Christian Decker <decker.christian@gmail.com>
>> wrote:
>>
>>
>> Thanks for your quick answer. I think I'll use an affix to sort of cast
>> the keys, ranges and other values from their textual representation (from
>> Pig) to the desired byte representation, since I just noticed that the row
>> keys themselves are always interpreted as UTF8, and since I want to make
>> key-range as well as slice queries, I think I'll be better off this way.
>> I'll just add an 'L' for Long and a 'U' for UUID (of any kind).
>> Or is there a better way that I just can't see from my beginner's angle?
>> :-)
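
The affix idea above could be sketched roughly as follows. This assumes the
affix is a single leading character and that longs are packed as 8
big-endian bytes; the function name cast_prefixed is hypothetical:

```python
import struct
import uuid

def cast_prefixed(text):
    """Turn 'L<number>' or 'U<uuid>' into bytes; other input is UTF-8 encoded."""
    if text.startswith("L"):
        # 'L' marks a long: pack the decimal digits as an 8-byte big-endian value.
        return struct.pack(">q", int(text[1:]))
    if text.startswith("U"):
        # 'U' marks a UUID of any kind: use its 16-byte binary form.
        return uuid.UUID(text[1:]).bytes
    return text.encode("utf-8")

print(cast_prefixed("L100").hex())  # 0000000000000064
print(len(cast_prefixed("U12345678-1234-5678-1234-567812345678")))  # 16
```

One weakness of this sketch: a plain string that happens to start with 'L'
or 'U' (say, "London") is ambiguous and raises a ValueError here, so plain
strings would probably need a marker of their own.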
>>
>> Regards,
>> Chris
>>
>>
>> On Fri, Sep 24, 2010 at 6:35 PM, Tyler Hobbs <tyler@riptano.com> wrote:
>>
>>> Yes, you can use describe_keyspace() and then look through the results.
>>> It's a little ugly in 0.6, but it works.
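
A rough sketch of "looking through the results" and converting accordingly.
It assumes the describe_keyspace() result is a dict of per-column-family
dicts with a "CompareWith" entry holding the comparator class name. The
sample data, the CONVERTERS table and to_column_name are all made up for
illustration:

```python
import struct
import uuid

# Hypothetical sample, shaped like the assumed describe_keyspace() result.
SAMPLE_KEYSPACE = {
    "Events": {"Type": "Standard",
               "CompareWith": "org.apache.cassandra.db.marshal.LongType"},
    "Users": {"Type": "Standard",
              "CompareWith": "org.apache.cassandra.db.marshal.UTF8Type"},
    "Sessions": {"Type": "Standard",
                 "CompareWith": "org.apache.cassandra.db.marshal.TimeUUIDType"},
}

# Text-to-bytes converters keyed by the comparator's short class name.
CONVERTERS = {
    "LongType": lambda s: struct.pack(">q", int(s)),
    "UTF8Type": lambda s: s.encode("utf-8"),
    "LexicalUUIDType": lambda s: uuid.UUID(s).bytes,
    "TimeUUIDType": lambda s: uuid.UUID(s).bytes,
}

def to_column_name(keyspace_desc, column_family, text):
    """Convert user text to bytes according to the column family's comparator."""
    comparator = keyspace_desc[column_family]["CompareWith"].rsplit(".", 1)[-1]
    return CONVERTERS[comparator](text)

print(to_column_name(SAMPLE_KEYSPACE, "Events", "100").hex())  # 0000000000000064
print(to_column_name(SAMPLE_KEYSPACE, "Users", "abc"))         # b'abc'
```

With a lookup like this, the user only types the textual value and the
column family name decides how it is encoded.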
>>>
>>> - Tyler
>>>
>>>
>>>
>>> On Fri, Sep 24, 2010 at 11:25 AM, Christian Decker <decker.christian@gmail.com> wrote:
>>>
>>>> Well, I'm writing a loading function for Pig, and I want to be able to
>>>> load slices from Cassandra that are specified in the Pig script (hence
>>>> the input from stdin). The ColumnFamily to read from is another
>>>> parameter, and some of the CFs use UTF8, UUID, TimeUUID or Long types
>>>> for their keys and columns, so simply converting everything I get to an
>>>> 8-byte long would break compatibility with the others.
>>>> Now that I think about it, I attacked the whole problem in a weird way,
>>>> since UUID types won't work either.
>>>> So let me change my question slightly: is there a way in 0.6 to detect
>>>> the CompareWith type on a running cluster? That way I could convert the
>>>> input to the right type :D
>>>>
>>>> Regards,
>>>> Chris
>>>>
>>>>
>>>> On Fri, Sep 24, 2010 at 6:09 PM, Tyler Hobbs <tyler@riptano.com> wrote:
>>>>
>>>>> I'm not sure I understand why using this with multiple column families
>>>>> prevents you from converting it.  Could you clarify this?
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Sep 24, 2010 at 10:56 AM, Christian Decker <decker.christian@gmail.com> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I'm having quite a dilemma with the CompareWith attribute. The problem
>>>>>> is that I have numeric IDs that I'd like to use as row keys, but I
>>>>>> also have to let users enter them on standard input. Since I cannot ask
>>>>>> my users to type an 8-byte sequence representing the ID they'd like, I
>>>>>> was about to turn to UTF8, when I remembered that UTF8 values are
>>>>>> compared lexicographically, so that 100 actually comes before 2, which
>>>>>> kills key slices. I also cannot just hard-code a converter, since this
>>>>>> is supposed to be used with multiple column families, so simply
>>>>>> converting every integer read into 8 bytes isn't going to work either.
>>>>>> Any tricks for this one?
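
To make the ordering problem concrete, here is a small sketch. The
zero-padding helper pad_id is a common general workaround and an assumption
on my part, not something proposed in this thread; the width of 19 is the
number of digits in the largest signed 64-bit value:

```python
ids = ["2", "100", "30"]

# Plain string comparison is lexicographic, so "100" sorts before "2".
print(sorted(ids))  # ['100', '2', '30']

def pad_id(text, width=19):
    """Left-pad a non-negative decimal ID with zeros to a fixed width."""
    return text.zfill(width)

# With a fixed width, lexicographic order matches numeric order.
padded = sorted(pad_id(i) for i in ids)
print([p.lstrip("0") for p in padded])  # ['2', '30', '100']
```

Padding keeps the keys human-readable while restoring numeric order, at the
cost of longer keys and no support for negative IDs.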
>>>>>>
>>>>>> Regards,
>>>>>> Chris
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

--0015174bddc20c6700049174c7f9--