incubator-cassandra-user mailing list archives

From Viktor Jevdokimov <Viktor.Jevdoki...@adform.com>
Subject RE: Problems in the cassandra bulk loader
Date Thu, 10 Oct 2013 14:26:31 GMT
SSTableSimpleUnsortedWriter is an sstable writer, not Cassandra itself, so it writes to the file
exactly what you give it; you need to ensure the consistency yourself.

You can check the file before running sstableloader: all the data is in the sstable, but instead
of 1 row it will have 10 rows with the same key. The same will probably arrive in Cassandra
upon import.

But when Cassandra reads the sstable sequentially while searching for the key, only the first row
will be returned (with the first column): once the key is found there is no reason to scan further.
It will not return multiple rows with the same key, because Cassandra does not expect more than
one row with the same key in an sstable.
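
The read behaviour described above can be pictured with a toy model. This is only a sketch in plain Java, not Cassandra code: the Row class, the list standing in for the file, and the method names are all hypothetical, and the model assumes the writer stores duplicate-key rows exactly as the message describes.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstMatchScan {
    // Hypothetical stand-in for one physical row in the file:
    // a row key plus the single column written with it.
    static final class Row {
        final int key;
        final int column;

        Row(int key, int column) {
            this.key = key;
            this.column = column;
        }
    }

    // Mimics the loop from the original question: ten rows, all with key 10,
    // each carrying one column (values 1 to 10).
    static List<Row> writeDuplicates() {
        List<Row> file = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            file.add(new Row(10, i + 1));
        }
        return file;
    }

    // Sequential lookup that stops at the first row whose key matches,
    // because the reader assumes a key appears at most once.
    static Row findFirst(List<Row> file, int key) {
        for (Row row : file) {
            if (row.key == key) {
                return row;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Row> file = writeDuplicates();
        Row hit = findFirst(file, 10);
        System.out.println("rows in file: " + file.size());
        System.out.println("column returned: " + hit.column);
    }
}
```

In this model the file holds all ten rows, yet the lookup returns only the first one, which matches the single column with value 1 seen after the upload.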


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania

Disclaimer: The information contained in this message and attachments is intended solely for
the attention and use of the named addressee and may be confidential. If you are not the intended
recipient, you are reminded that the information remains the property of the sender. You must
not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this
message in error, please contact the sender immediately and irrevocably delete this message
and any copies.

From: José Elias Queiroga da Costa Araújo [mailto:jeqca@cesar.org.br]
Sent: Thursday, October 10, 2013 4:33 PM
To: user@cassandra.apache.org
Subject: Re: Problems in the cassandra bulk loader


        Hi, I thought the bulk API could handle this by merging all columns for the same super
column. I did something similar with the Java client (Hector), which resolves this
conflict by simply appending the columns.

        Regarding the column value: if the code is overwriting the columns, I expected the
column to hold the last value of my collection, but it holds the first one.

        Regards,

        Elias.

2013/10/10 Viktor Jevdokimov <Viktor.Jevdokimov@adform.com>
You overwrite your columns by starting a new row/supercolumn on every iteration.

Move the newRow/newSuperColumn calls out of the "for" loop, which should only add columns:


int rowKey = 10;
int superColumnKey = 20;
usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
for (int i = 0; i < 10; i++) {
    usersWriter.addColumn(
            ByteBufferUtil.bytes(i + 1),
            ByteBufferUtil.bytes(i + 1),
            System.currentTimeMillis());
}
usersWriter.close();
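
When the input contains many row keys in arbitrary order, the same idea can be generalised by grouping the columns on the client before touching the writer, so that each row and super column is started exactly once. The following is a minimal sketch under stated assumptions: RowSink is a hypothetical interface that only mirrors the call order of newRow/newSuperColumn/addColumn, int keys replace the ByteBuffer arguments for brevity, and RecordingSink exists only to show the resulting call sequence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GroupBeforeWrite {
    // Hypothetical stand-in for the writer; only the order of calls matters here.
    interface RowSink {
        void newRow(int rowKey);
        void newSuperColumn(int superColumnKey);
        void addColumn(int name, int value, long timestamp);
    }

    // Records every call as a string so the call order can be inspected.
    static final class RecordingSink implements RowSink {
        final List<String> calls = new ArrayList<>();

        public void newRow(int rowKey) {
            calls.add("newRow(" + rowKey + ")");
        }

        public void newSuperColumn(int superColumnKey) {
            calls.add("newSuperColumn(" + superColumnKey + ")");
        }

        public void addColumn(int name, int value, long timestamp) {
            calls.add("addColumn(" + name + ", " + value + ")");
        }
    }

    // Each input record is {rowKey, superColumnKey, columnName, columnValue}.
    static void writeGrouped(List<int[]> records, RowSink sink, long timestamp) {
        // rowKey -> superColumnKey -> columnName -> columnValue.
        // A later record with the same column name replaces the earlier value.
        Map<Integer, Map<Integer, Map<Integer, Integer>>> grouped = new TreeMap<>();
        for (int[] r : records) {
            grouped.computeIfAbsent(r[0], k -> new TreeMap<>())
                   .computeIfAbsent(r[1], k -> new TreeMap<>())
                   .put(r[2], r[3]);
        }
        // Emit each row and each super column once, followed by all its columns.
        for (Map.Entry<Integer, Map<Integer, Map<Integer, Integer>>> row : grouped.entrySet()) {
            sink.newRow(row.getKey());
            for (Map.Entry<Integer, Map<Integer, Integer>> sc : row.getValue().entrySet()) {
                sink.newSuperColumn(sc.getKey());
                for (Map.Entry<Integer, Integer> col : sc.getValue().entrySet()) {
                    sink.addColumn(col.getKey(), col.getValue(), timestamp);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<int[]> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add(new int[] {10, 20, i + 1, i + 1});
        }
        RecordingSink sink = new RecordingSink();
        writeGrouped(records, sink, System.currentTimeMillis());
        for (String call : sink.calls) {
            System.out.println(call);
        }
    }
}
```

Grouping in memory is only practical while the data fits in the heap; for larger inputs the records would have to be pre-sorted or processed in batches by key.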

Next time, please ask such questions on the user mailing list, not the C* dev list, which is for
C* development, not for usage questions or your own code built around Cassandra.





-----Original Message-----
From: José Elias Queiroga da Costa Araújo [mailto:jeqca@cesar.org.br]
Sent: Wednesday, October 9, 2013 11:22 PM
To: dev
Subject: Problems in the cassandra bulk loader

        Hi all,

        I'm trying to do a bulk insertion with the SSTableSimpleUnsortedWriter class from
the Cassandra API and I'm facing some problems. After generating the .db files and uploading them
with the ./sstableloader command, I noticed that the loaded data didn't match what I inserted.

        I've put the code I used below to try to explain the behaviour.

        I'm trying to generate the data files using only one row key and one super column,
where the super column has 10 columns.

IPartitioner p = new Murmur3Partitioner();
CFMetaData scf = new CFMetaData("myKeySpace", "Column",
        ColumnFamilyType.Super, BytesType.instance, BytesType.instance);

SSTableSimpleUnsortedWriter usersWriter = new SSTableSimpleUnsortedWriter(new File("./"),
        scf, p, 64);

int rowKey = 10;
int superColumnKey = 20;
for (int i = 0; i < 10; i++) {
    usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
    usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
    usersWriter.addColumn(ByteBufferUtil.bytes(i + 1), ByteBufferUtil.bytes(i + 1),
            System.currentTimeMillis());
}
usersWriter.close();

                After uploading, the result is:

                RowKey: 0000000a
                   => (super_column=00000014,
                              (name=00000001, value=00000001, timestamp=1381348293144))

                1 Row Returned.

                In this case, shouldn't my super column have 10 columns, with values from
00000001 to 0000000a, since I'm using the same super column? The documentation says the
newRow method can be invoked many times and that this only impacts performance.

                The second question is: if this is the correct behavior, shouldn't the column value
be 0000000a, since it is the last value passed as an argument to the addColumn(...) method
in the loop?
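
For reading the output above, it helps to see how an int turns into the 8-digit hexadecimal strings shown. The sketch below assumes the int is stored as 4 big-endian bytes, which is consistent with row key 10 appearing as 0000000a and super column key 20 appearing as 00000014; the IntHex class and its hex method are illustrative helpers, not part of any Cassandra API.

```java
import java.nio.ByteBuffer;

public class IntHex {
    // Renders an int as the hex form of its 4 big-endian bytes.
    static String hex(int value) {
        // ByteBuffer uses big-endian byte order by default.
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The loop in the question writes i + 1 for i = 0..9, i.e. 1 to 10.
        for (int i = 0; i < 10; i++) {
            System.out.println((i + 1) + " -> " + hex(i + 1));
        }
    }
}
```

Under that assumption the ten values written by the loop run from 00000001 to 0000000a.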

              Thanks in advance,

               Elias.

