hbase-dev mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject transfer in to AWS will be free of charge through June 2010
Date Tue, 08 Dec 2009 15:48:07 GMT
> Data Transfer into AWS will be free of charge from now through June 30,
> 2010, making it even easier for customers to get their data into AWS.
> This applies to data transfer into Amazon EC2, Amazon S3, Amazon
> SimpleDB, Amazon Relational Database Service, Amazon Simple Queue
> Service, and Amazon Virtual Private Cloud. Other applicable charges for
> use of these services continue to apply.

So it looks like a true real-world crawling test, which pulls all of its data into EC2, will be fine cost-wise until at least June 30.

I have it on my to-do list to get a one-command, test-and-collect-results version of my Heritrix
+ MozillaHtmlParser test into the tree as test/contrib/ec2/crawlertest or something like
that. This test does the following:
1) Starts up multiple Heritrix2 instances running long-lived crawls, trying to push the cluster
up to its maximum write throughput.
2) Runs a CPU-intensive MapReduce job that reads crawled content out of HBase, builds an
org.w3c.dom.Document object tree using MozillaHtmlParser, and stores the bzip-compressed
serialization of that object tree back into HBase.
Because the content is real crawled data, the sizes of objects stored to and read out of HBase
follow a real-world size distribution by definition.
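To make step 2 a little more concrete, here is a minimal sketch of the per-row work, written under loudly stated assumptions: the real job is a Java MapReduce job over HBase using MozillaHtmlParser, whereas here a plain Python dict stands in for the HBase table, the standard library's html.parser feeding xml.dom.minidom stands in for MozillaHtmlParser and the W3C DOM tree, and the bz2 module supplies the bzip compression. The names DomBuilder, parse_and_compress, run_job and the "content"/"dom" column names are all hypothetical.

```python
import bz2
from html.parser import HTMLParser
from xml.dom.minidom import getDOMImplementation

# Void elements never get a closing tag, so they must not stay on the open-element stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class DomBuilder(HTMLParser):
    """Hypothetical stand-in for MozillaHtmlParser: builds a W3C-DOM-style
    tree (xml.dom.minidom) from HTML text. Unlike a real browser parser it
    does no error recovery beyond ignoring unmatched end tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.doc = getDOMImplementation().createDocument(None, "html", None)
        self.stack = [self.doc.documentElement]  # currently open elements

    def handle_starttag(self, tag, attrs):
        if tag == "html":
            return  # the root element already exists
        el = self.doc.createElement(tag)
        for name, value in attrs:
            el.setAttribute(name, value or "")
        self.stack[-1].appendChild(el)
        if tag not in VOID_TAGS:
            self.stack.append(el)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tagName == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].appendChild(self.doc.createTextNode(data))


def parse_and_compress(html_text: str) -> bytes:
    """Per-row work of the hypothetical map task: parse, serialize, bzip2."""
    builder = DomBuilder()
    builder.feed(html_text)
    builder.close()
    xml_bytes = builder.doc.toxml(encoding="utf-8")  # serialized object tree
    return bz2.compress(xml_bytes)


def run_job(table: dict) -> dict:
    """Simulates one pass of the job: read each row's 'content' cell and
    write the compressed DOM serialization back into a 'dom' cell."""
    for columns in table.values():
        columns["dom"] = parse_and_compress(columns["content"])
    return table
```

Keeping the per-row transform a pure function of the cell's bytes makes it easy to call from a mapper and to test in isolation. A real HTML parser such as MozillaHtmlParser also repairs badly malformed markup, so this sketch only illustrates the shape of the work, not its CPU cost.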

In the past this test has revealed many bugs (the most serious being dead space on the heap held
by I/O buffers that grew but never released their allocations) and many operational considerations.
I hit them all: file descriptors, xcievers, I/O saturation leading to ZK aborts, compaction
storms, memstore flush gating, etc.

See the subtasks of HBASE-1961 for the issues that have to be addressed first before easy, fully
scripted, full-system testing up on EC2 is possible.

    - Andy
