Date: Tue, 23 Dec 2014 20:30:12 +0800
Subject: Need some recommendations on hardwares
From: Yatong Zhang
To: user@hadoop.apache.org

Hi there,
I am going to build a 30-node cluster, and the basic idea is:
1. Hadoop for the base distributed file system and Spark as the map-reduce processing framework.
2. HBase for data storage.
3. Kafka for ingesting the outside data.
4. Storm to read messages from Kafka and write them to HBase and Solr.
5. Solr to index the data and provide the search & query services.

I am planning to build this with commodity PC hardware such as i5 or i7 CPUs, 16 to 32 GB of memory, and possibly SSDs.
So I am looking for some suggestions/recommendations:
1. Hardware recommendations for each subsystem (HBase, Kafka, Solr, etc.)
2. The number of PCs for each subsystem

I have about 50M messages per day, and each message is about 400 to 600 bytes with about 10 fields to index.
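
To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the load they imply. Only the 50M messages per day and the 400 to 600 byte message size come from the figures above; the replication factor of 3 and the 365-day retention period are assumptions chosen for illustration, and the function names are hypothetical. The estimate covers raw message bytes only and ignores HBase storage overhead, Solr index size, Kafka log retention, and compression.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(messages_per_day):
    """Average messages per second, assuming traffic is spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


def estimate(messages_per_day=50_000_000,
             min_bytes=400,
             max_bytes=600,
             replication=3,        # assumption: not stated in the message
             retention_days=365):  # assumption: not stated in the message
    """Return a rough (low, high) estimate of raw data volume."""
    low = messages_per_day * min_bytes    # raw bytes per day, small messages
    high = messages_per_day * max_bytes   # raw bytes per day, large messages
    factor = replication * retention_days
    return {
        "avg_messages_per_second": average_rate(messages_per_day),
        "daily_raw_gb": (low / 1e9, high / 1e9),
        "retained_replicated_tb": (low * factor / 1e12, high * factor / 1e12),
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(name, value)
```

Under these assumptions the raw input is about 20 to 30 GB per day at an average of roughly 580 messages per second; peak rates and index overhead would have to be measured separately.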

Thanks and any suggestions are appreciated~