accumulo-user mailing list archives

From "Cardon, Tejay E" <tejay.e.car...@lmco.com>
Subject RE: EXTERNAL: Re: Large files in Accumulo
Date Thu, 23 Aug 2012 21:34:39 GMT
Thanks Eric,
I was afraid that would be the case.  If I understand you correctly, putting a GB file into
Accumulo would be a bad idea.  Given that fact, are there any strategies available to ensure
that a given file in HDFS is co-located with the index info for that file in Accumulo? (I
would assume not).  In my case, I could use Accumulo to store my indexes for fast query, but
then have them return a URL/URI to the actual file.  However, I have to process each of those
files further to get to my final result, and I was hoping to do the second stage of processing
without having to return intermediate results.  Am I correct in assuming that this can't be
done?

Thanks,
Tejay
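(The pointer-based pattern Tejay describes above — index entries in Accumulo whose values point at large documents kept in HDFS — could be sketched roughly as below. This is a stdlib-only illustration, not the Accumulo API; the class name, term, and HDFS URI are all hypothetical.)

```java
import java.net.URI;
import java.util.Map;
import java.util.TreeMap;

public class IndexPointerSketch {
    // Stand-in for an Accumulo index table: key = search term,
    // value = HDFS URI of the (possibly multi-GB) document containing it.
    private final Map<String, URI> index = new TreeMap<>();

    public void put(String term, URI hdfsLocation) {
        index.put(term, hdfsLocation);
    }

    // A query returns a pointer, not the document bytes; the caller
    // fetches the file from HDFS in a second processing stage.
    public URI lookup(String term) {
        return index.get(term);
    }

    public static void main(String[] args) {
        IndexPointerSketch idx = new IndexPointerSketch();
        idx.put("wikisearch", URI.create("hdfs://namenode:8020/data/doc-0001.bin"));
        System.out.println(idx.lookup("wikisearch"));
    }
}
```

(The second stage would open each returned URI with the HDFS client rather than pulling intermediate results back through Accumulo.)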

From: Eric Newton [mailto:eric.newton@gmail.com]
Sent: Thursday, August 23, 2012 3:06 PM
To: user@accumulo.apache.org
Subject: Re: EXTERNAL: Re: Large files in Accumulo

An entire mutation needs to fit in memory several times, so you should not attempt to push
a single mutation larger than 100MB unless you have a lot of memory in your tserver/logger.

And while I'm at it, large keys will create large indexes, so try to keep your (row,cf,cq,cv)
under 100K.

-Eric
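(Eric's two limits above — keep a single mutation under roughly 100MB and the (row,cf,cq,cv) key tuple under roughly 100K — could be enforced client-side before writing. A minimal stdlib-only sketch; the thresholds, class, and method names are assumptions for illustration, not part of the Accumulo API.)

```java
public class SizeGuards {
    static final long MAX_MUTATION_BYTES = 100L * 1024 * 1024; // ~100MB per mutation
    static final int  MAX_KEY_BYTES      = 100 * 1024;         // ~100K per key tuple

    // Key tuple size: row + column family + column qualifier + visibility.
    static int keySize(byte[] row, byte[] cf, byte[] cq, byte[] cv) {
        return row.length + cf.length + cq.length + cv.length;
    }

    static boolean keyFits(byte[] row, byte[] cf, byte[] cq, byte[] cv) {
        return keySize(row, cf, cq, cv) <= MAX_KEY_BYTES;
    }

    // Running byte total for the puts destined for one mutation: if the
    // next value would push it past the cap, start a new mutation instead.
    static boolean mutationFits(long bytesSoFar, long nextValueLen) {
        return bytesSoFar + nextValueLen <= MAX_MUTATION_BYTES;
    }
}
```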
On Thu, Aug 23, 2012 at 4:37 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com>
wrote:
In my case I'll be doing a document based index store (like the wikisearch example), but my
documents may be as large as several GB.  I just wanted to pick the collective brain of the
group to see if I'm walking into a major headache.  If it's never been tried before, then
I'll give it a shot and report back.

Tejay

From: William Slacum [mailto:wilhelm.von.cloud@accumulo.net]
Sent: Thursday, August 23, 2012 2:07 PM
To: user@accumulo.apache.org
Subject: EXTERNAL: Re: Large files in Accumulo

Are these RFiles as a whole? I know at some point HBase needed to have entire rows fit into
memory; Accumulo does not have this restriction.
On Thu, Aug 23, 2012 at 12:55 PM, Cardon, Tejay E <tejay.e.cardon@lmco.com>
wrote:
Alright, this one's a quick question.  I've been told that HBase does not perform well if
large (> 100MB) files are stored in it.  Does Accumulo have similar trouble?  If so, can
it be overcome by storing the large files in their own locality group?

Thanks,
Tejay
