accumulo-user mailing list archives

From Jared Winick <jaredwin...@gmail.com>
Subject Re: Should I store Long values as String or Long?
Date Tue, 14 May 2013 06:09:26 GMT
I believe the feature John is referring to above is the Formatter interface
(org.apache.accumulo.core.util.format.Formatter). You can implement this
interface to convert key/values to a more human readable format for the
shell. You can drop a JAR file containing your implementation into lib/ext
just like your Iterators and then load it in the shell with the "formatter"
command.
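
As a rough sketch of the conversion such a Formatter would perform: assuming
the values were written as eight big-endian bytes (which is my understanding
of what LongCombiner.FIXED_LEN_ENCODER produces, so verify against your
version), the decoding needs only the JDK. The class and method names below
are made up for illustration, and the actual Formatter wiring is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LongValueSketch {

    // Encode a long as 8 big-endian bytes (assumed to mirror FIXED_LEN_ENCODER).
    static byte[] encodeFixedLen(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Decode 8 big-endian bytes back into a long.
    static long decodeFixedLen(byte[] b) {
        if (b.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + b.length);
        }
        return ByteBuffer.wrap(b).getLong();
    }

    // Encode a long as its decimal string, the human-readable alternative.
    static byte[] encodeString(long v) {
        return Long.toString(v).getBytes(StandardCharsets.UTF_8);
    }

    // What a custom formatter would show in place of the raw encoded bytes.
    static String toReadable(byte[] fixedLenValue) {
        return Long.toString(decodeFixedLen(fixedLenValue));
    }

    public static void main(String[] args) {
        long count = 12345L;
        byte[] fixed = encodeFixedLen(count);
        byte[] text = encodeString(count);
        System.out.println("fixed-length bytes: " + fixed.length);
        System.out.println("string bytes: " + text.length);
        System.out.println("readable: " + toReadable(fixed));
    }
}
```

For small counts the string form is actually shorter than the fixed eight
bytes, which fits Christopher's point that the size difference is unlikely to
matter much.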


On Mon, May 13, 2013 at 8:04 PM, Mike Hugo <mike@piragua.com> wrote:

> Thanks - String it is!
>
>
> On Mon, May 13, 2013 at 7:47 PM, Christopher <ctubbsii@apache.org> wrote:
>
>> Well, encoding it might save space, but strings are nice and
>> human-readable, especially in the shell, and in the overall scheme of
>> things, a string probably isn't really that much larger on disk,
>> especially after compression.
>>
>> --
>> Christopher L Tubbs II
>> http://gravatar.com/ctubbsii
>>
>>
>> On Mon, May 13, 2013 at 6:09 PM, Mike Hugo <mike@piragua.com> wrote:
>> > I've been playing around with the LongCombiner on a table that's summing up
>> > the counts of output of a MapReduce job, very similar to the WordCount
>> > example from the user manual.
>> >
>> > I started out encoding the values using LongCombiner.FIXED_LEN_ENCODER, but
>> > have noticed that this can lead to some confusion later on downstream. For
>> > example, a co-worker was scanning using the shell and was caught off guard
>> > by the encoded values. Also, out of the box, the StatsCombiner example
>> > works using String values, not Long values, so we built a custom piece to
>> > essentially do the same thing with Long values instead.
>> >
>> > It looks to me like most of the examples I've seen just store things as
>> > String values, rather than encoding them. What are the tradeoffs? We're at
>> > a point where we could pretty easily switch things to just use strings - it
>> > seems like that might make things more convenient from a maintenance
>> > perspective (human readable values) and would allow us to re-use some
>> > existing components (e.g. StatsCombiner). Any thoughts?
>> >
>> > Thanks,
>> >
>> > Mike
>>
>
>
