hadoop-mapreduce-user mailing list archives

From venito camelas <robotirlan...@gmail.com>
Subject Re: Help designing application architecture
Date Sat, 09 Jul 2016 19:17:51 GMT
Sorry, but I did not understand.
From what I can see, case classes are a Scala feature, and I'm using Java (I
could consider learning Scala and switching, since I have not started yet and
this is for learning purposes only).

What do you mean by known formats? When the user creates a channel, he
only has some basic types (string, long, timestamp, etc.) and the channels
he previously created to choose from. Example:

The user first creates 2 simple channels (Coordinate and Temperature):
Coordinate = {
"X" : "Float",
"Y" : "Float",
"instant" : "Timestamp"
}

Temperature = {
"value" : "Float",
"measurement_unit" : "String",
"instant" : "Timestamp"
}

Then, the user creates a new channel using the 2 previously created:
{
"coord" : "Coordinate",
"temp" : "Temperature",
"instant" : "Timestamp"
}

Now, when data arrives, I validate its format against the defined
channel's format; if it doesn't match, I throw an error. Example:

{
"coord" : {
"X" : 31.75,
"Y" : "32.75",
"instant" : "2016-06-20T13:28:06.419Z"
},
"temp" : {
"value" : 25.6,
"measurement_unit" : "Celsius",
"instant" : "2016-06-20T13:28:06.419Z"
},
"instant" : "2016-06-20T13:28:06.419Z"
}

That piece of data will fail validation because the "Y" value is a string
rather than a Float (as defined in the Coordinate channel).
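
To make the validation described above concrete, here is a minimal Java sketch of how it might look. It is only an illustration under several assumptions: the incoming JSON has already been parsed into nested java.util.Map objects by some JSON library, only the basic types "Float", "String" and "Timestamp" are handled, and the class name ChannelValidator, its methods, and the composite channel name "Reading" are hypothetical (none of them appear in this thread).

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; the names here are illustrative, not from the thread.
class ChannelValidator {
    // Channel name -> (field name -> type name). A type name is either a
    // basic type ("Float", "String", "Timestamp") or another channel's name.
    private final Map<String, Map<String, String>> channels = new LinkedHashMap<>();

    void define(String name, Map<String, String> fields) {
        channels.put(name, fields);
    }

    // Returns a list of error messages; an empty list means the data matched.
    List<String> validate(String channel, Map<String, Object> data) {
        List<String> errors = new ArrayList<>();
        check(channel, data, channel, errors);
        return errors;
    }

    private void check(String type, Object value, String path, List<String> errors) {
        switch (type) {
            case "Float":
                if (!(value instanceof Number)) errors.add(path + ": expected Float");
                return;
            case "String":
                if (!(value instanceof String)) errors.add(path + ": expected String");
                return;
            case "Timestamp":
                if (!isTimestamp(value)) errors.add(path + ": expected Timestamp");
                return;
            default:
                break; // not a basic type, so treat it as a channel name below
        }
        Map<String, String> fields = channels.get(type);
        if (fields == null) {
            errors.add(path + ": unknown type " + type);
            return;
        }
        if (!(value instanceof Map)) {
            errors.add(path + ": expected " + type);
            return;
        }
        Map<?, ?> map = (Map<?, ?>) value;
        // Check every declared field, recursing into nested channels.
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String child = path + "." + field.getKey();
            if (!map.containsKey(field.getKey())) {
                errors.add(child + ": missing field");
            } else {
                check(field.getValue(), map.get(field.getKey()), child, errors);
            }
        }
    }

    // A timestamp is accepted only as an ISO-8601 string such as
    // "2016-06-20T13:28:06.419Z".
    private static boolean isTimestamp(Object value) {
        if (!(value instanceof String)) return false;
        try {
            Instant.parse((String) value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

With Coordinate, Temperature and the composite channel registered through define(), calling validate() on the sample data above should report a single error for the "Y" field, since "32.75" is a string and not a number. The sketch ignores extra fields that the channel does not declare; whether those should be rejected is a design decision left open here.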

Is there a chance you could explain a little more what you said previously?
It would really help me.

Thank you

2016-07-07 20:54 GMT-03:00 Ted Yu <yuzhihong@gmail.com>:

> For 1) you don't have to introduce external storage.
> You can define case classes for the known formats.
> On Thu, Jul 7, 2016 at 4:40 PM, venito camelas <robotirlandes@gmail.com>
> wrote:
>> I'm pretty new to this and I have a use case I'm not sure how to
>> implement, I'll try to explain it and I'd appreciate if anyone could point
>> me in the right direction.
>> The case has these requirements:
>>  1 - Any user should be able to define the format of the information they
>> want to store (channel). For example, user X defines a channel named
>> "coordinate":
>> coordinate = {
>> "X" : "Float",
>> "Y" : "Float",
>> "instant" : "Timestamp"
>> }
>>   Every channel has some time value, it can be an instant (like above) or
>> a period of time ("start" : "Timestamp", "end" : "Timestamp")
>>  2 - Given the previous example, the user should be able to ask the
>> following questions:
>> 2.1 When was the last time I went near {X : x, Y : y}?  --> Process the
>> information in order to get the "near" places and return the newest one.
>> 2.2 Where was I on March 6th between 1pm and 2pm?       --> Query by time
>> For 1) I was thinking of using some Document oriented storage because of
>> the channels' lack of structure, not sure that's the only thing to consider
>> though.
>> For 2.1) I'd use some MR job
>> For 2.2) I think it would be better to have the information in the
>> document storage and make the queries there.
>> Is it a good approach to have the information stored both in the hdfs and
>> the document oriented storage (for processing and querying respectively)?
>> As I mentioned in the beginning, I'm really new to this and I'm just
>> trying to learn, so sorry if my doubts are silly.
>> Any suggestion or any good reference related to this will be much
>> appreciated.
