hadoop-pig-dev mailing list archives

From "Pi Song (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-143) Proposal for refactoring of parsing logic in Pig
Date Fri, 07 Mar 2008 23:27:46 GMT

     [ https://issues.apache.org/jira/browse/PIG-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pi Song updated PIG-143:
------------------------

    Description: 
This is a placeholder until I come up with a complete proposal. In the meantime, I definitely
need your opinions!

The basic concern is that we currently do validation in the parsing stage (for example, file
existence checking), which I think is not clean and makes it difficult to add new validation rules.

What I propose, briefly:
- Keep only parsing logic in the parser, so that the output of parsing is an unchecked
logical plan.
- Create a new class called LogicalPlanValidatorManager, which is responsible for running
validation.
- Each new piece of validation logic is implemented by subclassing LogicalPlanValidator.
- Chain the LogicalPlanValidators inside LogicalPlanValidatorManager so that new ones can be
added easily. New logic is plugged in there, so a new LogicalPlanValidator can be implemented
like a plug-in.
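
As a rough sketch of the chaining idea in Java: only the names LogicalPlanValidator and
LogicalPlanValidatorManager come from the proposal. Everything else here (ToyPlan,
PlanValidationException, the two example rules, and all method signatures) is hypothetical and
is not Pig's real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are Pig's real classes.
class ValidatorChainSketch {

    /** Toy stand-in for the unchecked logical plan produced by the parser. */
    static class ToyPlan {
        final List<String> loadPaths = new ArrayList<>();
        final List<String> storePaths = new ArrayList<>();
    }

    /** Thrown by a validator when its rule is violated. */
    static class PlanValidationException extends Exception {
        PlanValidationException(String message) { super(message); }
    }

    /** Base class that each new validation rule subclasses. */
    static abstract class LogicalPlanValidator {
        abstract void validate(ToyPlan plan) throws PlanValidationException;
    }

    /** Runs the registered validators in the order they were added. */
    static class LogicalPlanValidatorManager {
        private final List<LogicalPlanValidator> chain = new ArrayList<>();

        LogicalPlanValidatorManager add(LogicalPlanValidator validator) {
            chain.add(validator);
            return this;
        }

        void validate(ToyPlan plan) throws PlanValidationException {
            for (LogicalPlanValidator validator : chain) {
                validator.validate(plan); // the first failure stops the chain
            }
        }
    }

    /** Example rule (assumed for illustration): a plan must load something. */
    static class HasInputValidator extends LogicalPlanValidator {
        @Override
        void validate(ToyPlan plan) throws PlanValidationException {
            if (plan.loadPaths.isEmpty()) {
                throw new PlanValidationException("plan has no LOAD");
            }
        }
    }

    /** Example rule (assumed for illustration): a plan must store something. */
    static class HasOutputValidator extends LogicalPlanValidator {
        @Override
        void validate(ToyPlan plan) throws PlanValidationException {
            if (plan.storePaths.isEmpty()) {
                throw new PlanValidationException("plan has no STORE");
            }
        }
    }

    public static void main(String[] args) {
        LogicalPlanValidatorManager manager = new LogicalPlanValidatorManager()
                .add(new HasInputValidator())
                .add(new HasOutputValidator());
        try {
            manager.validate(new ToyPlan());
            System.out.println("plan is valid");
        } catch (PlanValidationException e) {
            System.out.println("validation failed: " + e.getMessage());
        }
    }
}
```

In this sketch, plugging in a new rule means writing one subclass and adding one add(...) call.
The position of that call in the chain decides when the rule's errors are reported.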

Here is a list of possible LogicalPlanValidators I have in mind (please add what you want):
- The first LogicalPlanValidator to implement is a FileExistence validator, which is taken from
the current logic we have.
- The second LogicalPlanValidator sorts out filename conflicts. (At the moment you can save/load
the same file over and over again in the same plan, which is very confusing. Possibly we should
not allow the same file name to be reused within a single plan?)
- Metadata checking + type system checking as mentioned in PIG-142
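
To make the filename-conflict rule concrete, here is a minimal sketch in Java under one strict
reading of the open question above: any path that appears more than once among a plan's loads
and stores counts as a conflict. The class name, method name, and the idea of passing paths as
plain string lists are all assumptions for illustration, not existing Pig code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a filename-conflict check; not Pig's real API.
class FileNameConflictSketch {

    /**
     * Returns every path used more than once across the given load and
     * store paths, in the order the repeats are found.
     */
    static Set<String> findConflicts(List<String> loadPaths, List<String> storePaths) {
        List<String> all = new ArrayList<>(loadPaths);
        all.addAll(storePaths);

        Set<String> seen = new HashSet<>();
        Set<String> conflicts = new LinkedHashSet<>();
        for (String path : all) {
            // Set.add returns false when the path was already present.
            if (!seen.add(path)) {
                conflicts.add(path);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        Set<String> conflicts = findConflicts(
                Arrays.asList("in.txt", "other.txt"),
                Arrays.asList("out.txt", "in.txt"));
        System.out.println("conflicting paths: " + conflicts);
    }
}
```

A looser rule (for example, only rejecting a store to a path that the same plan loads) would
need the two lists to be compared against each other rather than merged.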

The common way to implement a LogicalPlanValidator is based on the Visitor pattern. Whether this
works for all cases or not, I need to think through more.
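
For illustration, a visitor-based file existence check could look roughly like the Java sketch
below. The operator classes (Op, LoadOp, FilterOp) and the PlanVisitor interface are toy
stand-ins invented for this example, and the check uses the local filesystem through
java.nio.file, which is an assumption: a real check would have to go through whatever storage
backend the plan targets.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a visitor-based validator; not Pig's real classes.
class VisitorValidatorSketch {

    /** One visit method per operator type. */
    interface PlanVisitor {
        void visit(LoadOp op);
        void visit(FilterOp op);
    }

    /** Toy plan node; inputs are visited before the node itself. */
    static abstract class Op {
        final List<Op> inputs = new ArrayList<>();

        final void accept(PlanVisitor visitor) {
            for (Op input : inputs) {
                input.accept(visitor);
            }
            dispatch(visitor);
        }

        /** Calls the visit overload that matches the concrete type. */
        abstract void dispatch(PlanVisitor visitor);
    }

    static class LoadOp extends Op {
        final String path;
        LoadOp(String path) { this.path = path; }
        @Override void dispatch(PlanVisitor visitor) { visitor.visit(this); }
    }

    static class FilterOp extends Op {
        FilterOp(Op input) { inputs.add(input); }
        @Override void dispatch(PlanVisitor visitor) { visitor.visit(this); }
    }

    /** Collects the paths of load operators whose files do not exist locally. */
    static class FileExistenceValidator implements PlanVisitor {
        final List<String> missing = new ArrayList<>();

        @Override
        public void visit(LoadOp op) {
            if (!Files.exists(Paths.get(op.path))) {
                missing.add(op.path);
            }
        }

        @Override
        public void visit(FilterOp op) {
            // Nothing to check for a filter in this rule.
        }
    }

    public static void main(String[] args) {
        Op plan = new FilterOp(new LoadOp("no-such-input-file.txt"));
        FileExistenceValidator validator = new FileExistenceValidator();
        plan.accept(validator);
        System.out.println("missing inputs: " + validator.missing);
    }
}
```

The traversal lives in Op.accept, so each rule only implements the visit methods for the
operators it cares about.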

With this design, parsing errors will be detected first, in the parsing stage. Validation errors
are then detected in the order in which the LogicalPlanValidators are organized in
LogicalPlanValidatorManager.

The merit of implementing this proposal depends on the number of validation rules we
actually need. If we don't have many things to check, it will be just a nice feature
that doesn't add much value. However, I believe it will at least make the parsing logic cleaner.

This proposal only applies to the LogicalPlan. For the PhysicalPlan, where backend-specific
validation logic is required, the same concept can be applied.



> Proposal for refactoring of parsing logic in Pig
> ------------------------------------------------
>
>                 Key: PIG-143
>                 URL: https://issues.apache.org/jira/browse/PIG-143
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Pi Song
>            Assignee: Pi Song

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

