singa-dev mailing list archives

From "zhangzhaoqi (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SINGA-506) add autograd operators for NLP models
Date Wed, 26 Feb 2020 12:17:00 GMT

     [ https://issues.apache.org/jira/browse/SINGA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangzhaoqi updated SINGA-506:
------------------------------
    Description: 
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad, and GPT-2.*

*In total, 19 operators still need to be added, as follows:*
|*Operator*|*Rank*|*Workload*|*Comments*|*Bidirectional Attention Flow*|*BERT-Squad*|*GPT-2*|
|*Transpose*|easy|1h|Transposes the input tensor, similar to numpy.transpose.|T|T|T|
|*ConstantOfShape*|easy|2h|Generates a tensor with the given value and shape.|T| |T|
|*ReduceMax*|easy|4h|Computes the max of the input tensor's elements along the provided axes.|T| | |
|*ReduceMean*|easy|4h|Computes the mean of the input tensor's elements along the provided axes.| |T|T|
|*ReduceSum*|easy|4h|Computes the sum of the input tensor's elements along the provided axes.|T| | |
|*Shape*|easy|2h|Takes a tensor as input and outputs a 1D int64 tensor containing the shape of the input tensor.|T|T|T|
|*Slice*|easy|4h|Produces a slice of the input tensor along multiple axes.|T|T|T|
|*Dropout*|easy|3h|Takes an input floating-point tensor and an input ratio (floating-point scalar), and produces two tensor outputs: output (floating-point tensor) and mask (Tensor<bool>).|T| | |
|*Hardmax*|easy|6h|Computes the hardmax values (1 for the first maximum value, 0 for all others) for each layer in the batch of the given input.|T| | |
|*NonZero*|easy|12h|Returns the indices of the elements that are non-zero (in row-major order, by dimension).| | |T|
|*Split*|easy|12h|Splits a tensor into a list of tensors along the specified 'axis'.| |T|T|
|*Tile*|easy|1d|Constructs a tensor by tiling a given tensor. This is the same as the tile function in NumPy, but without broadcasting. For example, A = [[1, 2], [3, 4]], B = [1, 2], tile(A, B) = [[1, 2, 1, 2], [3, 4, 3, 4]]| |T| |
|*Ceil*|easy|4h|y = ceil(x)|T| | |
|*Compress*|easy|6h|Selects slices from an input tensor along a given axis where condition evaluates to True for each axis index.|T| | |
|*Gather*|complicated|3d|Given a data tensor of rank r >= 1 and an indices tensor of rank q, gathers entries of the axis dimension of data (by default the outer-most one, axis=0) indexed by indices, and concatenates them.|T|T|T|
|*ArgMax*|complicated|2d|Computes the indices of the max elements of the input tensor along the provided axis.|T| | |
|*Cast*|hard|-|Casts the elements of a given input tensor to the data type specified by the 'to' argument and returns an output tensor of the same size in the converted type.|T|T|T|
|*Scan*|hard|2w|Scan can be used to iterate over one or more scan_input tensors, constructing zero or more scan_output tensors. It combines ideas from general recurrences and functional programming constructs such as scan, fold, map, and zip, and is intended to enable generalizations of RNN-like constructs for sequence-to-sequence processing.|T| | |
|*CategoryMapper*| |-|Not in the ONNX documentation.|T| | |
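
To make a few of the descriptions above concrete, here is a rough NumPy sketch of the forward behaviour of Tile, Hardmax, Gather, Compress, NonZero and ConstantOfShape. The function names and signatures are hypothetical and are not SINGA's autograd API; the hardmax here works along a single axis, which is an assumption, and no backward pass is shown.

```python
import numpy as np


def tile(a, repeats):
    # Repeat `a` along each dimension; repeats has one entry per dimension.
    return np.tile(a, repeats)


def hardmax(x, axis=-1):
    # 1 at the first maximum along `axis`, 0 elsewhere.
    # np.argmax returns the first occurrence when there are ties.
    idx = np.expand_dims(np.argmax(x, axis=axis), axis)
    out = np.zeros_like(x)
    np.put_along_axis(out, idx, 1, axis=axis)
    return out


def gather(data, indices, axis=0):
    # Pick entries of `data` along `axis`; the output shape is
    # data.shape[:axis] + indices.shape + data.shape[axis + 1:].
    return np.take(data, indices, axis=axis)


def compress(data, condition, axis):
    # Keep the slices along `axis` whose condition entry is True.
    return np.compress(condition, data, axis=axis)


def nonzero(x):
    # One row of indices per dimension, in row-major order.
    return np.array(np.nonzero(x), dtype=np.int64)


def constant_of_shape(shape, value=0.0):
    # Tensor of the given shape filled with `value`.
    return np.full(shape, value)


if __name__ == "__main__":
    A = np.array([[1, 2], [3, 4]])
    # Same example as in the Tile row of the table.
    print(tile(A, [1, 2]))
    print(hardmax(np.array([[1, 3, 3], [2, 1, 0]])))
```

The real operators would also need a backward function (or an explicit decision that none is needed, e.g. for index-producing operators such as NonZero and ArgMax), which this sketch leaves out.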

*The 19 operators are distributed across the three models as follows:*

*Bidirectional Attention Flow:*
 ArgMax
 Cast
 CategoryMapper
 Ceil
 Compress
 ConstantOfShape
 Dropout
 Gather
 Hardmax
 ReduceMax
 ReduceSum
 Scan
 Shape
 Slice
 Transpose

*BERT-Squad:*
 Slice
 Shape
 Gather
 ReduceMean
 Cast
 Tile
 Transpose
 Split

*GPT-2:*
 ConstantOfShape
 Slice
 Shape
 Gather
 ReduceMean
 NonZero
 Cast
 Transpose
 Split
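
As a quick consistency check on the three lists above, the following sketch builds one set per model and confirms that their union contains 19 operators. The variable names are only illustrative.

```python
# Operator lists copied from the per-model breakdown above.
bidaf = {
    "ArgMax", "Cast", "CategoryMapper", "Ceil", "Compress",
    "ConstantOfShape", "Dropout", "Gather", "Hardmax", "ReduceMax",
    "ReduceSum", "Scan", "Shape", "Slice", "Transpose",
}
bert_squad = {
    "Slice", "Shape", "Gather", "ReduceMean", "Cast", "Tile",
    "Transpose", "Split",
}
gpt2 = {
    "ConstantOfShape", "Slice", "Shape", "Gather", "ReduceMean",
    "NonZero", "Cast", "Transpose", "Split",
}

# Every operator needed by at least one model.
all_ops = bidaf | bert_squad | gpt2
# Operators needed by all three models.
shared_ops = bidaf & bert_squad & gpt2

if __name__ == "__main__":
    print(len(all_ops))
    print(sorted(shared_ops))
```

The operators shared by all three models (Cast, Gather, Shape, Slice, Transpose) are the ones marked T in every model column of the table.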

 

  was:
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad, and GPT-2.*

*In total, 19 operators still need to be added, as follows:*

Transpose, easy, 0.5 days
ConstantOfShape, easy, 0.5 days
ReduceMax, easy, 0.5 days
ReduceMean, easy, 0.5 days
ReduceSum, easy, 0.5 days
Shape, easy, 0.5 days
Slice, easy, 0.5 days
Dropout, easy, 0.5 days
Hardmax, easy, 1 day
NonZero, easy, 1 day
Split, easy, 1 day
Tile, easy, 1 day
Ceil, easy, 1 day
Compress, easy, 1 day
Gather, complicated, 2-3 days, C++
Cast, hard, changes the data type, may not be feasible

CategoryMapper, not in the ONNX documentation (only for Bidirectional Attention Flow)
ArgMax, complicated, 2-3 days, C++ (only for Bidirectional Attention Flow)
Scan, hard, functional programming constructs, 1-2 weeks (only for Bidirectional Attention Flow)

 

*The 19 operators are distributed across the three models as follows:*

*Bidirectional Attention Flow:*
 ArgMax
 Cast
 CategoryMapper
 Ceil
 Compress
 ConstantOfShape
 Dropout
 Gather
 Hardmax
 ReduceMax
 ReduceSum
 Scan
 Shape
 Slice
 Transpose

*BERT-Squad:*
 Slice
 Shape
 Gather
 ReduceMean
 Cast
 Tile
 Transpose
 Split

*GPT-2:*
 ConstantOfShape
 Slice
 Shape
 Gather
 ReduceMean
 NonZero
 Cast
 Transpose
 Split

 


> add autograd operators for NLP models
> -------------------------------------
>
>                 Key: SINGA-506
>                 URL: https://issues.apache.org/jira/browse/SINGA-506
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: zhangzhaoqi
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
