singa-dev mailing list archives

From "zhangzhaoqi (Jira)" <j...@apache.org>
Subject [jira] [Updated] (SINGA-506) add autograd operators for NLP models
Date Wed, 26 Feb 2020 11:46:00 GMT

     [ https://issues.apache.org/jira/browse/SINGA-506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhangzhaoqi updated SINGA-506:
------------------------------
    Description: 
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad, and GPT-2.*

*In total, 19 operators still need to be added, as follows:*

Transpose, easy, 0.5 days
ConstantOfShape, easy, 0.5 days
ReduceMax, easy, 0.5 days
ReduceMean, easy, 0.5 days
ReduceSum, easy, 0.5 days
Shape, easy, 0.5 days
Slice, easy, 0.5 days
Dropout, easy, 0.5 days
Hardmax, easy, 1 day
NonZero, easy, 1 day
Split, easy, 1 day
Tile, easy, 1 day
Ceil, easy, 1 day
Compress, easy, 1 day
Gather, complicated, 2-3 days, C++
Cast, hard, requires changing the data type, may not be feasible

CategoryMapper, not in the ONNX documentation (only for Bidirectional Attention Flow)
ArgMax, complicated, 2-3 days, C++ (only for Bidirectional Attention Flow)
Scan, hard, functional programming constructs, 1-2 weeks (only for Bidirectional Attention Flow)
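
As a rough cross-check of the estimates above, the per-operator figures can be summed. This is only a sketch: the names `ESTIMATES` and `total_effort` are hypothetical, and treating "1-2 weeks" as 5-10 working days is an assumption, not something stated in the issue. Cast and CategoryMapper carry no estimate and are reported separately.

```python
# (operator, low_days, high_days); None means the list gives no estimate.
ESTIMATES = [
    ("Transpose", 0.5, 0.5),
    ("ConstantOfShape", 0.5, 0.5),
    ("ReduceMax", 0.5, 0.5),
    ("ReduceMean", 0.5, 0.5),
    ("ReduceSum", 0.5, 0.5),
    ("Shape", 0.5, 0.5),
    ("Slice", 0.5, 0.5),
    ("Dropout", 0.5, 0.5),
    ("Hardmax", 1, 1),
    ("NonZero", 1, 1),
    ("Split", 1, 1),
    ("Tile", 1, 1),
    ("Ceil", 1, 1),
    ("Compress", 1, 1),
    ("Gather", 2, 3),
    ("Cast", None, None),
    ("CategoryMapper", None, None),
    ("ArgMax", 2, 3),
    ("Scan", 5, 10),  # assumption: 1-2 weeks read as 5-10 working days
]


def total_effort(estimates):
    """Return (low_days, high_days, names_without_estimate)."""
    low = sum(lo for _, lo, _ in estimates if lo is not None)
    high = sum(hi for _, _, hi in estimates if hi is not None)
    unestimated = [name for name, lo, _ in estimates if lo is None]
    return low, high, unestimated


low, high, unestimated = total_effort(ESTIMATES)
print(f"Estimated effort: {low}-{high} days; no estimate for: {unestimated}")
```

Under that assumption the estimated operators add up to roughly 19 to 26 working days, excluding the two operators without an estimate.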

 

*In detail, the 19 operators are required by the three models as follows:*

*Bidirectional Attention Flow:*
 ArgMax
 Cast
 CategoryMapper
 Ceil
 Compress
 ConstantOfShape
 Dropout
 Gather
 Hardmax
 ReduceMax
 ReduceSum
 Scan
 Shape
 Slice
 Transpose

*BERT-Squad:*
 Slice
 Shape
 Gather
 ReduceMean
 Cast
 Tile
 Transpose
 Split

*GPT-2:*
 ConstantOfShape
 Slice
 Shape
 Gather
 ReduceMean
 NonZero
 Cast
 Transpose
 Split
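
The per-model lists above can be checked against the stated total of 19 with a few set operations. This is a minimal sketch; `MODEL_OPS`, `all_operators`, and `shared_by_all` are illustrative names, not part of SINGA.

```python
# Operators required by each model, copied from the lists above.
MODEL_OPS = {
    "Bidirectional Attention Flow": {
        "ArgMax", "Cast", "CategoryMapper", "Ceil", "Compress",
        "ConstantOfShape", "Dropout", "Gather", "Hardmax", "ReduceMax",
        "ReduceSum", "Scan", "Shape", "Slice", "Transpose",
    },
    "BERT-Squad": {
        "Slice", "Shape", "Gather", "ReduceMean", "Cast", "Tile",
        "Transpose", "Split",
    },
    "GPT-2": {
        "ConstantOfShape", "Slice", "Shape", "Gather", "ReduceMean",
        "NonZero", "Cast", "Transpose", "Split",
    },
}


def all_operators(model_ops):
    """Union of the operators needed by any model."""
    return set().union(*model_ops.values())


def shared_by_all(model_ops):
    """Operators that every model needs."""
    return set.intersection(*model_ops.values())


print(len(all_operators(MODEL_OPS)), "distinct operators")
print("Needed by all three models:", sorted(shared_by_all(MODEL_OPS)))
```

The union contains 19 distinct operators, matching the total given above. Operators that appear in every model's list may be reasonable candidates to implement first.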

 

  was:
*We are going to support three NLP models: Bidirectional Attention Flow, BERT-Squad, and GPT-2.*

*In total, 21 operators still need to be added, as follows:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
Identity
NonZero
ReduceMax
ReduceMean
ReduceSum
Scan
Shape
Slice
Split
Squeeze
Tile
Transpose

 

*In detail, the 21 operators are required by the three models as follows:*

*Bidirectional Attention Flow:*
ArgMax
Cast
CategoryMapper
Ceil
Compress
ConstantOfShape
Dropout
Gather
Hardmax
ReduceMax
ReduceSum
Scan
Shape
Slice
Squeeze
Transpose


*BERT-Squad:*
Slice
Squeeze
Shape
Identity
Gather
ReduceMean
Cast
Tile
Transpose
Split

*GPT-2:*
ConstantOfShape
Slice
Shape
Gather
ReduceMean
NonZero
Cast
Transpose
Split

 


> add autograd operators for NLP models
> -------------------------------------
>
>                 Key: SINGA-506
>                 URL: https://issues.apache.org/jira/browse/SINGA-506
>             Project: Singa
>          Issue Type: New Feature
>            Reporter: zhangzhaoqi
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
