Parameter Efficient Training of
Deep Convolutional Neural Networks
by Dynamic Sparse Reparameterization
Reader: Azuma Kohei, DL Engineer, LeapMind
LeapMind ICML2019 Reading Session
Contents

1. Paper Info
2. Background
3. Problem
4. Proposed Method
5. Experiments
6. Discussion
7. Conclusion
Paper Info
● “Parameter Efficient Training of Deep Convolutional Neural Networks
by Dynamic Sparse Reparameterization”
○ ICML 2019
○ Authors: Hesham Mostafa, Xin Wang
● Why did I choose this paper?
○ pruning offers an interesting window into the nature of deep learning models
○ pruning without a pre-trained model sounds intriguing
Background: Overview
● Deep learning models are often over-parameterized
○ e.g., VGG16 has 138 million parameters, ~552 MB in FP32
○ puts pressure on memory
● Several ways to reduce model size without significantly reducing
accuracy (a size check follows this list)
○ Quantization: shrinks each weight
■ e.g., 138 million params, ~552 MB (FP32) -> 138 million params, ~138 MB (int8)
○ Pruning: shrinks the parameter count
■ e.g., 138 million params, ~552 MB -> 35 million params, ~140 MB (FP32)
○ (Distillation)
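As a quick check of the size arithmetic above, a minimal sketch (weights only; a pruned model stored in a sparse format would also need index storage, ignored here):

```python
def weight_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory footprint of the weights alone, in MB (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

print(weight_size_mb(138_000_000, 4))  # VGG16, FP32         -> 552.0
print(weight_size_mb(138_000_000, 1))  # quantized to int8   -> 138.0
print(weight_size_mb(35_000_000, 4))   # pruned to 35M, FP32 -> 140.0
```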
Background: Pruning
Parameter count can be reduced by 80-90% without degrading accuracy [1]
1. Pre-train a large, dense model
2. Prune & re-train to obtain a sparse model
a. remove connections whose weights fall below a threshold
b. re-train the surviving weights
[Figure from [1]: the train -> prune -> re-train pipeline; step 2.a removes low-magnitude connections]
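A minimal sketch of step 2.a, assuming a single fixed magnitude threshold (Han et al. [1] actually derive per-layer thresholds from each layer's weight statistics):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, threshold: float) -> np.ndarray:
    """Step 2.a: zero out connections whose |weight| is below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_sparse = magnitude_prune(w, threshold=0.5)
print((w_sparse != 0).mean())  # fraction of surviving connections
```

Re-training (step 2.b) then continues with only the surviving connections, typically by applying the same mask to the gradients.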
Problem of Pruning
Pre-training remains memory-inefficient:
the model being pre-trained still carries the full parameter count.
Can we train compact models directly?
The effectiveness of pruning indicates
that compact network parameter configurations exist.
Related Works
● Static sparse training
○ static: the locations of non-zero parameters are fixed during training
○ training a static sparse model performs worse than compressing a large dense model [4]
○ static sparse models are sensitive to initialization [5]
● Dynamic sparse reparameterization training
○ dynamic: the locations of non-zero parameters change during training
○ guided by heuristic rules
○ e.g., SET [2], DeepR [3]
Proposed Method
● Dynamic sparse reparameterization technique
○ uses an adaptive global threshold for pruning
○ automatically reallocates parameters across layers
Parameters are reallocated every few hundred iterations with the algorithm sketched below.
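A minimal NumPy sketch of one reallocation step, following the description on the next two slides. The names N_target and delta, the doubling/halving threshold adjustment, and the random choice of growth locations are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def reallocation_step(weights, masks, H, N_target, delta=0.1):
    """One dynamic reparameterization step over per-layer arrays.

    weights: list of float arrays; masks: matching boolean arrays;
    H: global magnitude threshold; N_target: desired number of weights
    pruned (and regrown) per step. Returns the adjusted threshold.
    """
    # 1. Prune: deactivate every active weight with magnitude below H.
    K_l, R_l = [], []
    for w, m in zip(weights, masks):
        pruned = m & (np.abs(w) < H)
        m &= ~pruned
        w *= m                                 # pruned positions become zero
        K_l.append(int(pruned.sum()))          # pruned count in this layer
        R_l.append(int(m.sum()))               # survivor count in this layer
    K = sum(K_l)

    # 2. Adjust the global threshold so K tracks N_target
    #    (a simple doubling/halving rule, assumed here for illustration).
    if K < (1 - delta) * N_target:
        H *= 2.0
    elif K > (1 + delta) * N_target:
        H /= 2.0

    # 3. Regrow K zero-initialized weights; layers with more survivors
    #    receive proportionally more: G_l = K * R_l / sum(R_l).
    total_R = max(sum(R_l), 1)
    for m, r in zip(masks, R_l):
        G = int(round(K * r / total_R))
        free = np.flatnonzero(~m)              # currently inactive positions
        grow = rng.choice(free, size=min(G, free.size), replace=False)
        m.flat[grow] = True                    # new weights start at zero
    return H
```

Because of integer rounding, the regrown total here can differ from K by a few parameters; the paper keeps the pruned and regrown counts exactly equal.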
Proposed Method
● Pruning is based on an adaptive global threshold
○ more scalable than methods relying on layer-specific pruning (e.g., SET [2])
● Prunes roughly a target number of parameters per step
○ computationally cheaper than pruning exactly the smallest weights, because no sorting is needed
● Redistributes zero-initialized parameters after pruning
○ rule: layers having larger fractions of non-zero weights receive proportionally more free parameters
○ the numbers of pruned and regrown free parameters are exactly the same

Reallocation rule: G_l = K · R_l / Σ_l R_l, where K = Σ_l K_l and
G_l: number of parameters reallocated (regrown) in layer l
R_l: number of surviving (not pruned) parameters in layer l
K_l: number of parameters pruned in layer l
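For example, with two layers holding R_1 = 300 and R_2 = 100 surviving parameters and K = K_1 + K_2 = 40 parameters pruned in total, the rule regrows G_1 = 40 · 300/400 = 30 parameters in layer 1 and G_2 = 40 · 100/400 = 10 in layer 2, so exactly 40 zero-initialized parameters come back.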
Proposed Method

[Figure: one reparameterization step loops through pruning, adjustment of the threshold, and reallocation]
Experiment 1: Evaluation with Baselines
Comparing accuracy against existing methods
● Model: WRN-28-2 (Appendix A); Dataset: CIFAR10
● Baselines
○ Full dense: the original large, dense model (unmodified WRN-28-2)
○ Compressed sparse: sparse model obtained by iteratively pruning Full dense [4]
○ Thin dense: dense model of the same depth with narrower layers
○ Static sparse: sparse model with a fixed, randomly chosen sparsity pattern
○ SET [2]: an existing dynamic reparameterization method
○ DeepR [3]: an existing dynamic reparameterization method
● All sparse models share the same global sparsity s
○ parameter count of Full dense: N
○ parameter count of each sparse model: (1 - s) · N
Experiment 1: Evaluation with Baselines
A. Full dense: the pre-trained original model
B. Compressed sparse: A, iteratively pruned
C. Thin dense: same depth as A, with narrower layers
D. Static sparse: fixed random sparsity pattern
E. Dynamic sparse: this paper
● E is slightly better than A and than SET
● as global sparsity increases, C and D
degrade significantly
Experiment 2: Search for Important Elements
Which matters: the sparse network structure or the initialization?
● the “lottery ticket” hypothesis [5]
○ a large network contains well-trainable subnetworks
○ the initialization is claimed to be important (see Appendix B)
● after training with the dynamic sparse method, retrain the final
sparsity pattern from scratch (a sketch follows below)
○ with random re-initialization
○ with the original initialization
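A minimal sketch of the two control runs; the arrays below are hypothetical stand-ins for the trained network (here the pattern is random, whereas in the experiment it is the pattern found by dynamic sparse training):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (256, 256)

# Hypothetical stand-ins for quantities produced by the dynamic sparse run:
original_init = rng.standard_normal(shape).astype(np.float32)  # initial weights
final_w = original_init * (rng.random(shape) < 0.1)             # trained sparse weights

# Fix the final sparsity pattern and retrain it as a static sparse network:
mask = final_w != 0

w_random = rng.standard_normal(shape).astype(np.float32) * mask  # (a) random re-init
w_original = original_init * mask                                # (b) original init

# During retraining, pruned positions stay zero by masking each update:
#   w -= lr * grad * mask
```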
Experiment 2: Search for Important Elements
● both random and original initialization failed to reach
the accuracy of dynamic sparse training
● initialization had little effect on dynamic sparse
reparameterization
● dynamic reparameterization itself is what matters
○ performance is attributable neither solely to the structure, nor to its initialization,
nor to a combination of the two
Discussion
● Dynamic reparameterization is what matters
○ performance is attributable neither solely to the sparse structure, nor to its initialization, nor to a combination of the two
○ discontinuous jumps in parameter space when parameters are reallocated across
layers may have helped training escape sharp minima that generalize badly
● it may be better to allocate some memory to exploring more sophisticated
network structure
● computational efficiency remains difficult
○ CPUs and GPUs cannot efficiently handle unstructured sparsity (see the storage sketch below)
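To make the storage side concrete, a small sketch using SciPy's CSR format: each non-zero weight costs a value plus a column index, roughly doubling the per-weight cost, and the irregular memory access is what defeats dense-optimized kernels:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w[np.abs(w) < 1.2] = 0.0  # roughly 77% unstructured sparsity

csr = sparse.csr_matrix(w)
dense_mb = w.nbytes / 1e6
csr_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 1e6
print(f"dense: {dense_mb:.2f} MB, CSR: {csr_mb:.2f} MB")
# ~8 bytes per non-zero (value + index) vs 4 bytes per dense entry:
# memory only wins below ~50% density, and compute kernels fare worse.
```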
Conclusion
● Dynamic sparse reparameterization for training sparse networks directly
○ uses an adaptive threshold
○ automatically reallocates parameters across layers
● Performance was significantly higher than the baselines
○ much better than static methods
○ slightly better than Compressed sparse
● Dynamic exploration of structure during training is what matters
○ performance is attributable neither solely to the structure, nor to its initialization, nor to a combination of the two
References

[1] Han et al. Learning both Weights and Connections for Efficient Neural Networks. NIPS 2015.
[2] Mocanu et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications 2018.
[3] Bellec et al. Deep Rewiring: Training very sparse deep networks. arXiv 2017.
[4] Zhu and Gupta. To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv 2017.
[5] Frankle and Carbin. The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks. arXiv 2018.
Appendix A

[WRN-28-2 architecture details omitted]