Parameter Efficient Training of
Deep Convolutional Neural Networks
by Dynamic Sparse Reparameterization
Reader: Azuma Kohei, DL Engineer, LeapMind
LeapMind ICML2019 Reading Session
Contents

1. Paper Info
2. Background
3. Problem
4. Proposed Method
5. Experiments
6. Discussion
7. Conclusion
Paper Info
● “Parameter Efficient Training of Deep Convolutional Neural Networks
by Dynamic Sparse Reparameterization”
○ ICML 2019
○ Authors: Hesham Mostafa, Xin Wang
● Why did I choose this paper?
○ pruning offers an interesting window into the nature of deep learning models
○ pruning without a pre-trained model sounds intriguing
Background: Overview
● Deep learning models are often over-parameterized
○ e.g., VGG16 has 138 million parameters, ~552 MB in FP32
○ puts pressure on memory
● Several ways to reduce model size without significantly reducing
accuracy (a size check follows this list)
○ Quantization: shrinks each weight
■ e.g., 138 million params, ~552 MB (FP32) -> 138 million params, ~138 MB (int8)
○ Pruning: shrinks the parameter count
■ e.g., 138 million params, ~552 MB -> 35 million params, ~140 MB (FP32)
○ (Distillation)
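As a quick check of the size arithmetic above, a minimal sketch (weights only; a pruned model stored in a sparse format would also need index storage, ignored here):

```python
def weight_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Memory footprint of the weights alone, in MB (10**6 bytes)."""
    return num_params * bytes_per_param / 1e6

print(weight_size_mb(138_000_000, 4))  # VGG16, FP32         -> 552.0
print(weight_size_mb(138_000_000, 1))  # quantized to int8   -> 138.0
print(weight_size_mb(35_000_000, 4))   # pruned to 35M, FP32 -> 140.0
```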
Background: Pruning
Parameter count can be reduced by 80-90% without degrading accuracy [1]
1. Pre-train a large, dense model
2. Prune & re-train to obtain a sparse model
a. remove connections whose weights fall below a threshold
b. re-train the surviving weights
[Figure from [1]: the train -> prune -> re-train pipeline; step 2.a removes low-magnitude connections]
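A minimal sketch of step 2.a, assuming a single fixed magnitude threshold (Han et al. [1] actually derive per-layer thresholds from each layer's weight statistics):

```python
import numpy as np

def magnitude_prune(w: np.ndarray, threshold: float) -> np.ndarray:
    """Step 2.a: zero out connections whose |weight| is below the threshold."""
    mask = np.abs(w) >= threshold
    return w * mask

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
w_sparse = magnitude_prune(w, threshold=0.5)
print((w_sparse != 0).mean())  # fraction of surviving connections
```

Re-training (step 2.b) then continues with only the surviving connections, typically by applying the same mask to the gradients.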
Problem of Pruning
Pre-training remains memory-inefficient:
the model being pre-trained still carries the full parameter count.
Can we train compact models directly?
The effectiveness of pruning indicates
that compact network parameter configurations exist.
Related Works
● Static sparse training
○ static: the locations of non-zero parameters are fixed during training
○ training a static sparse model performs worse than compressing a large dense model [4]
○ static sparse models are sensitive to initialization [5]
● Dynamic sparse reparameterization training
○ dynamic: the locations of non-zero parameters change during training
○ guided by heuristic rules
○ e.g., SET [2], DeepR [3]
Proposed Method
● Dynamic sparse reparameterization technique
○ uses an adaptive global threshold for pruning
○ automatically reallocates parameters across layers
Parameters are reallocated every few hundred iterations with the algorithm sketched below.
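A minimal NumPy sketch of one reallocation step, following the description on the next two slides. The names N_target and delta, the doubling/halving threshold adjustment, and the random choice of growth locations are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def reallocation_step(weights, masks, H, N_target, delta=0.1):
    """One dynamic reparameterization step over per-layer arrays.

    weights: list of float arrays; masks: matching boolean arrays;
    H: global magnitude threshold; N_target: desired number of weights
    pruned (and regrown) per step. Returns the adjusted threshold.
    """
    # 1. Prune: deactivate every active weight with magnitude below H.
    K_l, R_l = [], []
    for w, m in zip(weights, masks):
        pruned = m & (np.abs(w) < H)
        m &= ~pruned
        w *= m                                 # pruned positions become zero
        K_l.append(int(pruned.sum()))          # pruned count in this layer
        R_l.append(int(m.sum()))               # survivor count in this layer
    K = sum(K_l)

    # 2. Adjust the global threshold so K tracks N_target
    #    (a simple doubling/halving rule, assumed here for illustration).
    if K < (1 - delta) * N_target:
        H *= 2.0
    elif K > (1 + delta) * N_target:
        H /= 2.0

    # 3. Regrow K zero-initialized weights; layers with more survivors
    #    receive proportionally more: G_l = K * R_l / sum(R_l).
    total_R = max(sum(R_l), 1)
    for m, r in zip(masks, R_l):
        G = int(round(K * r / total_R))
        free = np.flatnonzero(~m)              # currently inactive positions
        grow = rng.choice(free, size=min(G, free.size), replace=False)
        m.flat[grow] = True                    # new weights start at zero
    return H
```

Because of integer rounding, the regrown total here can differ from K by a few parameters; the paper keeps the pruned and regrown counts exactly equal.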
Proposed Method
● Pruning is based on an adaptive global threshold
○ more scalable than methods relying on layer-specific pruning (e.g., SET [2])
● Prunes roughly a target number of parameters per step
○ computationally cheaper than pruning exactly the smallest weights, because no sorting is needed
● Redistributes zero-initialized parameters after pruning
○ rule: layers having larger fractions of non-zero weights receive proportionally more free parameters
○ the numbers of pruned and regrown free parameters are exactly the same

Reallocation rule: G_l = K · R_l / Σ_l R_l, where K = Σ_l K_l and
G_l: number of parameters reallocated (regrown) in layer l
R_l: number of surviving (not pruned) parameters in layer l
K_l: number of parameters pruned in layer l
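For example, with two layers holding R_1 = 300 and R_2 = 100 surviving parameters and K = K_1 + K_2 = 40 parameters pruned in total, the rule regrows G_1 = 40 · 300/400 = 30 parameters in layer 1 and G_2 = 40 · 100/400 = 10 in layer 2, so exactly 40 zero-initialized parameters come back.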
Proposed Method

[Figure: one reparameterization step loops through pruning, adjustment of the threshold, and reallocation]
Experiment 1: Evaluation with Baselines
Comparing accuracy against existing methods
● Model: WRN-28-2 (Appendix A); Dataset: CIFAR10
● Baselines
○ Full dense: the original large, dense model (unmodified WRN-28-2)
○ Compressed sparse: sparse model obtained by iteratively pruning Full dense [4]
○ Thin dense: dense model of the same depth with narrower layers
○ Static sparse: sparse model with a fixed, randomly chosen sparsity pattern
○ SET [2]: an existing dynamic reparameterization method
○ DeepR [3]: an existing dynamic reparameterization method
● All sparse models share the same global sparsity s
○ parameter count of Full dense: N
○ parameter count of each sparse model: (1 - s) · N
Experiment 1: Evaluation with Baselines
A. Full dense: the pre-trained original model
B. Compressed sparse: A, iteratively pruned
C. Thin dense: same depth as A, with narrower layers
D. Static sparse: fixed random sparsity pattern
E. Dynamic sparse: this paper
● E is slightly better than A and than SET
● as global sparsity increases, C and D
degrade significantly
Experiment 2: Search for Important Elements
Which matters: the sparse network structure or the initialization?
● the “lottery ticket” hypothesis [5]
○ a large network contains well-trainable subnetworks
○ the initialization is claimed to be important (see Appendix B)
● after training with the dynamic sparse method, retrain the final
sparsity pattern from scratch (a sketch follows below)
○ with random re-initialization
○ with the original initialization
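A minimal sketch of the two control runs; the arrays below are hypothetical stand-ins for the trained network (here the pattern is random, whereas in the experiment it is the pattern found by dynamic sparse training):

```python
import numpy as np

rng = np.random.default_rng(0)
shape = (256, 256)

# Hypothetical stand-ins for quantities produced by the dynamic sparse run:
original_init = rng.standard_normal(shape).astype(np.float32)  # initial weights
final_w = original_init * (rng.random(shape) < 0.1)             # trained sparse weights

# Fix the final sparsity pattern and retrain it as a static sparse network:
mask = final_w != 0

w_random = rng.standard_normal(shape).astype(np.float32) * mask  # (a) random re-init
w_original = original_init * mask                                # (b) original init

# During retraining, pruned positions stay zero by masking each update:
#   w -= lr * grad * mask
```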
Experiment 2: Search for Important Elements
● both random and original initialization failed to reach
the accuracy of dynamic sparse training
● initialization had little effect on dynamic sparse
reparameterization
● dynamic reparameterization itself is what matters
○ performance is attributable neither solely to the structure, nor to its initialization,
nor to a combination of the two
Discussion
● Dynamic reparameterization is what matters
○ performance is attributable neither solely to the sparse structure, nor to its initialization, nor to a combination of the two
○ discontinuous jumps in parameter space when parameters are reallocated across
layers may have helped training escape sharp minima that generalize badly
● it may be better to allocate some memory to exploring more sophisticated
network structure
● computational efficiency remains difficult
○ CPUs and GPUs cannot efficiently handle unstructured sparsity (see the storage sketch below)
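To make the storage side concrete, a small sketch using SciPy's CSR format: each non-zero weight costs a value plus a column index, roughly doubling the per-weight cost, and the irregular memory access is what defeats dense-optimized kernels:

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)
w[np.abs(w) < 1.2] = 0.0  # roughly 77% unstructured sparsity

csr = sparse.csr_matrix(w)
dense_mb = w.nbytes / 1e6
csr_mb = (csr.data.nbytes + csr.indices.nbytes + csr.indptr.nbytes) / 1e6
print(f"dense: {dense_mb:.2f} MB, CSR: {csr_mb:.2f} MB")
# ~8 bytes per non-zero (value + index) vs 4 bytes per dense entry:
# memory only wins below ~50% density, and compute kernels fare worse.
```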
Conclusion
● Dynamic sparse reparameterization for training sparse networks directly
○ uses an adaptive threshold
○ automatically reallocates parameters across layers
● Performance was significantly higher than the baselines
○ much better than static methods
○ slightly better than Compressed sparse
● Dynamic exploration of structure during training is what matters
○ performance is attributable neither solely to the structure, nor to its initialization, nor to a combination of the two
References

[1] Han et al. Learning both Weights and Connections for Efficient Neural Networks. NIPS 2015.
[2] Mocanu et al. Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nature Communications 2018.
[3] Bellec et al. Deep Rewiring: Training very sparse deep networks. arXiv 2017.
[4] Zhu and Gupta. To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv 2017.
[5] Frankle and Carbin. The Lottery Ticket Hypothesis: Finding Small, Trainable Neural Networks. arXiv 2018.
Appendix A

[WRN-28-2 architecture details omitted]