References

Ahmad, Subutai, and Gerald Tesauro. 1988. “Scaling and Generalization in Neural Networks: A Case Study.” In Proceedings of the 1st International Conference on Neural Information Processing Systems (NIPS), 160–68. Morgan Kaufmann.

Andrychowicz, Marcin, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. 2016. “Learning to Learn by Gradient Descent by Gradient Descent.” In Advances in Neural Information Processing Systems, 3981–9. Curran Associates, Inc.

Ba, Jimmy, and Rich Caruana. 2014. “Do Deep Nets Really Need to be Deep?” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 2654–62.

Baker, Bowen, Otkrist Gupta, Nikhil Naik, and Ramesh Raskar. 2017. “Designing Neural Network Architectures Using Reinforcement Learning.” In International Conference on Learning Representations (ICLR).

Bartlett, Peter L. 1996. “For Valid Generalization the Size of the Weights Is More Important Than the Size of the Network.” In Proceedings of the 9th International Conference on Neural Information Processing Systems (NIPS), 134–40. MIT Press.

Bastani, Osbert, Yani Ioannou, Leonidas Lampropoulos, Dimitrios Vytiniotis, Aditya Nori, and Antonio Criminisi. 2016. “Measuring Neural Net Robustness with Constraints.” In Advances in Neural Information Processing Systems, 2613–21. Curran Associates, Inc.

Baum, Eric B., and David Haussler. 1988. “What Size Net Gives Valid Generalization?” In Proceedings of the 1st International Conference on Neural Information Processing Systems (NIPS), 81–90. Morgan Kaufmann.

Bengio, Emmanuel, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. 2015. “Conditional Computation in Neural Networks for Faster Models.”

Bengio, Yoshua, Patrick Simard, and Paolo Frasconi. 1994. “Learning Long-Term Dependencies with Gradient Descent Is Difficult.” IEEE Transactions on Neural Networks 5 (2). IEEE Press:157–66. https://doi.org/10.1109/72.279181.

Bishop, Christopher M. 1995. Neural Networks for Pattern Recognition. Oxford: Oxford University Press.

Bottou, Léon. 2012. “Stochastic Gradient Descent Tricks.” In Neural Networks: Tricks of the Trade, edited by Grégoire Montavon, Geneviève B. Orr, and Klaus-Robert Müller, Second, 7700:421–36. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-642-35289-8_25.

Bulò, Samuel Rota, and Peter Kontschieder. 2014. “Neural Decision Forests for Semantic Image Labelling.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 81–88. https://doi.org/10.1109/CVPR.2014.18.

Burges, Christopher J.C. 1998. “A Tutorial on Support Vector Machines for Pattern Recognition.” Data Mining and Knowledge Discovery 2 (2). Springer:121–67. https://doi.org/10.1023/A:1009715923555.

Caruana, Rich, Steve Lawrence, and C. Lee Giles. 2000. “Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping.” In Proceedings of the 13th International Conference on Neural Information Processing Systems (NIPS), 381–87. Cambridge, MA, USA: MIT Press.

Castellano, Giovanna, Anna Maria Fanelli, and Marcello Pelillo. 1997. “An Iterative Pruning Algorithm for Feedforward Neural Networks.” IEEE Transactions on Neural Networks 8 (3). IEEE:519–31. https://doi.org/10.1109/72.572092.

Changpinyo, Soravit, Mark Sandler, and Andrey Zhmoginov. 2017. “The Power of Sparsity in Convolutional Neural Networks.”

Chen, Wenlin, James T. Wilson, Stephen Tyree, Kilian Q. Weinberger, and Yixin Chen. 2015. “Compressing Neural Networks with the Hashing Trick.” In Proceedings of the 32nd International Conference on Machine Learning (ICML), edited by Francis R. Bach and David M. Blei, 37:2285–94. JMLR. http://arxiv.org/abs/1504.04788.

Chipman, Hugh, Edward I. George, and Robert E. McCulloch. 2001. “The Practical Implementation of Bayesian Model Selection.” Lecture Notes-Monograph Series 38. Institute of Mathematical Statistics:65–134.

Cogswell, Michael, Faruk Ahmed, Ross B. Girshick, Larry Zitnick, and Dhruv Batra. 2016. “Reducing Overfitting in Deep Networks by Decorrelating Representations.” In International Conference on Learning Representations (ICLR).

Cybenko, George. 1989. “Approximation by superpositions of a sigmoidal function.” Mathematics of Control, Signals, and Systems (MCSS) 2 (4). Springer:303–14.

Damelin, Steven B., and Willard Miller Jr. 2012. The Mathematics of Signal Processing. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9781139003896.

Denil, Misha, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando de Freitas. 2013. “Predicting Parameters in Deep Learning.” In Proceedings of the 26th International Conference on Neural Information Processing Systems (NIPS), 2148–56. Curran Associates, Inc. http://arxiv.org/abs/1306.0543.

Denker, John, Daniel Schwartz, Ben Wittner, Sara Solla, Richard Howard, Lawrence Jackel, and John Hopfield. 1987. “Large automatic learning, rule extraction, and generalization.” Complex Systems 1 (5):877–922.

Denton, Emily L., Wojciech Zaremba, Joan Bruna, Yann LeCun, and Rob Fergus. 2014. “Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation.” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 1269–77. Curran Associates, Inc.

Fahlman, Scott E., and Christian Lebiere. 1989. “The Cascade-Correlation Learning Architecture.” In Proceedings of the 2nd International Conference on Neural Information Processing Systems (NIPS), edited by David S. Touretzky, 524–32. Morgan Kaufmann.

Fodor, Imola K. 2002. “A Survey of Dimension Reduction Techniques.” Technical report UCRL-ID-148494. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory. https://doi.org/10.2172/15002155.

Frean, Marcus. 1990. “The Upstart Algorithm: A Method for Constructing and Training Feedforward Neural Networks.” Neural Computation 2 (2). MIT Press:198–209. https://doi.org/10.1162/neco.1990.2.2.198.

Fukushima, Kunihiko. 1980. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biological Cybernetics 36:193–202.

———. 2013. “Artificial vision by multi-layered neural networks: Neocognitron and its advances.” Neural Networks 37. Elsevier:103–19. https://doi.org/10.1016/j.neunet.2012.09.016.

Gallant, Stephen I. 1986. “Optimal Linear Discriminants.” In Eighth International Conference on Pattern Recognition (ICPR), 849–52. IAPR.

Giles, C. Lee, and Tom Maxwell. 1987. “Learning, invariance, and generalization in high-order neural networks.” Applied Optics 26 (23). Optical Society of America:4972–8. https://doi.org/10.1364/AO.26.004972.

Glorot, Xavier, and Yoshua Bengio. 2010. “Understanding the difficulty of training deep feedforward neural networks.” In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), 9:249–56.

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.

Goodfellow, Ian J., David Warde-Farley, Mehdi Mirza, Aaron Courville, and Yoshua Bengio. 2013. “Maxout Networks.” In Proceedings of the 30th International Conference on Machine Learning (ICML), edited by Sanjoy Dasgupta and David McAllester, 28:1319–27.

Gorodkin, Jan, Lars Kai Hansen, Anders Krogh, Claus Svarer, and Ole Winther. 1993. “A Quantitative Study of Pruning by Optimal Brain Damage.” International Journal of Neural Systems 4 (02). World Scientific:159–69. https://doi.org/10.1142/S0129065793000146.

Gupta, Suyog, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. 2015. “Deep Learning with Limited Numerical Precision.” In Proceedings of the 32nd International Conference on Machine Learning (ICML), edited by Francis R. Bach and David M. Blei. JMLR. http://arxiv.org/abs/1502.02551.

Han, Song, Huizi Mao, and William J. Dally. 2016. “Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding.” http://arxiv.org/abs/1510.00149.

Han, Song, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, et al. 2017. “DSD: Dense-Sparse-Dense Training for Deep Neural Networks.” In International Conference on Learning Representations (Iclr).

Han, Song, Jeff Pool, John Tran, and William J. Dally. 2015. “Learning both weights and connections for efficient neural network.” In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), 1135–43. Curran Associates, Inc.

Hanson, Stephen José, and Lorien Y. Pratt. 1988. “Comparing Biases for Minimal Network Construction with Back-Propagation.” In Proceedings of the 1st International Conference on Neural Information Processing Systems (NIPS), 177–85. Morgan Kaufmann.

Haykin, Simon. 1994. Neural Networks: A Comprehensive Foundation. Prentice Hall.

He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification.” In IEEE International Conference on Computer Vision (ICCV), 1026–34. IEEE. https://doi.org/10.1109/ICCV.2015.123.

———. 2016a. “Deep Residual Learning for Image Recognition.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 770–78. https://doi.org/10.1109/CVPR.2016.90.

———. 2016b. “Identity Mappings in Deep Residual Networks.” In 14th European Conference on Computer Vision (ECCV), 630–45. Springer. https://doi.org/10.1007/978-3-319-46493-0_38.

Hebb, Donald Olding. 1949. The Organization of Behavior: A Neuropsychological Approach. John Wiley & Sons.

Hinton, Geoffrey E. 1987. “Learning translation invariant recognition in a massively parallel network.” In Proceedings of the Conference on Parallel Architectures and Languages Europe (PARLE), 1:1–13. Springer. https://doi.org/10.1007/3-540-17943-7_117.

———. 2015. “Deep Learning.” Division F Talks. University of Cambridge: Department of Engineering; Public Lecture. http://sms.cam.ac.uk/media/2017973.

Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. 2006. “Reducing the Dimensionality of Data with Neural Networks.” Science 313 (5786). American Association for the Advancement of Science:504–7. https://doi.org/10.1126/science.1127647.

Hinton, Geoffrey E., Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2012. “Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors.” http://arxiv.org/abs/1207.0580.

Hochreiter, Sepp. 1991. “Untersuchungen zu dynamischen neuronalen Netzen.” Diploma thesis, Munich, Germany: Technische Universität München.

Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. 2001. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Networks, edited by John F. Kolen and Stefan C. Kremer, 237–43. Wiley-IEEE Press. https://doi.org/10.1109/9780470544037.ch14.

Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. “Multilayer Feedforward Networks Are Universal Approximators.” Neural Networks 2 (5). Elsevier:359–66. https://doi.org/10.1016/0893-6080(89)90020-8.

Hubel, D. H., and T. N. Wiesel. 1959. “Receptive Fields of Single Neurones in the Cat’s Striate Cortex.” The Journal of Physiology 148 (3). Wiley:574–91. https://doi.org/10.1113/jphysiol.1959.sp006308.

Ioannou, Yani, Duncan Robertson, Roberto Cipolla, and Antonio Criminisi. 2017. “Deep Roots: Improving CNN efficiency with hierarchical filter groups.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). http://arxiv.org/abs/1605.06489.

Ioannou, Yani, Duncan Robertson, Jamie Shotton, Roberto Cipolla, and Antonio Criminisi. 2016. “Training CNNs with Low-Rank Filters for Efficient Image Classification.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1511.06744.

Ioannou, Yani, Duncan Robertson, Darko Zikic, Peter Kontschieder, Jamie Shotton, Matthew Brown, and Antonio Criminisi. 2015. “Decision Forests, Convolutional Networks and the Models in-Between.” Technical report MSR-TR-2015-58. Microsoft Research.

Ioffe, Sergey, and Christian Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” In Proceedings of the 32nd International Conference on Machine Learning (ICML), edited by Francis R. Bach and David M. Blei, 37:448–56.

Jaderberg, Max, Andrea Vedaldi, and Andrew Zisserman. 2014. “Speeding up Convolutional Neural Networks with Low Rank Expansions.” In Proceedings of the British Machine Vision Conference. BMVA Press. https://doi.org/10.5244/C.28.88.

Jain, Ashesh, Amir R. Zamir, Silvio Savarese, and Ashutosh Saxena. 2016. “Structural-RNN: Deep Learning on Spatio-Temporal Graphs.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 5308–17. https://doi.org/10.1109/CVPR.2016.573.

Jhurani, Chetan, and Paul Mullowney. 2015. “A GEMM interface and implementation on NVIDIA GPUs for multiple small matrices.” Journal of Parallel and Distributed Computing 75. Elsevier:133–40. https://doi.org/10.1016/j.jpdc.2014.09.003.

Jia, Yangqing, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. “Caffe: Convolutional Architecture for Fast Feature Embedding.” In Proceedings of the 22nd ACM International Conference on Multimedia, 675–78. https://doi.org/10.1145/2647868.2654889.

Kaski, Samuel. 1998. “Dimensionality Reduction by Random Mapping: Fast Similarity Computation for Clustering.” In IEEE International Joint Conference on Neural Networks Proceedings, 1:413–18. IEEE. https://doi.org/10.1109/IJCNN.1998.682302.

Kim, Yong-Deok, Eunhyeok Park, Sungjoo Yoo, Taelim Choi, Lu Yang, and Dongjun Shin. 2016. “Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications.” In International Conference on Learning Representations (ICLR), 1–16. http://arxiv.org/abs/1511.06530.

Krizhevsky, Alex. 2009. “Learning Multiple Layers of Features from Tiny Images.” Technical report. University of Toronto.

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. 2012. “ImageNet Classification with Deep Convolutional Neural Networks.” In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), 1097–1105. Curran Associates, Inc.

Lattimore, Tor, and Marcus Hutter. 2013. “No Free Lunch Versus Occam’s Razor in Supervised Learning.” In Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, edited by David L. Dowe, 223–35. Springer. https://doi.org/10.1007/978-3-642-44958-1_17.

Lebedev, Vadim, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, and Victor Lempitsky. 2015. “Speeding-Up Convolutional Neural Networks Using Fine-tuned CP-Decomposition.” In International Conference on Learning Representations (ICLR).

LeCun, Yann. 1989. “Generalization and network design strategies.” In Connectionism in Perspective, edited by R. Pfeifer, Z. Schreter, F. Fogelman-Soulié, and L. Steels, First, 143–55. Zurich, Switzerland: Elsevier.

LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. 2015. “Deep Learning.” Nature 521 (7553):436–44. https://doi.org/10.1038/nature14539.

LeCun, Yann, Bernhard Boser, John S. Denker, Donnie Henderson, Richard E. Howard, Wayne Hubbard, and Lawrence D. Jackel. 1989. “Backpropagation applied to handwritten zip code recognition.” Neural Computation 1 (4). Cambridge, MA, USA: MIT Press:541–51. https://doi.org/10.1162/neco.1989.1.4.541.

LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86 (11). IEEE:2278–2324. https://doi.org/10.1109/5.726791.

LeCun, Yann, John S. Denker, and Sara A. Solla. 1989. “Optimal Brain Damage.” In Proceedings of the 2nd International Conference on Neural Information Processing Systems (NIPS), 2:598–605. Morgan Kaufmann.

Lee, Tae Kwan, Wissam J. Baddar, Seong Tae Kim, and Yong Man Ro. 2017. “Convolution with Logarithmic Filter Groups for Efficient Shallow CNN.”

Lin, Min, Qiang Chen, and Shuicheng Yan. 2014. “Network in Network.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1312.4400.

Lowe, David G. 2004. “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision 60 (2). Springer:91–110. https://doi.org/10.1023/B:VISI.0000029664.99615.94.

MacKay, David J.C. 1991. “Bayesian Methods for Adaptive Models.” PhD thesis, Pasadena, CA, USA: California Institute of Technology.

———. 1992. “A Practical Bayesian Framework for Backpropagation Networks.” Neural Computation 4 (3):448–72. https://doi.org/10.1162/neco.1992.4.3.448.

———. 1995. “Probable Networks and Plausible Predictions — a Review of Practical Bayesian Methods for Supervised Neural Networks.” Network: Computation in Neural Systems 6 (3). Taylor & Francis:469–505. https://doi.org/10.1088/0954-898X_6_3_011.

Mamalet, Franck, and Christophe Garcia. 2012. “Simplifying Convnets for Fast Learning.” In 22nd International Conference on Artificial Neural Networks (ICANN), edited by Alessandro E. P. Villa, Włodzisław Duch, Péter Érdi, Francesco Masulli, and Günther Palm, 58–65. Springer. https://doi.org/10.1007/978-3-642-33266-1_8.

Martens, James. 2010. “Deep learning via Hessian-free optimization.” In Proceedings of the 27th International Conference on Machine Learning (ICML), 735–42.

Mathieu, Michael, Mikael Henaff, and Yann LeCun. 2014. “Fast Training of Convolutional Networks through FFTs.” In International Conference on Learning Representations (ICLR).

Mezard, Marc, and Jean-Pierre Nadal. 1989. “Learning in feedforward layered networks: The tiling algorithm.” Journal of Physics A: Mathematical and General 22 (12). IOP Publishing:2191–2203. https://doi.org/10.1088/0305-4470/22/12/019.

Minsky, Marvin, and Seymour Papert. 1988. Perceptrons. Second, Expanded Edition. MIT press.

Montillo, Albert, Jamie Shotton, John Winn, Juan Eugenio Iglesias, Dimitri Metaxas, and Antonio Criminisi. 2011. “Entangled decision forests and their application for semantic segmentation of CT images.” In 22nd International Conference on Information Processing in Medical Imaging (IPMI), edited by Gábor Székely and Horst K. Hahn, 184–96. Springer. https://doi.org/10.1007/978-3-642-22092-0_16.

Mozer, Michael C., and Paul Smolensky. 1988. “Skeletonization: A Technique for Trimming the Fat from a Network via Relevance Assessment.” In Proceedings of the 1st International Conference on Neural Information Processing Systems (NIPS), 107–15. Morgan Kaufmann.

———. 1989. “Using Relevance to Reduce Network Size Automatically.” Connection Science 1 (1). Taylor & Francis:3–16. https://doi.org/10.1080/09540098908915626.

Nair, Vinod, and Geoffrey E. Hinton. 2010. “Rectified Linear Units Improve Restricted Boltzmann Machines.” In Proceedings of the 27th International Conference on Machine Learning (ICML), 807–14.

Parekh, Rajesh, Jihoon Yang, and Vasant Honavar. 2000. “Constructive neural-network learning algorithms for pattern classification.” IEEE Transactions on Neural Networks 11 (2). IEEE:436–51. https://doi.org/10.1109/72.839013.

Polyak, Boris T. 1964. “Some Methods of Speeding up the Convergence of Iteration Methods.” USSR Computational Mathematics and Mathematical Physics 4 (5). Elsevier:1–17.

Rastegari, Mohammad, Vicente Ordonez, Joseph Redmon, and Ali Farhadi. 2016. “XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.” In 14th European Conference on Computer Vision (ECCV), 525–42. Springer. https://doi.org/10.1007/978-3-319-46493-0_32.

Rigamonti, Roberto, Amos Sironi, Vincent Lepetit, and Pascal Fua. 2013. “Learning Separable Filters.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2754–61. IEEE. https://doi.org/10.1109/CVPR.2013.355.

Rippel, Oren, Jasper Snoek, and Ryan Prescott Adams. 2015. “Spectral Representations for Convolutional Neural Networks.” In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS), edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2440–8. Curran Associates, Inc.

Rosenblatt, Frank. 1958. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65 (6). American Psychological Association:386.

———. 1961. “Principles of Neurodynamics. Perceptrons and the Theory of Brain Mechanisms.” Technical report VG-1196-G-8. Cornell Aeronautical Laboratory Ltd., Cornell University, Buffalo, NY.

Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323 (6088):533–36. https://doi.org/10.1038/323533a0.

Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. 2015. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision (IJCV) 115 (3):211–52. https://doi.org/10.1007/s11263-015-0816-y.

Schwartz, Daniel B., Vijay K. Samalam, Sara A. Solla, and John S. Denker. 1990. “Exhaustive Learning.” Neural Computation 2 (3). MIT Press:374–85. https://doi.org/10.1162/neco.1990.2.3.374.

Sermanet, Pierre, David Eigen, Xiang Zhang, Michael Mathieu, Rob Fergus, and Yann LeCun. 2014. “OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1312.6229.

Sethi, I. K. 1990. “Entropy Nets: From Decision Trees to Neural Networks.” Proceedings of the IEEE 78 (10). IEEE:1605–13. https://doi.org/10.1109/5.58346.

Setiono, Rudy. 1997. “A Penalty-Function Approach for Pruning Feedforward Neural Networks.” Neural Computation 9 (1). MIT Press:185–204. https://doi.org/10.1162/neco.1997.9.1.185.

Shankar, Sukrit, Duncan Robertson, Yani Ioannou, Antonio Criminisi, and Roberto Cipolla. 2016. “Refining Architectures of Deep Convolutional Neural Networks.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2212–20. https://doi.org/10.1109/CVPR.2016.243.

Shotton, Jamie, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake. 2011. “Real-time human pose recognition in parts from single depth images.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1297–1304. IEEE. https://doi.org/10.1109/CVPR.2011.5995316.

Sietsma, Jocelyn, and Robert J.F. Dow. 1988. “Neural net pruning-why and how.” In IEEE International Conference on Neural Networks, 1:325–33. San Diego: IEEE. https://doi.org/10.1109/ICNN.1988.23864.

Simonyan, Karen, and Andrew Zisserman. 2015. “Very deep convolutional networks for large-scale image recognition.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1409.1556.

Snoek, Jasper, Hugo Larochelle, and Ryan Prescott Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS), edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger, 2951–9.

Srivastava, Nitish, Geoffrey E. Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan R. Salakhutdinov. 2014. “Dropout: A Simple Way to Prevent Neural Networks from Overfitting.” Journal of Machine Learning Research (JMLR) 15 (1):1929–58.

Sutskever, Ilya, James Martens, George E. Dahl, and Geoffrey E. Hinton. 2013. “On the importance of initialization and momentum in deep learning.” In Proceedings of the 30th International Conference on Machine Learning (ICML), 28:1139–47.

Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. 2015. “Going Deeper with Convolutions.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). https://doi.org/10.1109/CVPR.2015.7298594.

Szegedy, Christian, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2016. “Rethinking the Inception Architecture for Computer Vision.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2818–26. https://doi.org/10.1109/CVPR.2016.308.

Szeliski, Richard. 2011. Computer Vision: Algorithms and Applications. 1st ed. New York, NY, USA: Springer-Verlag New York, Inc. https://doi.org/10.1007/978-1-84882-935-0.

Tieleman, Tijmen, and Geoffrey Hinton. 2012. “Lecture 6.5-Rmsprop: Divide the Gradient by a Running Average of Its Recent Magnitude.” Neural Networks for Machine Learning. Coursera; Public (Online) Lecture. https://www.coursera.org/learn/neural-networks.

Ullrich, Karen, Edward Meeds, and Max Welling. 2017. “Soft Weight-Sharing for Neural Network Compression.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1702.04008.

Vanhoucke, Vincent, Andrew Senior, and Mark Z. Mao. 2011. “Improving the speed of neural networks on CPUs.” In Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.

Vapnik, Vladimir N., and Alexey Ya. Chervonenkis. 2015. “On the uniform convergence of relative frequencies of events to their probabilities.” In Measures of Complexity, edited by B. Seckler, 11–30. Springer. https://doi.org/10.1007/978-3-319-21852-6_3.

Widrow, Bernard, and Marcian E. Hoff. 1960. “Adaptive Switching Circuits.” Technical report 1553-1. Solid State Electronics Laboratory, Stanford University, Stanford, CA.

Wolpert, David H. 1996. “The Lack of a Priori Distinctions Between Learning Algorithms.” Neural Computation 8 (7). MIT Press:1341–90. https://doi.org/10.1162/neco.1996.8.7.1341.

Xie, Saining, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. “Aggregated Residual Transformations for Deep Neural Networks.” In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). http://arxiv.org/abs/1611.05431.

Yi, Kwang Moo, Eduard Trulls, Vincent Lepetit, and Pascal Fua. 2016. “LIFT: Learned Invariant Feature Transform.” In 14th European Conference on Computer Vision (ECCV), 467–83. Springer. https://doi.org/10.1007/978-3-319-46466-4_28.

Zhang, Chiyuan, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. 2017. “Understanding deep learning requires rethinking generalization.” In International Conference on Learning Representations (ICLR). http://arxiv.org/abs/1611.03530.

Zhang, Ting, Guo-Jun Qi, Bin Xiao, and Jingdong Wang. 2017. “Interleaved Group Convolutions for Deep Neural Networks.” In IEEE International Conference on Computer Vision (ICCV).

Zhang, Xiangyu, Xinyu Zhou, Mengxiao Lin, and Jian Sun. 2017. “ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices.” arXiv Preprint arXiv:1707.01083.

Zhou, Bolei, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. “Learning Deep Features for Scene Recognition Using Places Database.” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS), edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 487–95. Curran Associates, Inc.

Zoph, Barret, and Quoc V. Le. 2017. “Neural Architecture Search with Reinforcement Learning.” In International Conference on Learning Representations (ICLR).