Córais Research Group

ACACES 2017 - Chapter: Low Complexity Multiply Accumulate Unit for Weight-Sharing Convolutional Neural Networks

James Garland, David Gregg
HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation, 2017

• Convolutional neural networks (CNNs) are highly successful deep machine learning technologies.
• CNNs require large amounts of processing capacity and memory bandwidth.
• Proposed hardware accelerators typically contain large numbers of multiply-accumulate (MAC) units.
• One CNN accelerator approach is “weight sharing”:
  – The full range of trained CNN weight values is stored in bins;
  – An index to the bin is used instead of the original weight value, reducing data sizes and memory traffic.
• We propose a novel multiply-accumulate (MAC) circuit that exploits binning in weight-sharing CNNs.
• Rather than computing the MAC directly, we:
  – Count the frequency of each weight and place the count in a bin;
  – Compute the accumulated value in a subsequent multiply phase.
• The proposal allows the hardware multipliers in the MAC circuit to be replaced with adders and selection logic.
• This results in fewer gates, smaller logic, and reduced power, with a slight latency increase, in an application-specific integrated circuit (ASIC).
• It results in fewer cells and reduced power when implemented in resource-constrained field-programmable gate arrays (FPGAs).
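The two-phase idea above can be sketched in software. This is a minimal illustrative Python model (an assumption for exposition, not the authors' hardware design): inputs are accumulated into per-bin sums using only additions and index selection, and each bin's shared weight is multiplied just once at the end, giving the same result as a conventional per-input MAC.

```python
def direct_mac(weights, inputs):
    # Conventional MAC: one multiply per input element.
    return sum(w * x for w, x in zip(weights, inputs))

def binned_mac(bin_values, weight_indices, inputs):
    # Phase 1 (accumulate): route each input into the sum for its
    # weight bin -- only adders and selection logic, no multipliers.
    sums = [0] * len(bin_values)
    for idx, x in zip(weight_indices, inputs):
        sums[idx] += x
    # Phase 2 (multiply): one multiply per bin, not per input.
    return sum(v * s for v, s in zip(bin_values, sums))

# Hypothetical example: 2 weight bins shared across 5 inputs.
bins = [2, 5]                  # trained shared weight values
idxs = [0, 1, 0, 1, 0]         # per-input bin indices (replace raw weights)
xs = [1, 2, 3, 4, 5]           # input activations
```

With many inputs mapped to few bins, the multiply count drops from one per input to one per bin, which is the source of the gate and power savings reported above.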

(poster)