Press Release Summary:
The NMAX Neural Inferencing Engine is designed to deliver 1 to >100 TOPS of neural inferencing capacity. The product features 512 MACs with local SRAM, tiles that can arrayed without GDS change, and up to >100 TOPS of performance.
Original Press Release:
Flex Logix Launches NMAX Neural Inferencing Engine that Delivers 1 to 100+ TOPS Performance Using 1/10th the Typical DRAM Bandwidth
Represents new line of business for Flex Logix to enable performance edge inferencing by slashing DRAM cost and power
MOUNTAIN VIEW, Calif., Oct. 31, 2018 /PRNewswire/ -- Flex Logix®Technologies, Inc. today announced that it has leveraged its core patent-protected interconnect technology from its embedded FPGA (eFPGA) line of business to launch a completely new product line focused on neural inferencing. Unveiled today in a presentation at the Linley Processor Conference in Santa Clara, the Flex Logix NMAX™ Neural Inferencing Engine delivers 1 to >100 TOPS of neural inferencing capacity in a modular, scalable architecture that requires a fraction of the DRAM bandwidth of existing neural inferencing solutions.
"The difficult challenge in neural network inferencing is minimizing data movement and energy consumption, which is something our interconnect technology can do amazingly well," said Geoff Tate. "While performance is key in inferencing, what sets NMAX apart is how it handles this data movement while using a fraction of the DRAM bandwidth that other inferencing solutions require. This dramatically cuts the power and cost of these solutions, which is something customers in edge applications require for volume deployment."
Two High-Growth Business: One Innovative Interconnect Technology
Flex Logix has emerged as a market leader in the eFPGA market, with customers such as DARPA, Boeing, Harvard, Sandia, SiFive RISC-V, and many more designing chips based on this platform. The new NMAX neural inferencing engine leverages the same core interconnect technology used in eFPGA, but represents an entirely new product line of business for the company, targeted directly at the neural inferencing portion of the explosive AI market.
In neural inferencing, the computation is primarily trillions of operations (multiplies and accumulates, typically using 8-bit integer inputs and weights, and sometimes 16-bit integer). The technology Flex Logix has developed for eFPGA is also ideally suited for inferencing because eFPGA allows for re-configurable, fast control logic for each network stage. SRAM in eFPGA is reconfigurable as needed in neural networks where each layer can require different data sizes; and Flex Logix interconnects allow reconfigurable connections between SRAM input banks, MAC clusters, and activation to SRAM output banks at each stage.
The result is an NMAX tile of 512 MACs with local SRAM, which in 16nm has ~1 TOPS peak performance. NMAX tiles can be arrayed, without any GDS change, in configurations of whatever TOPS is required, with varying amounts of SRAM as needed to optimize for the target neural network model, up to to >100 TOPS peak performance.
For example, for YOLOv3 real time object recognition, NMAX arrays can be generated in increasing size to process 1, 2 or 4 cameras with 2 MegaPixel inputs at 30 frames per second with batch size = 1. This is done with just ~10GB/sec of DRAM bandwidth, compared to the 100s of GB/second of existing solutions. In this example, MAC utilization is in the 60-80% range, which is much better than existing solutions.
Another example is ResNet-50 for image classification. The three NMAX arrays mentioned above classify 4600, 9500 and 19,000 images/second respectively, all with batch size = 1. All of these throughputs are achieved with 1 DRAM and about 90% MAC utilization. As a comparison Nvidia Tesla T4 needs a batch size of 28 to achieve 3920 images/second, achieving <25% MAC utilization while using 8 DRAMs. Lower batch sizes are very important for all edge applications and many data center applications in order to minimize latency – long latency means slower response time.
High MAC utilization means less silicon area/cost. Low DRAM bandwidth means fewer DRAMs, less system cost and less power.
NMAX is a general purpose Neural Inferencing Engine which can run any type of NN from simple fully connected DNN to RNN to CNN and can run multiple NNs at a time. NMAX is programmed using Tensorflow and in the future will support other model description languages as well.
NMAX is in development now and will be available in the second half of 2019. For more information, prospective customers can go to www.flex-logix.com to review the slides presented today at the Linley Processor Conference and/or contact firstname.lastname@example.org for further details of NMAX under NDA.
About Flex Logix
Flex Logix, founded in March 2014, provides solutions for making flexible chips and accelerating neural network inferencing. Its eFPGA platform enables chips to be flexible to handle changing protocols, standards, algorithms and customer needs and to implement reconfigurable accelerators that speed key workloads 30-100x faster than Microsoft Azure processing in the Cloud. eFPGA is available for any array size on the most popular process nodes now with increasing customer adoption. Flex Logix's second product line, NMAX, utilizes its eFPGA and interconnect technology to provide modular, scalable neural inferencing from 1 to >100 TOPS using 1/10th the typical DRAM bandwidth, resulting in much lower system power and cost. Having raised more than $13 million of venture capital, Flex Logix is headquartered in Mountain View, California, and has sales rep offices in China, Europe, Israel, Japan, Taiwan and throughout the USA. More information can be obtained at http://www.flex-logix.com or follow on Twitter at @efpga.
Tanis Communications, Inc.