So often, conversations about deep learning focus entirely on software and programming: who can train deep learning models to perform a given use case better than anyone else? But developments on the hardware side of the equation reveal that there is more to advancing artificial intelligence than software alone.
As conventional big data analytics emerged, the industry turned toward traditional CPU-based commodity hardware and distributed systems like Hadoop that could handle linear big data and analytics tasks. But now, with AI, this trend is swinging in another direction.
When deep learning, with its complex algorithms like convolutional neural networks, first started popping up, the chip world was unquestionably dominated by the Intel CPU. But the need for less linear, more complex analysis led a lot of data scientists to turn to GPUs, which were traditionally used for image-heavy computing like gaming, to handle deep learning tasks. NVIDIA became the clear front-runner when it optimized its GPUs for AI use cases, lessening the load on CPUs and driving up deep learning model performance.
The rise of the GPU for AI coincides with the type of data tasks we’re doing today. Data hasn’t just grown linearly, scaling only with the amount collected. Data is getting wide, with speech and image data creating complexity for already taxed processors. Add to that the need to train models on that data, and processing times can go from minutes to hours on traditional CPU-based systems.
The analogy goes that a CPU is like a Ferrari, able to transport small slivers of linear data quickly between chip and memory. Meanwhile, deep learning, which uses wide data and performs highly intensive tasks, needs something like really fast dump trucks, transporting massive amounts of data from memory to chip without losing much of the speed.
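To put rough numbers on the analogy, here is a minimal sketch of a toy transfer model in Python. Everything in it is an assumption made for illustration: the `Link` class, the `transfer_time` function, and the latency, bandwidth and load-size figures are hypothetical and do not describe any real CPU or GPU. The model simply charges a fixed delay per trip plus the time needed to push the bytes through at a given bandwidth.

```python
from dataclasses import dataclass
import math


@dataclass
class Link:
    """A hypothetical path between memory and chip (illustrative numbers only)."""
    name: str
    latency_s: float              # fixed delay paid on every trip
    bandwidth_bytes_per_s: float  # sustained rate once data is moving
    chunk_bytes: int              # how much one trip can carry


def transfer_time(link: Link, total_bytes: int) -> float:
    """Toy model: trips are made one after another, each paying the latency."""
    trips = math.ceil(total_bytes / link.chunk_bytes)
    return trips * link.latency_s + total_bytes / link.bandwidth_bytes_per_s


# Assumed figures, chosen only to make the contrast visible.
SPORTS_CAR = Link("sports car", latency_s=1e-7, bandwidth_bytes_per_s=1e10, chunk_bytes=64)
DUMP_TRUCK = Link("dump truck", latency_s=1e-5, bandwidth_bytes_per_s=1e11, chunk_bytes=1024 * 1024)

for payload in (64, 10**9):  # a tiny sliver of data vs. a wide, bulky batch
    for link in (SPORTS_CAR, DUMP_TRUCK):
        seconds = transfer_time(link, payload)
        print(f"{payload:>13,} bytes via {link.name}: {seconds:.3e} s")
```

Under these assumed figures, the sports car wins on the 64-byte sliver, while the dump truck wins comfortably on the one-gigabyte batch, which is the trade-off the analogy describes. Real memory systems overlap and pipeline transfers, so this is a sketch of the intuition rather than a performance model.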
GPUs are the processor of choice for deep learning today, but where are we going in the future? For one, GPUs are still advancing: when NVIDIA released the DGX-1 supercomputer last year, it offered 12 times the speed of its deep learning hardware and software from the year before. And the released specs for the follow-up DGX-2 confirm that NVIDIA GPUs will continue to gain in processing power, while easily outpricing other boxes. Not to be left behind, Intel recently released its own Myriad X chip, made possible by its acquisition of Movidius. This chip, branded as a VPU, or vision processing unit, aims to bring deep learning to the edge, with its lightweight design making it ideal for drones, robots, wearables and VR headsets.
But the story of AI hardware doesn’t begin and end with general-purpose GPUs like NVIDIA’s. There are two other types of chips that are gaining momentum. Field-programmable gate arrays (FPGAs), once used mostly for in-situ, on-the-fly programming and reprogramming, are also becoming desirable for companies that need to make fast, effective changes to deep learning applications. As deep learning training continues to grow in complexity and variety of use cases, it’s easy to see why these chips are becoming more popular. In fact, Microsoft recently announced that its Azure Cloud, and more specifically its so-called “Project Olympus” servers, will use Intel FPGAs to create a configurable cloud that can be flexibly provisioned to support a wide array of deep learning applications.
If FPGAs are about flexibility, ASICs are about specificity. Application-specific integrated circuits (ASICs) are designed to perform a single custom task, like voice recognition or Bitcoin mining, with unparalleled efficiency. These chips would be ideal for applications like determining what a self-driving car’s stereo vision is seeing and deciding if it needs to brake or steer away from an object. That chip will never need to perform another function, but it can perform its one task impeccably and at great speed. Google made waves on this front when it announced the Tensor Processing Unit (TPU), its own custom processor for deep learning. NVIDIA, not willing to be left behind in the general-purpose GPU world, especially if use cases and market needs shift to application-specific circuits, is working on its own ASIC for specialized deep learning uses that require stable, high-volume processing.
Regardless of which option businesses and engineers take, hardware and software integration is going to have to get very tight, efficient and high performing for deep learning applications of the future. For example, TensorFlow, one of the most popular deep learning software packages, works easily with NVIDIA’s CUDA, a parallel computing platform and programming model designed to be the interface between applications and GPUs, abstracting the difficult GPU programming away from application developers. Wide data is only going to continue to grow. Programmers are only going to get better at making deep learning algorithms. Programming languages are only going to become more refined. Software libraries are only going to continue to swell in volume. As such, hardware will have to continue to make astonishing advances. And this next generation of chips, offering both flexibility and specialization at more rapid speeds, shows promise that will unearth the true potential of deep learning.