The New Kid on the Block: GPU-Accelerated Big Data Analytics

Rong Zhou, Senior Researcher and Area Manager, PARC

With open-source big data frameworks such as Apache Hadoop and Spark in the spotlight, most people are probably unfamiliar with the concept of using GPUs (graphics processing units) in either big data or analytics-rich applications. In 9 out of 10 cases, the acronym is mentioned in the context of display hardware, video games, or how supercomputers are built these days. For serious IT managers or data scientists, GPUs may seem too exotic to be the hardware of choice for big data infrastructure. While we see challenges ahead, there are a few major misconceptions about GPUs that could use some clarification. In a nutshell, we believe most (if not all) of the analytics needs in big data can be met with advanced GPU-based machine learning technologies.

Myth #1: GPUs are only good for gamers or supercomputers

Truth: It’s true that the early adopters of GPUs are mostly computer game developers or makers of supercomputers. However, the massively parallel computing power of GPUs can also be used to speed up machine learning or data mining algorithms that have nothing to do with 3D graphics. Take the Nvidia Titan Black GPU as an example: it has 2880 cores capable of performing over 5 trillion floating-point operations per second (TFLOPS). For comparison, a Xeon E5-2699v3 processor can perform about 0.75 TFLOPS, but may cost 4x as much. Besides TFLOPS, GPUs also enjoy a significant advantage over CPUs in terms of memory bandwidth, which is more important for data-intensive applications. The Titan Black’s maximum memory bandwidth is 336 GB/sec, whereas the E5-2699v3’s is only 68 GB/sec. Higher memory bandwidth means more data can be transferred between the processor and its memory in the same amount of time, which is why GPUs can process large quantities of data in a split second.
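To put those figures in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It uses only the peak numbers quoted above. The 6 GB dataset size and the helper names `ratio` and `scan_time_ms` are assumptions made purely for illustration, and real workloads rarely reach peak throughput or peak bandwidth.

```python
# Peak figures quoted in the article (theoretical maxima, not measurements).
GPU_TFLOPS = 5.0           # Nvidia Titan Black, "over 5 TFLOPS"
CPU_TFLOPS = 0.75          # Xeon E5-2699v3, "about 0.75 TFLOPS"
GPU_BANDWIDTH_GBS = 336.0  # GB/sec
CPU_BANDWIDTH_GBS = 68.0   # GB/sec

# Hypothetical dataset size, chosen only for illustration.
DATASET_GB = 6.0


def ratio(gpu_value, cpu_value):
    """How many times larger the GPU figure is than the CPU figure."""
    return gpu_value / cpu_value


def scan_time_ms(dataset_gb, bandwidth_gbs):
    """Idealized time (in ms) to read the dataset once at peak bandwidth."""
    return dataset_gb / bandwidth_gbs * 1000.0


print(f"Compute advantage:   {ratio(GPU_TFLOPS, CPU_TFLOPS):.1f}x")
print(f"Bandwidth advantage: {ratio(GPU_BANDWIDTH_GBS, CPU_BANDWIDTH_GBS):.1f}x")
print(f"GPU scan of {DATASET_GB} GB: {scan_time_ms(DATASET_GB, GPU_BANDWIDTH_GBS):.1f} ms")
print(f"CPU scan of {DATASET_GB} GB: {scan_time_ms(DATASET_GB, CPU_BANDWIDTH_GBS):.1f} ms")
```

Under these idealized assumptions the GPU comes out roughly 6.7x ahead in arithmetic throughput and roughly 4.9x ahead in memory bandwidth. For a scan-heavy analytics job that touches each byte only once, the bandwidth ratio is likely the more relevant of the two.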

"It’s true that GPUs are not as easy to program as their CPU counterparts, due to their unconventional processor designs"

One of the hottest areas of machine learning nowadays is Deep Learning (DL), which uses deep neural networks (DNNs) to teach computers to perform tasks such as machine vision and speech recognition. GPUs are widely used in the training of DNNs, which can take up to a few months on the CPU. With GPU-accelerated DL packages such as Caffe and Theano, the training time is often reduced to a few days.

Myth #2: GPUs are only for small data

Truth: It’s true that GPU cards have limited on-board memory, which cannot be upgraded once they are manufactured, unlike the RAM of a CPU. Furthermore, the maximum RAM size of a GPU is typically much smaller. For example, the maximum memory currently supported by a single Nvidia GPU is 12 GB, whereas a multi-socket CPU system can have up to a few TB of RAM. The conventional thinking is that GPUs are only suitable for processing small datasets.
