Commit 48393c3

Merge pull request #3 from czgdp1807/blog-bugs
Removed blog bugs
2 parents: 01e1d00 + e9f1cc5

File tree: 1 file changed (+4, −11 lines)


_posts/2019-05-30-CPU-GPU-and-TPU-in-Deep-Learning.md

Lines changed: 4 additions & 11 deletions
@@ -9,7 +9,7 @@ image: "/images/benchmark-cpu-gpu.png"
 
 The GPU (Graphics Processing Unit) is one of the most widely used processors for training models. It is a single-chip processor that frees CPU cycles from image processing and mathematical computation. Consider a game with heavy graphics: many shadows to cast, atmospheric effects, multiple light sources, complex textures, and so on. All of this consumes extra GPU processing, and higher resolutions mean each frame needs even more calculation just to display its pixels. The GPU handles all of it; when the GPU cannot keep pace with the CPU, it becomes a bottleneck.
 
-Some stark differences between CPU and GPU
+#### Some stark differences between CPU and GPU
 
 A CPU processes data sequentially, whereas a GPU runs many threads simultaneously. A GPU thus uses parallel computing to speed up the training of models.
 
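The sequential-versus-parallel contrast drawn in this hunk can be sketched with a small Python analogy. This is illustrative only: Python's thread pool models the programming style, not real GPU hardware, and `scale_pixel` is a hypothetical stand-in for a per-element step such as shading one pixel.

```python
# Rough CPU-vs-GPU analogy: the "CPU" path handles one element at a time,
# while the "GPU" path fans the same independent per-element work out
# across many workers at once. (Sketch only; scale_pixel is a hypothetical
# stand-in for a per-element operation.)
from concurrent.futures import ThreadPoolExecutor

def scale_pixel(x):
    return x * 2 + 1

data = list(range(8))

# CPU-style: strictly sequential, one element after another.
sequential = [scale_pixel(x) for x in data]

# GPU-style: the same independent work issued to many workers simultaneously.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(scale_pixel, data))

assert sequential == parallel  # same result, different execution model
```

The key property making the GPU style work is that each element's computation is independent of the others, so order of execution does not matter.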

@@ -23,18 +23,11 @@ Due to large datasets, the CPU takes up a lot of memory while training the model
 
 The high bandwidth, latency hiding through thread parallelism, and easily programmable registers make a GPU much faster than a CPU.
 
-The new Intel Xeon phi processing chip
+#### The new Intel Xeon Phi processing chip
 
-It fetches and decodes instructions from four hardware thread execution contexts and has 4 clock latency, hidden by round-robin scheduling of threads
+It fetches and decodes instructions from four hardware thread execution contexts and has a four-clock latency, hidden by round-robin scheduling of the threads. Each microprocessor core is a fully functional, in-order core capable of running IA instructions independently of the other cores. It has two pipelines and a ring interconnect.
 
-Each microprocessor core is a fully functional, in-order core capable of running IA instructions independently of the other cores
-
-It has two pipelines and ring interconnection.
-
-A comparison between the newly released Xeon phi processor and NVIDIA GeForce GTX 1080
-
-
-The TPU chip
+#### The TPU chip
 
 The chip has been designed specifically for Google's TensorFlow framework. In Google Photos, an individual TPU can process over 100 million photos a day. Google's Cloud TPU is currently in beta, offered in limited quantities and usage. For machine-learning training, the Cloud TPU is more powerful and has four times the memory capacity of Nvidia's best GPU, the Tesla V100.
 
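The round-robin latency hiding described in the Xeon Phi paragraph above can be sketched with a toy scheduler. This is a simplified model under stated assumptions (one issued instruction per hardware context per turn, a fixed instruction latency), not a simulation of the actual chip's pipeline.

```python
# Toy model of latency hiding via round-robin thread scheduling: a core
# cycles through hardware thread contexts, issuing one instruction per
# context per turn. With as many contexts as the instruction latency,
# each thread's previous result is ready by the time its turn comes
# around again, so the core never stalls. (Simplified sketch, not a
# model of the real Xeon Phi pipeline.)

def simulate(num_threads, latency, instrs_per_thread):
    """Return (cycles_through_last_issue, total_stall_cycles)."""
    ready_at = [0] * num_threads   # cycle when each context may issue again
    remaining = [instrs_per_thread] * num_threads
    clock = stalls = t = 0
    while any(remaining):
        while remaining[t] == 0:   # skip contexts with no work left
            t = (t + 1) % num_threads
        if clock < ready_at[t]:    # previous result not ready yet: stall
            stalls += ready_at[t] - clock
            clock = ready_at[t]
        ready_at[t] = clock + latency   # issue one instruction
        remaining[t] -= 1
        clock += 1
        t = (t + 1) % num_threads
    return clock, stalls

# Four contexts fully hide a four-cycle latency: zero stall cycles.
print(simulate(num_threads=4, latency=4, instrs_per_thread=3))  # (12, 0)
# A single context stalls on every instruction after the first.
print(simulate(num_threads=1, latency=4, instrs_per_thread=3))  # (9, 6)
```

The same idea, scaled to thousands of threads, is how a GPU hides its memory latency under thread parallelism, as the earlier hunk notes.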

0 commit comments