Programming Massively Parallel Processors (《大规模并行处理器程序设计》)


Programming Massively Parallel Processors (《大规模并行处理器程序设计》) is a book published in 2010 by Tsinghua University Press; its author is Kirk.

  • Title: Programming Massively Parallel Processors (大规模并行处理器程序设计)
  • Author: Kirk
  • ISBN: 9787302229735
  • Category: Programming
  • Price: 36.00 yuan

Synopsis

  This book introduces the basic concepts of parallel programming and GPU architecture, explores in detail the various techniques used to build parallel programs, and uses case studies to demonstrate the entire development process of parallel programming: starting from the idea of parallel computing and proceeding all the way to the final implementation of practical, efficient parallel programs.

  Features

  Introduces the ideas of parallel computing, so that readers can carry this way of thinking about problems into high-performance parallel computing.

  Introduces the use of CUDA, a software development tool created by NVIDIA specifically for massively parallel environments.

  Shows how to use the CUDA programming model and OpenCL to achieve high performance and high reliability.
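The data-parallel kernel/launch style that the book teaches (its matrix-multiplication and threading chapters) can be illustrated with a minimal CUDA vector-addition program. This is a hedged sketch for orientation, not an example taken from the book: the kernel name `vecAdd` and all sizes are illustrative.

```cuda
// Minimal CUDA sketch: one thread per output element.
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overrun
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    // Allocate device memory and copy inputs host -> device.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover n elements.
    vecAdd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("hc[10] = %f\n", hc[10]);  // element 10: 10 + 20 = 30
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```

The block/thread indexing (`blockIdx`, `threadIdx`) and the host-side memory transfers shown here are exactly the topics the table of contents below covers in Chapters 3 and 4.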

Table of Contents

  Preface

  Acknowledgments

  Dedication

  CHAPTER 1 INTRODUCTION

  1.1 GPUs as Parallel Computers

  1.2 Architecture of a Modern GPU

  1.3 Why More Speed or Parallelism?

  1.4 Parallel Programming Languages and Models

  1.5 Overarching Goals

  1.6 Organization of the Book

  CHAPTER 2 HISTORY OF GPU COMPUTING

  2.1 Evolution of Graphics Pipelines

  2.1.1 The Era of Fixed-Function Graphics Pipelines

  2.1.2 Evolution of Programmable Real-Time Graphics

  2.1.3 Unified Graphics and Computing Processors

  2.1.4 GPGPU: An Intermediate Step

  2.2 GPU Computing

  2.2.1 Scalable GPUs

  2.2.2 Recent Developments

  2.3 Future Trends

  CHAPTER 3 INTRODUCTION TO CUDA

  3.1 Data Parallelism

  3.2 CUDA Program Structure

  3.3 A Matrix-Matrix Multiplication Example

  3.4 Device Memories and Data Transfer

  3.5 Kernel Functions and Threading

  3.6 Summary

  3.6.1 Function declarations

  3.6.2 Kernel launch

  3.6.3 Predefined variables

  3.6.4 Runtime API

  CHAPTER 4 CUDA THREADS

  4.1 CUDA Thread Organization

  4.2 Using blockIdx and threadIdx

  4.3 Synchronization and Transparent Scalability

  4.4 Thread Assignment

  4.5 Thread Scheduling and Latency Tolerance

  4.6 Summary

  4.7 Exercises

  CHAPTER 5 CUDA™ MEMORIES

  5.1 Importance of Memory Access Efficiency

  5.2 CUDA Device Memory Types

  5.3 A Strategy for Reducing Global Memory Traffic

  5.4 Memory as a Limiting Factor to Parallelism

  5.5 Summary

  5.6 Exercises

  CHAPTER 6 PERFORMANCE CONSIDERATIONS

  6.1 More on Thread Execution

  6.2 Global Memory Bandwidth

  6.3 Dynamic Partitioning of SM Resources

  6.4 Data Prefetching

  6.5 Instruction Mix

  6.6 Thread Granularity

  6.7 Measured Performance and Summary

  6.8 Exercises

  CHAPTER 7 FLOATING POINT CONSIDERATIONS

  7.1 Floating-Point Format

  7.1.1 Normalized Representation of M

  7.1.2 Excess Encoding of E

  7.2 Representable Numbers

  7.3 Special Bit Patterns and Precision

  7.4 Arithmetic Accuracy and Rounding

  7.5 Algorithm Considerations

  7.6 Summary

  7.7 Exercises

  CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION

  8.1 Application Background

  8.2 Iterative Reconstruction

  8.3 Computing FHd

  Step 1. Determine the Kernel Parallelism Structure

  Step 2. Getting Around the Memory Bandwidth Limitation

  Step 3. Using Hardware Trigonometry Functions

  Step 4. Experimental Performance Tuning

  8.4 Final Evaluation

  8.5 Exercises

  CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS

  CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING

  CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™

  CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK

  APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE

  APPENDIX B GPU COMPUTE CAPABILITIES

  Index
