CUDA kernel optimization techniques are crucial for maximizing GPU performance. These methods focus on efficient thread management, memory access patterns, and data transfer strategies. By applying these techniques, developers can significantly boost the speed and efficiency of their CUDA programs.

Understanding and implementing these optimization techniques is essential for harnessing the full power of GPU computing. From thread coarsening and memory coalescing to kernel fusion and parallel reduction, these strategies enable programmers to fine-tune their kernels for optimal performance across various GPU architectures.

Optimizing CUDA Kernels

Thread and Memory Optimization

  • Thread coarsening combines the work of multiple threads into a single thread, reducing overhead and increasing arithmetic intensity (see the sketch after this list)
  • Loop unrolling reduces branch penalties and increases instruction-level parallelism by executing multiple iterations simultaneously
  • Memory coalescing organizes global memory accesses to maximize bandwidth utilization by ensuring adjacent threads access contiguous memory locations
  • Occupancy optimization balances the number of active threads with available resources, maximizing GPU utilization
    • Adjust thread block size and register usage to achieve optimal occupancy
    • Use shared memory judiciously to avoid limiting occupancy
  • Instruction-level optimizations significantly improve kernel performance
    • Use intrinsic functions for faster math operations (sin, cos, exp)
    • Avoid thread divergence within warps by minimizing conditional statements
    • Utilize fast math options when precision requirements allow
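
The snippet below is a minimal sketch of thread coarsening combined with loop unrolling and coalesced access: each thread handles several grid-stride elements, so adjacent threads in a warp still touch contiguous addresses. The kernel name, the coarsening factor, and the scaling operation are illustrative assumptions, not a prescribed implementation.

    #include <cuda_runtime.h>

    #define COARSEN 4  // illustrative coarsening factor

    __global__ void scaleCoarsened(const float* __restrict__ in,
                                   float* __restrict__ out,
                                   float alpha, int n)
    {
        int stride = blockDim.x * gridDim.x;
        int idx = blockIdx.x * blockDim.x + threadIdx.x;

        // #pragma unroll expands the loop body COARSEN times, cutting branch
        // overhead and exposing more instruction-level parallelism.
        #pragma unroll
        for (int i = 0; i < COARSEN; ++i) {
            int j = idx + i * stride;            // grid-stride keeps warp accesses contiguous (coalesced)
            if (j < n) out[j] = alpha * in[j];
        }
    }

    // Launch with roughly n / COARSEN total threads, e.g.:
    // scaleCoarsened<<<(n / COARSEN + 255) / 256, 256>>>(dIn, dOut, 2.0f, n);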

Profiling and Performance Analysis

  • Profiling tools identify performance bottlenecks and guide optimization efforts
    • NVIDIA Visual Profiler provides detailed kernel analysis and optimization suggestions
    • NVIDIA Nsight Compute offers advanced profiling capabilities for modern GPU architectures
  • Analyze kernel execution time, memory bandwidth utilization, and compute utilization (a timing sketch follows this list)
  • Identify and address memory access patterns causing poor performance
  • Experiment with different optimization techniques and measure their impact
  • Iteratively refine kernels based on profiling results to achieve optimal performance
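
As a complement to the profilers, kernel execution time and effective bandwidth can be measured directly with CUDA events. This is a minimal sketch; the kernel, problem size, and the one-read-plus-one-write bandwidth estimate are assumptions for illustration.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void dummyKernel(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] = data[i] * 2.0f + 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float* d_data;
        cudaMalloc((void**)&d_data, n * sizeof(float));

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        dummyKernel<<<(n + 255) / 256, 256>>>(d_data, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);     // elapsed time between the two events
        printf("kernel execution time: %.3f ms\n", ms);

        // Effective bandwidth estimate (assumption: one read and one write per element).
        double gb = 2.0 * n * sizeof(float) / 1e9;
        printf("approx. memory bandwidth: %.2f GB/s\n", gb / (ms / 1e3));

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        cudaFree(d_data);
        return 0;
    }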

Minimizing CPU-GPU Communication

Efficient Data Transfer Strategies

  • Data transfer between the CPU and GPU often bottlenecks CUDA applications due to limited PCIe bandwidth
  • Asynchronous data transfers using CUDA streams overlap computation with data movement, hiding transfer latency (see the sketch after this list)
    • Implement multiple streams to concurrently transfer data and execute kernels
    • Use cudaMemcpyAsync() for asynchronous memory transfers
  • Pinned (page-locked) memory allocation improves transfer speeds preventing memory from being swapped out to disk
    • Allocate pinned memory using cudaMallocHost() or cudaHostAlloc()
    • Be cautious of overuse as it reduces available system memory
  • Compression techniques reduce the amount of data transferred between CPU and GPU, trading computation for bandwidth
    • Implement data compression algorithms (RLE, Huffman coding) before transfer
    • Decompress data on GPU after transfer
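
A minimal sketch of the strategy above: pinned host memory, several CUDA streams, and cudaMemcpyAsync() so that transfers for one chunk overlap with kernel execution on another. The chunk size, stream count, and kernel are illustrative assumptions.

    #include <cuda_runtime.h>

    __global__ void processChunk(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    int main() {
        const int nStreams = 4;                       // illustrative stream count
        const int chunk = 1 << 20;                    // elements per stream
        const size_t bytes = chunk * sizeof(float);

        float* h_data;                                // pinned host buffer: required for true async copies
        cudaMallocHost((void**)&h_data, nStreams * bytes);
        float* d_data;
        cudaMalloc((void**)&d_data, nStreams * bytes);

        cudaStream_t streams[nStreams];
        for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

        // Copy-in, kernel, and copy-out for each chunk are queued in its own stream,
        // so transfers for one chunk overlap with computation on another.
        for (int s = 0; s < nStreams; ++s) {
            float* hPtr = h_data + s * chunk;
            float* dPtr = d_data + s * chunk;
            cudaMemcpyAsync(dPtr, hPtr, bytes, cudaMemcpyHostToDevice, streams[s]);
            processChunk<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(dPtr, chunk);
            cudaMemcpyAsync(hPtr, dPtr, bytes, cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
        cudaFree(d_data);
        cudaFreeHost(h_data);
        return 0;
    }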

Advanced Memory Management

  • Unified Memory provides a single memory space accessible by both CPU and GPU, simplifying memory management and potentially reducing explicit transfers (see the sketch after this list)
    • Allocate Unified Memory using cudaMallocManaged()
    • Let CUDA runtime automatically migrate data between CPU and GPU
  • Kernel fusion combines multiple small kernels into a single larger kernel, reducing the number of separate GPU invocations and associated data transfers
    • Identify kernels with data dependencies and merge them when possible
    • Balance kernel fusion with occupancy and resource utilization
  • Zero-copy memory allows the GPU to directly access host memory, beneficial for certain access patterns and small data sizes
    • Allocate zero-copy memory using cudaHostAlloc() with the cudaHostAllocMapped flag
    • Use sparingly as it can lead to lower performance due to PCIe transfer overhead
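
A minimal Unified Memory sketch, assuming a simple element-wise kernel: one cudaMallocManaged() allocation is touched by both the host and the device, and the runtime migrates pages on demand.

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void addOne(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] += 1.0f;
    }

    int main() {
        const int n = 1 << 20;
        float* data;
        // Single allocation visible to both CPU and GPU; no explicit cudaMemcpy needed.
        cudaMallocManaged((void**)&data, n * sizeof(float));

        for (int i = 0; i < n; ++i) data[i] = float(i);   // initialize on the host

        addOne<<<(n + 255) / 256, 256>>>(data, n);        // GPU reads/writes the same pointer
        cudaDeviceSynchronize();                          // wait before the CPU touches the data again

        printf("data[0] = %f\n", data[0]);                // expect 1.0
        cudaFree(data);
        return 0;
    }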

Shared Memory for Data Reuse

Shared Memory Fundamentals

  • Shared memory is fast on-chip memory accessible by all threads within a thread block
  • Proper use of shared memory significantly reduces global memory bandwidth requirements and improves kernel performance
  • Tiling techniques involve loading data into shared memory in tiles, allowing threads to cooperatively load and process data
    • Implement 2D tiling for matrix operations (matrix multiplication); a sketch follows this list
    • Use 1D tiling for vector operations (convolution)
  • The size of shared memory per block is limited and must be balanced against the number of thread blocks scheduled concurrently
    • Typical shared memory size ranges from 48KB to 96KB per SM depending on GPU architecture
    • Adjust shared memory usage to maintain high occupancy
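
Below is a minimal sketch of 2D tiling for matrix multiplication, assuming square matrices whose dimension is a multiple of the tile size; the kernel name and the 16x16 tile are illustrative choices.

    #define TILE 16

    // C = A * B for n x n matrices, with n assumed to be a multiple of TILE.
    __global__ void matmulTiled(const float* A, const float* B, float* C, int n)
    {
        __shared__ float As[TILE][TILE];   // tile of A staged in shared memory
        __shared__ float Bs[TILE][TILE];   // tile of B staged in shared memory

        int row = blockIdx.y * TILE + threadIdx.y;
        int col = blockIdx.x * TILE + threadIdx.x;
        float acc = 0.0f;

        for (int t = 0; t < n / TILE; ++t) {
            // Threads cooperatively load one tile of A and one tile of B from global memory.
            As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
            Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
            __syncthreads();

            // Each loaded element is reused TILE times from fast shared memory.
            for (int k = 0; k < TILE; ++k)
                acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
            __syncthreads();
        }
        C[row * n + col] = acc;
    }

    // Launch (assumption): dim3 block(TILE, TILE); dim3 grid(n / TILE, n / TILE);
    // matmulTiled<<<grid, block>>>(dA, dB, dC, n);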

Advanced Shared Memory Techniques

  • Bank conflicts in shared memory access limit performance, requiring careful consideration of access patterns and padding techniques
    • Avoid bank conflicts by ensuring threads access different banks or same location within a bank
    • Use padding to resolve bank conflicts in multi-dimensional arrays
  • Shared memory can be used as a software-managed cache, allowing programmers to explicitly control data movement and reuse
    • Implement sliding window algorithms using shared memory for data reuse
    • Utilize shared memory for fast inter-thread communication within a block
  • Dynamic shared memory allocation allows the size of shared memory to be set at kernel launch time, providing flexibility in memory usage (see the sketch after this list)
    • Declare dynamic shared memory using extern __shared__ type array[];
    • Set the shared memory size at kernel launch using <<<gridSize, blockSize, sharedMemSize>>>
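
A minimal sketch of dynamic shared memory, assuming a toy kernel that reverses each block's elements; the buffer size is supplied as the third launch parameter rather than fixed at compile time.

    // The shared buffer has no compile-time size; it is sized at launch.
    __global__ void reverseBlock(float* data, int n)
    {
        extern __shared__ float tile[];        // size comes from the third launch parameter
        int i = blockIdx.x * blockDim.x + threadIdx.x;

        if (i < n) tile[threadIdx.x] = data[i];
        __syncthreads();

        // Write the block's elements back in reverse order, reading from shared memory.
        int blockStart = blockIdx.x * blockDim.x;
        int blockLen = min((int)blockDim.x, n - blockStart);
        if (threadIdx.x < blockLen)
            data[blockStart + threadIdx.x] = tile[blockLen - 1 - threadIdx.x];
    }

    // Launch (assumption): the dynamic shared memory size is blockSize * sizeof(float).
    // reverseBlock<<<gridSize, blockSize, blockSize * sizeof(float)>>>(dData, n);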

Efficient Parallel Reduction and Scan Operations

Parallel Reduction Techniques

  • Parallel reduction is a fundamental operation for computing a single result from a large array of data (sum, maximum value)
  • Efficient CUDA reductions use techniques that minimize divergence and maximize parallelism
    • Sequential addressing reduces bank conflicts and improves memory coalescing
    • Loop unrolling reduces the number of iterations and increases instruction-level parallelism
    • Warp-level primitives (__shfl_down_sync()) implement highly efficient reductions within a warp without using shared memory
  • Hierarchical approaches to reduction handle large datasets exceeding the capacity of a single thread block (see the sketch after this list)
    • Implement two-level reduction: block-level reduction followed by global reduction
    • Use atomic operations for the final reduction step across blocks
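
The sketch below combines the techniques above: __shfl_down_sync() for warp-level reduction, shared memory to combine warps within a block, and atomicAdd() for the final step across blocks. The kernel and helper names are illustrative assumptions.

    #include <cuda_runtime.h>

    __inline__ __device__ float warpReduceSum(float val)
    {
        // Each step halves the number of active lanes; no shared memory needed within a warp.
        for (int offset = warpSize / 2; offset > 0; offset /= 2)
            val += __shfl_down_sync(0xffffffff, val, offset);
        return val;
    }

    __global__ void reduceSum(const float* in, float* out, int n)
    {
        __shared__ float warpSums[32];                 // one partial sum per warp (max 32 warps per block)

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        float val = (i < n) ? in[i] : 0.0f;

        val = warpReduceSum(val);                      // reduce within each warp

        int lane = threadIdx.x % warpSize;
        int warp = threadIdx.x / warpSize;
        if (lane == 0) warpSums[warp] = val;           // lane 0 of each warp records its sum
        __syncthreads();

        // The first warp reduces the per-warp sums to one block total.
        if (warp == 0) {
            int numWarps = (blockDim.x + warpSize - 1) / warpSize;
            val = (lane < numWarps) ? warpSums[lane] : 0.0f;
            val = warpReduceSum(val);
            if (lane == 0) atomicAdd(out, val);        // final combine across blocks
        }
    }

    // Launch (assumption): *out must be zero-initialized, e.g. with cudaMemset, before the call.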

Parallel Scan Algorithms

  • The parallel scan (prefix sum) operation computes cumulative sums and is fundamental to many parallel algorithms
  • Work-efficient scan algorithms achieve O(n) work complexity and O(log n) step complexity
    • Implement Blelloch's algorithm for efficient parallel scan (sketched after this list)
    • Use up-sweep and down-sweep phases to compute the prefix sum
  • Advanced techniques offer trade-offs between work efficiency and step efficiency in parallel scan operations
    • The Kogge-Stone algorithm provides better step efficiency at the cost of increased work
    • The Brent-Kung algorithm offers a good balance between work and step efficiency
  • Implement hierarchical scan for large datasets
    • Divide data into blocks and perform local scans
    • Compute block-wide prefix sum
    • Combine local and block-wide results for final scan
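
A minimal single-block sketch of Blelloch's work-efficient exclusive scan, with explicit up-sweep and down-sweep phases; the element count N is assumed to be a power of two, and a full hierarchical scan would add the block-wide combination steps described above.

    #define N 512   // elements per block; assumed to be a power of two

    __global__ void blellochScan(const float* in, float* out)
    {
        __shared__ float temp[N];
        int tid = threadIdx.x;

        // Each of the N/2 threads loads two elements into shared memory.
        temp[2 * tid]     = in[2 * tid];
        temp[2 * tid + 1] = in[2 * tid + 1];

        // Up-sweep (reduce) phase: build partial sums in place.
        int offset = 1;
        for (int d = N >> 1; d > 0; d >>= 1) {
            __syncthreads();
            if (tid < d) {
                int ai = offset * (2 * tid + 1) - 1;
                int bi = offset * (2 * tid + 2) - 1;
                temp[bi] += temp[ai];
            }
            offset <<= 1;
        }

        // Clear the last element, then down-sweep to distribute the prefix sums.
        if (tid == 0) temp[N - 1] = 0.0f;
        for (int d = 1; d < N; d <<= 1) {
            offset >>= 1;
            __syncthreads();
            if (tid < d) {
                int ai = offset * (2 * tid + 1) - 1;
                int bi = offset * (2 * tid + 2) - 1;
                float t  = temp[ai];
                temp[ai] = temp[bi];
                temp[bi] += t;
            }
        }
        __syncthreads();

        out[2 * tid]     = temp[2 * tid];
        out[2 * tid + 1] = temp[2 * tid + 1];
    }

    // Launch (assumption): one block of N/2 threads scans exactly N elements.
    // blellochScan<<<1, N / 2>>>(dIn, dOut);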

Key Terms to Review (33)

Asynchronous data transfers: Asynchronous data transfers refer to the method of transferring data where the sender and receiver operate independently, allowing the sender to continue processing without waiting for the receiver to catch up. This technique is crucial for improving performance, especially in parallel computing environments, as it minimizes idle time by overlapping computation with data movement. By utilizing asynchronous transfers, systems can effectively manage memory bandwidth and latency, leading to better utilization of computational resources.
Atomic Operations: Atomic operations are low-level programming constructs that ensure a sequence of operations on shared data is completed without interruption. They are crucial for maintaining data integrity in concurrent environments, allowing multiple threads or processes to interact with shared resources safely, preventing issues like race conditions and ensuring consistency across threads.
Bank conflicts: Bank conflicts occur when multiple threads in a GPU attempt to access the same memory bank simultaneously, leading to delays and inefficient memory access. This happens because each memory bank can only handle one request at a time, causing serialization of accesses which can significantly slow down performance. Understanding and mitigating bank conflicts is crucial for optimizing CUDA kernel performance and ensuring that memory accesses are as efficient as possible.
Blelloch's Algorithm: Blelloch's Algorithm is a parallel prefix sum algorithm that efficiently computes the cumulative sums of an array in parallel using a tree-based approach. This algorithm significantly optimizes the performance of operations like scan and reduce in parallel computing environments, making it highly relevant for applications using CUDA for performance enhancement.
Brent-Kung Algorithm: The Brent-Kung algorithm is an efficient method for parallel prefix computation, particularly suited for associative operations. This algorithm optimizes the parallel reduction process, minimizing both the time and space complexity, making it a significant technique for enhancing performance in parallel computing environments.
Compute utilization: Compute utilization refers to the ratio of active processing time to the total available processing time of a computational resource, typically expressed as a percentage. High compute utilization indicates that a system is efficiently using its processing capabilities, while low compute utilization suggests wasted resources and potential performance bottlenecks. Maximizing compute utilization is essential for optimizing the performance of parallel and distributed systems, especially in the context of kernel execution.
CUDA streams: CUDA streams are sequences of operations that are executed on the GPU in a specific order, allowing for concurrent execution of kernels and memory copies. This feature enables developers to improve the performance of applications by overlapping data transfer between the host and device with kernel execution, ultimately maximizing GPU utilization. Utilizing multiple streams can lead to better resource management and reduced idle time during processing.
cudaMemcpyAsync(): The `cudaMemcpyAsync()` function is a CUDA API call that enables asynchronous memory copying between host and device, allowing data transfer to occur concurrently with kernel execution. This function is crucial for optimizing performance in CUDA applications by enabling overlapping of data transfers with computation, effectively minimizing idle time for the GPU and maximizing resource utilization.
Data compression algorithms: Data compression algorithms are techniques used to reduce the size of data by encoding information more efficiently. These algorithms play a crucial role in enhancing storage efficiency and optimizing data transfer over networks, especially in parallel and distributed computing environments where bandwidth and storage limitations are common concerns.
Down-sweep phase: The down-sweep phase is a crucial step in parallel reduction algorithms, where the results of partial computations are propagated down through the data structure, typically a tree or an array. This phase follows the up-sweep phase, which aggregates data, and ensures that each thread has access to the final computed values needed for further processing. The down-sweep phase is vital in CUDA programming as it helps efficiently share and utilize results across threads in a parallelized environment.
Dynamic shared memory allocation: Dynamic shared memory allocation refers to the process of allocating memory on the GPU that can be shared among threads in a CUDA kernel during runtime. This technique allows for more flexible memory usage and efficient data sharing between threads within the same block, adapting to varying workloads and improving performance in parallel computing environments.
Hierarchical reduction: Hierarchical reduction is a technique used in parallel computing to minimize the amount of data that needs to be processed and communicated among different computing units by organizing computations in a tree-like structure. This approach optimizes performance by breaking down complex problems into smaller, manageable sub-problems, reducing the communication overhead and improving efficiency in data handling during kernel execution.
Instruction-level optimizations: Instruction-level optimizations are techniques used to improve the performance of individual instructions in a program, enhancing overall execution speed and efficiency. These optimizations aim to reduce instruction latency, minimize resource conflicts, and improve throughput by reorganizing or refining how instructions are executed. The focus is often on leveraging the architecture's capabilities, like pipelining and vectorization, to get more work done in less time.
Kernel execution time: Kernel execution time refers to the duration it takes for a kernel to execute on the GPU after being launched by the CPU. This time measurement is crucial because it directly impacts the overall performance and efficiency of parallel computations, influencing how well optimizations can be applied in the context of GPU programming.
Kernel Fusion: Kernel fusion is an optimization technique that combines multiple kernel calls into a single kernel execution on a GPU, reducing the overhead of launching separate kernels and improving memory access patterns. By merging operations, it can minimize data transfer between global memory and shared memory, enhancing performance significantly. This method is especially beneficial in applications where successive operations depend on each other, allowing for more efficient resource utilization and execution speed.
Kogge-Stone Algorithm: The Kogge-Stone algorithm is a parallel prefix computation method used for efficient addition and other operations in computer architecture. It focuses on minimizing the time required for carry propagation in addition, which makes it especially beneficial for high-performance computing applications. This algorithm employs a tree-like structure to propagate carries in parallel, significantly reducing latency and improving overall performance in operations such as summation and prefix calculations.
Loop unrolling: Loop unrolling is an optimization technique used in programming that involves expanding the loop's iterations into a larger block of code to reduce the overhead of loop control and increase performance. By executing multiple iterations of a loop in a single pass, this technique minimizes the frequency of loop branching, which can enhance instruction-level parallelism and improve cache performance. This approach is particularly useful in CUDA kernel optimization, where maximizing efficiency is crucial for achieving high-performance computations on GPUs.
Memory bandwidth utilization: Memory bandwidth utilization refers to the effective use of memory bandwidth available to a system during data transfers between memory and processing units. High utilization indicates that the system is efficiently using its memory bandwidth to move data, which is critical for performance in parallel computing, especially when optimizing CUDA kernels. This concept is closely linked to the speed and efficiency of data access patterns, cache usage, and overall kernel execution performance.
Memory coalescing: Memory coalescing is an optimization technique in GPU computing that improves memory access efficiency by combining multiple memory requests into fewer transactions. This is crucial because GPUs rely on high throughput to process large amounts of data, and coalescing helps reduce the number of memory accesses required, thus minimizing latency and maximizing bandwidth utilization. By organizing data in a way that allows threads to access contiguous memory locations, coalescing enhances performance and speeds up execution times.
NVIDIA Nsight Compute: NVIDIA Nsight Compute is a performance analysis tool specifically designed for CUDA applications, enabling developers to optimize their GPU kernels effectively. This tool provides insights into the execution of CUDA kernels, allowing users to identify bottlenecks and make informed decisions about optimization strategies. By analyzing kernel performance metrics, it helps developers improve their code efficiency and overall application speed.
NVIDIA Visual Profiler: The NVIDIA Visual Profiler is a powerful performance analysis tool that helps developers optimize their CUDA applications for better efficiency and execution speed on NVIDIA GPUs. It provides insights into the performance characteristics of kernel executions, memory usage, and other critical aspects, making it easier to identify bottlenecks and enhance overall application performance.
Occupancy optimization: Occupancy optimization refers to the process of maximizing the number of threads that can be concurrently executed on a GPU, which is essential for achieving high performance in parallel computing. It directly influences how well the hardware resources, such as registers and shared memory, are utilized, leading to improved throughput and reduced latency during kernel execution.
Parallel scan: Parallel scan, also known as prefix sum, is an algorithm that computes the cumulative sum of a sequence of numbers in parallel across multiple processing units. This technique enables efficient data processing by reducing the time complexity associated with sequential operations, leveraging the capabilities of parallel computing architectures like CUDA to optimize performance.
Pinned memory allocation: Pinned memory allocation refers to a specific type of memory in the context of GPU computing where data is allocated in a way that prevents it from being paged out by the operating system. This keeps the data accessible to the GPU at all times, enhancing data transfer rates between the host and device, which is crucial for optimizing kernel performance and overall application efficiency.
Shared memory: Shared memory is a memory management technique where multiple processes or threads can access the same memory space for communication and data sharing. This allows for faster data exchange compared to other methods like message passing, as it avoids the overhead of sending messages between processes.
Shared memory usage: Shared memory usage refers to a method of inter-process communication that allows multiple processes to access and manipulate the same memory space simultaneously. This technique is crucial in parallel computing, as it enhances data sharing and reduces latency compared to other communication methods, such as message passing. In the context of CUDA kernel optimization, effective utilization of shared memory can significantly improve performance by minimizing global memory accesses and increasing data locality.
__shfl_down_sync(): The `__shfl_down_sync()` function is a CUDA intrinsic used for thread communication in a warp, allowing threads to share data efficiently across different thread indices. This function enables a thread to access the value from another thread at a lower index within the same warp, facilitating data exchange without the overhead of global memory accesses. This is essential for optimizing performance in parallel computing tasks where minimizing latency and maximizing throughput are critical.
Thread coarsening: Thread coarsening is a technique used in parallel computing to increase the workload per thread by grouping multiple operations into a single thread's execution. This approach can help reduce the overhead of managing numerous threads, ultimately improving performance and efficiency in applications, especially in CUDA programming. By reducing the total number of threads and increasing the work each thread performs, thread coarsening can lead to better resource utilization and lower latency in executing kernels.
Thread divergence: Thread divergence refers to a situation in parallel computing, particularly in GPU programming, where threads within the same warp (group of threads) follow different execution paths due to conditional statements. This can lead to inefficient execution because the GPU has to serialize the execution of threads that take different paths, reducing the potential parallelism and overall performance.
Tiling Techniques: Tiling techniques refer to a method of optimizing data locality in parallel computing, particularly in GPU programming. By dividing large data sets into smaller, more manageable tiles, these techniques enhance memory access patterns, leading to improved performance and efficiency when executing kernels. Effective tiling can significantly reduce memory bandwidth usage and improve cache utilization.
Unified memory: Unified memory is a memory management model that allows both the CPU and GPU to access a single, shared memory space. This approach simplifies data management, enabling developers to write code without needing to explicitly manage data transfers between the two processors. Unified memory helps improve performance and efficiency in parallel computing environments by reducing the overhead of memory allocation and data movement.
Up-sweep phase: The up-sweep phase is a crucial step in parallel reduction algorithms, where data elements are combined in a binary tree fashion to compute partial results. This phase efficiently reduces the number of data elements by summing them up, ultimately preparing for the final output. The up-sweep phase plays a significant role in optimizing CUDA kernel performance by minimizing memory accesses and maximizing computational efficiency.
Zero-copy memory: Zero-copy memory is a technique used in computing that allows data to be transferred between different components of a system without the need for intermediate copies, which can enhance performance and reduce latency. This method is especially beneficial in parallel computing environments, as it minimizes the overhead associated with data movement, allowing for more efficient processing and utilization of resources.