Quantization and pruning are two key techniques for optimizing deep learning models for edge devices. Quantization represents weights (and often activations) at lower numerical precision, for example 8-bit integers instead of 32-bit floats. Pruning removes weights or structures that contribute little to the model's output. Both reduce model size and computational cost, enabling efficient inference on resource-constrained hardware such as smartphones and IoT sensors. This unit covers quantization methods, pruning strategies, and the metrics used to measure efficiency. It also explores hardware considerations, practical applications, and future trends in model compression for edge AI. These concepts are essential for deploying AI in real-world edge computing scenarios.
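The sketch below makes the size savings concrete. It uses only the Python standard library and shows two common variants: symmetric per-tensor int8 quantization and unstructured magnitude pruning. The function names and the toy weight list are hypothetical and exist only for illustration. Production toolchains add calibration, per-channel scales, and fine-tuning, all of which this sketch omits.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w approx q * scale, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid division by zero
    # Round to the nearest integer code and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


def magnitude_prune(weights, sparsity):
    """Unstructured magnitude pruning: zero the `sparsity` fraction of
    weights with the smallest absolute value."""
    k = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


# Hypothetical toy weights for a single layer.
weights = [0.5, -1.27, 0.02, 0.9, -0.03, 0.0, 1.0, -0.4]

q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
max_error = max(abs(a - w) for a, w in zip(approx, weights))

fp32_bytes = 4 * len(weights)  # 32-bit floats: 4 bytes per weight
int8_bytes = 1 * len(weights)  # 8-bit codes: 1 byte per weight (plus one shared scale)

pruned = magnitude_prune(weights, sparsity=0.5)

print(f"scale={scale:.4f}, max reconstruction error={max_error:.4f}")
print(f"weight storage: {fp32_bytes} B (fp32) -> {int8_bytes} B (int8)")
print(f"nonzero weights after pruning: {sum(w != 0 for w in pruned)}/{len(pruned)}")
```

In this example, quantization cuts weight storage by 4x. The cost is a rounding error of at most half the scale per weight. Pruning halves the number of nonzero weights. However, those zeros save memory or compute only when they are stored in a sparse format or executed on kernels or hardware that skip them.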