Abstract
AI-driven image colorization has gained significant attention, particularly for restoring historical black-and-white photographs. This study investigates whether modified U-Net architectures, enhanced with multi-attention mechanisms and pre-trained embeddings, improve the quality and accuracy of image colorization. By comparing plain U-Net variants with Generative Adversarial Network (GAN)-integrated counterparts, we evaluate how well these architectures generate realistic colorized images and assess the impact of adversarial training on colorization outcomes.
Proposed Method
This research employed a quantitative and qualitative experimental design to evaluate how different U-Net variations impact image colorization. We implemented six models divided into two groups:
- Group A (Non-GAN): Plain U-Net, U-Net + MobileNetV3, and U-Net + Multi-Attention.
- Group B (GAN-integrated): The same three architectures with an integrated PatchGAN discriminator.
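The paper does not include the discriminator code, but the PatchGAN used in Group B can be sketched as follows. This is a minimal PyTorch sketch assuming the common pix2pix-style layer progression (64→128→256→512 filters); the exact channel counts and depth used in the study are not stated and are illustrative here. The discriminator receives the lightness channel concatenated with the predicted (or real) A/B channels, so it sees 3 input channels.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: instead of one real/fake score per
    image, it outputs a grid of logits, one per overlapping image patch."""

    def __init__(self, in_channels=3):  # L channel + A/B channels = 3
        super().__init__()

        def block(c_in, c_out, stride=2, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4,
                                stride=stride, padding=1, bias=not norm)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.net = nn.Sequential(
            *block(in_channels, 64, norm=False),  # 128 -> 64
            *block(64, 128),                      # 64 -> 32
            *block(128, 256),                     # 32 -> 16
            *block(256, 512, stride=1),           # 16 -> 15
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),  # one logit per patch
        )

    def forward(self, x):
        return self.net(x)
```

For a 128x128 input this yields a 14x14 grid of patch logits; the adversarial loss averages a binary cross-entropy term over this grid.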
The models were trained on 10,000 images from the COCO dataset. All images were pre-processed by resizing to 128x128 pixels and converting to the perceptually uniform LAB color space, which separates lightness (L) from color information (A, B). This separation suits colorization well: the network receives the L channel as input and only needs to predict the two color channels.
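The pre-processing step above can be sketched as follows. This is a minimal sketch using scikit-image (the study's actual pipeline and normalization constants are not specified; the [-1, 1] scaling shown here is a common convention, not taken from the paper):

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.transform import resize

def preprocess(rgb_image):
    """Resize an RGB image to 128x128, convert to LAB, and split it into
    the model input (L channel) and the prediction target (A, B channels)."""
    rgb = resize(rgb_image, (128, 128), anti_aliasing=True)  # floats in [0, 1]
    lab = rgb2lab(rgb)               # L in [0, 100]; A, B roughly in [-128, 127]
    L = lab[..., 0:1] / 50.0 - 1.0   # scale lightness to [-1, 1]
    ab = lab[..., 1:] / 128.0        # scale color channels to roughly [-1, 1]
    return L, ab
```

At inference time the predicted A/B channels are rescaled, recombined with the input L channel, and converted back to RGB for display.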
Results & Discussion
The models were trained for 25 epochs using the Adam optimizer. Performance was evaluated using PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index Measure). The results showed that the **Plain U-Net model achieved the best performance**, with a PSNR of **24.198 dB** and an SSIM of **0.9153**, outperforming even the more complex GAN-integrated architectures.
This suggests that, although adversarial training is theoretically beneficial, integrating a GAN did not guarantee improved performance in our setup: the GAN-integrated models produced slightly lower quantitative scores, likely due to the increased computational load and the training instabilities that adversarial objectives can introduce.