Traditional text-to-image models often cannot customize clothing items accurately and photorealistically. The limitation is most pronounced in applications where users need personalized imagery (e.g., a specific type of hoodie or jacket shown on a person) with high visual fidelity and a consistent style. Managing and processing diverse datasets, and staying compatible with evolving deep learning libraries, adds further complexity.
The Solution
The project developed a hand gesture recognition system using a VGG16-based CNN trained on more than 1,300 ASL images across 38 classes. Convolutional layers addressed the high dimensionality of the 32×32 pixel inputs and the spatial dependency between neighboring pixels. To mitigate overfitting, training used dropout, batch normalization, and data augmentation, with early stopping triggered after 25 epochs without improvement. Although training accuracy reached 100% in some epochs, validation accuracy peaked at 73.4%, so a persistent generalization gap remained. The model used a hybrid approach that combined CNN features with logistic regression on hand landmarks, and it was optimized for real-time inference (50 ms/frame) through a lightweight architecture and MediaPipe integration.
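The early-stopping rule described above (stop once validation accuracy has gone 25 epochs without improving) can be illustrated with a minimal, framework-free sketch. This is not the project's actual training code: the `EarlyStopping` class, the `run` helper, and the accuracy values are hypothetical and exist only to show the patience logic.

```python
from dataclasses import dataclass


@dataclass
class EarlyStopping:
    """Hypothetical helper: signal a stop after `patience` epochs without improvement."""
    patience: int = 25            # assumed to match the 25-epoch window in the text
    best: float = float("-inf")   # best validation accuracy seen so far
    best_epoch: int = -1          # epoch at which `best` was recorded
    stale: int = 0                # consecutive epochs without improvement

    def update(self, epoch: int, val_acc: float) -> bool:
        """Record one epoch's validation accuracy; return True when training should stop."""
        if val_acc > self.best:
            self.best, self.best_epoch, self.stale = val_acc, epoch, 0
        else:
            self.stale += 1
        return self.stale >= self.patience


def run(val_history, patience=25):
    """Replay a list of per-epoch validation accuracies.

    Returns (epoch at which the loop ended, best epoch, best accuracy).
    """
    stopper = EarlyStopping(patience=patience)
    for epoch, acc in enumerate(val_history):
        if stopper.update(epoch, acc):
            return epoch, stopper.best_epoch, stopper.best
    return len(val_history) - 1, stopper.best_epoch, stopper.best


if __name__ == "__main__":
    # Synthetic history (not the project's logs): accuracy peaks, then plateaus lower.
    history = [0.50, 0.60, 0.734] + [0.70] * 30
    print(run(history))
```

In a real training loop, `update` would be called once per epoch with the measured validation accuracy, and the weights from `best_epoch` would typically be restored after stopping.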