RICA: Real-Time Image Captioning Application

Suraj Dahake; Aditya Ohekar; Shubham Ilag; Aasim Shah

doi:XX.XXX/IJARIIT-V7I3-1729

This paper is published in Volume-7, Issue-3, 2021

Area

Information Technology

Author

Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah

Org/Univ

Datta Meghe College of Engineering, Navi Mumbai, Maharashtra, India

Pub. Date

10 June, 2021

Paper ID

V7I3-1729

Publisher

IJARIIT

Edition

Volume-7, Issue-3, 2021

Keywords

Real-Time, Captioning, Image, CNN, Tensorflow

Citations

IEEE
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah. RICA: Real-Time Image Captioning Application, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah (2021). RICA: Real-Time Image Captioning Application. International Journal of Advance Research, Ideas and Innovations in Technology, 7(3) www.IJARIIT.com.

MLA
Suraj Dahake, Aditya Ohekar, Shubham Ilag, Aasim Shah. "RICA: Real-Time Image Captioning Application." International Journal of Advance Research, Ideas and Innovations in Technology 7.3 (2021). www.IJARIIT.com.

Abstract

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. Image caption generation draws on both fields to recognize the context of an image and describe it in a natural language such as English. Recent advances in deep-learning-based machine translation and computer vision have led to excellent image captioning models that use advanced techniques such as deep reinforcement learning. Although these models are very accurate, they often rely on expensive computation hardware, which makes them difficult to apply in real-time scenarios, where their practical applications can be realized. In this paper, we review the core concepts of image captioning and its common approaches, and present our simple encoder-decoder implementation with significant modifications and optimizations that enable it to run on the low-end hardware of hand-held devices. We also compare our results, evaluated using various metrics, with state-of-the-art models and analyze why and where our model, trained on the MSCOCO dataset, falls short because of the trade-off between computation speed and quality. Using Google's TensorFlow framework, we also implement a first-of-its-kind Android application to demonstrate the real-time applicability and optimizations of our approach.
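
As a rough illustration of the encoder-decoder idea mentioned in the abstract, the sketch below shows greedy caption decoding in plain Python. It is not the authors' implementation: the abstract does not describe the decoding strategy, so greedy search is an assumption, and `greedy_caption`, `toy_step`, and the `<start>`/`<end>` tokens are hypothetical names chosen for this example. A real system would replace `toy_step` with a trained decoder that scores vocabulary words from the encoder's image features.

```python
from typing import Callable, Dict, List, Sequence

START, END = "<start>", "<end>"  # hypothetical sentinel tokens

# A decoder step maps (image features, tokens so far) to a score per word.
StepFn = Callable[[Sequence[float], List[str]], Dict[str, float]]


def greedy_caption(features: Sequence[float], step: StepFn,
                   max_len: int = 20) -> List[str]:
    """Build a caption by picking the highest-scoring word at each step."""
    tokens = [START]
    for _ in range(max_len):
        scores = step(features, tokens)
        next_word = max(scores, key=scores.get)
        if next_word == END:
            break
        tokens.append(next_word)
    return tokens[1:]  # drop the start token


def toy_step(features: Sequence[float], tokens: List[str]) -> Dict[str, float]:
    """Stand-in for a trained decoder (assumption): it ignores the image
    features and replays a fixed script so the loop can run end to end."""
    script = ["a", "dog", "on", "grass", END]
    target = script[min(len(tokens) - 1, len(script) - 1)]
    return {word: (1.0 if word == target else 0.0) for word in script}


if __name__ == "__main__":
    print(" ".join(greedy_caption([0.0], toy_step)))
```

Greedy search needs only one decoder call per generated word, which is one reason it is often considered for constrained devices; beam search usually gives better captions at a higher computational cost.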
