This paper is published in Volume-7, Issue-5, 2021
Area
Computer Science
Author
Monesh S.
Org/Univ
Anna University, Chennai, Tamil Nadu, India
Pub. Date
16 September, 2021
Paper ID
V7I5-1219
Publisher
Keywords
Bi-Directional GAN, Scene Graph, Faster CNN, Visual Relationship Prediction, Computer Vision, Deep Learning, Visual Genome Dataset

Citations

IEEE
Monesh S. Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Monesh S. (2021). Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit. International Journal of Advance Research, Ideas and Innovations in Technology, 7(5). www.IJARIIT.com.

MLA
Monesh S. "Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit." International Journal of Advance Research, Ideas and Innovations in Technology 7.5 (2021). www.IJARIIT.com.

Abstract

Visual relationship detection is an intermediate image understanding task that detects two objects and classifies the predicate describing the relationship between them in an image. Relations among entities play a central role in image understanding. Because modeling (subject, predicate, object) relation triplets is complex, it is crucial to develop a model that can not only recognize seen relations but also generalize to unseen cases. Inspired by a previously proposed visual translation embedding model, the context-augmented translation embedding model can capture both common and rare relations. The previous visual translation embedding model maps entities and predicates into a low-dimensional embedding vector space and learns the embeddings guided by a constraint that relates the predicate to union(subject, object). In addition, the framework is combined with a language model that learns which relationships are more suitable between pairs of object classes. The model solves the scene graph inference problem using a standard Bi-GRU model and learns to iteratively improve its predictions. Our joint model can take advantage of contextual cues to make better predictions about objects and their relationships in an image. This work explicitly models objects and their relationships using scene graphs, a visually grounded graphical structure of an image in which nodes represent objects and directed edges represent relationships. The experiments show that our model significantly outperforms previous methods at generating scene graphs on the Visual Genome dataset.
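
To make the scene graph structure described in the abstract concrete, the following is a minimal sketch of how such a graph could be held in memory, with objects as nodes and (subject, predicate, object) triplets as directed, labeled edges. The class names `SceneGraph` and `Relation`, the method names, and the example labels are illustrative assumptions and do not come from the paper; the sketch covers only the data structure, not the detection or Bi-GRU inference steps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject node -> object node, labeled with a predicate.

    Hypothetical helper type; indices refer to positions in SceneGraph.objects.
    """
    subject: int
    predicate: str
    obj: int


@dataclass
class SceneGraph:
    """Hypothetical container: nodes are object labels, edges are relations."""
    objects: List[str] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Add a node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def add_relation(self, subject: int, predicate: str, obj: int) -> None:
        """Add a directed edge between two existing nodes."""
        for idx in (subject, obj):
            if not 0 <= idx < len(self.objects):
                raise IndexError(f"no object with index {idx}")
        self.relations.append(Relation(subject, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return every edge as a (subject, predicate, object) label triplet."""
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
        ]


# Example labels are made up for illustration.
graph = SceneGraph()
man = graph.add_object("man")
horse = graph.add_object("horse")
hat = graph.add_object("hat")
graph.add_relation(man, "riding", horse)
graph.add_relation(man, "wearing", hat)
print(graph.triplets())
```

Storing relations as index pairs rather than label pairs keeps two instances of the same object class (for example, two people in one image) distinct, which a scene graph generator would likely need.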