Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit

Monesh S.

doi:XX.XXX/IJARIIT-V7I5-1219

This paper is published in Volume-7, Issue-5, 2021

Paper Details

Area

Computer Science

Author

Monesh S.

Org/Univ

Anna University, Chennai, Tamil Nadu, India

Pub. Date

16 September, 2021

Paper ID

V7I5-1219

Publisher

IJARIIT

Edition

Volume-7, Issue-5, 2021

Keywords

Bi-Directional GRU, Scene Graph, Faster R-CNN, Visual Relationship Prediction, Computer Vision, Deep Learning, Visual Genome Dataset

Citations

IEEE
Monesh S. Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit, International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.

APA
Monesh S. (2021). Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit. International Journal of Advance Research, Ideas and Innovations in Technology, 7(5). www.IJARIIT.com.

MLA
Monesh S. "Visual relationship detection and Scene Graph Generation using the Bi-Directional Gated Recurrent Unit." International Journal of Advance Research, Ideas and Innovations in Technology 7.5 (2021). www.IJARIIT.com.


Abstract

Visual relationship detection is an intermediate image understanding task that detects two objects in an image and classifies the predicate that describes the relationship between them. Relations among entities play a central role in image understanding. Because modeling (subject, predicate, object) relation triplets is complex, it is crucial to develop a model that can not only recognize seen relations but also generalize to unseen cases. Inspired by a previously proposed visual translation embedding model, the context-augmented translation embedding model can capture both common and rare relations. The previous visual translation embedding model maps entities and predicates into a low-dimensional embedding vector space and learns the embeddings guided by a constraint that relates the predicate to union(subject, object). In addition, the framework is combined with a language model that learns which relationships are more suitable between pairs of object classes. The model solves the scene graph inference problem using a standard Bi-GRU model and learns to iteratively improve its predictions. Our joint model can take advantage of contextual cues to make better predictions of objects and their relationships from an image. This work explicitly models the objects and their relationships using scene graphs, a visually grounded graphical structure of an image. In the graph, the nodes represent objects and the directed edges represent relationships. The experiments show that our model significantly outperforms previous methods at generating scene graphs on the Visual Genome dataset.
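
As an illustration only, the scene graph structure described in the abstract (objects as nodes, relationships as directed edges, each relationship a (subject, predicate, object) triplet) might be sketched in Python as follows. The class name `SceneGraph`, its methods, and the example labels are hypothetical and are not taken from the paper; the sketch covers only the data structure, not the detection or Bi-GRU inference steps.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical container: objects are nodes, relationships are directed edges."""

    objects: List[str] = field(default_factory=list)
    # Each edge is stored as (subject_index, predicate, object_index).
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, label: str) -> int:
        """Add a node and return its index."""
        self.objects.append(label)
        return len(self.objects) - 1

    def relate(self, subj: int, predicate: str, obj: int) -> None:
        """Add a directed edge running from the subject node to the object node."""
        self.edges.append((subj, predicate, obj))

    def triplets(self) -> List[Tuple[str, str, str]]:
        """Return every relationship as a (subject, predicate, object) triplet."""
        return [(self.objects[s], p, self.objects[o]) for s, p, o in self.edges]


# Example labels are made up for illustration.
graph = SceneGraph()
man = graph.add_object("man")
horse = graph.add_object("horse")
hat = graph.add_object("hat")
graph.relate(man, "riding", horse)
graph.relate(man, "wearing", hat)
print(graph.triplets())
```

Storing edges as index triples keeps the direction explicit: (man, riding, horse) and (horse, riding, man) are different edges, which matches the directed-graph description above.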
