Using manifold structure for automatic image annotation by fusion of multiple feature spaces

Mohammad Ali Zare Chahooki, Hamid Kargar Shooroki


Automatic image annotation has been an active research topic in recent years. Images are generally represented by low-level features such as color, texture, and shape, as well as spatial relations between objects. These features are then used to retrieve images from large image data sets. In many situations, however, similarity measures such as Euclidean distance fail to capture the true similarity of images. Graph models, on the other hand, have proven powerful in solving many machine learning problems in recent years. In this paper, we propose a graph-based learning approach, named Conceptual Manifold Structure (CMS), based on the transition from conceptual space to observation space. In the proposed method, a graph that includes both the training and test samples is constructed by fusing multiple feature spaces. The conceptual transition in the graph structure is obtained by altering the edge values in a novel manner. This leads to learning a manifold structure in which the dissimilarity between samples is closer to their conceptual distance. Furthermore, the continuity between the instances of a semantic concept in the conceptual space is preserved in the feature space. Preserving this continuity in the manifold structure is the main idea used in this study to reduce the semantic gap. Experiments on different image data sets indicate that the geometrical distances between samples on the manifold are closer to their conceptual distances. The proposed method is compared with other well-known approaches, and the results confirm its effectiveness and validity.
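
The general pipeline described above (fuse several feature spaces into one graph over training and test samples, alter edge values using label information, then measure distances along the graph) can be illustrated with a minimal sketch. This is not the CMS algorithm itself: the abstract does not specify how the edge values are altered, so the fusion by averaging normalized distance matrices, the k-nearest-neighbor graph, the rule that scales edges between same-label training samples by a constant `factor`, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def fused_distances(feature_spaces):
    """Average the per-space Euclidean distance matrices (assumed fusion rule).

    Each matrix is scaled to [0, 1] first so no feature space dominates.
    """
    n = feature_spaces[0].shape[0]
    fused = np.zeros((n, n))
    for X in feature_spaces:
        diff = X[:, None, :] - X[None, :, :]
        D = np.sqrt((diff ** 2).sum(axis=-1))
        if D.max() > 0:
            D = D / D.max()
        fused += D
    return fused / len(feature_spaces)

def knn_graph(D, k):
    """Symmetric k-nearest-neighbor graph; missing edges are np.inf."""
    n = D.shape[0]
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    for i in range(n):
        order = [j for j in np.argsort(D[i]) if j != i][:k]
        for j in order:
            W[i, j] = D[i, j]
            W[j, i] = D[j, i]
    return W

def shrink_same_label_edges(W, labels, factor=0.5):
    """Hypothetical edge alteration: shorten edges between training samples
    that share a label. Unlabeled (test) samples are marked with -1."""
    W = W.copy()
    labeled = labels >= 0
    same = labeled[:, None] & labeled[None, :] & (labels[:, None] == labels[None, :])
    W[same & np.isfinite(W)] *= factor
    return W

def geodesic_distances(W):
    """All-pairs shortest paths on the graph (Floyd-Warshall)."""
    G = W.copy()
    for k in range(G.shape[0]):
        G = np.minimum(G, G[:, [k]] + G[[k], :])
    return G
```

Under these assumptions, a test sample could be annotated with the label of its nearest training sample according to `geodesic_distances(shrink_same_label_edges(knn_graph(fused_distances(spaces), k), labels))`, where `spaces` is a list of feature matrices with one row per sample.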


Automatic image annotation, Manifold structure, Graph learning



