Abstract
This paper presents new image descriptors based on color, texture, shape, and wavelets for object and scene image classification. First, a new three-dimensional Local Binary Patterns (3D-LBP) descriptor, which produces three new color images, is proposed for encoding both the color and texture information of an image. The 3D-LBP images, together with the original color image, then undergo the Haar wavelet transform, followed by computation of the Histograms of Oriented Gradients (HOG) to encode shape and local features. Second, a novel H-descriptor, which integrates the 3D-LBP and the HOG of its wavelet transform, is presented to encode color, texture, shape, and local information. Feature extraction for the H-descriptor is performed with Principal Component Analysis (PCA) and the Enhanced Fisher Model (EFM), and classification uses the nearest neighbor rule. Finally, an innovative H-fusion descriptor is proposed that fuses the PCA features of the H-descriptors in seven color spaces to further incorporate color information. Experimental results on three datasets (the Caltech 256 object categories dataset, the UIUC Sports Event dataset, and the MIT Scene dataset) show that the proposed descriptors achieve better image classification performance than other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), Spatial Envelope, Color SIFT four Concentric Circles (C4CC), Object Bank, Hierarchical Matching Pursuit, and LBP.
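As a rough illustration of two building blocks named in the abstract, the sketch below computes the classic 8-neighbor LBP code and a one-level 2D Haar decomposition on a small grayscale array. It is not the paper's 3D-LBP or H-descriptor, whose definitions the abstract does not give. The function names `lbp_image` and `haar_level1`, the bit ordering, and the unnormalized averaging form of the Haar filters are all assumptions made for this example.

```python
# Illustrative sketch only: classic 3x3 LBP and a one-level 2D Haar transform.
# This is NOT the paper's 3D-LBP or H-descriptor; names and conventions are assumed.


def lbp_image(img):
    """Return classic 8-neighbor LBP codes for the interior pixels of a 2D list."""
    # Neighbor offsets, clockwise from the top-left. The bit order is an assumption.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(img) - 1):
        row_codes = []
        for c in range(1, len(img[0]) - 1):
            center = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if img[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def haar_level1(img):
    """One-level 2D Haar decomposition of a 2D list with even height and width.

    Uses the unnormalized averaging form (divide by 4), an assumption for this
    sketch. Returns (approx, col_diff, row_diff, diag) sub-bands.
    """
    approx, col_diff, row_diff, diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, c_row, r_row, d_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]          # top-left, top-right
            s, t = img[r + 1][c], img[r + 1][c + 1]  # bottom-left, bottom-right
            a_row.append((p + q + s + t) / 4)  # local average
            c_row.append((p - q + s - t) / 4)  # difference across columns
            r_row.append((p + q - s - t) / 4)  # difference across rows
            d_row.append((p - q - s + t) / 4)  # diagonal difference
        approx.append(a_row)
        col_diff.append(c_row)
        row_diff.append(r_row)
        diag.append(d_row)
    return approx, col_diff, row_diff, diag
```

In a pipeline like the one described, such codes and sub-bands would be computed per color channel and then summarized by gradient histograms; that step, along with PCA, EFM, and nearest neighbor classification, is left out here.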
Original language | English (US) |
---|---|
Pages (from-to) | 173-185 |
Number of pages | 13 |
Journal | Neurocomputing |
Volume | 117 |
State | Published - Oct 6 2013 |
Externally published | Yes |
All Science Journal Classification (ASJC) codes
- Computer Science Applications
- Cognitive Neuroscience
- Artificial Intelligence
Keywords
- H-descriptor
- H-fusion descriptor
- Object and scene image classification
- Pyramid histograms of visual words (PHOW)
- Scale invariant feature transform (SIFT)
- Three dimensional local binary patterns (3D-LBP) descriptor