Sparse Attribute Dictionary




We present an approach for dictionary learning of action attributes via information maximization. We unify class distribution and appearance information into a single objective function for learning a sparse attribute dictionary. The objective function maximizes the mutual information between what has been learned and what remains to be learned, in terms of both appearance information and class distribution, for each dictionary atom. We propose a Gaussian Process (GP) model for sparse representation to optimize this objective. The sparse coding property allows a kernel with compact support in the GP to yield a very efficient dictionary learning process. Hence we can describe a signal, such as an image or a video, by a compact and discriminative set of attributes. More importantly, we can recognize modeled signal categories in a sparse feature space, which generalizes to unseen and unmodeled visual categories. Experimental results demonstrate the effectiveness of our approach in action recognition applications.
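To make the selection idea above concrete, here is a minimal numpy sketch of greedy, GP-based atom selection: each candidate atom is scored by how uncertain it is given the atoms already selected relative to how uncertain it is given the remaining atoms, a standard GP mutual-information surrogate. The function names and the RBF kernel choice are illustrative assumptions, and the class-distribution term of the full objective is omitted for brevity; this is a sketch of the appearance part only, not the paper's exact algorithm.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # RBF (squared-exponential) kernel between rows of X; a compact-support
    # kernel could be substituted here, as the paper suggests, for efficiency.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conditional_variance(K, idx, j, noise=1e-6):
    # GP posterior variance of atom j conditioned on the atoms indexed by idx.
    if not idx:
        return K[j, j]
    Kaa = K[np.ix_(idx, idx)] + noise * np.eye(len(idx))
    Kja = K[j, idx]
    return K[j, j] - Kja @ np.linalg.solve(Kaa, Kja)

def select_atoms(D, k, gamma=1.0):
    # Greedily pick k atoms (rows of D): maximize the ratio of the variance of
    # atom j given the selected set to its variance given the unselected rest,
    # i.e. prefer atoms that are novel w.r.t. what has been learned but
    # representative of what remains to be learned.
    n = D.shape[0]
    K = rbf_kernel(D, gamma)
    selected, rest = [], list(range(n))
    for _ in range(k):
        scores = {}
        for j in rest:
            others = [i for i in rest if i != j]
            num = conditional_variance(K, selected, j)
            den = conditional_variance(K, others, j) + 1e-12
            scores[j] = num / den
        j_star = max(scores, key=scores.get)
        selected.append(j_star)
        rest.remove(j_star)
    return selected
```

The greedy loop is O(k · n) conditional-variance evaluations; a compact-support kernel makes each evaluation cheap because the conditioning set effectively shrinks to nearby atoms.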




Sparsecode Plot Figure: Sparse representations of four actions (two known and two unknown to the attribute dictionary) using attribute dictionaries learned by different methods. Each action is performed by two different humans. For visualization purposes, each waveform shows the average of the sparse codes of all frames in an action sequence. We learned attribute dictionaries using several methods, including our approach, the Maximization of Entropy (ME) approach, the Liu-Shah approach [2], and the K-means approach. A compact and discriminative attribute dictionary should encourage actions from the same class to be described by a similar set of attributes, i.e., similar sparse codes. The attribute dictionary learned by our approach produces similar waveforms, i.e., consistent sparse representations, for action sequences of the same class.
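The per-sequence waveforms above are averages of per-frame sparse codes. A minimal sketch of that pipeline, assuming unit-norm dictionary atoms and using a small Orthogonal Matching Pursuit (OMP) coder in place of whatever sparse solver was actually used:

```python
import numpy as np

def omp(D, x, n_nonzero):
    # Orthogonal Matching Pursuit: greedy sparse code of x over dictionary D
    # (atoms as columns, assumed unit-norm so correlations are comparable).
    residual = x.copy()
    idx = []
    code = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in idx:
            idx.append(j)
        # Re-fit coefficients on all selected atoms (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        residual = x - D[:, idx] @ coef
    code[idx] = coef
    return code

def sequence_descriptor(D, frames, n_nonzero=3):
    # Describe a whole action sequence by averaging its per-frame sparse
    # codes, as in the waveform visualization above.
    return np.mean([omp(D, f, n_nonzero) for f in frames], axis=0)
```

With a compact and discriminative dictionary, two sequences of the same action class should yield nearby descriptors under this averaging.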


gesture recognition accuracy Figure: Recognition accuracy on the Keck gesture dataset with different features and dictionary sizes (shape and motion are global features; STIP is a local feature). In all cases, the proposed MMI-2 (red line) outperforms the rest.


UCF sports action recognition accuracy Figure: Confusion matrix for the UCF sports action dataset. We obtain 83.6% average recognition accuracy.
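The average recognition accuracy is the mean of the per-class recalls, i.e., the mean of the diagonal of the row-normalized confusion matrix. A small sketch, assuming integer class labels (the function names are illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    # Rows are ground-truth classes, columns are predicted classes.
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C

def average_class_accuracy(C):
    # Mean per-class recall: diagonal of the row-normalized confusion matrix.
    return float(np.mean(np.diag(C) / C.sum(axis=1)))
```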


Demos pdf

  • Demo 1: dictionary atom selection

    dict init Figure: Dictionary initialized using k-SVD [3]


    dict mmi Figure: Dictionary atom selected using MMI [1]


    dict mmi2 Figure: Dictionary atom selected using MMI2 [1]


    dict me Figure: Dictionary atom selected using ME [1]


    dict Liu-Shah Figure: Dictionary atom selected using Liu-Shah [2]


  • Demo 2: video summarization

    gesture seq Figure: A gesture video sequence


    summary mmi Figure: video summarization using MMI [1]


  • Demo 3: shape sampling

    mpeg7 shape Figure: 10 classes from MPEG shape dataset


    mpeg7 mmi Figure: Top-10 shapes sampled using MMI [1]


    mpeg7 me Figure: Top-10 shapes sampled using ME [1]


    If you use the source code or other files provided on this webpage, please cite the following paper:

    Qiang Qiu, Zhuolin Jiang, and Rama Chellappa, "Sparse Dictionary-based Representation and Recognition of Human Action Attributes," International Conference on Computer Vision (ICCV), 2011. pdf

    If you have any questions about this source code, please contact: Sam Q. Qiu (qiu@cs.umd.edu)



    References

    [1] Qiang Qiu, Zhuolin Jiang, and Rama Chellappa. Sparse Dictionary-based Representation and Recognition of Human Action Attributes. In ICCV, 2011.

    [2] J. Liu and M. Shah. Learning human actions via information maximization. In CVPR, 2008.

    [3] M. Aharon, M. Elad, and A. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. on Signal Process., 54(11):4311–4322, 2006.