Contest description

In Pattern Recognition, statistical and structural methods have been traditionally considered as two rather separate worlds. However, many attempts at reducing the gap between these two approaches have been done. The idea inspiring these attempts is that of preserving the advantages of an expressive structural representation (such as a graph), while using most of the powerful, vector-based algorithms from Statistical Pattern Recognition.

A possible approach to this issue has been recently suggested by Graph Embedding. The latter is a methodology aimed at representing a whole graph (possibly with attributes attached to its nodes and edges) as a point in a suitable vectorial space. Of course the relevant property is that of preserving the similarity of the graphs: the more two graphs are considered similar, the closer the corresponding points in the space should be.

Graph embedding, in this sense, is a real bridge joining the two worlds: once the object at hand has been described in terms of graphs, and the latter represented in the vectorial space, all the problems of matching, learning and clustering can be performed using classical Statistical Pattern Recognition algorithms.

The proposed contest aims at evaluating the effectiveness of the embedding process in the contest of a classical Pattern Recognition application.

Contest data and performance evaluation

The contest will be performed on three databases of graphs obtained from different sets of 2D images. Each database will be partitioned into a training set, available to the participants, and a test set, that will be disclosed only at the moment of the performance evaluation.

Participants will have to submit a command line application (all the most popular operating systems will be supported) that takes an input file containing a set of graphs and produces an output file. Two kinds of output formats will be supported:

  • for explicit graph embedding , the output file must contain a set of vectors (one vector per graph). The cardinality of the output vectors is chosen by the application, with the only constraint that all the vectors in an output file are required to have the same cardinality.
  • For implicit graph embedding , the output file must contain the scalar product of each possible pair of graphs. Hence the application does not need to compute an actual vector, and can use for instance a graph kernel to define a scalar product.

Each database will be divided into classes on the basis of the contents of the images used to generate the graphs. For performance evaluation, the vectors produced by applying each algorithm to the test sets will be grouped into clusters according to the corresponding class, and then a cluster validity index will be computed. To this aim the euclidean distance between the vectors will be used (for the implicit graph embedding, the euclidean distance can be easily computed from the scalar products). The cluster validity index will be chosen so as to be independent of the absolute scale of the vectors.

The cluster validity index will be averaged over the three databases; then the algorithms will be ranked according to the values assumed by a performance function (known at the contest time) which depends on the cluster validity index and the execution time (so as to take into account both the classification results and the computational cost, although this latter will have a lower weight than the former).

Participants will receive, together with the graphs composing the training sets and a detailed description of the input and output file formats, a program that computes the actual performance index on the training set, in order to check and tune their algorithm.