3.3 Complexity control in overlapping stochastic block models (Pierre Latouche)

Share:

Listens: 0

StatLearn 2012 - Workshop on "Challenging problems in Statistical Learning"

Education


Networks are highly used to represent complex systems as sets of interactions between units of interest. For instance, regulatory networks can describe the regulation of genes with transcriptional factors while metabolic networks focus on representing pathways of biochemical reactions. In social sciences, networks are commonly used to represent relational ties between actors. Numerous graph clustering algorithms have been proposed since the earlier work of Moreno [2]. Most of them partition the vertices into disjoint clusters depending on their connection profiles. However, recent studies showed that these techniques were too restrictive since most existing networks contained overlapping clusters. To tackle this issue, we proposed the Overlapping Stochastic Block Model (OSBM) in [1]. This approach allows the vertices of a network to belong to multiple classes and can be seen as a generalization of the stochastic block model [3]. In [1], we developed a variational method to cluster the vertices of networks and showed that the algorithm had good clustering performances on both simulated and real data. However, no criterion was proposed to estimate the number of classes from the data, which is a major issue in practice. Here, we tackle this limit using a Bayesian framework. Thus, we introduce some priors over the model parameters and consider variational Bayes methods to approximate the full posterior distribution. We show how a model selection criterion can be obtained in order to estimate the number of (overlapping) clusters in a network. On both simulated and real data, we compare our work with other approaches.