Efficiently tracking topic subgraph conductance in social networks: new paper accepted to VLDB

Blog Entry by: Amitabha Bagchi

I am happy to report that our paper on space-efficient tracking of meme-focussed subgraphs of social networks has been accepted to VLDB. The title of the paper is “Tracking the Conductance of Rapidly Evolving Topic-Subgraphs” and the story of it is something like this: A prominent line of research (including our own papers like this paper published at CIKM 2013, and, more recently, this paper which is currently under submission) has been arguing that it is possible to determine which memes are about to go viral in a social network by looking at the topological properties of the subgraph induced by the users talking about that meme. Specifically, in our CIKM 2013 paper we found that just around the time a topic is about to go viral, the conductance of the topic-subgraph takes a dip. So, if we compute the conductance of various topic-subgraphs that are evolving in the network we can spot the ones that go viral early. But how do we maintain this massive number of topic-subgraphs space efficiently and compute their conductance as they evolve? That is the question we try to address in this paper.

But first the question: what is conductance?

The conductance of a directed graph, ${G = (V,E)}$ is an isoperimetric quantity that provides a lower bound on the ratio of the outdegree of any set of vertices to their total degree. Formally, for any ${S \subseteq V}$, we define

$\displaystyle \phi_G(S) \triangleq \frac{\left|\{(u,v) \in E: u \in S, v \in V\setminus S\}\right|}{\left|\{(u,v) \in E : u \in S\}\right|},$

and we define the conductance of the graph ${G}$ as

$\displaystyle \phi_G \triangleq \min_{S \subset V} \phi_G(S).$

Conductance is a measure of how “expansive” or “close knit” a set of vertices of a graph is: higher conductance implies more outward connections for any cluster of vertices, lower conductance implies a more inward-looking cluster. As a result this quantity has very naturally found great applications in graph clustering (Kannan et. al., 2004). Since the conductance of a cluster can be easily measured in a setting where the graph can be efficiently stored, conductance has been widely used to measure the quality of a variety of clustering algorithms (see e.g., Leskovec et. al., 2010  for a survey of clustering algorithms that use conductance as a quality measure.) In fact, the clustering coefficient, another popular and natural measure of the goodness of clustering, has also been shown to be tightly related to the graph conductance for certain classes of networks, which reinforces the fundamental nature of conductance as a measure of cluster quality (Gleich and Seshadri, 2012). All this, and on top of that the recent discovery that the change in conductance is a good predictor of future virality.

But the question of how to compute conductance in a scenario where there are a large number of evolving topic-graphs to be maintained in parallel remains. And it is this problem, a vital bridge between the observations made in the literature and a possible real-time system that might put those observations into play, that we have tried to address in our new paper.

In doing so, we have used Bloom filters to come up with a new and compact representation of the adjacency lists of the nodes of a graph.  We call this new representation a BloomGraph. Given a node, the BloomGraph can’t return the list of its neighbours but it can tell you if another node is in the neighbourhood with one caveat, there may be false positives, i.e., it may say that two nodes are neighbours when in fact they are not. Does this affect the conductance computation? Can we give some theoretical bounds on how good or bad our approximate conductance is?  Have we done experiments on real data sets that show that the BloomGraph works well in practice? Answers: Yes it does, yes we can and yes we have. To find out the details, check out the paper.

At the IIT D side, the people involved were Sainyam Galhotra, a former BTech, Maya Ramanath and myself. Outside IIT D, we had Srikanta Bedathur from IBM IRL and Vidit Jain from American Express Big Data Labs. The paper will be presented in VLDB 2016 which is in New Delhi.