IIT Delhi at AAAI 2016

The thirtieth edition of AAAI’s flagship conference, AAAI-2016, will be held in February, and IIT Delhi will be well represented there. Three of the accepted papers have IIT Delhi authors. These are:

Scalable Training of Markov Logic Networks using Approximate Counting
Somdeb Sarkhel, Deepak Venugopal, Tuan Anh Pham, Parag Singla and Vibhav Gogate.

Summary: In this paper, we propose principled weight learning algorithms for Markov logic networks that can easily scale to much larger datasets and application domains than existing algorithms. The main idea in our approach is to use approximate counting techniques to substantially reduce the complexity of the most computation-intensive sub-step in weight learning: computing the number of groundings of a first-order formula that evaluate to true given a truth assignment to all the random variables. We derive theoretical bounds on the performance of our new algorithms and demonstrate experimentally that they are orders of magnitude faster than existing approaches while achieving the same or better accuracy. This paper comes from a collaboration between IITD and UT Dallas.
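To get a feel for the sub-step the summary describes, here is a toy Monte Carlo sketch of counting the groundings of a clause that are true in a given world. The predicates, domain and uniform-sampling estimator below are invented for illustration; they are not the paper's algorithm.

```python
import random

def exact_true_groundings(formula, groundings, world):
    """Exact count: evaluate the formula on every grounding (expensive)."""
    return sum(1 for g in groundings if formula(g, world))

def sampled_true_groundings(formula, groundings, world, k, seed=0):
    """Monte Carlo estimate: evaluate only k uniformly sampled groundings
    and scale the observed fraction of true ones up to the full count."""
    rng = random.Random(seed)
    sample = [rng.choice(groundings) for _ in range(k)]
    frac_true = sum(1 for g in sample if formula(g, world)) / k
    return frac_true * len(groundings)

# Toy clause: Smokes(x) & Friends(x, y) => Smokes(y), over 200 constants.
people = range(200)
world = {
    "smokes": {p for p in people if p % 3 == 0},
    "friends": {(a, b) for a in people for b in people if (a + b) % 7 == 0},
}

def clause(g, w):
    x, y = g
    return not (x in w["smokes"] and g in w["friends"]) or y in w["smokes"]

groundings = [(a, b) for a in people for b in people]

exact = exact_true_groundings(clause, groundings, world)          # 40,000 evaluations
approx = sampled_true_groundings(clause, groundings, world, 500)  # 500 evaluations
```

The point of the sketch is only the cost gap: the exact count touches every one of the 40,000 groundings, while the estimate touches 500, and this gap grows quadratically (or worse) with the domain size.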

Numerical Relation Extraction with Minimal Supervision.
Aman Madaan, Ashish Mittal, Mausam, Ganesh Ramakrishnan, Sunita Sarawagi.

Summary: Standard relation extraction focuses on relations where both arguments are entities like (Obama, president of, United States). This paper studies numerical relations, i.e., those where one argument is a quantity, e.g., (India, inflation rate, 4.41%). This is an important subclass that has not received much attention. Novel challenges arise in the case of numerical relation extraction. The paper identifies the challenges and presents two (one rule-based, and one probabilistic graphical model based) extractors for the problem. This paper comes from a collaboration between IITD and IITB.
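To illustrate the rule-based flavour of extractor in the simplest possible terms, here is a single invented surface pattern for (entity, relation, quantity) triples. This one-regex sketch is purely hypothetical and far short of the paper's system, which must handle the challenges the paper identifies (unit variations, multiple numbers per sentence, noisy distant supervision, and so on).

```python
import re

# One hypothetical surface pattern: "<Entity>'s <relation> rose/fell/... <value>"
PATTERN = re.compile(
    r"(?P<entity>[A-Z][a-zA-Z]+)'s (?P<relation>[a-z ]+?) "
    r"(?:rose|fell|stood|was|is)(?: to| at)? "
    r"(?P<value>\d+(?:\.\d+)?%?)"
)

def extract(sentence):
    """Return an (entity, relation, value) triple if the pattern fires, else None."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return (m.group("entity"), m.group("relation"), m.group("value"))

extract("India's inflation rate rose to 4.41% in September.")
# -> ('India', 'inflation rate', '4.41%')
```

Even this toy hints at why the subclass is hard: the quantity "4.41%" is not an entity one can look up in a knowledge base, so standard entity-pair distant supervision does not apply directly.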

Reactive Learning: Active Learning with Relabeling
Christopher H. Lin, Mausam, Daniel S. Weld.

Summary: Standard approaches to active learning label each data point only once. However, in a crowdsourced setting, the same training data point may be labeled multiple times to increase confidence in the annotated label. In a budget-limited active learning scenario there is thus a tradeoff between labeling new data points, which increases the size of the training set, and relabeling existing data points, which increases the accuracy of the training set. Existing algorithms such as uncertainty sampling can get stuck in infinite loops on this problem. We present a new family of algorithms called impact sampling and obtain excellent results on simulated and real-world datasets. This paper comes from a collaboration between IITD and UW.
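One side of the relabeling tradeoff can be seen with a small back-of-the-envelope computation. The majority-vote calculation below is a generic illustration of why relabeling buys accuracy, not the paper's impact sampling algorithm, and the 70% annotator accuracy is a made-up figure.

```python
from math import comb

def majority_correct_prob(p, k):
    """Probability that a majority vote over k independent labels,
    each correct with probability p, recovers the true label (odd k)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# With 70%-accurate annotators, relabeling sharpens each point:
#   1 label:  0.700
#   3 labels: 0.784
#   5 labels: ~0.837
# ...but every extra label spends budget that could buy a brand-new point.
```

The diminishing returns visible in the numbers are exactly what makes the allocation problem interesting under a fixed budget.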

Congratulations to all the authors! We’ll be asking them to share their slides and links to the final versions of the papers when these become available.

There is no such thing as a free lunch: Some thoughts on Free Basics

Blog post by: Amitabha Bagchi

Earlier today the founder of Facebook visited our institute and made a strong pitch for a service that provides Internet access for “free” to underserved people in countries like India. This service is now called Free Basics and was earlier known as Internet.org.

The arguments in favour of such a service are evident: they are in fact the arguments in favour of connecting underprivileged people to the Internet. The arguments against this service generally come from the direction of what is known as “Net neutrality,” the idea that all Internet Service Providers must give all web services a level playing field. I am not looking to further that argument here, although I think it has some merit.

Instead I want to consider another issue, more relevant from a data science perspective. Clearly all the users who access the Internet through Free Basics will be creating long and rich histories of usage that the service will log. This data can and, sooner or later, will be monetised by the owner of the data, which will be Facebook. For now let us assume (and this is a huge assumption) that the privacy of these people as individuals will not be breached, and let us assume that the monetisation will be done using data aggregates that do not leak individual data (another massive and utopian assumption). Even then, the value inherent in this data, data generated by Indian users, will be Facebook’s.

Now, it is fair enough that if Facebook provides a service they get something in return. Except there are two problems.

  1. Why is the “free” nature of the service being extolled at such length? Why is it not being made clear that user data is a (natural?) resource that is being generated and is passing into the hands of the service provider?
  2. If user data is a resource that is going into the hands of Facebook, that’s okay, but why is no attempt being made to value it, and why is the Indian state not a beneficiary to some extent, the way it is when, say, coal licences are given or spectrum is allocated?

I am completely on board with the idea that monetising this data may take a lot of effort, high tech, and a lot of time, and the entity that is able to invest the money and effort needed to monetise it should benefit. Of course it should. But shouldn’t the community that generated this data also benefit from the value inherent in the data? Isn’t that the logic applied to natural resources as well?

Work on Twitter celebrities featured in the media

Amit Ruhela’s thesis work on Twitter celebrities was written up in The Telegraph this past weekend. The key finding of the paper appears as a quote from Aaditeshwar Seth in this news report:

“Just because more people receive tweets from a celebrity doesn’t mean they’ll themselves re-tweet something because it’s come from a celebrity,” Seth said.

Here is the citation of (and a link to) the paper that forms the basis for the news report:

Amit Ruhela, Amitabha Bagchi, Anirban Mahanti, Aaditeshwar Seth.
The rich and middle classes on Twitter: Are popular users indeed different from regular users?
To appear in Comput. Commun.

The news report also mentions in passing earlier work on the spread of information on Twitter, published in CIKM 2013, that Amit did with recently graduated PhD student Rudra Mohan Tripathy.

It’s good to see that the work done in our department is going out into the world.

Some ethical issues surrounding big data

The big news of the day is that the EU courts have struck down the EU-US Safe Harbor agreement that allowed companies to transfer the personal data of European individuals to the US. One interesting aspect of all this is that the ruling is the culmination of a process initiated by a term paper written by privacy advocate Max Schrems when he was in law school.

Privacy is one of a set of ethical issues that surround the use of data by businesses and governments. For some other recent writings on ethical issues related to data, check out these links:

MacArthur Foundation recognises data scientist Christopher Ré

The MacArthur Fellows Program that annually “celebrates and inspires the creative potential of individuals through no-strings-attached fellowships” has named computer scientist Christopher Ré of Stanford University as one of their fellows for the year 2015. To quote from the citation:

Christopher Ré is a computer scientist democratizing big data analytics through theoretical advances in statistics and logic and groundbreaking data-processing applications for solving practical problems. Ré has leveraged his training in databases and deep knowledge of machine learning to create an inference engine, DeepDive, that can analyze data of a kind and at a scale that is beyond the current capabilities of traditional databases.

Does the world need intelligent weapons?

Blog entry by: Mausam.

Recently a large number of researchers from all across the world signed a petition calling for a halt to the research and development of intelligent weapons. In addition to prominent AI researchers like Stuart Russell (Berkeley) and Eric Horvitz (Microsoft Research), several other famous supporters and detractors of AI, such as Stephen Hawking, Noam Chomsky and Elon Musk, are among the almost 20,000 signatories. I signed it too.

The International Joint Conference on Artificial Intelligence 2015 (IJCAI) had a full panel dedicated to the issue. Several press releases and newspaper articles discussed it. Since then AI technology has become a matter of an even larger debate. (It has always been one of a tech-debater’s first loves!) Is all of AI bad? Why are scientists making such weapons? Why do people do AI research at all? Will AI some day become synonymous with weapons, the way nuclear energy almost has today? Will robots some day control or kill the whole of mankind?

The questions are complex and the arguments many. People in favor of AI weapons make some reasonable points. Many often give the bomb analogy – bombs are great if used as dynamite to make roads, but bad if used to kill people. The central argument is that technology by itself is value neutral and should not be ascribed an inherent value; its usage or application, however, is value-laden. This kind of argument typically absolves research and researchers, transferring the blame entirely onto policymakers, who we all know as the most just and objective people who ever lived…not! (For a funny but highly discomforting account of U.S. policies on drone operations see this John Oliver video on drones. There are people who are scared of blue skies, because currently drone strikes only happen in clear weather; but, of course, it’s only a matter of time.)

Others argue that intelligent weapons will have the ability to track and kill just the right persons and reduce mass destruction – that can only be good news. I personally find a lot of merit in this argument – while killing is mostly bad, killing innocents is sinful. Most of us are comfortable targeting those who don’t value human life as long as we ensure that the innocent aren’t hurt. Finally, a few others, usually policymakers and nationalists, love to harp on arguments about defending the country against terrorists and neighbors. I may not like the rhetoric, but are they misplaced in their assessment? Indeed, neither gender crimes nor religious terrorism nor national vendettas seem to have abated of late. Everyone wants a sense of safety and security. So, then, why not? But, I believe, it is important to look at these questions from a long-term and holistic perspective. There will be interim consequences of intelligent weapons and then there will be the long-term ones.

In the interim the country making such weapons will definitely have an edge. It will have technology that could prove to be the difference between success and failure, victory and defeat, life and death. But it will also be a technology that lowers the barrier to waging a war. Why do we avoid wars? Because not only do they kill them, they kill us too. This dissuades both countries, even the stronger one. What happens after intelligent weapons? Well, that would mean practically little or no destruction for those countries. That would allow, in fact encourage, impulsive actions without negative consequences. With the fear of losing their own soldiers gone, is it not possible that governments could be tempted into retaliation over small provocations? This could lead to many more wars.

As a citizen of a developing country this bleak future scares me deeply. What if we all were living in the same fears that the people worried about drone strikes are currently living in? What if most countries lost their ability to defend themselves because a few developed countries developed ruthless weapons, which didn’t need their citizens to go and man the warfront? And they went on a traveling imperialist problem instead of a traveling salesman one; and the last few hundred years of history repeated itself. Of course, this account is somewhat hyperbolic but the dangers are real. Lowering the entry barrier for waging a war cannot be good for anyone!

But even if this future, in which a small number of nations own such technology, is short-lived, there are issues. Intelligence is much more a software phenomenon than a hardware one. The weapons will slowly become readily available, as it is not easy to control the piracy of software technology. Slowly, everyone will start owning such weapons. Not just countries, but also individuals, since these will be easy-to-obtain, low-danger-to-oneself, cheap pieces of equipment. Imagine U.S. gun control laws gone wilder! The good will have them, the bad will also have them and so will the ugly. These will not be like the super-expensive nuclear weapons, which are carefully guarded by all countries (for fun, watch this eye-opening video on U.S. security of their nuclear weapons). The barrier to using such weapons will have been lowered for everyone.

Just the thought of such a world makes me shudder. It is hard to visualise the exact outcome of such an arms race. But it seems that the far-fetched science fiction of a coordinated army of robots taking over and killing all humans could become a reality. The main difference, however, will be that it won’t be caused by the intentions of robots; instead, it will be caused by people using these weapons as their slaves to further individual interests.

Artificial intelligence has continually given us technology of amazing value to people. Machine learning has led to novel scientific discoveries, aided in the design of drugs, rejected fraudulent financial transactions, and much more. Intelligent web search has put troves of information at our fingertips. Social networks have brought long-lost friends together. Other subfields of AI have proven mathematical theorems, enhanced border security, and assisted in inter-planetary travel. The virtues of AI technology have been many and far-reaching. But its application to automated weaponry has the potential to destroy it all. The onus is upon us to take corrective measures and make the world a better place to live in.

(This article is written after discussions with Dan Weld and Subbarao Kambhampati, and also after watching a presentation by an IIT Delhi student, Akash Tanwar. It only reflects the current views of the writer.)

How to give a technical talk

Blog entry by: Amitabha Bagchi.

A momentous day in the semester’s calendar comes to an end: midterm presentation day. Inevitably in the course of the day’s presentations the thought comes that these presentations aren’t only to evaluate the progress of the projects but also to help our students learn how to present a technical talk to an audience. But the end of a long day of presentations isn’t a good time to write out my own ideas on how to give a good talk. There is just about enough energy to provide some links to resources on the Web that will repay a half hour of attentive reading with a lifetime of improved communication skills.