The Conversational Intelligence Challenge (ConvAI)

NIPS 2017 Live Competition

Dialogue systems and conversational agents – including chatbots, personal assistants and voice control interfaces – are becoming increasingly widespread in our daily lives. NIPS is sponsoring an open competition to create a chatbot that can hold an intelligent conversation with a human partner.

Join our efforts to build an open dataset for the development of next-generation conversational AI.

Participate as a human evaluator volunteer

Final Leaderboard of the NIPS Round 2017

Rank BOT FINAL SCORE
1-2* bot#1337 2.746
1-2* poetwannabe 2.536
3 kAIb 2.105
4 RLLChatBot 1.905
5 PolyU 1.5
6 DeepTalkHawk 1.229
Human 3.8

Workshop at NIPS

The Conversational Intelligence Challenge section of the NIPS 2017 Competition Track Workshop

Press release

Press release for the Conversational Intelligence Challenge

Leaderboard of the pre-NIPS round

Leaderboard of the Human Evaluation Round

Dialogue dataset

Warning! Some dialogs could contain words or language that some readers may consider profane, vulgar, or offensive. The dialogs are partly generated automatically and partly written by anonymous volunteers, so the organizers of ConvAI are not responsible for their contents. If you spot a serious issue, please contact the organizers by email at info@convai.io.

Data collected during the Human Evaluation Round

Overview of the Competition

Recent advances in machine learning have sparked renewed interest in dialogue systems in the research community. In addition to the growing number of real-world applications, the ability to converse is closely related to the overall goal of AI. This NIPS Live Competition aims to unite the community around a challenging task: building systems capable of intelligent conversation. Teams are expected to submit dialogue systems able to carry out intelligent and natural conversations about specific news articles with humans. At the final stage of the competition, participants as well as volunteers will be randomly matched with a bot or a human to chat and evaluate the answers of their peer. We expect the competition to have two major outcomes: (1) a measure of the quality of state-of-the-art dialogue systems, and (2) an open-source dataset collected from the evaluated dialogues.

Timeline

Date Milestone
April, 2017 Registration is open. Registered teams are allowed to submit applications and solutions for the Qualification Round.
25th of June, 2017 Qualification Round is closed.
2nd of July, 2017 Notification of acceptance. Teams for the Human Evaluation Round are selected.
24th-30th of July, 2017 Human Evaluation Round. Held during the one-week NLP summer school DeepHack Turing. Teams, school participants and volunteers evaluate the solutions of the qualified teams. Evaluation is done remotely via a messaging platform.
1st of September, 2017 1st Dataset is published. The data collected during the Human Evaluation Round is published and integrated with the ParlAI framework for efficient training of models. Teams tune their solutions on the 1st Dataset.
12th of November, 2017 Submission of conversational agents is closed. Teams submit final solutions for the NIPS Live Competition.
20th of November - 3rd of December, 2017 Pre-NIPS Human Evaluation. Teams and volunteers begin evaluating the submitted solutions. Evaluation is done remotely via a messaging platform.
4th-9th of December, 2017 Conversational Intelligence Live Competition at NIPS. Teams, conference participants and volunteers continue to evaluate the submitted solutions. Announcement of winners.

Competition Rules

Competition rounds

The competition consists of four rounds.

  1. Qualification Round. Registered participants submit an application consisting of two parts: (1) a proposal describing the details of the scientific approach and a statement of work including milestones, resources, and other relevant information (up to 1 page, plus an unlimited number of pages for references), together with the system architecture and relevant technical information (maximum 2-page PDF); (2) a reference to your submission on the SQuAD or MS MARCO leaderboards, or a link to your chatbot for testing. Submission of results closes on the 25th of June, 2017. Notification of acceptance by the 2nd of July, 2017.
  2. Human Evaluation Round. Members of selected teams are invited to participate in a week-long NLP summer school by giving a talk on their research. Participation can be on site or remote. During the school week, team members, school participants, and volunteers recruited via the competition web page evaluate the submitted dialogue systems on the competition task. At the end of the Human Evaluation Round, up to 10 teams are selected for the NIPS Live Competition. Every team is required to evaluate at least 150 dialogues during the Round. Evaluation is done remotely via a messaging platform.
  3. Tuning Round. The dataset of rated dialogues collected during the Human Evaluation Round is open-sourced and can be used by participating teams to tune their solutions. Models can be trained on this dataset via the ParlAI framework (see the sketch after this list).
  4. NIPS Round. Starting two weeks before the NIPS conference, teams and volunteers evaluate the submitted dialogue systems. At the beginning of NIPS, conference participants are invited to volunteer in evaluating the teams' solutions adjusted during the Tuning Round. The final rating of submissions is presented at the Competition session at NIPS.
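
For teams planning to tune on the released dataset, a minimal ParlAI training loop might look like the sketch below. The task name "convai_chitchat" and the "seq2seq" baseline are assumptions for illustration; substitute the actual task and model names announced with the dataset release.

```python
# Minimal sketch of tuning a model on the released dataset via ParlAI.
# Assumptions: the dataset is registered as the ParlAI task 'convai_chitchat'
# and a 'seq2seq' baseline agent is available; swap in the real names.
from parlai.core.params import ParlaiParser
from parlai.core.agents import create_agent
from parlai.core.worlds import create_task

parser = ParlaiParser(add_model_args=True)
opt = parser.parse_args(
    ['--task', 'convai_chitchat', '--model', 'seq2seq', '--datatype', 'train']
)

agent = create_agent(opt)        # the learner selected by --model
world = create_task(opt, agent)  # pairs the learner with the dataset teacher

for _ in range(1000):            # a short training run over the rated dialogues
    world.parley()               # the teacher shows an example; the agent replies and learns
print(world.report())            # running metrics reported by the agent
```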

Task

Both human evaluators and dialogue agents complete the same task.

  1. Connect randomly with a peer. The peer might be a chatbot or another human user. No information about the identity of the peer is provided.
  2. Both parties are given the text of a recent news/wikipedia article.
  3. Discuss the content of the article with the peer for as long as you wish.
  4. Choose another news/wikipedia article and/or anonymous peer.

Evaluation

  1. The evaluator will not be given any information about the identity of the peer.
  2. Members of a team are automatically excluded from evaluating their own submission and each other.
  3. The quality of every response is subjectively evaluated on a binary “thumbs up”/“thumbs down” scale. The evaluator may skip this evaluation.
  4. The quality of the dialogue as a whole, as well as its breadth and engagement, is evaluated on a 0 to 10 scale. This evaluation is obligatory.
  5. The final rating is calculated as the average of the evaluation values accumulated by a submitted system during the NIPS Round of the Competition (see the sketch after this list).
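
As a concrete reading of item 5, the final rating is simply the arithmetic mean of the per-dialogue quality scores a system collected; the sketch below illustrates this (the example scores are invented).

```python
# Minimal sketch of the final-rating rule: average the 0-10 dialogue-quality
# ratings a submitted system accumulated during the NIPS Round.
# The example scores are invented for illustration only.
def final_rating(dialogue_scores):
    """Mean of per-dialogue ratings; None if the system was never rated."""
    if not dialogue_scores:
        return None
    return sum(dialogue_scores) / len(dialogue_scores)

print(final_rating([3, 2, 4, 2]))  # -> 2.75
```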

Note on disqualification

The submitted system may be excluded from the competition temporarily or completely if there are issues with the language it uses. We encourage participants to be guided by common sense when choosing constraints on their system's vocabulary.

If an issue is spotted, the participating system is temporarily disqualified from the competition until the issue is resolved. If the issue is not resolved by the competition's end, the system is disqualified completely.

Technical infrastructure

  1. Competitors will provide their solutions in the form of executables or source code supporting a common interface (API).
  2. These solutions will be run in isolated virtual environments (containers).
  3. The solutions will not be able to access any external services or the Internet; to guard against cheating, they will only be able to communicate with the master bot.
  4. The master bot will facilitate communication between human evaluators and the competitors' solutions. It will be available in popular messenger services (Facebook/Telegram). Its main function will be to connect a participant to a (randomly selected) solution or peer and to log the evaluation process.
  5. The master bot will provide the instructions and a context necessary for human evaluation of presented solutions.
  6. During the NIPS Round, models will be able to connect to the master bot via the ParlAI framework.

We also provide a baseline solution for participants to start with. More details are available on the dedicated page.
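
As an illustration of the kind of common interface a submission might expose, here is a minimal ParlAI-style agent sketch. The observe/act methods are the standard ParlAI Agent API; the echo behaviour is purely illustrative and is not the provided baseline.

```python
# Minimal sketch of an agent exposing the observe/act interface used by ParlAI.
# The reply logic is a toy placeholder, not the official baseline.
from parlai.core.agents import Agent


class EchoBot(Agent):
    """Toy agent: remembers the last observation and replies with a canned line."""

    def __init__(self, opt, shared=None):
        super().__init__(opt, shared)
        self.id = 'EchoBot'
        self.observation = None

    def observe(self, observation):
        # The master bot (or a ParlAI teacher) delivers the peer's utterance here.
        self.observation = observation
        return observation

    def act(self):
        # Return the agent's reply in the standard ParlAI message format.
        last = (self.observation or {}).get('text', '')
        return {'id': self.id, 'text': 'You said: "%s". Tell me more!' % last}
```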

System requirements

Each submitted system will be provided with a virtual machine with the following constraints:

If a submitted system has any issues running on the provided machine, the organizers will contact the team to resolve the issue.

Dataset

The dataset collected during the competition will be distributed under the Apache 2.0 license.

Publication of solutions

Participants have to make their systems available open source following the end of the competition, as the goal of the challenge is to enable the community to make sustained progress towards the creation of conversational intelligence. The data used by teams to pretrain models should be made open access or available on request for academic purposes.

Prizes

10,000 USD for the winning dialogue agent.

We also invite teams to use ParlAI in their solutions and to apply to the ParlAI Request for Proposals, which offers awards of $10,000 to $20,000 for academic research.

Organizers

Mikhail Burtsev, Valentin Malykh, MIPT, Moscow

Ryan Lowe, McGill University, Montreal

Iulian Serban, Yoshua Bengio, University of Montreal, Montreal

Alexander Rudnicky, Alan W. Black, Shrimai Prabhumoye, Carnegie Mellon University, Pittsburgh

Partners

Platinum Partner

Gold Partner

Silver Partners

Bronze Partner

Please contact info@convai.io for information on sponsorship.

Contact

Please send your requests to info@convai.io