Evaluation

Systems will be asked to answer the open-ended questions based on the textual and visual information encoded in the dataset, and will be evaluated solely on the answers they produce.

Answers to cardinality, yes/no, and unanswerable questions will be scored with accuracy. Answers to natural language questions will be scored with exact match and word-level F1.
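For illustration, the sketch below shows one common way to compute word-level F1 between a predicted and a gold answer string, in the SQuAD style (bag-of-words overlap after lowercasing). The function name word_f1 is our own, and the official scorer's tokenization and normalization rules may differ.

from collections import Counter

def word_f1(prediction: str, gold: str) -> float:
    # Tokenize naively on whitespace after lowercasing; the official
    # scorer may apply additional normalization (punctuation, articles).
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; otherwise no overlap is possible.
        return float(pred_tokens == gold_tokens)
    # Count token overlap, respecting multiplicities.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

For example, word_f1("two cups of flour", "two cups flour") returns 0.857, while exact match for the same pair would be 0.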

Submission

Participants should submit a zip file containing a single JSON file whose name contains the string r2vq_pred, in the following form:

{
  "Recipe_ID1": {
    "Question_ID1": 3,
    "Question_ID2": true,
    "Question_ID3": ["natural language answer1", "natural language answer2"],
    "Question_ID4": null,
    ...
  },
  "Recipe_ID2": {
    ...
  }
}

where Recipe_ID is the # newdoc id from the metadata in the provided CoNLL-U files. Within each recipe, Question_ID is the key of the question being answered, and the value is the answer: an integer (cardinality), a boolean (yes/no), a list of strings (natural language), or null (unanswerable).
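As a concrete end-to-end sketch, the following Python snippet writes a predictions dictionary to a correctly named JSON file and packages it as a zip. The predictions content and the file names my_team_r2vq_pred.json and submission.zip are illustrative; only the r2vq_pred substring in the JSON file name is required by the spec above.

import json
import zipfile

# Predictions keyed as Recipe_ID -> Question_ID -> answer; answer types
# follow the spec: int, bool, list of strings, or None (unanswerable).
predictions = {
    "Recipe_ID1": {
        "Question_ID1": 3,
        "Question_ID2": True,
        "Question_ID3": ["natural language answer1", "natural language answer2"],
        "Question_ID4": None,
    },
}

# The JSON file name must contain the string "r2vq_pred".
json_name = "my_team_r2vq_pred.json"
with open(json_name, "w", encoding="utf-8") as f:
    json.dump(predictions, f, ensure_ascii=False, indent=2)

# Wrap the single JSON file in a zip archive for submission.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(json_name)

Note that json.dump serializes Python's True and None as the JSON true and null shown in the example format above.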