Data Specification

We will have a single text file each for the train, validation, and test splits. Each file is annotated in CoNLL-U format with four types of lines:

  1. Word lines containing the annotation of a word/token in 15 fields separated by single tab characters; see below.
  2. Single blank lines marking ingredient/sentence boundaries.
  3. Double blank lines marking document (recipe) boundaries.
  4. Document metadata lines starting with hash (#).
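
The four line types above can be told apart with a small reader. The following is only a sketch under stated assumptions: the function name split_recipes and the returned dictionary layout are illustrative and not part of the dataset tooling, and metadata lines are simply attached to the recipe they appear in.

```python
def split_recipes(text):
    """Group the lines of one split file into recipes.

    Each recipe is a dict with "meta" (raw '#' lines) and "sentences"
    (lists of word lines, each word line split on tab characters).
    The layout of this dict is an assumption made for illustration.
    """
    recipes = []
    recipe = {"meta": [], "sentences": []}
    sentence = []
    blanks = 0  # consecutive blank lines seen so far

    def close_sentence():
        nonlocal sentence
        if sentence:
            recipe["sentences"].append(sentence)
            sentence = []

    def close_recipe():
        nonlocal recipe
        close_sentence()
        if recipe["meta"] or recipe["sentences"]:
            recipes.append(recipe)
        recipe = {"meta": [], "sentences": []}

    for line in text.splitlines():
        if not line.strip():
            blanks += 1
            if blanks == 1:      # single blank line: sentence boundary
                close_sentence()
            elif blanks == 2:    # double blank line: recipe boundary
                close_recipe()
            continue
        blanks = 0
        if line.startswith("#"):  # document metadata line
            recipe["meta"].append(line)
        else:                     # word line
            sentence.append(line.split("\t"))
    close_recipe()
    return recipes
```

Counting consecutive blank lines keeps the reader to a single pass and avoids splitting the whole file on a fixed separator string, which would be fragile if trailing whitespace appears on blank lines.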

Sentences consist of one or more word lines, and word lines contain the following fields:

  1. ID: Word index, integer starting at 1 for each new sentence.
  2. FORM: Word form or punctuation symbol.
  3. LEMMA: Lemma of word form.
  4. UPOS: Universal POS tag.
  6. PART: Word index of the head of the EVENT entity when a participant-of relation exists between the event and another entity on the current line.
  7. PART: Word index of the head of the EVENT entity when a result-of relation exists between the event and another entity on the current line.
  8. HIDDEN: Hidden entities that are involved in the event on the current line; see below for details.
  9. COREF: Coreference ID for entities that are co-referred, represented as, e.g., asparagus.2.1.3.
  10. PREDICATE: The sense of the word that is annotated as a predicate.
  11. ARG1: The arguments of the first predicate in the current sentence.
  12-15. ARGX: The arguments of the X-th predicate in the current sentence.
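
As a rough illustration of the field list above, a word line can be mapped to named fields by position. This is a sketch, not official tooling: the key FIELD5 is a placeholder because the source does not describe the fifth field, the suffixes on the two PART keys are hypothetical names added only to keep the keys distinct, and ARG2 to ARG5 are assumed names for fields 12-15. The sample line assumes unused cells hold an underscore, as in standard CoNLL-U.

```python
# Positional names for the 15 tab-separated fields.
# FIELD5, PART_PARTICIPANT, PART_RESULT and ARG2..ARG5 are assumed names.
FIELD_NAMES = [
    "ID", "FORM", "LEMMA", "UPOS", "FIELD5",
    "PART_PARTICIPANT", "PART_RESULT", "HIDDEN", "COREF", "PREDICATE",
    "ARG1", "ARG2", "ARG3", "ARG4", "ARG5",
]


def parse_word_line(line):
    """Split one word line on tabs and return a dict keyed by field name."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(values)}"
        )
    row = dict(zip(FIELD_NAMES, values))
    row["ID"] = int(row["ID"])  # word index, starting at 1 in each sentence
    return row


# Usage with a made-up line (values are illustrative only).
example = "\t".join(["1", "Mix", "mix", "VERB"] + ["_"] * 11)
row = parse_word_line(example)
print(row["ID"], row["FORM"], row["UPOS"])
```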

Annotation for the HIDDEN field:

  • It can appear only on the row whose token is the head of the EVENT entity.
  • Possible keyword values are: Drop (syntactically hidden ingredients), Shadow (semantically hidden ingredients), Result, Tool (hidden tools), Habitat (hidden habitats).
  • Each hidden argument is written as Keyword=value, e.g. Drop=mixture
  • Multiple hidden arguments with the same keyword in the same cell are separated by :, e.g. Drop=mixture:olive oil
  • Multiple hidden arguments with different keywords are separated by |, e.g. Drop=mixture:olive oil|Tool=spoon
  • If a hidden argument is co-referred elsewhere in the recipe, a coreference ID is appended to the hidden argument value. The coreference ID points to the first appearance of the co-referred entity, e.g. Drop=mixture:olive oil.1.1.3|Tool=spoon (the head of the first appearance of olive oil is the 3rd token of step 1, sentence 1)
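
The rules above suggest a straightforward way to unpack a HIDDEN cell. The sketch below is one possible reading, with assumptions: the function name parse_hidden and the output layout are illustrative, an empty cell is assumed to be an underscore or an empty string, and a trailing .N.N.N on a value is always interpreted as a coreference ID (step, sentence, token).

```python
import re

# A value optionally ends with ".step.sentence.token" (the coreference ID).
COREF_SUFFIX = re.compile(
    r"^(?P<text>.+)\.(?P<step>\d+)\.(?P<sentence>\d+)\.(?P<token>\d+)$"
)


def parse_hidden(cell):
    """Parse a HIDDEN cell into {keyword: [{"text": ..., "coref": ...}, ...]}."""
    hidden = {}
    if not cell or cell == "_":  # assumed markers for "no hidden arguments"
        return hidden
    for group in cell.split("|"):            # different keywords
        keyword, _, values = group.partition("=")
        for value in values.split(":"):      # same keyword, several values
            match = COREF_SUFFIX.match(value)
            if match:
                entry = {
                    "text": match["text"],
                    "coref": (
                        int(match["step"]),
                        int(match["sentence"]),
                        int(match["token"]),
                    ),
                }
            else:
                entry = {"text": value, "coref": None}
            hidden.setdefault(keyword, []).append(entry)
    return hidden


print(parse_hidden("Drop=mixture:olive oil.1.1.3|Tool=spoon"))
```

For the example from the list above, this yields two Drop entries (mixture without a coreference ID, olive oil with step 1, sentence 1, token 3) and one Tool entry (spoon).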

Questions and their corresponding answers are annotated in metadata lines. Within one recipe, the lines starting with # question ID1 and # answer ID1 form a question-answer pair.
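
Pairing questions with answers then comes down to grouping metadata lines by their ID. The sketch below rests on assumptions that the text above does not confirm: the ID is taken to be a single whitespace-free token, and the question or answer text is taken to follow it on the same line, optionally after an equals sign. The name collect_qa is illustrative.

```python
import re

# Assumed shape: "# question <ID> [=] <text>" and "# answer <ID> [=] <text>".
QA_LINE = re.compile(r"^#\s*(question|answer)\s+(\S+)\s*=?\s*(.*)$")


def collect_qa(meta_lines):
    """Group question and answer metadata lines of one recipe by their ID."""
    pairs = {}
    for line in meta_lines:
        match = QA_LINE.match(line)
        if match:
            kind, qa_id, text = match.groups()
            pairs.setdefault(qa_id, {})[kind] = text.strip()
    return pairs


# Usage with made-up metadata lines (content is illustrative only).
lines = ["# question 1 = How many eggs are used?", "# answer 1 = 2"]
print(collect_qa(lines))
```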

Image Data

Accompanying each recipe is a series of images extracted from YouTube videos, each associated with a particular sentence in the recipe. The images were pulled from a set of YouTube videos that were selected by querying YouTube for recipe titles. For each recipe title, 5 Creative Commons licensed videos were downloaded. These videos were indexed by generating an embedding using the TensorFlow implementation of the S3D model. For each sentence in the recipes, the 10 closest video clips were selected from the YouCookII dataset and the 5 closest clips were selected from the additional YouTube videos we downloaded. We then selected the most representative of the 15 clips to include in the dataset. Importantly, the actions represented in the image clips do not necessarily involve the same ingredients as those used in the recipe. The videos were chosen based on the similarity of the cooking event described in the sentence.
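
The retrieval step described above can be pictured as a nearest-neighbour search over embeddings. The following sketch is not the pipeline that was actually used: it assumes the sentence and every clip are already available as embedding vectors in a shared space, and it assumes cosine similarity as the closeness measure, which the description does not state. All function names are illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def closest_clips(query, clips, k):
    """Return the IDs of the k clips most similar to the query embedding.

    clips maps a clip ID to its embedding vector.
    """
    ranked = sorted(
        clips, key=lambda clip_id: cosine(query, clips[clip_id]), reverse=True
    )
    return ranked[:k]


def candidate_clips(query, youcook_clips, extra_clips):
    """Collect up to 10 clips from one pool and up to 5 from the other."""
    return closest_clips(query, youcook_clips, 10) + closest_clips(
        query, extra_clips, 5
    )
```

The final choice of the most representative clip among the 15 candidates is not sketched here, since the text does not say how that selection was made.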

The frames are stored in a directory named after the sentence ID of the corresponding sentence.