Tidy data

Attributes

  • 3rd normal form
    • each variable -> one column
    • each observation -> one row
    • each type of observational unit -> one table
      • observational unit = e.g. a questionnaire

Common Problems

  • Column headers are values instead of variable names (e.g. cols = “pregnant” and “not pregnant”)
    • -> long format (pivot_longer)
  • Column holds more than one variable (e.g. col = “num-male-inf-students”)
    • -> separate (separate)
  • Variables are stored in both rows and columns (variable names appear as values in a field)
    • -> wide format (pivot_wider)
  • Multiple observational units in one table (e.g. two different questionnaires, repeating values)
  • (One observation split over multiple tables)
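
The first three fixes above can be sketched with tidyr as follows. This is only a minimal illustration: the toy data frames, their column names (`sex`, `n`, `key`, `field`, `value`), and the way the compound name is split into four parts are assumptions made up for the example, not the study's tables.

```r
library(tidyr)

# Problem 1: column headers are values ("pregnant", "not_pregnant"),
# not variable names -> gather them into a long format.
wide <- data.frame(
  sex = c("male", "female"),
  pregnant = c(0, 1),
  not_pregnant = c(5, 4)
)
long <- pivot_longer(
  wide,
  cols = c(pregnant, not_pregnant),
  names_to = "status",   # hypothetical name for the new variable
  values_to = "n"
)

# Problem 2: one column holds several variables -> split it.
counts <- data.frame(
  key = c("num-male-inf-students", "num-female-inf-students"),
  n = c(12, 7)
)
# The four target names are assumed meanings of the parts.
split_cols <- separate(
  counts, key,
  into = c("measure", "sex", "subject", "group"),
  sep = "-"
)

# Problem 3: variable names are stored as values in a field -> spread them.
mixed <- data.frame(
  id = c(1, 1, 2, 2),
  field = c("tmin", "tmax", "tmin", "tmax"),
  value = c(3, 9, 5, 11)
)
tidy <- pivot_wider(mixed, names_from = field, values_from = value)
```

`separate` still works but is marked as superseded in recent tidyr versions; `separate_wider_delim` is the suggested replacement there.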

Todo

1. Small cleanup

  • post_study_questionnaires$knows_about_featured_snippets fac

  • post_study_questionnaires$featured_snippet_usage fac

  • pre_task_questionnaires$pre_answer fac

  • post_task_questionnaires$post_answer fac

  • snippets$cochrane_answer & anno_final fac
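
A possible way to do the factor conversions in one go, shown for one table. The data frame below is a hypothetical stand-in with invented values; only the table and column names come from the list above.

```r
# Hypothetical stand-in for the real table (values are made up).
post_study_questionnaires <- data.frame(
  knows_about_featured_snippets = c("yes", "no", "yes"),
  featured_snippet_usage = c("often", "never", "sometimes"),
  stringsAsFactors = FALSE
)

# Columns to convert from character to factor.
fac_cols <- c("knows_about_featured_snippets", "featured_snippet_usage")

# Convert each listed column in place.
post_study_questionnaires[fac_cols] <-
  lapply(post_study_questionnaires[fac_cols], factor)

str(post_study_questionnaires)
```

If the answer categories have a natural order, passing `levels = ...` (and possibly `ordered = TRUE`) to `factor` would keep that order instead of the default alphabetical one.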

2. Joins & new tables

  • create table topic_results
    • topic, $time (from pre/post_task_questionnaires)
  • move demographicsusers and calc time
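
A rough sketch of how topic_results could be built with a join. The key columns (`user_id`, `topic`), the timestamp columns (`pre_time`, `post_time`), and the derived `task_seconds` are all assumptions; the real tables may use different names and a different definition of time.

```r
# Hypothetical stand-ins for the two questionnaire tables.
pre_task_questionnaires <- data.frame(
  user_id = c(1, 2),
  topic = c("t1", "t1"),
  pre_time = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 11:00:00"), tz = "UTC")
)
post_task_questionnaires <- data.frame(
  user_id = c(1, 2),
  topic = c("t1", "t1"),
  post_time = as.POSIXct(c("2024-01-01 10:05:00", "2024-01-01 11:07:00"), tz = "UTC")
)

# Inner join on the assumed keys: one row per user and topic.
topic_results <- merge(
  pre_task_questionnaires,
  post_task_questionnaires,
  by = c("user_id", "topic")
)

# Assumed definition of time: seconds between pre and post questionnaire.
topic_results$task_seconds <- as.numeric(
  difftime(topic_results$post_time, topic_results$pre_time, units = "secs")
)
```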

3. Further cleanup

  • create web_page_results$doc_time (start-end)
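
The doc_time column might be computed like this, assuming web_page_results has timestamp columns named `start` and `end` (hypothetical names and values) and that doc_time means the elapsed seconds between them.

```r
# Hypothetical stand-in for the real table.
web_page_results <- data.frame(
  start = as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 10:05:00"), tz = "UTC"),
  end = as.POSIXct(c("2024-01-01 10:01:30", "2024-01-01 10:05:45"), tz = "UTC")
)

# Time spent on each document, in seconds (end minus start).
web_page_results$doc_time <- as.numeric(
  difftime(web_page_results$end, web_page_results$start, units = "secs")
)
```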