Tidy data
Attributes
- 3rd normal form
- each variable -> one column
- each observation -> one row
- each type of observational unit -> one table
- observational unit = e.g. a questionnaire
Common Problems
- Column = value instead of variable name (e.g. col = "pregnant" and "not pregnant")
- -> long format (pivot_longer)
- Column has more than one variable (e.g. col = "num-male-inf-students")
- -> separate (separate)
- Variable is in both rows and columns (in a field)
- -> wide format (pivot_wider)
- Multiple observation units in one table (e.g. two different questionnaires, repeating values)
- (One observation split over multiple tables)
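The first three fixes above can be sketched with tidyr roughly as follows. This is a minimal illustration, not the project's actual code: all data frames, column names and values are invented toy data.

```r
library(tidyr)

# Problem 1: column headers are values ("pregnant", "not_pregnant"),
# not variable names. Toy data, invented for illustration.
wide_counts <- data.frame(
  sex = c("male", "female"),
  pregnant = c(0, 1),
  not_pregnant = c(5, 4)
)
# -> long format: one row per (sex, status) combination.
long_counts <- pivot_longer(
  wide_counts,
  cols = c(pregnant, not_pregnant),
  names_to = "status",
  values_to = "n"
)

# Problem 2: one column holds more than one variable ("male-inf").
students <- data.frame(
  group = c("male-inf", "female-inf", "male-math"),
  n = c(12, 9, 7)
)
# -> split the combined column at the "-" into two variables.
students_sep <- separate(students, group, into = c("sex", "subject"), sep = "-")

# Problem 3: variable names are stored in rows (column `key`),
# so each observation is spread over several rows.
measurements <- data.frame(
  id = c(1, 1, 2, 2),
  key = c("height", "weight", "height", "weight"),
  value = c(180, 75, 165, 60)
)
# -> wide format: one column per variable, one row per id.
measurements_wide <- pivot_wider(measurements, names_from = key, values_from = value)
```

Note that separate is marked as superseded since tidyr 1.3.0 (separate_wider_delim is the suggested replacement), but it is still available and works as shown.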
Todo
1. Small cleanup
- post_study_questionnaires$knows_about_featured_snippets -> factor
- post_study_questionnaires$featured_snippet_usage -> factor
- pre_task_questionnaires$pre_answer -> factor
- post_task_questionnaires$post_answer -> factor
- snippetscochrane_answer & anno_final -> factor
2. Joins & new tables
- create table topic_results
- topic, $time (from pre/post_task_questionnaires)
- move demographicsusers and calc time
3. Further cleanup
- create web_page_results$doc_time (start-end)
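A rough sketch of how the three todo steps might look in R with dplyr. Only the table names and pre_answer / post_answer come from the notes. The columns user_id, topic, pre_time, post_time, start and end are hypothetical, and so are two readings of the notes: that time is the difference between the post and pre timestamps, and that doc_time is end minus start.

```r
library(dplyr)

# Toy stand-ins for the real tables. Every column except pre_answer and
# post_answer is a hypothetical name chosen for this sketch.
pre_task_questionnaires <- data.frame(
  user_id = c(1, 1, 2),
  topic = c("t1", "t2", "t1"),
  pre_answer = c("yes", "no", "yes"),
  pre_time = c(10, 12, 8)
)
post_task_questionnaires <- data.frame(
  user_id = c(1, 1, 2),
  topic = c("t1", "t2", "t1"),
  post_answer = c("yes", "yes", "no"),
  post_time = c(30, 25, 20)
)

# 1. Small cleanup: convert the answer columns to factors.
pre_task_questionnaires$pre_answer <- factor(pre_task_questionnaires$pre_answer)
post_task_questionnaires$post_answer <- factor(post_task_questionnaires$post_answer)

# 2. Joins & new tables: one row per user and topic.
joined <- inner_join(
  pre_task_questionnaires,
  post_task_questionnaires,
  by = c("user_id", "topic")
)
# Assumption: time = post_time - pre_time.
topic_results <- mutate(joined, time = post_time - pre_time)

# 3. Further cleanup: time spent per document, in seconds.
# Assumption: doc_time = end - start.
web_page_results <- data.frame(
  start = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 10:05:00"), tz = "UTC"),
  end = as.POSIXct(c("2020-01-01 10:02:00", "2020-01-01 10:05:30"), tz = "UTC")
)
web_page_results$doc_time <- as.numeric(
  difftime(web_page_results$end, web_page_results$start, units = "secs")
)
```

Fixing units = "secs" in difftime avoids the automatic unit choice (secs, mins, hours, ...), which otherwise depends on the size of the differences.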