Data Preparation and Variability (MTL) – Michelle Hoda Wilkerson

New paper in Mathematical Thinking and Learning that explores students’ opportunities to engage with variability when working with complex public datasets:

Wilkerson, M. H., Lanouette, K., & Shareff, R. L. (2021). Exploring variability during data preparation: A way to connect data, chance, and context when working with complex public datasets. Mathematical Thinking and Learning. doi: 10.1080/10986065.2021.1922838

Data preparation (also called “wrangling” or “cleaning”), the evaluation and manipulation of data prior to formal analysis, is often dismissed as a precursor to meaningful engagement with a dataset. Here, we re-envision data preparation in light of calls to prepare students for a data-rich world. Traditionally, curricular statistics explorations involve data that are derived from observations that students record themselves or that reflect familiar, relatively closed systems. In contrast, pre-constructed public datasets are much larger in scope and involve temporal, geographic, and other dimensions that complicate inference and blur boundaries between “signal” and “noise.” As a result, students have fewer opportunities to consider sources of variability in such datasets. Due to these constraints, we argue that data preparation becomes an important site for students to reason about variability with public data. Through analyses of repeated task-based interviews with five pairs of adolescent participants, we find that specific actions during data preparation, such as filtering data or calculating new measures, presented opportunities to engage learners with variability as they prepared and analyzed several public socioscientific datasets. More broadly, our study highlights some changes to theory and curriculum in statistics education that are necessitated by a focus on “big data literacy”.