Supporting Data Workers to Perform Exploratory Programming
ACM Reference Format: Krishna Subramanian, Ilya Zubarev, Simon Voelker, Jan Borchers. 2019. Supporting Data Workers to Perform Exploratory Programming. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3290607.3313027
Data science is an open-ended task in which exploratory programming is a common practice. Data workers often need faster and easier ways to explore alternative approaches to obtain insights from data, which frequently compromises code quality. To understand how well current IDEs support this exploratory workflow, we conducted an observational study with 19 data workers. In this paper, we present two significant findings from our analysis that highlight issues faced by data workers: (a) code hoarding and (b) excessive task switching and code cloning. To mitigate these issues, we provide design recommendations based on existing work, and propose to augment IDEs with an interactive visual plugin. This plugin parses source code to identify and visualize high-level task details. Data workers can use the resulting visualization to better understand and navigate the source code. As a realization of this idea, we present HypothesisManager, an add-in for RStudio that identifies and visualizes the hypotheses that a data worker is testing for statistical significance through her source code.
Ilya Zubarev. Visual and Functional Aids to Support the Statistical Analysis Workflow. Master's Thesis, RWTH Aachen University, Aachen, March 2019.
Krishna Subramanian, Ilya Zubarev, Simon Voelker and Jan Borchers. Supporting Data Workers to Perform Exploratory Programming. In Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, CHI EA '19, 6 pages, ACM, New York, NY, USA, May 2019.