Krishna's PhD thesis: "Lowering the Barriers to Hypothesis-Driven Data Science"
- Preparing to perform data science tasks: We discuss our findings about the impact of formal training on real-world statistical practice; trade-offs among information sources used for selecting statistical procedures; perceived complexity and uncertainty about statistical procedure selection; and reluctance among data workers to adopt alternative methods of analysis. Based on the above findings, we present design recommendations and two artifacts to improve data workers' workflows. Our artifacts include Statsplorer, a web-based tool to help data workers kickstart analysis and learn about common issues in statistical practice, such as over-testing, overlooking assumptions, and selecting the appropriate test; and StatPlayground, an interactive simulation tool that can be used to self-learn or teach statistical concepts and statistical procedure selection.
- Performing data science tasks: Our findings include an overview of how data workers perform hypothesis testing in programming IDEs, a process that follows an exploratory programming workflow; a comparison of existing interfaces for data science programming, namely computational notebooks, scripts, and consoles; and a discussion of how well these interfaces support the various steps in hypothesis testing. To improve data workers' workflows when performing data science tasks, we contribute design recommendations and two artifacts. Our artifacts include StatWire, an experimental hybrid-programming interface that encourages data workers to write high-quality source code; and Tractus, an interactive visualization that can lower the cost of working with experimental source code.
From the phase "Preparing to perform data science"
- Details of the interviews, survey, and content analysis from studies to understand how data workers seek statistical procedures
- Back-end code and video preview from StatPlayground
From the phase "Performing data science"
- Details of the interview and observational study used to understand how data workers use various programming interfaces
- Details of the observational study used to understand the exploratory analysis workflow, Tractus GitHub source, and details of Tractus' parser evaluation
- Details of the code corpus analysis, StatWire GitHub source, and StatWire demo video