Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data
Source: https://idl.uw.edu/papers/data-wrangling Parent: https://idl.uw.edu/papers
Sean Kandel, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, Paolo Buono. Information Visualization Journal, 2011
Materials
PDF: https://idl.cs.washington.edu/files/2011-DataWrangling-IVJ.pdf
Abstract
In spite of advances in technologies for working with data, analysts still spend an inordinate amount of time diagnosing data quality issues and manipulating data into a usable form. This process of 'data wrangling' often constitutes the most tedious and time-consuming aspect of analysis. Though data cleaning and integration are longstanding issues in the database community, relatively little research has explored how interactive visualization can advance the state of the art. In this article, we review the challenges and opportunities associated with addressing data quality issues. We argue that analysts might more effectively wrangle data through new interactive systems that integrate data verification, transformation, and visualization. We identify a number of outstanding research questions, including how appropriate visual encodings can facilitate apprehension of missing data, discrepant values, and uncertainty; how interactive visualizations might facilitate data transform specification; and how recorded provenance and social interaction might enable wider reuse, verification, and modification of data transformations.
BibTeX
@article{2011-data-wrangling,
title = {Research Directions in Data Wrangling: Visualizations and Transformations for Usable and Credible Data},
author = {Kandel, Sean AND Heer, Jeffrey AND Plaisant, Catherine AND Kennedy, Jessie AND van Ham, Frank AND Henry Riche, Nathalie AND Weaver, Chris AND Lee, Bongshin AND Brodbeck, Dominique AND Buono, Paolo},
journal = {Information Visualization Journal},
year = {2011},
volume = {10},
number = {4},
pages = {271--288},
url = {https://idl.uw.edu/papers/data-wrangling},
doi = {10.1177/1473871611415994}
}