Differentiating Language Usage through Topic Models
Source: https://idl.uw.edu/papers/topic-models-language-usage Parent: https://idl.uw.edu/papers
Daniel A. McFarland, Daniel Ramage, Jason Chuang, Jeffrey Heer, Christopher D. Manning. Poetics, 2013
Language borrowing across academic fields, using dissertation abstracts from 2000–2010.
Materials
PDF | Journal (http://www.sciencedirect.com/science/article/pii/S0304422X13000442)
Abstract
Sociologists wishing to employ topic models in their research need a helpful guide that describes the variety of topic modeling procedures, their issues, and various means of resolving them so as to convincingly answer sociological questions. We present this overview by recounting a series of our prior collaborative projects that have employed and developed various forms of topic models to understand language differentiation in academe. With each project, we encountered a variety of model-specific issues concerning the validity of topics and their suitability to our data and research questions. We developed a variety of novel visualization techniques to make sense of topic-solutions and used a variety of techniques to validate our results. In addition, we created a variety of new topic modeling techniques and procedures suitable to different kinds of data and research questions.
BibTeX
@article{2013-topic-models-language-usage,
title = {Differentiating Language Usage through Topic Models},
author = {McFarland, Dan AND Ramage, Daniel AND Chuang, Jason AND Heer, Jeffrey AND Manning, Christopher},
journal = {Poetics},
year = {2013},
url = {https://idl.uw.edu/papers/topic-models-language-usage},
doi = {10.1016/j.poetic.2013.06.004}
}