Digital Humanities Tools
Saved by Alan Liu
on July 25, 2017 at 1:38:59 pm
Digital Humanities Tools Curated by Alan Liu
(DH Toychest started 2013; last update 2017)
Online or downloadable tools that are free, free to students, or have generous trial periods without tight usage constraints, watermarks, or other spoilers. Bias toward tools that can be run online or installed on a personal computer without needing an institutional server. (Also see Other Tool Lists)
Note about organization: At present, these tools are organized in an improvised scheme of categories. For the most deliberate and comprehensive taxonomy of digital-humanities activities, objects, and techniques currently available, see TaDiRAH. (See also about TaDiRAH)
= Currently a tool that is prevalent, canonical, or has "buzz" in the digital humanities community.
= Other tools with high power or general application
- Animation & Storyboarding Tools
- Bonsai (tool for programmatic creation of simple animated graphics in a Web browser using "graphics library which includes an intuitive graphics API and an SVG renderer")
- FrameByFrame (stop-motion animation tool for Mac; creates stop-motion animation videos using any webcam/video camera connected to your Mac, including iSight)
- Pencil (2D animation software suitable for beginners at animation)
- Popcorn Maker (creates interactive videos; "helps you easily remix web video, audio and images into cool mashups that you can embed on other websites. Drag and drop content from the web, then add your own comments and links . . . ; videos are dynamic, full of links and unique with every view") | Tutorial by Miriam Posner
- Scratch (visual programming platform developed by the MIT Media Lab to teach children about programming by allowing them to use a visual interface to create interactive programs, games, etc.; useful for allowing advanced humanities scholars without programming skills to program dynamic, interactive visual scenes and learn about programming logic) | Scratch 2.0 Offline Editor
- Storyteller ("application from Amazon Studios that lets you turn a movie script into a storyboard. You choose the backgrounds, characters, and props to visually tell a story")
- Audio Tools (see Audio Editing Tutorials)
- Audiotool (free, web-based application for electronic music production; meant to serve as a fully functioning virtual studio. Users drop and drag synthesizers, drum machines, sequencers, filters, samples, and note sequences into the workspace from a toolbar)
- Augmented Notes ("integrates scores and audio files to produce interactive multimedia websites in which measures of the score are highlighted in time with music")
- MusicAlgorithms (tools and resources for the creation and analysis of algorithmically-generated music)
- Paperphone ("interactive audio app that processes vocal performance in realtime, designed for presentations & sound essays")
- Praat (free software package for phonetic analysis; designed to analyse, synthesize, manipulate, and visualize speech)
- Sonic Visualiser (program to facilitate study of musical recordings; "of particular interest to musicologists, archivists, signal-processing researchers and anyone else looking for a friendly way to take a look at what lies inside the audio file")
- Authoring/Annotation/Editing/Publishing Platforms & Tools (including collaborative platforms) (see also Content Management Systems and Exhibition/Collection/Edition Platforms & Tools)
- Annotation Studio ("suite of tools for collaborative web-based annotation.... Currently supporting the multimedia annotation of texts... will ultimately allow students to annotate video, image, and audio sources")
- Brat Rapid Annotation Tool ("online environment for collaborative text annotation"; focused on structured annotation of text, e.g., tagging named entities such as persons, organizations, etc., and their relationships)
- CommentPress ("open source theme and plugin for the WordPress blogging engine that allows readers to comment paragraph-by-paragraph, line-by-line or block-by-block in the margins of a text. Annotate, gloss, workshop, debate: ... do all of these things on a finer-grained level, turning a document into a conversation")
- Fold ("a context creation platform for journalists and storytellers, allowing them to structure and craft complex stories"; created at MIT Media Lab)
- INKE Tools and Prototypes (tools and platforms developed by the INKE project)
- Interactive Fiction Writing tools and platforms
- Fungus (open-source "Unity 3D library for creating illustrated interactive fiction games")
- Inklewriter (free online tool "designed to allow anyone to write and publish interactive stories. It’s perfect for writers who want to try out interactivity, but also for teachers and students looking to mix computer skills and creative writing"; "keeps your branching story organised, so you can concentrate on what’s important – the writing." Also allows export of stories to Kindle with hyperlinks for the interactive features of a story.)
- Undum ("a game framework for building a sophisticated form of hypertext interactive fiction"; "consists of a HTML file [with CSS stylesheets] and three Javascript files... To create your own game, you edit the HTML file a little..., edit one of the Javascript files [and upload to a web server]") (sample story created in Undum: "Mrs. Wobbles and the Tangerine House," by the Marino Family)
- NewRadial (visualization interface from the INKE project designed to facilitate studying, commenting on, and social editing of texts)
- Odyssey (online tool that provides "a simple way for journalists, designers, and creators to weave interactive stories" based on a mapping paradigm; allows for mixing "written narrative, multimedia, and map based interaction")
- Oppia (Google's tool for making "embeddable interactive educational 'explorations' that let people learn by doing"; "Oppia aims to simulate the one-on-one interaction that a student has with a teacher by capturing and generalizing 'interaction dialogues'"; explorations can contain maps, images, text)
- Prism ("a tool for 'crowdsourcing interpretation.' Users are invited to provide an interpretation of a text by highlighting words according to different categories, or 'facets.' Each individual interpretation then contributes to the generation of a visualization which demonstrates the combined interpretation of all the users. We envision Prism as a tool for both pedagogical use and scholarly exploration, revealing patterns that exist in the subjective experience of reading a text.")
- Pullquotes (tool for tweeting full quotations or images on Twitter, for tweeting a stream of quotations, and for collecting Twitter quotations)
- Scalar (multi-modal authoring platform: "free, open source authoring and publishing platform that’s designed to make it easy for authors to write long-form, born-digital scholarship online. Scalar enables users to assemble media from multiple sources and juxtapose them with their own writing in a variety of ways, with minimal technical expertise required")
- Scroll Kit (drag-and-drop online platform for creating scrollable multimedia narratives that also scale for mobile device screens; "Make stories people will want to touch. Scroll Kit is a powerful visual content editor . . . typography, images, motion")
- StoryMapJS ("free tool to help you tell stories on the web that highlight the locations of a series of events; ... you can use StoryMapJS to tell a story with photographs, works of art, historic maps, and other image files. Because it works best with very large images, we call these 'gigapixel' StoryMaps")
- Twine ("You don't need to write any code to create a simple story with Twine, but you can extend your stories with variables, conditional logic, images, CSS, and JavaScript when you're ready. Twine publishes directly to HTML, so you can post your work nearly anywhere.")
- Code Versioning Systems (see Code Versioning Tutorials)
- GitHub ("collaboration, code review, and code management for open source and private projects"; also used by scholars for non-code projects, e.g., creating documents, syllabi, or any project that benefits from tracking, forking, or roll-back of modular parts contributed by one or more participants)
- Command Line Tools (see Command Line Tutorials)
- The Sourcecaster (set of instructions for using the command line to perform common text preparation tasks--e.g., conversion of text or media formats, wrangling and cleaning text, batch filename editing, etc.)
- Content Management Systems (see Content Management Systems Tutorials) (see also Authoring/Annotation/Editing Tools)
- PBWorks (content management system hosted online with strong educational user base; particularly robust as a wiki platform for project or course sites; free education-user licenses)
- WordPress (content management system based originally on blog paradigm; hosted online or downloadable for installation on local server)
- Crowdsourcing Tools
- AllOurIdeas ("social data collection" wiki platform that solicits information online by survey "while still allowing for new information to 'bubble up' from respondents as happens in interviews, participant observation, and focus groups")
- Exhibition/Collection/Edition Platforms & Tools (see also tools for Infographics and Timelines; and selected tools in Mapping)
- CollectiveAccess ("cataloguing tool and web-based application for museums, archives and digital collections")
- DH Press (WordPress-based "flexible, repurposable, extensible digital humanities toolkit designed for non-technical users. It enables administrative users to mashup and visualize a variety of digitized humanities-related material, including historical maps, images, manuscripts, and multimedia content. DH Press can be used to create a range of digital projects, from virtual walking tours and interactive exhibits, to classroom teaching tools and community repositories")
- Exhibit (downloadable software for creating "web pages with advanced text search and filtering functionalities, with interactive maps, timelines, and other visualizations"; part of the Simile Widgets suite)
- Google Open Gallery (Users must request an invite; "Powerful free tools for artists, museums, archives and galleries ... Easily upload images, videos and audio to create online exhibitions and tell your stories ... Enhance your existing website, or create a brand new one for free ... Very powerful zoom for your beautiful images... Help visitors discover your content using search and filtering options")
- oldweb.today (online emulator platform from Rhizome that allows users to see what past or present web sites look like in historical browsers going back to the NCSA Mosaic browser)
- Omeka ("create complex narratives and share rich collections, adhering to Dublin Core standards with Omeka on your server, designed for scholars, museums, libraries, archives, and enthusiasts"; hosted online or downloadable for installation on server) | Getting Started
- Open Exhibits ("free multitouch & multiuser software initiative for museums, education, nonprofits, and students")
- Neatline ("allows scholars, students, and curators to tell stories with maps and timelines. As a suite of add-on tools for Omeka, it opens new possibilities for hand-crafted, interactive spatial and temporal interpretation"; downloadable for installation on server)
- Prezi (alternative to PowerPoint; uses an infinite canvas metaphor rather than a slide metaphor; free online production and viewing version; offline production version by subscription)
- Silk (online data visualization and exhibition platform; takes datasets input as spreadsheets and allows users to create collections, maps, graphs, etc.)
- Simile Widgets (embeddable code for visualizing time-based data, including Timeline, Timeplot, Runway, and Exhibition)
- TextGrid ("a virtual research environment (VRE) for humanities scholars in which various tools and services are available for the creation, analysis, editing, and publication of texts and images"; provides "a variety of tested tools, services, and resources, allowing for the complete workflow of, for example, generating a critical textual edition"; "also supports the storage and re-use of research data through the integration of the TextGrid Repository")
- ViewShare ("free platform for generating and customizing view--interactive maps, timelines, facets, tag clouds--that allow users to experience your digital collections"; upload spreadsheets or other collection data formats with information about a collection of materials; then configure how and what to show. Visualizations of collections are embeddable on Web pages. Users must request an account)
- Internet Research Tools (tools for studying the Internet or parts of the Internet) (this section is heavily indebted to the Digital Methods Initiative at the University of Amsterdam and its collection of tools)
- Censorship Explorer ("Check whether a URL is censored in a particular country by using proxies located around the world"; a Digital Methods Initiative tool)
- Compare Lists ("Compare two lists of URLs for their commonalities and differences"; a Digital Methods Initiative tool)
- Facebook (tools for studying Facebook)
- Like Scraper ("For each URL entered, this script queries the Facebook api and retrieves the number of likes, shares, comments and clicks for given URLs. The output is a table with the URLs queried and the numbers retrieved"; a Digital Methods Initiative tool)
- Netvizz ("Extracts various datasets from Facebook"; a Digital Methods Initiative tool)
- NetvizzToSentiStrength ("uses Sentistrength to analyze the sentiment of short texts. Three types of data can be uploaded: Netvizz, DMI-TCAT, or a regular CSV file"; a Digital Methods Initiative tool)
- Google (tools for studying Google)
- Google AutoComplete (retrieves Google autocomplete suggestions according to language and country; a Digital Methods Initiative tool)
- Google Blog Search Scraper (allows for batch queries of Google Blog Search; "query the resonance of a particular term, or a series of terms, in a set of blogs"; a Digital Methods Initiative tool)
- Google Image Scraper ("query images.google.com with one or more keywords, and/or use images.google.com to query specific sites for images"; a Digital Methods Initiative tool)
- Google News Scraper ("The scraper batch queries news.google.com, outputting a table of returns including URL, title, source, city/country, date and teaser text"; a Digital Methods Initiative tool)
- GoogleScraper ("The Googlescraper ... queries Google and makes the results available for further analysis ... Google will be asked if each keyword occurs in each URL. Results are displayed as a tag cloud and an html table. They also are written to a text file which you can access at the bottom or through previous results ... The most common use of the tool is researching the presence as well as the ranking of particular sources within Google engine results"; a Digital Methods Initiative tool)
- Harvester ("Extract URLs from text, source code or search engine results. Produces a clean list of URLs"; a Digital Methods Initiative tool)
- Image Scraper ("scrape images from a single page"; a Digital Methods Initiative tool)
- Internet Archive Wayback Machine Link Ripper ("Enter a host or URL to retrieve the links to the URL's archived versions at wayback.archive.org. A text file is produced which lists the archive URLs"; a Digital Methods Initiative tool)
- IssueCrawler ("Enter URLs and the Issue Crawler performs co-link analysis in one, two or three iterations, and outputs a cluster graph....; also has modules for snowball crawling [up to 3 degrees of separation] as well as inter-actor crawling [finding links between seeds only]"; a Digital Methods Initiative tool) (Instructions) (Auto-request a login)
- Compare Networks Over Time ("Compares IssueCrawler networks over time, and displays ranked actor lists")
- Extract URLs ("Extracts URLs from an Issuecrawler result file [.xml]; useful for retrieving starting points as well as a clean list of the actors in the network")
- Issue Geographer ("Geo-locates the organizations on an IssueCrawler map, using whois information, and visualizes the organizations' registered locations on a geographical map")
- Ranked Deep Pages from Core Issue Crawler Network ("Enter an IssueCrawler XML file and this script will get out all pages from the core network and rank those by pages by inlink count")
- Issue Discovery Tool ("Enter URLs, and discover the most relevant words and phrases contained in them. One also may enter text, or an Issuecrawler result file (.xml)"; a Digital Methods Initiative tool)
- iTunes Store Research Tool ("This tool queries http://itunes.apple.com/linkmaker/, retrieves all available results and outputs a csv file, as well as a gexf file [for visualization in Gephi] containing the relations between items in the iTunes stores and their categories"; a Digital Methods Initiative tool)
- Language Detection ("Detects language for given URLs. The first 500 characters on the Web page(s) are extracted, and the language of each page is detected"; a Digital Methods Initiative tool)
- LinkRipper ("Capture all internal links and/or outlinks from a page"; a Digital Methods Initiative tool)
- Lippmannian Device ("device ... named after Walter Lippmann [that] provides a coarse means of showing actor partisanship"; a Digital Methods Initiative tool)
- Lippmannian Device to Gephi ("visualize the output of the Lippmannian device as a network with Gephi"; a Digital Methods Initiative tool)
- Open Calais ("Discovers the most relevant words and phrases among a set of websites, within a text, or within an issue network"; a Digital Methods Initiative tool)
- Rip Sentences ("Enter a URL, and this script will split the text of the html page into sentences"; a Digital Methods Initiative tool)
- ProfileWords (creates word clouds visualizing frequent words in the profile bios of Twitter users and the last 25 tweets of their followers and those they follow)
- SentiStrength ("Automatic sentiment analysis of up to 16,000 social web texts per second with up to human level accuracy and 14 languages available - others easily added. SentiStrength estimates the strength of positive and negative sentiment in short texts, even for informal language")
- Source Code Search (loads a URL and searches the page's source code for keywords plus an optional number of trailing characters [e.g., "cool" or, with 5 trailing characters, "cool cats"]; a Digital Methods Initiative tool)
- StoryTracker ("tools for tracking stories on news homepages"; includes ability to identify and track changing locations of stories on news site pages)
- Table to Net ("Extract a network from a table. Set a column for nodes and a column for edges. It deals with multiple items per cell"; by Médialab Sciences-Po)
- Text Ripper ("Rip all non-html (i.e. text) from a specified page"; a Digital Methods Initiative tool)
- Timestamp Ripper ("Rips and displays a web page's last modification date (using the page's HTML header). Beware of dynamically generated pages, where the date stamps will be the time of retrieval"; a Digital Methods Initiative tool)
- TLD (Top Level Domain) Counts ("Enter URL's, and count the top level domains"; a Digital Methods Initiative tool)
- Tracker Tracker ("The tool Tracker Tracker can be used to make (some parts of) the 'cloud' visible. The tool allows for the characterization of a set of websites or pages by detecting a set of 900+ predefined 'fingerprints' of cloud devices, including those that fall under the category of analytics, ad programs, widgets or social plugins, trackers, and privacy. Tracker Tracker may thus be used to gain an overall picture of detectable trackers or for a number of specified analytical purposes, such as social plugin detection, mapping 'power concentrations of the cloud' - mapping the political economy of the cloud, by looking at 'cloud technology'"; a Digital Methods Initiative tool)
- Triangulation ("Enter two or more lists of URLs or other items to discover commonalities among them. Possible visualizations include a Venn Diagram"; a Digital Methods Initiative tool)
- Twitter Advanced Search (Twitter's search interface for their complete index of historical tweets)
- Hashalyzer (creates reports of Twitter hashtag participants and tweets)
- MentionMap (online tool that shows an interactive social network graph of a Twitter user's mentions of other users; clicking on another user shows the mention map of that user)
- Mohio Social (shows in node-and-link style a user and the user's sphere of tweets, mentioned links, etc., each of which is clickable to go to the original tweet or linked document)
- Twarc (Python "command line utility to archive Twitter search results as line-oriented-json")
- Twiangulate (search for people followed in common by two people--i.e., the intersection of the sets of two people's follow lists)
- Twitter Capture and Analysis Toolset (DMI-TCAT) (downloadable source code for tool that "captures tweets and allows for multiple analyses [hashtags, mentions, users, search, ...]." Due to Twitter's terms of service, the online version of this tool from the Digital Methods Initiative cannot be used by users unaffiliated with their program; but users may install the source code for themselves)
- Twitonomy (online analytics platform for detailed study of users' Twitter activity, followers, mentions, retweets, hashtags, links, etc.; access to tracking statistics requires paid subscription)
- Ernesto Priego, "Some Thoughts on Why You Would Like to Archive and Share [Small] Twitter Data Sets" (2014)
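Twarc's "line-oriented-json" output, noted above, means one JSON object per line, so an archive can be processed one tweet at a time. The following is a minimal sketch using only Python's standard library. The sample records and the field names `user`, `screen_name`, and `text` are assumptions made for illustration, not a guaranteed description of what twarc writes.

```python
import json
from collections import Counter


def count_users(lines):
    """Count tweets per user in line-oriented JSON (one object per line)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        tweet = json.loads(line)
        # Field names are assumed for illustration only.
        counts[tweet["user"]["screen_name"]] += 1
    return counts


# Hypothetical sample data standing in for an archive file.
sample = [
    '{"user": {"screen_name": "alice"}, "text": "first"}',
    '{"user": {"screen_name": "bob"}, "text": "second"}',
    '',
    '{"user": {"screen_name": "alice"}, "text": "third"}',
]
user_counts = count_users(sample)
print(user_counts.most_common())
```

To read a real archive, the same function could be given an open file object, since iterating over a file yields one line at a time.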
- Wikipedia (tools for studying Wikipedia)
- Wikipedia Cross-Lingual Image Analysis ("Insert a full Wikipedia URL ... and the tool will retrieve all language versions for the article. The tool will then scrape all the images of each language version and show them side by side in a table for comparison. The images retain the order in which they appear in the HTML"; a Digital Methods Initiative tool)
- Wikipedia Edits Scraper and IP Localizer ("The tool scrapes the complete edit history for a specific Wikipedia page. When the tool finds an IP address instead of a user name it will use Maxmind's GeoCity Lite database to resolve the IP address to a geo-location"; a Digital Methods Initiative tool)
- Wikipedia History Flow Companion ("The script chops Wikipedia edit histories in chronological chunks of 100 edits. It will display links which can be used to export those chunks from Wikipedia into IBM's History Flow Visualization"; a Digital Methods Initiative tool)
- Wikipedia TOC Scraper ("Scrape Table of Contents for revisions of a Wikipedia page and explore the results by moving a slider to browse across chronologically ordered TOCs"; a Digital Methods Initiative tool)
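Several of the Digital Methods Initiative tools listed in this section (Compare Lists, Triangulation, TLD Counts) perform operations that are simple to approximate locally. The sketch below is not the Initiative's code. It is a rough standard-library illustration of the same ideas, and the function names and example URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def compare_lists(list_a, list_b):
    """Return items common to both lists and items unique to each."""
    a, b = set(list_a), set(list_b)
    return {"common": a & b, "only_a": a - b, "only_b": b - a}


def tld_counts(urls):
    """Count the last dot-separated label of each URL's hostname.

    Simplification: multi-part suffixes such as "co.uk" are counted
    as "uk", and IP-address hosts are not treated specially.
    """
    counts = Counter()
    for url in urls:
        host = urlparse(url).hostname  # None if the URL has no scheme
        if host and "." in host:
            counts[host.rsplit(".", 1)[-1]] += 1
    return counts


# Hypothetical example URLs.
list_a = ["http://example.org/a", "http://example.com/", "http://example.nl/x"]
list_b = ["http://example.com/", "http://example.net/"]

result = compare_lists(list_a, list_b)
tlds = tld_counts(list_a + list_b)
print(result["common"])
print(tlds.most_common())
```

Sets make the comparison order-independent and ignore duplicate entries, which is usually the desired behavior when comparing lists of URLs.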
- Mapping Tools & Platforms (see Mapping Tutorials)
- BatchGeo ("create Google maps from your data [in spreadsheet format] ... accepts addresses, intersections, cities, states, and postal codes")
- CartoDB (online tools for visualizing and analyzing geospatial data; free plan includes up to 5 tables and 5Mb of data)
- Torque for CartoDB ("efficient, fast, and styleable rendering method" to animate data on an interactive map; "see how your data has grown, moved, or changed over time and space")
- ChartsBin (creates interactive maps)
- Clio (online tool that shows locations of historical interest in user's proximity; "Clio is an educational website and mobile application that guides the public to thousands of historical and cultural sites throughout the United States. Built by scholars for public benefit, each entry includes a concise summary and useful information about a historical site, museum, monument, landmark, or other site of cultural or historical significance. In addition, “time capsule” entries allow users to learn about historical events that occurred around them. Each entry offers turn-by-turn directions as well as links to relevant books, articles, videos, primary sources, and credible websites")
- Esri Story Maps: Storytelling with Maps ("Story maps combine intelligent Web maps with Web applications and templates that incorporate text, multimedia, and interactive functions")
- Flow Mapping with Graph Partitioning and Regionalization ("an integrated software tool to explore flow patterns in large spatial interaction data. It involves two packages: (1) GraphRECAP, which uses spatially constrained graph partitioning to find a hierarchy of natural regions defined by spatial interactions; and (2) FlowMap, which visualize flows based on the discovered regions and related attributes")
- GeoExtraction (extracts geographical location from text; a Digital Methods Initiative tool)
- Geo IP ("Translates URLs or IP addresses into geographical locations"; a Digital Methods Initiative tool)
- Google Fusion Tables (create a fusion table and use the map chart type to map data with geographical information; see instructions)
- Google Earth
- Google Lit Trips (site unaffiliated with Google that provides "free downloadable files that mark the journeys of characters from famous literature on the surface of Google Earth. At each location along the journey there are placemarks with pop-up windows containing a variety of resources including relevant media, thought provoking discussion starters, and links to supplementary information about 'real world' references made in that particular portion of the story. The focus is on creating engaging and relevant literary experiences for students." Includes documentation about how to make lit trips.)
- Google Maps "My Maps" ("create and share maps of your world, marked with the locations, routes and regions of interest that matter to you")
- Map Stack ("Assemble a selection of different map layers like backgrounds, satellite imagery, terrain, roads or labels! Tweak Photoshop-like controls like colors, masks, opacity and brightness to make a map your own! Share your map with a link or Pinterest or Tumblr")
- MapStory (online platform for creating animated maps with "storylayers" of data "highlighting changes over time whether they be social, cultural or economic in nature")
- Neatline ("allows scholars, students, and curators to tell stories with maps and timelines. As a suite of add-on tools for Omeka, it opens new possibilities for hand-crafted, interactive spatial and temporal interpretation"; downloadable for installation on server)
- Odyssey (online tool that provides "a simple way for journalists, designers, and creators to weave interactive stories" based on a mapping paradigm; allows for mixing "written narrative, multimedia, and map based interaction")
- Power Map Preview for Excel (download) (tool from Microsoft Research Labs that allows users to generate map visualizations from Excel spreadsheets, with geolocation, 2D and 3D data mapping, and interactive "video tours")
- QGIS (downloadable open source GIS system positioned as alternative to the industry-standard, institutionally-priced ArcGIS tools; "Create, edit, visualise, analyse and publish geospatial information on Windows, Mac, Linux, BSD")
- See the Geospatial Historian tutorial lessons on using QGIS for historical and other GIS mapping work.
- StoryMapJS ("free tool to help you tell stories on the web that highlight the locations of a series of events; ... you can use StoryMapJS to tell a story with photographs, works of art, historic maps, and other image files. Because it works best with very large images, we call these 'gigapixel' StoryMaps"; requires a Google account and uses Google Drive as repository for user-provided photos to be shown on maps)
- Thematic Mapping Engine (TME) ("enables you to visualise global statistics on Google Earth. The primary data source is UNdata. The engine returns a KMZ file that you can open in Google Earth or download to your computer")
- Timemap ("Javascript library to help use online maps, including Google, OpenLayers, and Bing, with a SIMILE timeline. The library allows you to load one or more datasets in JSON, KML, or GeoRSS onto both a map and a timeline simultaneously")
- TimeMapper ("Elegant timelines and maps created in seconds")
- WorldMap (open source platform "to lower barriers for scholars who wish to explore, visualize, edit, collaborate with, and publish geospatial information. WorldMap is Open Source software.... provides researchers with the ability to: upload large datasets and overlay them up with thousands of other layers; create and edit maps and link map features to rich media content; share edit or view access with small or large groups; export data to standard formats; make use of powerful online cartographic tools; georeference paper maps online...; publish one’s data to the world or to just a few collaborators")
- Mind-Mapping Tools (Conceptualization Tools)
- DebateGraph (collaborative mindmapping platform that allows individuals or groups to: facilitate group dialogue, make shared decisions, report on conferences, make and share posters, tell non-linear stories, explore the connections between subjects, etc.)
- Network Analysis / Social Network Analysis Tools (see Network Analysis Tutorials) (see also Internet Research Tools, Tools for Studying Twitter, and Network Visualization Tools)
- Gephi ("interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs")
- Google Fusion Tables
- Jigsaw (downloadable platform for importing a variety of unstructured or structured documents; identifying entities (people, organizations, locations, dates, etc.); and visualizing relations between entities. Designed for investigating networks and clusters of relations implicit in large numbers of documents. Video tutorials | Instruction manual)
- Local Wikipedia Map (online tool for visualizing networks of Wikipedia articles by choosing topics, filtering the resulting nodes [articles], and downloading and sharing the visualization; allows live access to the articles at nodes) (detailed instructions)
- Netlytic ("cloud-based text and social networks analyzer that can automatically summarize large volumes of text and discover social networks from online conversations on social media sites such as Twitter, Youtube, blogs, online forums and chats")
- Personae - A Character Visualization Tool for Dramatic Texts ("The aim of these visualisations is to use the XML files from the New Variorum Shakespeare edition of The Comedy of Errors to create a resource for exploring patterns of speeches by and mentions of characters in Shakespeare's work. Visualising the frequency, extent, and position of dialogue relating to a particular character presents users with a simple and immediate measure of that character’s prominence within the play. The tool enables users to select and visualise individual characters’ involvement, producing a novel means of exploring large-scale structural, narrative, or character-focused patterns within the text") (Github repository for the tool's code)
- ProfileWords (creates word clouds visualizing frequent words in the profile bios of Twitter users and the last 25 tweets of their followers and those they follow)
- TAGS v5.0 (Twitter Archiving Google Spreadsheet)
- Twitter Analysis (see Tools for Studying Twitter in Internet Research Tools section of this page)
- UCINet for Windows ("software package for the analysis of social network data"; free trial for 90 days; discounted pricing for students & faculty)
- Programming Languages Tools & Resources (programming/scripting languages and major program toolkits/packages used to facilitate text and data analysis, collection, preparation, etc.) (see Programming Languages Tutorials)
- Python Tools & Resources (see Python Tutorials)
- Python ("a clear and powerful object-oriented programming language" often used for text and data wrangling)
- Beautiful Soup ("Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work")
- IPython Notebook ("web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.... These notebooks are normal files that can be shared with colleagues, converted to other formats such as HTML or PDF, etc. You can share any publicly available notebook by using the IPython Notebook Viewer service which will render it as a static web page")
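Illustrative sketch (not taken from any of the tools above): the kind of "pulling data out of HTML" that Beautiful Soup is described as doing can be tried in miniature with Python's standard library alone. The example below swaps in the built-in html.parser module instead of Beautiful Soup, so it runs without installing anything; the class name, function name, and sample page are all hypothetical and invented for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical sample page, invented for this illustration.
SAMPLE_HTML = """
<ul>
  <li><a href="https://example.org/antconc">AntConc</a></li>
  <li><a href="https://example.org/voyant">Voyant <em>Tools</em></a></li>
</ul>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE_HTML):
        print(text, "->", href)
```

For real pages with messy markup, a dedicated library such as Beautiful Soup is likely to be more forgiving than this hand-rolled parser.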
- "R" Tools & Resources (see "R" Tutorials)
- "R" (R Project for Statistical Computing) ("language and environment for statistical computing and graphics.... provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible")
- rOpenSci (workflow environment based on R that is designed for scientists but may be useful for other scholars working with processing and narrating data. "Use our packages to acquire data (both your own and from various data sources), analyze it, add in your narrative, and generate a final publication in any one of widely used formats such as Word, PDF, or LaTeX"; "packages that allow access to data repositories through the R statistical programming environment [and] facilitate drawing data into an environment where it can readily be manipulated"; "analyses and methods can be easily shared, replicated, and extended by other researchers")
- Stylo for R (computational stylistics methods implemented as R package; see how-to article [PDF]; warning: requires advanced knowledge)
- FACTORIE ("toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating factor graphs, estimating parameters and performing inference")
- Web Browser Automation Tools (can be used for scripting web-scraping)
- Google Chrome Scraper ("highlight a part of the webpage you'd like to scrape, right-click and choose "Scrape similar...." Anything that's similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs")
- Selenium (browser-automation tool that can be used to create scripts and other automation for web-scraping)
- Mashup Tools
- Yahoo Pipes ("composition tool to aggregate, manipulate, and mashup content from around the web.... Simple commands can be combined together to create output that meets your needs: combine many feeds into one, then sort, filter and translate it; geocode your favorite feeds and browse the items on an interactive map....")
- Simulation Tools & Platforms (see Simulation Tutorials)
- NetLogo (downloadable software for agent-based simulations: "NetLogo is a programmable modeling environment for simulating natural and social phenomena. . . . NetLogo is particularly well suited for modeling complex systems developing over time. Modelers can give instructions to hundreds or thousands of independent 'agents' all operating concurrently. This makes it possible to explore the connection between the micro-level behavior of individuals and the macro-level patterns that emerge from the interaction of many individuals. NetLogo lets students open simulations and 'play' with them, exploring their behavior under various conditions. It is also an authoring environment which enables students, teachers and curriculum developers to create their own models. NetLogo is simple enough that students and teachers can easily run simulations or even build their own. And, it is advanced enough to serve as a powerful tool for researchers in many fields. NetLogo has extensive documentation and tutorials. It also comes with a Models Library, which is a large collection of pre-written simulations that can be used and modified. These simulations address many content areas in the natural and social sciences, including biology and medicine, physics and chemistry, mathematics and computer science, and economics and social psychology")
- Second Life (general-purpose, Internet-based, immersive, 3D, and highly scalable (massively multi-user) "virtual world" where users can create an avatar, create richly rendered spaces and objects, and interact with each other as well as with various media sources)
- SET (Simulated Environment for Theatre) ("3D environment for reading, exploring, and directing plays. Designed and developed by a multidisciplinary team of researchers, SET uses the Unity game engine to allow users to both author and playback digital theatrical productions")
- Text Analysis Tools (complemented by tools for Text Preparation for Digital Work, Topic Modeling Tools, and Text Visualization Tools below; see also the TAPoR 2 portal of text-analysis tools for an omnibus listing with reviews, ratings, difficulty levels, etc.) (see Text Analysis Tutorials). For some text-analysis tools, stop word lists are useful (lists of common words to ignore). Two common English-language stop lists are: Fox 1992 stop word list (429 words) | SMART 1971 stop word list (571 words)
- AntConc ("concordance program developed by Prof. Laurence Anthony," with versions for Windows, Mac & Linux; site includes video tutorials)
- Bookworm ("Search for trends in 4.6M public domain texts from HathiTrust Digital Library")
- Related Bookworm sites and interfaces:
- Various Bookworms: (interface for Google Ngram visualization of trends in a select number of corpora: Open Library books; arXiv science publications; Chronicling America historical newspapers; US Congress bills, amendments, and resolutions; Social Science Research Network research paper abstracts)
- Ben Schmidt,
- In-browser Text Classification Using Bookworm ("This page automatically classifies a snippet of text (pasted into the text area below) against a bookworm database, so you can see how any given snippet lines up with the metadata you've defined for a collection.... If you have a Bookworm installation of your own, you can easily modify the code here to classify by whatever text variables you might have on hand")
- Bookworm: Movies ("Search for trends in the dialogue of thousands of movie and TV shows, based on subtitles from Open Subtitles")
- Bookworm: Simpsons ("Search across every word from 25 years of the Simpsons (at least, the ones that made it into closed captions) by episode, season, or even time within in the episode") (How the Simpsons bookworm was made)
- FAQ and Guide to Making Bookworms (by Ben Schmidt)
- Also see the following on the nature and limitations of the underlying HathiTrust corpus for Bookworm: David Mimno, "Word Counting, Squared" (2014).
- CLAWS ("grammatical tagger that analyzes words in a text by part of speech. Based on the approximately 10 million words of the British National Corpus")
- Concordance Programs (see Concordance Program Tutorials)
- Corpus Linguistics Programs/Resources (see Corpus Linguistics Tutorials) (see also Corpora sets in Data Collections & Datasets)
- U. Portsmouth, Online Corpus Linguistics Resources (including tools)
- Wmatrix (corpus analysis and comparison tool providing "a web interface to the USAS and CLAWS corpus annotation tools, and standard corpus linguistic methodologies such as frequency lists and concordances. It also extends the keywords method to key grammatical categories and key semantic domains"; 1-month free trial)
- WordSimilarity (also known as Word 2 Word) (downloadable, Java-based "open-source tool to plot and visualize semantic spaces, allowing researchers to rapidly explore patterns in visual data representative of statistical relations between words. Words are visualized as nodes and word similarities as directed edges of varying strengths or thicknesses.... system contains a large library of ready to use, modern, statistical relationship models along with an interface to teach them from various language sources")
- DataBasic ("a suite of easy-to-use web tools for beginners that introduce concepts of working with data . . . WordCounter analyzes your text and tells you the most common words and phrases. . . . WTFcsv tells you WTF is going on with your .csv file. . . . SameDiff compares two or more text files and tells you how similar or different they are")
- DPLA (Digital Public Library of America) Visual Search Prototype ("prototype visual search interface that explores content from the Digital Public Library of America. It is designed to provide an 'at-a-glance' visual overview of search results, and an intuitive means of narrowing the scope of the search")
- Google Ngram Viewer (search for and visualize trends of words and phrases in the Google Books corpus; includes ability to focus on parts of the corpus [e.g., "American English," "English Fiction"] and to use a variety of Boolean and other search operators); see the related article: Jean-Baptiste Michel, Erez Lieberman Aiden, et al., "Quantitative Analysis of Culture Using Millions of Digitized Books" (2011)
- See also: Bookworm (interface for Google Ngram visualization of trends in a select number of corpora: Open Library books; arXiv science publications; Chronicling America historical newspapers; US Congress bills, amendments, and resolutions; Social Science Research Network research paper abstracts)
- HathiTrust Research Center (HTRC) Portal (allows registered users to search the HathiTrust's ~3 million public domain works, create collections, upload worksets of data in CSV format, and perform algorithmic analysis -- e.g., word clouds, semantic analysis, topic modeling) (Sign-up for login to HTRC portal; parts of the search and analysis platform requiring institutional membership also require a userid for the user's university)
- Features Extracted From the HTRC ("A great deal of fruitful research can be performed using non-consumptive pre-extracted features. For this reason, HTRC has put together a select set of page-level features extracted from the HathiTrust's non-Google-digitized public domain volumes. The source texts for this set of feature files are primarily in English. Features are notable or informative characteristics of the text. We have processed a number of useful features, including part-of-speech tagged token counts, header and footer identification, and various line-level information. This is all provided per-page.... The primary pre-calculated feature that we are providing is the token (unigram) count, on a per-page basis"; data is returned in JSON format)
- Tools and Tutorials related to using the HathiTrust Research Center:
- Peter Organisciak and Boris Capitanu (in Programming Historian), "Text Mining in Python through the HTRC Feature Reader" (2016) ("We introduce a toolkit for working with the 13.6 million volume Extracted Features Dataset from the HathiTrust Research Center. You will learn how to peer at the words and trends of any book in the collection, while developing broadly useful Python data analysis skills")
- IBM Watson User Modeling service ("uses linguistic analytics to extract cognitive and social characteristics, including Big Five, Values, and Needs, from communications that the user makes available, such as email, text messages, tweets, forum posts, and more"; online demo site allows users to input text samples for analysis)
- Lexos - Integrated Lexomics Workflow ("online tool ... to "scrub" (clean) your text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations")
- Macro-Etymological Analyzer (program by Jonathan Reeve that runs a frequency analysis of plain-text documents, looking up each word using the Etymological Wordnet, and tallying the words according to origin language family)
- Named Entity Recognition (NER) Tools (see NER Tutorials)
- NEX - Named Entity eXtraction (Web tool from dataTXT to identify names, concepts, etc. in short texts; also allows API access)
- Stanford Named Entity Recognizer (NER) ("a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances")
- New York Times "Chronicle" (use an interface similar to Google Books Ngram Viewer to explore the rise and fall in frequencies of words/phrases published in the New York Times. Instructions: first "clear graph"; then add, one at a time, each word or phrase whose frequency you are interested in)
- OpenCalais ("The OpenCalais Web Service allows you to automatically annotate your content with rich semantic metadata, including entities such as people and companies and events and facts such as acquisitions and management changes")
- Overview (open-source web-based tool designed originally for journalists needing to sort large numbers of stories automatically and cluster them by subject/topic; includes visualization and reading interface; allows for import of documents in PDF and other formats. "Overview has been used to analyze emails, declassified document dumps, material from Wikileaks releases, social media posts, online comments, and more." Can also be installed on one's own server.)
- Personal-Nouns (Python scripts by Cory A. Taylor for generating list of "personal nouns" found in a text--i.e., nouns applying to persons such as "conscript, consecrater, conservator, consignee, consigner," etc. The GitHub site includes a list of personal nouns generated from the 1890 Webster's Unabridged Dictionary)
- Poem Viewer ("web-based tool for visualizing poems in support of close reading")
- Prospero ([documentation in French] text-analysis suite designed for humanists working with historical and diachronic textual series; focused on exploring "complex cases")
- Prosodic ("a python script which performs two main functions: 1. annotating English and Finnish text for their phonological properties; 2. evaluating the relative metricality of lines of English and Finnish text")
- Prospect ("a sophisticated web-app implemented as a plugin for WordPress that enables users to collect and curate data and then enable the wider public to visualize and access that data. The graphical representation of data – whether it be geographical information shown on maps, temporal data shown on timelines, interpersonal relationships shown as connected graphs, etc. – can facilitate end-users in comprehending it quickly and analyzing it in domain-specific ways") (more detailed "About" page)
- Robots Reading Vogue (online tools from Digital Humanities at Yale University Library for datamining the archives of Vogue magazine; includes covermetrics, n-gram search, topic-modeling, and statistics for advertisements, circulation, etc.)
- Vogue N-gram Search (use an interface similar to Google Books Ngram Viewer to explore the rise and fall in frequencies of words/phrases published in Vogue magazine)
- Sentiment Analysis
(Useful cautionary critique of sentiment analysis: Sarah Kessler, "The Problem With Sentiment Analysis" (2014))
- Sentiment Analysis (interactive demo plus information and research paper for the analysis of degrees of positive/negative "sentiment" in text passages based on an extensive "sentiment bank"; site includes downloadable dataset and code)
- Sentiment140 ("allows you to discover the sentiment of a brand, product, or topic on Twitter"; "Our approach is different from other sentiment analysis sites because: we use classifiers built from machine learning algorithms. Some other sites use a simpler keyword-based approach, which may have higher precision, but lower recall. We are transparent in how we classify individual tweets. Other sites do not show you the classification of individual tweets and only show aggregated numbers, which makes it difficult to assess how accurate their classifiers are.")
- Umigon ("sentiment analysis for tweets, and more")
- Signature Stylometric System ("program designed to facilitate "stylometric" analysis and comparison of texts, with a particular emphasis on author identification")
- TAPoR (Text Analysis Portal) (collection of online text-analysis tools--ranging from the basic to the sophisticated)
- TAPoR 2.0 (current, redesigned TAPoR portal; includes tool descriptions and reviews; also includes documentation of some historical or legacy tools)
- Statistical Natural Language Parsers (Probabilistic Grammar / Syntax Parsers) (about statistical parsing)
- Stylo for R (computational stylistics methods implemented as R package; see how-to article [PDF]; warning: requires advanced knowledge)
- Text Mechanic ("A suite of simple, single task, browser based, text manipulation tools"--e.g., working with lines, words, spaces, etc. in texts)
- Textometrica (web-based text analysis tool designed to "analyse large amounts of text in several different ways. For example, you can examine the frequency of individual words, see how often one term is linked to another, and see which words together form ideas and concepts in the text. Users can also create different visualisations and graphs from their text in order to gain a better overview of the structure of the text")
- Textal ("free smartphone app that allows you to analyze websites, tweet streams, and documents, as you explore the relationships between words in the text via an intuitive word cloud interface. You can generate graphs and statistics, as well as share the data and visualizations in any way you like")
- TextPlot ("Textplot is a little program that turns a document into a network of terms that are connected to each other depending on the extent to which they appear in the same locations in the text")
- twXplorer (online service that provides search tools for Twitter tweets, terms, links, and hashtags in relation to each other; provides a first-pass analytical view of a tweet or term, for example, in its relevant context)
- TXM (Textométrie) ("The TXM platform combines powerful and original techniques for the analysis of large text corpora using modular components and open-source.... Helps users to build and analyze any type of digital textual corpus possibly labeled and structured in XML... Distributed as a Windows, Linux or Mac software application ... and as an online portal run by a web application")
- VariAnt ("A freeware spelling variant analysis program for Windows" -- scroll down on this page for the download links)
- Voyant Tools (Online text reading and analysis environment with multiple capabilities that presents statistics, concordance views, visualizations, and other analytical perspectives on texts in a dashboard-like interface. Works with plain text, HTML, XML, PDF, RTF, and MS Word files (multiple files best uploaded as a zip file). Also comes with two pre-loaded sets of texts to work on (Shakespeare's works and the Humanist List archives [click the “Open” button on the main page to see these sets]))
- Word and Phrase.info (powerful tool that allows users to match texts they enter against the 450-million word Corpus of Contemporary American English [COCA] to analyze their text by word frequencies, word lists, collocates, concordance, and related phrases in COCA)
- Word2Vec ("deep-learning" neural network analysis tool from Google that seeks out relationships (vectors) between words in texts)
- Explanations and discussions of the tool:
- WordHoard (powerful text-analysis tool for a select group of "highly canonical literary texts"--currently, all of early Greek epic (in original and translation), all of Chaucer and Shakespeare, and Edmund Spenser's Faerie Queene and Shepheardes Calender)
- Word Map (enter a word and visualize on a map its relation to equivalent words in different languages and nations around the world; "this experiment brings together the power of Google Translate and the collective knowledge of Wikipedia to put into context the relationship between language and geographical space")
- WordSeer ("web-based text analysis and sensemaking environment for humanists and social scientists") (for full discussion of the site, see Aditi Muralidharan and Marti A. Hearst, "Supporting Exploratory Text Analysis in Literature Study," 2013 [paywalled])
- Word Tree (generate word trees like those originally created for the ManyEyes visualization site from pasted-in text or from URL; example)
- WordWanderer ("We are experimenting with visual ways in which we can enhance people's engagement with language. By fusing the information we can obtain from corpus searches, concordance outputs and word clouds we are aiming to enable and encourage people to notice and wander through the words they read, write and speak")
- Text Collation Tools (see Text Collation Tutorials)
- Juxta Commons ("a tool that allows you to compare and collate versions of the same textual work")
- TRAViz (a JavaScript library that "generates visualizations for Text Variant Graphs that show the variations between different editions of texts. TRAViz supports the collation task by providing methods to: align various editions of a text; visualize the alignment; improve the readability for Text Variant Graphs compared to other approaches; interact with the graph to discover how individual editions disseminate")
- Versioning Machine, version 4.0 ("a framework and an interface for displaying multiple versions of text encoded according to the Text Encoding Initiative (TEI) Guidelines")
- Visualizing Variation ("code library of free, open-source, browser-based visualization prototypes that textual scholars can use in digital editions, online exhibitions, born-digital articles, and other projects. All of the visualization prototypes offered here deal with different aspects of the bibliographical phenomenon of textual variation: the tendency of words, lines, passages, images, prefatory material, and other aspects of texts to change from one edition to the next, and even between supposedly identical copies of the same edition. Variants are material reminders of the complex social lives of texts")
- VVV (Version Variation Visualization) ("explore great works with their world-wide translations")
- Text Encoding Tools (see Text Encoding Tutorials)
- TEI Tools (tools page from the Text Encoding Initiative; includes tools for generating TEI schemas and converting to and from TEI documents, as well as stylesheets for converting TEI documents to HTML and other formats)
- Music Encoding Initiative (MEI) (XML schemas and downloadable editing tool for text-encoding of music notation documents)
- OpenCalais ("The OpenCalais Web Service allows you to automatically annotate your content with rich semantic metadata, including entities such as people and companies and events and facts such as acquisitions and management changes")
- Oxygen XML Editor (free 30-day trial period)
- XMLSpy (free 30-day trial period)
- Text Preparation for Digital Work (Text & Data "Wrangling" Tools for Harvesting, Scraping, Cleaning, Classifying, etc.) (see Text Preparation Tutorials) (see also Programming Languages Tools & Resources to facilitate wrangling)
- The Sourcecaster (set of instructions for using the command line to perform common text preparation tasks--e.g., conversion of text or media formats, wrangling and cleaning text, batch filename editing, etc.)
- AntFileConverter ("freeware tool to convert PDF files into plain text for use in corpus tools like AntConc" -- scroll down for the download links on this page)
- BookNLP ("natural language processing pipeline that scales to books and other long documents in English, including: part-of-speech tagging ... dependency parsing ... named entity recognition ... character name clustering ... quotation speaker identification ... pronominal coreference resolution")
- CSVkit ("suite of utilities for converting to and working with CSV, the king of tabular file formats")
- Data Science Tool Kit (variety of tools for such purposes as converting/mapping street address to geographical coordinates, coordinates to political areas, coordinates to statistics, IP address to coordinates, text to sentences [i.e., removing boilerplate from text passages], text to sentiment, HTML to text, HTML to story, text to people, and files to text [e.g., PDF, Word docs, and Excel spreadsheets to text])
- DataWrangler ("interactive tool for data cleaning and transformation"; suggests and facilitates restructuring, extraction, deletion, and other transformations of tabular and other structured data)
- Import.io ("Turn any website into a table of data or an API in minutes without writing any code")
- Jeroen Janssens, "7 Command-line Tools for Data Science" (2013) (tools for "obtaining, scrubbing, exploring, modeling, and interpreting data")
- Lexos - Integrated Lexomics Workflow ("online tool ... to "scrub" (clean) your text(s), cut a text(s) into various size chunks, manage chunks and chunk sets, and choose from a suite of analysis tools for investigating those texts. Functionality includes building dendrograms, making graphs of rolling averages of word frequencies or ratios of words or letters, and playing with visualizations of word frequencies including word clouds and bubble visualizations")
- NameChanger ("Rename a list of files quickly and easily. See how the names will change as you type")
- OpenRefine ("tool for working with messy data, cleaning it, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase")
- OutWit Hub (standalone program or Firefox extension for data extraction using Firefox: "contents extracted from a Web page are presented in an easy and visual way, without requiring any programming skills or advanced technical knowledge. Users can easily extract links, images, email addresses, RSS news, data tables, etc. from series of pages without ever seeing the source code. Extracted data can be exported to CSV, HTML, Excel or SQL databases, while images and documents, are directly saved to your hard disk"; paid "pro" version has more capabilities and capacity)
- Overview ("automatically sorts thousands of documents into topics and sub-topics, by reading the full text of each one")
- Pandoc ("If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, OPML, Emacs Org-Mode, or Haddock markup to HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides; Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML; Ebooks: EPUB version 2 or 3, FictionBook2; Documentation formats: DocBook, GNU TexInfo, Groff man pages, Haddock markup; Page layout formats: InDesign ICML; Outline formats: OPML; TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides; PDF via LaTeX; Lightweight markup formats: Markdown, reStructuredText, AsciiDoc, MediaWiki markup, Emacs Org-Mode, Textile...")
- pdf2htmlEX ("renders PDF files in HTML, utilizing modern Web technologies. It aims to provide an accurate rendering, while keeping optimized for Web display")
- PhoTransEdit ("Text to Phonetics" online transcriber for turning English text into phonetic transcription using IPA symbols; also has free downloadable version)
- Rip Sentences ("Enter a URL, and this script will split the text of the html page into sentences"; a Digital Methods Initiative tool)
- Scan Tailor ("interactive tool for post-processing of scanned pages. It gives the ability to cut or crop pages, compensate for skew angle, and add / delete content fields and margins, among others. You begin with raw scans, and end up with tiff's that are ready for printing or assembly in PDF or DjVu file")
- Scraper ("simple data mining extension for Google Chrome"; "to use it: highlight a part of the webpage you'd like to scrape, right-click and choose "Scrape similar...". Anything that's similar to what you highlighted will be rendered in a table ready for export, compatible with Google Docs")
- ScraperWiki (free tools for scraping from Twitter and extracting tables from PDFs)
- ScraperWiki Classic (archive of user-created scraping tools for specific purposes and resources; includes resources and tutorials for creating your own scraper)
- Scrapy (downloadable Python-based "fast high-level screen scraping and web crawling framework, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing")
- Source Code Search (loads a URL and searches for keywords + optional number of trailing characters in the page's source code [e.g., "cool" or, with 5 trailing characters, "cool cats"]; a Digital Methods Initiative tool)
- Stanford Named Entity Recognizer (NER) ("a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances")
- TET Plugin (Plugin for Adobe Acrobat designed to extract text from PDFs)
- Text Ripper ("Rip all non-html (i.e. text) from a specified page"; a Digital Methods Initiative tool)
- VARD 2 ("software produced in Java designed to assist users of historical corpora in dealing with spelling variation, particularly in Early Modern English texts. The tool is intended to be a pre-processor to other corpus linguistic tools such as keyword analysis, collocations," etc.)
- Text Preparation "Recipes" for Topic Modeling Work:
- Matthew Jockers
- "'Secret' Recipe for Topic Modeling Themes" (guidance on creating stop lists, using parts-of-speech taggers to filter text, and "chunking" texts into suitable-length sections to optimize topic-modeling results)
- "Expanded Stopwords List" ("Below is the list of stop words I used in topic modeling a corpus of 3,346 works of 19th-century British, American, and Irish fiction. The list includes the usual high frequency words (“the,” “of,” “an,” etc) but also several thousand personal names.")
- Andrew Goldstone & Ted Underwood, "Code Used ... in Analyzing Topic Models of Literary-studies Journals" (GitHub repository of stoplist, code, and resources for Goldstone and Underwood's topic modeling project)
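Illustrative sketch (not the code from any of the recipes above): two of the preparation steps these recipes describe, filtering with a stop list and "chunking" a text into sections of a fixed length, can be tried out in a few lines of standard-library Python. The tiny stop list, the function names, and the chunk size are assumptions made for this example only; a real project would load a much fuller stop list and choose a chunk length suited to its corpus.

```python
import re

# Tiny illustrative stop list (hypothetical; not the Fox, SMART, or Jockers list).
STOP_WORDS = {"the", "of", "an", "a", "and", "to", "in", "was"}


def tokenize(text):
    """Lowercase the text and return its runs of the letters a-z.

    This is a deliberate simplification: apostrophes, digits, and
    accented letters are treated as word breaks.
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop every token that appears in the stop list."""
    return [t for t in tokens if t not in stop_words]


def chunk(tokens, size):
    """Split tokens into consecutive chunks of at most `size` tokens."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


if __name__ == "__main__":
    text = "The whale was seen in the morning, and the crew ran to the boats."
    kept = remove_stop_words(tokenize(text))
    for number, piece in enumerate(chunk(kept, 3)):
        print(number, " ".join(piece))
```

Each chunk can then be saved as its own file and treated as a separate "document" by a topic-modeling tool.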
- Topic Modeling Tools (complemented by Text Preparation "Recipes" for Topic Modeling Work above) (see Topic Modeling Tutorials)
- DFR-Browser (browser-based visualization interface created by Andrew Goldstone for exploring JSTOR articles through topic-modeling [facilitated by the JSTOR "Data for Research" (DFR) site])
- FACTORIE ("toolkit for deployable probabilistic modeling, implemented as a software library in Scala. It provides its users with a succinct language for creating factor graphs, estimating parameters and performing inference")
- Gensim ("free Python library: scalable statistical semantics, analyze plain-text documents for semantic structure, retrieve semantically similar documents")
- Glimmer.rstudio.com Topic Modeling (LDA) visualization tool (allows users to upload their own data to generate scatterplots and bar charts)
- In-Browser Topic Modeling ("Many people have found topic modeling a useful (and fun!) way to explore large text collections. Unfortunately, running your own models usually requires installing statistical tools like R or Mallet. The goals of this project are to (a) make running topic models easy for anyone with a modern web browser, (b) explore the limits of statistical computing in Javascript and (c) allow tighter integration between models and web-based visualizations"; by David Mimno.) Note: the files for this tool can be downloaded and run locally; download from GitHub here.
- LDAvis ("R package for interactive topic model visualization") (example of use)
- MALLET
- Mallet (MAchine Learning for LanguagE Toolkit)
- MALLET-to-Gephi Data Stacker (online tool that takes "the '--output-doc-topics' output from MALLET and reorganize it into a format that Gephi understands")
- The Networked Corpus ("a Python script that generates a collection of Web pages like the ones we have created for The Spectator.... designed to work with MALLET." The Networked Corpus project "provides a new way to navigate large collections of texts. Using a statistical method called topic modeling, it creates links between passages that share common vocabularies, while also showing in detail the way in which the topic modeling program has “read” the texts.")
- Stanford Topic Modeling Toolbox ("brings topic modeling tools to social scientists and others who wish to perform analysis on datasets that have a substantial textual component. The toolbox features that ability to: * Import and manipulate text from cells in Excel and other spreadsheets; * Train topic models (LDA, Labeled LDA, and PLDA new) to create summaries of the text; * Select parameters (such as the number of topics) via a data-driven process; * Generate rich Excel-compatible outputs for tracking word usage across topics, time, and other groupings of data")
- TMVE ("basic implementation of a topic model visualization engine")
- Topic Modeling Tool (Java-based "graphical user interface tool for Latent Dirichlet Allocation topic modeling" by David Newman; comes with test input files [look in "Downloads" tab on site]. Input files should be in .txt files saved in same directory; the input files are formatted with returns between each separate document) (Note: this latest implementation of the Topic Modeling Tool is by Scott Enderle).
- "Two Topic Browsers" by Jonathan Goodwin
- Video Tools
- Video & Film Analysis Tools
- Cinemetrics (Frederic Brodbeck's project for "measuring and visualizing movie data, in order to reveal the characteristics of films and to create a visual 'fingerprint' for them. Information such as the editing structure, color, speech or motion are extracted, analyzed and transformed into graphic representations so that movies can be seen as a whole and easily interpreted or compared side by side"; includes downloadable code for Python script tools used to create the metrics)
- ClipNotes ("designed for use with any film or video in which you want to quickly and easily retrieve selected segments and display them along with your notes or annotations.... you must first prepare an XML file which contains the starting and stopping times of the segments you wish to access, together with a caption to appear on a list, and any description or annotation you want to display along with the clip. Preparing an XML file is a remarkably easy procedure"; currently available as app for iOS and Windows 8.1, with Android app coming)
- Film Impact Rating tool (provides a ranking of a film's impact based on a number of factors, including numbers of screenings, venues, receipts, review ratings, awards, etc. While designed for Australian film, the tool can be used for other films.)
- Kinomatics Project ("collects, explores, analyses and represents data about the creative industries.... Current focus is on the spatial and temporal dimensions of international film flow and the location of Australian live music gigs"; also includes visualizations and tools for film impact rating)
- YouTube Tools
- YouTube Data Tools ("collection of simple tools for extracting data from the YouTube platform via the YouTube API v3. For some context and a small introduction, please check out this blog post. . . there is a FAQ section with additional information, and an introductory video")
- Thomas Padilla, "YouTube Data for Research" (includes tutorial, suggested command-line tools, and use case)
- Visualization Tools (see Visualization Tutorials)
- General or Multiple Purpose Viz Tools:
- Better World Flux ("beautiful interactive visualization of information on what really matters in life. Indicators like happiness, life expectancy, and years of schooling are meaningfully displayed in a colourful flowing Flux.... visually communicates the world state in terms of standards of living and quality of life for many countries and how this has changed, and mostly improved, over a period of up to 50 years. This site is a tool for building a consensus, telling a story and sharing it, all whilst raising awareness for the UN Millennium Development Goals.")
- Bonsai (tool for programmatic creation of simple animated graphics in a Web browser using "graphics library which includes an intuitive graphics API and an SVG renderer")
- Chart and Image Gallery: 30+ Free Tools for Data Visualization and Analysis (gathering of tools by Sharon Machlis)
- Circos ("software package for visualizing data and information ... in a circular layout ... ideal for exploring relationships between objects or positions")
- D3.js ("a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS")
- GapMinder World (online or desktop data/statistics animation)
- Gephi ("interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs")
- Google Fusion Tables (Google's "experimental data visualization web application to gather, visualize, and share larger data tables. Visualize bigger table data online; Filter and summarize across hundreds of thousands of rows. Then try a chart, map, network graph, or custom layout and embed or share it")
- ImageJ (image processing program that can create composite or average images)
- ImagePlot ("free software tool that visualizes collections of images and video of any size.... implemented as a macro which works with the open source image processing program ImageJ")
- OpenHeatMap (creates "heat map" visualizations from spreadsheets)
- Palladio (data visualization tool designed for humanities work; "web-based app that allows you to upload, visualize, and filter your data on-the-fly")
- Pixlr Editor (highly capable, free, Photoshop-like photo editor that runs entirely in Flash in a browser; allows users to import images from local or online sources, edit, resize, crop, adjust image features, apply filters, etc.; Pixlr also has versions for mobile devices)
- Processing ("Processing is a simple programming environment that was created to make it easier to develop visually oriented applications with an emphasis on animation and providing users with instant feedback through interaction. The developers wanted a means to “sketch” ideas in code. As its capabilities have expanded over the past decade, Processing has come to be used for more advanced production-level work in addition to its sketching role. Originally built as a domain-specific extension to Java targeted towards artists and designers, Processing has evolved into a full-blown design and prototyping tool used for large-scale installation work, motion graphics, and complex data visualization")
- Prospect ("a sophisticated web-app implemented as a plugin for WordPress that enables users to collect and curate data and then enable the wider public to visualize and access that data. The graphical representation of data – whether it be geographical information shown on maps, temporal data shown on timelines, interpersonal relationships shown as connected graphs, etc. – can facilitate end-users in comprehending it quickly and analyzing it in domain-specific ways") (more detailed "About" page)
- "R" (R Project for Statistical Computing) ("language and environment for statistical computing and graphics.... provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible")
- RAW (open web app that allows users to use a simple interface to upload data from a spreadsheet, choose and configure a vector graphics visualization, and export the results; built on top of the D3.js library)
- Silk (online data visualization and exhibition platform; takes datasets input as spreadsheets and allows users to create collections, maps, graphs, etc.)
- Tableau Public ("within minutes, our free data visualization tool can help you create an interactive viz and embed it in your website or share it")
- Viewshare ("free platform for generating and customizing views (interactive maps, timelines, facets, tag clouds) that allow users to experience your digital collections")
- Visualize Free (online visualization platform that allows uploading of datasets for multiple styles of graph-style visualization; "free visual analysis tool based on the advanced commercial dashboard and visualization software developed by InetSoft")
- VisualSense ("interactive visualization and analysis tool ... developed for textual and numerical data extracted from image analysis of images from different cultures and influences")
- WiGis ("visualization of large-scale, highly interactive graphs in a user's web browser. Our software is delivered natively in your web browser and does not require any plug-ins or add-ons. Our method produces clean, smooth animation in a browser through asynchronous data transfer (AJAX), and access to rich server side resources without the need for technologies such as Flash, Java Applets, Flex or Silverlight. We believe that our new techniques have broad reaching potential across the web")
- yEd (downloadable and online diagramming tools. Functions include the automatic layout of networks and diagrams: "the yFiles library offers the user many advantages, one of which is its ability to automatically draw networks and diagrams. yFiles layout algorithms enable the clear presentation of flow charts, UML diagrams, organization charts, genealogies, business process diagrams, etc.")
- Diagramming & Graphing Tools:
- aiSee Graph Visualization ("graphing program for Windows, Mac OS X, and Linux")
- Gliffy (online diagramming and flow-charting)
- inzight (downloadable tool for Windows, Mac, Linux; "intelligently draws the appropriate graph depending on the variables you choose"; "automatically detects the variable type as either numeric or categorical, and draws a dot plot, scatter plot, or bar chart.")
- yEd (downloadable and online diagramming tools. Functions include the automatic layout of networks and diagrams: "the yFiles library offers the user many advantages, one of which is its ability to automatically draw networks and diagrams. yFiles layout algorithms enable the clear presentation of flow charts, UML diagrams, organization charts, genealogies, business process diagrams, etc.")
- Image Tools:
- 123D Catch (Autodesk's free phone app for creating 3D scans from photos taken of objects. 3D scans are created by taking many photos of an object from multiple sides and angles, then uploading to Autodesk for processing. Download as a phone app for Android or iOS; complemented by additional online and Windows software for editing and for 3D printing)
- GIMP (powerful, free software for photo and image editing; runs on Macs, Windows, Linux, and other platforms)
- Pixlr (Autodesk's online or offline image and photo editing tool; Photoshop-like)
- Infographics Tools:
- Network Visualization Tools (see also General or Multiple Purpose Viz Tools and Network Analysis/Social Network Analysis) (see Network Visualization Tutorials)
- D3.js ("a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS")
- Gephi ("interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs")
- MALLET-to-Gephi Data Stacker (online tool that takes "the '--output-doc-topics' output from MALLET and reorganize it into a format that Gephi understands")
- NodeTrix ("a node-link diagram hybrid visualization with adjacency matrix"; see theory and research behind NodeTrix)
- NodeXL ("free, open-source template for Microsoft® Excel® 2007, 2010 and (possibly) 2013 that makes it easy to explore network graphs. With NodeXL, you can enter a network edge list in a worksheet, click a button and see your graph, all in the familiar environment of the Excel window")
- Textexture (online tool that allows users to "visualize any text as a network. The resulting graph can be used to get a quick visual summary of the text, read the most relevant excerpts (by clicking on the nodes), and find similar texts")
- VOSviewer (Java-based software designed to "create maps based on network data," especially bibliometric networks, e.g., network maps of "publications, authors, or journals based on a co-citation network or to create maps of keywords based on a co-occurrence network")
- yEd (downloadable and online diagramming tools. Functions include the automatic layout of networks and diagrams: "the yFiles library offers the user many advantages, one of which is its ability to automatically draw networks and diagrams. yFiles layout algorithms enable the clear presentation of flow charts, UML diagrams, organization charts, genealogies, business process diagrams, etc.")
- Text Visualization Tools (specialized text visualization tools, including word clouds, text difference, text variation) (see also Text Analysis Tools above)
- History Flow Visualization (tool for visualizing the evolution of documents created by multiple authors) (download site for tool)
- Textexture (online tool that allows users to "visualize any text as a network. The resulting graph can be used to get a quick visual summary of the text, read the most relevant excerpts (by clicking on the nodes), and find similar texts")
- Word Cloud Tools (& Related Tools for Visualizing Terms Sized by Value)
- Bubble Lines (creates proportionately sized circles in SVG format from manually entered terms and values, e.g., Wordsworth (16) Keats (4) Byron (68); a Digital Methods Initiative tool)
- Deduplicate ("Insert a tag cloud, e.g. war (5) peace (6) and the tool will write output 'war' five times and 'peace' six times-- Can be used to input preformatted tag clouds into services like wordle"; allows for manual input of terms and values; a Digital Methods Initiative tool)
- Tag Cloud Generator ("Input tags and values to produce a tag cloud. Output is in SVG."; allows for manual input of terms and values; a Digital Methods Initiative tool)
- Tagxedo (word cloud from multiple sources)
- Wordle ("toy for generating 'word clouds' from text that you provide")
- Word Tree (tool for online, interactive word trees for texts submitted by users)
- Time Line Tools:
- ChronoZoom (open-source project that allows users to create zoomable, Prezi-like timeline-history exhibitions "of everything," on various scales of time-space)
- Dipity (timeline infographics)
- Histropedia ("Discover a new way to visualise Wikipedia. Choose from over 1.5 million events to create and share timelines in minutes")
- Simile Widgets (embeddable code for visualizing time-based data, including Timeline, Timeplot, Runway, and Exhibition)
- Tiki-Toki (web-based platform for creating timelines with multimedia; capable of "3D" timelines)
- Timeline Builder (online tool for building interactive Flash-based timelines from the Roy Rosenzweig Center for History and New Media)
- Timeline JS (the Knight Lab's "open-source tool that enables anyone to build visually rich, interactive timelines. Beginners can create a timeline using nothing more than a Google spreadsheet.... Experts can use their JSON skills to create custom installations, while keeping TimelineJS's core functionality")
- Timemap ("Javascript library to help use online maps, including Google, OpenLayers, and Bing, with a SIMILE timeline. The library allows you to load one or more datasets in JSON, KML, or GeoRSS onto both a map and a timeline simultaneously")
- TAGSExplorer (step-by-step instructions with tools for archiving Twitter event hashtags and creating interactive visualizations of the conversations)
- TweetBeam (creates "Twitter Wall" to "visualize the conversation around your event")
- TweetsMap (analyzes and maps geographical location of one's Twitter followers)
- Visible Tweets ("Visible Tweets is a visualisation of Twitter messages designed for display in public space")
- "Deformance" Tools: (While many tools can be used against-the-grain to "deform" materials for play or discovery, the following are tools expressly designed for this purpose. On "deformance" in the digital humanities, see for example Mark Sample, "Notes Towards a Deformed Humanities")
- The Eater of Meaning ("tool for extracting the message from the medium. Format and presentation are unaffected, but words and letters are subjected to an elaborate nonsensification process that eliminates semantics root and branch")
- GIFMelter (creates dynamic, flowing distortions of online images)
- Glitch Images (interactive interface with sliders to "glitch" imported .jpg images)
- Ivanhoe Game -- WordPress Theme version | more info about this version (requires WordPress site installed on a local or institutional server) ("This tool is a vibrant reimagining of a game originally developed in the U. Virginia SpecLab. . . . The Ivanhoe Game can be played on any type of cultural object or topic. In Ivanhoe, players assume roles and generate criticism by pretending to be characters or voices relevant to their topic and making moves from those perspectives")
- N + 7 Machine (English version only; "The N+7 procedure, invented by Jean Lescure of Oulipo, involves replacing each noun in a text with the seventh one following it in a dictionary")
- Synonym Machine (set of Python scripts that download "famous works of literature and replaces specified parts of speech with random synonyms. The script is currently configured to do this with Moby Dick, in reaction to Robin Sloan's fascinating question: if you replaced every adjective with a close synonym, would it be fair to call this new text by the same title?")
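To make the N+7 procedure quoted in the "Deformance" list concrete, here is a minimal Python sketch. It is not the code behind the N + 7 Machine. The tiny noun list, the function name `n_plus_7`, and the choice to wrap around at the end of the list are all assumptions made for illustration; a real run would need a full dictionary and a part-of-speech tagger to identify nouns.

```python
import bisect
import re

# Hypothetical toy noun list standing in for the noun entries of a real dictionary.
NOUNS = sorted([
    "apple", "bird", "book", "cat", "chair", "dog", "door", "garden",
    "house", "lamp", "moon", "river", "road", "tree", "window",
])

def n_plus_7(text, nouns=NOUNS, offset=7):
    """Replace each word found in `nouns` with the entry `offset` places later.

    Assumption: the list wraps around at the end, so every noun has a successor.
    Words not in the list are treated as non-nouns and left unchanged.
    """
    def shift(match):
        word = match.group(0)
        key = word.lower()
        i = bisect.bisect_left(nouns, key)  # binary search in the sorted list
        if i == len(nouns) or nouns[i] != key:
            return word  # not a known noun
        replacement = nouns[(i + offset) % len(nouns)]
        # Preserve an initial capital letter.
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", shift, text)

print(n_plus_7("The cat sat by the door."))
```

With this toy list, "cat" and "door" are each replaced by the entry seven places later ("moon" and "tree"); every other word passes through untouched.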
DH Toychest was started in 2013, and supersedes Alan Liu's older "Toy Chest". Last update: 2017.