Lecture Notes

October 2


  • Categories: use them!
  • Project 1 draft due before midnight tonight. Post your draft to this site as a Post: +New –> Post –> copy and paste your work (always draft elsewhere first) –> select the “Project 1 Drafts” category –> add a Featured Image –> Publish
    • I will review drafts and offer feedback by Thursday
  • News from the forum
    • Featured forums (new!): Candace; Jessica; Akintunde
    • Some recaps:
      • all of us visual learners?
      • data viz is intuitive? But we can’t make out certain graphs…
      • data is never neutral 
      • visual literacy is key
      • anyone going to try the hangover cures?

Data Visualization!

Presenter: Carlos



Text analysis consists of two processes: analysis (in which the computer breaks information down into smaller units, such as words) and synthesis (in which the computer counts these units, manipulates them, and reassembles them into a new text) (Sinclair and Rockwell 243).

  • Text analysis systems can search large texts quickly. They do this by preparing electronic indexes to the text so that the computer does not have to read through the entire text. When finding words is fast enough to feel “interactive,” it changes how you can work with the text: you can explore serendipitously without being frustrated by the slowness of the search process.
  • Text analysis systems can conduct complex searches. They will often allow you to search for lists of words or for complex patterns of words. For example, you can search for the co-occurrence of two words.
  • Text analysis systems can present the results in ways that suit the study of texts. Text analysis systems can display the results in a number of ways; for example, a Keyword In Context display shows you all the occurrences of the found word with one line of context.
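To make the three points above concrete, here is a minimal sketch in Python. It is not how any particular text analysis tool works; the function names (tokenize, build_index, kwic, cooccurrences) and the sample sentence are made up for illustration. It builds an index of word positions so a search does not have to reread the text, uses the index to find two words that occur near each other, and prints a Keyword In Context view.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Analysis step: break the text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(tokens):
    """Map each word to the list of positions where it occurs."""
    index = defaultdict(list)
    for position, word in enumerate(tokens):
        index[word].append(position)
    return index

def kwic(tokens, index, keyword, width=3):
    """Keyword In Context: one line per occurrence, `width` words on each side."""
    lines = []
    for pos in index.get(keyword.lower(), []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{left} [{tokens[pos]}] {right}".strip())
    return lines

def cooccurrences(index, word_a, word_b, window=5):
    """Position pairs where the two words fall within `window` tokens of each other."""
    return [(a, b)
            for a in index.get(word_a, [])
            for b in index.get(word_b, [])
            if a != b and abs(a - b) <= window]

# Invented sample text, loosely echoing this week's forum recaps.
sample = ("Data is never neutral. Visual literacy is key, "
          "and data visualization is not always intuitive.")
tokens = tokenize(sample)
index = build_index(tokens)

for line in kwic(tokens, index, "data"):
    print(line)
print(cooccurrences(index, "data", "visualization"))
```

The index is built once; after that, each lookup goes straight to the stored positions, which is what makes searching feel interactive on large texts.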


What can we do with this? (via Ted Underwood)

1) Categorize documents. You can “categorize” in several different senses.

  • a) Information retrieval: retrieve documents that match a query. This is what you do every time you use a search engine.
  • b) (Supervised) classification: a program can learn to correctly distinguish texts by a given author, or learn (with a bit more difficulty) to distinguish poetry from prose, tragedies from history plays, or “gothic novels” from “sensation novels.” (See “Quantitative Formalism,” Pamphlet 1 from the Stanford Literary Lab.) The researcher has to provide examples of different categories, but doesn’t have to specify how to make the distinction: algorithms can learn to recognize a combination of features that is the “fingerprint” of a given category.
  • c) (Unsupervised) clustering: a program can subdivide a group of documents using general measures of similarity instead of predetermined categories. This may reveal patterns you don’t expect.
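As a rough illustration of (a) and (b), the sketch below treats each document as a bag of word counts and compares documents with cosine similarity. This is a deliberately simple stand-in: real classifiers learn which combinations of features matter, while this one just picks the label of the most similar example. The function names and the three one-line "documents" are invented for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, documents):
    """(a) Information retrieval: names of matching documents, best match first."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(text)), name)
              for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

def classify(text, labeled_examples):
    """(b) Supervised classification: label of the most similar labeled example."""
    vec = bag_of_words(text)
    best = max(labeled_examples,
               key=lambda pair: cosine(vec, bag_of_words(pair[0])))
    return best[1]

# Invented toy corpus; the category names are only labels for the demo.
documents = {
    "gothic": "the ruined castle hid a ghost in the dark crypt",
    "sensation": "the secret marriage and the forged letter shocked london",
    "pastoral": "the shepherd sang beside the quiet green meadow",
}
labeled_examples = [(text, name) for name, text in documents.items()]

print(retrieve("ghost castle", documents))
print(classify("a ghost walked the dark castle", labeled_examples))
```

The same similarity measure could also drive (c): instead of comparing against labeled examples, a clustering program would group documents that score as similar to each other.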

3) Trace the history of particular features (words or phrases) over time. This could be viewed as a special category of corpus comparison, where you’re comparing corpora segmented on the time axis.
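A minimal sketch of tracing a word over time, assuming the corpus has already been split into one text per year. The function names and the three tiny "yearly" texts are invented. It reports relative frequency (occurrences divided by total words) so that years with more text do not automatically look bigger.

```python
import re
from collections import Counter

def relative_frequency(word, text):
    """Share of the tokens in `text` that are `word` (0.0 for an empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()] / len(tokens) if tokens else 0.0

def trace_over_time(word, corpus_by_year):
    """Relative frequency of `word` in each time slice, in chronological order."""
    return {year: relative_frequency(word, text)
            for year, text in sorted(corpus_by_year.items())}

# Invented toy corpus segmented on the time axis.
corpus_by_year = {
    1900: "the telephone and the telegraph and the telegraph office",
    1850: "the telegraph carried the news",
    1950: "the telephone rang",
}

trend = trace_over_time("telegraph", corpus_by_year)
for year, freq in trend.items():
    print(year, round(freq, 3))
```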

4) Visualization. Perhaps this isn’t technically a form of analysis, but in practice it’s important enough that it deserves to be treated as a separate analytical step. It’s impractical to list all possible forms of visualization here, but for instance, results can be visualized:

a) Geographically — to reflect, for instance, density of references to different parts of the world.

b) As a network graph — to reflect strength of affinity between different entities (characters, or topics, or what have you).

c) Through “Principal Component Analysis,” if you have multidimensional data that need to be flattened to two dimensions for ease of comprehension.

When Viz Goes Bad

Fewer people would have a hard time giving up salt, but those people must feel extra strongly about it.


What is the recidivism rate for pie chart offenses?

From this I glean:

1) The US has no paid paternity leave, despite being the center of the universe.
2) The UK, Denmark, Australia, Venezuela, and Kenya are all actually the same country.
3) This really should have been a bar chart.

In-Class Exercise

Pre-Attentive vs. Attentive Processing

  • Color (hue and intensity)
  • Form (e.g. density, size, shapes, width, length)
  • Movement (flicker and motion)
  • Spatial Positioning (depth, distance)


Data collection: in groups of 4, poll the class on one set of data points: birthdays (month and day).

Data viz: decide as a group how you will represent this data. As you’ve seen in your readings this week, this decision is important. Some options include numbers, bar or pie charts, numbered icons, sized icons, and tree maps, and there are many more to choose from. Once you’ve designed your visualization, put it on the board. We will discuss them as a class.
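If your group wants to check its tallies before drawing, here is one possible sketch: it counts birthdays by month and prints a plain-text bar chart. The birthdays listed are invented placeholders, not class data, and grouping by month is only one of many choices you could make (by season, by day of the week, and so on).

```python
from collections import Counter

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def count_by_month(birthdays):
    """birthdays: list of (month, day) pairs. Returns 12 counts, January first."""
    counts = Counter(month for month, day in birthdays)
    return [counts.get(m, 0) for m in range(1, 13)]

def text_bar_chart(counts):
    """One line per month: the name, a bar of # marks, and the count."""
    return [f"{name} | {'#' * n} ({n})" for name, n in zip(MONTHS, counts)]

# Invented placeholder data: replace with what your group collects.
birthdays = [(1, 15), (3, 2), (3, 30), (10, 2), (10, 31), (10, 9)]

counts = count_by_month(birthdays)
chart = text_bar_chart(counts)
print("\n".join(chart))
```

Bar length is a pre-attentive cue: viewers compare the lengths before they read the numbers, which is one reason a bar chart tends to read faster than a table of counts.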
