To see the ordinary so intensely
that the ordinary becomes extraordinary, becoming
so focused, so specific about something,
that it becomes something other than what it ordinarily is.


Good ideas are a dime a dozen for a smart person. What distinguishes good from great is how an idea is executed — how it becomes reality.


music is the space between the notes
it’s not the notes you play
it’s the notes you don’t play


This subject — turning the line into a poem — is one that every poet deals with throughout his or her working life. Every turning is a meaningful decision, the effect of which is sure to be felt by the reader. I cannot say too many times how powerful the techniques of line length and line breaks are.


Models are a way of seeing, and better seeing comes from better models.

Models summarize, show, and explain something relevant. Their purpose is to lead to consequential actions. Some models are better than others.


To choose a model is to choose assumptions — unknown, unseen, forgotten. Some assumptions are worse than others.


Bureaucracies construct models as if everything is hierarchical, mirroring Conway’s law: “Organizations that design systems are constrained to produce designs that duplicate their own communication structures.”


Models sanctified and celebrated by insiders can evolve into uncontested, lucrative, congealed monopolies / specialties / cartels / cults / disciplines — which, in time, become self-centered and selfish, more and more about themselves, and less and less about their original substantive content. Local optimizing adds up to global pessimizing. Disciplines require hard-working true believers in local doctrines / assumptions that do not correspond to the truth.


Do not go lazy into default models, justified by “we’ve always done it this way” — words that end thoughts, censor deviations, block searches for alternatives. Nonetheless, many conventions and standards have got it right, or at least good enough, but fresh seeing and attempted remodeling can confirm their continuing righteousness.


thinking annotates the world

fresh seeing challenges old conventions


More than meter, more than rhyme,
more than images or alliteration or figurative language,
line is what distinguishes our experience of poetry as poetry,
rather than some other kind of writing.


For 1500 years, printed text has used grids indifferent / hostile to meaning.


This natural phrasing (as in music or speech) makes for intelligibility between writer and reader and should not be left to the compositor [typesetter].


content locates typography

space, location, linebreaks create and clarify meaning


Penciled annotations show real-time performance strategies. To outsiders, insider mark-ups appear chaotic and cryptic, but these personal annotations are for Menuhin’s eyes, the only eyes that matter.

All can learn from this useful workaday grid strategy: a relevant and intense data layer can become a coherent substrate scaffold upon which to overlay additional information. Maps do this all day long.


Photography is all right, if you don’t mind looking at the world from the point of view of a paralyzed Cyclops — for a split second.


Eye-brain systems routinely detect and reason about
edges margins perimeters contours outlines fences boundaries
transitions changes differences convergences divergences collisions

At the borderline and the borderspace between
meaning and space
presence and absence
figure and ground
this and that
us and them
where verbs / actions / interactions / conflicts all thrive
and eyes see / think / design / produce.


Tectonic plates move 20-100mm / year, and converge / fracture / overlap, as earthquakes and volcanoes erupt along borderspaces.


To explore / understand / explain dynamic flowing information, stop-action images adjacent in space are helpful — and often better than continuous video, where the quick pace and high autocorrelation blur analytical thinking.


When models break down
The curse of high dimensional spaces:
Many dimensions result in sparse local data everywhere
The curse of dimensionality is inherent in models and databases


In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways.
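
A small sketch can make the sparsity concrete. It assumes data spread uniformly over a unit cube, which is an illustrative assumption and not a claim about any real dataset; the function names edge_needed and points_in_local_cube are hypothetical. It counts how many of 10,000 random points land in a "local" corner cube of edge 0.5 as dimensions are added, and how wide a cube must be to hold 10% of the data.

```python
import random


def edge_needed(fraction: float, dims: int) -> float:
    """Edge length of a sub-cube of the unit cube holding `fraction` of uniform data."""
    return fraction ** (1.0 / dims)


def points_in_local_cube(n_points: int, dims: int, edge: float, seed: int = 0) -> int:
    """Count uniform random points that fall inside the corner cube [0, edge]^dims."""
    rng = random.Random(seed)
    count = 0
    for _ in range(n_points):
        # A point is "local" only if every one of its coordinates is within the edge.
        if all(rng.random() <= edge for _ in range(dims)):
            count += 1
    return count


if __name__ == "__main__":
    print("dims  points_in_cube  edge_for_10_percent")
    for dims in (1, 2, 5, 10, 20):
        hits = points_in_local_cube(10_000, dims, edge=0.5)
        print(dims, hits, round(edge_needed(0.10, dims), 3))
```

The expected count in the corner cube is n_points times edge to the power dims, so the same neighborhood that holds about half the data in one dimension holds almost none in twenty. Equivalently, a neighborhood that captures a fixed share of the data must stretch across most of each axis, at which point it is no longer local.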


Philosophers will say that humans can never be silent because the mind is made of words. For these half-witted logicians, silence is no more than a word. To overcome language by means of language is obviously impossible. Turning within, you will find only words and images that are parts of yourself. But if you turn outside yourself — to the birds and animals and the quickly changing places where they live — you may hear something beyond words. Even humans can find silence, if they can bring themselves to forget the silence they are looking for.


Thinking / seeing / analytical / fresh eyes have a sense of what is relevant, finding what is important in a mass of data and knowledge, seeing something worthwhile, where to gain leverage. A sense of the relevant is the ability to identify and detect those things that have consequences beyond themselves.

The common guide to relevance is a profession or discipline. There are many good reasons for local expertise. That’s why it is called a discipline. But from a reforming and creative point of view, my point of view, the world is so much more interesting / amazing / consequential than any one discipline or profession.

Creativity is connecting things. Connections become enshrined, narrowed, exhausted, prohibited within a discipline or specialty. Fresh eyes approach a new field with deep curiosity and focused looting — to quarry, make new connections, do whatever it takes, seek the useful and relevant, learn afresh not merely confirm prior views.


Sentences survive content-indifferent and content-hostile spacings, but surviving is not thriving. Text space should not be owned and governed by generic production grids, which make for convenient production but inconvenient meaning. Space can and should be content-responsive, actively contributing to meaning — forever practices in poetry, maps, math, computer code, comics, theater / movie scripts, posters. Subtle visual spacing differentiates and clarifies sentences, and meaning becomes more consequential, memorable, retrievable.


To make sense of this display, readers must briefly memorize a one-time color code stashed in a disordered legend. For 50 years, office and data-analysis software have published trillions of legends — coffins of dead conventions — and trillions of impediments to seeing and learning. Data graphics should have the same intense commitment to content, clarity, exactitude, integrity as mathematics, maps, computer code, science.


Spacing enhances complex meaning, encourages slow, thoughtful reading.


The fox knows many things, but the hedgehog knows one big thing. The fox, for all its cunning, is defeated by the hedgehog’s one defense.


In biological systems, DNA masterplans are a cumulative tangle of local random evolutionary hacks and work-arounds. Unlike universal physical laws, living organisms are historical structures, literally creations of history. They represent a patchwork of odd sets pieced together when and where opportunities arose.


Confirmation bias is the tendency to search for, interpret, favor, and recall information so as to confirm one’s pre-existing beliefs or hypotheses.

It is a principle that shines impartially on the just and the unjust alike that once you have a point of view all history will back you up.


A mistake in the operating room can threaten the life of one patient; a mistake in statistical analysis or interpretation can lead to hundreds of early deaths.


A statistician I knew once replied to a brain surgeon whose hobby was statistics that hers was brain surgery.


Big company polluted big river, environmental agencies forced polluters to clean it up, monitor progress by daily water samples. Directly observe data collection: small boat goes out, boat driver has container on end of pole, dips into water after looking around for cleaner water. Statisticians call this “sampling to please.” Observing actual data collection reveals the early limits of self-monitoring, and that people can’t keep their own score.


Can the data presented be traced back to primary data, free of any data processing or other manipulation?


A clerical error in including the wrong data in a figure represents poor procedures, but not misconduct. But such innocent explanations may require understanding of the state of mind of the authors at the time the data were prepared, and this cannot be determined definitely. It must be noted that the credibility of a particular innocent explanation depends on the overall credibility of the scientist in question. This in turn depends on whether there is an unreasonable number of problems or a pattern of questionable practices. Rather, the problems with the data are already established, and the question is whether many improbable, innocent explanations should be accepted.


Weak evidence and stupidity spawn big attitudes, a rage to conclude: “Ignorance more frequently begets confidence than does knowledge.”


Is the database at hand capable of answering the research questions at hand?


Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise.


Data cleaning programs correct logical inconsistencies, data duplications, impossible values, conflicting postal codes, outliers, other low-tide stuff.
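
As a rough illustration of such low-tide checks, the sketch below scans a few records for duplicates, impossible values, and one logical inconsistency. The field names (patient_id, age, admitted, discharged), the age limit of 120, and the sample rows are all invented for the example. It flags problems for a person to review instead of correcting them silently, which is a design choice of this sketch and not something the passage prescribes.

```python
from datetime import date


def find_problems(records):
    """Return (row_index, message) pairs for simple, mechanical data problems."""
    problems = []
    first_seen = {}  # maps (patient_id, admitted) to the first row that used it
    for i, rec in enumerate(records):
        key = (rec["patient_id"], rec["admitted"])
        if key in first_seen:
            problems.append((i, f"duplicate of row {first_seen[key]}"))
        else:
            first_seen[key] = i
        if not 0 <= rec["age"] <= 120:  # assumed plausible range
            problems.append((i, f"impossible age {rec['age']}"))
        if rec["discharged"] < rec["admitted"]:
            problems.append((i, "discharged before admitted"))
    return problems


# Hypothetical rows: row 1 repeats row 0, row 2 has an impossible age,
# row 3 has its dates in the wrong order.
records = [
    {"patient_id": 1, "age": 34, "admitted": date(2020, 1, 5), "discharged": date(2020, 1, 9)},
    {"patient_id": 1, "age": 34, "admitted": date(2020, 1, 5), "discharged": date(2020, 1, 9)},
    {"patient_id": 2, "age": 340, "admitted": date(2020, 2, 1), "discharged": date(2020, 2, 3)},
    {"patient_id": 3, "age": 51, "admitted": date(2020, 3, 9), "discharged": date(2020, 3, 2)},
]

for row, message in find_problems(records):
    print(row, message)
```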


Of all the technical debt you can incur, the worst in my experience is bad names — for database columns, variables, functions, etc. Fix those immediately before they metastasize all over your code and become extremely painful to fix later, and they always do.


Knowing where data came from provides huge insights into its limitations. Survey data is not exhaustive. Sensors vary in accuracy. Governments are disinclined to give you unbiased information. War-zone data are geographically biased due to dangers of crossing battle lines. Various sources are daisy-chained together. Every stage in that chain is an opportunity for error. Know where your data came from.


There is no worse way to screw up data than to let a single human type it in, without validation. This database had 250 spellings of Chihuahua.
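
One possible form of validation at the point of entry is to check each typed value against a controlled list and offer near matches when it fails. The sketch below uses difflib.get_close_matches from the Python standard library; the list of breeds, the function name validate_breed, and the 0.6 similarity cutoff are assumptions made for illustration.

```python
import difflib

# Hypothetical controlled vocabulary; a real one would come from the database owner.
BREEDS = ["Chihuahua", "Dachshund", "Labrador Retriever", "Poodle"]


def validate_breed(typed, allowed=BREEDS):
    """Return (canonical_name, None) if accepted, else (None, list_of_suggestions)."""
    # Collapse stray whitespace and ignore case before comparing.
    cleaned = " ".join(typed.split()).casefold()
    by_folded = {name.casefold(): name for name in allowed}
    if cleaned in by_folded:
        return by_folded[cleaned], None
    # Not an exact match: reject the entry, but suggest the closest allowed names.
    close = difflib.get_close_matches(cleaned, list(by_folded), n=3, cutoff=0.6)
    return None, [by_folded[c] for c in close]


print(validate_breed("  chihuahua "))
print(validate_breed("Chiuahua"))
print(validate_breed("xyzzy"))
```

Rejecting the misspelling and asking the typist to pick a suggestion keeps one canonical spelling in the database, where guessing a correction automatically would only move the error somewhere harder to see.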


Often overlooked complications are batch effects, occurring because measurements are affected by laboratory conditions, reagent lots, and personnel differences. This is a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions.
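
A first, crude check for the dangerous case is to cross-tabulate batch against outcome and see whether any batch holds only one outcome group. The sketch below assumes each sample is a record with hypothetical batch and outcome fields; it detects complete confounding only, and partial imbalance would need a proper statistical test.

```python
from collections import Counter


def batch_by_outcome(samples):
    """Count samples for each (batch, outcome) pair."""
    return Counter((s["batch"], s["outcome"]) for s in samples)


def confounded_batches(samples):
    """Return the batches that contain only a single outcome group."""
    outcomes_in_batch = {}
    for s in samples:
        outcomes_in_batch.setdefault(s["batch"], set()).add(s["outcome"])
    return sorted(b for b, outs in outcomes_in_batch.items() if len(outs) == 1)


# Hypothetical design: every case was run in batch A, every control in batch B,
# so any batch difference is indistinguishable from a case/control difference.
confounded = (
    [{"batch": "A", "outcome": "case"}] * 4
    + [{"batch": "B", "outcome": "control"}] * 4
)
# Hypothetical balanced design: each batch holds both groups.
balanced = [
    {"batch": "A", "outcome": "case"},
    {"batch": "A", "outcome": "control"},
    {"batch": "B", "outcome": "case"},
    {"batch": "B", "outcome": "control"},
]

print(batch_by_outcome(confounded))
print(confounded_batches(confounded))
print(confounded_batches(balanced))
```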


All such databases require independent forensic audits, checking for batch processing errors, changes and biases in measurement practices in space and time, and a hundred other issues. Attributing truth to a database and claiming “proof of concept” fools around with the meanings of truth and proof.


Nature’s laws and survival rates are authentic Ground Truths.


Most medieval castles were made of wood. We think most were made of stone because of survivor bias. Research databases are those that survived long enough to be selected for Ground Truth status.


In the US, the suits maximize profits and report to commercial interests and investment bankers. The result is vast transfer of money from the sick to the rich.


Smoothed data summaries reduce clutter, make our result understandable to journalists / doctors / sponsors. Readers don’t want data data data! They love simple graphics with a strong message. Readers look only at abstracts and graphics, so that’s where our team works hard to pitch our findings.


“Clutter” in data graphics is evidence that your models don’t fit the data — and that you know it. You also know that your summary graphics cover up contrary data and depict dubious thresholds not present in the dataset. Such cheats are obvious and easily detected, and damage your credibility.


If someone shows you simulations that only show the superiority of their method, you should be suspicious. Good simulations will show where the method shines but also where it breaks.
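
A toy simulation in that spirit, offered as a sketch: suppose the method being promoted is the sample median as an estimate of a true center of zero, compared against the plain mean. The settings are assumptions chosen for illustration (samples of 50, 2000 trials, and "contamination" meaning a share of values drawn from a much wider normal distribution), and rmse_of is a hypothetical helper name. Both conditions are reported, including the clean-data case where the median is the worse choice.

```python
import random
import statistics


def rmse_of(estimator, contamination, n=50, trials=2000, seed=1):
    """Root-mean-square error of `estimator` for a true center of 0."""
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(trials):
        # Each value is a wild outlier (sd 20) with probability `contamination`,
        # otherwise an ordinary draw (sd 1). Both kinds are centered on 0.
        sample = [
            rng.gauss(0, 20) if rng.random() < contamination else rng.gauss(0, 1)
            for _ in range(n)
        ]
        squared_errors.append(estimator(sample) ** 2)
    return statistics.fmean(squared_errors) ** 0.5


for contamination in (0.0, 0.1):
    mean_error = rmse_of(statistics.fmean, contamination)
    median_error = rmse_of(statistics.median, contamination)
    print(f"contamination={contamination}: mean {mean_error:.3f}, median {median_error:.3f}")
```

Because both estimators are given the same seed, they see identical samples, so the comparison is paired. On clean normal data the mean has the smaller error; with outliers mixed in, the median does. A report showing only the second row would be the suspicious kind of simulation.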


Organizing and searching data often relies on detecting areas where objects form groups with similar properties: in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways.


Constructing predictive models is easy, but authenticity and practical use are difficult.


Good research is rigorous and relevant, producing credible results that improve lives. Scientific credibility begins with honest independent quality assessment and replication with multiple data sets, especially for models with high-dimensional inputs.


Our minds are quick to convert new optical experiences into familiar stories, favored viewpoints, comforting metaphors. No wonder, for how else can we manage optical data flows of 20MB/s without a million predetermined categories for filing, without the rage for wanting to conclude?


The rage for wanting to conclude is one of the most deadly and most fruitless manias to befall humanity. Each religion and each philosophy has pretended to have God to itself, to measure the infinity, and to know the recipe for happiness. What arrogance and what nonsense! I see, to the contrary, that the greatest geniuses and the greatest works have never concluded.


Audience members read 2 or 3 times faster than you can talk. The document is in hand, everyone in the audience reads with their own eyes, at their own pace, their own choice of what they read closely. In slide presentations, viewers have no control over pace and sequence as the presenter clicks through a deck — viewers must sit in the dark waiting for the diamonds in the swamp.


Decks are easier to prepare than documents, however. Documents require coherence, thinking, sentences. But convenience in preparing decks harms the content and the audience. Optimizing presenter convenience is selfish, lazy, and worst of all, replaces thinking.


I hate the way people use slide presentations instead of thinking. People who know what they’re talking about don’t need PowerPoint.


A sure sign of trouble is an inability to write a paragraph explaining:

  • What the problem is.
  • Why it is relevant, why anyone should care.
  • What you’re going to do to solve the problem.

The data may not contain the answer. And, if you torture the data long enough, it will tell you anything.
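
The second point can be sketched with pure noise. Below, an outcome and 200 candidate predictors are all random numbers with no real relationship, yet some predictors still pass a conventional significance cutoff. The sizes (30 rows, 200 predictors) are arbitrary, the function names are hypothetical, and the cutoff of 0.361 is the approximate two-sided 5% critical value of the correlation coefficient for 30 observations.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def count_spurious(n_rows=30, n_predictors=200, r_cutoff=0.361, seed=0):
    """Count noise predictors whose correlation with a noise outcome passes the cutoff."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_rows)]
    hits = 0
    for _ in range(n_predictors):
        predictor = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(predictor, outcome)) > r_cutoff:
            hits += 1
    return hits


print(count_spurious(), "of 200 noise predictors look 'significant'")
```

At a 5% threshold, roughly one noise predictor in twenty is expected to pass by chance alone, so searching enough variables nearly guarantees a publishable-looking finding.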


It’s not what you say, it’s what they hear. The biggest problem in communication is the illusion that it has taken place.