The Very Spring and Root

An engineer's adventures in education (and other musings).



Boskone: Dataliths – Digging the Idea of the Programmer/Archaeologist


Our GOH Vernor Vinge has posited that as computing-based civilizations age, layers upon layers of legacy code build up in vast — let’s call them dataliths. Who gets to dig through them for valuable info? How do they do it? Isn’t our data already in pretty deep doodoo in this regard?
Janice Gelb (M), Charles Stross, Vernor Vinge, Gary D. McGath, Dana Cameron

Stross opens by mentioning that we are obsolescing file formats at an ever-growing rate. Some of this is intentional, related to corporate greed, media consumption, encryption, and DRM. He brings up the example of Microsoft's .lit format, which tried to compete in the ebook market but was discontinued. The license servers for the DRM-protected format have since been shut off, because there is no financial incentive to keep them up, which means that content people bought in .lit is now unreadable.

McGath points out how ironic it is that we are in an age of so much data, yet so much of it is actually inaccessible. Calls it an approaching digital dark age.

Stross starts nerding out about an idea he has had for super-compressed solid-state information storage, called memory diamond. Carbon-12 would be a 1 and Carbon-13 would be a 0, for example, and you could pack information into a very dense space. Note: I think this is a cool idea, but I also note that it still considers information storage in the paradigm of binary digits. What about quantum computing, which is on the near horizon? Or other natural phenomena that have many more possible states than two, and in which information could be usefully embedded?

Vinge talks about how we have so much redundancy of information. The same file has zillions of copies around the world, all of which have to be stored. He then moves to a larger topic of what do we do when civilization falls? How will we preserve our knowledge and culture for future generations and civilizations? We can’t rely on a particular data format that is proprietary and would never be resurrected. We would need stacked layers of ever more complex generations of data, that could be read and reinterpreted after a fall.

Cameron (the archaeologist): We would need something like a Rosetta Stone for data, for future civilizations to access our culture. I am trying to think about what an archaeologist of the future would want to know, and how best to store and format that information for them.

McGath counters that it is a tricky thing to try and determine the line between what we want to preserve and what we should preserve.

Cameron: culture through the eyes of individuals is the holy grail of archaeology and anthropology. With data we have an amazing opportunity to have that continuous spectrum of the broad down to the specific and then back up again. Even the mundane details of everyday life would help inform theories and ideas about the macroscopic scale system.

Stross wonders about convergence instead of divergence, citing figures that 80% or so of the operating systems out there have converged to either iOS or Android, and these have very similar architecture and heritage. (iOS is from Unix/BSD line and Android from Linux). Note that he is including the huge number of mobile devices out there, which more and more are outnumbering actual “computers” in the desktop and even laptop sense. 

Audience questions conclude with interesting discussion about the role of libraries, particularly public libraries in storing, archiving, and retrieving the data of an age. Calls for innovation on this front.

Politicians and Data

Some thoughts on politics and data. A friend who is definitely more conservative than I am, but nevertheless among those I respect and admire most, passed on the following data from the Brookings Institution, which was apparently cited by a certain politician:

For every $1 change in average income for wage earners:

The bottom quintile (bottom 20%) of income earners experience a $0.76 delta

The next quintile (20%-40%) experience a $0.90 delta

The next quintile (40%-60%) experience a delta of <$0.70

The next quintile (60%-80%) experience a delta of <$0.70

But the higher wage earners (95%-99%) experience a $1.01 delta

And the highest wage earners (99%-100%) experience a $2.16 delta

Though the conversation leapt from this initial point into various related issues on which we surprisingly agreed on a lot of things, I wanted to post quickly about these numbers in particular because I think they illustrate a broader phenomenon of how numbers are used in politics.

So, just looking at that data, one might draw the conclusion that the top 1% of wage earners are disproportionately affected by changes in overall average income. This might then lead one to consider this apparent fact as a reason to believe that the 1% perhaps should receive a tax break of some sort to compensate for the increased risk they assume on behalf of the economy as a whole, or perhaps that the 1% clearly share a natural, enlightened-self-interest incentive to trickle down their wealth and thereby increase their own leveraging power. A rising tide, if you will permit the analogy, symbolically lifting all metaphorical boats.

Here is the flaw: the face-value implication of presenting the data this way is that a dollar has a constant value for all people, which is false. Though the deltas shown here indicate that the absolute dollar effect on the top 1% of earners is about 3 times that of the middle quintile, that doesn't tell the full story. Because: the median income of the general population is $55,223, whereas the mean income of the top 1% is… $31,000,000 (quick check on Wikipedia).

So let's take that earning sensitivity per segment (the listed <$0.70 for the middle quintile is vague, so I'll use $0.50 for a conservative margin), divide each by the average income for that segment of the population, and compare the values.

Scaled as a percentage of total income, which is what really matters when we are talking about how these changes affect people's lives, we see that the same $1 change in average income affects the middle quintile about 130 TIMES as much as it does the top 1%. Same data… very different conclusion.
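The back-of-envelope comparison can be written out explicitly. Everything below uses the figures already quoted in this post (the $0.50 conservative sensitivity, the $55,223 median, the $31M top-1% mean); these are the post's assumptions, not independently verified data:

```python
# Sensitivity to a $1 change in average income, per segment
middle_sensitivity = 0.50     # conservative stand-in for the listed "<$0.70"
top1_sensitivity = 2.16       # quoted delta for the top 1%

middle_income = 55_223        # median income of the general population
top1_income = 31_000_000      # quoted mean income of the top 1%

# Fraction of each group's income affected by the same $1 change
middle_relative = middle_sensitivity / middle_income
top1_relative = top1_sensitivity / top1_income

ratio = middle_relative / top1_relative
print(f"Relative impact on middle quintile vs top 1%: {ratio:.0f}x")  # ~130x
```

Using the listed upper bound of $0.70 instead of the conservative $0.50 would push the ratio even higher, so the conclusion is robust to that vagueness.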

And mind you, nothing is necessarily wrong with the data; it's very likely correct (I am in no way suggesting that the data as presented is false; Brookings is a highly reputable name in think tanks). The numbers are just numbers. The problem is that they can be presented in a way that misses what is really going on. How we look at the numbers is just as important as, if not more important than, what numbers we have.

Lesson? Don’t believe any data from the mouth of any politician, regardless of party. On science, on education, on their opponent’s voting history, nothing.

GR on Value-Added Metrics

Gary Rubinstein has recently published two blog posts dissecting the data for VAM teacher evaluation in NYC. Check out Part 1 and Part 2 on his site.

A hard lesson learned early by most engineers in their careers doesn't seem to have translated to the reform movement very well… data is amazing and we should use it to guide decisions to the greatest prudent extent, but more important than what data we have is what we know about its limitations. I know I've made a few rash errors in the cubicle; in my haste to get to the right answer I treated as gospel a dataset which turned out later to be highly suspect. Fortunately I hadn't published the papers yet; I only ended up having to redo a lot of work. Lucky.

Core rule of the engineering world: you can almost never directly measure what it is you want to know. You must infer results based off of imperfect measurements of associated variables. That is a process that comes with a varying degree of uncertainty, bias, error, and false assumptions. The right thing to do is to explore what data can tell us, experiment and validate using simple and solid areas of known science, then build up models from there with increasing confidence.
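The point about inferring what you want from imperfect proxy measurements can be sketched in a few lines. This is a made-up illustration (the tension and gauge numbers are invented): averaging more readings shrinks the random noise, but does nothing about a systematic bias in the instrument, which is exactly the kind of limitation that never shows up in the numbers themselves:

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# We want a cable's true tension, but can only measure a noisy,
# biased proxy (say, a strain-gauge reading).
true_tension = 100.0   # the quantity we actually want to know
gauge_bias = 2.0       # systematic instrument error: never averages away
noise_sigma = 5.0      # random measurement noise: shrinks with more samples

readings = [true_tension + gauge_bias + random.gauss(0, noise_sigma)
            for _ in range(50)]

estimate = statistics.mean(readings)   # lands near 102, not 100
spread = statistics.stdev(readings)    # reflects only the random part

print(f"estimate = {estimate:.1f}, spread = {spread:.1f}")
```

No matter how many readings we take, the estimate stays about 2 units high; unless we know to look for the bias, the data will confidently tell us the wrong answer.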

In engineering, the consequences of such an error in judgement can be dire — bridges collapse, space shuttles explode, levees fail, weapons misfire, or a lot of money gets burned. These are serious indeed, but in education, the consequences seem potentially horrifying. We’re talking about the education of children, and by extension their future livelihoods and those of their children. The future society and workforce of the nation and world. Not to mention the careers of many thousands of teachers.

“Not everything that can be counted counts; not everything that counts can be counted.” –Albert Einstein.

I don’t get it. What is the root of the contemporary popular belief that if something can’t be quantified it has no value? Or, more dangerously, that if the computer spits out a number, it must be right?