Nov 10, 2004

Archival Bits and Bytes




No, it's not news to those of us who have even the slightest interest in archival issues or who skim the pages of our professional journals, but it's worth noting when the New York Times takes up the question of archiving digital information. They grossly overestimate the resources available -- "Professional archivists and librarians have the resources to duplicate materials in other formats and the expertise to retrieve materials trapped in obsolete computers." -- as if only "consumers" were dealing with the problem. But they make it clear that the solution will be neither simple nor cheap, nor, in all likelihood, consistently applied.

And apparently it's basic science day over there, because they also have a nicely written attack on the anti-bacterial/anti-microbial product movement: "These products are as much about cooties as they are about viruses or bacteria." I'm sure somewhere, some enterprising anthropologist or cultural historian will take up her comments about the psycho-social aspects of pollution and make some sense (or nonsense) of our immensely profitable phobias.

Second thoughts: I worked as a computer programmer for a biochemistry researcher who insisted on multiple backups: one copy of the code on each of the computers in the office (an IBM XT and AT, for those of you with long memories), one copy on each of two removable Bernoulli hard drives (each the size of a small laptop today), printouts in the binder, a copy of the previous week's code and printouts in the fireproof cabinet, and also at my home. To this day, I don't feel secure unless the data is stored in two physically different places, though I'm much less careful about hard-copy backups. I tell my students that one of the best preservatives, particularly for the ephemera of popular culture, is the wide distribution of multiple copies. Centralization of archives is great for researchers, but vulnerable. Multiple copies protect against political and natural disaster, degradation, etc., much more effectively.
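
For readers who want to automate the "two physically different places" habit, here is a minimal Python sketch. It is only an illustration, not the procedure from that lab: the function names `sha256_of` and `replicate` are my own, and the destination directories stand in for whatever separate drives or sites you actually use. It copies a file to each destination and confirms, by SHA-256 checksum, that every copy matches the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy `source` into each destination directory (hypothetical
    stand-ins for separate drives or sites) and report, per copy,
    whether its checksum matches the original."""
    expected = sha256_of(source)
    results = {}
    for directory in destinations:
        directory.mkdir(parents=True, exist_ok=True)
        copy = directory / source.name
        shutil.copy2(source, copy)  # copies contents and file metadata
        results[copy] = sha256_of(copy) == expected
    return results
```

Calling `replicate(Path("notes.txt"), [Path("/mnt/office"), Path("/mnt/home")])` (both paths invented for the example) would return a dictionary mapping each copy to `True` or `False`. The copies only protect you if the destinations really are on different hardware in different places.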

OOPS: Also from today's NYTimes:

Atlantic cordgrass is a fast-growing weed that can grow up to 7 feet tall and cover mudflats in dense meadows that can have a profound impact on an ecosystem. Some countries import the weed to convert marsh into solid land. Atlantic cordgrass was introduced in San Francisco Bay three decades ago to combat erosion and has since become the most dangerous nonnative species to take root there. ... Even more distressing, the East Coast cordgrass has mixed with native species and spawned hybrids that grow faster and stronger than their parents.
Science is fun, isn't it?

In Defense of Javert: Nicholas Kristof, understandably shrill at recent judicial and prosecutorial attacks on journalistic source confidentiality, repeatedly invokes the character of Inspector Javert, from Victor Hugo's Les Misérables (I saw the musical before I read the book, but I did read the book), as a symbol of law enforcement run amok. It's not fair. Or rather, it may be fair to the judges and prosecutors whom Kristof excoriates to compare them with Javert, but not in the sense that Kristof means. Javert was indeed an unyielding, literal, law-before-mercy petty tyrant, but he was also competent, self-critical (in the book, his last act before his suicide is to write out his reports detailing his own errors and the flaws in the system), unbiased, apolitical, and deeply humble about his role as a powerful public servant. He did not target the innocent, but rather failed to distinguish degrees of guilt and refused to substitute his own judgement for that of the properly constituted judicial authorities. Javert is not a hero, but neither is he, properly speaking, a villain or an anti-hero. When Kristof compares these judges and prosecutors to Javert, for me that highlights the fact, particularly in the Plame case, that some of these public servants are trying to do a very difficult job to the best of their ability within the law (some of them are clearly overreaching, which complicates things). The law and its authors, not the enforcers thereof, are at fault for its lack of mercy or, in this case, integrity. Kristof isn't wrong, and I understand the sense of embattlement that leads to a lack of perspective, but he's lashing out at individuals when it is a systemic issue.



More Comments:


Charles V. Mutschler - 11/11/2004

This is an interesting thread. Out here in Washington state, the newest branch of the state archives opened this fall. It is specifically intended to begin to address the problems of electronic records, and how they will be accessioned, arranged, described, stored, and preserved for future use.

At the present, the problem of migrating data seems to be the $64,000 question. How many times can we migrate data before the cost exceeds microfilming on silver film? How can we address the problems with needing to correct for data lost in transfer? This seems to always be an issue, at least so far.
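
One widely used way to detect data lost in transfer is a checksum manifest taken before and after the move. The Python sketch below is an illustration only, not the State Archives' method; the function names `build_manifest` and `compare_manifests` are hypothetical. It assumes the files are copied byte-for-byte to new storage, since a conversion to a new file format changes the bytes and would need a different kind of check.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            name = path.relative_to(root).as_posix()
            manifest[name] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def compare_manifests(before: dict[str, str], after: dict[str, str]):
    """Return (missing, changed): files absent after the transfer, and
    files whose contents no longer match the original checksum."""
    missing = sorted(set(before) - set(after))
    changed = sorted(
        name for name in before
        if name in after and before[name] != after[name]
    )
    return missing, changed
```

Running `build_manifest` on the old storage, again on the new storage, and passing both results to `compare_manifests` yields two lists. If both are empty, every original file arrived intact.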

Obviously the cost of any sort of archival storage - for paper documents, microfilm, or electronic records - is going to have to be factored into long range planning. At the present, I think we have a serious difference in understanding. Many of the technical people think of "archiving" data with a retention of a few years at the most. The concept of permanent - in the sense that most archivists think of it - is not really one that the computer data people give much thought to. Unfortunately, while money is frequently available to buy new software and hardware, funds to transfer data are probably overlooked more often than not. And yet, that is going to be increasingly important.

At the present, the State Archives people are looking at ways to translate data to new storage formats, and to minimize data loss during transfer. I will be very interested to see how this works out. My current approach is to always keep a paper copy of anything that is really valuable enough to warrant long-term retention. However, that is not realistic for many records series generated electronically. Electronically generated records are becoming more common, both in business and in government. We live in interesting times.

CVM


Maarja Krusten - 11/11/2004

If all my tag lines about personal time are annoying--and yes, I am sure there are many things about me that are annoying here on HNN, LOL--do read also my PS on timing added at http://hnn.us/readcomment.php?id=46580#46580 . I actually was the subject of some mild harassment by an opponent within the government during the years when the Nixon records lawsuit was ongoing in the early 1990s. No point in sharing the details; everything ended up ok and I tend to think I am fine right now. I'm a GS-14 civil service level official, not a member of the "senior executive service."


Maarja Krusten - 11/11/2004

Home on holiday today, so I have time to post one more thought. I really won't be bothering you guys on Cliopatria any more, that's a promise, LOL!!! I think the window has closed for reaching out to people on the Presidential Libraries issues, perhaps for the same reason that it is difficult to find news sources "everyone" can trust, or motivate voters by focusing on issues such as Abu Ghraib. I do believe there still is a center in this country. But those who derive their identity from politics have the loudest voices these days. On another listserv, I've noticed that a few readers bristle when I mention problems with the Nixon records--they seem to react that way simply because they are Republicans. And these people are archivists, the ones historians depend on for neutral, objective, nonpartisan action! If it is hard for me to reach them with appeals for statutory compliance, I can forget about getting through to the general readers. Remember the "to hell with the historians" Reagan assessment by Patrick Fagan that was published on HNN recently? Was it an aberration or a sign of something else? Also, you mention family honor. You might look to the link I posted yesterday at http://hnn.us/readcomment.php?id=46571#46571 to an op ed on the role of politics and self esteem.

Thanks again for responding!

Maarja


Maarja Krusten - 11/10/2004

Excellent response -- good points, all! Many thanks!


Jonathan Dresner - 11/10/2004

To clarify my earlier answer, as a citizen I am deeply concerned, outraged, about the question of document collection and access in government. As an historian I know that data loss and selective document destruction are par for the course in human history.

I'm more concerned, in some ways, about the increasingly proprietary nature of documents ('work product' and copyright extension, as well as the presidential library issues) not to mention a sort of perpetual family interest in protecting the reputation of one's ancestors (there's an increase, I think, in the concept of family honor in this country which doesn't bode well for social order or academic discourse in the long run.)

I really do think that historians have an interest in all archival issues, and I think that we do pretty well promoting libraries, archives, collection and access. But there are limits of time and resources and partisanship which we have not yet found a way to transcend.


Grant W Jones - 11/10/2004

Both Javert and Jean Valjean were men of principle who in a different reality could have been brothers. Hugo's message (of many) was that the corrupt, brutal system destroyed the best within the men assigned to enforce it.

I had read the book twice before seeing the musical. Anyone who hasn't done both needs to get busy.


Maarja Krusten - 11/10/2004

I do appreciate the fact that you bothered to respond to me, Dr. Dresner, and did not view my attempt to broaden the scope as me simply acting as an annoying "troll." I do remember your earlier note to me that HNN by and large does not get into areas of records management, archival access, etc.; those matters seem to be viewed as largely irrelevant to the perceived purpose of HNN. I happen to see issues of preservation, data migration, records management, and archival access as ones that all are inextricably linked in how they affect what you, the end user, will ultimately see. You may see these issues as discrete matters, to which you assign differing priorities. We'll have to agree to disagree.

The National Archives is not unaware of all these challenges. John W. Carlin, who for now heads the National Archives, summarized his agency’s concerns in 1999. “We have paper dispatches that generals sent and received in the course of conducting the Civil War more than a hundred years ago, but a hundred years from now will electronic records remain accessible on which to study command decisions in the Gulf War? Or, to take the current daily headlines, how much of the record material accumulating in the computers of the independent prosecutor will survive deletion, deterioration, and the discarding of machinery that can read it?” (Lecture transcript, John W. Carlin, “Ready Access to Essential Evidence: The Meaning of Records in American Life,” March 1, 1998)

Of course, as Bruce Craig noted in his NCH newsletter last week, he expects Allen Weinstein to be confirmed as U.S. Archivist. I don't know Dr. Weinstein. I should not stereotype him due to his age; some older people are very techno savvy and readily see the implications of all the challenges. If you believe he is the right man to be guiding the Archives in an age with so many electronic challenges, fine. If not, well, I don't see that there is much that you can do about it. At any rate, the U.S. Archivist is a subordinate of the President of the United States, and, as is the case with all executive branch employees, will receive direction from the White House, probably from the White House Counsel's office. For signs of where the Archives may be headed, look not only at Weinstein's record, but also that of the White House Counsel and the new Attorney General, whoever he may be.


Maarja Krusten - 11/10/2004

Take a look at this guidance at the Department of Interior, a federal agency I picked at random off of the ones with guidance available on public websites: http://policy.fws.gov/m0089.html

You mention backup copies, etc. But consider the following DOI guidance from 2001, the term NARA within refers, of course, to the National Archives and Records Administration:

"Why aren’t tape backup copies considered the ‘official record’?

NARA requires every employee to be responsible for the records they create. They require that - prior to destruction of email - employees either electronically archive or print out and file copies of all email which qualifies as a record. Employees should save those records in accordance with their agency’s records schedule. Tape backup copies therefore, do not qualify as the official agency record. They are merely a safeguard related to records - in the event that there is a problem with a Service computer or system.

What is the Service’s requirement related to retention of tape backup copies?

The current Service requirement is for Information Resources Management personnel to store tape backup copies for a period of 30 to 90 days (to allow local IRM managers’ flexibility related to customer needs) after which time the tape backup copies are deleted. This will allow the Service to reduce costs and storage requirements "


Maarja Krusten - 11/10/2004

I think we can all agree that how electronic data is preserved is important. The NYT article explains the issues pretty well. Beyond that, if cataloguing is the primary issue, shouldn't you think more about what it is that will remain to be catalogued? I agree that how to capture, emulate or migrate electronic data is important. But it shouldn't be considered in isolation. Where is it being stored, on hard drives or within document management systems, such as Hummingbird DM 5? Or on flash drives which people remove from their computers and walk away with? How vulnerable is data to deletion by the creator, long before it reaches any archives? Perhaps my point about records management and records scheduling was lost as I segued to my second point, which was access.

Just think of any person you know on your own campus. If that person has the ability to delete, at will, any electronic record he receives or creates (e-mail, photos, documents, spreadsheets) are you confident that if down the road you needed to get critical information from him, what you needed would still exist? Cataloguing is nice, but how can you ensure that the person is keeping the most important information on his computer hard drive, flash drive, CDs, etc.--regardless of whether it reflects negatively or positively on his performance?

Yes, there have always been shredders and some documents always have been lost to history because they never reached the cataloguing stage. And countless Presidential documents were lost to history in the 19th century because widows or other persons of interest simply burned them. But the computer age has increased the risk.

Reliance on personal computers largely has removed from the office the neutral buffer of a secretary, the third party with no vested interest in the content of records. Fewer offices rely on secretaries, who once were responsible for filing incoming letters and keeping the notes, working documents and carbon copies of outgoing correspondence. (When I first entered government, one still had a yellow tissue copy, a white tissue copy, and a green tissue copy, carefully cross referenced and marked with the name of the author--usually different from the person actually signing a letter--and the initials of the typist.)

Think of Oliver North and what happened with the Bush-I Iran-Contra e-mails. The NSC officials went back in to the computers, tried to alter and delete e-mails, etc. And then the Archivist of the United States, Don Wilson, who was, after all, a subordinate official of the government, signed an agreement to give President George H. W. Bush control over WH e-mail as he left office. It was overturned, to the extent it was, only because public interest groups filed a lawsuit and got a judge to overturn it. But the problems lingered--remember the controversy over the National Archives' General Records Schedule 20 a few years ago? Some of what the Archives is trying to do now with records management--the "buckets" and "flexible scheduling" approach to records scheduling at the federal agencies--is a follow on to all that. How many historians--the end users if you will--have been following the issue, and providing input to the Archives, I don't know. My guess is, not many.

If you can get a copy, take a look at the book, WHITE HOUSE E-MAIL: THE TOP-SECRET MESSAGES THE REAGAN/BUSH WHITE HOUSE TRIED TO DESTROY, By Thomas S. Blanton.


Jonathan Dresner - 11/10/2004

It seems to me that there are four fundamental archival issues: collection, preservation, cataloging, access. The NYTimes is mostly concerned with the most obvious preservation issue. Your primary concern, it seems to me, is with collection and access. As an historian, I have little control over any of them, but the one that affects me most directly is cataloging, with access a close second....


Maarja Krusten - 11/10/2004

Thanks for the heads up about the NYT piece on archival preservation. The NYT article is fine for what it covers, but dilutes some points due to a mixture of quotes from government archivists (who by statute must preserve permanently valuable records under the Federal Records Act, Presidential Records Act, etc.) and comments from users of home computers. Moreover, it omits what everyone seems to omit these days, the toughest questions about the substantive content of what is being created and preserved.

First of all, not everything is worth keeping, so the federal government applies a risk management (records management) approach to deciding what to keep. The government presently is re-writing federal records schedules which are supposed to provide guidance on what is preserved for future historians and what can be tossed within federal agencies. This includes management of electronic records, which may reside in document management systems or on the hard drives of officials. Executive branch agencies are gaining some more autonomy in writing this guidance, although the National Archives still is playing a role. For more on this and the ramifications, see http://www.archives.gov/records_management/initiatives/rm_redesign_project.html

Then, as if things weren't complicated enough, consider this from a story at http://snipurl.com/ahty
about the Clinton Presidential Library, which opens officially next week:

" Clinton is just the third president to fall under the Presidential Records Act of 1978, which established that records generated in the Oval Office, such as memos, briefing books and drafts of policy papers, are owned by the public, not the president. That includes papers. . . that are collected or generated by the president's advisory commissions.

Clinton's 80 million pages — more than 27,000 for every day he was in office — are believed to be far more than those generated by any other president, officials said. There are approximately 50 million documents, for instance, at the Ronald Reagan Presidential Library and Museum in Simi Valley.

Historians debate the quality of the information contained in the archives, said Texas A&M University historian H.W. Brands. Clinton was mired in scandal and hounded by investigators with subpoena power, Brands said.

"Once it became clear that various investigators could subpoena anything — diaries or confidential records, anything written between the president and his staff — I have to guess that a lot of people simply stopped writing things down," he said."

Historian Michael Beschloss has expressed similar concerns about records covering George W. Bush's tenure.

And then there is the Bush executive order, strengthening the ability of sitting Presidents and former Presidents to block disclosures to the public from their archival materials. There already are some stories that the White House is delaying release of materials which Clinton had intended to open.

No offense to any of you here who are professors--remember, I've always worked for the government, first as an archivist, then as an historian--but you might also consider my reaction to a recent article in the Chronicle of Higher Education. Hey, at least my man Jon Stewart got it right--check out what I posted to the Archives and Archivists Listserv this week. No disrespect to academics, but the government's archivists have needed your help and strong voices for a long time, the best time for helping them was about ten years ago, and I'm not optimistic about the future:

"Thanks for the link, Rebecca, I appreciate it. That article was a real hoot! It confirms my long held belief that many academics have their head in the sand and take the easy way out when it comes to looking at issues such as the Presidential Libraries. But I loved the Jon Stewart quote.

Professor Michael Nelson spends gobs of typefonts solemnly discussing the cost and location of Presidential Libraries, and says in passing, "The 1978 law eased most scholarly concerns about getting access to presidential papers, but it did not stem the growing tide of criticism of the libraries that house them." Later he adds, "Getting a closed file opened now can be as simple a matter as knocking on the archivist's door and making a case. That wouldn't happen at a central repository where the archivist had a staff of hundreds to supervise." Wh-aaaaat?

What a waste of ink this article was. What is that due to, just ignorance? Knock on an archivist's door and a closed file can be made accessible? Ummmm, what about the bloody battles that surrounded Nixon's files and tapes? I know many academics are clueless, but how can you write an article about Presidential Libraries and not focus on the core issue of intellectual access and the impact of the Bush Executive Order? Is Michael Nelson just a nom de plume for some well known figure on the political scene, LOL? He can't be a professor of political science. Just kidding.

The part of the article that rang most true for me as a former Presidential Libraries archivist was the first paragraph, which read, "According to Jon Stewart and the other Daily Show authors of the best-selling America (The Book), a presidential library consists of a tiny room of public records, a large room of sealed records, and a vast museum featuring exhibits like "The President as Young Man," "The President as President," and "The President as Angry Coot." Just as with his "Crossfire" appearance, Stewart once again shows that he has more common sense than most pundits or professors."


Manan Ahmed - 11/10/2004

Ah, paper.

But seriously, trying to "future-proof" our data is a losing game. Our only hope is in multiple media backups - optical, mechanical, etc. The crucial task is to keep the data migrating along with us, as we move from one machine to the other and from one medium to another.

The greater challenge is, of course, software that will always "READ" that data. There, we do have a problem. As standards evolve (.tiff to .gif to .jpeg to .xjpeg to whatever), software will stop recognizing older formats or force you to convert up, with quality loss (compare .aiff files to .mp3). The one nice thing is that with evolving technologies our data becomes smaller and smaller. I can store all the files from my 1994-2000 school/programming era onto one 512 MB file on my iPod :)

Or.

We can create 21st century technical monasteries where "archived" hardware exists solely to read "archived" data. They will need to put everything on the most long term media available. Then when they're done, do it again. Repeat ad infinitum. Perfect task for a monk.
I will now retire my geeky self.