How do we erase a hard disk, for sure?

Richard Nixon was not a theorist—of course. He was the ${37^{th}}$ President of the United States from 1969 until 1974, when he resigned in disgrace. He was also an “unindicted co-conspirator” and probably one of the most paranoid presidents ever.

Today I want to talk about paranoia. I have a problem that the more I think about, the more worried I get. I would like to share the problem with you in the hope that there is a technical solution.

One of my favorite quotes on paranoia is:

Even a paranoid can have enemies—Henry Kissinger

I often think that one of the best attributes for working in security and cryptography is to be a bit paranoid. You must think of all the ways that people are indeed out to get you.

Let’s turn to my problem.

The Problem

My problem is simple to state, but I will use Alice and Bob to explain it—even though it is my problem. Alice has a laptop that she has used for years, and while it is old it still works fine. She has upgraded to a new shiny machine and would like to give her old laptop to Bob. Alice of course would like Bob not to get any information that is on her disk. Alice has no bad information—she never visited one of those web sites—but she does not want Bob to know anything about her. She is a bit paranoid. The problem is: how can Alice erase her hard disk in a way that Bob will get no information?

The Standard Solution

Alice does not want to remove her disk and throw it away. She wants a software solution. She is well aware that there are plenty of software programs that operate as follows: they write random bits to all of the disk, and for safety they repeat this process, writing random bits again and again. Note that just deleting all the files is useless, since this only deletes the metadata and leaves the actual file data intact. Thus a real delete program must change all the bits of the disk, not just some of them. Even more, these programs need to overwrite each bit with random data more than once. The reason for this repeated writing is simply paranoia: if a bit is not changed enough times, there may be ways for Bob to recover it.
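As a sketch, a multi-pass overwrite of this kind might look as follows in Python. The function name and parameters are illustrative; in practice `path` would be a raw device node (e.g. /dev/sdb), not an ordinary file, and this says nothing yet about whether we can trust the program.

```python
import os

def wipe(path, passes=3, block_size=1 << 20):
    """Overwrite every byte of `path` with fresh random data, several times.

    A minimal sketch of the multi-pass overwrite described above.  Works on
    an ordinary file here so it can be tried safely; a real wiper would
    target the raw device.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = os.urandom(min(block_size, remaining))
                f.write(chunk)
                remaining -= len(chunk)
            f.flush()
            os.fsync(f.fileno())  # force the bits out to the medium
```

Note the fsync on every pass: without it, repeated “passes” might only rewrite the same cached buffer in memory.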

Imagine a bit ${b}$ on the disk. We usually think of ${b}$ as boolean—it is ${0}$ or ${1}$. But in real life the bit is a physical quantity which has a range of values. Think of it as a value in the range ${[-1,+1]}$. A zero is a value near ${-1}$ and a one is a value near ${+1}$. The problem is that as the bit is “erased” the physical process can be viewed as replacing ${b}$ by

$\displaystyle b \times r_1 \times \dots \times r_m$

where the ${r_i}$‘s are random values. Each of these values is near either ${-1}$ or ${+1}$. Clearly, while ${b}$ is changed to a pretty random value, if ${m}$ is small there is still some information potentially left in the bit. A clever Bob may be able to use hardware methods to recover Alice’s bit ${b}$.
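A toy simulation makes the point. Here the physics is replaced by an assumed remanence model (each overwrite keeps a small fraction of the old analog value; this model is an illustration, not real disk physics). The correlation between the final analog value and the original bit ${b}$ shrinks rapidly as the number of overwrites ${m}$ grows:

```python
import random

def residual_correlation(m, trials=20000, remanence=0.1, seed=1):
    """Toy remanence model: each overwrite stores a new random bit w
    (+1/-1) plus a small `remanence` fraction of the old analog value.
    Returns the sample correlation between the final analog value and
    the original bit b after m overwrites.

    The remanence model and its 0.1 coefficient are assumptions for
    illustration only; they just show why one pass may leave a trace.
    """
    rng = random.Random(seed)
    num = 0.0   # running sum of v * b
    sq_v = 0.0  # running sum of v * v
    for _ in range(trials):
        b = rng.choice([-1.0, 1.0])
        v = b
        for _ in range(m):
            w = rng.choice([-1.0, 1.0])
            v = w + remanence * v  # new write plus residue of the old value
        num += v * b
        sq_v += v * v
    return num / (trials ** 0.5 * sq_v ** 0.5)
```

Under this model the correlation after ${m}$ passes is roughly ${0.1^m}$: detectable after one pass, lost in the noise after three.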

Okay, Alice selects one of these programs, and turns it on. They all warn that the deletion process can take hours. This makes sense since the delete programs must touch all of the disk multiple times.

Alice’s Paranoia

As the laptop hums away running the delete program, Alice has some paranoid thoughts: how does she know that the program is really deleting her files? How does Alice know that it is not cheating her? She realizes the delete program could be doing one of several things:

1. It could actually be using a good pseudorandom generator and writing to all of the disk.
2. But it could be using a trivial sequence, like ${0101010\dots}$
3. Or it could be missing some of the disk blocks. This would erase all of the disk except some small part.
4. Even worse, the program could be a total fraud. It pretends to erase the disk information, but does nothing.

Alice realizes she has no way to tell what the program is really doing. Should she trust that it really works?

This is the problem.

An Approach To Alice’s Problem

How does Alice get a delete program she can trust? I have some ideas on how this might be done, but none of them works yet. I envision that a solution would be a protocol like this: Alice runs a program on her new laptop that interacts with the old laptop and the delete program. This interaction forces the delete program to really delete the information on the disk.

The main idea I have—one that does not work yet—is based on the following process. Let’s use ${\cal D}$ to denote the delete program that is running on the old laptop. Also assume that the disk on the old laptop can store exactly ${N}$ bits. Then, one can imagine Alice’s new laptop running a program ${\cal T}$ that operates as follows:

1. The program ${\cal T}$ sends ${N}$ random bits to ${\cal D}$ to store on the disk.
2. Then ${\cal T}$ asks ${\cal D}$ to return the random bits.
3. If the returned bits agree with the sent bits, then ${\cal T}$ accepts; otherwise, it declares the delete program ${\cal D}$ a fraud.

This process can clearly be repeated as many times as needed.
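A minimal sketch of the three steps above, assuming ${\cal T}$ derives the ${N}$ bits from a short seed with a cryptographic generator so it need not store them all. The function names and the SHA-256-in-counter-mode generator are illustrative choices, not part of the protocol as stated:

```python
import hashlib
import os

def prg_stream(seed: bytes, nbytes: int) -> bytes:
    """Expand a short seed into a long pseudorandom string.
    SHA-256 in counter mode stands in for any cryptographic PRG."""
    out = bytearray()
    counter = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:nbytes])

def verify_deleter(write_to_disk, read_back, disk_size: int) -> bool:
    """One round of the protocol T runs against the deleter D.

    `write_to_disk` and `read_back` stand for the (untrusted) interface
    D exposes; T keeps only the 32-byte seed, never the full N bits.
    """
    seed = os.urandom(32)
    write_to_disk(prg_stream(seed, disk_size))      # step 1: fill the disk
    returned = read_back()                          # step 2: ask for the bits back
    return returned == prg_stream(seed, disk_size)  # step 3: accept or declare fraud
```

The seed trick matters: it lets the new laptop challenge a disk larger than its own storage.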

The point of the protocol is that ${\cal D}$ can try not to store the ${N}$ bits, but then how does it return them? Since the bits sent will be pseudorandom, the delete program could try to “break” the random generator. However, we can assume that it cannot do that.

Open Problems

The open problem is: how does Alice get a delete program that she really can trust? The above ideas fail to convince the real paranoid for at least two reasons. What if the program ${\cal T}$ does not really work? Have we just replaced trust in one program with trust in another? Or is this program easier to verify? Another problem is that, as I have described the protocol, the delete program ${\cal D}$ could cheat. It could avoid writing to all of the disk by storing some of the bits in its random access memory. How do we handle that?

What should Alice do?

July 30, 2010 11:14 am

To verify that it isn’t storing bits anywhere except the disk, you could simply have T store its PRNG seed, run D with N random bits, power down the computer, start it back up, generate the same bits and verify that D can tell us what they are.

July 30, 2010 11:22 am

For the second problem, send a number of bits equal to Hard Disk + RAM [+ CPU cache if necessary].

For the first problem, shouldn’t program T be pretty simple? We should be able to write it in C and inspect the source code.

Another problem:
What if the old laptop transmits the random bits to a different location and then “replays” them back to program T? We might need a different trusted program on the old laptop to ensure no new connections are being opened.

July 30, 2010 11:29 am

Program D would probably be even simpler, so I think the question is whether one can do it without inspecting the internal operation of the programs in question. After all, if you write it in C, and it looks right, you’re still not out of the woods; how do you know GCC isn’t also conspiring against you? If you disassemble it to look at what came out of GCC, how do you know that the disassembler is correct? And so on.

4. July 30, 2010 12:02 pm

What if you just encrypted the hard disk instead? Is there any advantage to zeroing it out multiple times over just encrypting it once?

5. July 30, 2010 12:11 pm

In practice we often just pull the drive out of the machine and either physically destroy it, or store it somewhere (all my old drives are in a closet).

The number of bits sent would have to exceed the unused disk and storage capacity of the old laptop, otherwise D could just use that to fake the results. The new laptop would also have to have a large amount of available storage: enough to ensure that it generated more data than the old laptop can hold. Of course a really crafty D could wipe out some stuff, but keep what it knows is valuable. And any non-randomness would be compressible.

Wouldn’t it be easier to install a T’ that reads directly? If you’re sure the reader isn’t compromised then you can double check the writer.

For a random number of bits to completely disable any information collection, wouldn’t they have to be fairly dense (at least one per byte)? When I was younger I once extracted a speech written in text from a floppy that had been stapled to a letter. The blocks of text were scrambled, and the two pin holes removed a bit, but otherwise it was still readable. Alice would not be amused.

Paul.

July 30, 2010 2:47 pm

The new laptop would also have to have a large amount of available storage: enough to ensure that it generated more data than the old laptop can hold. Of course a really crafty D could wipe out some stuff, but keep what it knows is valuable.

But T can do the same thing, and to its distinct advantage — assuming a good enough PRNG, let T keep a list of 64 or (better) 128-bit random seeds and use each one to generate a pseudorandom “block” a couple megabytes in length. Then storing a list of seeds sufficient to generate a terabyte of data would take “only” 10 or 100 megabytes*. Then instead of asking for the entire disk back, it would instead ask for “Block N”, and it could compare what it gets back with what the PRNG generates from the stored seed.
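This seed-per-block variant could be sketched as follows, under the same assumptions as before (SHA-256 in counter mode stands in for the PRNG; the per-block read/write interface to D is hypothetical):

```python
import hashlib
import os
import random

BLOCK = 1 << 16  # 64 KiB blocks for the sketch; a real run might use megabytes

def expand(seed: bytes, nbytes: int = BLOCK) -> bytes:
    """SHA-256 in counter mode as a stand-in cryptographic PRG."""
    out = bytearray()
    i = 0
    while len(out) < nbytes:
        out += hashlib.sha256(seed + i.to_bytes(8, "big")).digest()
        i += 1
    return bytes(out[:nbytes])

def spot_check(write_block, read_block, num_blocks: int, checks: int = 8) -> bool:
    """Fill the disk block by block from per-block seeds, then audit a few
    randomly chosen blocks.  T stores only the seeds (32 bytes per block),
    never the data itself.  `write_block`/`read_block` are D's untrusted
    interface; the names are illustrative."""
    seeds = [os.urandom(32) for _ in range(num_blocks)]
    for i, s in enumerate(seeds):
        write_block(i, expand(s))
    for i in random.sample(range(num_blocks), checks):
        if read_block(i) != expand(seeds[i]):
            return False
    return True
```

As the comment notes, spot checks cannot catch a D that drops only an epsilon fraction of blocks; each audited block catches it only with probability proportional to the fraction dropped.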

Unfortunately I don’t think there’s any way of improving on the slowness of the protocol; D could throw away an epsilon fraction of the blocks it gets from T, and T couldn’t detect this without checking most of the blocks. That said, generally speaking T can throw away its data much more easily than D can.

* Although you’d probably want this to be genuinely random, and I don’t know where you’d get 100 MB of random data.

• July 30, 2010 3:03 pm

Assuming Alice is totally paranoid, what would stop her from thinking that the makers of D might reverse engineer the PRNG and mirror the same technique back? This sort of reminds me of the virus writers and the anti-virus developers.

Any deterministic algorithm would be vulnerable. The only safe way of generating random numbers would rely on something like a physical device that always created brand new sequences, but that runs into space constraints.

One guy I know resorted to a drill. He drilled holes in the drives and then ripped up all of the platters 🙂

Paul.

July 30, 2010 3:26 pm

Well, certainly Alice could be paranoid enough to think that. And if T and D were some kind of package deal, the paranoia might even be justifiable. But if it’s a secure PRNG it’d be computationally impossible for a computer to actually work out the seeds from the sequences generated by them within the few hours it’d take to run the protocol — indeed, this is half the definition of a secure PRNG!

For, um, “information-theoretic” security, yeah, I guess you’d need a hard-disk’s worth of genuinely random data and space for T to store it. But honestly that would be overkill.

In general, particularly if Alice were to write T herself implementing a standard cryptographically secure PRNG, worrying about D breaking it moves into “irrational paranoia,” at which point we’ve moved from CS to psychology. But if Alice purchases T and D together from a single vendor, I don’t know how she could check to see that they weren’t “colluding” by sending each other pseudorandom seeds. Anyone else have an idea?

July 30, 2010 12:38 pm

This program need not be a huge complicated binary!
Alice can write a script of her own to do the following:
Keep appending random data to a file until it cannot grow any more. If that does not fill the entire disk, then open another file and do the same, and so on.
After there is no more user-mode space left on the computer, reinstall the OS.

Repeat the same sequence a few more times?

The point being: if she cannot trust a program, she can write a simple one on her own, or get one for which she can read and understand the code. There is no reason for this code to be complex.
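Such a do-it-yourself script might look like this sketch. The function name and the size cap are illustrative (the cap just makes it safe to try); note that it only overwrites space the filesystem reports as free, so it never touches remapped sectors or other space the filesystem hides:

```python
import os

def fill_with_random(directory: str, max_bytes: int, chunk: int = 1 << 20) -> int:
    """Append random data to a file in `directory` until the filesystem
    refuses more writes, or `max_bytes` is reached.  Returns the number
    of bytes written.  This is the simple do-it-yourself wipe the comment
    suggests: run it after deleting the files to be destroyed, so the
    freed space gets overwritten.
    """
    path = os.path.join(directory, "fill.bin")
    written = 0
    with open(path, "wb") as f:
        while written < max_bytes:
            n = min(chunk, max_bytes - written)
            try:
                f.write(os.urandom(n))
            except OSError:  # ENOSPC: the free space is exhausted
                break
            written += n
    return written
```

The code is short enough to read and understand in full, which is exactly the point of the comment.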

July 30, 2010 12:41 pm

“It could avoid writing to all of the disk by storing some of the bits in its random access memory. How do we handle that? ”

Isn’t that a non-issue? Just reboot the computer and check.

July 30, 2010 12:48 pm

Hello,

We have a paper on a very similar issue concerning embedded systems at this year’s ESORICS.
The solution we propose is similar to your solution in many aspects.
In the introduction, we also discuss the possibility to use the same protocol to securely delete sensitive contents on a system.

Here is the ePrint version of the paper: http://eprint.iacr.org/2010/217

Regards,
Daniele

July 31, 2010 10:17 am

I will look at this. Seems right on target. Sorry, I was not aware of it.

July 30, 2010 12:54 pm

I think magnetic hysteresis is the physics behind it. http://en.wikipedia.org/wiki/Hysterisis#Magnetic_hysteresis

July 30, 2010 1:51 pm

I’m skeptical that it is always possible to erase all sensitive data. One problem is that nowadays when a disk controller detects bad sectors it silently remaps them to different physical locations. After that, original sectors are no longer accessible and no program can erase sensitive information that was stored there. However, Bob can still read that information by physically examining the surface of the disk.

Am I too paranoid?

10. July 30, 2010 3:09 pm

A quick format might just update the file system metadata, but doesn’t a full format of the disk help?

July 30, 2010 3:11 pm

An easy way to check T would be to, on its second run (assuming that D passed the test) deliberately flip a single arbitrarily chosen bit (or a small number of bits) on the original laptop’s hard drive before having T check the hard drive against what it sent. Then if T returns a negative evaluation of cheating on D’s part, we know for sure that it’s a false negative; if it returns a positive, we can sort of Bayesianly argue that T is likely checking every bit. I’m not sure how to guard against false positives, although that’s probably not as much of an issue. I’d think flipping a small number of bits can be done “by hand,” so even if Alice can’t write these programs T might be verifiable.
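The flip-one-bit audit could be sketched like this; the interfaces are hypothetical (`verify` stands for whatever check T exposes, and a mutable bytearray stands in for the drive):

```python
def audit_verifier(verify, disk):
    """Sanity-check an untrusted verifier T by deliberately flipping one
    bit after a passing run: an honest T must now report a mismatch.

    `verify(disk)` should return True when the disk matches what T sent.
    Returns True only if T accepts the untouched disk AND rejects the
    corrupted one -- the Bayesian evidence described in the comment.
    """
    if not verify(disk):
        return False         # T rejected an untouched disk: broken or fraudulent
    disk[0] ^= 1             # corrupt a single, arbitrarily chosen bit "by hand"
    return not verify(disk)  # an honest, thorough T must notice
```

A lazy T that always answers “all good” fails this audit immediately, which is the kind of false negative the comment wants to expose.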

This process is considerably faster in the “block approach,” too, actually, although there I think you’d need another layer of complexity. What you’d want would be to send back a block which, again, differs only in a single bit from the block that T sent. This would be more likely to require another piece of software, and maybe it would be turtles all the way down.

By the way, you described Nixon as “probably one of the most paranoid presidents ever.” To me this sounds like unnecessary qualification — I’d say he was definitely the most paranoid, and I generally like the guy! (Relatively speaking, I mean.) Was there someone else who gives Tricky Dick a run for his money in the paranoia department?

July 30, 2010 4:27 pm

Actually, if T and D are treated as a single “black-box,” then it’s possible to fool Alice. In general:

Proposition: Let D be an untrusted disk-eraser, and let V_0 = T, V_1, \ldots, V_m be a sequence of untrusted verifiers, with V_0 purportedly verifying D and V_{k+1} purportedly verifying V_k. Consider where the only trusted operation Alice can perform is to “corrupt” the disk on any machine between two runs of the system. Then there is a black-box system in which D is fraudulent and which Alice will w/h/p be unable to distinguish from an honest system.

Sketch of a proof: Essentially all that’s necessary is for each machine to be able to tell if its data has been corrupted. This is possible by each machine computing and storing a hash of the disk at the end of each run, before it reports its result and/or passes data to its verifier. Upon the next run, it recomputes the hash and checks whether it matches. If not, the disk has been corrupted; if so, then w/ vanishingly small probability it is clean. QED

I think if you weaken the conditions (i.e. give Alice more tools) the theorem still broadly holds, unless you let Alice have a trusted method of writing to a hard drive (or a similarly unreasonable assumption). So I think Alice eventually has to trust some verifier. This is probably trivial, but it was fun to think about at least.

July 30, 2010 5:52 pm

I have an elementary question. Why can’t the software solution just write a string of 0s to the disk – i.e., every bit of the disk is replaced by 0? (This can also be verified later.) Seeing the disk later reveals no information about the previous contents, does it? I am sure I must have understood the question incorrectly, but I would like to know where exactly.

July 31, 2010 11:17 am

Overwriting with zeros is perfectly fine, although many paranoid people seem to think that information about the previous state then can be retrieved — i.e. a previous “1” ends up as a “0.1”.

This does not seem to be the case — see the article “Overwriting Hard Drive Data: The Great Wiping Controversy” by Craig Wright, Dave Kleiman, and Shyaam Sundhar.

July 30, 2010 7:07 pm

I think Ken Thompson’s “Reflections on trusting trust” is a good read (if a bit too classic) relating to the non-technical aspect of this post: http://cm.bell-labs.com/who/ken/trust.html

14. August 1, 2010 10:27 am

I was going to mention Thompson’s famous compiler/login Trojan, but Kevembuangga beat me to it. If you have never heard of it, go read about it; it is fascinating and very on-topic.

It seems to me that your verifier program already does 99% of the hard work of the deleter itself; specifically, producing a secure stream of random or pseudo-random bits. Actually writing such bits to disk is trivial… So by the time you have written such a tester, you might as well dispense with the original deleter and just use your own.

But the problem of needing to trust the compiler, operating system, CPU, etc. still exists.

August 2, 2010 4:00 am

If the physics of a harddisk can be neglected for a moment, I think the Alien vs. Quine approach ( http://www.iacr.org/archive/eurocrypt2006/40040048/40040048.ps )
might convince a paranoid user that her disk has been deleted: invoke the recursion theorem to get a quine that has the size of the disk (and is mostly random), store the quine on the disk and run it. If the output matches your input, then you’re convinced that the disk is clean.

• August 9, 2010 7:07 pm

An earlier paper actually fleshes out the approach proposed in the “Alien vs. Quine” paper, even though the Alien vs Quine paper fails to cite these guys:

August 2, 2010 7:51 am

nmd,

Thanks for the pointer. The paper looks very neat. Thanks again.

17. August 2, 2010 12:57 pm

I know this isn’t really an answer to the dilemma, but as bits get smaller, the thermal energy required to flip bits randomly should decrease. So you should just write a program that invokes the most energy-intensive processes of the hard drive, to heat it up as much as possible, until the disk has reached its Curie temperature and becomes paramagnetic (or superparamagnetic? I’m not sure I’m getting my terminology right here…). The obvious drawback is that such heating might (and seems very likely to) harm the disk, probably immediately (or at least shorten its lifespan).

18. August 3, 2010 5:21 am

I’d do that:

I assume the new laptop has a bigger hard drive.
Boot with a “UNIX/Linux/BSD” distribution only from a CD.
Mount the hard disk.
Copy the huge file from new laptop to old laptop hard disk.
Shut down the old laptop and wait 3 minutes in an environment of at least 20°C.
Boot again the old laptop with a “UNIX/Linux/BSD” distribution CD.
Verify the checksum of the file in the old laptop.
Verify the checksum of the file with the new laptop.
If the copy was not done correctly the checksum should differ.

Now, unless the openssl program was compromised on the two laptops, you are assured all worked properly.

More technically, when I say copy, I mean a dd if=/mnt/source of=/dev/sda,
and verify the checksum with openssl sha1 /dev/sda,
in order to destroy as much as possible of the metadata of the hard drive.

Another thought: the probability for an “Eraser” program to be compromised is relatively high.
But the probability that the dd and openssl commands of a Linux system are compromised is far less.

• August 3, 2010 5:24 am

I forgot to say: you have to create a random file on the new laptop.
It isn’t difficult to make one you can trust, from /dev/sound or, if you trust openssl, using that program.

August 3, 2010 8:09 am

Yogsototh,

Thanks for the detailed comment. My point in the discussion was that Alice has to rely on certain software working. I was looking for a way to reduce the amount of trust she needs.

But thanks again for taking the time to give such detailed advice.

• August 3, 2010 11:39 am

For advice on disk erasure, the Nyarlathotep pseudonym would have been more appropriate. 🙂

August 7, 2010 10:49 pm

I think you might want to delve into the world of Open Source and actually start trusting your utilities 🙂
I’ll never call anyone paranoid because I am security concerned myself, but being able to depend on the tools you use is essential.

20. August 8, 2010 9:06 am

I don’t know what you’re asking here. It’s clearly obvious that you need to trust some program. Is it the testing program, or the deleting program? Or the OS executing both? Something needs to be trusted. So I don’t know if you’re wanting to re-work the whole issue of trust on computers or not, but that seems like a problem too large to address within comments.

As to the practical solution to the problem, I haven’t seen anyone mention the work of Peter Gutmann: https://www.cs.auckland.ac.nz/~pgut001/pubs/secure_del.html (note: article is old but contains good information).

If it were me, I would solve this problem in the multitude of ‘side channel’ ways: replacing the disk, putting dummy data on the disk, misleading information, or simply perform ‘sensitive’ operations from a machine that you simply don’t sell, or deal with in a different way.

August 9, 2010 1:22 am

In the real world of security this just isn’t considered a pressing practical problem. There is no software-only solution and we just accept that and deal with it. Either never store plaintext on the hard drive, or slag the drive and replace it when you sell the laptop (hard drives are cheap). There is even some sense to the approach of believing it’s impossible to remove computer viruses from a hard drive. A sysadmin I worked with believed his machine had been infected, so he replaced the drive. Hard drives are astonishingly cheap these days.

August 9, 2010 10:36 am

You can’t solve it by software if you don’t trust your software. You must trust not only the testing program, but also the compiler and the OS. Anyway, if you need to trust some program, there is no reason not to make it the erasing program, and go without a test. (Erasing a disk also seems to be easier than checking whether it was really erased, and thus less error prone.) In any case, any software solution will fail to erase all the blocks of a modern disk. Alice would do better to (a) keep her sensitive information on an encrypted filesystem, (b) erase the disk by hardware, or (c) not sell the disk at all. (Given the cost of erasing the disk by hardware, anyone not able to do (a) will sanely go directly to (c).)

In a similar way, there is no hardware solution if you have no hardware to trust. You’ll need at least a trusted oscilloscope and signal generator if you want to disassemble and erase the disk.

23. August 9, 2010 7:47 pm

Everyone here is naively assuming that the old laptop is communicating only with the new laptop. In reality, the writers of software D have all along been in cahoots with the manufacturers of the old laptop and are using a secret wireless device to store some of the random bits offsite.

August 9, 2010 8:14 pm

Use open source software on an open source operating system, and study the code for both.
Or write the software yourself from scratch, and also the operating system.

25. August 10, 2010 3:29 am

If you do trust your hard disk…

Secure erase is an internal purge command that is part of the ATA specification (ATA Security Feature Set). Recent drives have bit randomizers in their PRML read channels too. Secure Erase overwrites every single track on the hard drive including bad blocks.

http://cmrr.ucsd.edu/people/Hughes/SecureErase.shtml

If you do not trust your hard disk… well, nothing will do. The firmware could cheat and trick “trusted” erase software, or even “back up” your data on separate hidden platters.

August 11, 2010 7:20 am

What if Alice simply did a “dd” before (to get an image of the partition), then ran the program, then did a “dd” after (to get an image of the partition post-erasure), and then compared the pre- and post- images? She should get a very big diff 🙂

Or does this simply move the problem to “how can Alice trust the implementation of ‘dd’?”

August 12, 2010 1:12 am

A company I worked for ran into a big problem where people were losing data off their RL01 removable disks (a long time ago). It was causing huge problems, and appeared to be random.

Digital Equipment Corp replaced all the disk drives – twice – but the problem persisted.

Worse, it became personal as several people were more prone to losing data than others.

Paranoia set in, and everybody was backing up their files on multiple disk platters every time they saved job files. All the computers seemed to exhibit the problem, but some people were much more prone to having their disks erased than others.

What was it about these people?

It turns out they all had one thing in common: they liked to flirt with the new receptionist – the receptionist who, for whatever reason, stuck a very large magnet to the side of her metal desk under the lip where it couldn’t be seen. It was right at arm height where an arm holding a big platter swings back and forth while flirting.

Alice could use a big ol’ magnet and a few circular passes of that would leave the NSA in tears.

August 12, 2010 7:11 am

Regarding the paranoia, one may also start to question whether the keyboard is hijacked, whether one’s arms and legs are real, eventually wondering whether we live in a Matrix-like simulation 🙂
All of this can be traced back to modern scientific skepticism, “cogito ergo sum” by Descartes, critical thinking by Socrates, and the allegory of the cave by Plato 🙂

August 12, 2010 9:17 am

The problem I see is getting program T, and getting program T to interact with D without being able to write code.

Most software will not have handles for another program to feed it data or force commands. So using a program like T on a different machine seems much too complicated to be practical. In the unlikely event that you found a semi-reliable program D that did indeed let you pass it random data to write… and did indeed allow you to query data that was written… you would have to write T in order to access it. If you are able to write T… you might as well write your own erase program. Even if you were to find T and D… how could you trust T?

The dd command in the unices should do the trick and you can supply your own random data file. You could also make a sequence of data files since you probably want to write multiple times with different patterns.

Or you could implement an erase algorithm like Gutmann ( http://en.wikipedia.org/wiki/Gutmann_method ) or the DoD method.

In the end though… it does seem overparanoid to me… unless you are using some random google search return for a delete program (and worried about malware/viruses). There are rumors of being able to extract data that has been written over on a hard drive… but it would be intelligence agencies that would be able to do it… or some very well-funded, powerful people. The odds are that you are selling your laptop to a non-powerful tech person… and it seems unlikely that you would be on the radar for a powerful agency to care about your data.

If you are worried about the government or some powerful corporation getting your data: a) you should be fearful of anything you do on the internet, because they could well have already broken AES, and b) you should not try to sell the laptop, but rather destroy the hard disk with fire and bury it in the middle of nowhere, because valuable data should not be handed over just to make $100 to $800.

August 12, 2010 2:45 pm

Then Alice knows her bits are safe even in cases where she has not had the opportunity to run D.

If you don’t want to trust your crypto filesystem code, then you also have to assume that it is also laundering data out to another location via e.g. “random” TCP sequence numbers.

-Alex-

August 13, 2010 4:26 pm

A large computer company always destroys hard drives before donating old computers.

I once accidentally used a powerful permanent magnet near a laptop. The disk crashed; I was hoping I could just reformat and use it as a blank disk. Unfortunately, a magnet can permanently destroy a drive, probably by causing a major head gouge on the media.

August 15, 2010 4:16 pm

don’t give the laptop to Bob

33. August 15, 2010 8:41 pm

Consider this protocol:

Let’s assume all characters have the same chance to appear in the content of a bit, i.e., each character appears about N/256 times on the hard disk.

There are two programs, D and V:

The program D gets a set of bytes, S, as its input. Program D is claimed to work as follows:
For each bit of the old hard disk, B [i], D selects an element from S, uniformly at random.
If D is not legitimate, it might skip resetting p percent of the bits on the harddisk.
To run program D with set S, Alice needs to pay Cd (|S|) dollars. I assumed that Cd (|S1|) is more expensive than Cd (|S2|) if |S1|< |S2| .

Program V which gets an 1\le Index\le n and returns the i-th bit on the hard disk (For now, let's assume that Alice believes V is legitimate).

Alice does the following:
1- Selects an integer s and creates a set, S, of s distinct characters.
2- Runs D (S).
3- Alice installs V on the old laptop's disk (we assume Alice knows the size of V but she does not know which bits on the disk are allocated to V)
4- Selects an integer m, generates m random integers between 1 and n, Indices [1]…Indices [m], and counts the number of times, Succ, where V (Indices[i]) \in S for i=1..m.
5- Also, V counts the number of times V (Indices [i]) returns each character, n [c].
6- We assume she needs to pay Cv dollars for each execution of V.

To complete this protocol, one needs to compute the followings as functions of s, m and size of V:

P (the obtained values {Succ and n [1]…n[c]} are consistent under the assumption that D is cheating)
and
P (the obtained values {Succ and n [1]…n[c]} are inconsistent under the assumption that D is not cheating)

As soon as this probability is computed, as a function of s, m, and the size of V, we need to find a value for s which minimizes the cost of verification. As I am sitting in a coffee shop and do not have access to pen and paper, I cannot do that right now :).

• August 16, 2010 12:09 am

All bits in my comment should be replaced with bytes. Also, I assumed that the disk can store exactly N bytes.

August 19, 2010 2:50 am

For small disks here’s a strategy that guarantees that an arbitrarily large percentage p < 1 of the disk has been deleted:

Alice disconnects the old laptop from any sort of network connection. Then Alice gives the deleter D a hard chess problem and asks it to decide if it leads to a win for white. From Scott Aaronson's post, we know that checking D's answer for correctness is easy. Repeat this protocol many times. We know that unless PSPACE = some lower complexity class, D must use all the storage it has access to in order to answer this question correctly.

😀

August 20, 2010 6:03 am

I may be a bit simple, but aren’t you overthinking this problem? It seems to me to yield easily to a brute force solution (no, not a hammer). All you need to do is:

1. take a few random photos and copy them to the hard drive.
2. select them all and make copies
3. repeat
4. repeat

Very shortly the hard drive is filled up with more or less random images, and thus random 0s and 1s.

5. delete them all and do it again.
6. repeat as often as you feel like.

I absolutely guarantee that no one using this method will ever have any data retrieved from their hard drive. It takes no technical skill or any great amount of effort or time.
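The same effect, minus the photos, is writing pseudo-random data until the drive is full. A minimal Python sketch under that interpretation (the `max_bytes` cap is my addition so a demo can stop early; note that this only overwrites free space, not file slack, remapped sectors, or space the filesystem reserves for itself):

```python
import os

def fill_free_space(directory, chunk_size=1 << 20, max_bytes=None):
    # Write pseudo-random data into a scratch file until the disk is
    # full (or max_bytes is reached), then delete the file. This is a
    # software analogue of "copy photos until the drive fills up".
    scratch = os.path.join(directory, "filler.bin")
    written = 0
    try:
        with open(scratch, "wb") as f:
            while max_bytes is None or written < max_bytes:
                try:
                    f.write(os.urandom(chunk_size))
                    f.flush()
                    written += chunk_size
                except OSError:  # ENOSPC: the disk is full
                    break
    finally:
        os.remove(scratch)
    return written
```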

August 24, 2010 1:49 pm

So it seems like there’s something potentially very profound here. I’ve been reading Chaitin recently, which gives me this thought: it isn’t sufficient to generate N “random” bits. The bits need to be truly random, in the sense of algorithmically random/maximal entropy. If they aren’t, then program D can receive the bits, compress them, and use the excess space to preserve at least parts of the data on the old laptop. (This doesn’t require breaking the pseudo-random generator.)

This opens up another problem: you’re dealing with a finite string, so algorithmic randomness is only well defined up to an additive constant, unlike for an infinite string. So, does this mean that you can’t (ever) guarantee a string random enough to completely fill the old laptop? It seems that you could practically reach a good-enough limit, but I’m not sure what the trade-off is on space left on the old laptop’s hard drive.

Now think of this: the new laptop is communicating over a channel with the old laptop, sending this maximally entropic random string. How do you distinguish between an endogenous channel error (which includes, say, thermal bit flipping in the storage medium, or channel noise) and a deliberate misappropriation by D?

It seems like this is wandering into information theory…

Here’s a thought on how this might work. Assume T has a complete image of the old laptop. T sends an algorithmically random string of length N = size of the data store (including RAM available for storage); D performs some composite operation, e.g. adding the random string bits to the data store bits modulo 2, and sends the result back to T. T checks the result against its complete image of the old laptop, and assuming everything comes back right, T deletes the mirror of the image.

Here you have to assume that T is trusted to delete its mirror of the image, though. But you also run into the problem of D potentially compressing the old image and using the freed space to save parts of it. So, say you try to solve this: before giving the laptop away, you could compress the old image to M bits, fill the remaining N − M bits with an algorithmically random string, and try again. But if D has a better compression algorithm than you, it can still compress and save parts of the image…
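Setting the compression loophole aside, the challenge–response step itself is easy to sketch. In the toy Python below the names `t_challenges` and `honest_d` are mine; the check only shows that D can produce store XOR r, not that the old plaintext is actually gone:

```python
import os

def xor_bytes(a, b):
    # Bytewise XOR of two equal-length byte strings.
    return bytes(x ^ y for x, y in zip(a, b))

def t_challenges(trusted_image, respond):
    # T's side: send a fresh random string r as long as the data store,
    # expect back (store XOR r), and compare against the mirror image.
    r = os.urandom(len(trusted_image))
    return respond(r) == xor_bytes(trusted_image, r)

def honest_d(store):
    # An honest D applies the composite operation to its real contents.
    return lambda r: xor_bytes(store, r)
```

A D whose store no longer matches T's image fails the check (except with vanishing probability), but, as argued above, a D that squirreled away a compressed copy of the image elsewhere still passes.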

August 25, 2010 8:51 am

I suppose you are really only interested in the mathematics, but let me address the practical questions anyway.

1. Don’t give Bob the hard disk, he can buy a new one.

2a. If you must give Bob your old disk, and you’ve used the disk in a plain and ordinary fashion (no RAID or other logical-device or filesystem magic), then check whether the disk is working fine and has no marked bad sectors (the disk will tell you); failing that, goto (1).

2b. Just writing zeros everywhere on your healthy, raw disk (e.g. with “dd if=/dev/zero of=/dev/sda”) will do the trick. You will have to trust the dd program, or write your own (it should be simple), and you will also have to trust the kernel. If you don’t, you’re semi-stuck: you could boot DOS and try to write your own hard-disk controller driver. You’ll still have to trust your bus logic and the hard disk’s firmware. (Are you seriously THAT paranoid?)
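Writing your own dd, at least the zero-fill part, really is simple, and is one way to avoid trusting the dd binary (you still trust the interpreter, kernel, bus, and firmware). A sketch in Python; the function name and file handling are mine:

```python
import os

def zero_fill(path, chunk_size=1 << 20):
    # Overwrite every byte of an existing file (or of a raw block
    # device, when run with sufficient privileges) with zeros, in
    # place: roughly what "dd if=/dev/zero of=/dev/sda" does.
    zeros = bytes(chunk_size)
    with open(path, "r+b") as f:
        f.seek(0, os.SEEK_END)   # find the total size
        size = f.tell()
        f.seek(0)
        remaining = size
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())     # push the zeros past the page cache
    return size
```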

3. I suppose you realise that once you have overwritten (even once, non-randomly) every bit of your disk, there is no way, using the ordinary in-band hardware, to access the old information. Every recovery process henceforth requires terribly advanced high tech: a clean room, magnetic force microscopes, and so on. According to Wikipedia (e.g. http://en.wikipedia.org/wiki/Data_recovery, and references therein), there is “no practical evidence that overwritten data can be recovered”. Elsewhere (I lost the source) it also said that no commercial data recoverer (think carefully about this: no matter how much money you throw at them!) claims to be able to recover files which have been completely overwritten.

Perhaps I am misunderstanding you, though, and you were really asking about _remote_ solutions that guarantee the deletion of data on physical media to which you have no access. That is an entirely different league, and indeed a very interesting problem.