Useful code snippets for data recovery

I've recently had quite a few failing hard drives in for recovery, so I wrote some scripts to help with the job.

There are three of them. The first is a one-liner that parses a ddrescue logfile and shows how much data has already been recovered and how much is still bad/unrecoverable. The second also parses a ddrescue log and determines which files (on NTFS) are affected. The third keeps re-reading a single sector, so you might eventually get the data out of it; ddrescue doesn't provide an integrated routine to do this.
With these little helpers you can see which files are defective/damaged, and if something is really, really important, you can let the last script run in an endless loop to gather that data.

DDrescue Log Parsing

This isn't really a script, just a simple one-liner for ddrescue logs. It prints the statistics of a disk image (in fact, of its logfile), which gives you a quick overview of the image's status. It doesn't depend on ddrescue itself, but it does need GNU awk, since --non-decimal-data is a gawk option.
It's tested against version 1.11 of ddrescue.

Code

grep -E "^0x[0-9A-Fa-f]+ +0x[0-9A-Fa-f]+ +[+-]" log.txt | awk --non-decimal-data ' $3 == "+" { ok+=$2 } $3 == "-" { bad+=$2 } END { total=ok+bad ; printf "TOTAL: %d MB\nRecovered: %d MB\nBad: %d KB\n\nPercentage: %3.3f\n" ,total/1048576,ok/1048576,bad/1024,100/total*ok } '

Sample Output

|| user@workstation ~ ||$ grep -E "^0x[0-9A-Fa-f]+ +0x[0-9A-Fa-f]+ +[+-]" log.txt | awk --non-decimal-data ' $3 == "+" { ok+=$2 } $3 == "-" { bad+=$2 } END { total=ok+bad ; printf "TOTAL: %d MB\nRecovered: %d MB\nBad: %d KB\n\nPercentage: %3.3f\n" ,total/1048576,ok/1048576,bad/1024,100/total*ok } '
TOTAL: 152627 MB  
Recovered: 152626 MB  
Bad: 996 KB

Percentage: 99.999  

Notes

You might notice a slight difference between this output and ddrescue's own. That's because the one-liner computes with binary units (1 KB = 1024 B, 1 MB = 1048576 B), whereas ddrescue uses decimal units (1 kB = 1000 B).
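
To make the unit difference visible, here is a minimal sketch that sums the same log with plain shell arithmetic and prints the recovered amount in both conventions. It is not the one-liner above, just an alternative that doesn't need gawk; the function name ddrescue_summary is made up for this example, and it assumes data lines of the form "position size status".

```shell
#!/bin/bash
# Sketch: summarise a ddrescue logfile with plain shell arithmetic.
# ddrescue_summary is a hypothetical helper name, not part of ddrescue.
ddrescue_summary() {
  local pos size status rest ok=0 bad=0
  while read -r pos size status rest; do
    # Data lines look like "0x<pos> 0x<size> <status>"; skip everything else
    # (comments and the two-field current-position line).
    case $pos in 0x*) ;; *) continue ;; esac
    case $size in 0x*) ;; *) continue ;; esac
    case $status in
      +) ok=$(( ok + $size )) ;;    # 0x... is read as a hex constant
      -) bad=$(( bad + $size )) ;;
    esac
  done < "$1"
  echo "recovered_bytes=$ok bad_bytes=$bad"
  echo "recovered: $(( ok / 1048576 )) MB (1024-based), $(( ok / 1000000 )) MB (1000-based)"
}
```

Call it as 'ddrescue_summary log.txt'. The second output line shows why the two tools disagree: the same byte count gives a smaller number when divided by 1048576 than by 1000000.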

Find files occupying badblocks (NTFS)

This script parses a ddrescue logfile and prints a list of the files that occupy bad sectors. That makes it easy to identify which areas cause problems, and you can also modify the log manually (nice description is available here) so that ddrescue concentrates on a specific area.

Code

#!/bin/bash  

#########################################################  
# Author: Raphael Hoegger  
# Source: http://pfuender.net/?p=80  
# License: This file is licensed under the GPL v2.  
# Latest change: 2010.06.24 17:40:32 CEST  
# Version: 1.1  

#########################################################

FSoffset=32256 # this is equal to the value used in 'losetup' as the offset
DEVICE=/dev/loop1  
LOGFILE=log.txt ## the one from ddrescue  
OUTPUT=results.txt ## where you want your results stored

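# take the start position of every bad ('-') block; data lines are "pos size status"
for failingSector in $(awk ' $3 == "-" { print $1 } ' "$LOGFILE") ; do
  # byte position on the disk -> NTFS cluster number (4096-byte clusters)
  NTFSsector=$(( ($failingSector-$FSoffset)/4096 ))
  echo "Sector $NTFSsector:" >>"$OUTPUT"
  ntfscluster -f -c "$NTFSsector" "$DEVICE" 2>/dev/null >>"$OUTPUT"
done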
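
Note that the script looks up only the cluster at the start of each bad block. If a bad block is longer than one cluster, the following clusters are not checked. As a rough sketch (not part of the original script), the function below lists every cluster that a bad block touches. The name bad_clusters and its arguments are made up for this example, and it assumes you know the filesystem offset and cluster size.

```shell
#!/bin/bash
# Sketch: print every cluster number touched by a bad ('-') block.
# Usage: bad_clusters LOGFILE FS_OFFSET_BYTES CLUSTER_SIZE_BYTES
bad_clusters() {
  local log="$1" fs_offset="$2" cluster_size="$3"
  local pos size status rest first last
  while read -r pos size status rest; do
    case $pos in 0x*) ;; *) continue ;; esac
    [ "$status" = "-" ] || continue
    # Ignore blocks that end before the filesystem starts.
    [ $(( $pos + $size )) -gt "$fs_offset" ] || continue
    if [ $(( $pos )) -lt "$fs_offset" ]; then
      first=0
    else
      first=$(( ($pos - fs_offset) / cluster_size ))
    fi
    # Last byte of the bad block decides the last affected cluster.
    last=$(( ($pos + $size - 1 - fs_offset) / cluster_size ))
    while [ "$first" -le "$last" ]; do
      echo "$first"
      first=$(( first + 1 ))
    done
  done < "$log"
}
```

For example, 'bad_clusters log.txt 32256 4096' prints one cluster number per line, which you could feed to ntfscluster in the same way as above. Be careful with large bad areas, since every single cluster gets printed.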

Output

|| user@workstation ~ ||$ sudo losetup -o 32256 -r /dev/loop1 image.dd
|| user@workstation ~ ||$ ./find_damaged_files-ntfs.sh
...
Sector 1262075:
Searching for cluster 1262075
Inode 17166 /Windows/system32/$INDEX_ALLOCATION($I30)
Sector 1263743:
Searching for cluster 1263743
Inode 92016 /System Volume Information/_restore{B2482471-DF35-4094-86D3-41D285BA1DE9}/RP1060/$INDEX_ALLOCATION($I30)
Sector 1263744:
Searching for cluster 1263744
Inode 9872 /Dokumente und Einstellungen/USER/Lokale Einstellungen/Anwendungsdaten/Microsoft/Windows Live Contacts/$INDEX_ALLOCATION($I30)
Sector 1263771:
Searching for cluster 1263771
Inode 109395 /Dokumente und Einstellungen/USER/Lokale Einstellungen/Anwendungsdaten/Microsoft/Windows Live Contacts/{19fddb48-6b7f-40b7-b4d2-f40b59677fea}/DBStore/$INDEX_ALLOCATION($I30)
Sector 1278455:
Searching for cluster 1278455
Inode 5697 /Dokumente und Einstellungen/USER/ntuser.dat/$DATA
...

Re-read single sector until success

As you can see above, you can easily generate a list of the corrupted files. You can also see which part of each file is affected, such as $INDEX_ALLOCATION. To me, the last entry (ntuser.dat) looks like the most important one, since it contains the HKCU part of the registry. So we'll now adjust the command below with the right sector, redoing the previous calculation in reverse order:

  • As per the log above, our cluster is 1278455.
  • Convert it from clusters back to bytes: 1278455*4096=5236551680
  • Add back the original filesystem offset: 5236551680+32256=5236583936
  • dd's skip counts blocks of bs bytes, not bytes, so divide by the sector size: 5236583936/512=10227703

So below you can see the code with the right sector number:

Code

while [ $? -eq 1 ] ; do dd bs=512 skip=10227703 if=/dev/sda of=sector_10227703 count=1 ; done
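
The reverse calculation can also be sketched as a small helper, so you don't have to redo it by hand for every file. The function name cluster_to_sector is made up for this example, and the defaults (offset 32256, 4096-byte clusters, 512-byte sectors) are simply the values used in this post; adjust them for your own image.

```shell
#!/bin/bash
# Sketch: convert an NTFS cluster number into a 512-byte sector number
# on the whole disk, suitable for dd's skip= with bs=512.
# Usage: cluster_to_sector CLUSTER [FS_OFFSET] [CLUSTER_SIZE] [SECTOR_SIZE]
cluster_to_sector() {
  local cluster="$1" fs_offset="${2:-32256}" cluster_size="${3:-4096}" sector_size="${4:-512}"
  # clusters -> bytes inside the filesystem -> bytes on the disk -> sectors
  echo $(( (cluster * cluster_size + fs_offset) / sector_size ))
}

cluster_to_sector 1278455   # prints 10227703
```

Keep in mind that a 4096-byte cluster spans eight 512-byte sectors, so the result is only the first sector of the cluster. The exact failing position is the one recorded in the ddrescue log.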

Notes

The command above will not run on its own: the loop only starts if the previous command failed with exit status 1, so you need to run the dd part once beforehand without the loop (just 'dd bs=... '). This is by design and even has a reason: you can quickly verify that you're targeting the right sector, because if the read fails, you know you're messing with the right one. Then start the while loop and let it run, forever if need be; it can take quite a while to gather any data. Before anybody throws in 'spinrite' as a keyword: I've tried it, but it seemed really buggy/unresponsive. That's just my experience, and it might only be down to my buggy BIOS. If you have different experiences or know other tools, just let me know in the comments!
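
If you'd rather not rely on the exit status of a previous command, here is a variant sketched with 'until' and an optional attempt limit. It is a different construction from the loop above, and the function name reread_sector and its arguments are made up for this example.

```shell
#!/bin/bash
# Sketch: re-read one 512-byte sector until dd succeeds.
# Usage: reread_sector DEVICE SECTOR [MAX_ATTEMPTS] [OUTPUT_FILE]
# MAX_ATTEMPTS of 0 (the default) means: retry forever.
reread_sector() {
  local dev="$1" sector="$2" max="${3:-0}" out="${4:-sector_$2}" n=0
  until dd if="$dev" of="$out" bs=512 skip="$sector" count=1 2>/dev/null; do
    n=$(( n + 1 ))
    if [ "$max" -gt 0 ] && [ "$n" -ge "$max" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
  done
  echo "sector $sector read after $n failed attempts"
}
```

For example, 'reread_sector /dev/sda 10227703 1000' gives up after 1000 failed reads instead of looping forever. On a real block device you may also want to add iflag=direct (a GNU dd option) to the dd call, so that the read bypasses the kernel's cache; I've left it out of the sketch because not every filesystem supports it.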

Further resources

  • DDrescue manual - This is actually my favorite imager under Linux. If you follow the link, you can read about the algorithm it uses to obtain the data quickly and effectively. Definitely worth reading!
  • Forensicswiki - A nice wiki with a bunch of useful articles about forensics/data recovery.

That's it for the moment! As usual, questions can be asked in the comments, I'll answer them as time permits ;-)

Cheers,
Raphi