You need to transfer a lot of files across a slightly temperamental ssh connection. You want something like a recursive scp command that supports resuming and will keep on trying until it gets the job done.
rsync is ideal for this purpose – however, I find it quite dodgy under cygwin, especially when transferring large files.
A sweet alternative is Unison, for synchronizing filesets over ssh.
However, I often find myself falling back on a nice script called scp-resume.sh designed for resuming the transfer of large files using dd over ssh. We can invoke this script inside a loop to transfer lots of files at a time.
One problem with the script is the use of the construct below to determine file sizes:
localsize=`ls -l "${localfile}" | awk '{ print $5 }'`
This will fail if there are spaces in the username of the file owner. Most likely you’ll get:
Resuming download of [file] at byte None
...
dd: invalid number `None'
The space in the owner name shifts every field along by one, so awk’s $5 lands on the group owning the file, which cygwin reports as ‘None’, rather than on the size. The fix is to replace every instance of this ls -l construct with something like localsize=`ls -g "${localfile}" | awk '{ print $4 }'`. The -g option displays the file size but not the owner name, so you should be safe from spaces confusing awk. I don’t know if the -g option is POSIX, but it’s in GNU ls anyway.
You might be tempted to use ls -s, but this reports the amount of disk space used, rather than the actual length of the file (i.e. it will be a multiple of the allocation block size). You can see the difference using ls -ls:
Hugh Denman@gpplap3 ~
$ cat > asd.txt
fre
hschui
huernui
Hugh Denman@gpplap3 ~
$ ls -ls --block-size=1 ./asd.txt
1024 -rw-r--r-- 1 Hugh Denman None 19 Mar 28 17:59 ./asd.txt
Here my 19-byte text file is taking up 1024 bytes of disk space.
Two other possibilities, suggested by Erik Jan Taal, are perl -e "print -s '$filename'" and ls -l | sed -n 's/.* [^0-9]*\([0-9]\+\) .*/\1/ p'. These will work on FreeBSD, for example, which does not support ls -g.
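Another option, not used in the original script, is to count the bytes with wc -c, which never prints an owner or group column at all. The sketch below is only an illustration: file_size is a made-up helper name, and on some systems wc may have to read the whole file, which would be slow for very large files.

```shell
#!/bin/sh
# file_size is a hypothetical helper, not part of scp-resume.
# It prints the length in bytes of the file named in $1, or 0 if there is no such file.
file_size() {
    if [ -f "$1" ]; then
        # wc -c reads the file on stdin, so no owner or group name is printed.
        # The arithmetic expansion strips the padding some wc versions add.
        echo $(( $(wc -c < "$1") ))
    else
        echo 0
    fi
}

# Recreate the 19-byte test file from the ls -ls example.
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'fre\nhschui\nhuernui\n' > "$tmp"

file_size "$tmp"    # prints 19
```

Inside scp-resume this would stand in for the ls -g line, e.g. localsize=`file_size "$localfile"`, assuming wc is available on both hosts.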
To use the scp-resume script, we’ll need a text file containing the filenames to transfer from the remote machine. Here’s one way to generate this list.
$ ssh remote-user@remote.machine.ip.addr "/bin/find /cygdrive/d -type f" | grep -vi i386 > ./filelist.txt
In this example, the remote drive contains the OS installation files in /cygdrive/d/I386, which we don’t want to transfer.
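One thing to watch with grep -vi i386 is that it drops every path containing that string, not just the contents of the I386 directory. A possible alternative, sketched here on a throwaway local tree rather than over ssh, is to let find skip the directory itself with -prune. The directory and file names below are invented for the demonstration, and the -path match is case-sensitive.

```shell
#!/bin/sh
# Build a small throwaway tree that mimics the layout described above.
root=$(mktemp -d)
list=$(mktemp)
trap 'rm -rf "$root" "$list"' EXIT
mkdir -p "$root/I386" "$root/docs"
touch "$root/I386/setup.exe" "$root/docs/readme.txt" "$root/docs/i386 notes.txt"

# -prune stops find from descending into the I386 directory;
# every other regular file is printed, one path per line.
find "$root" -path "$root/I386" -prune -o -type f -print > "$list"

cat "$list"
```

The list keeps docs/i386 notes.txt, which the grep filter would have thrown away, while everything under I386 is left out.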
With a fixed scp-resume script, and the list of files to transfer present, all that’s left to do is iterate over each file in the list and tell scp-resume to download it. We use the cat filelist.txt | while read FILE approach because it will preserve spaces in the filename (unlike for file in `cat filelist.txt`).
cat filelist.txt | while read FILE ; do
DIR=`dirname "$FILE"`;
mkdir -p "./$DIR" ;
./scp-resume.sh -d "remote-user@remote.machine.ip.addr:$FILE" "./$FILE" ;
done
This very nearly works. The only trouble is that it transfers the first file in the list and then inexplicably stops without an error! This is a difficulty that arises whenever you use the cat [file] | while read VAR idea with a command inside the while loop that reads standard input: the ssh started by scp-resume inherits the loop’s STDIN, which is the pipe from cat, and swallows the rest of the file list, so read finds nothing left on the next pass (I found that out in a Usenet post). So we have to modify scp-resume one last time, changing the download command so that ssh takes its input from /dev/null:
ssh -C -c arcfour "$userhost" "dd bs=1 skip=$localsize \"if=${remotefile}\"" >> $localfile < /dev/null
With this change, you can’t enter the ssh password manually, but you’d really need automatic authentication set up anyway, as you don’t want to enter your password for every file. A simple way to set up automatic authentication is described here.
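The stdin problem can be reproduced locally without any ssh connection. In the sketch below, greedy is a made-up stand-in for ssh: like ssh, it reads whatever is on its standard input. The list contents are invented for the demonstration.

```shell
#!/bin/sh
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf 'one\ntwo\nthree\n' > "$list"

# greedy stands in for ssh: it consumes everything on its standard input.
greedy() { cat > /dev/null; }

# Without a redirect, greedy swallows the remaining lines of the list,
# so the loop body runs only once.
broken=$(while IFS= read -r line; do echo "$line"; greedy; done < "$list" | wc -l | tr -d ' ')

# With stdin taken from /dev/null, the list is left alone for read.
fixed=$(while IFS= read -r line; do echo "$line"; greedy < /dev/null; done < "$list" | wc -l | tr -d ' ')

echo "without redirect: $broken line(s) processed"   # 1
echo "with redirect: $fixed line(s) processed"       # 3
```

For ssh specifically, the -n option has the same effect as the < /dev/null redirect.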
Lastly, you can wrap the whole command above in a for loop with a few iterations so that if the connection is dropped on a few transfers, the file can be resumed in a subsequent pass:
for i in `seq 0 100`; do
cat filelist.txt | while read FILE ; do
DIR=`dirname "$FILE"`; mkdir -p "./$DIR" ;
./scp-resume.sh -d "remote-user@remote.machine.ip.addr:$FILE" "./$FILE" ;
done; done
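If the command being repeated reports failure through its exit status, the fixed number of passes could be replaced by a loop that stops as soon as the command succeeds. The following is a rough sketch of that idea only: retry and flaky are hypothetical names, and the download path of scp-resume ends in a pipeline whose exit status comes from the local dd, so it may report success even when ssh dropped out.

```shell
#!/bin/sh
# retry is a hypothetical helper: run a command up to $1 times,
# stopping at the first attempt that exits with status 0.
retry() {
    attempts=$1
    shift
    n=1
    while [ "$n" -le "$attempts" ]; do
        "$@" && return 0
        echo "attempt $n of $attempts failed" >&2
        n=$((n + 1))
    done
    return 1
}

# flaky simulates a dropped connection: it fails twice, then succeeds.
counter=$(mktemp)
trap 'rm -f "$counter"' EXIT
echo 0 > "$counter"
flaky() {
    count=$(( $(cat "$counter") + 1 ))
    echo "$count" > "$counter"
    [ "$count" -ge 3 ]
}

retry 5 flaky && echo "succeeded after $(cat "$counter") attempts"   # 3 attempts
```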
This whole process is hideously inefficient for large numbers of files, alas. But it seems to get the job done. Here’s my edited version of scp-resume, using the redirect from /dev/null for ssh and using ls -g instead of ls -l to query the file size. Note that I’ve only tested the downloading functionality, never the uploading bits.
#!/bin/sh
#
# scp-resume - by erik jan taal
# http://ejtaal.net/scripts-showcase/#scp-resume
# Speed improvements by using blocks by nitro.tm@gmail.com
# Fixed by Hugh Denman to use ls -g (safe with usernames containing spaces)
# this version assumes that ssh is set up for automatic authentication rather than manual password entry
#
# This script assumes that you have access to the 'dd' utility
# on both the local and remote host.
# dd transfer blocksize (8192 by default)
blocksize=8192
usage()
{
echo
echo "Usage: `basename $0` -u(pload) \$localfile \$remotefile [\$sshargs]"
echo " `basename $0` -d(ownload) \$remotefile \$localfile [\$sshargs]"
echo
echo " \$remotefile should be in the scp format, i.e.: [user@]host:filename"
echo " \$sshargs are optional further ssh options such as a port specification"
echo " (-p 1234) or use of compression (-C)"
echo
echo " -u:"
echo " \$remotefile may be [user@]host: for uploading to your remote home directory"
echo " -d:"
echo " \$localfile may be a period (.) when downloading a remote file to the"
echo " current working directory."
echo
exit 1
}
[ -z "$1" -o -z "$2" -o -z "$3" ] && usage
option=$1
case $option in
-[uU]*)
localfile=$2
remote=$3
shift 3
sshargs="$*"
userhost=${remote%:*}
remotefile=${remote#*:}
if [ ! -f "$localfile" ]; then
echo "!! File not found: $localfile"
usage
fi
if [ x"$userhost" = x"$remote" ]; then usage; fi
if [ x"$remotefile" = x"$remote" -o -z "$remotefile" ]; then remotefile=`basename "$localfile"`; fi
echo "==>> Getting size of remote file:"
localsize=`ls -g "${localfile}" | awk '{ print $4 }'`
remotesize=`ssh $sshargs "$userhost" "[ -f \"${remotefile}\" ] && ls -g \"${remotefile}\"" < /dev/null | awk '{ print $4 }'`
[ -z "$remotesize" ] && remotesize=0
echo "=> Remote filesize: $remotesize bytes"
if [ $localsize -eq $remotesize ]; then
echo "=> Local size equals remote size, nothing to transfer."
exit 0;
fi
remainder=$((remotesize % blocksize))
restartpoint=$((remotesize - remainder))
blockstransferred=$((remotesize / blocksize))
echo "=> Resuming upload of '$localfile'"
echo " at byte: $restartpoint ($blockstransferred blocks x $blocksize bytes/block),"
echo " will overwrite the trailing $remainder bytes."
# ssh must read the piped data here, so its stdin is not redirected from /dev/null
dd bs=$blocksize skip=$blockstransferred "if=${localfile}" |
ssh $sshargs "$userhost" "dd bs=$blocksize seek=$blockstransferred of=\"$remotefile\""
echo "done."
;;
-[dD]*)
localfile=$3
remote=$2
shift 3
sshargs="$*"
userhost=${remote%:*}
remotefile=${remote#*:}
if [ x"$localfile" = x"." ]; then localfile=`basename "$remotefile"`; fi
if [ ! -f "$localfile" ]; then
localsize=0;
else
localsize=`ls -g "${localfile}" | awk '{ print $4 }'`
fi
[ x"$remotefile" = x"$remote" ] && usage
[ -z "$localsize" ] && localsize=0
remainder=$((localsize % blocksize))
restartpoint=$((localsize - remainder))
blockstransferred=$((localsize / blocksize))
echo "=> Resuming download of '$localfile'"
echo " at byte: $restartpoint ($blockstransferred blocks x $blocksize bytes/block)"
echo " filesize: $localsize; will overwrite the trailing $remainder bytes."
ssh $sshargs "$userhost" "dd bs=$blocksize skip=$blockstransferred \"if=${remotefile}\"" < /dev/null |
dd bs=$blocksize seek=$blockstransferred "of=$localfile"
;;
*)
usage
;;
esac
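The block arithmetic and the skip/seek pairing in the download branch can be tried out locally, with no ssh involved. This is a simulation under stated assumptions rather than a test of the script itself: both files live in a temporary directory, the file names and the sizes of 20000 and 10000 bytes are made up, and the local size is taken with wc -c instead of ls -g.

```shell
#!/bin/sh
blocksize=8192
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
src="$workdir/source.bin"     # stands in for the remote file
dst="$workdir/partial.bin"    # stands in for the interrupted local copy

# Build a 20000-byte source and a 10000-byte partial copy of it.
dd if=/dev/urandom of="$src" bs=1000 count=20 2>/dev/null
dd if="$src" of="$dst" bs=1000 count=10 2>/dev/null

# Same arithmetic as the script's download branch.
localsize=$(( $(wc -c < "$dst") ))
remainder=$((localsize % blocksize))           # 10000 % 8192 = 1808
restartpoint=$((localsize - remainder))        # 8192
blockstransferred=$((localsize / blocksize))   # 1

# Skip the complete blocks on the source side, seek past them on the
# destination side; the trailing partial block is written again.
dd bs=$blocksize skip=$blockstransferred "if=$src" 2>/dev/null |
    dd bs=$blocksize seek=$blockstransferred "of=$dst" 2>/dev/null

cmp "$src" "$dst" && echo "resumed at byte $restartpoint, files match"
```

Restarting on a block boundary is what lets dd use a large block size for both skip and seek, at the cost of re-sending up to one block's worth of data.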
Second real post exactly one year after the first! Prolific.