Moving to Freedom, .Org(on)

bash shell script: copy only files modified after specified date

Here’s something I came up with to emulate a feature of a tool I had in Windows that I couldn’t figure out how to do in GNU. The xcopy DOS command lets you recursively copy files modified after a certain date by using the /D:date option.

Doesn’t tar take care of this with the --after-date option?

Yes, but I haven’t been able to figure out how to get it to exclude directories that are empty of files after the date. The only mention of “prune” in nearly one megabyte of documentation is in reference to the fruit.

If I have a hierarchy with hundreds of folders and only a handful of files have changed, I might not want to store all of the empty directories in my tar file. Just for general tidiness purposes, if nothing else. Imagine untarring all those empty dirs. Gauche. It’s not that big of a deal, but I became motivated by the challenge of figuring out a way to do this, and kept scratching this (minor) itch.

Well, why didn’t I just use [some easy/obvious method]?

Now you tell me! I’d be very interested to learn about built-in ways to accomplish what I’m trying to do, whether it’s copying files to a target directory or directly into an archive file. I might feel silly if it’s really obvious or something I should have found in the documentation or my searches, but I wouldn’t regret the time I’ve spent cobbling together these scripts. I learned a lot about the tools and about bash scripting.

It’s often best for me to struggle with things to get the lessons embedded in my pulpy grey matter. In previous entries I’ve railed against the free software migration and learning process because my time is in such short supply, but I’m getting past that. It will take as long as it takes. (Plus time is never short. We waste it away every day.)

I’m enjoying the learning. I’ve always known the Unix command line is a powerful tool, and learning how to make it go is fun for me. You can find a lot of information about shell scripting online, of course, but I also recommend the O’Reilly book Learning the bash Shell. It’s a good introduction and reference.

getting to it

We’ll use the find, mkdir, and cp commands as our building blocks.

find -daystart -mtime n

Find’s -mtime n test matches files last modified n*24 hours ago, counted in whole 24-hour periods (and -n matches anything more recent than that). You can read the man page to see how the -mtime option works, but how about if we try some examples? I do much better with examples.

Let’s say it’s 3pm today and you have three files:

  1. file_today_2pm
  2. file_yesterday_4pm
  3. file_yesterday_2pm

find . -type f -mtime 0 returns files #1 and #2.

find . -type f -mtime 1 returns file #3.

This is because -mtime causes find to look at things in discrete 24-hour time periods, starting from right now. At 3pm, files #1 and #2 are in the first time period (counting backwards), and #3 is in the next.

To see all files modified in the first two time periods (counting backwards!), use find . -type f -mtime -2, which returns files 1-3. (-mtime -1 would return only #1 and #2, the same as -mtime 0.)
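If you’d like to try the three-file example without waiting for 3pm, here’s a rough sketch that fakes the timestamps with GNU touch -d. The scratch directory and the variable names (demo_dir, period_0, and so on) are made up for this illustration, and the ages of 1, 23, and 25 hours stand in for the clock times above.

```shell
# Scratch directory with three files whose ages mimic the example at 3pm.
demo_dir=$(mktemp -d)
touch -d "1 hour ago"   "$demo_dir/file_today_2pm"
touch -d "23 hours ago" "$demo_dir/file_yesterday_4pm"
touch -d "25 hours ago" "$demo_dir/file_yesterday_2pm"

# 0 = modified less than 24 hours ago.
period_0=$(cd "$demo_dir" && find . -type f -mtime 0 | sort)
# 1 = modified between 24 and 48 hours ago.
period_1=$(cd "$demo_dir" && find . -type f -mtime 1 | sort)
# -2 = fewer than 2 whole days old, i.e. the first two periods together.
both_periods=$(cd "$demo_dir" && find . -type f -mtime -2 | sort)

printf '%s\n' "$period_0" "--" "$period_1" "--" "$both_periods"
```

The first group should list the 2pm-today and 4pm-yesterday files, the second only the 2pm-yesterday file, and the third all three.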

To make things cleaner, the -daystart option makes the time periods start at midnight:

find . -type f -daystart -mtime 0 returns file #1.

find . -type f -daystart -mtime 1 returns files #2 and #3.

The cumulative “minus” modifier behaves differently than I’d expect, since -n means “fewer than n days ago” rather than “n days ago or fewer”:

find . -type f -daystart -mtime -1 only gives you file #1.

find . -type f -daystart -mtime -2 returns all three files.
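The -daystart results can be checked the same way. This is only a sketch: it assumes GNU touch and find, the file names are invented, and I pin the time zone to UTC so that a daylight-saving change can’t shift the midnight boundary under the example.

```shell
# Pin the time zone so the day boundary is the same for touch and find.
export TZ=UTC

day_dir=$(mktemp -d)
touch -d "today 00:00:30"  "$day_dir/early_today"
touch -d "yesterday 23:30" "$day_dir/late_yesterday"
touch -d "yesterday 00:30" "$day_dir/early_yesterday"

# With -daystart, day 0 is "since midnight" and day 1 is all of yesterday.
today_only=$(cd "$day_dir" && find . -type f -daystart -mtime 0 | sort)
yesterday_only=$(cd "$day_dir" && find . -type f -daystart -mtime 1 | sort)
# -1 means "fewer than 1 day", which is just today; -2 adds yesterday.
less_than_1=$(cd "$day_dir" && find . -type f -daystart -mtime -1 | sort)
less_than_2=$(cd "$day_dir" && find . -type f -daystart -mtime -2 | sort)

printf '%s\n' "$today_only" "--" "$yesterday_only" "--" "$less_than_1" "--" "$less_than_2"
```

So to reach back to a given calendar day, the number after the minus sign has to be one larger than the number of days you want to go back.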

Once we find the files we care about, we’ll want to copy them. For this, we can call on the -exec option. I’ve used -exec for years with grep in HP-UX (their grep doesn’t have a recursive option), and now I had the chance to learn some more about it. -exec allows you to call another program to operate on each of the found files. Let’s look at the grep example first, since it’s simple:

find . -type f -exec grep some_string {} \;

This will take each file and tell grep to look for some_string in that file. {} is an argument for grep — it’s the “found” filename. The semi-colon terminates the grep command. (The man page for find says that both {} and ; may need to be escaped with a \backslash or quoted to protect them from expansion by the shell. In my experience in HP-UX and GNU/Linux, it’s just the semi-colon that needs the backslash.)
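Here’s a small sketch of the same idea that you can run in a throwaway directory. The file names are made up, I’ve added grep’s -l option so only matching file names are printed, and I quote the braces just to be safe. The second find ends the command with + instead of \; which, as I understand it, passes many file names to one grep call instead of starting grep once per file.

```shell
grep_dir=$(mktemp -d)
printf 'some_string here\n' > "$grep_dir/match.txt"
printf 'nothing to see\n'   > "$grep_dir/other.txt"

# One grep process per found file; '{}' is replaced by the file name.
per_file=$(cd "$grep_dir" && find . -type f -exec grep -l some_string '{}' \;)

# Ending with + batches the file names into as few grep calls as possible.
batched=$(cd "$grep_dir" && find . -type f -exec grep -l some_string '{}' +)

echo "$per_file"
echo "$batched"
```

Both commands should print ./match.txt and nothing else.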

mkdir and cp

We can use find -mtime to find our files and then call another command to copy them to the target hierarchy. Since cp (as far as I can tell) doesn’t allow you to force creation of parent directories if needed, we’ll need a second script to be called via -exec. It will simply make the target directory if necessary and then copy the file over.
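For what it’s worth, GNU cp does have a --parents option that rebuilds the source path under the target directory. It’s a GNU extension, so I wouldn’t count on it in HP-UX or other non-GNU systems. A minimal sketch, with scratch directories whose names (src_root, dst_root) are invented for this example:

```shell
src_root=$(mktemp -d)
dst_root=$(mktemp -d)
mkdir -p "$src_root/jupiter/moon"
echo "volcanoes" > "$src_root/jupiter/moon/io.txt"

# Run from inside the source tree so the relative path is what gets rebuilt.
# --parents recreates jupiter/moon/ under $dst_root before copying the file.
(cd "$src_root" && cp --parents -p ./jupiter/moon/io.txt "$dst_root")

ls "$dst_root/jupiter/moon"
```

The helper script described here does the same job by hand with mkdir -p and cp, which should work just about anywhere.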

logical operation

Let’s say we have:

~/test/
       file1.txt
       file2.txt
       apple/
             green.txt
             red.txt
       jupiter/
               moon/
                    io.txt
                    europa.txt
               red_spot/
                        storm.txt
~/test_target/

And we want to copy files modified in the past couple of days from “test” to “test_target”. Maybe that includes ~/test/apple/red.txt and ~/test/jupiter/moon/europa.txt. cpafter.sh will descend into ~/test/ so that:

find . -type f -daystart -mtime -2 will give these results:

./apple/red.txt
./jupiter/moon/europa.txt

We’ll pass these via -exec to copy_it.sh along with our target_dir with the understanding that it will build those paths (if necessary) under ~/test_target/ and then copy the files.

let’s look at the scripts (finally)

cpafter.sh

Let’s take a look at selected parts of the script and its operation. Here’s the usage/help information for cpafter.sh:

Copies files modified on or after the given date from source dir
to target dir, creating any subdirectories as needed.

    usage: cpafter.sh [-vf] -a after_date_YYYYMMDD -s source_dir -t target_dir
        -v verbose
        -f force target dir creation or copying to non-empty target dir

To manage the parameters, I learned how to use the nice getopts built-in command. Unfortunately, getopts doesn’t handle long option names, but I can live with that. (There is also the older external command “getopt” that can be made to work with long option names.)

while getopts ":vfa:s:t:" opt
do
    case $opt in
        v  ) verbose=" -v " ;;
        f  ) force="true" ;;
        a  ) after_date=$OPTARG ;;
        s  ) from_dir=$OPTARG ;;
        t  ) to_dir=$OPTARG ;;
        # unknown option, or (because of the leading ":") a missing argument
        \? | : ) echo -e "$usage"
             exit 1 ;;
    esac
done
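To poke at getopts without running the whole script, the loop can be wrapped in a function. This is just a sketch: parse_demo is a hypothetical name, and the messages are mine rather than the real usage text. The leading colon in the option string puts getopts in “silent” mode, where a missing argument shows up as the : case and an unknown option as the ? case, with the offending letter in $OPTARG.

```shell
# Hypothetical helper: parses cpafter.sh-style options and prints what it found.
parse_demo() {
    OPTIND=1    # reset so the function can be called more than once
    verbose="" force="" after_date="" from_dir="" to_dir=""
    while getopts ":vfa:s:t:" opt; do
        case $opt in
            v ) verbose="-v" ;;
            f ) force="true" ;;
            a ) after_date=$OPTARG ;;
            s ) from_dir=$OPTARG ;;
            t ) to_dir=$OPTARG ;;
            : ) echo "option -$OPTARG needs an argument"; return 1 ;;
            \? ) echo "unknown option -$OPTARG"; return 1 ;;
        esac
    done
    echo "after=$after_date from=$from_dir to=$to_dir verbose=$verbose"
}

parse_demo -v -a 20070414 -s test -t test_target
parse_demo -a || true    # missing argument
parse_demo -x || true    # unknown option
```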

My bash book goes into these features, and I also found a lot of web pages explaining them. Try searching for [bash getopt getopts].

To me it seems more natural and less ambiguous to pass in a date as an argument rather than the number of days, so let’s convert our YYYYMMDD formatted date into a number for -mtime:

after_date_epochal=$(date -d "$after_date" +%s)

today=$(date +%Y%m%d)
today_epochal=$(date -d "$today" +%s)

date_dif=$(( (($after_date_epochal - $today_epochal) / 60 / 60 / 24) - 1))

$after_date_epochal is the number of seconds since the epoch (1 Jan 1970) for our “after date.” Then we get midnight of the current day as seconds also, and do some math to find the difference for our -mtime number, $date_dif. That extra “- 1” on the end makes up for the unexpected (to me) extra one that we had to subtract in our example above.
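To check the arithmetic with fixed dates, here’s a sketch that takes “today” as a second argument instead of reading the clock. The function name mtime_arg is made up, and unlike the script it passes -u to GNU date so both dates are treated as UTC midnights. I did that because the gap between two local midnights isn’t a whole number of days across a daylight-saving change, and integer division could then land a day off.

```shell
# Hypothetical helper: mtime_arg AFTER_DATE TODAY, both as YYYYMMDD.
# -u makes date read both as UTC midnights, so the gap is always whole days.
mtime_arg() {
    after_epoch=$(date -u -d "$1" +%s)
    today_epoch=$(date -u -d "$2" +%s)
    echo $(( (after_epoch - today_epoch) / 86400 - 1 ))
}

mtime_arg 20070414 20070416    # prints -3
mtime_arg 20070416 20070416    # prints -1
```

With today as 16 April and an after date of 14 April, the difference is -2 days, and the extra - 1 makes it -3, so -daystart -mtime -3 covers the 16th, 15th, and 14th.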

Now for some funny business with our directories:

orig_dir=$PWD

cd "$to_dir" || exit 1
to_dir=$PWD
cd "$orig_dir" || exit 1

cd "$from_dir" || exit 1

I wanted to make certain assumptions in copy_it.sh about the target dir which wouldn’t work if a path with no slashes was used for the target directory, and the above seemed like an easy (if not elegant) way to do it. Then we descend into the “from dir,” which works whether it is relative or absolute because we first return to the original directory that the script started in.
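A slightly tidier way to get the same absolute path is to do the cd inside a subshell, so the script’s own working directory never moves. A sketch, where abs_dir is a hypothetical helper and the scratch directory is only there to have something to resolve:

```shell
# Hypothetical helper: print the absolute form of a directory path.
# The parentheses run cd in a subshell, so the caller stays where it is.
abs_dir() {
    (cd -- "$1" 2>/dev/null && pwd) || { echo "no such directory: $1" >&2; return 1; }
}

work=$(mktemp -d)
mkdir "$work/test_target"
cd "$work" || exit 1

abs_dir test_target    # a relative name with no slashes
abs_dir /              # already absolute
```

In the script this would become something like to_dir=$(abs_dir "$to_dir"), with no need to remember and return to the original directory.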

Now! Let’s run our big find command:

find . -type f -daystart -mtime "$date_dif" -exec copy_it.sh $verbose -s {} -t "$to_dir" \;
find . -type l -daystart -mtime "$date_dif" -exec copy_it.sh $verbose -s {} -t "$to_dir" \;

I run it twice to look for -type f (regular files) and -type l (symbolic links). I think those are the only things I’d want to use this for. I don’t know of a way to search for both at the same time.
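As it turns out, find can combine tests with -o (logical OR) inside escaped parentheses, so one pass can pick up both regular files and symbolic links. A sketch in a scratch directory with made-up names:

```shell
link_dir=$(mktemp -d)
mkdir "$link_dir/apple"
echo "crisp" > "$link_dir/apple/green.txt"
ln -s green.txt "$link_dir/apple/favorite"

# \( ... -o ... \) groups the two tests; the backslashes stop the shell from
# interpreting the parentheses itself. The directory ./apple is not matched.
found=$(cd "$link_dir" && find . \( -type f -o -type l \) | sort)
echo "$found"
```

The parentheses matter once other tests follow, because find’s implicit “and” binds tighter than -o. The combined command would look something like find . \( -type f -o -type l \) -daystart -mtime "$date_dif" -exec ... \;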

copy_it.sh

Here’s the usage/help information for copy_it.sh:

Copies a file to a dir, using the path information from the file to
build a path from the given dir root if necessary. Meant to be used
with cpafter.sh.

    usage: copy_it.sh [-v] -s source_file -t target_directory_root
        -v verbose (just lists the source file)

And here is the meat of the script:

#regex -- does string start with dot slash?
if [[ ! "$from_file" =~ ^\./ ]]; then
    from_file="./$from_file"    #in case only a filename was given
fi

#return $from_file up to (but not including) last slash
add_to_target=${from_file%/*}

if [[ ! -d "$to_dir/$add_to_target" ]]; then
    mkdir -p "$to_dir/$add_to_target"
fi
cp -pdf "$curdir/$from_file" "$to_dir/$from_file"

mkdir -p causes any necessary parent directories to be created. cp -pdf preserves permissions, etc., doesn’t follow symbolic links, and forces a copy if the destination file cannot be opened (removes it and tries again).
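The ${from_file%/*} bit is shell parameter expansion, and it’s easy to try on its own. In this sketch dir_part and base_part are names I picked for the example:

```shell
from_file="./jupiter/moon/europa.txt"

dir_part=${from_file%/*}      # remove the shortest match of "/*" from the end
base_part=${from_file##*/}    # remove the longest match of "*/" from the start

echo "$dir_part"     # prints ./jupiter/moon
echo "$base_part"    # prints europa.txt
```

If the string has no slash at all, the % form leaves it unchanged, which is why the script first makes sure the name starts with ./ before taking it apart.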

run it

Using our earlier dir structure, let’s say we tried from our home dir:

cpafter.sh -v -a 20070414 -s test -t test_target

The verbose output would be:

copying from /home/username/test
        to /home/username/test_target

(mkdir) ./apple/
./apple/red.txt
(mkdir) ./jupiter/moon/
./jupiter/moon/europa.txt

And you would have:

~/test_target/apple/red.txt
~/test_target/jupiter/moon/europa.txt

Download

cpafter.tar.gz (14 KB)

contains