Copying Files With Links
Exercise 3.11 involves copying a tree of directories and files while
preserving hard and symbolic links. This article discusses what "preserving"
might mean and sketches how to do the copying. Multiple links to directories are
not discussed.
Hard Links
How to Preserve Multiple Links
Every file has at least one hard link, and some have more than one. There's
no distinction between them; that is, no concept of "main" and "secondary"
links. It's obvious how to handle files with only one hard link, so this section
is concerned only with multiply-linked-to files.
In a general-purpose copying utility, which is what Exercise 3.11 is about,
there are three ways to interpret what "preserving" the hard links might mean:
- Don't copy the file at all. Just re-create all the links to it in the
target tree. This is really unworkable in general, because the target might be
on another device, and hard links can't span devices.
- Make one copy of the file and re-create any other links to it that were in
the source tree as links to the copy in the target tree.
- Don't pay any attention to the link count, and just make multiple copies
of the file in the target tree. This is in effect what happens in typical
solutions to Exercises 3.9 and 3.10.
As mentioned, the first interpretation is no good, and the third wastes space
and severs the linking arrangement, so the second is the best.
Copying Algorithm
Two members of the stat structure, st_dev and st_ino, uniquely identify an
i-node on a mounted device. So, given several links, these two members can be
used to tell whether they are linking to the same source i-node.
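As an illustration, a small helper along these lines could compare two paths by their st_dev/st_ino pair. The name same_inode and its return convention are hypothetical. Using lstat rather than stat is also an assumption, chosen so that a symbolic link is compared as itself rather than as whatever it points to.

```c
#define _XOPEN_SOURCE 700   /* expose lstat and other POSIX declarations */
#include <sys/stat.h>

/* Hypothetical helper: return 1 if paths a and b name the same i-node,
   0 if they don't, and -1 if either lstat call fails. lstat is used so
   that a symbolic link is compared as itself, not as its target. */
int same_inode(const char *a, const char *b)
{
    struct stat sa, sb;

    if (lstat(a, &sa) == -1 || lstat(b, &sb) == -1)
        return -1;
    /* st_ino is unique only within one device, so compare both members. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For example, same_inode("/", "/.") should return 1, since both names reach the root directory's i-node, and two hard links to one file would likewise compare equal.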
While traversing the source tree, it's necessary to keep track (via a linear
list, hash table, etc.) of each source i-node (st_dev/st_ino pair) that
corresponds to a file with multiple links and to associate that i-node with the
path name in the target tree. That path name is the path that results from the
first encounter with the source i-node, which is when the file is actually
copied. When another link to a source i-node that has already been copied is
encountered, a link is created in the target tree to the first copy of the file
in the target tree. This way, there will be exactly one copy in the target tree
of every file in the source tree.
Although only files with more than one link need to be tracked to handle hard
links, also tracking files with one link will help with symbolic links, which
are discussed next.
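The tracking scheme above might be sketched as follows, using a linear list as the table and recording every copied regular file (not just multiply-linked ones) so the same table can later serve the symbolic-link step. The names inode_entry, lookup_copy, remember_copy, copy_contents, and copy_or_link are all hypothetical. The sketch assumes regular files only, a destination path that doesn't exist yet, and recorded target paths that stay valid (absolute, or relative to a working directory that never changes).

```c
#define _XOPEN_SOURCE 700   /* expose strdup, lstat, link */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* One entry per source i-node already copied (hypothetical layout). */
struct inode_entry {
    dev_t dev;
    ino_t ino;
    char *target_path;          /* path of the first copy in the target tree */
    struct inode_entry *next;
};

static struct inode_entry *inode_list = NULL;   /* simple linear list */

static const char *lookup_copy(dev_t dev, ino_t ino)
{
    for (struct inode_entry *e = inode_list; e != NULL; e = e->next)
        if (e->dev == dev && e->ino == ino)
            return e->target_path;
    return NULL;
}

static int remember_copy(dev_t dev, ino_t ino, const char *target_path)
{
    struct inode_entry *e = malloc(sizeof *e);

    if (e == NULL || (e->target_path = strdup(target_path)) == NULL) {
        free(e);
        return -1;
    }
    e->dev = dev;
    e->ino = ino;
    e->next = inode_list;
    inode_list = e;
    return 0;
}

/* Copy the bytes of src into a new file dst; a short write is an error. */
static int copy_contents(const char *src, const char *dst, mode_t mode)
{
    char buf[8192];
    ssize_t n;
    int in, out, rc = 0;

    if ((in = open(src, O_RDONLY)) == -1)
        return -1;
    if ((out = open(dst, O_WRONLY | O_CREAT | O_EXCL, mode)) == -1) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0)
        if (write(out, buf, n) != n) {
            rc = -1;
            break;
        }
    if (n == -1)
        rc = -1;
    if (close(out) == -1)
        rc = -1;
    close(in);
    return rc;
}

/* Copy one regular file, or hard-link dst to an earlier copy of the
   same source i-node. Returns 0 on success, -1 on error. */
int copy_or_link(const char *src, const char *dst)
{
    struct stat sb;
    const char *earlier;

    if (lstat(src, &sb) == -1)
        return -1;
    if ((earlier = lookup_copy(sb.st_dev, sb.st_ino)) != NULL)
        return link(earlier, dst);                  /* later link: no new copy */
    if (copy_contents(src, dst, sb.st_mode & 07777) == -1)
        return -1;
    return remember_copy(sb.st_dev, sb.st_ino, dst); /* first encounter */
}
```

A tree walker would call copy_or_link for each regular file it meets: the first link to a source i-node produces a real copy, and every later one becomes a call to link naming that copy. For large trees, a hash table keyed on the st_dev/st_ino pair would replace the linear list.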
Symbolic Links
Two Copying Situations
In copying a source tree, there are two kinds of symbolic links that might be
encountered:
- Internal: A symbolic link that references something within
the source tree. Whatever that something is, it will be treated separately, so
the symbolic link should be re-created in the target tree with its contents (a
path in the source tree) appropriately transformed to a path in the target
tree.
- External: A symbolic link that references something outside
the source tree. The link should simply be re-created in the target tree with
the same contents.
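The "appropriately transformed" step for an internal link can be illustrated as plain string rewriting: if the link's contents begin with the source root, swap that prefix for the target root. This is a minimal sketch under strong assumptions (absolute, already-normalized paths with no "..", "." or repeated slashes, and a source root with no trailing slash), and the function name map_to_target is hypothetical. It shows only the rewriting; the test for whether a link is internal at all, as described under "Copying Algorithm," uses the i-node table rather than string matching.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: rewrite a path inside src_root as the matching
   path inside dst_root. Returns a malloc'd string, or NULL if path is
   not inside src_root or allocation fails. Assumes absolute, normalized
   paths and a src_root without a trailing slash. */
char *map_to_target(const char *src_root, const char *dst_root,
                    const char *path)
{
    size_t n = strlen(src_root);
    char *result;

    /* The prefix must end exactly at a '/' or at the end of path,
       so that "/src" does not claim "/srcfoo". */
    if (strncmp(path, src_root, n) != 0 || (path[n] != '\0' && path[n] != '/'))
        return NULL;
    result = malloc(strlen(dst_root) + strlen(path + n) + 1);
    if (result == NULL)
        return NULL;
    strcpy(result, dst_root);
    strcat(result, path + n);   /* path + n is "" or begins with '/' */
    return result;
}
```

For example, with a source root of /home/u/src and a target root of /backup/dst, the contents /home/u/src/docs/a.txt become /backup/dst/docs/a.txt. Relative link contents aren't handled here.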
(Of course, this suggested treatment of symbolic links isn't the only
reasonable one. Perhaps the files externally linked to should be copied,
especially if the purpose of the tree-copy is for backup.)
Copying Algorithm
If files are tracked as explained in the section "Hard Links," it's easy to
tell whether a symbolic link points to an internal file: use stat, which follows
the link, to get the st_dev/st_ino pair of whatever it points to, and see if that
pair is in the table. If so, the link is internal, and the new path, also in the
table, is what you pass to symlink as the contents of the new link.
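A sketch of the internal case follows, with a small fixed-size array standing in for the table from the "Hard Links" section. The names record_copy, find_copy, recreate_internal_symlink, and MAX_ENTRIES are hypothetical. Treating a stat failure (for example, a dangling link) as "external" is an assumption, not something the text specifies.

```c
#define _XOPEN_SOURCE 700   /* expose strdup, symlink */
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Minimal stand-in for the i-node table: (st_dev, st_ino) -> target path. */
#define MAX_ENTRIES 1024
static struct { dev_t dev; ino_t ino; char *target; } table[MAX_ENTRIES];
static size_t table_len = 0;

/* Record that the source i-node (dev, ino) was copied to target. */
int record_copy(dev_t dev, ino_t ino, const char *target)
{
    if (table_len == MAX_ENTRIES ||
        (table[table_len].target = strdup(target)) == NULL)
        return -1;
    table[table_len].dev = dev;
    table[table_len].ino = ino;
    table_len++;
    return 0;
}

static const char *find_copy(dev_t dev, ino_t ino)
{
    for (size_t i = 0; i < table_len; i++)
        if (table[i].dev == dev && table[i].ino == ino)
            return table[i].target;
    return NULL;
}

/* Returns 1 if src_link is internal and dst_link was created pointing at
   the copy, 0 if the link should be treated as external (stat failed or
   the i-node isn't in the table), -1 if symlink fails. */
int recreate_internal_symlink(const char *src_link, const char *dst_link)
{
    struct stat sb;
    const char *target;

    /* stat, not lstat: we want the i-node the link points to. */
    if (stat(src_link, &sb) == -1)
        return 0;
    if ((target = find_copy(sb.st_dev, sb.st_ino)) == NULL)
        return 0;
    return symlink(target, dst_link) == 0 ? 1 : -1;
}
```

A return of 0 tells the caller to fall back to handling the link as external.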
For external links, the old path can be read with readlink and simply used
directly in a call to symlink.
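For the external case, one pitfall is that readlink neither null-terminates its buffer nor reports truncation directly, so the sketch below reserves a byte for the terminator and treats a completely filled buffer as a possible truncation. The name recreate_external_symlink and the PATH_MAX fallback value are assumptions.

```c
#define _XOPEN_SOURCE 700   /* expose readlink, symlink */
#include <errno.h>
#include <limits.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096       /* assumed fallback if <limits.h> omits it */
#endif

/* Hypothetical helper: re-create an external symbolic link by copying
   its contents verbatim. Returns 0 on success, -1 on error. */
int recreate_external_symlink(const char *src_link, const char *dst_link)
{
    char contents[PATH_MAX];
    ssize_t n;

    /* readlink doesn't add a '\0' and silently truncates, so leave room
       for the terminator and reject a full buffer as possibly truncated. */
    n = readlink(src_link, contents, sizeof contents - 1);
    if (n == -1)
        return -1;
    if ((size_t)n == sizeof contents - 1) {
        errno = ENAMETOOLONG;
        return -1;
    }
    contents[n] = '\0';
    return symlink(contents, dst_link);
}
```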
Acknowledgements
comp.unix.programmer Thread
Updated 03/26/2005 11:53:01 AM