Item 17: Check for assignment to self in operator=.
An assignment to self occurs when you do something like this:
class X { ... };
X a;
a = a;                         // a is assigned to itself
This looks like a silly thing to do, but it's perfectly legal, so don't doubt for a moment that programmers do it.
More importantly, assignment to self can appear in this more benign-looking form:
a = b;
If b is another name for a (for example, a reference that has been initialized to a), then this is also an assignment
to self, though it doesn't outwardly look like it. This is an example of aliasing: having two or more names for the
same underlying object. As you'll see at the end of this Item, aliasing can crop up in any number of nefarious
disguises, so you need to take it into account any time you write a function.
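As a minimal sketch of the aliasing just described, the following shows a reference turning an innocent-looking a = b into an assignment to self. The members of X and the helpers sameObject and aliasingDemo are hypothetical names introduced only for this illustration; the address comparison they rely on is one of the identity tests discussed later in this Item.

```cpp
#include <cassert>

class X {
public:
    X() : value(0) {}
    int value;
};

// True when both names refer to one object (an address comparison).
bool sameObject(const X& lhs, const X& rhs)
{
    return &lhs == &rhs;
}

// Hypothetical demonstration function, not part of the Item.
void aliasingDemo()
{
    X a;
    X& b = a;      // b is another name for a
    X c;           // c is a genuinely different object

    a = b;         // assignment to self in disguise
    a = c;         // ordinary assignment

    assert(sameObject(a, b));    // a and b are one object
    assert(!sameObject(a, c));   // a and c are not
}
```

Nothing in the text of a = b reveals which of the two cases you are in; only the objects' identities do.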
Two good reasons exist for taking special care to cope with possible aliasing in assignment operator(s). The
lesser of them is efficiency. If you can detect an assignment to self at the top of your assignment operator(s), you
can return right away, possibly saving a lot of work that you'd otherwise have to go through to implement
assignment. For example, Item 16 points out that a proper assignment operator in a derived class must call an
assignment operator for each of its base classes, and those classes might themselves be derived classes, so
skipping the body of an assignment operator in a derived class might save a large number of other function
calls.
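To make the efficiency argument concrete, here is a rough sketch of a derived-class assignment operator that returns early and so never calls its base-class assignment operator. The classes, their members, and the static counter Base::calls are assumptions made for the demonstration, and the self test uses the address comparison described later in this Item.

```cpp
#include <cassert>

class Base {
public:
    Base() : x(0) {}
    Base& operator=(const Base& rhs)
    {
        ++calls;              // count every invocation, for demonstration only
        x = rhs.x;
        return *this;
    }
    static int calls;
    int x;
};
int Base::calls = 0;

class Derived : public Base {
public:
    Derived() : y(0) {}
    Derived& operator=(const Derived& rhs)
    {
        if (this == &rhs) return *this;   // self: skip the base-class call too
        Base::operator=(rhs);             // assign the base part (see Item 16)
        y = rhs.y;
        return *this;
    }
    int y;
};

// Hypothetical demonstration function.
void efficiencyDemo()
{
    Derived d1, d2;
    d2.x = 5;
    d2.y = 7;

    Derived& alias = d1;
    d1 = alias;                    // assignment to self
    assert(Base::calls == 0);      // the base-class operator never ran

    d1 = d2;                       // ordinary assignment
    assert(Base::calls == 1);
    assert(d1.x == 5 && d1.y == 7);
}
```

With a deeper hierarchy, each level skipped this way saves the calls made by every level beneath it.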
A more important reason for checking for assignment to self is to ensure correctness. Remember that an
assignment operator must typically free the resources allocated to an object (i.e., get rid of its old value) before
it can allocate the new resources corresponding to its new value. When assigning to self, this freeing of
resources can be disastrous, because the old resources might be needed during the process of allocating the new
ones.
Consider assignment of String objects, where the assignment operator fails to check for assignment to self:
class String {
public:
  String(const char *value);     // see Item 11 for
                                 // function definition
  ~String();                     // see Item 11 for
                                 // function definition
  ...
  String& operator=(const String& rhs);
private:
  char *data;
};
// an assignment operator that omits a check
// for assignment to self
String& String::operator=(const String& rhs)
{
  delete [] data;                // delete old memory
  // allocate new memory and copy rhs's value into it
  data = new char[strlen(rhs.data) + 1];
  strcpy(data, rhs.data);
  return *this;                  // see Item 15
}
Consider now what happens in this case:
String a = "Hello";
a = a;                         // same as a.operator=(a)
Inside the assignment operator, *this and rhs seem to be different objects, but in this case they happen to be
different names for the same object. Picture it this way: this->data and rhs.data are one and the same pointer,
and it points to a single buffer holding "Hello".
The first thing the assignment operator does is use delete on data. After that, the buffer is gone, and both
this->data and rhs.data point to memory that has been freed.
Now when the assignment operator tries to do a strlen on rhs.data, the results are undefined. This is because
rhs.data was deleted when data was deleted, which happened because data, this->data, and rhs.data are all the
same pointer! From this point on, things can only get worse.
By now you know that the solution to the dilemma is to check for an assignment to self and to return immediately
if such an assignment is detected. Unfortunately, it's easier to talk about such a check than it is to write it,
because you are immediately forced to figure out what it means for two objects to be "the same."
The topic you confront is technically known as that of object identity, and it's a well-known topic in
object-oriented circles. This book is no place for a discourse on object identity, but it is worthwhile to mention
the two basic approaches to the problem.
One approach is to say that two objects are the same (have the same identity) if they have the same value. For
example, two String objects would be the same if they represented the same sequence of characters:
String a = "Hello";
String b = "World";
String c = "Hello";
Here a and c have the same value, so they are considered identical; b is different from both of them. If you
wanted to use this definition of identity in your String class, your assignment operator might look like this:
String& String::operator=(const String& rhs)
{
if (strcmp(data, rhs.data) == 0) return *this;
...
}
Value equality is usually determined by operator==, so the general form for an assignment operator for a class
C that uses value equality for object identity is this:
C& C::operator=(const C& rhs)
{
  // check for assignment to self
  if (*this == rhs)              // assumes op== exists
    return *this;
  ...
}
Note that this function is comparing objects (via operator==), not pointers. Using value equality to determine
identity, it doesn't matter whether two objects occupy the same memory; all that matters is the values they
represent.
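A sketch of value-based identity in a complete class might look like the following. The class Label, its helper copyOf, the accessor text, and valueIdentityDemo are all hypothetical; the constructor also assumes it is never handed a null pointer.

```cpp
#include <cassert>
#include <cstring>

class Label {
public:
    explicit Label(const char *value) : data(copyOf(value)) {}
    Label(const Label& rhs) : data(copyOf(rhs.data)) {}
    ~Label() { delete [] data; }

    friend bool operator==(const Label& lhs, const Label& rhs)
    {
        return std::strcmp(lhs.data, rhs.data) == 0;
    }

    Label& operator=(const Label& rhs)
    {
        if (*this == rhs) return *this;   // same value: treat as the same object

        // Different values imply different objects, so rhs.data survives the delete.
        delete [] data;
        data = copyOf(rhs.data);
        return *this;
    }

    const char *text() const { return data; }   // accessor for the demonstration

private:
    static char *copyOf(const char *s)
    {
        char *p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }
    char *data;
};

// Hypothetical demonstration function.
void valueIdentityDemo()
{
    Label a("Hello");
    Label c("Hello");
    const char *before = c.text();
    c = a;                          // equal values: returns at once
    assert(c.text() == before);     // c's buffer was not reallocated

    Label b("World");
    c = b;                          // different values: real work is done
    assert(std::strcmp(c.text(), "World") == 0);
}
```

Because an object always compares equal to itself here, the value test also catches a true assignment to self, at the cost of a full string comparison on every assignment.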
The other possibility is to equate an object's identity with its address in memory. Using this definition of object
identity, two objects are the same if and only if they have the same address. This definition is more common in
C++ programs, probably because it's easy to implement and the computation is fast, neither of which is always
true when object identity is based on values. Using address equality, a general assignment operator looks like
this:
C& C::operator=(const C& rhs)
{
// check for assignment to self
if (this == &rhs) return *this;
...
}
This suffices for a great many programs.
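Applied to the String class from earlier in this Item, the address test might be filled in as follows. This is a sketch rather than the definitive class: the copy constructor, the null-pointer handling in the constructor, the accessor c_str, and addressIdentityDemo are assumptions added so the example is complete.

```cpp
#include <cassert>
#include <cstring>

class String {
public:
    String(const char *value)
    {
        const char *src = value ? value : "";    // assumption: null means empty
        data = new char[std::strlen(src) + 1];
        std::strcpy(data, src);
    }
    String(const String& rhs)
    {
        data = new char[std::strlen(rhs.data) + 1];
        std::strcpy(data, rhs.data);
    }
    ~String() { delete [] data; }

    String& operator=(const String& rhs)
    {
        if (this == &rhs) return *this;   // assignment to self: nothing to do

        delete [] data;                   // rhs is another object, so rhs.data is intact
        data = new char[std::strlen(rhs.data) + 1];
        std::strcpy(data, rhs.data);
        return *this;
    }

    const char *c_str() const { return data; }   // accessor for the demonstration

private:
    char *data;
};

// Hypothetical demonstration function.
void addressIdentityDemo()
{
    String a("Hello");
    String& b = a;          // alias for a
    a = b;                  // assignment to self, now harmless
    assert(std::strcmp(a.c_str(), "Hello") == 0);

    String c("World");
    a = c;                  // ordinary assignment still copies
    assert(std::strcmp(a.c_str(), "World") == 0);
    assert(std::strcmp(c.c_str(), "World") == 0);
}
```

The test costs one pointer comparison, which is why it is cheap enough to put at the top of every assignment operator.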
If you need a more sophisticated mechanism for determining whether two objects are the same, you'll have to
implement it yourself. The most common approach is based on a member function that returns some kind of
object identifier:
class C {
public:
  ObjectID identity() const;     // see also Item 36
  ...
};
Given object pointers a and b, then, the objects they point to are identical if and only if a->identity() ==
b->identity(). Of course, you are responsible for writing operator== for ObjectIDs.
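One way such an identifier could be built is sketched below. The text names only ObjectID and identity(); everything else here is an assumption, including representing the identifier as a counter value, the payload member with its get and set functions, and the helper nextID. The counter is not thread-safe.

```cpp
#include <cassert>

// Assumed representation: a wrapper around a unique number.
class ObjectID {
public:
    explicit ObjectID(unsigned long v) : value(v) {}
    friend bool operator==(const ObjectID& lhs, const ObjectID& rhs)
    {
        return lhs.value == rhs.value;
    }
private:
    unsigned long value;
};

class C {
public:
    C() : id(nextID()), payload(0) {}
    C(const C& rhs) : id(nextID()), payload(rhs.payload) {}   // a copy is a new object

    C& operator=(const C& rhs)
    {
        if (identity() == rhs.identity()) return *this;   // assignment to self
        payload = rhs.payload;        // the identifier itself is never copied
        return *this;
    }

    ObjectID identity() const { return id; }
    void set(int v) { payload = v; }
    int get() const { return payload; }

private:
    static ObjectID nextID()
    {
        static unsigned long counter = 0;   // not thread-safe; fine for a sketch
        return ObjectID(++counter);
    }
    ObjectID id;
    int payload;
};

// Hypothetical demonstration function.
void objectIdDemo()
{
    C x, y;
    C *a = &x;
    C *b = &x;
    C *c = &y;
    assert(a->identity() == b->identity());      // same object
    assert(!(a->identity() == c->identity()));   // different objects

    y.set(42);
    x = y;                                       // copies the value, not the identity
    assert(x.get() == 42);
    assert(!(x.identity() == y.identity()));
}
```

Note that both the copy constructor and the assignment operator leave identifiers alone: a copy gets a fresh identifier, and an assigned-to object keeps its own.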
The problems of aliasing and object identity are hardly confined to operator=. That's just a function in which you
are particularly likely to run into them. In the presence of references and pointers, any two names for objects of
compatible types may in fact refer to the same object. Here are some other situations in which aliasing can show
its Medusa-like visage:
class Base {
  void mf1(Base& rb);            // rb and *this could be
                                 // the same
  ...
};
void f1(Base& rb1, Base& rb2);   // rb1 and rb2 could be
                                 // the same
class Derived: public Base {
  void mf2(Base& rb);            // rb and *this could be
                                 // the same
  ...
};
int f2(Derived& rd, Base& rb);   // rd and rb could be
                                 // the same
These examples happen to use references, but pointers would serve just as well.
As you can see, aliasing can crop up in a variety of guises, so you can't just forget about it and hope you'll never
run into it. Well, maybe you can, but most of us can't. At the expense of mixing my metaphors, this is a clear case
in which an ounce of prevention is worth its weight in gold. Anytime you write a function in which aliasing
could conceivably be present, you must take that possibility into account when you write the code.
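For a function other than operator=, taking aliasing into account might look like the sketch below. The function copyDoubled and its use of std::vector are hypothetical, chosen only because the hazard is easy to see: without the check, clearing dst would also empty src whenever the two parameters name one object.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Makes dst hold the elements of src twice over.
void copyDoubled(std::vector<int>& dst, const std::vector<int>& src)
{
    if (&dst == &src) {                   // aliasing: dst and src are one object
        const std::size_t n = dst.size();
        dst.reserve(2 * n);
        for (std::size_t i = 0; i < n; ++i)
            dst.push_back(dst[i]);        // append a copy of the original elements
        return;
    }

    dst.clear();                          // safe: src is a different object
    dst.insert(dst.end(), src.begin(), src.end());
    dst.insert(dst.end(), src.begin(), src.end());
}

// Hypothetical demonstration function.
void aliasedArgumentsDemo()
{
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);

    copyDoubled(v, v);                    // both parameters name v
    assert(v.size() == 4);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 1 && v[3] == 2);

    std::vector<int> w;
    copyDoubled(w, v);                    // distinct objects
    assert(w.size() == 8);
    assert(v.size() == 4);                // the source is untouched
}
```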