Shallow Copy vs Deep Copy: A Memory-Level Explanation

Understanding how data is copied in programming is fundamental to writing robust and predictable software. The distinction between shallow copy and deep copy is often a source of confusion, yet it is critical for managing memory, preventing unintended side effects, and ensuring data integrity. This article delves into the intricacies of these copying mechanisms, exploring their conceptual differences, language-specific implementations, and the underlying memory dynamics that govern their behavior.

Introduction to Copying Mechanisms

When an object is copied, the operation essentially creates a new entity that, to varying degrees, resembles the original. The nature of this resemblance — whether it’s a mere replication of references or a complete duplication of all nested data — defines the type of copy performed. This distinction becomes particularly significant when dealing with complex data structures that contain other objects or collections.

Conceptual Differences: Shallow vs. Deep

Shallow Copy

A shallow copy creates a new object, but instead of duplicating nested objects, it copies their references. The new object has its own distinct memory address, but its fields refer to the same nested objects as the original. Consequently, an in-place modification of a nested mutable object, made through either the original or the copy, is visible through both.

Deep Copy

A deep copy, in contrast, creates a new object and recursively duplicates all nested objects. This results in a completely independent copy where no shared references exist between the original and the new object. Changes made to the deep copy, or its nested components, will not affect the original object, and vice-versa.

Memory Dynamics: Stack vs. Heap and References

To fully grasp shallow and deep copies, it is essential to understand how memory is managed. The stack holds function call frames and the local variables inside them; its memory is allocated and released automatically as functions are entered and exited. The heap is used for dynamic memory allocation, where objects and complex data structures typically reside. Variables often store references (memory addresses) to objects on the heap and not the objects themselves.

Pointer/Reference Semantics

In many languages, variables do not directly hold complex objects but rather references or pointers to their locations in memory. When a shallow copy occurs, it is these references that are duplicated, not the objects they point to. A deep copy, however, dereferences these pointers and creates new objects at new memory locations for each nested component.
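
The difference between copying a reference and copying an object can be checked directly. The following is a minimal Python sketch; the variable names are illustrative, and it relies on the fact that `is` compares object identity (in CPython, `id()` happens to be the memory address).

```python
import copy

a = [[1, 2], [3, 4]]

alias = a                 # no copy at all: a second name for the same object
shallow = copy.copy(a)    # new outer list, same inner lists
deep = copy.deepcopy(a)   # new outer list and new inner lists

print(alias is a)          # True
print(shallow is a)        # False
print(shallow[0] is a[0])  # True  (the reference was duplicated, not the list)
print(deep[0] is a[0])     # False (a new inner list was allocated)
```

Plain assignment is therefore not a copy of any kind; it only adds another reference to the same object.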

Mutability vs. Immutability

Mutability refers to the ability of an object to be changed after it has been created. Objects like lists, dictionaries, and custom class instances are typically mutable. Immutability means an object cannot be altered after creation (e.g., strings, numbers, tuples in Python). The implications of shallow and deep copies are most pronounced with mutable nested objects, as changes to shared mutable components can lead to unexpected behavior.
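
A short Python sketch of why mutability matters, using a hypothetical `record` dictionary that holds one immutable and one mutable nested value:

```python
import copy

record = {"tags": ("a", "b"), "scores": [1, 2]}
clone = copy.copy(record)  # shallow copy

# Tuples are immutable: "changing" one means rebinding the key to a new tuple,
# which only affects the dictionary on which the rebinding is done.
clone["tags"] = clone["tags"] + ("c",)
print(record["tags"])    # ('a', 'b')

# Lists are mutable: an in-place change is visible through both dictionaries.
clone["scores"].append(3)
print(record["scores"])  # [1, 2, 3]
```

Sharing the tuple is harmless because nothing can alter it; sharing the list is what produces the side effect.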

Visualizing Memory: ASCII Diagrams

Consider an object A containing a mutable nested object B.

Original Object

Stack:         Heap:
+---+          +-------------------+
| A | -------> | Object A          |
+---+          |   +-------------+ |
               |   | ref to B    | ---> +-------------------+
               |   +-------------+ |    | Object B          |
               +-------------------+    |   Value: 10       |
                                        +-------------------+

Shallow Copy

When A_shallow = shallow_copy(A):

Stack:              Heap:
+-----------+       +-------------------+
| A         | ----> | Object A          |
+-----------+       |   +-------------+ |
                    |   | ref to B    | ------+
                    |   +-------------+ |     |     +-------------------+
                    +-------------------+     +---> | Object B          |
                                              |     |   Value: 10       |
+-----------+       +-------------------+     |     +-------------------+
| A_shallow | ----> | Object A_shallow  |     |
+-----------+       |   +-------------+ |     |
                    |   | ref to B    | ------+
                    |   +-------------+ |
                    +-------------------+

Notice that Object B is shared. Modifying Object B through A or A_shallow will affect both.

Deep Copy

When A_deep = deep_copy(A):

Stack:         Heap:
+---+          +-------------------+
| A | -------> | Object A          |
+---+          |   +-------------+ |
               |   | ref to B    | ---> +-------------------+
               |   +-------------+ |    | Object B          |
               +-------------------+    |   Value: 10       |
                                        +-------------------+

+---------+    +-------------------+
| A_deep  | -->| Object A_deep     |
+---------+    |   +-------------+ |
               |   | ref to B_new| ---> +-------------------+
               |   +-------------+ |    | Object B_new      |
               +-------------------+    |   Value: 10       |
                                        +-------------------+

Here, Object B and Object B_new are entirely separate entities.

Language-Specific Implementations and Examples

Python

Python’s copy module provides copy() for shallow copies and deepcopy() for deep copies. For custom objects, __copy__ and __deepcopy__ methods can be implemented.

import copy

original_list = [1, [2, 3], 4]

# Shallow Copy
shallow_copied_list = copy.copy(original_list)
shallow_copied_list[1][0] = 99 # Modifies nested list in both
print(original_list)        # Output: [1, [99, 3], 4]
print(shallow_copied_list)  # Output: [1, [99, 3], 4]

# Deep Copy
deep_copied_list = copy.deepcopy(original_list)
deep_copied_list[1][0] = 100 # Modifies only the deep copy
print(original_list)        # Output: [1, [99, 3], 4] (original remains unchanged)
print(deep_copied_list)     # Output: [1, [100, 3], 4]
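
The `__copy__` and `__deepcopy__` hooks mentioned above can be sketched as follows. The `Session` and `Connection` classes are hypothetical, invented for illustration; the assumption is that the connection should stay shared even in a deep copy, while the cache should be duplicated.

```python
import copy

class Connection:
    """Hypothetical stand-in for a resource that should not be duplicated."""
    def __init__(self, dsn):
        self.dsn = dsn

class Session:
    def __init__(self, connection, cache):
        self.connection = connection  # shared on purpose
        self.cache = cache            # mutable per-session state

    def __copy__(self):
        # Shallow: new Session, same connection and same cache object.
        return Session(self.connection, self.cache)

    def __deepcopy__(self, memo):
        cls = self.__class__
        new = cls.__new__(cls)
        memo[id(self)] = new          # register first so cycles resolve to `new`
        new.connection = self.connection
        new.cache = copy.deepcopy(self.cache, memo)
        return new

s1 = Session(Connection("db://example"), {"hits": [1]})
s2 = copy.deepcopy(s1)
s2.cache["hits"].append(2)

print(s1.cache)                        # {'hits': [1]}
print(s2.connection is s1.connection)  # True
```

Passing `memo` through to nested `deepcopy` calls lets the copy module keep track of objects it has already duplicated.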

Java

In Java, Object.clone() performs a shallow copy by default. To achieve a deep copy, one must manually implement the Cloneable interface and override the clone() method to recursively clone mutable fields, or use serialization.

class Course {
    String name;
    public Course(String name) { this.name = name; }
}

class Student implements Cloneable {
    String studentName;
    Course course;

    public Student(String studentName, Course course) {
        this.studentName = studentName;
        this.course = course;
    }

    // Shallow copy via default clone()
    @Override
    protected Object clone() throws CloneNotSupportedException {
        return super.clone();
    }

    // Deep copy implementation (manual)
    public Student deepCopy() throws CloneNotSupportedException {
        Student clonedStudent = (Student) super.clone();
        clonedStudent.course = new Course(this.course.name); // Deep copy the Course object
        return clonedStudent;
    }

    public static void main(String[] args) throws CloneNotSupportedException {
        Course math = new Course("Math");
        Student originalStudent = new Student("Alice", math);

        // Shallow Copy
        Student shallowCopyStudent = (Student) originalStudent.clone();
        shallowCopyStudent.course.name = "Physics"; // Changes originalStudent.course.name too
        System.out.println(originalStudent.course.name); // Output: Physics

        // Deep Copy
        Course history = new Course("History");
        Student originalStudent2 = new Student("Bob", history);
        Student deepCopyStudent = originalStudent2.deepCopy();
        deepCopyStudent.course.name = "Chemistry"; // Only changes deepCopyStudent.course.name
        System.out.println(originalStudent2.course.name); // Output: History
    }
}

JavaScript

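Modern JavaScript runtimes provide a built-in deep copy function, structuredClone(), which handles nested objects, arrays, Date, Map, Set, and circular references, but throws when it encounters a function. Shallow copies can be achieved using the spread operator (...), Object.assign(), or Array.prototype.slice(). In older environments, deep copies typically rely on JSON.parse(JSON.stringify()) (which drops functions and undefined values and turns Date objects into strings) or external libraries like Lodash’s _.cloneDeep().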

const originalObject = {
  a: 1,
  b: {
    c: 2
  }
};

// Shallow Copy
const shallowCopyObject = { ...originalObject };
shallowCopyObject.b.c = 99; // Modifies originalObject.b.c too
console.log(originalObject.b.c); // Output: 99

// Deep Copy (with JSON serialization limitation)
const deepCopyObject = JSON.parse(JSON.stringify(originalObject));
deepCopyObject.b.c = 100; // Only modifies deepCopyObject.b.c
console.log(originalObject.b.c); // Output: 99 (original remains unchanged)

Go

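In Go, assigning a struct copies its fields by value, which amounts to a shallow copy: pointer, slice, and map fields in the new struct still refer to the same underlying data. Assigning a slice or map likewise copies only its small header or handle, not the elements. Deep copying requires copying the nested data manually or a serialization/deserialization round trip.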

package main

import (
	"fmt"
)

type Address struct {
	City string
}

type Person struct {
	Name    string
	Address *Address // Pointer to Address
}

func main() {
	originalAddress := &Address{"New York"}
	originalPerson := Person{"Alice", originalAddress}

	// Shallow Copy (struct assignment)
	shallowCopyPerson := originalPerson
	shallowCopyPerson.Address.City = "London" // Modifies originalPerson.Address.City too
	fmt.Println(originalPerson.Address.City)    // Output: London

	// Deep Copy (manual)
	originalAddress2 := &Address{"Paris"}
	originalPerson2 := Person{"Bob", originalAddress2}

	deepCopyAddress := &Address{originalPerson2.Address.City}
	deepCopyPerson := Person{originalPerson2.Name, deepCopyAddress}

	deepCopyPerson.Address.City = "Rome" // Only modifies deepCopyPerson.Address.City
	fmt.Println(originalPerson2.Address.City) // Output: Paris
}

C++

In C++, assignment (=) for custom classes performs a shallow copy by default (member-wise copy). To achieve deep copy, a custom copy constructor and copy assignment operator must be implemented to handle dynamic memory allocation and recursively copy nested objects. The Rule of Three/Five/Zero applies here.

#include <iostream>
#include <string>

class Course {
public:
    std::string name;
    Course(std::string n) : name(n) {}
};

class Student {
public:
    std::string studentName;
    Course* course; // Pointer to Course

    Student(std::string sn, Course* c) : studentName(sn), course(c) {}

    // User-defined copy constructor that mirrors the compiler-generated one:
    // it copies the pointer, not the Course it points to (shallow copy)
    Student(const Student& other) : studentName(other.studentName), course(other.course) {
        std::cout << "Shallow copy constructor called\n";
    }

    // Deep copy helper (a member function, not a constructor)
    Student deepCopy() const {
        std::cout << "Deep copy performed\n";
        return Student(this->studentName, new Course(*this->course)); // Create new Course object
    }

    ~Student() { // Destructor to free memory for course
        // Only delete if this object owns the course pointer
        // This highlights the complexity of manual memory management with shallow/deep copies
        // For simplicity, in this example, we'll assume ownership for deepCopy()
        // In real-world C++, smart pointers are preferred.
        // delete course; 
    }
};

int main() {
    Course* math = new Course("Math");
    Student originalStudent("Alice", math);

    // Shallow Copy (the copy constructor copies only the pointer)
    Student shallowCopyStudent = originalStudent; // Calls copy constructor
    shallowCopyStudent.course->name = "Physics"; // Modifies originalStudent.course->name too
    std::cout << originalStudent.course->name << std::endl; // Output: Physics

    // Deep Copy
    Course* history = new Course("History");
    Student originalStudent2("Bob", history);
    Student deepCopyStudent = originalStudent2.deepCopy();
    deepCopyStudent.course->name = "Chemistry"; // Only modifies deepCopyStudent.course->name
    std::cout << originalStudent2.course->name << std::endl; // Output: History

    delete math;
    delete history;
    // The deep copy owns its own Course, so it must be freed as well
    // (smart pointers would handle this automatically)
    delete deepCopyStudent.course;

    return 0;
}

Rust

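Rust distinguishes between the Copy and Clone traits. The Copy trait is for types that can be duplicated by simply copying bits (e.g., primitive types, structs containing only Copy types); for these, assignment performs a bitwise copy and the original stays usable. The Clone trait provides explicit duplication through the clone() method and must be derived or implemented by hand. For types that own their data, such as String, Vec, and structs built from them, clone() duplicates the heap data and therefore acts as a deep copy. For shared-ownership pointers such as Rc and Arc, clone() only increments a reference count, and the underlying data stays shared.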

#[derive(Debug, Clone)] // Derive Clone for deep copy
struct Address {
    city: String,
}

#[derive(Debug, Clone)] // Derive Clone for deep copy
struct Person {
    name: String,
    address: Address,
}

fn main() {
    let original_address = Address { city: String::from("New York") };
    let original_person = Person { name: String::from("Alice"), address: original_address.clone() };

    // Shallow copy: assigning a value whose type is not Copy (such as String) is a move,
    // not a shallow copy, so the original binding can no longer be used afterwards.
    // Rust's ownership system makes a shallow copy that shares mutable data hard to
    // illustrate without shared-ownership types, so this example focuses on `Clone`.

    // Deep Copy using Clone trait
    let mut deep_copy_person = original_person.clone();
    deep_copy_person.address.city = String::from("London"); // Only modifies deep_copy_person
    println!("Original person: {:?}", original_person); // Output: Original person: Person { name: "Alice", address: Address { city: "New York" } }
    println!("Deep copy person: {:?}", deep_copy_person); // Output: Deep copy person: Person { name: "Alice", address: Address { city: "London" } }

    // Example with Copy trait (for primitive types)
    let x = 5;
    let y = x; // y is a copy of x, x is still valid
    println!("x: {}, y: {}", x, y);
}

Binary-Level Analysis

At the lowest level, memory is a contiguous sequence of bytes. Variables hold values, which can be either the data itself (for primitives) or a memory address (for references/pointers to objects).

When a shallow copy occurs, the bytes representing the references to nested objects are copied. The actual data bytes of the nested objects are not duplicated. This means two distinct objects now contain identical memory addresses pointing to the same underlying data structure.

For a deep copy, the process is more involved. It entails traversing the object graph, starting from the top-level object. For each nested object encountered, new memory is allocated on the heap, and the data from the original nested object is copied byte-for-byte into this new memory location. This recursive process ensures that all components of the new object are distinct from the original, residing in their own allocated memory segments.
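
The traversal described above can be sketched by hand in Python. This is a simplified illustration, not how `copy.deepcopy` is implemented: the hypothetical `deep_copy` function handles only lists and dictionaries and treats every other value as an immutable leaf. The `memo` dictionary records objects that have already been copied, so shared and cyclic references are reproduced instead of causing infinite recursion.

```python
def deep_copy(obj, memo=None):
    """Recursively copy lists and dicts; other values are returned as-is."""
    if memo is None:
        memo = {}
    key = id(obj)
    if key in memo:            # already copied: reuse the copy (handles cycles)
        return memo[key]
    if isinstance(obj, list):
        new = []
        memo[key] = new        # register before recursing into children
        for item in obj:
            new.append(deep_copy(item, memo))
        return new
    if isinstance(obj, dict):
        new = {}
        memo[key] = new
        for k, v in obj.items():
            new[k] = deep_copy(v, memo)
        return new
    return obj                 # numbers, strings, etc. are treated as leaves

graph = [1, {"child": [2, 3]}]
graph.append(graph)            # the list now contains itself
clone = deep_copy(graph)

print(clone[1] is graph[1])    # False (nested dict was duplicated)
print(clone[2] is clone)       # True  (the cycle points at the copy, not the original)
```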

Stack vs. Heap Revisited

  • Stack: Stores local variables, function parameters, and return addresses. Memory allocation/deallocation is automatic and fast. Primitive values are often stored directly on the stack.
  • Heap: Stores dynamically allocated objects and data structures whose size might not be known at compile time or whose lifetime extends beyond the scope of a single function call. Objects created with new (Java, C++), malloc (C), or implicitly (Python, JavaScript) reside here. Variables on the stack often hold pointers to these heap-allocated objects.

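A shallow copy allocates one new top-level object and copies the references stored inside it, so the nested heap objects remain shared. A deep copy performs a new heap allocation for every object reachable from the original. Copying only the stack-held reference creates an alias to the same object, not a copy.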

Performance Considerations

  • Shallow Copy: Generally faster, as it copies only the top-level fields or elements (values and references). Its cost grows with the size of the outer object, not with the depth or size of the nested objects, and it allocates no additional memory for nested data.
  • Deep Copy: Can be significantly slower and consume more memory, especially for large and deeply nested object graphs. It involves recursive traversal, new memory allocations for each nested object, and byte-by-byte data copying. The performance impact can be substantial, making it a consideration for performance-critical applications.
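
A rough way to observe the difference is to time both operations with Python's `timeit` module. The data shape and repetition count below are arbitrary choices for illustration, and the absolute numbers depend on the machine and interpreter, so no expected output is shown.

```python
import copy
import timeit

# 1,000 small inner lists: the shallow copy touches only the outer list,
# while the deep copy must visit and duplicate every inner list as well.
data = [[i, i + 1, i + 2] for i in range(1_000)]

shallow_time = timeit.timeit(lambda: copy.copy(data), number=100)
deep_time = timeit.timeit(lambda: copy.deepcopy(data), number=100)

print(f"shallow: {shallow_time:.4f}s  deep: {deep_time:.4f}s")
```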

Real-World Bugs Caused by Shallow Copies

Shallow copies are a common source of subtle bugs, particularly when developers are unaware of the shared references. Examples include:

  1. Unexpected State Changes: Modifying a configuration object that was shallow-copied from a default template. Any changes to nested lists or dictionaries in the copied configuration inadvertently alter the default template, affecting other parts of the application that rely on the original default.
  2. Data Corruption in Multi-threaded Environments: In concurrent programming, if multiple threads operate on shallow copies of an object, and those copies share mutable nested data, race conditions can occur, leading to data corruption or inconsistent states.
  3. UI Component Re-rendering Issues: In front-end frameworks, if a component receives a shallow copy of props containing mutable objects, and those nested objects are modified directly, the framework might not detect a change in props (because the reference itself hasn’t changed), failing to re-render the component with the updated data.
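
The first bug in the list can be reproduced in a few lines of Python. `DEFAULT_CONFIG` and the two factory functions are hypothetical names used only for this sketch.

```python
import copy

DEFAULT_CONFIG = {"debug": False, "plugins": ["auth"]}

def make_config_buggy():
    cfg = DEFAULT_CONFIG.copy()       # shallow: the "plugins" list is shared
    cfg["plugins"].append("metrics")  # also mutates the template
    return cfg

def make_config_safe():
    cfg = copy.deepcopy(DEFAULT_CONFIG)
    cfg["plugins"].append("metrics")  # only the copy is changed
    return cfg

safe = make_config_safe()
print(DEFAULT_CONFIG["plugins"])  # ['auth']

buggy = make_config_buggy()
print(DEFAULT_CONFIG["plugins"])  # ['auth', 'metrics']
```

After the buggy call, every later consumer of the template sees the extra plugin.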

When Deep Copy is a Code Smell

While deep copies provide complete independence, their overuse can indicate a code smell:

  • Performance Overhead: As noted, deep copying can be expensive. If a deep copy is performed frequently on large data structures without a clear need for complete independence, it can become a performance bottleneck.
  • Increased Memory Consumption: Duplicating entire object graphs consumes significantly more memory. Excessive deep copying can lead to higher memory usage and potentially out-of-memory errors.
  • Complexity and Maintenance: Implementing custom deep copy logic (especially in languages without built-in support) can be complex and error-prone. It adds boilerplate code and increases the maintenance burden.
  • Breaking Object Identity: If object identity is crucial (e.g., for caching, unique identifiers, or graph structures), deep copying can break these relationships by creating new, distinct objects.

Often, a need for deep copy suggests that the object’s design might be problematic, or that the data flow could be managed more effectively, perhaps through immutable data structures or clearer ownership semantics.
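
As one possible illustration of the immutable alternative, the sketch below uses Python's frozen dataclasses and `dataclasses.replace`. The `Person` and `Address` types are hypothetical. Instead of deep-copying and then mutating, a new value is built that reuses the unchanged parts, which is safe because nothing can modify them.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Address:
    city: str

@dataclass(frozen=True)
class Person:
    name: str
    address: Address

alice = Person("Alice", Address("New York"))

# Build a modified value; untouched fields are shared, not duplicated.
moved = replace(alice, address=replace(alice.address, city="London"))

print(alice.address.city)        # New York
print(moved.address.city)        # London
print(moved.name is alice.name)  # True
```

Attempting to assign to a field of a frozen instance raises an error, so shared parts cannot drift apart silently.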

When to Use Each Type of Copy

| Feature | Shallow Copy | Deep Copy |
| --- | --- | --- |
| Independence | New object, but nested mutable objects are shared | Completely independent object and all its nested data |
| Performance | Faster (copies references) | Slower (recursive, allocates new memory) |
| Memory Usage | Lower (shares nested data) | Higher (duplicates all nested data) |
| Use Cases | Copying simple objects, immutable nested data, or when shared references are desired | When complete isolation from the original is required, especially with mutable nested objects |
| Side Effects | Changes to nested mutable objects affect both original and copy | Changes only affect the copied object |

Conclusion

The choice between a shallow and a deep copy is not arbitrary; it is a deliberate design decision with significant implications for program behavior, performance, and memory usage. A shallow copy is efficient for simple objects or when shared references to nested mutable data are acceptable or even desired. Conversely, a deep copy is essential when complete independence from the original object, including all its nested components, is paramount. A thorough understanding of these copying mechanisms, coupled with an awareness of memory management and language-specific behaviors, empowers developers to make informed choices, prevent subtle bugs, and build more robust and maintainable software systems.