Track and Manage Word Revisions Using Python


Share office file processing skills in .NET, Java, and C++.

In collaborative environments, Word's revision tracking feature is essential for document review and version control. With tracking enabled, Word records insertions, deletions, and formatting changes along with each modification's author and timestamp. This article demonstrates how to automate revision tracking in Word documents using Python, including enabling tracking, extracting revision information, setting revision authors, and accepting or rejecting changes.

Why Programmatic Revision Management Matters

Manually handling Word document revision tracking presents several challenges:

  • Batch Processing Difficulties: Manual operations become inefficient when processing multiple documents simultaneously

  • Cumbersome Information Extraction: Filtering specific authors or change types from numerous revisions is time-consuming

  • Automation Requirements: Document workflows often need to accept routine revisions or flag anomalous changes automatically, which cannot be done by hand at scale

Managing revision tracking programmatically with Python enables you to:

  • Batch enable or disable revision tracking across documents

  • Automatically extract detailed revision information (author, type, timestamp)

  • Bulk accept or reject specific revisions based on rules

  • Set or modify revision author information

Environment Setup

Start by installing the Spire.Doc for Python library:

pip install Spire.Doc

This library provides APIs for reading and editing Word documents, including the revision tracking operations used in this article.

Enabling and Disabling Revision Tracking

The most fundamental operation is enabling or disabling a document's revision tracking feature. When revision tracking is enabled, all modifications to the document are recorded.

from spire.doc import *
from spire.doc.common import *

# Create a Word document object
document = Document()
# Load file from disk
document.LoadFromFile("Sample.docx")

# Enable revision tracking
document.TrackChanges = True

# Save the document
document.SaveToFile("EnableTrackChanges.docx", FileFormat.Docx2013)
document.Close()

By setting the TrackChanges property to True, you enable revision tracking. Subsequently, any modifications to the document (text insertion, content deletion, formatting adjustments, etc.) are recorded as revisions. To disable revision tracking, simply set this property to False.
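Every example in this article ends with a call to Close(). As a minimal sketch, that cleanup could be wrapped in a context manager so it also runs when an exception interrupts processing. The helper name loaded_document is hypothetical and not part of Spire.Doc; it assumes only the LoadFromFile() and Close() methods used above.

```python
from contextlib import contextmanager


@contextmanager
def loaded_document(document, path):
    """Load `path` into `document` and always call Close() afterwards.

    `document` is expected to offer LoadFromFile() and Close(), the two
    methods used in the example above. This helper is hypothetical and
    not part of Spire.Doc.
    """
    try:
        document.LoadFromFile(path)
        yield document
    finally:
        # Runs on normal exit and also when the body raises.
        document.Close()


# Intended use with the library (shown as a comment, not executed here):
#
#     with loaded_document(Document(), "Sample.docx") as doc:
#         doc.TrackChanges = True
#         doc.SaveToFile("EnableTrackChanges.docx", FileFormat.Docx2013)
```

The helper takes an already constructed document object rather than creating one itself, so it does not depend on any constructor arguments.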

Extracting Revision Information

Extracting revision information is one of the most common tasks in revision management. You can iterate through all paragraphs and text ranges in a document and inspect the revision attached to each one. The example below records the type and author of every insert and delete revision it finds.

from spire.doc import *
from spire.doc.common import *

def write_revisions_to_file(filename, content):
    """Write revision information to a text file"""
    with open(filename, "w", encoding="utf-8") as fp:
        for line in content:
            fp.write(line + "\n")

# Load a document containing revisions
document = Document()
document.LoadFromFile("GetRevisions.docx")

insert_revisions = ["Insert Revisions:"]
delete_revisions = ["Delete Revisions:"]
insert_index = 0
delete_index = 0

# Iterate through all sections in the document
for section_idx in range(document.Sections.Count):
    section = document.Sections.get_Item(section_idx)
    
    # Iterate through all child objects in the section body
    for body_idx in range(section.Body.ChildObjects.Count):
        doc_item = section.Body.ChildObjects.get_Item(body_idx)
        
        # Handle paragraph-level revisions
        if isinstance(doc_item, Paragraph):
            paragraph = doc_item
            
            # Check for insert revisions
            if paragraph.IsInsertRevision:
                insert_index += 1
                insert_revisions.append(f"Index: {insert_index}")
                
                revision = paragraph.InsertRevision
                insert_revisions.append(f"Type: {revision.Type.name}")
                insert_revisions.append(f"Author: {revision.Author}")
                insert_revisions.append("")
            
            # Check for delete revisions
            elif paragraph.IsDeleteRevision:
                delete_index += 1
                delete_revisions.append(f"Index: {delete_index}")
                
                revision = paragraph.DeleteRevision
                delete_revisions.append(f"Type: {revision.Type.name}")
                delete_revisions.append(f"Author: {revision.Author}")
                delete_revisions.append("")
            
            # Iterate through all child objects in the paragraph (text ranges)
            for text_idx in range(paragraph.ChildObjects.Count):
                obj = paragraph.ChildObjects.get_Item(text_idx)
                
                if isinstance(obj, TextRange):
                    text_range = obj
                    
                    # Check for insert revisions in text ranges
                    if text_range.IsInsertRevision:
                        insert_index += 1
                        insert_revisions.append(f"Index: {insert_index}")
                        
                        revision = text_range.InsertRevision
                        insert_revisions.append(f"Type: {revision.Type.name}")
                        insert_revisions.append(f"Author: {revision.Author}")
                        insert_revisions.append("")
                    
                    # Check for delete revisions in text ranges
                    elif text_range.IsDeleteRevision:
                        delete_index += 1
                        delete_revisions.append(f"Index: {delete_index}")
                        
                        revision = text_range.DeleteRevision
                        delete_revisions.append(f"Type: {revision.Type.name}")
                        delete_revisions.append(f"Author: {revision.Author}")
                        delete_revisions.append("")

# Save revision information to files
write_revisions_to_file("insert_revisions.txt", insert_revisions)
write_revisions_to_file("delete_revisions.txt", delete_revisions)

This code demonstrates how to systematically extract all revision information from a document:

  1. Traverse Document Structure: Sequentially access each Section, Paragraph, and TextRange

  2. Identify Revision Types: Determine revision types using IsInsertRevision and IsDeleteRevision properties

  3. Retrieve Revision Details: Use InsertRevision or DeleteRevision properties to obtain revision objects, extracting type and author information

  4. Categorize Storage: Save insert and delete revisions separately into different lists

This approach can be used to generate revision reports, filter modifications by specific authors, or analyze document change statistics.
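As one possible follow-up, the text files written by write_revisions_to_file can be read back and summarized per author. The sketch below parses the "Index / Type / Author" blocks into dictionaries and counts revisions by author. The function names are illustrative, and the sample type value "Insertion" is a placeholder rather than confirmed library output.

```python
from collections import Counter


def parse_revision_log(lines):
    """Turn 'Index / Type / Author' blocks back into a list of dicts."""
    records = []
    current = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line closes the current block.
            if current:
                records.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        if key not in ("Index", "Type", "Author"):
            # Header lines such as "Insert Revisions:" are skipped.
            continue
        current[key.lower()] = value.strip()
    if current:
        records.append(current)
    return records


def count_by_author(records):
    """Count how many parsed revisions belong to each author."""
    return Counter(record.get("author", "") for record in records)


# Sample lines in the format produced above (values are made up).
sample = [
    "Insert Revisions:",
    "Index: 1", "Type: Insertion", "Author: Alice", "",
    "Index: 2", "Type: Insertion", "Author: Bob", "",
]
for author, count in count_by_author(parse_revision_log(sample)).items():
    print(f"{author}: {count}")
```

To process a real report, pass the open file object instead of the sample list, since iterating a text file yields its lines.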

Setting Revision Author Information

In certain scenarios, you may need to uniformly set or modify revision author information. For example, when merging documents from multiple authors, you can standardize all revision authors to a unified identifier.

from spire.doc import *
from spire.doc.common import *

# Load the document
document = Document()
document.LoadFromFile("GetRevisions.docx")

# Iterate through all sections in the document
for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    
    # Iterate through all child objects in the section body
    for j in range(section.Body.ChildObjects.Count):
        doc_item = section.Body.ChildObjects.get_Item(j)
        
        if isinstance(doc_item, Paragraph):
            paragraph = doc_item
            
            # Set author for paragraph-level insert revisions
            if paragraph.IsInsertRevision:
                paragraph.InsertRevision.Author = "E-iceblue"
            
            # Set author for paragraph-level delete revisions
            elif paragraph.IsDeleteRevision:
                paragraph.DeleteRevision.Author = "E-iceblue"
            
            # Iterate through text ranges in the paragraph
            for k in range(paragraph.ChildObjects.Count):
                text_range = paragraph.ChildObjects.get_Item(k)
                
                if isinstance(text_range, TextRange):
                    # Set author for text range insert revisions
                    if text_range.IsInsertRevision:
                        text_range.InsertRevision.Author = "E-iceblue"
                    
                    # Set author for text range delete revisions
                    elif text_range.IsDeleteRevision:
                        text_range.DeleteRevision.Author = "E-iceblue"

# Save the modified document
document.SaveToFile("SetRevisionAuthor.docx", FileFormat.Docx2013)
document.Close()

By modifying the Revision.Author property, you can uniformly set revision author names. This proves valuable in document standardization processes or anonymized review scenarios.
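For the anonymized review scenario, a single fixed name hides how many reviewers contributed. A minimal sketch of an alternative is to map each distinct author to a stable alias. The AuthorAliaser class below is hypothetical and works on plain strings only; it does not touch the document itself.

```python
class AuthorAliaser:
    """Assign aliases such as 'Reviewer 1' in order of first appearance."""

    def __init__(self, prefix="Reviewer"):
        self.prefix = prefix
        self.mapping = {}  # original author name -> alias

    def alias_for(self, name):
        # Reuse the alias if this author has been seen before.
        if name not in self.mapping:
            self.mapping[name] = f"{self.prefix} {len(self.mapping) + 1}"
        return self.mapping[name]


aliaser = AuthorAliaser()
for author in ["Alice", "Bob", "Alice"]:
    print(author, "->", aliaser.alias_for(author))
```

In the loop above, the assignment of "E-iceblue" could then be replaced by something like revision.Author = aliaser.alias_for(revision.Author). Keeping aliaser.mapping lets you translate the aliases back to the original names later.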

Accepting or Rejecting Revisions

After a review is complete, you typically need to accept or reject the tracked changes. The example below accepts every revision in the document with a single call; rejecting works the same way.

from spire.doc import *
from spire.doc.common import *

# Load a document containing revisions
document = Document()
document.LoadFromFile("AcceptOrRejectTrackedChanges.docx")

# Get the first paragraph of the first section
section = document.Sections[0]
paragraph = section.Paragraphs[0]

# Accept all revisions in the document
# (paragraph.Document is the document that owns the paragraph)
paragraph.Document.AcceptChanges()

# Or reject all revisions
# paragraph.Document.RejectChanges()

# Save the processed document
document.SaveToFile("AcceptOrRejectTrackedChanges_out.docx", FileFormat.Docx2013)
document.Close()

The AcceptChanges() method applies all revisions, making the document content reflect the final modified state. Conversely, the RejectChanges() method reverses all revisions, restoring the document to its pre-revision state.

For more granular control, you can combine this with the revision extraction functionality described earlier—first filtering specific revisions, then selectively accepting or rejecting them. For instance, you might accept only revisions from a particular author, or accept insert-type revisions while rejecting deletions.
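The example above only shows document-wide acceptance and rejection, so the sketch below is limited to the decision step: given each revision's kind and author, it works out what should happen to it. The rule functions and the "insert" / "delete" labels are assumptions made for illustration; applying a decision to an individual revision depends on library calls not covered in this article.

```python
def by_author(author_name):
    """Rule: accept revisions from one author, leave others undecided."""
    def rule(kind, author):
        return "accept" if author == author_name else None
    return rule


def inserts_over_deletes(kind, author):
    """Rule: accept insertions and reject deletions."""
    return {"insert": "accept", "delete": "reject"}.get(kind)


def plan_decisions(revisions, rules, default="review"):
    """Apply rules in order; the first rule that answers wins."""
    decisions = []
    for kind, author in revisions:
        verdict = default
        for rule in rules:
            answer = rule(kind, author)
            if answer is not None:
                verdict = answer
                break
        decisions.append((kind, author, verdict))
    return decisions


# (kind, author) pairs as they might be collected during extraction.
revisions = [("insert", "Alice"), ("delete", "Alice"), ("delete", "Bob")]
for row in plan_decisions(revisions, [by_author("Alice"), inserts_over_deletes]):
    print(row)
```

Rule order matters: here everything from Alice is accepted first, and the insert/delete rule only applies to the remaining revisions. Anything no rule covers falls back to "review" so that it is not changed silently.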

Practical Application Techniques

Batch Processing Multiple Documents

In real-world scenarios, you often need to batch process revision tracking across multiple documents. This can be automated by combining Python's file operations:

import os
from spire.doc import *
from spire.doc.common import *

def batch_enable_track_changes(folder_path):
    """Batch enable revision tracking for all Word documents in a folder"""
    for filename in os.listdir(folder_path):
        # Skip non-.docx files, Word lock files, and outputs of earlier runs
        if not filename.lower().endswith(".docx"):
            continue
        if filename.startswith(("~$", "tracked_")):
            continue

        filepath = os.path.join(folder_path, filename)

        document = Document()
        document.LoadFromFile(filepath)
        document.TrackChanges = True

        output_path = os.path.join(folder_path, f"tracked_{filename}")
        document.SaveToFile(output_path, FileFormat.Docx2013)
        document.Close()

        print(f"Processed: {filename}")

Filtering Revisions by Specific Authors

By extending the revision extraction functionality, you can filter revisions from specific authors:

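from spire.doc import *
from spire.doc.common import *

def filter_revisions_by_author(document, author_name):
    """Return paragraphs and text ranges revised (inserted or deleted) by author_name"""
    filtered_revisions = []

    def is_by_author(item):
        # True if the item carries an insert or delete revision by author_name
        if item.IsInsertRevision:
            return item.InsertRevision.Author == author_name
        if item.IsDeleteRevision:
            return item.DeleteRevision.Author == author_name
        return False

    # Index-based access, matching the extraction example
    for i in range(document.Sections.Count):
        body = document.Sections.get_Item(i).Body
        for j in range(body.ChildObjects.Count):
            body_item = body.ChildObjects.get_Item(j)
            if not isinstance(body_item, Paragraph):
                continue
            if is_by_author(body_item):
                filtered_revisions.append(body_item)
            for k in range(body_item.ChildObjects.Count):
                child = body_item.ChildObjects.get_Item(k)
                if isinstance(child, TextRange) and is_by_author(child):
                    filtered_revisions.append(child)

    return filtered_revisions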

Generating Revision Statistics Reports

You can count various revision types in a document to generate summary reports:

from spire.doc import *
from spire.doc.common import *

def generate_revision_report(document):
    """Generate a revision statistics report"""
    stats = {
        "insert_count": 0,
        "delete_count": 0,
        "authors": set()
    }

    def tally(item):
        # Count the item's insert or delete revision and record its author
        if item.IsInsertRevision:
            stats["insert_count"] += 1
            stats["authors"].add(item.InsertRevision.Author)
        elif item.IsDeleteRevision:
            stats["delete_count"] += 1
            stats["authors"].add(item.DeleteRevision.Author)

    # Index-based access, matching the extraction example
    for i in range(document.Sections.Count):
        body = document.Sections.get_Item(i).Body
        for j in range(body.ChildObjects.Count):
            body_item = body.ChildObjects.get_Item(j)
            if isinstance(body_item, Paragraph):
                tally(body_item)
                for k in range(body_item.ChildObjects.Count):
                    child = body_item.ChildObjects.get_Item(k)
                    if isinstance(child, TextRange):
                        tally(child)

    return stats
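The dictionary returned by generate_revision_report can be turned into a readable summary with a few lines of plain Python. This is a minimal sketch that assumes the three keys shown above; the function name and the report layout are illustrative choices.

```python
def format_revision_report(stats):
    """Render the stats dictionary as a short plain-text report."""
    authors = sorted(stats["authors"])
    total = stats["insert_count"] + stats["delete_count"]
    lines = [
        "Revision report",
        f"Insertions: {stats['insert_count']}",
        f"Deletions: {stats['delete_count']}",
        f"Total: {total}",
        # Sorting keeps the output stable, since sets have no fixed order.
        "Authors: " + (", ".join(authors) if authors else "(none)"),
    ]
    return "\n".join(lines)


example = {"insert_count": 3, "delete_count": 1, "authors": {"Bob", "Alice"}}
print(format_revision_report(example))
```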

Conclusion

This article has covered the core techniques for managing Word document revision tracking using Python, including enabling revision tracking, extracting revision information, setting revision authors, and accepting or rejecting revisions. These capabilities enable automation of document review workflows, improving team collaboration efficiency.

In practical applications, you can combine these features according to specific requirements, such as batch document processing, filtering specific revisions, or generating review reports. The API provided by Spire.Doc for Python keeps each of these tasks short, which makes them straightforward to integrate into document management workflows.