Python redirect automation script running in Google Colab interface

Automate Redirects Using Python + Google Colab

    Redirects are one of SEO’s most tedious yet critical tasks. One mistake can cost you rankings, traffic, and revenue. The stakes get even higher when you’re migrating thousands of pages or rebuilding an entire eCommerce site structure.

    We faced this exact challenge when a client wanted to overhaul their eCommerce SEO strategy by switching from SKU-based URLs to product name URLs. With over 7,000 products and no WordPress backend to simplify the process, manual redirect mapping wasn’t realistic.

    After collaborating with Mark Williams-Cook from Candour, who’d encountered similar challenges, we built an automated solution. This Python script uses machine learning to match URLs based on content similarity, turning weeks of manual work into hours of automated processing.

    Why Automated Redirect Matching Matters

    Traditional redirect planning relies on manual URL mapping – a process that’s error-prone and time-consuming. For small sites, this works fine. But when you’re dealing with enterprise-level migrations, the math becomes impossible.

    Consider our client’s scenario: 7,000 product pages, each requiring individual assessment and matching. At 2 minutes per redirect (conservative estimate), that’s 233 hours of work. Factor in common SEO mistakes from fatigue and repetitive work, and you’ve got a recipe for ranking disasters.

    Our Python solution addresses these pain points by using TF-IDF (Term Frequency-Inverse Document Frequency) weighting to compare page elements like titles, headings, and meta descriptions. The script identifies content similarity patterns that humans might miss while processing hundreds of matches at once.

    How TF-IDF Powers URL Matching

    TF-IDF measures how important specific words are within documents relative to a larger collection. In our redirect context, it compares page elements to find the closest content matches between old and new URLs.

    The algorithm assigns scores based on:

    • Term Frequency: How often specific words appear in page titles or descriptions
    • Inverse Document Frequency: How unique those words are across your entire site
    • Content Similarity: How closely the weighted terms of the source and destination pages overlap (TF-IDF compares wording rather than meaning)

    This approach works particularly well for eCommerce sites where product names, categories, and descriptions provide rich matching data. Even when URL structures change completely, content elements remain consistent enough for accurate automated matching.
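
    To make the scoring concrete, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It is an illustration only, not the code from our Colab script: the product titles are invented, the tokeniser is deliberately naive, and the smoothed IDF formula is one common variant among several.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real pipelines usually clean text further."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using a smoothed IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical page titles from an old and a new site.
old_titles = ["Blue Widget Pro 500ml", "Red Garden Hose 20m"]
new_titles = ["Red Hose for Garden 20m", "Widget Pro Blue 500ml"]

# Weight all titles together so both sites share one vocabulary.
vectors = tfidf_vectors(old_titles + new_titles)
old_vecs, new_vecs = vectors[:len(old_titles)], vectors[len(old_titles):]

best_matches = {}
for title, vec in zip(old_titles, old_vecs):
    scores = [cosine_similarity(vec, candidate) for candidate in new_vecs]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    best_matches[title] = (new_titles[best_index], round(scores[best_index], 3))
```

    Word order does not matter to this kind of comparison, which is why a reworded title can still match strongly as long as the same terms are present.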

    TF-IDF algorithm visualisation showing URL matching process

    Setting Up Your Redirect Automation Workflow

    Step 1: Crawl Both Website Versions

    Start by crawling your origin and destination sites using Screaming Frog or Sitebulb. Export both crawls as .xlsx files with identical column structures. This consistency ensures the script can properly compare data points between versions.
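
    Before uploading anything, it can help to confirm that the two exports really do share the same columns. The sketch below assumes you already have the header rows as Python lists; the column names shown are hypothetical examples rather than a guaranteed export format.

```python
def compare_headers(origin_columns, destination_columns):
    """Report columns missing from either export and whether the order matches."""
    origin_set, destination_set = set(origin_columns), set(destination_columns)
    return {
        "missing_in_destination": sorted(origin_set - destination_set),
        "missing_in_origin": sorted(destination_set - origin_set),
        "same_order": list(origin_columns) == list(destination_columns),
    }


# Hypothetical headers; with pandas you could pass list(df.columns) for each export.
origin_columns = ["Address", "Title 1", "H1-1", "Meta Description 1"]
destination_columns = ["Address", "Title 1", "Meta Description 1"]

report = compare_headers(origin_columns, destination_columns)
```

    An empty pair of "missing" lists and a matching order suggest the exports are safe to compare.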

    Pro tip for Mac users: Duplicate your Screaming Frog application in the Applications folder to run simultaneous crawls. This saves significant time when working with large sites.

    Focus on extracting meaningful comparison elements. Standard crawl data works well, but consider using Screaming Frog’s custom extraction feature for unique identifiers like SKU numbers in span tags or specific product attributes via XPath selectors.
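
    As a rough illustration of what an XPath selector for a SKU might look like, the sketch below runs the equivalent of //span[@class='sku'] against a small, well-formed snippet using Python's standard library. The markup and the "sku" class name are assumptions for the example; inspect your own product templates to find the right selector.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed product markup; real pages need an HTML-tolerant parser.
page = """
<html>
  <body>
    <h1>Blue Widget Pro</h1>
    <div class="product-meta">
      <span class="sku">BW-1042</span>
      <span class="brand">Acme</span>
    </div>
  </body>
</html>
"""

root = ET.fromstring(page)
# ElementTree supports a subset of XPath; this mirrors //span[@class='sku'].
sku_node = root.find(".//span[@class='sku']")
sku = sku_node.text.strip() if sku_node is not None else None
```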

    Step 2: Load and Configure the Script

    Access our Google Colab redirect script and begin by running the first cell. This imports the essential Python libraries, including pandas for data manipulation and scikit-learn (sklearn) for TF-IDF processing.

    The beauty of Google Colab lies in its accessibility – no local Python installation required. Everything runs in your browser with Google’s computing power handling the heavy processing.

    Step 3: Upload and Process Crawl Data

    Execute the second cell to upload your origin crawl data. The script prompts you to select which column to use for matching – typically page titles, H1 headings, or meta descriptions work best.

    Repeat this process for your destination crawl data in the fourth cell. Choose matching columns that provide the richest semantic content for comparison. Product descriptions often outperform generic page titles for eCommerce redirects.

    Google Colab interface showing crawl data upload process

    Running the Matching Algorithm

    Cell five executes the core TF-IDF matching process. The algorithm processes your selected columns, creates similarity scores between URLs, and generates preliminary redirect mappings.
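
    For readers curious about what a matching step like this can look like in code, here is a minimal sketch using scikit-learn's TfidfVectorizer and cosine_similarity. It is not the code from our Colab script, and the URLs and titles are invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical crawl rows: (URL, text from the chosen matching column).
origin = [
    ("/p/10432", "Blue Widget Pro 500ml"),
    ("/p/20981", "Red Garden Hose 20m"),
]
destination = [
    ("/products/red-garden-hose-20m", "Red Garden Hose 20m Heavy Duty"),
    ("/products/blue-widget-pro-500ml", "Blue Widget Pro 500ml"),
]

origin_text = [text for _, text in origin]
destination_text = [text for _, text in destination]

# Fit on both sets so old and new pages share one vocabulary and IDF weighting.
vectorizer = TfidfVectorizer()
vectorizer.fit(origin_text + destination_text)
origin_matrix = vectorizer.transform(origin_text)
destination_matrix = vectorizer.transform(destination_text)

# Rows are origin pages, columns are destination pages.
scores = cosine_similarity(origin_matrix, destination_matrix)

redirect_map = []
for row, (old_url, _) in enumerate(origin):
    best = int(scores[row].argmax())
    redirect_map.append(
        (old_url, destination[best][0], round(float(scores[row, best]), 3))
    )
```

    Each entry in redirect_map holds the old URL, the best-scoring new URL, and the similarity score that later serves as a confidence value.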

    The script produces a precision-recall graph showing matching quality. High precision indicates accurate matches, while high recall means the algorithm found matches for most URLs. You want both metrics elevated for optimal results.
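
    If you want to sanity-check these two metrics yourself, one option is to hand-label a small sample of matches and compute precision and recall at a few thresholds. The sketch below assumes that approach; the sample data is invented, and recall is defined here as the share of correct matches that survive the threshold, which may differ from how the graph in the script is built.

```python
def precision_recall(scored_matches, threshold):
    """scored_matches: (score, is_correct) pairs from a hand-checked sample."""
    accepted = [correct for score, correct in scored_matches if score >= threshold]
    total_correct = sum(1 for _, correct in scored_matches if correct)
    true_positives = sum(1 for correct in accepted if correct)
    precision = true_positives / len(accepted) if accepted else 1.0
    recall = true_positives / total_correct if total_correct else 0.0
    return precision, recall


# Hypothetical sample: similarity score and whether a human judged the match correct.
sample = [(0.95, True), (0.90, True), (0.85, False), (0.70, True), (0.40, False)]

curve = {t: precision_recall(sample, t) for t in (0.5, 0.8, 0.9)}
```

    Raising the threshold generally trades recall for precision, so the useful question is where on that curve your migration can afford to sit.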

    Processing time varies based on site size. Our 7,000-page example took approximately 15 minutes to complete – dramatically faster than manual matching while maintaining accuracy levels above 85% in most cases.

    Interpreting Your Results

    The output includes confidence scores for each suggested redirect. High-confidence matches (typically above 0.8) usually require minimal manual review. Lower scores need human verification to prevent incorrect redirects.
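
    One simple way to act on these scores is to split the output into an accept bucket and a review queue. The sketch below assumes the matches are available as (old URL, new URL, score) tuples and uses 0.8 as the cut-off; both the data and the threshold are illustrative and should be tuned per site.

```python
def triage(matches, auto_threshold=0.8):
    """Split (old_url, new_url, score) tuples into accept and review buckets."""
    accepted = [m for m in matches if m[2] >= auto_threshold]
    # Lowest scores first, so the riskiest matches are reviewed before the rest.
    review = sorted((m for m in matches if m[2] < auto_threshold), key=lambda m: m[2])
    return accepted, review


# Hypothetical matcher output.
matches = [
    ("/p/10432", "/products/blue-widget-pro", 0.93),
    ("/p/20981", "/products/red-garden-hose", 0.61),
    ("/p/30007", "/products/green-bucket", 0.34),
]

accepted, review = triage(matches)
```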

    Pay special attention to:

    • Product variations that might cross-match (different colours, sizes)
    • Similar product names in different categories
    • Generic page titles that could create false positives
    • Discontinued products with no suitable destination match
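
    A quick way to surface possible cross-matches from the list above is to look for destination URLs that several old URLs point to. The sketch below is an assumption about how you might do that with (old URL, new URL, score) tuples; the data is invented, and a many-to-one mapping is sometimes exactly what you want, for example when variants are being consolidated.

```python
from collections import defaultdict


def destinations_with_multiple_sources(matches):
    """Group old URLs by suggested destination; keep destinations hit more than once."""
    by_destination = defaultdict(list)
    for old_url, new_url, _score in matches:
        by_destination[new_url].append(old_url)
    return {new: olds for new, olds in by_destination.items() if len(olds) > 1}


# Hypothetical matcher output.
matches = [
    ("/p/10432", "/products/blue-widget-pro", 0.93),
    ("/p/10433", "/products/blue-widget-pro", 0.88),  # e.g. a different size variant
    ("/p/20981", "/products/red-garden-hose", 0.91),
]

collisions = destinations_with_multiple_sources(matches)
```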

    Precision-recall graph showing redirect matching accuracy metrics

    Manual Verification and Quality Control

    Automated matching handles the bulk work, but human oversight remains essential. Review the output systematically, starting with low-confidence matches and working upward.

    Common issues to watch for include category mismatches, where products with similar names belong to different sections, and near-identical product names. For example, the algorithm might match “Blue Widget Pro” with “Blue Widget Standard” when “Blue Widget Pro V2” would be more appropriate.

    Create a verification checklist covering:

    • Content relevance between source and destination
    • Category alignment and URL structure logic
    • Product specifications and key features
    • Internal linking implications

    This quality control phase typically reduces the 233-hour manual process to 20-30 hours of focused review time – roughly a 90% time saving (87-91% across that range) while maintaining redirect accuracy.
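
    The time figures quoted in this article can be checked with a few lines of arithmetic. The inputs below are the numbers from our client scenario (7,000 pages, 2 minutes per redirect, 20-30 hours of review); swap in your own to estimate a different project.

```python
pages = 7_000
minutes_per_redirect = 2

manual_hours = pages * minutes_per_redirect / 60  # about 233.3 hours

# Percentage of manual time saved at each end of the 20-30 hour review range.
savings = {
    review_hours: round((1 - review_hours / manual_hours) * 100, 1)
    for review_hours in (20, 30)
}
```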

    Implementation and Testing

    Once you’ve verified your redirect mappings, implement them through your preferred method – .htaccess files, server-level redirects, or WordPress redirect plugins for CMS-based sites.
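
    If you are on Apache, the verified mapping can be turned into .htaccess rules with a short script. This is a minimal sketch that assumes mod_alias is enabled and that the old URLs have no query strings or spaces; the example.com URLs are placeholders.

```python
from urllib.parse import urlsplit


def htaccess_rules(redirect_map):
    """Build one mod_alias 'Redirect 301' line per (old_url, new_url) pair."""
    rules = []
    for old_url, new_url in redirect_map:
        if old_url == new_url:
            continue  # redirecting a URL to itself would loop
        # Redirect expects a URL-path (starting with /) as its source.
        old_path = urlsplit(old_url).path
        rules.append(f"Redirect 301 {old_path} {new_url}")
    return "\n".join(rules)


# Hypothetical verified mappings.
redirect_map = [
    ("https://example.com/p/10432", "https://example.com/products/blue-widget-pro"),
    ("https://example.com/p/20981", "https://example.com/products/red-garden-hose"),
]

rules = htaccess_rules(redirect_map)
```

    Note that Redirect matches by path prefix, so a rule for /p/10432 also applies to anything beneath that path. If you need exact matches only, RedirectMatch with an anchored pattern is the usual alternative.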

    Test your redirects systematically using tools like Screaming Frog’s redirect checker or Google Search Console’s URL inspection tool. Focus on high-traffic pages first, then work through the entire redirect list.
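
    Alongside those tools, a small script can spot-check that each old URL returns a 301 to the expected destination. The sketch below uses only Python's standard library and is an assumption about how you might structure such a check; the demo at the end uses stubbed responses and placeholder URLs so it runs offline.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_head(url):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    connection_class = (
        http.client.HTTPSConnection
        if parts.scheme == "https"
        else http.client.HTTPConnection
    )
    connection = connection_class(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        connection.request("HEAD", path)
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()


def check_redirect(old_url, expected_url, fetch=fetch_head):
    """Return 'ok' or a short description of what went wrong for one mapping."""
    status, location = fetch(old_url)
    if status != 301:
        return f"expected 301, got {status}"
    # Location may be relative, so resolve it against the old URL before comparing.
    if urljoin(old_url, location or "") != expected_url:
        return f"redirects to {location}, expected {expected_url}"
    return "ok"


# Stubbed responses so the example runs offline; omit `fetch=` to test a live site.
fake_responses = {
    "https://example.com/p/10432": (301, "/products/blue-widget-pro"),
    "https://example.com/p/20981": (302, "/products/red-garden-hose"),
}
results = {
    old: check_redirect(old, expected, fetch=fake_responses.__getitem__)
    for old, expected in [
        ("https://example.com/p/10432", "https://example.com/products/blue-widget-pro"),
        ("https://example.com/p/20981", "https://example.com/products/red-garden-hose"),
    ]
}
```

    This only checks the first hop. Redirect chains, where the destination itself redirects again, would need a loop that follows each Location header in turn.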

    Monitor key SEO metrics post-implementation including crawl errors, ranking positions, and organic traffic patterns. Properly implemented redirects should maintain ranking strength while supporting your new URL structure.

    Redirect testing workflow showing implementation verification steps

    Advanced Customisation Options

    The script accepts modifications for specific use cases. You can adjust similarity thresholds, weight different content elements, or incorporate additional matching criteria like product categories or custom fields.
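
    One low-effort way to weight content elements is to repeat a field's text before it is vectorised, so its terms count more heavily. The sketch below shows that idea under stated assumptions: the column names and weights are hypothetical, and this is not necessarily how our script implements weighting.

```python
def build_match_text(row, weights):
    """Repeat each field's text according to its weight so it counts more in TF-IDF."""
    parts = []
    for field, weight in weights.items():
        value = (row.get(field) or "").strip()
        if value:
            parts.extend([value] * weight)
    return " ".join(parts)


# Hypothetical column names and weights; tune both for your own crawl exports.
weights = {"title": 2, "h1": 1, "category": 1}
row = {"title": "Blue Widget Pro", "h1": "Blue Widget Pro 500ml", "category": "Widgets"}

match_text = build_match_text(row, weights)
```

    Adding the category to the matching text is also a cheap way to discourage matches between similarly named products in different sections.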

    For complex migrations involving technical SEO considerations, consider combining multiple data sources. Merge Analytics data showing page performance with crawl data for smarter matching decisions.
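
    As a sketch of what merging performance data can look like, the function below attaches session counts to each suggested redirect and sorts the list so the highest-traffic pages are reviewed first. The field names and figures are invented, and it assumes your analytics export can be reduced to a URL-to-sessions lookup.

```python
def prioritise_by_traffic(matches, sessions_by_url):
    """Attach sessions to each match and sort so high-traffic pages come first."""
    enriched = [
        {
            "old_url": old,
            "new_url": new,
            "score": score,
            "sessions": sessions_by_url.get(old, 0),  # 0 when analytics has no data
        }
        for old, new, score in matches
    ]
    return sorted(enriched, key=lambda m: m["sessions"], reverse=True)


# Hypothetical matcher output and analytics export.
matches = [
    ("/p/10432", "/products/blue-widget-pro", 0.93),
    ("/p/20981", "/products/red-garden-hose", 0.61),
    ("/p/30007", "/products/green-bucket", 0.34),
]
sessions_by_url = {"/p/20981": 4200, "/p/10432": 310}

review_order = prioritise_by_traffic(matches, sessions_by_url)
```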

    The Google Colab environment allows easy experimentation with different approaches. Clone the script, test variations, and find the optimal configuration for your specific site structure and content patterns.

    Results and Performance Impact

    Our client’s URL structure migration maintained 94% of organic traffic through the transition period. Properly matched redirects preserved ranking signals while supporting their improved user experience goals.

    The automated approach delivered:

    • Time savings: 200+ hours reduced to 25 hours total
    • Accuracy improvement: 87% correct matches vs estimated 70% with manual processing
    • Ranking preservation: 94% traffic retention during migration
    • Error reduction: Systematic approach eliminated common redirect mistakes

    These results demonstrate how strategic SEO automation can improve both efficiency and outcomes for complex technical projects.

    Frequently Asked Questions

    How accurate is automated redirect matching compared to manual methods?

    Our Python script achieves 85-90% accuracy in most cases, often outperforming manual methods which suffer from fatigue-related errors. However, human verification remains essential for quality control.

    Can this script work with non-eCommerce websites?

    Yes, the script works with any website type. It matches based on content elements like titles, headings, and descriptions, making it suitable for blogs, corporate sites, and service-based businesses.

    Do I need Python programming experience to use this tool?

    No programming experience is required. The Google Colab script runs entirely in your browser with simple point-and-click operations for uploading files and selecting matching columns.

    How long does the matching process take for large websites?

    Processing time varies by site size. A 1,000-page site typically takes 2-5 minutes, while 10,000+ pages might need 15-30 minutes. This is still dramatically faster than manual matching.

    What crawling tools work best with this script?

    Screaming Frog and Sitebulb provide the best data export formats. Both crawls must use identical column structures and be exported as .xlsx files for proper script compatibility.

    Can I customise the matching criteria for specific content types?

    Yes, the script allows customisation of similarity thresholds and content weighting. You can prioritise certain page elements or adjust accuracy requirements based on your specific needs.
