Redirects are one of SEO’s most tedious yet critical tasks. One mistake can cost you rankings, traffic, and revenue. The stakes get even higher when you’re migrating thousands of pages or rebuilding an entire eCommerce site structure.
We faced this exact challenge when a client wanted to overhaul their eCommerce SEO strategy by switching from SKU-based URLs to product name URLs. With over 7,000 products and no WordPress backend to simplify the process, manual redirect mapping wasn’t realistic.
After collaborating with Mark Williams-Cook from Candour, who’d encountered similar challenges, we built an automated solution: a Python script that uses TF-IDF text similarity to match old URLs to new ones based on their content, turning weeks of manual work into hours of automated processing.

Why Automated Redirect Matching Matters
Traditional redirect planning relies on manual URL mapping – a process that’s error-prone and time-consuming. For small sites, this works fine. But when you’re dealing with enterprise-level migrations, the numbers simply don’t scale.
Consider our client’s scenario: 7,000 product pages, each requiring individual assessment and matching. At 2 minutes per redirect (conservative estimate), that’s 233 hours of work. Factor in common SEO mistakes from fatigue and repetitive work, and you’ve got a recipe for ranking disasters.
Our Python solution addresses these pain points by using the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to compare page elements like titles, headings, and meta descriptions. Processing thousands of comparisons at once, the script identifies content similarity patterns that a human reviewer would miss.
How TF-IDF Powers URL Matching
TF-IDF measures how important specific words are within documents relative to a larger collection. In our redirect context, it compares page elements to find the closest content matches between old and new URLs.
The algorithm assigns scores based on:
- Term Frequency: How often specific words appear in page titles or descriptions
- Inverse Document Frequency: How unique those words are across your entire site
- Cosine Similarity: the overall closeness between the TF-IDF vectors of the source and destination pages
This approach works particularly well for eCommerce sites where product names, categories, and descriptions provide rich matching data. Even when URL structures change completely, content elements remain consistent enough for accurate automated matching.
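As a minimal illustration of the idea, here’s how TF-IDF scores the similarity between an old page title and two candidate new titles using scikit-learn (the titles themselves are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

titles = [
    "Blue Widget Pro 500ml | Acme Store",  # old SKU-based page's title
    "Blue Widget Pro 500ml",               # candidate new page: same product
    "Garden Hose Reel 20m",                # candidate new page: unrelated
]

# Build TF-IDF vectors for all three titles over a shared vocabulary
vectors = TfidfVectorizer().fit_transform(titles)

# Score the old title (row 0) against both candidates (rows 1-2)
scores = cosine_similarity(vectors[0], vectors[1:])
# The matching product scores far higher than the unrelated page,
# even though the titles aren't identical
```

Because the unrelated title shares no terms with the old one, its score is effectively zero, while the true match scores highly – that gap is what the script exploits at scale.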

Setting Up Your Redirect Automation Workflow
Step 1: Crawl Both Website Versions
Start by crawling your origin and destination sites using Screaming Frog or Sitebulb. Export both crawls as .xlsx files with identical column structures. This consistency ensures the script can properly compare data points between versions.
Pro tip for Mac users: Duplicate your Screaming Frog application in the Applications folder to run simultaneous crawls. This saves significant time when working with large sites.
Focus on extracting meaningful comparison elements. Standard crawl data works well, but consider using Screaming Frog’s custom extraction feature for unique identifiers like SKU numbers in span tags or specific product attributes via XPath selectors.
Step 2: Load and Configure the Script
Access our Google Colab redirect script and begin by running the first cell. This imports essential Python libraries including pandas for data manipulation and sklearn for TF-IDF processing.
The beauty of Google Colab lies in its accessibility – no local Python installation required. Everything runs in your browser with Google’s computing power handling the heavy processing.
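The first cell’s imports look something like this (the script’s exact contents may differ):

```python
import pandas as pd                                          # crawl-data wrangling
from sklearn.feature_extraction.text import TfidfVectorizer  # text -> TF-IDF vectors
from sklearn.metrics.pairwise import cosine_similarity       # pairwise similarity scores
```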
Step 3: Upload and Process Crawl Data
Execute the second cell to upload your origin crawl data. The script prompts you to select which column to use for matching – typically page titles, H1 headings, or meta descriptions work best.
Repeat this process for your destination crawl data in the fourth cell. Choose matching columns that provide the richest semantic content for comparison. Product descriptions often outperform generic page titles for eCommerce redirects.
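Under the hood, these steps amount to loading each crawl into a DataFrame and keeping only rows with usable text in the chosen column. In Colab you would load the real exports with `pd.read_excel("origin_crawl.xlsx")`; here a tiny in-memory frame stands in for one, and the column names are assumptions based on typical Screaming Frog exports:

```python
import pandas as pd

# Stand-in for pd.read_excel("origin_crawl.xlsx")
origin = pd.DataFrame({
    "Address": ["https://example.com/p/SKU-10442", "https://example.com/p/SKU-10443"],
    "Title 1": ["Blue Widget Pro 500ml", None],  # second page has no title
})

match_column = "Title 1"  # or "H1-1", "Meta Description 1", etc.

# Drop rows with no text in the chosen column so the vectoriser
# isn't fed NaN values
origin = origin.dropna(subset=[match_column])
```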

Running the Matching Algorithm
Cell five executes the core TF-IDF matching process. The algorithm processes your selected columns, creates similarity scores between URLs, and generates preliminary redirect mappings.
The script produces a precision-recall graph showing matching quality. High precision indicates accurate matches, while high recall means the algorithm found matches for most URLs. You want both metrics high for reliable results.
Processing time varies based on site size. Our 7,000-page example took approximately 15 minutes to complete – dramatically faster than manual matching while maintaining accuracy levels above 85% in most cases.
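The matching cell boils down to something like the following sketch: vectorise the chosen column from both crawls, score every origin/destination pair, and keep the best match with its similarity score. The DataFrames and column names are illustrative, not the script’s actual internals:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

origin = pd.DataFrame({
    "Address": ["/p/SKU-10442", "/p/SKU-10443"],
    "Title 1": ["Blue Widget Pro 500ml", "Red Widget Standard 250ml"],
})
destination = pd.DataFrame({
    "Address": ["/widgets/blue-widget-pro", "/widgets/red-widget-standard"],
    "Title 1": ["Blue Widget Pro 500ml", "Red Widget Standard 250ml"],
})

# Fit one vocabulary across both crawls so scores are comparable
vectorizer = TfidfVectorizer().fit(pd.concat([origin["Title 1"], destination["Title 1"]]))
scores = cosine_similarity(
    vectorizer.transform(origin["Title 1"]),
    vectorizer.transform(destination["Title 1"]),
)

# For each origin URL, keep the highest-scoring destination URL
redirects = pd.DataFrame({
    "source": origin["Address"],
    "target": destination["Address"].iloc[scores.argmax(axis=1)].values,
    "confidence": scores.max(axis=1),  # cosine similarity of the best match
})
```

The `confidence` column is what drives the review stage described below: it tells you which mappings the algorithm is sure about and which need a human eye.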
Interpreting Your Results
The output includes confidence scores for each suggested redirect. High-confidence matches (typically above 0.8) usually require minimal manual review. Lower scores need human verification to prevent incorrect redirects.
Pay special attention to:
- Product variations that might cross-match (different colours, sizes)
- Similar product names in different categories
- Generic page titles that could create false positives
- Discontinued products with no suitable destination match
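Triaging the output by confidence score is a simple filter. The 0.8 cut-off below mirrors the guideline above, and the column names are assumptions:

```python
import pandas as pd

redirects = pd.DataFrame({
    "source": ["/p/SKU-10442", "/p/SKU-10443", "/p/SKU-10999"],
    "target": ["/widgets/blue-widget-pro", "/widgets/red-widget-standard", "/widgets/blue-widget-pro"],
    "confidence": [0.93, 0.81, 0.34],
})

# High-confidence matches get a light-touch review;
# everything else goes into the manual verification queue
auto_approve = redirects[redirects["confidence"] >= 0.8]
needs_review = redirects[redirects["confidence"] < 0.8]
```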

Manual Verification and Quality Control
Automated matching handles the bulk work, but human oversight remains essential. Review the output systematically, starting with low-confidence matches and working upward.
Common issues to watch for include category mismatches where products with similar names belong to different sections. The algorithm might match “Blue Widget Pro” with “Blue Widget Standard” when “Blue Widget Pro V2” would be more appropriate.
Create a verification checklist covering:
- Content relevance between source and destination
- Category alignment and URL structure logic
- Product specifications and key features
- Internal linking implications
This quality control phase typically reduces the 233-hour manual process to 20-30 hours of focused review time – a 90% efficiency gain while maintaining redirect accuracy.
Implementation and Testing
Once you’ve verified your redirect mappings, implement them through your preferred method – .htaccess files, server-level redirects, or WordPress redirect plugins for CMS-based sites.
Test your redirects systematically using tools like Screaming Frog’s redirect checker or Google Search Console’s URL inspection tool. Focus on high-traffic pages first, then work through the entire redirect list.
Monitor key SEO metrics post-implementation including crawl errors, ranking positions, and organic traffic patterns. Properly implemented redirects should maintain ranking strength while supporting your new URL structure.

Advanced Customisation Options
The script accepts modifications for specific use cases. You can adjust similarity thresholds, weight different content elements, or incorporate additional matching criteria like product categories or custom fields.
For complex migrations involving technical SEO considerations, consider combining multiple data sources. Merge Analytics data showing page performance with crawl data for smarter matching decisions.
The Google Colab environment allows easy experimentation with different approaches. Clone the script, test variations, and find the optimal configuration for your specific site structure and content patterns.
Results and Performance Impact
Our client’s URL structure migration maintained 94% of organic traffic through the transition period. Properly matched redirects preserved ranking signals while supporting their improved user experience goals.
The automated approach delivered:
- Time savings: 200+ hours reduced to 25 hours total
- Accuracy improvement: 87% correct matches vs estimated 70% with manual processing
- Ranking preservation: 94% traffic retention during migration
- Error reduction: Systematic approach eliminated common redirect mistakes
These results demonstrate how strategic SEO automation can improve both efficiency and outcomes for complex technical projects.
Frequently Asked Questions
How accurate is automated redirect matching compared to manual methods?
Our Python script achieves 85-90% accuracy in most cases, often outperforming manual methods which suffer from fatigue-related errors. However, human verification remains essential for quality control.
Can this script work with non-eCommerce websites?
Yes, the script works with any website type. It matches based on content elements like titles, headings, and descriptions, making it suitable for blogs, corporate sites, and service-based businesses.
Do I need Python programming experience to use this tool?
No programming experience required. The Google Colab script runs entirely in your browser with simple point-and-click operations for uploading files and selecting matching columns.
How long does the matching process take for large websites?
Processing time varies by site size. A 1,000-page site typically takes 2-5 minutes, while 10,000+ pages might need 15-30 minutes. This is still dramatically faster than manual matching.
What crawling tools work best with this script?
Screaming Frog and Sitebulb provide the best data export formats. Both crawls must use identical column structures and be exported as .xlsx files for proper script compatibility.
Can I customise the matching criteria for specific content types?
Yes, the script allows customisation of similarity thresholds and content weighting. You can prioritise certain page elements or adjust accuracy requirements based on your specific needs.


