SEO Fundamentals

XML Sitemap and robots.txt: Configuring Them Correctly

6 min

An XML sitemap lists your priority pages so that Googlebot can discover them more easily. The robots.txt file controls which sections of the site a crawler may crawl. The two files are complementary and should be kept up to date to avoid indexing errors.

The sitemap and robots.txt are the two most fundamental SEO configuration files. When poorly configured, they can inadvertently exclude key pages or waste crawl budget on useless URLs.

XML sitemap: structure and best practices

An XML sitemap lists the URLs you want to see indexed, optionally with metadata (modification date, update frequency, priority). Google reads this metadata but treats it as a hint and does not follow it verbatim.
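As an illustration of the structure described above, here is a minimal sketch that builds a sitemap with Python's standard library. The `build_sitemap` helper and the `example.com` URLs are hypothetical; only `loc` and the optional `lastmod` are emitted.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build a sitemap from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # Optional metadata: last modification date (W3C date format).
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical URLs, for illustration only.
xml_text = build_sitemap([
    ("https://www.example.com/", "2025-01-15"),
    ("https://www.example.com/blog/first-post", None),
])
print(xml_text)
```

A real generator would usually also write an XML declaration and save the result to a file served at a stable URL.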

When a sitemap would exceed 50,000 URLs or 50 MB uncompressed, create a sitemap index that points to several thematic sitemap files (articles, products, categories).
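The splitting step can be sketched as follows. This is a simplified example: it enforces only the URL count, not the file size limit, and the file naming scheme (`sitemap-products-N.xml`) is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file URL limit of the sitemap protocol


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a URL list into chunks that each fit in one sitemap file."""
    urls = list(urls)
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs):
    """Build a sitemap index pointing at the given child sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="unicode")


# Hypothetical catalogue of 120,000 product URLs.
product_urls = [f"https://www.example.com/products/{i}" for i in range(120_000)]
chunks = chunk_urls(product_urls)

# One child sitemap per chunk; the naming scheme is an assumption.
child_locs = [
    f"https://www.example.com/sitemap-products-{n}.xml"
    for n in range(1, len(chunks) + 1)
]
index_xml = build_sitemap_index(child_locs)
print(len(chunks), "child sitemaps")
```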

Include only URLs that are canonical, indexable and return a 200 status code.
Exclude noindex pages, redirects and parameterized URLs.
Submit your sitemap in Search Console and reference it in robots.txt.
Update the sitemap automatically with every new publication.
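The inclusion and exclusion rules above can be expressed as a small filter. The sketch below assumes a hypothetical `Page` record produced by your own crawl or CMS export. The check for `?` in the URL is a deliberately crude heuristic for parameterized pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; field names are illustrative."""
    url: str
    status: int
    canonical: Optional[str] = None  # None means the page is self-canonical
    noindex: bool = False


def is_sitemap_eligible(page: Page) -> bool:
    """Apply the rules: 200 status, indexable, canonical, no parameters."""
    if page.status != 200:
        return False  # redirects and errors stay out
    if page.noindex:
        return False
    if page.canonical is not None and page.canonical != page.url:
        return False  # canonicalized to another URL
    if "?" in page.url:
        return False  # crude heuristic for parameterized URLs
    return True


pages = [
    Page("https://www.example.com/guide", 200),
    Page("https://www.example.com/old-guide", 301),
    Page("https://www.example.com/private", 200, noindex=True),
    Page("https://www.example.com/guide?sort=asc", 200,
         canonical="https://www.example.com/guide"),
]
eligible = [p.url for p in pages if is_sitemap_eligible(p)]
print(eligible)
```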

The robots.txt file: directives and limits

The robots.txt file sits at the root of the domain and uses a simple syntax of Allow and Disallow rules grouped by user-agent. It tells Googlebot which parts of the site not to crawl, but it does not guarantee exclusion from the index.

A page blocked by robots.txt can still appear in results if external links point to it. For complete exclusion, use a noindex tag rather than robots.txt, and leave the page crawlable so that Googlebot can actually read the tag.
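To see how per-user-agent rules are evaluated, the following sketch feeds a hypothetical robots.txt to Python's standard `urllib.robotparser`. The parser answers only "may this agent crawl this URL?" and says nothing about whether a URL ends up indexed. It is also not guaranteed to match Googlebot's own matching logic in every case.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one generic group and one Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

User-agent: Googlebot
Disallow: /staging/
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A crawler follows the group that names it, falling back to "*" otherwise.
checks = [
    ("Googlebot", "https://www.example.com/staging/page"),
    ("Googlebot", "https://www.example.com/admin/"),
    ("OtherBot", "https://www.example.com/admin/"),
]
for agent, url in checks:
    print(agent, url, rp.can_fetch(agent, url))
```

Note that in this example the Googlebot group does not repeat the `/admin/` rule, so Googlebot is not blocked there. Rules meant for every crawler have to be repeated in each specific group.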

Block administration, staging and test folders.
Block internal search URLs, which can generate thousands of variations.
Never block the CSS and JS files needed to render pages.
Reference the sitemap URL at the bottom of the robots.txt file.
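The recommendations above can be combined into a generated file and then sanity-checked. This is a minimal sketch: the folder names, the `/search` path and the sitemap URL are placeholders, and `build_robots_txt` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(blocked_paths, sitemap_url):
    """Compose a robots.txt with one generic group and a Sitemap line."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    lines += ["", f"Sitemap: {sitemap_url}"]  # sitemap reference at the bottom
    return "\n".join(lines) + "\n"


# Placeholder paths: adapt them to the real site structure.
robots_txt = build_robots_txt(
    ["/admin/", "/staging/", "/test/", "/search"],
    "https://www.example.com/sitemap.xml",
)
print(robots_txt)

# Verify the result: rendering assets stay crawlable, search pages do not.
rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
print(rp.can_fetch("Googlebot", "https://www.example.com/assets/site.css"))
print(rp.can_fetch("Googlebot", "https://www.example.com/search?q=shoes"))
print(rp.site_maps())
```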

Critical errors and how to avoid them

The most serious error is accidentally blocking the entire site with 'Disallow: /' in robots.txt, typically after a migration or when a staging configuration was not cleaned up. Verify this file as a priority after every deployment.
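A post-deployment check for this error could look like the sketch below. It relies on the prefix matching of Python's standard parser, which does not implement wildcard extensions, so treat it as a first-line guard rather than a full validator. The function name `blocks_entire_site` is hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocks_entire_site(robots_txt: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the site root is disallowed for the given agent."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    # With prefix matching, "/" is only blocked by a blanket rule.
    return not rp.can_fetch(user_agent, "/")


staging_leftover = "User-agent: *\nDisallow: /\n"
production = "User-agent: *\nDisallow: /admin/\n"

print(blocks_entire_site(staging_leftover))
print(blocks_entire_site(production))
```

In a deployment pipeline, the text would come from the live robots.txt, and a `True` result would fail the build.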

Including URLs that do not return 200 (404 errors, 301 redirects) in the sitemap is a common error. It signals a lack of rigor to Google and wastes crawl budget on resources that no longer exist or have moved.
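One way to catch such entries is to compare the sitemap against observed status codes. The sketch below takes the statuses as a plain dictionary, assumed to come from your own crawl, so that it stays free of network calls. The sample sitemap and URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap content, for illustration only.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc></url>
  <url><loc>https://www.example.com/old-page</loc></url>
  <url><loc>https://www.example.com/deleted</loc></url>
</urlset>
""".strip()


def sitemap_locs(xml_text):
    """Extract every <loc> value from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("sm:url/sm:loc", NS)]


def non_200_entries(locs, observed_status):
    """Return {url: status} for entries whose status is not 200.

    URLs missing from observed_status are reported with status None.
    """
    return {
        url: observed_status.get(url)
        for url in locs
        if observed_status.get(url) != 200
    }


# Assumed crawl results (status code per URL).
observed = {
    "https://www.example.com/": 200,
    "https://www.example.com/old-page": 301,
    "https://www.example.com/deleted": 404,
}
problems = non_200_entries(sitemap_locs(SAMPLE_SITEMAP), observed)
print(problems)
```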

During SEO audits, 15 to 40% of sites show inconsistencies between their sitemap and their actually indexable pages, often because of insufficient maintenance after site updates.

2025-2026 sector study on technical SEO audits
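Inconsistencies of this kind can be surfaced with a simple set comparison. The sketch below assumes you already have two URL lists, one extracted from the sitemap and one of pages confirmed indexable by a crawl. The `compare` helper and the URLs are hypothetical.

```python
def compare(sitemap_urls, indexable_urls):
    """Report URLs present on one side only."""
    sitemap_set, indexable_set = set(sitemap_urls), set(indexable_urls)
    return {
        # Indexable pages that the sitemap forgot to list.
        "missing_from_sitemap": sorted(indexable_set - sitemap_set),
        # Sitemap entries that are not actually indexable.
        "not_indexable": sorted(sitemap_set - indexable_set),
    }


report = compare(
    sitemap_urls=[
        "https://www.example.com/",
        "https://www.example.com/retired-offer",
    ],
    indexable_urls=[
        "https://www.example.com/",
        "https://www.example.com/new-article",
    ],
)
print(report)
```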

FAQ

Should priority and frequency be indicated in the sitemap?

These tags (priority and changefreq) are largely ignored by Google, which relies on its own signals to estimate crawl frequency. Their presence is not harmful, and their absence is not a problem either.

How long does Google take to read a submitted sitemap?

After submission in Search Console, Google generally reads the sitemap within 24 to 72 hours. Discovery of new URLs and actual indexing take longer, depending on the site's authority.

Does robots.txt work for all search engines?

All robots that respect the standard follow robots.txt. Malicious robots (scrapers, non-conforming crawlers), however, ignore it. Robots.txt is therefore a crawl management tool, not a security tool.