When it comes to SEO, controlling how search engines interact with our website is crucial.
Two of the most important tools for this are the disallow and noindex directives.
Let’s explore their differences, when to use them, and how to implement them effectively to optimize website accessibility and ensure the privacy and security of our content.
What is Disallow?
Disallow is a directive used in a website’s robots.txt file to instruct search engines not to crawl certain pages or directories.
In other words, disallow prevents search engine crawlers like Googlebot from accessing specific parts of our website.
How does Disallow work?
To use disallow, we need to create or edit the robots.txt file, which must be located in the root directory of our website.
Within this file, we can specify which areas of the website we want to block from crawlers.
Example:
User-agent: *
Disallow: /admin/
Disallow: /privado/
In this example, we are telling search engines not to crawl the /admin/ and /privado/ directories.
This can be useful for protecting sensitive areas or areas that are not relevant for indexing.
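As a sanity check, Python's standard library can simulate how a crawler interprets these rules. A minimal sketch, assuming the example robots.txt above and a placeholder example.com domain:

```python
from urllib import robotparser

# The same rules as the example above; in practice the parser would
# read them from https://example.com/robots.txt
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /privado/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot matches the wildcard user-agent, so both directories are blocked
print(parser.can_fetch("Googlebot", "https://example.com/admin/settings"))    # False
print(parser.can_fetch("Googlebot", "https://example.com/privado/doc.html"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post-1"))       # True
```

Google Search Console performs the same kind of check against your live robots.txt; this script is just a quick local equivalent.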
What is Noindex?
Noindex is a directive used to instruct search engines not to index a page, even if it is crawled. This means that the page will not appear in search results.
How does Noindex work?
To use noindex, we can add a meta tag in the HTML head of the page or set an HTTP header (X-Robots-Tag). This approach allows search engines to crawl the page, but prevents it from being included in the search index.
Example (meta tag):
<meta name="robots" content="noindex">
By including this meta tag in our page’s HTML, we are instructing search engines not to index that specific page.
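The HTTP-header route can be sketched with Python's built-in http.server. This is an illustrative toy server, not production configuration; in practice the X-Robots-Tag header is usually set in the web server or CMS settings:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NoindexHandler(BaseHTTPRequestHandler):
    """Serves every response with an X-Robots-Tag: noindex header."""

    def do_GET(self):
        self.send_response(200)
        # Same effect as <meta name="robots" content="noindex">, but the
        # header also works for non-HTML files such as PDFs and images
        self.send_header("X-Robots-Tag", "noindex")
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        self.wfile.write(b"<html><body>Private report</body></html>")

    def log_message(self, *args):
        pass  # silence request logging for the demo

# Bind to port 0 so the OS picks a free port, and serve in the background
server = HTTPServer(("127.0.0.1", 0), NoindexHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/report.pdf"
response = urllib.request.urlopen(url)
print(response.headers["X-Robots-Tag"])  # noindex
server.shutdown()
```

The header form is the only option for resources that have no HTML head to carry a meta tag.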
Main differences between Disallow and Noindex
While both directives control how search engines interact with our site, disallow and noindex have different purposes.
- Disallow: Prevents crawling of specific pages or directories. Pages may still be indexed if found through other means (such as external links).
- Noindex: Allows crawling but prevents indexing. The page will not appear in search results even if it is crawled.
Practical Examples of Use
Let’s explore some scenarios where disallow and noindex can be used effectively.
Scenario 1: Site Administration Area
We at iWEBAPP have a website with an administration area (/admin/) that we don’t want to be crawled or indexed. In this case, we can use disallow to prevent crawling.
User-agent: *
Disallow: /admin/
This prevents crawlers from accessing the admin area, protecting our internal settings and sensitive data.
Scenario 2: Private Content Pages
If we have private content pages that should not appear in search results, we use noindex.
HTML code:
<meta name="robots" content="noindex">
By adding this meta tag to private pages, we ensure that they are not indexed by search engines, maintaining the privacy of the content.
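To audit pages for this tag, a small parser built on Python's standard library can flag any page carrying a robots noindex directive. The class name and sample HTML below are illustrative:

```python
from html.parser import HTMLParser

class NoindexDetector(HTMLParser):
    """Scans HTML for <meta name="robots" content="...noindex...">."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Attribute values keep their original case, so normalize them
        if attrs.get("name", "").lower() == "robots":
            if "noindex" in attrs.get("content", "").lower():
                self.noindex = True

page = '<html><head><meta name="robots" content="noindex"></head><body>Private</body></html>'
detector = NoindexDetector()
detector.feed(page)
print(detector.noindex)  # True
```

Running such a check across all private pages after a deploy catches cases where a template change silently dropped the tag.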
Scenario 3: Strategic Combination
To keep crawlers out of a directory such as /confidencial/, we can combine disallow and noindex.
User-agent: *
Disallow: /confidencial/
HTML code:
<meta name="robots" content="noindex">
Keep an important caveat in mind: because disallow blocks crawling, search engines will never read the noindex tag on these pages. The combination keeps crawlers out of /confidencial/, but a page there that is already indexed (for example, via external links) may remain in search results. In that case, allow crawling temporarily so the noindex tag can be seen, or use the Google Search Console removal tool.
When to Use Disallow and Noindex?
Knowing when to use disallow and noindex is essential for an effective SEO strategy.
When to Use Disallow:
- To block crawlers: When we want to prevent search engines from accessing certain areas of the website.
- To protect sensitive areas: Such as administration pages or internal directories that should not be publicly accessible.
- To control website accessibility: Preventing crawling of parts of the site that are not relevant to search engines, conserving your site’s crawl budget.
When to Use Noindex:
- To prevent indexing: When we want a page not to appear in search results, but it can still be crawled.
- To manage page visibility: Controlling which pages users should find in search results.
- To manage duplicate content: Preventing similar pages from cannibalizing search results.
Which one to use to prevent your site from appearing on Google?
If you want to ensure that your website does not appear on Google, the best option is the noindex tag.
This is because:
- If you use only disallow, Google won’t crawl the page, but it may still appear in search results if other sites link to it.
- If you use both disallow and noindex, Google will not crawl the page and therefore will never see the noindex tag. In that case, the noindex directive has no effect.
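This interaction can be checked programmatically. A sketch using Python's standard library, where the helper name and URLs are hypothetical:

```python
from urllib import robotparser

def noindex_is_visible(robots_rules: str, url: str) -> bool:
    """Return True if Googlebot can crawl the URL and therefore see a
    noindex tag on the page; False means the tag would never be read."""
    checker = robotparser.RobotFileParser()
    checker.parse(robots_rules.splitlines())
    return checker.can_fetch("Googlebot", url)

rules = "User-agent: *\nDisallow: /private/\n"

# A noindex tag on a disallowed page is invisible to Google:
print(noindex_is_visible(rules, "https://example.com/private/page"))  # False
# On a crawlable page, the tag will be seen and honored:
print(noindex_is_visible(rules, "https://example.com/old-page"))      # True
```

Running this over every page that carries noindex quickly surfaces the disallow-plus-noindex conflict described above.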
How to Deindex a Site Already Indexed on Google?
If your site already appears in search results and you want to remove it, we recommend using the Google Search Console Removal Tool.
This is more effective than just adding the noindex tag, as it removes the page from search results more quickly.
For more details on how to remove or deindex a URL from Google, see the specific article on how to remove a URL from Google.
Common Mistakes to Avoid
When using disallow and noindex, some common mistakes can compromise the effectiveness of these directives:
- Blocking the entire site with disallow: Including Disallow: / in the robots.txt file prevents the entire site from being crawled, which is generally not desirable.
- Forgetting to remove noindex: If we put noindex on pages that we want indexed in the future, we need to remember to remove the meta tag.
- Not testing the settings: It is important to test our robots.txt and noindex settings to ensure they are working as expected. Tools like Google Search Console can help with this.
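Python's urllib.robotparser makes the first mistake easy to demonstrate: a single Disallow: / line blocks every URL on the site (example.com is a placeholder):

```python
from urllib import robotparser

# A robots.txt consisting of "Disallow: /" blocks the whole site --
# often an accident left over from a staging environment
rules = "User-agent: *\nDisallow: /\n"
blocker = robotparser.RobotFileParser()
blocker.parse(rules.splitlines())

print(blocker.can_fetch("Googlebot", "https://example.com/"))          # False
print(blocker.can_fetch("Googlebot", "https://example.com/any/page"))  # False
```

A check like this in a deployment pipeline can refuse to publish a robots.txt that blocks the home page.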
Additional Tools and Resources
To assist in implementing and monitoring disallow and noindex, we can use several tools and resources:
- Google Search Console: Lets us test our robots.txt file and check whether crawlers are following the directives correctly. We can also use its removal tool to deindex specific URLs from Google. See our guide How do I know if my website is indexed on Google? for more details.
- Screaming Frog SEO Spider: A tool that crawls our website and helps us identify pages that are blocked or marked noindex.
- Yoast SEO (WordPress plugin): A plugin that makes it easy to add noindex directives and edit the robots.txt file.
Why doesn’t my website appear on Google?
There are a number of reasons why your site may not appear on Google, from incorrect disallow and noindex settings to issues with content quality. If your site isn’t appearing in search results, review these guidelines and other SEO practices to identify potential issues. Check out our article on Why isn’t my site appearing on Google? for more information.
How long does it take for the site to be indexed?
The time it takes for a website to be indexed can vary depending on a number of factors, including how often Google crawls it and the quality of the content. In general, it can take anywhere from a few days to several weeks for a new website to be indexed. To speed up the process, follow our tips on How long does it take for a website to be indexed by Google.
SEO Consulting
We at iWEBAPP understand that managing the indexing and crawling of a website can be complex.
We offer SEO consulting to help you implement the best SEO techniques to index your website on Google and optimize your online presence.
Conclusion
We at iWEBAPP know that controlling how search engines interact with our website is crucial to a successful SEO strategy. The disallow and noindex directives are powerful tools that, when used correctly, can improve website accessibility, ensure content privacy, and optimize visibility in search results.
Remember, disallow prevents crawling, while noindex prevents indexing. Using these directives strategically will help protect sensitive areas of our site and effectively manage page visibility. If you need help implementing these settings, we are available to provide expert support.
Need help with indexing your website?
Get in touch
FREQUENTLY ASKED QUESTIONS
What is disallow and what is it for?
Disallow is a directive used in the robots.txt file to prevent search engines from crawling certain pages or directories on your site. It is useful for protecting sensitive or irrelevant areas from being crawled.
What is noindex and when should I use it?
Noindex is a meta tag that instructs search engines not to include a page in search results, even if it is crawled. Use noindex when you don’t want a specific page to appear in Google search results.
Can I use disallow and noindex together?
While it is possible, it is not recommended to use disallow and noindex together to prevent a page from appearing in Google. If disallow blocks crawling, Google will not see the noindex tag, and the page may still appear in search results if other sites link to it.
How can I remove my website from Google search results?
If your site is already indexed and you want to remove it, the most effective way is to use the Google Search Console removal tool. This tool allows you to request the removal of specific URLs from Google search results.