Robots Exclusion Guide

To exclude individual pages:

Include the following meta tag in the page's HTML, between the <head> and </head> tags:

<meta name="robots" content="noindex, nofollow">

This prevents crawlers (robots) from indexing the page and from following any links on it. If the page has already been indexed, it will be removed from the index the next time the Search UNM bot crawls the page.

Removing this tag allows the page to be indexed again the next time the Search UNM bot crawls it.
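To confirm that a page actually carries the exclusion tag, it can help to parse the page and look for it. The following is a minimal sketch using only the Python standard library; the class name RobotsMetaFinder, the helper robots_directives, and the sample page are illustrative assumptions, not part of Search UNM.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None if omitted.
        attr_map = {name.lower(): (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() == "robots":
            for part in attr_map.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def robots_directives(html_text):
    """Return the set of robots directives (e.g. 'noindex') found in html_text."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.directives


# Hypothetical sample page, used only for illustration.
sample_page = """<html>
<head>
<title>Private page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body><a href="/other.html">A link</a></body>
</html>"""

found = robots_directives(sample_page)
print("noindex present:", "noindex" in found)
print("nofollow present:", "nofollow" in found)
```

This only checks that the tag is present in the markup. Whether a given crawler honors it is up to that crawler.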

To exclude your entire site:

If you maintain your own web server, you can remove your entire site from the UNM Search index by placing a robots.txt file at the root of your server. This file can be configured to remove your site from UNM Search only or from all search engines on the internet.

robots.txt

This is the standard protocol that most web crawlers observe for excluding a web server or directory from an index. More information on robots.txt is available at http://www.robotstxt.org.

Please note that if the Search UNM bot is denied access to a robots.txt file ("401 Unauthorized" or "403 Forbidden" response), it does not interpret the denial as a request to exclude the site; pages on the site may still be crawled.

To remove your site from search engines and prevent all robots from crawling it in the future, place the following robots.txt file in your server root:

User-agent: *
Disallow: /

To remove your site from Search UNM only and prevent only the UNM Search bot from crawling your site in the future, place the following robots.txt file in your server root:

User-agent: UNM_Search
Disallow: /
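The effect of the two robots.txt files above can be checked before deploying them. The sketch below uses Python's standard urllib.robotparser module as a stand-in for a compliant crawler; the actual Search UNM bot may interpret rules somewhat differently. The helper may_crawl, the agent name ExampleBot, and the page URL are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Blocks every compliant robot from the whole site.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Blocks only the robot that identifies itself as UNM_Search.
BLOCK_UNM_ONLY = """\
User-agent: UNM_Search
Disallow: /
"""


def may_crawl(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical page on the placeholder host used in this guide.
page = "http://yourserver.unm.edu/index.html"

for label, rules in (("block all", BLOCK_ALL), ("block UNM_Search only", BLOCK_UNM_ONLY)):
    # "ExampleBot" is a made-up name standing in for any other crawler.
    for agent in ("UNM_Search", "ExampleBot"):
        print(f"{label}: {agent} allowed = {may_crawl(rules, agent, page)}")
```

Under the first file neither agent is allowed. Under the second, only UNM_Search is refused and other crawlers remain free to index the site.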

Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow robots (including the UNM Search bot) to index all http pages but no https pages, use the robots.txt files below.

For the http protocol (http://yourserver.unm.edu/robots.txt):

User-agent: *
Allow: /

For the https protocol (https://yourserver.unm.edu/robots.txt):

User-agent: *
Disallow: /
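Because the rules are looked up per protocol and port, it can be useful to work out which robots.txt file governs a given page. The following is a small sketch, assuming the conventional behavior that a crawler reads /robots.txt from the same scheme, host, and port as the page; the function name robots_txt_url and the example paths are illustrative.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL that applies to page_url.

    The scheme, host, and port are kept; the path, query, and fragment
    are replaced so the result points at the server root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical pages on the placeholder host used in this guide.
for url in (
    "http://yourserver.unm.edu/docs/page.html",
    "https://yourserver.unm.edu/docs/page.html",
    "http://yourserver.unm.edu:8080/docs/page.html",
):
    print(url, "->", robots_txt_url(url))
```

The three pages above map to three different robots.txt files, which is why the http and https files must each be maintained.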

Search UNM will continue to exclude your site or directories from successive crawls as long as the robots.txt file remains in the web server root.