Understanding How Search Works
What Happens When You Search the Marshall Web site?
Marshall University, like many other public colleges and universities, relies on Google Syndicated Search services to answer search requests on our web site. Google’s technology is superior to that of many competitors, and search results are generally very fast and reliable.
Users and developers should understand how content is indexed and retained when they build their pages and review how search results are delivered. This document addresses some of the most common questions about web searches on the Marshall site.
I deleted a site or page, changed my content, or changed my URL, but the old information is still showing up in the search results. Why?
Googlebot (Google’s automated site crawler) crawls our site incrementally every 8-10 days. During these incremental crawls, any changes to content already in the index for our site are noted, and the site index is updated to reflect them. When content is changed or removed, the old version will remain in the search results until Googlebot crawls it again and Google’s site index is updated. The web group has no control over how often the site is crawled, nor can we “flush” old content from the results before the next crawl cycle.
Full crawls of our site occur every 15-40 days, depending on the traffic to our site during a given period. When new content is added, it will be anywhere from 15-40 days (depending on when in the crawl cycle the content is added) before you should expect to see that content in the search results returned for our site.
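The arithmetic behind these crawl windows can be sketched in a few lines of Python. This is only an illustration of the day ranges quoted above (8-10 days for incremental crawls, 15-40 days for full crawls); the function names are hypothetical, and the dates it produces are estimates, not guarantees from Google.

```python
from datetime import date, timedelta

# Crawl intervals quoted in this document, in days (shortest, longest).
# They are rough guidance only; actual crawl timing is up to Google.
INCREMENTAL_CRAWL_DAYS = (8, 10)
FULL_CRAWL_DAYS = (15, 40)


def latest_update_date(changed_on: date) -> date:
    """Latest date an edit or removal should be picked up by an incremental crawl."""
    return changed_on + timedelta(days=INCREMENTAL_CRAWL_DAYS[1])


def new_content_window(added_on: date) -> tuple:
    """Earliest and latest dates new content should appear after a full crawl."""
    shortest, longest = FULL_CRAWL_DAYS
    return (added_on + timedelta(days=shortest),
            added_on + timedelta(days=longest))


if __name__ == "__main__":
    published = date(2024, 3, 1)  # example date, chosen arbitrarily
    print("Edit reflected by:", latest_update_date(published))
    print("New page visible between:", *new_content_window(published))
```

For a page changed on March 1, this puts the latest expected update at March 11, and for a page added that day it gives a window of March 16 to April 10.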
How can I get my site to show up as the first result for “insert search phrase”?
Google’s index is built based on relevance, inbound links to a given page, and several other factors that are all a part of their search algorithm. The best way to ensure that your page is listed highly in the search results for a given search term is to make sure that your content is highly relevant to that search term.
Unfortunately, there is no way at this time for us to adjust which results appear for a given search phrase, nor are we able to control the order in which search results appear.
I’ve mistakenly published some content and need to have it removed. Can you do that?
If you’ve mistakenly published content and it’s showing up in search results, the first step you should take is to remove the content from the web server. During the next crawl, the URL for that content will return a 404 (not found) error to the crawler, and the page will be removed from the site index. Until the time of that crawl, the page will continue to show up in the search results for the site, even though it no longer exists.
If you have published content that is highly sensitive and need to have it removed as soon as possible, we can petition Google to remove the content from our index manually. Again, you first need to remove the content, and then contact the Help Desk asking that search content be removed. We will petition Google for the content removal. This process usually takes anywhere from 2 to 4 days.
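Before waiting for the next crawl or contacting the Help Desk, it can be useful to confirm that the removed URL really does return a 404. The sketch below uses only the Python standard library; the function names are hypothetical, and it assumes the server answers HEAD requests (if it does not, the result is reported as inconclusive).

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code the server sends for url."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the status code is what we want.
        return err.code


def removal_verdict(status: int) -> str:
    """Interpret a status code in terms of the removal process described above."""
    if status == 404:
        return "removed"          # the crawler will drop the page on its next visit
    if 200 <= status < 300:
        return "still published"  # the content (or some page) is still being served
    return "inconclusive"         # e.g. server error, or HEAD not supported
```

A call such as removal_verdict(fetch_status(url)), with url set to the address of the page you removed, should report "removed" once the content is gone. If a custom error page is served with a 200 status instead of a 404, this check will report "still published".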
How can I improve the way that my page is seen by the crawler?
Many of the details of Google’s search algorithm are private, but there are some generally accepted ways of improving the way your page is indexed through the use of metadata within the source code of your page.
There are two types of “Meta” tags that you can include in the <head> section of your page to influence how it is indexed: the meta “description” tag and the meta “keywords” tag. A third, the meta “robots” tag, lets you keep a page out of the index altogether. Different search engines interpret this information in different ways, and some ignore parts of it, but including it in your pages can help the crawlers that support it determine how best to index your page.
Meta Description Tags
The description tag does what its name suggests: it lets you describe the content that your page provides. As an example, the description for the Marshall home page might look something like this:
<meta name="description" content="Welcome to the Marshall University home page. This is the primary web site for Marshall University, located in Huntington, WV.">
In this way, you’re telling the search engine what your page is about, as a complement to the content itself.
Meta Keywords Tags
The keywords tag allows a content developer to associate additional search terms with a particular page, as a complement to the page content itself. To use a simple example, suppose you were developing a page about Cincinnati, Ohio. You might want to use the meta keywords tag to associate the keyword “cincinatti” with your page, so that even those users who misspell the city name will still be able to find your content.
While there are still a few search crawlers that review meta keywords, most major crawlers (including Google) now ignore this tag due to its use and abuse by content developers. There is, however, no harm in including it in your pages to assist those few crawlers that still do support it. Example usage:
<meta name="keywords" content="a, comma, separated, list, of, words, or phrases that, you, want, to associate, with your page">
Meta Robots Tag
At times, there will be situations where you do not want a particular page to be indexed by the various crawlers. In these situations, you can use the meta robots tag to instruct crawlers not to index a particular piece of content. Example usage:
<meta name="robots" content="noindex">
It’s worth noting here that crawlers also will not index pages that are protected by an authentication layer. Any pages requiring a username/password entry to access will not be indexed.
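To check which of these tags a page actually exposes, you can parse its source and list the meta tags found. The following is a minimal sketch using Python's standard html.parser module; the class and function names are hypothetical, the sample page is made up for illustration, and it assumes the tags are written with straight quotes (").

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def read_meta_tags(html):
    """Return a dict mapping meta tag names to their content values."""
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta


def is_indexable(meta):
    """False when the robots tag asks crawlers not to index the page."""
    directives = {d.strip().lower() for d in meta.get("robots", "").split(",")}
    return "noindex" not in directives


# Made-up sample page for illustration only.
SAMPLE_PAGE = """
<html><head>
<title>Sample page</title>
<meta name="description" content="Welcome to the Marshall University home page.">
<meta name="robots" content="noindex">
</head><body></body></html>
"""

if __name__ == "__main__":
    tags = read_meta_tags(SAMPLE_PAGE)
    for name, content in tags.items():
        print(name, "=", content)
    print("indexable:", is_indexable(tags))
```

For the sample page, this lists the description and robots tags and reports the page as not indexable because of the noindex directive.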