A search engine must be able to determine what a page is about before it can apply appropriate keyword ranking. Search engines do this by crawling or spidering a page.
Readable by Search Engines - During the crawling process, search engines examine the code that is used to generate a page. However, search engines may not always be able to read all the words that may appear on a page. Some websites use graphics or images for text. Others may use JavaScript print commands. This kind of content is not readable by search engines and should be avoided. Rather, content should be coded in a manner that is easy for the search engines to read.
250-350 Words - A common question is: “How much text is enough?” The answer comes from statistical sampling methods. Search engines remove all punctuation (except apostrophes and hyphens) and all “stop” words. Stop words are commonly used words (e.g. the, is, a, it, an, with) that typically account for 40-50% of all content. After stop words are removed, search engines examine the remaining words.
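The filtering step described above can be sketched in a few lines of Python. This is only an illustration: the stop-word list is a small made-up sample, and the names `STOP_WORDS` and `content_words` are hypothetical. Search engines do not publish their actual lists or tokenizers.

```python
import re

# Illustrative stop-word list only (an assumption); real lists are much longer
# and are not published by the search engines.
STOP_WORDS = {"the", "is", "a", "it", "an", "with", "and", "of", "to", "in"}

def content_words(text):
    """Lowercase the text, keep letters, digits, apostrophes and hyphens,
    then drop every token that is a stop word."""
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

sample = "The spider is crawling a page with well-written content."
print(content_words(sample))
# ['spider', 'crawling', 'page', 'well-written', 'content']
```

Here four of the nine words are stop words, which is close to the 40-50% share mentioned above.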
Consider a website page of 300 words. If 50% are stop words, only 150 remain. Statistically speaking, a sample size of 96 words provides a confidence level of approximately 95%. A sample size of 150 provides a confidence level of approximately 99%. (Both figures are consistent with a margin of error of about ±10%.) In other words, an analysis of a website page with 150 words (excluding stop words) will produce a strong understanding of what the page is about.
Fewer than 250 words (stop words included) in paragraph format will start to compromise the statistical confidence. More than 350 words will add little value from a search engine perspective because the statistical confidence level increases very little beyond that point.
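The sample-size figures above can be reproduced with the standard formula for estimating a proportion. The sketch below is an assumption about where the numbers come from, not a description of how any search engine works: it assumes a margin of error of ±10% and the worst-case proportion of 0.5, and the function name `confidence_level` is hypothetical.

```python
import math

def confidence_level(sample_size, margin_of_error=0.10, proportion=0.5):
    """Confidence level reached by a sample of the given size.

    Rearranges n = z^2 * p * (1 - p) / e^2 to solve for z, then converts
    z to a two-sided confidence level with the normal distribution.
    The 10% margin of error and p = 0.5 are assumptions.
    """
    z = margin_of_error * math.sqrt(sample_size) / math.sqrt(proportion * (1 - proportion))
    return math.erf(z / math.sqrt(2))

for words in (96, 150, 175):
    print(words, f"{confidence_level(words):.2f}")
# 96 words gives roughly 0.95 and 150 words roughly 0.99
```

Under these assumptions the gain from 150 to 175 remaining words (a 350-word page) is very small, which matches the point that longer pages add little statistical confidence.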
Unique Content - Some have speculated that as much as 30% of the internet is duplicate content. Technically speaking, duplicate content is an exact copy of a website page. Since each page has a calculated “hash” value that acts like a unique fingerprint, search engine detection of exact duplicate content is relatively easy.
But modifying even one thing on a page will change the hash value. After making a small change, the content is no longer an exact duplicate – it is near-duplicate.
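A short Python sketch shows why a hash catches exact copies but not near-duplicates. SHA-256 is used here only as an example; the hash function a search engine actually uses is not stated in this article, and `fingerprint` is a hypothetical helper name.

```python
import hashlib

def fingerprint(page_text):
    """Return a hex digest that acts as a fingerprint for the page text.
    SHA-256 is an assumption chosen for illustration."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()

original = "Search engines crawl a page to learn what it is about."
exact_copy = "Search engines crawl a page to learn what it is about."
near_copy = "Search engines crawl a page to learn what it is about!"

print(fingerprint(original) == fingerprint(exact_copy))  # True
print(fingerprint(original) == fingerprint(near_copy))   # False
```

Changing a single character produces a completely different digest, so the near copy is no longer recognized by a simple hash comparison.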
Google has developed special technology to counter this condition. According to Google’s patent, pages found “to be in the same cluster” (pages sharing a portion of the same content) will not appear on the same SERP (Search Engine Results Page). Therefore, a website page worthy of ranking on page one may be forced to page two if there is another website page of slightly greater strength that shares some duplicate content. From this point of view, the page forced to page two is penalized. If a website is found to have substantial near-duplicate content, the entire site may be penalized.
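To give a feel for how pages "sharing a portion of the same content" might be measured, the sketch below uses word shingles and Jaccard similarity, a common textbook approach. It is not Google's patented method, and the function names and the shingle size of three words are assumptions made for illustration.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences ("shingles") in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    set_a, set_b = shingles(text_a, size), shingles(text_b, size)
    if not set_a and not set_b:
        return 1.0  # two texts too short to shingle are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)

page_a = "search engines crawl a page to learn what it is about"
page_b = "search engines crawl a page to learn what the page is about"
page_c = "fresh bread is baked every morning at the corner shop"

print(jaccard_similarity(page_a, page_b))  # well above zero: near-duplicates
print(jaccard_similarity(page_a, page_c))  # 0.0: no shared content
```

A clustering step could then group pages whose similarity exceeds some threshold; the threshold itself would be a tuning choice and is not given in the source.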
Valuable Information – Internet pages that have unique content but little or no value may avoid search engine penalties. Unfortunately, these types of pages don’t satisfy the public. Pages that offer real value tend to attract greater public visibility. And search engines measure this visibility by monitoring the number and quality of links to these pages (backlinks).
The concept of valued content extends beyond search engines. Strong ranking may produce many visitors. But strong content with a compelling message helps convince visitors to take a desired action:
- Fill in a request for more information
- Call, chat, or email
- Download article, white paper, case study, or resource sheet
- Sign up for Webinar, RSS feed, or newsletter
- Request samples
- Enter Store to shop for specific items
The choice and form of an appropriate “call to action” is very dependent on the purpose of the website, the market, and the purchase cycle.
Page Relevancy – Any page can be optimized for any keyword. But an optimized page may not read well. In some cases, current content doesn’t properly support the keyword(s). Optimization efforts, although technically correct, then violate the Valuable Information concept above. In these cases, there are two options:
- Abandon the page and create fresh content (new or existing URL)
- Modify the current content to better suit the keywords
Keyword Density – Search engines use variations of a probabilistic term vector analysis to determine what a website page is about. Although it is not the same as keyword density, the methods have similarities. Since an index of billions of website pages is required to calculate term vectors and this information is not publicly available, the SEO community relies on keyword density analysis.
Keyword density is a ratio of keyword phrases to the number of words found on a website page. The method of calculation varies depending upon whether “stop words”, link text, and other tags are included. For this reason, various online keyword density tools produce different values.
Ideal keyword density values depend on the densities found among top ranking websites. The objective is to create a density that is in the top range of websites currently ranking well. If a keyword density is too high, the search engine may flag it as “unnatural”; too low a keyword density will produce lower rankings. As a standard rule of thumb, 3-4% density, without “stop word” or tag exclusion, is typical (3-4 mentions of the keyword for every 100 words).
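The ratio described above can be sketched as follows. This is one possible way to compute it, with no stop-word or tag exclusion, to match the rule of thumb; the function name `keyword_density` and the tokenizing rule are assumptions, and online tools that count differently will report different values.

```python
import re

def keyword_density(text, phrase):
    """Percentage of keyword phrase mentions relative to total words.
    Stop words are NOT excluded."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    target = re.findall(r"[a-z0-9'-]+", phrase.lower())
    if not words or not target:
        return 0.0
    length = len(target)
    # Count every position where the phrase appears as a run of whole words.
    mentions = sum(
        1 for i in range(len(words) - length + 1)
        if words[i:i + length] == target
    )
    return 100.0 * mentions / len(words)

# Hypothetical 100-word page that mentions the keyword three times.
page = " ".join(["widget"] * 3 + ["filler"] * 97)
print(keyword_density(page, "widget"))  # 3.0
```

A value of 3.0 sits at the low end of the 3-4% rule of thumb; comparing the result with the densities of top ranking pages remains the real test.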
Keyword density in the Title Tag is nearly as important as densities in the body. Since the page Title Tag (a tag found in the head section of the HTML page) has limited space, the wording is particularly important. Long titles will have lower keyword densities, which will have a negative effect on rankings. It should be noted that the Title Tag, more than any other condition, limits the number of keywords that may be effectively optimized for a specific page.
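The effect of title length on density can be illustrated with Python's standard `html.parser` module. The class and function names and the two sample pages below are hypothetical, and the sketch handles single-word keywords only.

```python
from html.parser import HTMLParser

class TitleExtractor(HTMLParser):
    """Collects the text found inside the <title> tag."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def title_density(html, keyword):
    """Percentage of title words that equal the (single-word) keyword."""
    parser = TitleExtractor()
    parser.feed(html)
    words = parser.title.lower().split()
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)

short_page = "<html><head><title>Blue Widgets</title></head><body></body></html>"
long_page = ("<html><head><title>Welcome to Our Home Page for Blue Widgets "
             "and More</title></head><body></body></html>")

print(title_density(short_page, "widgets"))  # 50.0
print(title_density(long_page, "widgets"))   # 10.0
```

The keyword appears once in both titles, but the longer title dilutes it from 50% to 10%.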
SPAM – The term SPAM has taken on several definitions. From a search engine perspective, SPAM is a violation of published search engine guidelines.
All SPAM techniques either:
- Hide content from the search engines that is visible to the public,
- Or, serve content to the search engine that is not visible to the public.
The most common SPAM techniques include:
- Hidden Text / links
- Hidden Div Tags
- Cascading Style Sheets (CSS)
- Text same or near color as background color
- Very small text
- Repetitive keywords
- JavaScript and Meta Refresh redirects
- Selling text links without rel="nofollow" attributes
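As an aid for the last item in the list, a webmaster could audit a page for links that lack the rel="nofollow" attribute. The sketch below uses Python's standard `html.parser`; the class and function names are hypothetical, and the script cannot tell which links were paid for, so the result still needs manual review.

```python
from html.parser import HTMLParser

class FollowedLinkFinder(HTMLParser):
    """Collects the href of every <a> tag that lacks rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated values, or be missing entirely.
        rel_values = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.followed.append(href)

def links_without_nofollow(html):
    finder = FollowedLinkFinder()
    finder.feed(html)
    return finder.followed

sample_html = (
    '<a href="https://example.com/a" rel="nofollow">A</a>'
    '<a href="https://example.com/b">B</a>'
    '<a href="https://example.com/c" rel="sponsored nofollow">C</a>'
)
print(links_without_nofollow(sample_html))  # ['https://example.com/b']
```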
SPAM is still a big issue for search engines. Technology exists to detect all of these techniques automatically, but doing so requires very intensive computing resources. For this reason, search engines rely heavily on SPAM reports from other webmasters (competitors) who report offending websites. Google’s SPAM reporting tool is located here: https://www.google.com/webmasters/tools/spamreport.