Technical SEO & Site Architecture
Optimizing your website’s architecture can help ensure that your keyword and link-building efforts have the maximum impact. The following advice will help you avoid some common mistakes that many website owners (and designers) make.
Choosing a Domain Name
- Domain names that include keywords are more likely to rank higher in search engines than those that do not. Also, users are more likely to click on keyword-rich domain names. Google released the Exact Match Domain Update on 9/28/12 to reduce the rankings of low-quality exact keyword-match domain names, so it’s important to remember that building your brand is still the most important part of your business. Domain names without keywords, but with a high volume of incoming links from credible websites, can easily outperform keyword-rich domain names without such incoming links. With this in mind, consider two different approaches to choosing a domain name when building a new website:
- Short, catchy domain name (e.g. www.yelp.com): Domain names such as these are easy to market and build a brand around, although they lack keywords to rank for in search engines.
- Keyword-rich domain name (e.g. www.numismaticnews.net): Domain names such as these include keywords that people are (or might be) typing into search engines, and have a leg up on the competition in the SERPs (search engine results pages). Look for exact keyword matches in .com, .net and .org extensions. If you go this route, keep in mind it’s no substitute for building your brand via quality content (which attracts quality links).
- If your domain name is already established and ranked in search engines, it’s not advised to switch domains since this will require a lot of work to (hopefully) retain all of your SERPs. 301 redirects would have to be deployed for all of the old webpage URLs, and many hours would have to be spent reaching out to external websites (linking to your site) requesting that they update the anchor text pointing to your website.
- Exact keyword matches are ideal for highly optimized domain names (e.g. purchasing www.denverhomeblog.com to rank for “denver home blog”).
- Avoid excessively long and/or hyphenated domain names, as they appear spammy to search engines.
- Domain name age is a factor considered by search engines in their ranking algorithms. The sooner you register your domain name, the better.
- Register hyphenated and misspelled versions of your domain name to avoid future competition.
- Register “exact match” domains for primary keywords, for future online publishing endeavors.
- Register international domain extensions (e.g. .co.uk) if you plan on taking your website global. International extensions also greatly increase your chances of driving search engine traffic from the particular countries to which your domain extension relates.
- Beware .info domain extensions as they are cheap and have been abused by spammers in the past.
Search Engine Optimized URLs
- Avoid dynamically generated URLs: Short, directory-structured URLs rich with keywords are more likely to get a larger number of pages indexed by the major search engines than dynamically generated URLs (with characters such as &, ?, and !).
- If you must have dynamically created pages (e.g. ASP websites), be sure to use a URL-rewriting tool such as mod_rewrite to fix dynamically generated URLs.
- Never change a URL without doing a 301 redirect. Otherwise, it’s like changing your address without filing a forwarding address with the post office. Just as the post office won’t know the new address to deliver your future mail to, the search engines won’t know what new URL to send your future site visitors to.
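The two ideas above, short keyword-rich URLs and 301 redirects for any URL that changes, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example paths, and the REDIRECTS table are hypothetical, and a real site would normally configure redirects in its web server or CMS.

```python
import re

def slugify(title: str) -> str:
    """Turn a page title into a short, lowercase, hyphen-separated slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")

# Hypothetical table mapping retired URLs to their replacements.
REDIRECTS = {
    "/product.asp?id=42&cat=7": "/coins/silver-dollars/",
}

def resolve(path: str):
    """Return (status_code, location) for a requested path.

    A retired URL answers with a permanent (301) redirect to its new
    location; any other path is served as-is with a 200.
    """
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    return 200, path
```

For example, slugify("Denver Home Blog!") yields "denver-home-blog", which can be used as a directory-style URL segment in place of a query string.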
Avoid Session IDs and Dynamic URLs
- Only start using session IDs at the point where you absolutely must start tracking customer actions. Otherwise, your category and product page URLs will exist in many variations, each with a different ID tacked onto the end. To search engine spiders, this looks like an overly large number of URLs to index, with more than one URL for the same page (leading to a perceived duplicate content issue).
- Most SEOs consider the “add to shopping cart” stage to be the definitive time to begin employing session IDs. At that point, you don’t want search engines to index any of the subsequent pages anyway.
- Consider storing session IDs in cookies instead of in the URL. Problem solved.
- Avoid any URL structure which is not alpha-numeric (except for – _ and / characters), and which does not guarantee only one URL for each page.
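As a rough sketch of the points above, the following Python snippet removes session parameters from a URL so that each page maps to one address, and checks whether a URL sticks to alphanumeric characters plus -, _ and /. The parameter names in SESSION_PARAMS and the function names are assumptions made for illustration; your platform may use different ones.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to match your own platform.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}

def strip_session_params(url: str) -> str:
    """Return the URL without session parameters (and without any fragment)."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

# Letters, digits, hyphen, underscore and slash only.
CLEAN_PATH = re.compile(r"^[A-Za-z0-9/_-]*$")

def is_clean_url(url: str) -> bool:
    """True if the URL has no query string and its path uses only allowed characters."""
    parts = urlsplit(url)
    return not parts.query and CLEAN_PATH.match(parts.path) is not None
```

With these helpers, "http://www.example.com/coins/?sid=abc123" reduces to "http://www.example.com/coins/", so the spider sees a single URL for the page.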
Use a Sitemap & Allow Search Engines to Easily Index Your Website
- There are two types of sitemap: XML and HTML. The XML version produces an XML file listing all of your URLs, easily indexable by the search engines. The HTML version is a more user-friendly version which users can use to navigate your site (and spiders can use to index your content). Due to the time-intensive nature of an HTML version, it’s often only possible to produce an XML version using a plugin for WordPress sites (for example) like the Google XML Sitemaps plugin.
- Using a sitemap, and linking to it from your footer, helps a search engine spider by preventing it from having to crawl any deeper than one link beyond your home page to locate the rest of your pages.
- A sitemap will not influence your rank for keywords, but it will enable your pages to rank quickly by ensuring that search engines can always find newly created pages (as well as legacy content).
- Visit Sitemaps.org for more information.
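If no plugin is available, an XML sitemap can also be generated with a short script. The sketch below uses Python’s standard library and the namespace defined by the Sitemaps.org protocol; the build_sitemap name and the example URLs are placeholders, and only the required loc element is written for each page.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a minimal XML sitemap (as a string) listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Placeholder URLs for illustration only.
sitemap_xml = build_sitemap([
    "http://www.example.com/",
    "http://www.example.com/coins/silver-dollars/",
])
```

The resulting string can be saved as sitemap.xml in the site root and linked from the footer as described above.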
Using a Robots.txt File for SEO
- The Robots.txt file is best used to prevent Google from indexing any webpage (or image) that you don’t want to appear in search results. Place the appropriate command within your robots.txt file:
- “User-agent: *
Disallow: /cgi-bin/”
- The asterisk in “User-agent: *” applies the rule to all search engine spiders, and “Disallow: /cgi-bin/” tells them not to crawl any files located in or below the /cgi-bin/ folder.
- One example would be to tell search engines not to index your shopping cart pages. There’s no need. Other examples would be newsletter confirmation pages (with free downloads), anything in your /cgi-bin/ folder, and any other pages that you want to keep private.
- Put the name of the spider in place of the asterisk in “User-agent: *” if you only want to prevent a certain search engine spider from indexing your specified pages.
- Obviously, if it makes sense for all of your website pages to be indexed by search engine spiders, then there is no need for a robots.txt file.
- If you’re not sure why certain pages aren’t being indexed by Google, take a look at Google Webmaster Tools and see which pages Google is ignoring. You can find this data at the following path: Diagnostic > Web crawl > URLs restricted by robots.txt.
- Avoid using frames. Repeat. Avoid using frames. Frames are a form of website architecture that embeds multiple webpages/bits of data into one webpage, and are very difficult to get ranked in search engines. Avoid using Frames. If frames must be used, use the <noframes> tag/section of the page for content you want indexed in the search engines.
- Avoid using Flash as your primary on-page content. Since search engines cannot read Flash, just like they cannot read images, you’re selling yourself short. Search engines need text in HTML format in order to determine how to rank your webpage in their SERPs (search engine results pages). You may use Flash as an element of your webpage, but it needs to be complemented by well-written, optimized, and compelling text in HTML format.
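Returning to the robots.txt rules described earlier in this list, it can be useful to check what a given set of rules actually blocks before publishing it. The sketch below uses Python’s standard urllib.robotparser module. The rules, the /cart/ folder, the example URLs, and the spider name “BadBot” are all hypothetical and stand in for your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep every spider out of /cgi-bin/ and /cart/,
# and keep one specific spider ("BadBot") out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /cart/

User-agent: BadBot
Disallow: /
"""

def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given spider may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

blocked = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/cgi-bin/form.pl")
open_page = is_allowed(ROBOTS_TXT, "Googlebot", "http://www.example.com/coins/")
```

Here blocked is False and open_page is True, which matches the intent of the rules: the /cgi-bin/ folder is off limits to all spiders while ordinary content pages remain crawlable.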
Whether you are launching a new website, or continuing to build an existing one, it’s critical to keep an eye on your duplicate content situation. If your site opens its floodgates for Google to index all sorts of low-quality, duplicate content pages where content is 100% repurposed from other areas of your site (or other websites), you will eventually be penalized by Google and lose traffic.
Some SEOs define duplicate content as any portion of text (a paragraph or longer) that is repeated on more than one URL, whether on your website or someone else’s. Others are more lenient in their definition, with companies like SEOmoz using a 95% unique content filter for the site crawling feature offered to Pro members. The bottom line is that we don’t know what Google’s percentage threshold is for duplicate content, either per page or site-wide, so you must minimize it as much as possible. This topic is so important that an entire section of this site is dedicated to preventing and fixing duplicate content.
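One way to get a rough feel for how much two pages overlap is to compare their text as overlapping word sequences (“shingles”). The Python sketch below is only an illustration: it is not how Google or SEOmoz measure duplicate content, and the five-word shingle size and function names are arbitrary assumptions.

```python
def shingles(text: str, size: int = 5) -> set:
    """Split text into the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def similarity(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of two texts: 0.0 means no shared shingles, 1.0 means identical sets."""
    set_a, set_b = shingles(a, size), shingles(b, size)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Running similarity() over pairs of pages on your own site gives a score between 0 and 1; pairs that score close to 1 are candidates for consolidation or rewriting.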