From time to time I have to troubleshoot a SharePoint Search Engine that is not properly indexing some external web site. I thought I’d share the steps I typically take, in case you are trying to figure out what might be causing the same problem. I find it helpful to isolate where the problem is before doing anything else, so here’s how I approach it.
The first thing I do is make sure the SharePoint Search Engine is working, period. Is it at least indexing the SharePoint site properly? If not, the problem is most likely with the search engine configuration, or at least something on the SharePoint side of the street, including the network it lives on. However, if it has no problems indexing the SharePoint site, I move on and see whether it indexes (or will index) any other external web sites properly.
The best place to look next is the ‘spider view’ of the start page you have configured in your SharePoint Search Engine content source. Paste the URL into a spider simulator and let it rip. If it can’t crawl the page, you have found the problem. If it can crawl the page, carefully check the results. Examine the internal links as well as the external links. This is when you will find out whether what your browser sees is the same as what the spider sees. Copy a link out, paste it into your browser address bar, and see if it loads. If you get a 404, the links on the start page are probably relying on some kind of black arts handled by the application, and that is what prevents search engines from crawling properly.
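If you don’t have a spider simulator handy, the same check can be roughed out in a few lines of Python. This is only a minimal sketch of the idea, not how the SharePoint crawler itself works: the names `LinkCollector`, `spider_view` and `fetch_status`, the user-agent string, and the example URLs are all made up for illustration. It pulls the `href` values out of the raw HTML (no scripts are run, which is roughly what a simple spider sees), sorts them into internal and external links, and sets aside anything that is not a plain http/https link.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects raw href values from <a> tags; no JavaScript is executed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def spider_view(html, base_url):
    """Return (internal, external, skipped) links found in the page HTML."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    internal, external, skipped = [], [], []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            skipped.append(href)  # e.g. javascript: or mailto: links
        elif parsed.netloc == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external, skipped


def fetch_status(url, user_agent="example-crawl-check"):
    """Return the HTTP status code for url, or None if it is unreachable.

    The user-agent value is a placeholder, not the one SharePoint sends.
    """
    request = Request(url, headers={"User-Agent": user_agent})
    try:
        with urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as err:
        return err.code  # 404, 401, 403 and so on
    except URLError:
        return None
```

To use it, fetch the start page HTML, pass it to `spider_view` along with the start page URL, and then call `fetch_status` on each link that comes back. Links that return 404, or a pile of links in the skipped list, point to the same kind of trouble described above.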
By now I have usually pinpointed the problem and can begin taking steps to resolve it. Resolution steps include pointing the content source at a different start page, rewriting the start page with clean HTML, or making sure the security on the external site is not preventing a crawl. You get the point: the important thing is that we quickly isolated where the problem is before we started fixing it.