There is also bandwidth. You may have a big site where only a few pages are really of interest to a spider, and you don't want a spider using up all your bandwidth crawling the rest.
http://www.robotstxt.org - resource
If you create a spider you're actually supposed to register it. I think few do, though.
To add to the mystery, I looked at eBay's robots.txt:
Code:
User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
So apparently Disney is a no touchy. I wonder why they wouldn't allow their policies to be spidered, though. Maybe they don't want someone just copying the policy for their own use?
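If you want to see how a well-behaved spider would read those rules, here is a rough sketch using Python's standard urllib.robotparser module. The rules are just the ones pasted above, so they may not match what eBay serves today. The agent name "ExampleSpider" and the allowed() helper are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the post above; not fetched live from eBay.
RULES = """\
User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
"""

# Feed the rules to the parser as a list of lines.
parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path, agent="ExampleSpider"):
    """Return True if the (hypothetical) agent may fetch the given path."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/disney/", "/help/policies/privacy.html", "/help/"):
        print(path, "allowed" if allowed(path) else "blocked")
```

Disallow works by path prefix, so anything under /disney/ or /help/policies/ is off limits, while /help/ itself is still fair game. Keep in mind that robots.txt is only a request: a spider has to choose to check it.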