How does a search engine "know" about the millions of documents on the Internet?
Here's how it works. A document placed on the Internet can only be found by a search engine if information about that document has been recorded in the search engine's database. The amount of detail recorded varies greatly from one engine to another. For instance, some enter the entire text of the document into a searchable field, while others enter only a short description. This is only one way in which search engines differ. Another is the level of sophistication an engine employs when it looks through its database.
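To make the difference in recorded detail concrete, here is a minimal sketch in Python. It is a toy model, not a description of how any real search engine is built: the document addresses, the sample text, and the helper names build_index and search are all made up for illustration. It builds two tiny databases over the same documents, one from the full text and one from a short description only, and runs the same query against both.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase word to the set of addresses whose text contains it."""
    index = defaultdict(set)
    for address, text in records.items():
        for word in text.lower().split():
            # Drop surrounding punctuation so "tea." and "tea" match.
            index[word.strip('.,;:!?"()')].add(address)
    return index

def search(index, word):
    """Return the addresses recorded for a single word, in sorted order."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical documents; the addresses and text are invented for this sketch.
documents = {
    "example.org/tea": {
        "description": "A guide to brewing tea",
        "full_text": "A guide to brewing tea. Green tea needs cooler water than black tea.",
    },
    "example.org/coffee": {
        "description": "Notes on coffee",
        "full_text": "Notes on coffee. Espresso uses hot water under pressure.",
    },
}

# One engine records the entire text, the other only the short description.
full_text_index = build_index({a: d["full_text"] for a, d in documents.items()})
description_index = build_index({a: d["description"] for a, d in documents.items()})

print(search(full_text_index, "water"))    # ['example.org/coffee', 'example.org/tea']
print(search(description_index, "water"))  # []
```

Both documents mention "water" in their text, but neither description does, so the same query succeeds against the first database and comes back empty against the second.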
There are at least two ways a search engine finds out about a document. One is that the publisher of the document registers it with the engine. A publisher who wants to ensure that a document is "found" by search engines will usually register it with as many engines as possible. The other is that some search engines use "spiders" or search robots (commonly referred to as "bots"), which search the Internet and gather information that is subsequently entered into the engine's database.
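The two routes can be sketched with a small Python model. Everything here is an assumption made for illustration: the miniature "Internet" is just a dictionary held in memory, the addresses are invented, and the functions crawl and register are hypothetical names. A real spider fetches pages over the network and is far more involved. In this sketch the spider starts from one known address and follows links, and a page that nothing links to only enters the database when its publisher registers it.

```python
from collections import deque

# Hypothetical miniature "Internet": address -> (page text, links on that page).
WEB = {
    "a.example/home": ("Welcome page", ["a.example/about", "b.example/news"]),
    "a.example/about": ("About us", ["a.example/home"]),
    "b.example/news": ("Latest news", []),
    "c.example/hidden": ("Nobody links here", []),
}

def crawl(start_addresses, web):
    """Toy spider: follow links from the start addresses and record each page once."""
    database = {}
    queue = deque(start_addresses)
    while queue:
        address = queue.popleft()
        # Skip pages already recorded and links that lead nowhere.
        if address in database or address not in web:
            continue
        text, links = web[address]
        database[address] = text   # information entered into the engine's database
        queue.extend(links)        # newly discovered addresses to visit later
    return database

def register(database, address, web):
    """Toy registration: the publisher tells the engine about a document directly."""
    database[address] = web[address][0]

database = crawl(["a.example/home"], WEB)
print(sorted(database))
# ['a.example/about', 'a.example/home', 'b.example/news']

register(database, "c.example/hidden", WEB)
print("c.example/hidden" in database)  # True
```

In this toy model the spider never reaches c.example/hidden because no recorded page links to it, which is one reason a publisher might register a document rather than wait to be discovered.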