Advanced Searching: Tricks of the Trade

by Peggy Zorn, Mary Emanoil, and Lucy Marshall
Parke-Davis Pharmaceutical Research Library
and Mary Panek
United Technologies Research Center

ONLINE, May 1996
Copyright © Online Inc.

----------------------------------------------------------------------------

Searching the World Wide Web has become as easy as clicking on the Net Search button in Netscape. Type in a few keywords to describe what you're looking for, click on the Submit button and in a few seconds you have your answer...or do you?

Despite the rapid proliferation and development of Web search systems over the past months, little attention has been devoted to the advanced features professional searchers and librarians have become accustomed to in other online information resources. Web novices and information professionals alike often overlook or miss detailed information such as:

* what a particular Web search system is searching
* how the data has been indexed
* how the search engine retrieves data
* what advanced techniques (proximity operators, nested queries, search set manipulation and combination, duplicate detection, etc.) are available

In simple Web systems such as Yahoo! and Aliweb, knowing advanced search features is not as necessary because the depth of indexing and the power of the search engine are not as great. However, with more sophisticated Web search systems, the need to narrow large keyword retrieval sets, using the kinds of features available in established commercial online services, is becoming painfully obvious. As the Web grows, and databases searched by the major Web search engines increase in size, the power to seek, index, and retrieve information must grow--along with the information professional's knowledge of how to effectively search the Web.
Typical end-users may have no trouble browsing and locating information on uncomplicated topics, but constructing complex search queries using sophisticated Web search engines is another matter entirely. End-users may increasingly rely on information professionals for complex Web searching, much as they do for online commercial databases.

ADVANCED FEATURES IN WEB SEARCH SYSTEMS

Not all Web search systems, their search engines, and the databases they search are created equal. Only a handful really cover the Web exhaustively in terms of URLs covered and depth of indexing. Only a few provide the type of advanced searching features that are standard for librarians.

In the November/December 1995 ONLINE, Martin Courtois, William Baer, and Marcella Stark provide a comprehensive overview of many search systems on the Web and detail the results of performance testing on sample searches. Several times the authors state that, even in the search systems that offer advanced searching features, instructions for use were either difficult to locate or difficult to comprehend. The good news is that there is a trend toward more search features and simplified, advanced searching documentation. These enhancements promise to improve Web searching accuracy and efficiency.

Besides advanced searching features to refine retrieval, a database's size and depth of indexing also greatly affect the complexity of searching that can be done using a particular engine--and the resulting retrieval. Some engines support full-text searching of the Internet sites included in their databases. Others, such as Magellan and Open Text, have gone further by indexing on the field level (title, author, abstract, keyword, etc.). They have also added descriptors and other information not necessarily available on home pages to aid searching and retrieval.
This type of field and descriptor searching and value-added indexing is a mainstay of online commercial databases, and likewise is of great use for narrowing broad topics and filtering retrieval in Web searches.

EVALUATING ADVANCED SEARCH FEATURES

The purpose of this article is to look closely at several Web search systems that provide advanced search features and search a comprehensive and authoritative database of Internet sites. Based on these two key requirements, we selected Alta Vista, InfoSeek, Lycos, and Open Text for evaluation and testing. The search features we looked for included:

* complex Boolean (nested logic)
* duplicate detection
* keyword(s) in context
* limiting retrieval by field
* proximity and/or phrase searching
* relevancy ranking of results
* retrieval display options
* search set manipulation
* truncation (automatic or user-defined)

We also looked closely at depth of indexing and quality of documentation, as well as the size of the database the engine searched. Only search engines with a database of more than 200,000 Internet sites were considered.

Initially, we considered several other Web search engines and databases, including Harvest, Magellan, NlightN, and Yahoo!. NlightN and Yahoo! both search significant databases, but their search engines are not powerful enough to handle our queries effectively. Harvest and Magellan, on the other hand, provide powerful search engines that include most of the features listed above. However, their databases are too small at this point to be called comprehensive.

Here is a detailed description of the four Web search systems we studied--their features, how to use them, and how the search engine performed on our sample searches. Each sample search requires more than a simple keyword search and tries to incorporate as many advanced features as possible. All three searches were run on all four search systems, varying the syntax based on features available on each.
The following table shows a quick overview of the features of the four systems.

                 Alta Vista          InfoSeek            Lycos               Open Text

Approximate #    16 million          1 million           10.75 million       1 million
of URLs

Documentation    Excellent;          Lengthy; somewhat   Fair, not as        Excellent, explains
                 includes details    difficult to        thorough as it      all features in
                 and search          locate.             could be.           detail.
                 examples for both
                 Simple and
                 Advanced queries.

Duplicate        No                  Yes                 No                  Yes
Detection

Field            Yes                 No                  No                  Yes, choose field
Searching                                                                    limits from
                                                                             pull-down menus.

Indexing         Full-text           Full-text           URLs, also choice   Full-text
                                                         of other parts of
                                                         pages and text.

Multiple         No                  No                  No                  No, but allows up
Search Sets                                                                  to four Boolean or
                                                                             proximity operators
                                                                             per query.

Nested           Yes                 No                  No                  No
Boolean

Proximity        Yes, ensures both   Yes                 Yes, choose match   Yes, choose
Searching        terms are within                        type (loose, fair,  operators from
                 ten words (or                           etc.) from          pull-down menu.
                 phrases if terms                        pull-down menu.
                 are enclosed in
                 quotes).

Relevancy        Yes, can specify    Yes                 Yes                 Yes
Ranking          which terms to
                 weight first in
                 retrieval display.

Truncation       Yes, can use an     No                  Yes, automatic.     Yes, automatic.
                 asterisk on word
                 stems longer than
                 three letters.

SAMPLE SEARCHES

Search #1: Locate information on cancer research grant or funding opportunities. This search could include truncation to retrieve alternate word endings and more than one Boolean operator to combine synonyms and concepts. On traditional commercial online services, this search might look something like this:

    (cancer or oncol*) and research and (grant* or fund*)

Search #2: Locate information on Warner-Lambert or its pharmaceutical research division, Parke-Davis. This search is tricky because of the embedded punctuation. This problem can be handled in several ways, based on the command structure of the search engine.
A typical online search for this concept might look like this:

    (warner adj lambert) or (warner-lambert) or (parke adj davis) or (parke-davis)

Search #3: Locate information on the XI International Conference on AIDS. This search has numeric content, and requires the use of synonyms and multiple Boolean operators. A search for this topic might be entered like this:

    (xi* or 11th or eleventh) and international and conference and aids

Alta Vista (http://altavista.digital.com)

A relative newcomer to the Web search service arena is Digital Equipment Corporation's Alta Vista, introduced to the general public in mid-December 1995. Alta Vista contains full-text indexing for approximately 16 million URLs in its database, and provides a powerful and flexible search engine for performing queries.

Probably the most refreshing thing about Alta Vista from an information professional's perspective is how closely its search commands and features resemble those of traditional online commercial database services. Use of Boolean operators (AND, OR, NOT, NEAR), the ability to do nested Boolean queries, and the ability to use truncation when needed (but not have terms truncated automatically by the search engine) are examples of such familiar features. Documentation for the searching features available in Alta Vista is detailed and filled with plenty of examples.

Alta Vista offers two ways to search its database, Simple and Advanced. The Simple search does not support Boolean operators, but uses its own syntax to indicate phrases, proximity, and required or prohibited words. Limiting searches by field (Title, URL, Host, etc.) and truncation are possible in both modes. Results from queries done in the Simple mode are returned in relevancy-ranked order.

The Advanced search uses the same syntax as the Simple search for defining words, phrases, wildcards, and punctuation.
However, with the Advanced search, Boolean operators must be used to combine words and phrases, and parentheses are required to control nesting. The retrieval can optionally be ordered (by specifying a set of words or phrases to list first) and can also be limited by date.

Search #1: The Advanced search mode was used for all the sample searches. The search for information on cancer research grants was originally entered as:

    (cancer or oncol*) and research and (grant* or fund*)

An error message appeared indicating too much retrieval for the truncated forms of "grant" and "fund," so the search was rephrased using the following strategy:

    (cancer or oncol*) and research and (grant or grants or fund or funds or funding)

This search retrieved over 20,000 documents that matched at least some of the terms. Specifying "cancer" and "research" in the Relevancy Ranking Criteria area of the form improved the precision of the items retrieved at the top of the returned listing.

Search #2: The search for information on either Parke-Davis or Warner-Lambert used the internal punctuation conventions in Alta Vista. Punctuation characters in words or phrases act as separators, splitting the words into separate entities. In addition, capital letters entered in a search query are considered distinct from lower case. Usually, terms should be entered in lower case to indicate a case-insensitive match, unless only retrieval of the capitalized word is desired. The search strategy was:

    "Parke Davis" or "Warner Lambert"

This search retrieved approximately 2,000 hits, with the Warner-Lambert home page listed within the top ten items.

Search #3: Searching for information regarding the XI International Conference on AIDS was entered as:

    (xi or eleventh or 11th) and international and conference and aids

The terms "aids," "international," and "conference" were entered as criteria for relevancy ranking the results.
Over 1,000 hits were retrieved and the home page for the conference was listed first.

InfoSeek (http://www2.infoseek.com/)

InfoSeek was introduced in February 1995 by InfoSeek Corporation and was the first search system on the Web to charge a fee for searching. InfoSeek currently offers two ways to search for information. Users can access its collection of Web pages, newsgroups, FTP, and gopher sites and retrieve up to 100 hits for free. InfoSeek membership plans are available for those who want to access InfoSeek's additional fee-based databases, including Usenet News; Cineman Movie, Book, and Music Reviews; Hoover's Company Profiles; a selection of wire services; CorpTech Directory of Technology Companies; and MDX Health Digest. New databases in a wide variety of areas, including business, finance, health, sports, and national news are added continuously.

Membership plans vary from a flat $0.20 per search to $9.95 for a monthly subscription. The paid subscription also offers access to many full-text articles. Although the search features are the same for the free and fee-based databases, we tested only the free service.

InfoSeek currently includes indexing for over one million Internet sites. It uses a combination of Web "robots" or "crawlers," as well as user submissions to populate its database. Users can easily submit their sites to InfoSeek for inclusion in the database. They do not need to provide a list of keywords, but are asked to make sure their documents contain words that accurately describe contents because InfoSeek indexes the full text of all sites included in its database.

Query results are returned by software that uses statistical word frequency counts and other techniques to determine significance and relevancy. Although InfoSeek does not provide much further explanation of its rating scheme, it stresses that the rankings are based on more than word frequency counts.
In addition, InfoSeek attempts to eliminate duplicate entries for URLs in its database and claims never to index a URL more than once. URLs with duplicate titles are allowed, but fewer duplicates are retrieved by an InfoSeek search than from other Web search systems.

InfoSeek provides a help page that explains how to perform searches against its databases. It contains thorough explanations and good examples on Simple searching, using advanced query operators and syntax in search strategies, and refining search techniques. There is also an extensive FAQ section, but it is somewhat buried.

InfoSeek has a number of advanced search features. Unlike many other search engines on the Web, InfoSeek does not support Boolean operators or wildcard operators. When using advanced features in InfoSeek, users need to apply special syntax such as:

* double quotation marks to find words that appear next to each other
* hyphens between words to locate those words very close to each other
* brackets around words to retrieve those words near each other in any order
* a plus sign in front of a word or phrase to return that specific word or phrase
* a minus sign in front of a word or phrase to ensure that word or phrase does not appear in any of the resulting documents

The search engine for InfoSeek is case-sensitive. Capitalizing words forces a search exclusively for capitalized instances of that word.

Search #1: InfoSeek does not support Boolean operators or wildcard characters, so Search #1 for information on cancer research grants could not be entered as it might be on a traditional online service. In our first attempt at this search, the following terms separated by hyphens were entered:

    cancer-research-grants

This search strategy only returned three documents, of which only one was relevant. Hyphens require that the terms appear very close to each other, but we overlooked the fact that the hyphens also limit the order of these terms.
Therefore, the next search strategy included the same search terms but with brackets around the three words to retrieve those words near each other in any order. The bracketed search retrieved 100 documents--the maximum number allowed in a free search--and the results were more on target.

Search #2: Not knowing how InfoSeek handled embedded hyphens, we tried the search for information on the Warner-Lambert and Parke-Davis companies in two ways. For the first search, the query was entered as:

    "Warner-Lambert", "Parke-Davis"

The quotation marks forced retrieval of words that appear next to each other. This search retrieved 100 documents. Surprisingly, the Warner-Lambert home page was not the highest-ranked document. It was ranked 16th, behind some documents that merely mentioned Parke-Davis as a previous employer. The sixth-ranked document did not show either of the two companies on its first page, nor did it contain an obvious link to either one.

To determine the importance of the hyphen within the company names, we did a second search without the hyphens:

    "Warner Lambert", "Parke Davis"

This search retrieved the same set of 100 documents. Apparently, InfoSeek ignores embedded punctuation, or at least embedded hyphens.

Search #3: To locate information on the announcement for the XI International Conference on AIDS to be held in 1996, we formulated a search using a few keywords from the full phrase. We put hyphens between the terms to require close proximity and force the order of the terms:

    11th-conference-AIDS

One document was returned using this query and it contained a brief announcement about the conference. However, 11th could be presented as "eleventh," "11th," or "XIth." Since Boolean operators are not available, we had to submit three separate searches to pick up the variant spellings of the term.
For the second search, the following query with the Roman numeral was used:

    XI-conference-AIDS

The strategy found 31 documents and the first three were direct links to the actual home page for the 1996 conference. No documents were retrieved with a strategy that spelled out the term (Eleventh-conference-AIDS).

Lycos (http://www.lycos.com/)

As indicated by its name, taken from the Latin name for the wolf spider (Lycosidae), Lycos is a spider-oriented search engine. It was developed by the School of Computer Science at Carnegie Mellon University, and is now owned by Lycos, Inc., a joint venture of CMG@Ventures and Carnegie Mellon University. Lycos systematically rebuilds its database daily and checks for active links.

The search engine provides weighted retrieval from the database and returns the hits for a user's query sorted by relevance ranking. Rankings are determined by the location of the word in the document. A keyword located in the title or header receives a higher ranking than a keyword located in the document text. The search results include the match score, number of links, document title, headings, sample abstract, the URL, and length of document.

Lycos boasts an index of over ten million pages throughout the world--indexing over 91 percent of the content of the Web. With the Lycos catalog growing by over 300,000 pages per week, it claims it will soon catalog 99 percent of the WWW.

Recently, Lycos has created a search form that offers easy access to search and display options. The Search Option button lets you select AND and OR as well as the level of closeness or looseness of a match. It offers a modified form of advanced Boolean searching with the ability to combine two out of three search terms. This match is not as powerful as true Boolean searching, but does allow multiple concepts and synonyms to be combined. Using the Display Option button, the user selects how many hits are desired and the level of display detail for the results (standard, summary, or detailed).
The system automatically truncates search terms.

Documentation for searching and displaying is available, but is not extremely detailed. Lycos describes how to do a variety of searches, but does not explain how they are completed. For example, there is no real explanation of the difference between a "close" or a "loose" match other than the difference in number of hits returned. Repeated email messages sent to Lycos asking for further explanation were unanswered.

Search #1: The first search was entered (with search options set to match three terms) as:

    cancer oncol research grant fund

Presumably, oncology will not appear if cancer is present, and funding will not be a term if grant is present. The retrieval was more accurate than expected (146 documents). The first few results included several highly relevant sites including the NIH Grants Database and the FAA Research Grants Program. Several of the documents ranked lower in the retrieval list contained the terms "research," "grant," and "fund," but not "cancer" or "oncology." Because the ability to do true Boolean searching with nested queries is not available in Lycos, several searches using different synonyms and combinations would need to be performed to gather comprehensive results.

Search #2: In the search for information on Warner-Lambert or Parke-Davis, Lycos ignored the hyphenation both in the search strategy and retrieval. This was beneficial because, although a search on Parke-Davis typically retrieved Parke Davis, it also retrieved Parke-Davis. Constructing the search strategy for both phrases caused the same problems as the first search, where the search options had to be set to match two terms. This search engine constraint causes terms to be ANDed together, which often leads to results like the one we found among the hits--a music page discussing Bobby Parker and James "Thunderbird" Davis.
This example also demonstrates the automatic truncation feature in Lycos, which is good for searches that require exhaustive retrieval, but may result in false hits.

Search #3: The third search revealed an interesting problem with the Lycos search engine. It currently does not search for numbers. This search had the potential to provide an example where the Lycos Search Options work as explained and expected. In looking for information on the XI International Conference on AIDS, it was easy to specifically select a match on any two terms and ask for:

    11 11th XI eleventh aids

It is highly unlikely that a document will contain both "eleventh" and "11th." In practice, though, only the terms "xi," "eleventh," and "aids" were searched for. This search in Lycos produced two hits, both pointers to the conference home page.

Open Text (http://www.opentext.com)

Developed by Open Text Corporation of Waterloo, Ontario in partnership with UUNET Canada, Open Text is a full-text index of over one million Web sites as well as FTP sites and gopher servers. Open Text uses a crawler to scan Internet sites and then uses Open Text 5, its proprietary indexing software, to index every word of each Internet site. The results are then stored in the Open Text database. New sites are added daily and old sites regularly revisited to update bad or changed links.

The power behind Open Text is the index itself. According to the documentation, Open Text 5 can index more than 40 types of files. It indexes each word of the page so no stop words are ignored in search statements, identifies fields where words appear, and is multilingual. In addition, several technologies make Open Text 5 useful for organizations that require a powerful text-retrieval system. Using the product to index the Internet allows Open Text to demonstrate its features as well as provide a useful service to the Internet community.

Open Text offers three types of searching: Simple, Weighted, and Power Searches.
The Simple Search is a basic keyword search with options to search for the exact phrase entered, all of the words (implied AND), or any of the words (implied OR). The Power Search is more sophisticated and allows the user to select where the word(s) are searched:

* Anywhere (everywhere on Internet)
* Summary (combination of title page, first-heading, and text Open Text has deemed important)
* Title (indicated by the Web page author)
* First-Heading (indicated by page author)
* URL
* Hyperlink (outbound links from Web pages)

Basic Boolean operators can be selected, as well as proximity operators.

The Weighted Search allows the user to assign importance to a word or phrase. As with the Power Search, it is possible to select where in the Web page the search will be conducted. A number is then entered in the weight box to assign importance, with higher numbers designating greater importance.

Of the three types of searches, Power Search is the most advanced. It provides the searcher with an easy-to-use interface, Boolean and proximity operators, and field searching. As you will see in specific searches here, the lack of nesting capabilities does present problems. The online documentation discusses the index's limitations with parsing queries. Evidently, it does not allow for parentheses in the user interface, so terms left at the end of a search statement are often dangling. Open Text suggested conducting two separate searches to remedy this problem, but this approach did not adequately identify the types of information being sought. According to Open Text, search set manipulation and the ability to save searches are features that will be added in the near future.

Results are displayed in a relevancy ranking for both the Simple and Power Search.
The Weighted Search allows the user to display results by occurrence count (the product of the number of occurrences of the search terms and the weight of the terms), or the presence/absence of search terms (calculated by the weights given for the terms without regard to the number of search term matches).

Open Text supposedly has some ability to eliminate duplicates, but flaws have been reported. In our test searches we did not find duplicate listings. Each matched page offers the option to visit the page, see matches on the page, or find similar pages.

The excellent online documentation explains all of these features in detail. The instructions for using the various search modes and the Open Text FAQ provide ample information for most types of searches. Open Text also offers a feedback button for questions that have not been addressed in the documentation.

Search #1: The search for information on cancer research grants was conducted using Power Search and Weighted Search. Initially, terms were entered in the Power Search as:

    cancer or oncol and research and grant or fund

Truncation is automatic unless a space is entered after the term, which stops truncation. Searches in First-Heading, Summary, and Title each produced several hits but relevancy was limited. After analyzing the search and the Open Text documentation, it was apparent that using "or fund" as the last term had the effect of combining the previous terms and then ORing them with "fund." The search was conducted again without "fund" as a search term. There were fewer than ten hits in each of the three search areas, but only the Summary search produced links of identifiable relevancy. Eliminating "oncol" did not alter the results.

Still, it seemed that there should be sites of greater relevancy so a search was attempted using a proximity operator. We entered the search as:

    cancer near research near grant

We selected the same areas, as well as "Anywhere on the Web page" and found surprising results.
Neither First-Heading nor Title retrieved any sites, Summary retrieved only two, and Anywhere found 73 matches. The results found in Summary were highly relevant and were also the top two hits of the 73 in the Anywhere search. It is interesting to note that the sites located in Summary contained the search terms in the title of the page, but why the pages were not retrieved in a Title search is unclear.

The same type of search was conducted using the Weighted Search. Since only four terms can be entered, the following terms were searched, with the weights in parentheses--cancer (20), research (10), grant (20), fund (5). Results were ranked by occurrence. A search of Anywhere on the page produced 786 hits while searches of Summary, Title, and First-Heading produced fewer than five hits each. None gave exceptionally relevant results.

Search #2: A search for Warner-Lambert or Parke-Davis was conducted using Simple Search and Power Search. Hyphenating the terms did not produce any hits, so the terms were entered without a hyphen. The Simple Search was entered as:

    warner lambert parke davis (any of these words)

Over 30,000 pages were found, but the Warner-Lambert home page was not among the first ten. Trying the same search but asking for all the terms produced 35 pages, none of which was the Warner-Lambert page. Using Power Search the terms were entered as:

    warner followed by lambert or parke followed by davis

This search found 89 matches Anywhere on the Web page, many of which were relevant to Warner-Lambert Parke-Davis, but again the Warner-Lambert page was not among the pages found. Limiting to Summary, Title, or First-Heading retrieved fewer than seven hits each--again without finding the Warner-Lambert page. On analyzing the search, it was apparent that the last term, "davis," was dangling.
To remedy this, the search was reconstructed as:

    warner followed by lambert or parke davis (where parke davis is a phrase)

Performing this search in the Title field retrieved one hit, the Warner-Lambert home page.

Search #3: The search for the XI International Conference on AIDS was also conducted using Simple Search and Power Search. Wording for the conference number was entered as either 11th or Eleven or XI, along with various combinations of the other terms. Results included sites that linked to the conference home page as well as the conference home page itself.

MAKING CHOICES FOR COMPLEX WEB SEARCHING

After looking at the features of the four search systems covered above and reviewing their performance on the sample searches, we concluded that no single Web search system is really "the best." None of the four systems can claim to include all of the Internet in their databases. And, because the indexing and value-added evaluation process for including sites is so different, no system can claim to have it all at this point.

Alta Vista and Lycos are probably the most comprehensive in terms of number of URLs included. But the value-added indexing and selection process for URLs included in Alta Vista, InfoSeek, and Open Text leads us to rate them more highly in terms of relevancy and accuracy of retrieval. Open Text scores very well for including the most advanced search features, and for providing a good user interface and excellent documentation for the more complex functions.

Some search features expected in most commercial online services, such as search-set manipulation and duplicate detection, are missing from most Web search systems at this point. Full-text indexing of Web pages, including indexing of embedded links, creates huge databases of Web sites that have hundreds of duplicate entries, explaining the large number of duplicate hits often retrieved with Lycos searches.
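
The dangling-term problem reported for the Open Text Power Search is consistent with a query parser that applies operators strictly left to right, with no parentheses. The Python sketch below illustrates that reading only; it is not Open Text's implementation. The function name, the document numbers, and the set-based posting lists are all assumptions made for illustration.

```python
def evaluate_left_to_right(tokens, postings):
    """Evaluate 'term op term op term ...' strictly left to right.

    tokens   -- alternating terms and operators ("and" / "or"), no parentheses
    postings -- hypothetical index: term -> set of document numbers
    """
    result = set(postings.get(tokens[0], set()))
    for i in range(1, len(tokens) - 1, 2):
        operator, term = tokens[i], tokens[i + 1]
        docs = postings.get(term, set())
        if operator == "and":
            result &= docs
        elif operator == "or":
            result |= docs
        else:
            raise ValueError(f"unsupported operator: {operator}")
    return result


# Made-up postings for the terms of Search #1.
postings = {
    "cancer": {1, 2, 3},
    "oncol": {3, 4},
    "research": {1, 3, 4, 5},
    "grant": {1, 5},
    "fund": {3, 6, 7},
}

# The query as it was typed into a single box, without nesting.
flat = evaluate_left_to_right(
    "cancer or oncol and research and grant or fund".split(), postings
)

# The intended nested logic: (cancer or oncol) and research and (grant or fund)
nested = (
    (postings["cancer"] | postings["oncol"])
    & postings["research"]
    & (postings["grant"] | postings["fund"])
)

print(sorted(flat))    # [1, 3, 6, 7]
print(sorted(nested))  # [1, 3]
```

Under these assumed postings, the flat query admits documents 6 and 7, which mention only "fund," while the nested form keeps only documents that satisfy all three concepts.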
As any good searcher knows, complex queries requiring multiple Boolean operators can be entered in one statement in most commercial online searching systems. However, a searcher normally prefers to split concepts and operators into multiple search statements. Only one search window for entering terms is available in all but the Open Text system, making separation of concepts and combination of terms tricky. Even in Open Text, where it appears that nested logic is available, Search #1 proved that using an OR statement at the end of a query skews retrieval.

PROFESSIONAL SEARCHING ON THE WEB

How should the professional searcher approach complex Internet searches? The best approach, given the current tools, is probably one that includes all of the four search systems covered here. Just as a searcher of online commercial databases often runs searches on more than one database or search service to ensure comprehensiveness and accuracy in retrieval, a Web searcher should run a search on more than one system. Information professionals need to become familiar with the advanced features of these major Internet search services, and make an effort to have input into their future development.

Information professionals should closely watch developments in the Internet searching arena over the next few months. The licensing of the Lycos search engine by Microsoft, of WebCrawler by America Online, and the increasing popularity of fee-based Internet searching services such as InfoSeek and NlightN point to a future for Web search systems that may more closely resemble the current online commercial searching environment. Commercialization of some of the major Web search systems may ultimately lead to more powerful, accurate, and efficient searching for professionals and end-users alike.
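
Running the same topic on several search systems produces overlapping hit lists. As a minimal sketch of how such lists might be combined, the Python below merges per-engine results and collapses URLs that differ only in letter case of the host or a trailing slash. The example.org addresses, the function names, and the normalization rules are assumptions for illustration; the hit lists are assumed to have been copied out of each system's results page.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to scheme, lower-case host, and path without a trailing slash.

    Simplifying assumption: ports, query strings, and fragments are dropped.
    """
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/") or "/"
    return f"{parts.scheme.lower()}://{host}{path}"


def merge_results(results_by_engine):
    """Merge hit lists; return (url, engines) pairs, most widely found first."""
    merged = {}  # normalized URL -> engines that returned it, in first-seen order
    for engine, urls in results_by_engine.items():
        for url in urls:
            engines = merged.setdefault(normalize_url(url), [])
            if engine not in engines:
                engines.append(engine)
    # sorted() is stable, so ties keep their first-seen order.
    return sorted(merged.items(), key=lambda item: -len(item[1]))


# Hypothetical hit lists for one topic.
results = {
    "Alta Vista": [
        "http://www.example.org/aids-conference/",
        "http://www.example.org/news",
    ],
    "InfoSeek": ["http://WWW.EXAMPLE.ORG/aids-conference"],
    "Lycos": [
        "http://www.example.org/aids-conference/",
        "http://www.example.org/travel",
    ],
}

merged = merge_results(results)
for url, engines in merged:
    print(len(engines), url, engines)
```

Listing first the pages that several systems agree on is only one possible ordering; a searcher after exhaustive retrieval would read the whole merged list.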
----------------------------------------------------------------------------

End Users Turn to Search Experts

In many ways, current Web searching compares to the beginning of end-user online and CD-ROM database searching. Evaluations of end-user searching indicate that end-users like to do their own searching, but still depend on trained information professionals for complex or comprehensive searches. A recent survey by J. Fisher and Susanne Bjørner gauged the effect of widespread end-user searching on mediated searching. It showed that although a large percentage (85 percent) of corporate sites provide end-user searching, most (91 percent) of these corporations' librarians said that "end-users continue to rely on professionals for all but the most basic research" [1]. Evaluator P.T. Bysouth concludes that, even with a variety of searching tools and extensive training, scientific end-users have relatively simple queries [2]. Others agree with these conclusions [3,4,5].

Also, many studies show that, with the advent of end-user searching, libraries are often seen as more crucial rather than less important. "The work in the information department has increased rather than decreased in spite of the introduction of end-user searching" [6].

It is clear that end-users will continue to call on information professionals, whether searching takes place on a traditional online service or on the Web. As more end-users search the Web, it is up to information professionals to be familiar with the best advanced Web searching tools, the most effective methods for searching, and the most accurate query structure, so that they can handle the more difficult searches and comfortably teach others. Becoming expert in the more robust features of Web search engines, and participating in their future development (building on the lessons learned in online searching), is an important role for information professionals, now and in the future.

REFERENCES

[1] Fisher, J. and Susanne Bjørner.
"Enabling Online End-User Searching: An Expanding Role for Librarians." Special Libraries (Fall 1994): pp. 281-291. [2] Bysouth, P.T. "Evaluating the Use of Several Approaches to Online Literature Retrieval by Research Scientists." End-User Searching: The Effective Gateway to Published Information, Association for Information Management (1990). [3] Crea, Kathleen, Jan Glover and Majlen Helenius. "The Impact of In-House and End-User Databases on Mediated Searching." ONLINE 16, No. 4 (July 1992): pp. 49-53. [4] Martin, H. and D. Nicholas. "End-Users Coming of Age? Six Years of End-User Searching at The Guardian." Online & CDROM Review 17, No. 2 (1993): pp. 83-89. [5] Brody, Roberta. "End-Users in 1993: After a Decade." ONLINE 17, No. 3 (May 1993): pp. 66-69. [6] Butcher, H. "The BDO Binder Hamlyn End-User Project." End-User Searching: The Effective Gateway to Published Information. Association for Information Management (1990). ---------------------------------------------------------------------------- Solving Searching Mysteries and False Drops Most searchers like to try a few quick searches to test a new search engine. Who can resist typing in a few terms into a blank box and seeing what comes up? Often, sifting through a pile of less relevant material, you find some truly interesting results. Equally as often, something appears that makes you wonder where it came from. Search engines on the Web have some defaults and basic features that are different, and thus not intuitive, for professional searchers. These settings are often the culprits that cause those unusual false drops. For example, the default for many Web engines is to OR terms together, then provide results based on relevancy. This combination produces a retrieval that has all terms present in the first few hits, and then fewer terms as you move through your hit list. 
This explains why, even though your terms were ORed together, the first hits are exactly what you are looking for, and the last few do not even contain all of your search terms. Unless you specifically ask to AND terms together, do not trust your search retrieval number to accurately portray the number of hits from your search strategy.

Another typical default is automatic truncation on each term. So if your search is for "federal register," you will also retrieve documents with the terms "registering," "registered," and "federalist" in them.

Another way of explaining false drops is by determining exactly what the search engine is searching. Usually, the default is the URL, but sometimes a search engine retrieves documents where your search terms appear anywhere in the document. An address might include your search term even though the document itself does not show it. Pay close attention to any Web engine documentation to clarify just how and what it searches. Those searching mysteries can be solved!

[----------------------------------------------------------------]

Communications to the authors should be addressed to Peggy J. Zorn, Library Systems Analyst, Mary Emanoil, Document Delivery Specialist, or Lucy Marshall, Systems Administrator, Parke-Davis Pharmaceutical Research Library, 2800 Plymouth Road, Ann Arbor, MI 48105; 313/996-7202; Fax 313/996-7008; Internet--zornm@aa.wl.com, or Mary Panek, Information Specialist, United Technologies Research Center, Mail Stop 169-31, East Hartford, CT 06108.

----------------------------------------------------------------------------

Copyright © 1996, Online Inc. All rights reserved.