Help for Archive Search
The archive search can search multiple lists on the server, depending on your access level.
There are four main steps involved in running a search:
- Select the list or lists that you would like to search. By default, all lists on the server will display for site maintainers. Individual list owners will see all the lists that they own by default.
- Enter a search string to find messages that you are interested in.
- Select report options that determine how the results appear on the page.
- Click "Search" to run and display the results.
String
Enter the string you are searching for:
- To search for messages about John Kennedy, simply type John Kennedy in the search box. This will show all the messages that contain the words "John" and "Kennedy" close to one another.
- Enclosing the words in single quotation marks, as in 'John Kennedy', limits the search to the exact phrase, so messages about "John F. Kennedy" would not be shown.
- For a wider range of results, you could use (John Kennedy) OR JFK so that you also get the messages that say either "John Kennedy" or "JFK".
- To search for words that are not necessarily close to one another, use "AND". For instance, Mozart AND Beethoven would show all the messages that mention both composers, while Mozart Beethoven would only find a small fraction of them.
- To make a search case sensitive, enclose it in double quotation marks. If you are interested in the works of Norman Mailer, you will probably find that searching for Mailer returns a lot of unexpected messages, while "Mailer" gives much better results.
- You can get as sophisticated as you want: ((John Kennedy) OR JFK) AND NOT ((Bay Pigs) OR Cuba) would look for messages about JFK that do not mention Cuba or the Bay of Pigs.
- Some characters have special syntactical meaning to the database functions and must be enclosed in single quotes for correct results. For instance, parentheses need to be quoted in this manner: search for 'f(x)' instead of f(x).
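The difference between the search forms above can be sketched in a few lines of Python. This is only an illustration of the matching rules as described, not LISTSERV's actual implementation: the function names are hypothetical, and the 5-word window used for "close" is an assumed value, since the real distance is not stated here.

```python
import re

def words(text):
    """Split text into lowercase words (a simplified tokenizer)."""
    return re.findall(r"[A-Za-z0-9]+", text.lower())

def near(text, a, b, window=5):
    """John Kennedy: both words within `window` words of each other.
    The window size is an assumption made for this sketch."""
    toks = words(text)
    pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

def exact_phrase(text, phrase):
    """'John Kennedy': the phrase must appear verbatim (case ignored)."""
    return phrase.lower() in text.lower()

def both_anywhere(text, a, b):
    """Mozart AND Beethoven: both words anywhere in the message."""
    toks = set(words(text))
    return a.lower() in toks and b.lower() in toks

def case_sensitive(text, word):
    """"Mailer": whole word, matching upper and lower case exactly."""
    return re.search(r"\b" + re.escape(word) + r"\b", text) is not None

jfk = "President John F. Kennedy spoke today."
far_apart = "Mozart " + "x " * 20 + "Beethoven"

print(near(jfk, "John", "Kennedy"))           # close together
print(exact_phrase(jfk, "John Kennedy"))      # not the exact phrase
print(near(far_apart, "Mozart", "Beethoven"), both_anywhere(far_apart, "Mozart", "Beethoven"))
print(case_sensitive("Ask the mailer daemon", "Mailer"))
```

In this sketch the "John F. Kennedy" message matches the plain search but not the quoted phrase, and the widely separated composers match only with AND.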
Substring
By default, searches will only match full words. Searching for "planet" will not find messages containing the word "planetarium" (unless they also contain the word "planet"). But if you check the "Substring" box, your search will match any word containing the string you have entered. For instance, a substring search for chem would find both "chemistry" and "alchemy."
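As a rough illustration of the two modes, the following Python sketch contrasts a whole-word match with a substring match. The `matches` function is hypothetical and only mimics the behavior described above; it says nothing about how LISTSERV performs the search internally.

```python
import re

def matches(text, term, substring=False):
    """Return True if `term` is found in `text`, ignoring case.

    substring=False: the term must appear as a full word.
    substring=True:  the term may appear inside a longer word.
    """
    if substring:
        return term.lower() in text.lower()
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

message = "We visited the planetarium, then discussed alchemy and chemistry."

print(matches(message, "planet"))                  # full-word search
print(matches(message, "planet", substring=True))  # substring search
print(matches(message, "chem", substring=True))    # inside "alchemy" and "chemistry"
```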
Subject Contains
To restrict your search to messages whose subject contains specific search words, simply type them in the "Subject Contains" search box. The syntax is the same as for the "String" box, with one difference: The "AND" operator is redundant because a subject field is very short and all the words are considered to be "close" to one another. Thus, in the "Subject Contains" box, there is no difference between a search for Mozart AND Beethoven and a search for Mozart Beethoven.
Subject searches are a good alternative when searching large archives, or when searching for topics that are mentioned quite often. If a word that you are looking for appears in the subject of a message, it is much more likely to reflect the actual contents of the message than if it only appears in one isolated sentence. On the other hand, what you are looking for may be hidden in a message that was about something else, and where someone just happened to mention your topic of interest in passing.
Author's Address
You can also restrict your search to messages posted by a particular person. If you know the email address of the person who wrote the message you are interested in, this can be a very effective way to find what you are looking for without having to go through dozens of unrelated messages. Note that you do not need to know the exact email address. For instance, if you know that the userid is "john" and the host name is some machine at XYZ.COM, you can simply enter john xyz.com in the search box. Since the author's email address is a single word, there is no concept of "close" vs. "distant," and the AND operator is redundant: john xyz.com and john AND xyz.com are equivalent.
Do not try to use wildcards (for example "john@*.xyz.com") because this is not the correct syntax. The author search box uses the same syntax as the "Subject Contains" and "String" boxes.
Since, Until
Many popular mailing lists have archives spanning 10 or more years of activity. If the mailing list is about technology, you may not be interested in messages that are older than a few years. Or, alternatively, you may happen to know approximately when the information you are looking for was posted to the list. You can use the "Since" and "Until" boxes to restrict your search accordingly.
The syntax is very flexible and you can specify a date and/or time in just about any of the commonly used formats:
- 23 Jun 1986 (self explanatory).
- 1986-06-23 (international date format).
- 1995 or just 95 selects 1 Jan 1995 for the "Since" box or 31 Dec 1995 for the "Until" box.
- APR selects April of the current year, 1st or 30th depending on whether this was entered in the "Since" or "Until" box.
- APRIL 95 is the same as above, but for the year 1995.
- TODAY-7 (7 days ago) makes it easy to get a list of all the messages posted in the past week. You can also use YESTERDAY or TODAY for a shorter time span.
IMPORTANT: The US date format (mm/dd or mm/dd/yy) is not supported because it is ambiguous: many other countries use dd/mm or dd/mm/yy instead. To avoid this ambiguity, LISTSERV only supports the international date format, yyyy-mm-dd or yy/mm/dd.
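To make the "Since" and "Until" rules concrete, here is a minimal Python sketch that resolves a few of the forms listed above to a calendar date. The `resolve` function is hypothetical and covers only a subset of the formats; treating a two-digit year as 19xx is an assumption based on the "95" example, not a documented rule.

```python
import re
from datetime import date, timedelta

def resolve(expr, box, today):
    """Resolve a date expression for the "since" or "until" box.

    `today` is passed in explicitly so that results are reproducible.
    """
    s = expr.strip().upper()
    if s == "TODAY":
        return today
    if s == "YESTERDAY":
        return today - timedelta(days=1)
    m = re.fullmatch(r"TODAY-(\d+)", s)
    if m:
        return today - timedelta(days=int(m.group(1)))
    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", s)
    if m:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
    if re.fullmatch(r"\d{2}|\d{4}", s):
        year = int(s)
        if year < 100:
            year += 1900  # assumption: two-digit years mean 19xx
        # A bare year means 1 Jan for "since" and 31 Dec for "until".
        return date(year, 1, 1) if box == "since" else date(year, 12, 31)
    raise ValueError(f"not handled by this sketch: {expr!r}")

now = date(2024, 3, 10)
print(resolve("95", "since", now), resolve("95", "until", now))
print(resolve("TODAY-7", "since", now))
print(resolve("1986-06-23", "since", now))
```

An input such as 06/23/86 falls through to the `ValueError`, which mirrors the fact that the US format is rejected rather than guessed at.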
Sort By
Defines criteria by which the results of the search are sorted when they are displayed. The following options are available; "Date/Time, desc." is the default.
- Date: Sort by the date and time of the message, in ascending order.
- Date/Time, desc.: Sort by the date and time of the message, in descending order.
- Lines: Sort by the length of the message, smallest to largest.
- Lines, desc.: Sort by the length of the message, largest to smallest.
- Subject: Sort by the subject line, in ascending order.
- Subject, desc.: Sort by the subject line, in descending order.
- List name: Sort by the name of the list, in ascending order.
- List name, desc.: Sort by the name of the list, in descending order.
Complete Search
Normally, if the search results span multiple pages, the sort criteria only apply within each page. For example, with 50 hits per page, LISTSERV takes the first 50 hits returned and sorts them. The next page holds the next 50 hits returned, sorted by the sort column within that page. The sort order is therefore not correct from one page to the next.
If you check the "Complete Search" check box, LISTSERV will sort all the results on all the pages, but such a search will take considerably more time to complete if the search returns a large number of hits.
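The difference between the two behaviors can be seen in a small Python sketch. The function names and the sample data are made up for illustration; the hits are plain numbers standing in for whatever the sort column contains.

```python
def paged_sort(hits, per_page, key):
    """Default behavior: split hits into pages first, then sort each page."""
    pages = [hits[i:i + per_page] for i in range(0, len(hits), per_page)]
    return [sorted(page, key=key) for page in pages]

def complete_sort(hits, per_page, key):
    """"Complete Search": sort all hits first, then split into pages."""
    ordered = sorted(hits, key=key)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]

hits = [5, 1, 4, 2, 6, 3]  # hits in the order the search returned them

print(paged_sort(hits, 3, key=lambda h: h))     # [[1, 4, 5], [2, 3, 6]]
print(complete_sort(hits, 3, key=lambda h: h))  # [[1, 2, 3], [4, 5, 6]]
```

With page-level sorting, page two starts with a value smaller than the last value on page one. The complete sort avoids this, at the cost of collecting and sorting every hit before the first page can be shown.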
Non-English searches
Every effort has been made to make ISO-8859-* searches work as transparently as possible, in spite of the complexity of the situation. In order to better understand the cases where searches do not actually work as expected, you should know that the messages are archived in the format in which they were originally sent. This will typically include a mix of native 8-bit text, MIME quoted-printable text, MIME base64 text, and other proprietary encoding methods such as WINMAIL.DAT, plus the basic 7-bit text. Each of these formats presents its own challenges:
- Native 8-bit text normally produces the expected results. See below for a list of generic problems that may affect even native 8-bit text.
- MIME quoted-printable text will, in most cases, produce the expected results. Conceptually, the search is carried out as though the =xx escape sequences had been replaced with their corresponding characters before beginning the search. However, soft line breaks (trailing '=' signs) are not processed (the lines are not merged). If the poster's mail client uses soft line breaks to split words in the middle, those words will not be recognized. For instance, if the word "house" were written as "hou=" on one line followed by "se" on the next line, LISTSERV would not find a match with the search string "house".
- MIME base64 text is not supported by the search interface. This type of encoding should only be used for binary data because it is totally unintelligible to people without a MIME user interface and because it is context sensitive (that is, LISTSERV would have to decode the entire message before beginning the search).
- Proprietary encoding methods such as WINMAIL.DAT are not supported by the search interface. In most cases, these formats suffer from the same kind of problems as MIME base64 text, and the mail programs that generate these messages are being replaced with MIME-capable programs.
- 7-bit text (with national characters) does not work at all. It is impossible to translate this text to native 8-bit form without knowing the language in which it is written.
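The quoted-printable case above can be reproduced with a short Python sketch. It is a conceptual model only: `decode_escapes_only` is a hypothetical helper that replaces =xx escapes line by line without merging soft line breaks, as the description suggests, while the standard `quopri` module shows what a full decoder would produce.

```python
import quopri
import re

def decode_escapes_only(line: bytes) -> bytes:
    """Replace =XX escapes with their byte, leaving a trailing '=' untouched."""
    return re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        line,
    )

# "house" is split by a soft line break; "caf=E9" is an escaped e-acute.
raw = b"The old hou=\nse is in a caf=E9."

# Line-by-line model: escapes are decoded, but the lines are not merged.
per_line = [decode_escapes_only(line) for line in raw.split(b"\n")]

# Full quoted-printable decoding merges the soft line break.
merged = quopri.decodestring(raw)

print(any(b"house" in line for line in per_line))     # the split word is missed
print(any(b"caf\xe9" in line for line in per_line))   # the escaped character is found
print(b"house" in merged)                             # a full decoder would find it
```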
In addition, there are a number of generic problems that affect all message formats:
- Code page: a typical international archive will contain messages in a variety of incompatible code pages (Latin-1, Icelandic, and so forth). While LISTSERV knows the code page of each of the individual messages, it does not know the code page of the search string you are entering, nor does it support searches that span multiple code pages. If you search for one of the characters in the Icelandic code page, LISTSERV may incorrectly match messages written in another code page in which this character is not present, but where another character with the same binary code was found in the message.
- Case-insensitive searches: special tables are required to properly evaluate case-insensitive searches with non-ASCII characters. The tables LISTSERV uses were designed for the Latin-1 (ISO-8859-1) code page and may not give correct results with other code pages.
- EBCDIC systems: LISTSERV servers running on EBCDIC systems may give incorrect results due to the multiple ASCII-EBCDIC translation steps involved in processing your request. The TCP/IP product, the SMTP server, the Web server, and LISTSERV each have their own tables, which may or may not be identical.
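The code page problem comes down to one binary code standing for different characters in different code pages. The Python sketch below demonstrates this with the byte 0xFE, which is the Icelandic letter thorn in Latin-1; ISO-8859-2 is chosen here only as an example of a second code page.

```python
def decode_in(code_page: str, raw: bytes) -> str:
    """Interpret the same raw bytes under the given code page."""
    return raw.decode(code_page)

raw = bytes([0xFE])  # a single 8-bit character as stored in an archive

as_latin1 = decode_in("iso-8859-1", raw)  # thorn, used in Icelandic
as_latin2 = decode_in("iso-8859-2", raw)  # a different letter entirely

print(as_latin1, as_latin2, as_latin1 == as_latin2)
```

A search that compares raw bytes without taking the code page into account cannot tell these two characters apart, which is how a search for an Icelandic character can match a message in which that character never appears.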