Josh looked around for leads, and eventually we pitched an idea to a company called The National Journal. The National Journal published a lobbyists' phone book called The Capital Source, but it was available in only two forms: a printed phone book and a file in comma-delimited ASCII format. We pointed out that if a customer wanted to send a mass mailing, the paper phone book was virtually useless, because copying every name and address out of the book by hand was simply too tedious. And although the database was available on disk, most users of the phone book were not computer-literate enough to make use of a raw, comma-delimited text dump of names and addresses!
Our idea was to write a custom-tailored search engine for The Capital Source with a friendly graphical user interface. We envisioned an application that would let users browse the phone book by section, as well as search the entire phone book for all records containing certain keywords. Users would also be able to save records to a file in a format suitable for printing mailing labels.
The National Journal was intrigued by our proposal and asked us to show them a prototype. I set to work developing a search engine that would be both powerful and extremely fast. Keep in mind that this was 1991, when one was just as likely to find an old 8088-based PC on someone's desk as a 386, so efficiency was foremost on my mind. After some initial testing, I realized that simply loading the entire phone book into memory and scanning it for keywords with basic pattern matching wasn't nearly fast enough for the snappy interactive application we had in mind. I had to find a way to cut the search time from several minutes down to one second or less, even on an 8088.
After some thought, I came to a critical realization: I could do all of the searching before the user even started the application! Had I been writing a text editor, a search function would always have to search whatever text the user had typed, and there would be no way to predict that text beforehand. But with my search program, the deck was stacked in my favor: I knew exactly what text I would be asked to search, because the text of the phone book would ship right along with the search engine, and the user had no way to enter new, arbitrary text.
My solution was to write a program that pre-processed the entire phone book, generating an index of every unique word in the text, along with its number of occurrences and the exact location of each occurrence. The indexing software was run offline, and the indices it generated shipped as part of the prototype. The search application loaded those indices and could execute searches in a fraction of a second.
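The pre-computed index described above is what is usually called an inverted index. The sketch below is a minimal Python illustration under stated assumptions, not the original Turbo Pascal code: it assumes each line of the comma-delimited file is one record, that a word's "location" is a (record number, word position) pair, and that a search for several words is an AND query. The function names `build_index`, `search`, `save_index`, and `load_index` are hypothetical, and a Python dictionary saved with `pickle` stands in for whatever on-disk format the original used.

```python
import pickle
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits, or apostrophes,
# so names like O'Brien stay in one piece.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def build_index(records):
    """Offline step: map each unique lowercase word to every place it occurs.

    Each location is a hypothetical (record number, word position) pair;
    the number of occurrences of a word is the length of its list.
    """
    index = defaultdict(list)
    for rec_no, record in enumerate(records):
        for pos, match in enumerate(WORD_RE.finditer(record)):
            index[match.group().lower()].append((rec_no, pos))
    return dict(index)


def save_index(index, path):
    """Write the finished index to disk so it can ship with the search program."""
    with open(path, "wb") as f:
        pickle.dump(index, f)


def load_index(path):
    """Runtime step: load the pre-built index instead of re-scanning the text."""
    with open(path, "rb") as f:
        return pickle.load(f)


def search(index, *words):
    """Return the sorted record numbers that contain every query word."""
    hits = None
    for word in words:
        # Dictionary lookup replaces a linear scan of the whole text.
        recs = {rec for rec, _ in index.get(word.lower(), [])}
        hits = recs if hits is None else hits & recs
    return sorted(hits or set())
```

Because the expensive work happens once, offline, a query at runtime reduces to a few dictionary lookups and a set intersection, and a word's occurrence count is simply the length of its list, e.g. `len(index["washington"])`.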
After a few weeks of development, I had a working prototype: a DOS application written in Turbo Pascal 6.0 using the Turbo Vision user-interface library. We made another appointment with the National Journal, stayed up all night beforehand writing a makeshift user manual, and presented the prototype to some of their staff. They seemed impressed and told us they'd get back to us in a few days with their decision on what to do next.
Time passed, and the National Journal still hadn't gotten back to us. The summer started to wind down, and Josh and I got ready to head off to our freshman year of college (he to Yale, I to Johns Hopkins). The National Journal never gave us a firm decision one way or the other, but eventually we got the idea that they weren't interested. Oh well; chalk one up to experience.
Screen shots courtesy of SnagIt/32.