D’Arcy Norman dot net

it is not knowledge.

Searching PDF with ht://Dig


I’ve just enabled indexing and searching of .pdf documents on the Learning Commons website.

We’re using ht://Dig as our search engine, and it’s quite flexible: it can hand documents off to external parsers, which lets it index formats other than plain text. There are libraries available that let it read .rtf, .pdf, .ps, .doc, .swf, .xls, and even .ppt files.
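As a rough illustration of that mechanism: ht://Dig’s `external_parsers` attribute in htdig.conf lists content types, each paired with the program that handles it. A type written as `application/pdf->text/plain` names a converter whose output ht://Dig then parses itself. The Python below is only a toy model of that lookup under a simplifying assumption (no quoted commands with arguments). It is not ht://Dig’s own code, and the program paths are hypothetical.

```python
# Toy model of how an ht://Dig-style `external_parsers` value pairs
# content types with external programs (not ht://Dig's own code).
from typing import Dict, Optional, Tuple

# content type -> (converted output type or None, program path)
Table = Dict[str, Tuple[Optional[str], str]]


def parse_external_parsers(value: str) -> Table:
    """Split a whitespace-separated list of `type program` pairs.

    `in/type->out/type` marks a converter whose output is re-parsed as
    `out/type`; a bare type marks a full parser. Quoted commands with
    arguments are not handled in this sketch.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected content-type/program pairs")
    table: Table = {}
    for content_type, program in zip(tokens[0::2], tokens[1::2]):
        source, _, target = content_type.partition("->")
        table[source] = (target or None, program)
    return table


def program_for(table: Table, content_type: str) -> Optional[str]:
    """Return the external program for a MIME type, or None if none is set."""
    entry = table.get(content_type)
    return entry[1] if entry else None


table = parse_external_parsers(
    "application/pdf->text/plain /usr/local/bin/pdf2text.py"
)
print(program_for(table, "application/pdf"))  # /usr/local/bin/pdf2text.py
print(program_for(table, "text/html"))        # None
```

Anything without an entry falls through to ht://Dig’s built-in HTML and plain-text handling, which is why only the non-text formats need a program at all.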

For now, I’ve only added the .pdf parser, using the Xpdf library. There was no binary available for Mac OS X, so I had to compile it from source. Here’s a link to the compiled binaries for Mac OS X (built without support for the X11 windowing system, so these are just the command-line utilities). Just drop them in /usr/local/bin and enjoy!
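For anyone wiring this up, here is a minimal sketch of a PDF converter built on Xpdf’s `pdftotext`, which writes extracted text to stdout when the output file is given as `-`. The script name `pdf2text.py` and the `/usr/local/bin/pdftotext` path are assumptions, as is the calling convention (ht://Dig passing the input file first, followed by the content type, URL, and config file); this is not necessarily how the Learning Commons setup does it.

```python
#!/usr/bin/env python3
# Hypothetical ht://Dig external converter for PDF (a sketch).
# Assumes Xpdf's pdftotext is installed at /usr/local/bin/pdftotext and
# that ht://Dig calls converters as: infile content-type URL config-file.
import subprocess
import sys
from typing import List

PDFTOTEXT = "/usr/local/bin/pdftotext"  # assumed install location


def pdftotext_command(infile: str) -> List[str]:
    """Build the pdftotext call; '-' sends the extracted text to stdout."""
    return [PDFTOTEXT, infile, "-"]


def clean_text(raw: bytes) -> str:
    """Decode pdftotext's default Latin-1 output and turn page breaks
    (form feeds) into plain newlines."""
    return raw.decode("latin-1").replace("\f", "\n")


def main(argv: List[str]) -> int:
    if len(argv) < 2:
        sys.stderr.write("usage: pdf2text.py infile [content-type url config]\n")
        return 1
    infile = argv[1]  # the other arguments are ignored in this sketch
    result = subprocess.run(pdftotext_command(infile), capture_output=True)
    if result.returncode != 0:
        return result.returncode
    sys.stdout.write(clean_text(result.stdout))
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Only run the conversion when an input file was actually passed in.
    sys.exit(main(sys.argv))
```

It would then be registered in htdig.conf with a line along the lines of `external_parsers: application/pdf->text/plain /usr/local/bin/pdf2text.py`. Since ht://Dig truncates anything larger than `max_doc_size`, it may also be worth raising that limit so longer PDFs reach the converter intact.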

Written by dnorman

April 23rd, 2004 at 2:04 pm

Posted in Uncategorized



This work is licensed under a Creative Commons Attribution 2.5 Canada License.