Thursday, August 19, 2010

RECOLL: a personal text search system for Unix/Linux

Recoll is a personal full text search tool for Unix/Linux.

It is based on the very strong Xapian back-end, for which it provides a feature-rich yet easy to use front-end with a Qt graphical interface.
Recoll is free, open source, and licensed under the GPL. The current version is 1.13.04.

Features:

  • Easy installation, few dependencies. No database daemon, web server, desktop environment or exotic language necessary.
  • Will run on most Unix-based systems
  • Qt-based GUI. Can use either Qt 3 or Qt 4.
  • Searches most common document types, emails and their attachments.
  • Powerful query facilities, with boolean searches, phrases, proximity, wildcards, filter on file types and directory tree.
  • Multi-language and multi-character set with Unicode based internals.
  • (more detail)
Already a Recoll user? There may still be a few useful tricks that you don't know about. A quick look at the search tips might prove useful!

Go there...
http://www.lesbonscomptes.com/recoll/

Chapter 1. Introduction


1.1. Giving it a try

If you do not like reading manuals (who does?) and would like to give Recoll a try, just install it and start the recoll user interface. It will index your home directory by default, and you can search as soon as indexing completes.
Do not do this if your home directory contains a huge number of documents and you do not want to wait or are very short on disk space. In this case, you may first want to customize the configuration to restrict the indexed area.
Also be aware that you may need to install the appropriate supporting applications for document types that need them (for example, antiword for MS Word files).
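
As a rough illustration of restricting the indexed area, the sketch below builds a single configuration line for Recoll's recoll.conf file. It assumes that the topdirs parameter takes a space-separated list of directories to index and that none of the paths contain spaces; the function name render_topdirs and the example directories are made up for illustration. Check the configuration section of the manual for your version before relying on it.

```python
def render_topdirs(directories):
    """Return a 'topdirs' line listing the directories to index.

    Assumption: recoll.conf accepts 'topdirs = dir1 dir2 ...' and the
    paths contain no spaces (paths with spaces would need quoting).
    """
    for directory in directories:
        if " " in directory:
            raise ValueError("path contains a space: %r" % directory)
    return "topdirs = " + " ".join(directories)


if __name__ == "__main__":
    # Hypothetical example: index only two subdirectories of the home directory.
    print(render_topdirs(["~/Documents", "~/Mail"]))
```

The printed line would then go into the recoll.conf file in your Recoll configuration directory, replacing the default that indexes the whole home directory.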

Go there...
http://www.lesbonscomptes.com/recoll/usermanual/rcl.introduction.html#RCL.INTRODUCTION.TRYIT

7.2. Supporting packages

Recoll uses external applications to index some file types. You need to install them for the file types that you wish to have indexed. These are optional run-time dependencies: none is needed for building or running Recoll, only for indexing its specific file type.
After an indexing pass, the commands that were found missing can be displayed from the recoll File menu. The list is stored in a text file named missing inside the configuration directory.
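
The list can also be read outside the GUI. The following is a minimal sketch that prints the contents of that file. It assumes the configuration directory is ~/.recoll unless the RECOLL_CONFDIR environment variable says otherwise, and it makes no assumption about the format of each line. The function name missing_helpers is hypothetical.

```python
import os
from pathlib import Path


def missing_helpers(config_dir=None):
    """Return the non-empty lines of the 'missing' file, or [] if absent."""
    if config_dir is None:
        # Assumption: default configuration directory is ~/.recoll,
        # overridable through the RECOLL_CONFDIR environment variable.
        config_dir = os.environ.get("RECOLL_CONFDIR", "~/.recoll")
    path = Path(config_dir).expanduser() / "missing"
    if not path.is_file():
        return []
    text = path.read_text(encoding="utf-8", errors="replace")
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    for line in missing_helpers():
        print(line)
```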
A list of common file types which need external commands follows. Many of the filters need the iconv command, which is not always listed as a dependency.
As of Recoll release 1.14, a number of XML-based formats that were handled by ad hoc filter code now use xsltproc, which usually comes with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
  • Openoffice: supported natively, but needs the unzip command to be installed.
  • PDF: pdftotext is part of the Xpdf or Poppler packages.
  • Postscript: pstotext.
  • MS Word: antiword.
  • MS Excel and PowerPoint: catdoc.
  • MS Open XML (docx): needs xsltproc.
  • Wordperfect files: libwpd.
  • RTF: unrtf
  • TeX: Recoll uses the untex program. Your distribution may have a package for it. If it doesn't, there is a copy of the source on the Recoll web site, because the program has no obvious home. The filter can also work with detex and will use it if it is installed.
  • dvi: dvips
  • djvu: DjVuLibre
  • mp3: Recoll will use the id3info command from the id3lib package to extract tag information. Without it, only the file names will be indexed. Some gcc versions after 4.4 may have trouble compiling id3lib. You can find a workaround here.
  • flac files need metaflac (standard flac tools).
  • ogg files need ogginfo (vorbis tools).
  • Pictures: Recoll uses the Exiftool Perl package to extract tag information. Most image file formats are supported. Note that there may not be much interest in indexing the technical tags (image size, aperture, etc.). This is only of interest if you store personal tags or textual descriptions inside the image files.
  • chm: files in Microsoft help format need Python and the pychm module (which needs chmlib).
  • ics: up to Recoll 1.13, iCalendar files need Python and the icalendar module. For newer versions, icalendar is not needed.
  • zip: Zip archives need Python (and the standard zipfile module).
Text, HTML, mail folders, Openoffice and Scribus files are processed internally. Lyx is used to index Lyx files. Many filters need iconv and the standard sed and awk.
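
Before running a first indexing pass, it can be handy to check which of the helper commands listed above are already installed. The sketch below is not part of Recoll; it simply looks each command up on the PATH with Python's shutil.which. The table only covers entries for which the list names a command, the executable name exiftool is an assumption, and the names HELPERS and find_missing are made up for this example.

```python
import shutil

# File type -> alternative commands; one found alternative is enough.
# Taken from the list above; 'exiftool' as a command name is an assumption.
HELPERS = {
    "OpenOffice": ["unzip"],
    "PDF": ["pdftotext"],
    "Postscript": ["pstotext"],
    "MS Word": ["antiword"],
    "MS Excel and PowerPoint": ["catdoc"],
    "MS Open XML (docx)": ["xsltproc"],
    "RTF": ["unrtf"],
    "TeX": ["untex", "detex"],
    "dvi": ["dvips"],
    "mp3": ["id3info"],
    "flac": ["metaflac"],
    "ogg": ["ogginfo"],
    "Pictures": ["exiftool"],
    "Character set conversion": ["iconv"],
}


def find_missing(helpers=HELPERS, which=shutil.which):
    """Return {file type: commands tried} for types with no helper on PATH."""
    missing = {}
    for file_type, commands in helpers.items():
        if not any(which(command) for command in commands):
            missing[file_type] = list(commands)
    return missing


if __name__ == "__main__":
    for file_type, commands in find_missing().items():
        print("%s: install one of %s" % (file_type, ", ".join(commands)))
```

The which parameter is there only so the lookup can be replaced in tests. After an actual indexing pass, the list kept by Recoll itself remains the authoritative one.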


Go there...
http://www.lesbonscomptes.com/recoll/usermanual/rcl.install.external.html



Recoll is in my Fedora 13 repos...

Don
