Lucene Index Merge and Optimisation

Lucene index merging has a few parameters that affect how the index is built. They have an impact on index operations other than search. The MergeFactor controls how many documents are stored within each segment before a new one is started, and how many segments are started before they are collected into a larger one. So a MergeFactor of 10 means 10 documents before aggregating, and then 10 aggregated segments of a given size before aggregating again. Consequently, MergeFactor also controls the number of open files.

The higher the merge factor, the faster the index build, because segments are merged less frequently. However, a high merge factor significantly slows down adding an index to an existing one, as this appears to depend on the number of files Lucene has to open.
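To make the trade-off concrete, here is a minimal sketch of the merge behaviour described above. It is a toy model, not Lucene code: the class name MergeFactorModel and its methods are made up for illustration, every document is assumed to be flushed as its own one-document segment, and a merge is assumed to happen whenever the newest mergeFactor segments have the same size.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the merge factor (hypothetical, not part of Lucene).
final class MergeFactorModel {

    /** Outcome of a simulated build: segments left on disk and merges performed. */
    static final class Result {
        final int segments;
        final int merges;

        Result(int segments, int merges) {
            this.segments = segments;
            this.merges = merges;
        }
    }

    /** Adds docs one at a time and merges whenever mergeFactor equal-sized segments pile up. */
    static Result simulate(int docs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Integer> segments = new ArrayList<>(); // segment sizes in documents, oldest first
        int merges = 0;
        for (int d = 0; d < docs; d++) {
            segments.add(1); // assumption: each document starts as its own segment
            // Cascade: a merge can produce the last segment needed for the next level up.
            while (segments.size() >= mergeFactor && tailIsUniform(segments, mergeFactor)) {
                int merged = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    merged += segments.remove(segments.size() - 1);
                }
                segments.add(merged);
                merges++;
            }
        }
        return new Result(segments.size(), merges);
    }

    /** True when the newest n segments all have the same size. */
    private static boolean tailIsUniform(List<Integer> sizes, int n) {
        int last = sizes.get(sizes.size() - 1);
        for (int i = sizes.size() - n; i < sizes.size(); i++) {
            if (sizes.get(i) != last) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {10, 100}) {
            Result r = simulate(999, mergeFactor);
            System.out.println("mergeFactor=" + mergeFactor
                    + " segments=" + r.segments + " merges=" + r.merges);
        }
    }
}
```

Under these assumptions, 999 documents with a merge factor of 10 end up as 27 segments after 108 merges, while a merge factor of 100 leaves 108 segments after only 9 merges. Fewer merges is the faster build; more segments is the larger number of files to open.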

The next parameter is MaxBufferedDocs, which controls the number of documents buffered in memory before they are flushed to disk. For a batch index operation, the higher this value, the better the indexing performance, but the more memory is consumed.

Then there is MaxMergeDocs, which sets the maximum number of documents in a segment; segments above this size are not merged any further. This is used to limit file size, so that no file grows beyond 2 GB on a 32-bit system.
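As a rough illustration of how a MaxMergeDocs value might be chosen against the 2 GB ceiling, here is a back-of-the-envelope sketch. The class MaxMergeDocsEstimate, the average index bytes per document and the safety factor are all assumptions introduced for this example; the real on-disk size per document depends on the fields, analyzers and stored data, so the result is only a starting point.

```java
// Hypothetical helper (not part of Lucene): estimates a document cap per segment
// so that a single index file stays under the 2 GB limit of a 32-bit system.
final class MaxMergeDocsEstimate {

    /** Largest signed 32-bit file offset, 2^31 - 1 bytes. */
    static final long TWO_GB_LIMIT = Integer.MAX_VALUE;

    /**
     * @param avgIndexBytesPerDoc assumed average index size per document, in bytes
     * @param safetyFactor        fraction of the limit to actually use, in (0, 1]
     * @return an estimated cap on documents per segment
     */
    static int estimate(long avgIndexBytesPerDoc, double safetyFactor) {
        if (avgIndexBytesPerDoc <= 0) {
            throw new IllegalArgumentException("avgIndexBytesPerDoc must be positive");
        }
        if (safetyFactor <= 0.0 || safetyFactor > 1.0) {
            throw new IllegalArgumentException("safetyFactor must be in (0, 1]");
        }
        long budgetBytes = (long) (TWO_GB_LIMIT * safetyFactor); // bytes we allow one file to use
        return (int) Math.min(Integer.MAX_VALUE, budgetBytes / avgIndexBytesPerDoc);
    }

    public static void main(String[] args) {
        // Example: assume about 2 KB of index data per document and use half the limit.
        System.out.println(estimate(2048, 0.5));
    }
}
```

The safety factor is there because a per-document average is only an estimate, and a single large batch of documents could otherwise push a merged file over the limit.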

In running the Sakai search indexer operations, I have noticed some things in this area.
