Lucene's algorithm for sorting by field

2008-03-21

What FieldCacheImpl does: the algorithm behind sorting results by field

Besides its default ordering (by relevance score), Lucene can sort results by one or more fields, much like a database's "order by". Here is how it works.
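In Lucene, index terms are sorted first by field name and then by field value, so all terms of the same field are stored contiguously. For example, take a field F1 with the values V1, V2, and V3 in an index of 11 documents with DocIDs 0 to 10: the terms for V1, V2, and V3 under F1 sit next to each other, each with the DocIDs of the documents that contain it. To sort search results by F1, Lucene v1.4.3 loads every F1 value into memory as an array (held in FieldCacheImpl) whose index is the DocID and whose element is that document's F1 value. Once loaded, the array stays cached for that IndexReader: during the search, Lucene looks up each matching document's value by its DocID and compares these values to order the hits, instead of ordering them by relevance score. Sample code: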
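    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.cn.ChineseAnalyzer;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;

    // Sort by the ID field (Lucene 1.4.3 API); ID must be indexed and untokenized.
    // indexDir (index directory path) and keyword (user query) are supplied by the caller.
    Searcher searcher = new IndexSearcher(indexDir);
    Analyzer analyzer = new ChineseAnalyzer();
    Query query = QueryParser.parse(keyword, "contents", analyzer);
    Sort sort = new Sort(new SortField("ID", SortField.STRING));
    Hits hits = searcher.search(query, sort);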
Lucene v1.4.3 can sort on integer, float, and string fields, and these fields must not be tokenized (they must be "untokenized").
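The loading step described above can be sketched in plain Java without Lucene. This is a simplified model under stated assumptions: a field's sorted term list and its DocIDs are represented by a `TreeMap` from term text to DocIDs, and the names `FieldCacheSketch`, `sampleField`, `uninvert`, and `sortHits`, as well as the V1/V2/V3 to DocID assignment, are hypothetical. It shows one pass over the terms filling a DocID-indexed array, then hits being ordered by array lookups.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy model of FieldCacheImpl-style loading; NOT Lucene code.
// A field's terms are modeled as a TreeMap (kept in sorted term order),
// each term mapped to the DocIDs of the documents that contain it.
public class FieldCacheSketch {

    // Hypothetical data: field F1 with values V1..V3 over DocIDs 0..10.
    static TreeMap<String, int[]> sampleField() {
        TreeMap<String, int[]> f1 = new TreeMap<>();
        f1.put("V1", new int[] {0, 4, 7});
        f1.put("V2", new int[] {1, 2, 8, 10});
        f1.put("V3", new int[] {3, 5, 6, 9});
        return f1;
    }

    // One pass over the sorted terms and their DocIDs fills an array
    // indexed by DocID whose element is that document's field value.
    static String[] uninvert(TreeMap<String, int[]> termsOfField, int maxDoc) {
        String[] valueByDoc = new String[maxDoc]; // documents without the field stay null
        for (Map.Entry<String, int[]> term : termsOfField.entrySet()) {
            for (int doc : term.getValue()) {
                valueByDoc[doc] = term.getKey(); // a later term for the same doc would overwrite this
            }
        }
        return valueByDoc;
    }

    // Orders matching DocIDs by their cached value (one array lookup per comparison),
    // breaking ties by DocID; documents with no value sort last.
    static Integer[] sortHits(int[] hits, String[] valueByDoc) {
        Integer[] sorted = Arrays.stream(hits).boxed().toArray(Integer[]::new);
        Arrays.sort(sorted,
                Comparator.comparing((Integer doc) -> valueByDoc[doc],
                                Comparator.nullsLast(Comparator.<String>naturalOrder()))
                        .thenComparingInt(doc -> doc));
        return sorted;
    }

    public static void main(String[] args) {
        String[] cache = uninvert(sampleField(), 11);
        System.out.println(Arrays.toString(cache));
        // prints [V1, V2, V2, V3, V1, V3, V3, V1, V2, V3, V2]
        System.out.println(Arrays.toString(sortHits(new int[] {9, 2, 4, 6}, cache)));
        // prints [4, 2, 6, 9]
    }
}
```

Because each document owns a single array slot, a document with several terms in the field (a tokenized field) keeps only the term visited last, which is one way to see why sort fields should be untokenized.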
The original English Javadoc of Lucene's Sort class, quoted below, is also very helpful ^_^.

/**
* Encapsulates sort criteria for returned hits.
*
* <p>The fields used to determine sort order must be carefully chosen.
* Documents must contain a single term in such a field,
* and the value of the term should indicate the document's relative position in
* a given sort order. The field must be indexed, but should not be tokenized,
* and does not need to be stored (unless you happen to want it back with the
* rest of your document data). In other words:
*
* <dl><dd><code>document.add (new Field ("byNumber", Integer.toString(x), false, true, false));</code>
* </dd></dl>
*
* <p><h3>Valid Types of Values</h3>
*
* <p>There are three possible kinds of term values which may be put into
* sorting fields: Integers, Floats, or Strings. Unless
* {@link SortField SortField} objects are specified, the type of value
* in the field is determined by parsing the first term in the field.
*
* <p>Integer term values should contain only digits and an optional
* preceding negative sign. Values must be base 10 and in the range
* <code>Integer.MIN_VALUE</code> to <code>Integer.MAX_VALUE</code> inclusive.
* Documents which should appear first in the sort
* should have low value integers, later documents high values
* (i.e. the documents should be numbered <code>1..n</code> where
* <code>1</code> is the first and <code>n</code> the last).
*
* <p>Float term values should conform to values accepted by
* {@link Float Float.valueOf(String)} (except that <code>NaN</code>
* and <code>Infinity</code> are not supported).
* Documents which should appear first in the sort
* should have low values, later documents high values.
*
* <p>String term values can contain any valid String, but should
* not be tokenized. The values are sorted according to their
* {@link Comparable natural order}. Note that using this type
* of term value has higher memory requirements than the other
* two types.
*
* <p><h3>Object Reuse</h3>
*
* <p>One of these objects can be
* used multiple times and the sort order changed between usages.
*
* <p>This class is thread safe.
*
* <p><h3>Memory Usage</h3>
*
* <p>Sorting uses caches of term values maintained by the
* internal HitQueue(s). The cache is static and contains an integer
* or float array of length <code>IndexReader.maxDoc()</code> for each field
* name for which a sort is performed. In other words, the size of the
* cache in bytes is:
*
* <p><code>4 * IndexReader.maxDoc() * (# of different fields actually used to sort)</code>
*
* <p>For String fields, the cache is larger: in addition to the
* above array, the value of every term in the field is kept in memory.
* If there are many unique terms in the field, this could
* be quite large.
*
* <p>Note that the size of the cache is not affected by how many
* fields are in the index and <i>might</i> be used to sort - only by
* the ones actually used to sort a result set.
*
* <p>The cache is cleared each time a new <code>IndexReader</code> is
* passed in, or if the value returned by <code>maxDoc()</code>
* changes for the current IndexReader. This class is not set up to
* be able to efficiently sort hits from more than one index
* simultaneously.*/
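The Javadoc's rules on value types and memory can be illustrated with another standalone sketch, again not Lucene code. The class and method names are hypothetical; the `order`/`lookup` pair follows the general shape of Lucene's `FieldCache.StringIndex`, and the Integer-then-Float-then-String order in `guessType` is an assumption about how the first term is parsed.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Standalone illustration of the Sort Javadoc's rules; NOT Lucene code.
public class SortCacheSketch {
    final int[] order;     // order[docId] = ordinal of the doc's term; 0 means "no term"
    final String[] lookup; // lookup[ordinal] = term text; lookup[0] stays null

    // Builds the String cache: the per-document int array plus every unique term,
    // which is why String sorting needs more memory than Integer or Float sorting.
    SortCacheSketch(TreeMap<String, int[]> termsOfField, int maxDoc) {
        order = new int[maxDoc];
        lookup = new String[termsOfField.size() + 1];
        int ordinal = 1;
        // Terms are visited in sorted order, so ordinals follow the natural String order.
        for (Map.Entry<String, int[]> term : termsOfField.entrySet()) {
            lookup[ordinal] = term.getKey();
            for (int doc : term.getValue()) {
                order[doc] = ordinal;
            }
            ordinal++;
        }
    }

    // Comparing two documents is an int comparison; no String comparison at sort time.
    int compare(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }

    // The Javadoc's formula for the per-document arrays, in bytes:
    // 4 * maxDoc * (number of different fields actually used to sort).
    static long arrayCacheBytes(long maxDoc, int fieldsUsedToSort) {
        return 4L * maxDoc * fieldsUsedToSort;
    }

    // Without an explicit SortField type, the type is guessed from the first term.
    // Assumed order of attempts: Integer, then Float, otherwise String.
    static String guessType(String firstTerm) {
        try {
            Integer.parseInt(firstTerm);
            return "INT";
        } catch (NumberFormatException notAnInt) {
            try {
                Float.parseFloat(firstTerm);
                return "FLOAT";
            } catch (NumberFormatException notAFloat) {
                return "STRING";
            }
        }
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> f1 = new TreeMap<>();
        f1.put("apple", new int[] {0, 2});
        f1.put("banana", new int[] {1});
        SortCacheSketch idx = new SortCacheSketch(f1, 4); // doc 3 has no term
        System.out.println(Arrays.toString(idx.order));    // prints [1, 2, 1, 0]
        System.out.println(Arrays.toString(idx.lookup));   // prints [null, apple, banana]
        System.out.println(idx.compare(1, 2) > 0);          // prints true
        System.out.println(arrayCacheBytes(1_000_000, 2));  // prints 8000000
        System.out.println(guessType("42") + " " + guessType("3.5") + " " + guessType("abc"));
        // prints INT FLOAT STRING
    }
}
```

With 1,000,000 documents and two sort fields, the formula gives 8,000,000 bytes (about 7.6 MiB) for the arrays alone; a String field adds its `lookup` terms on top of that.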
