Solr: fieldNorm differs per document, with no document boost

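I want my search results ordered by score, which they are, but the score is not being calculated the way I expect. It is not necessarily wrong, just different from what I expected, and I don't know why. My goal is to remove whatever is changing the score.

If I run a search that matches two objects (where ObjectA should score higher than ObjectB), ObjectB is returned first.

For this example, say my query is a single term: "apples".

ObjectA's title: "apples are apples" (2/3 terms)
ObjectA's description: "Apples - there are apples in the apples, and now apples are all over all the apples of the apples!" (6/18 terms)
ObjectB's title: "apples are great" (1/3 terms)
ObjectB's description: "There were apples in the apple room, and now the apples have all gone bad on the apples!" (4/18 terms)

The title field has no boost (or rather, a boost of 1), and the description field has a boost of 0.8. I have not specified a document boost through solrconfig.xml or through the query I am sending. If there is another way to specify a document boost, I may have missed it.

After analyzing the explain printout, it looks like ObjectA is computed with a higher score than ObjectB, just as I want, except for one difference: ObjectB's title fieldNorm is always higher than ObjectA's.

The explain printout follows. For reference: the title field is mditem5_tns and the description field is mditem7_tns: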

ObjectB:
1.3327172 = (MATCH) sum of:
  1.0352166 = (MATCH) max plus 0.1 times others of:
    0.9766194 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.8109303 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.0 = tf(termFreq(mditem5_tns:appl)=1)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        1.0 = fieldNorm(field=mditem5_tns, doc=0)
    0.58597165 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.3581977 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.0 = tf(termFreq(mditem7_tns:appl)=4)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm

ObjectA:
1.2324848 = (MATCH) sum of:
  0.93498427 = (MATCH) max plus 0.1 times others of:
    0.8632177 = (MATCH) weight(mditem5_tns:appl in 0), product of:
      0.53929156 = queryWeight(mditem5_tns:appl), product of:
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6006513 = (MATCH) fieldWeight(mditem5_tns:appl in 0), product of:
        1.4142135 = tf(termFreq(mditem5_tns:appl)=2)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.625 = fieldNorm(field=mditem5_tns, doc=0)
    0.7176658 = (MATCH) weight(mditem7_tns:appl^0.8 in 0), product of:
      0.43143326 = queryWeight(mditem7_tns:appl^0.8), product of:
        0.8 = boost
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.2977981 = queryNorm
      1.6634457 = (MATCH) fieldWeight(mditem7_tns:appl in 0), product of:
        2.4494898 = tf(termFreq(mditem7_tns:appl)=6)
        1.8109303 = idf(docFreq=3, maxDocs=9)
        0.375 = fieldNorm(field=mditem7_tns, doc=0)
  0.2975006 = (MATCH) FunctionQuery(1000.0/(1.0*float(top(rord(lastmodified)))+1000.0)), product of:
    0.999001 = 1000.0/(1.0*float(1)+1000.0)
    1.0 = boost
    0.2977981 = queryNorm
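
As a cross-check, the figures in the explain output can be recombined by hand. The following is a minimal Python sketch, not anything Solr runs: it assumes the "max plus 0.1 times others" line is a DisMax-style combination with a tie-breaker of 0.1, and the helper names (`dismax`, `total`, `title_score`) are hypothetical. It also recomputes ObjectA's score under the hypothetical assumption that its title fieldNorm were 1.0, the same as ObjectB's.

```python
# Values copied from the explain output above.
TIE = 0.1                         # from "max plus 0.1 times others"
FUNCTION_SCORE = 0.2975006        # FunctionQuery part, identical for both documents
TITLE_QUERY_WEIGHT = 0.53929156   # queryWeight(mditem5_tns:appl)
IDF = 1.8109303                   # idf(docFreq=3, maxDocs=9)


def dismax(clause_scores, tie=TIE):
    """Best clause score plus `tie` times the sum of the other clause scores."""
    best = max(clause_scores)
    return best + tie * (sum(clause_scores) - best)


def total(title_score_value, description_score_value):
    """Top-level "sum of": the combined field clauses plus the function query."""
    return dismax([title_score_value, description_score_value]) + FUNCTION_SCORE


def title_score(tf, field_norm):
    """weight = queryWeight * fieldWeight, with fieldWeight = tf * idf * fieldNorm."""
    return TITLE_QUERY_WEIGHT * tf * IDF * field_norm


score_b = total(0.9766194, 0.58597165)
score_a = total(0.8632177, 0.7176658)

# Hypothetical case: ObjectA's title with fieldNorm 1.0 instead of 0.625.
score_a_equal_norm = total(title_score(tf=1.4142135, field_norm=1.0), 0.7176658)

print(f"ObjectB:                    {score_b:.4f}")
print(f"ObjectA:                    {score_a:.4f}")
print(f"ObjectA with fieldNorm 1.0: {score_a_equal_norm:.4f}")
```

Under these assumptions the first two results reproduce the reported totals, and the hypothetical third value comes out at about 1.75, above ObjectB. That supports the observation in the question: the title fieldNorm is the only factor that puts ObjectB first.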

2 replies

翁茄口霉氖

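The problem is caused by the stemmer. It expands "apples are apples" to "apples appl are apples appl", which makes the field longer. Because document B contains only one term that the stemmer expands, its field stays shorter than document A's. This results in different fieldNorms.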

慷祈霖黑

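fieldNorm is computed from three components: the index-time boost on the field, the index-time boost on the document, and the field length. Assuming you did not supply any index-time boost, the difference must come from the field length. Since lengthNorm is higher for shorter field values, for B to have a higher fieldNorm on the title it must have fewer tokens in its title than A.

For a detailed explanation of Lucene scoring, see the following pages:
http://lucene.apache.org/java/2_4_0/scoring.html
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html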
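
To make the three components concrete, here is a rough Python sketch of the classic formulas described on the scoring page linked above: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and lengthNorm = 1 / sqrt(number of tokens). The function names are hypothetical. `stored_norm` is only an approximation of the lossy one-byte norm storage; it assumes the value is truncated to two mantissa bits, and it is not Lucene's actual encoding code.

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency as shown in the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def length_norm(num_tokens):
    """Shorter fields get a higher norm."""
    return 1.0 / math.sqrt(num_tokens)


def stored_norm(value):
    """Assumed approximation of the one-byte norm: truncate to two mantissa bits."""
    mantissa, exponent = math.frexp(value)  # value == mantissa * 2**exponent, mantissa in [0.5, 1)
    return math.ldexp(math.floor(mantissa * 8) / 8, exponent)


def field_norm(num_tokens, field_boost=1.0, doc_boost=1.0):
    """Index-time document boost * index-time field boost * lengthNorm, then stored lossily."""
    return stored_norm(doc_boost * field_boost * length_norm(num_tokens))


for n in range(1, 8):
    print(n, "tokens -> fieldNorm", field_norm(n))

# ObjectA's title fieldWeight, assuming two indexed tokens and no index-time boosts.
print(tf(2) * idf(3, 9) * field_norm(2))
```

With no index-time boosts, this sketch yields 1.0 for a one-token field and 0.625 for a two-token field, the two title fieldNorm values in the explain output. That is consistent with the explanation above: ObjectB's title is indexed with fewer tokens than ObjectA's.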
