Difference between revisions of "ElasticSearch Aggregation Analysis: Metrics Aggregations"

Revision as of 00:47, 21 May 2021 (Fri)

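About

Elasticsearch (ES) metrics aggregations play the same role as SQL's aggregate functions. A metrics aggregation can be used on its own or together with a bucket aggregation.

The commonly used metrics aggregations are:

Value Count: like SQL's count function; counts the values of a field;
Cardinality: like SQL's count(DISTINCT field); counts the distinct values of a field;
Avg: computes the average;
Sum: computes the sum;
Max: finds the maximum;
Min: finds the minimum.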

Value Count

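The value count aggregation counts the values extracted from a field, similar to SQL's count function. Documents that have no value for the field are not counted, and a multi-valued field contributes one count per value, so the result is not always the number of documents.

Example:

GET /sales/_search?size=0
{
    "aggs": {
        "types_count": { // name of the aggregation; any name will do
            "value_count": { // aggregation type: value_count
                "field": "type" // count the values of the type field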
            }
        }
    }
}

Equivalent SQL:

select count(type) from sales

Response:

{
    ...
    "aggregations": {
        "types_count": { // name of the aggregation
            "value": 7 // the count
        }
    }
}
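
To make the counting rule concrete, here is a minimal Python sketch that mimics how a value count treats missing and multi-valued fields. It is an illustration only, not how Elasticsearch computes the result; the `value_count` helper and the sample documents are invented for this example.

```python
from typing import Any, Iterable, Mapping


def value_count(docs: Iterable[Mapping[str, Any]], field: str) -> int:
    """Count the values found in `field` across `docs` (hypothetical helper)."""
    total = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            continue  # no value for the field: contributes nothing
        if isinstance(value, (list, tuple)):
            total += len(value)  # multi-valued field: one count per value
        else:
            total += 1
    return total


# Hypothetical documents, invented for this illustration.
sales = [
    {"type": "hat", "price": 200},
    {"type": "hat", "price": 10},
    {"type": ["bag", "gift"], "price": 150},
    {"price": 80},  # no type field
]

print(value_count(sales, "type"))   # 4 values, taken from 3 documents
print(value_count(sales, "price"))  # 4 values, taken from 4 documents
```

The first call shows why the result can differ from the document count: one document has no type at all, and another carries two types.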

Cardinality

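The cardinality aggregation also counts values, but unlike Value Count it deduplicates: each distinct value is counted once, similar to SQL's count(DISTINCT field).

Strictly speaking, the two are not equivalent. SQL's count(DISTINCT field) is exact, whereas the number returned by the ES cardinality aggregation is an approximation and carries some error. The trade-off is made for performance: counting distinct values exactly over a very large data set is expensive, and many use cases only need an approximate figure. For example, when counting a website's daily visits, a small error does not matter.

Example:

POST /sales/_search?size=0
{
    "aggs" : {
        "type_count" : { // name of the aggregation; any name will do
            "cardinality" : { // aggregation type: cardinality
                "field" : "type" // count the distinct values of the type field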
            }
        }
    }
}

Equivalent SQL:

select count(DISTINCT type) from sales

Response:

{
    ...
    "aggregations" : {
        "type_count" : { // name of the aggregation
            "value" : 3 // the distinct count
        }
    }
}
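
The exact-versus-approximate trade-off described above can be sketched in Python. The Elasticsearch documentation describes the cardinality aggregation as based on HyperLogLog++ and offers a precision_threshold option to trade memory for accuracy. The code below is a plain, simplified HyperLogLog, not the Elasticsearch implementation; the function names, the SHA-1 hash, the register count and the sample data are all assumptions made for this illustration.

```python
import hashlib
import math
from typing import Hashable, Iterable


def exact_cardinality(values: Iterable[Hashable]) -> int:
    """Exact distinct count: memory grows with the number of distinct values."""
    return len(set(values))


def approx_cardinality(values: Iterable[Hashable], p: int = 12) -> float:
    """Plain HyperLogLog estimate with 2**p registers (fixed memory, p >= 7)."""
    m = 1 << p
    registers = [0] * m
    for v in values:
        digest = hashlib.sha1(str(v).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")    # use 64 bits of the hash
        index = h >> (64 - p)                    # first p bits pick a register
        rest = h & ((1 << (64 - p)) - 1)         # remaining 64 - p bits
        rank = (64 - p) - rest.bit_length() + 1  # position of the leftmost 1 bit
        registers[index] = max(registers[index], rank)

    alpha = 0.7213 / (1 + 1.079 / m)             # bias constant for m >= 128
    raw = alpha * m * m / sum(2.0 ** -r for r in registers)
    zeros = registers.count(0)
    if raw <= 2.5 * m and zeros:
        return m * math.log(m / zeros)           # small-range correction
    return raw


# Made-up data: 200,000 values, 50,000 of them distinct.
user_ids = [f"user-{i % 50_000}" for i in range(200_000)]
exact = exact_cardinality(user_ids)
approx = approx_cardinality(user_ids)
print(exact, round(approx), f"{abs(approx - exact) / exact:.2%} error")
```

The exact version has to remember every distinct value it has seen, while the estimator keeps only 4,096 small registers regardless of input size. That fixed memory footprint is what makes the approximate answer cheap on large data sets.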

Avg

Example:

POST /exams/_search?size=0
{
  "aggs": {
    "avg_grade": { // name of the aggregation; any name will do
      "avg": { // aggregation type: avg
        "field": "grade" // average of the grade field
      }
    }
  }
}

Response:

{
    ...
    "aggregations": {
        "avg_grade": { // name of the aggregation
            "value": 75.0 // the average
        }
    }
}

Sum

Example:

POST /sales/_search?size=0
{
  "aggs": {
    "hat_prices": { // name of the aggregation; any name will do
      "sum": { // aggregation type: sum
        "field": "price" // sum of the price field
      }
    }
  }
}

Response:

{
    ...
    "aggregations": {
        "hat_prices": { // name of the aggregation
           "value": 450.0 // the sum
        }
    }
}

Max

Example:

POST /sales/_search?size=0
{
  "aggs": {
    "max_price": { // name of the aggregation; any name will do
      "max": { // aggregation type: max
        "field": "price" // maximum of the price field
      }
    }
  }
}

Response:

{
    ...
    "aggregations": {
        "max_price": { // name of the aggregation
            "value": 200.0 // the maximum
        }
    }
}

Min

Example:

POST /sales/_search?size=0
{
  "aggs": {
    "min_price": { // name of the aggregation; any name will do
      "min": { // aggregation type: min
        "field": "price" // minimum of the price field
      }
    }
  }
}

Response:

{
    ...
    "aggregations": {
        "min_price": { // name of the aggregation
            "value": 10.0 // the minimum
        }
    }
}

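Combined example

In practice, you usually run a query first to select documents from the index and then aggregate over the documents that the query matched.

Example:

GET /sales/_search
{
  "size": 0, // size = 0: return no hits, only the aggregation results
  "query": { // query condition; the aggs below run only over the documents this query matches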
    "constant_score": {
      "filter": {
        "match": {
          "type": "hat"
        }
      }
    }
  },
  "aggs": { // aggregate over the query results; without a query, all documents are aggregated
    "hat_prices": { // aggregation name; sum of price
      "sum": {
        "field": "price"
      }
    },
    "min_price": { // aggregation name; minimum price
      "min": {
        "field": "price"
      }
    },
    "max_price": { // aggregation name; maximum price
      "max": { 
        "field": "price"
      }
    }
  }
}

Response:

{
    ...
    "aggregations": {
        "hat_prices": { // sum
           "value": 450.0
        },
        "min_price": { // minimum
            "value": 10.0
        },
        "max_price": { // maximum
            "value": 200.0 
        }
    }
}
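
The "query first, then aggregate" flow can be mirrored in a few lines of Python. This is a rough sketch of the semantics only: the `aggregate_prices` helper and the sample documents are hypothetical, and the numbers are not the data behind the response shown above.

```python
from typing import Any, Dict, List, Optional

# Hypothetical documents, invented for this illustration.
sales: List[Dict[str, Any]] = [
    {"type": "hat", "price": 200.0},
    {"type": "hat", "price": 10.0},
    {"type": "bag", "price": 150.0},
    {"type": "hat", "price": 120.0},
]


def aggregate_prices(
    docs: List[Dict[str, Any]], doc_type: str
) -> Dict[str, Dict[str, Optional[float]]]:
    """Filter by type (the "query"), then compute sum/min/max (the "aggs")."""
    # Step 1: the query narrows the document set.
    matched = [d for d in docs if d.get("type") == doc_type]
    # Step 2: every aggregation runs over the matched documents only.
    prices = [d["price"] for d in matched if "price" in d]
    return {
        "hat_prices": {"value": sum(prices)},
        # This sketch reports None when nothing matched.
        "min_price": {"value": min(prices) if prices else None},
        "max_price": {"value": max(prices) if prices else None},
    }


print(aggregate_prices(sales, "hat"))
# {'hat_prices': {'value': 330.0}, 'min_price': {'value': 10.0}, 'max_price': {'value': 200.0}}
```

The bag at 150.0 never reaches the aggregations because the filter removed it first. This is the same scoping that the query clause applies to aggs in the request above.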
