Elasticsearch Data Analysis in Practice: An Introduction to Aggregations

1. terms aggregation (GROUP BY + COUNT)

Use case

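Retrieve the list of values of a field, for example the clientip values in nginx access logs. Sorted by the built-in document count and limited to the first 10, this list gives the site's top 10 visitors.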

query DSL

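{
  "size": 0,
  "aggs": {
    "topIP": {
      "terms": {
        "field": "clientip",
        "size": 10,
        "order": {
          "_count": "desc"
        }
      }
    }
  }
}
-1- aggs: aggregations are placed under this top-level parameter (the long form aggregations works as well). The top-level "size": 0 suppresses the search hits, so only the aggregation results are returned.
-2- topIP: a name of our choosing for the aggregation.
-3- terms: the bucket type. Its "size": 10 keeps the 10 most frequent values, and ordering by _count desc sorts the buckets by document count, largest first.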

response

"aggregations": {
    "topIP": {
      "buckets": [
        {
          "key": "123.59.215.53",
          "doc_count": 961868
        },
        {
          "key": "115.231.24.219",
          "doc_count": 804120
        },
        {
          "key": "58.215.139.89",
          "doc_count": 439965
        },
        ……
      ]
    }
  }
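The terms request above is roughly the Elasticsearch counterpart of SQL's SELECT clientip, COUNT(*) ... GROUP BY clientip ORDER BY COUNT(*) DESC LIMIT 10. Below is a minimal Python sketch, assuming the request is sent through some HTTP client or Kibana Dev Tools and the JSON response has already been parsed into a dict. The helpers build_terms_query and top_buckets are hypothetical, not part of any Elasticsearch library.

```python
from typing import Any


def build_terms_query(agg_name: str, field: str, size: int = 10) -> dict[str, Any]:
    """Build the terms aggregation body shown above (hypothetical helper)."""
    return {
        "size": 0,  # return no search hits, only aggregation results
        "aggs": {
            agg_name: {
                "terms": {
                    "field": field,
                    "size": size,  # number of buckets to return
                    "order": {"_count": "desc"},  # most frequent values first
                }
            }
        },
    }


def top_buckets(response: dict[str, Any], agg_name: str) -> list[tuple[str, int]]:
    """Extract (key, doc_count) pairs from a parsed terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# The three buckets shown in the sample response above.
sample = {
    "aggregations": {
        "topIP": {
            "buckets": [
                {"key": "123.59.215.53", "doc_count": 961868},
                {"key": "115.231.24.219", "doc_count": 804120},
                {"key": "58.215.139.89", "doc_count": 439965},
            ]
        }
    }
}

query = build_terms_query("topIP", "clientip")
for rank, (ip, count) in enumerate(top_buckets(sample, "topIP"), start=1):
    print(rank, ip, count)
```

On indices with several shards, terms counts are computed per shard and then merged, so they can be approximate; the response reports the possible error in doc_count_error_upper_bound.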

Visualization

[Figure: pie chart]

[Figure: bar chart]

2. date_histogram aggregation

Use case

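Building metrics along the time dimension, for example:

  • How many visits does the site receive in each hour today?
  • What is the site's average response time in each hour today?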

query DSL

1 How many visits does the site receive in each hour today?
{
  "size": 0,
  "aggs": {
    "page_view": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "time_zone": "Asia/Shanghai",
        "format": "yyyy-MM-dd HH:mm",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": 1493827200000,
          "max": 1493913599999
        }
      }
    }
  }
}
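The extended_bounds values above are epoch milliseconds marking the start and the last millisecond of "today" (2017-05-04) in the Asia/Shanghai time zone. A small Python sketch for deriving such bounds for any day, assuming a fixed UTC+08:00 offset for Asia/Shanghai; day_bounds_ms is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: Asia/Shanghai is treated as a fixed UTC+08:00 offset
# (China has not observed daylight saving time since 1991).
CST = timezone(timedelta(hours=8))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_bounds_ms(day: date, tz: timezone = CST) -> tuple[int, int]:
    """Return (min, max) epoch milliseconds covering one calendar day in tz."""
    start = datetime.combine(day, time.min, tzinfo=tz)
    end = start + timedelta(days=1)
    # Integer division by a 1 ms timedelta avoids floating-point rounding.
    start_ms = (start - EPOCH) // timedelta(milliseconds=1)
    # Last millisecond of the day, matching the inclusive "max" used above.
    end_ms = (end - EPOCH) // timedelta(milliseconds=1) - 1
    return start_ms, end_ms


print(day_bounds_ms(date(2017, 5, 4)))  # (1493827200000, 1493913599999)
```

Two caveats when adapting the query: extended_bounds only adds empty buckets when min_doc_count is 0, so with "min_doc_count": 1 as above it has no visible effect; and since Elasticsearch 7.2 the date_histogram interval parameter is deprecated in favor of calendar_interval (for example "1h") or fixed_interval.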
2 How many visits does the site receive in each hour today, and what is the average response time?
{
  "size": 0,
  "aggs": {
    "page_view": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "time_zone": "Asia/Shanghai",
        "format": "yyyy-MM-dd HH:mm",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": 1493827200000,
          "max": 1493913599999
        }
      },
      "aggs": {
        "avg_resp_time": {
          "avg": {
            "field": "upstream_response_time"
          }
        }
      }
    }
  }
}

response

"aggregations": {
    "page_view": {
      "buckets": [
        {
          "avg_resp_time": {
            "value": 0.008664544491125232
          },
          "key_as_string": "2017-05-04 09:00",
          "key": 1493859600000,
          "doc_count": 321402
        },
        {
          "avg_resp_time": {
            "value": 0.015245752238360864
          },
          "key_as_string": "2017-05-04 10:00",
          "key": 1493863200000,
          "doc_count": 456973
        },
        {
          "avg_resp_time": {
            "value": 0.01839558196852533
          },
          "key_as_string": "2017-05-04 11:00",
          "key": 1493866800000,
          "doc_count": 754249
        },
        {
          "avg_resp_time": {
            "value": 0.11747828058831987
          },
          "key_as_string": "2017-05-04 12:00",
          "key": 1493870400000,
          "doc_count": 530589
        },
        {
          "key_as_string": "2017-05-04 13:00",
          "key": 1493874000000,
          "doc_count": 64519
        }
      ]
    }
  }
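Because the avg aggregation is nested inside the date_histogram, it is computed separately for each time bucket, so every bucket carries both its document count (page views) and avg_resp_time. A Python sketch that flattens such a parsed response into rows and picks the busiest hour; hourly_rows is a hypothetical helper, and the response is assumed to be already parsed from JSON into a dict.

```python
from typing import Any, Optional

Row = tuple[str, int, Optional[float]]


def hourly_rows(response: dict[str, Any], agg_name: str = "page_view") -> list[Row]:
    """Flatten date_histogram buckets into (hour, page views, avg response time) rows."""
    rows: list[Row] = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        # The metric sub-aggregation may be missing (as in the 13:00 bucket above)
        # or carry a null value, so fall back to None instead of raising KeyError.
        avg = bucket.get("avg_resp_time", {}).get("value")
        rows.append((bucket["key_as_string"], bucket["doc_count"], avg))
    return rows


# Buckets copied from the sample response above.
sample = {"aggregations": {"page_view": {"buckets": [
    {"avg_resp_time": {"value": 0.008664544491125232}, "key_as_string": "2017-05-04 09:00", "key": 1493859600000, "doc_count": 321402},
    {"avg_resp_time": {"value": 0.015245752238360864}, "key_as_string": "2017-05-04 10:00", "key": 1493863200000, "doc_count": 456973},
    {"avg_resp_time": {"value": 0.01839558196852533}, "key_as_string": "2017-05-04 11:00", "key": 1493866800000, "doc_count": 754249},
    {"avg_resp_time": {"value": 0.11747828058831987}, "key_as_string": "2017-05-04 12:00", "key": 1493870400000, "doc_count": 530589},
    {"key_as_string": "2017-05-04 13:00", "key": 1493874000000, "doc_count": 64519},
]}}}

rows = hourly_rows(sample)
for hour, views, avg in rows:
    print(hour, views, "n/a" if avg is None else f"{avg:.4f}")

# Busiest hour by page views.
print("peak:", max(rows, key=lambda r: r[1])[0])
```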

Visualization

3. histogram aggregation

Use case

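For numeric fields: set an interval size and the values are grouped into fixed-width buckets, which makes it quick to draw a bar chart of their distribution.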

query DSL

{
  "size": 0,
  "aggs": {
    "page_view": {
      "histogram": {
        "field": "bytes",
        "interval": 1024
      }
    }
  }
}

response

"aggregations": {
    "page_view": {
      "buckets": [
        {
          "key": 0,
          "doc_count": 199002
        },
        {
          "key": 1024,
          "doc_count": 296
        },
        {
          "key": 2048,
          "doc_count": 1754
        },
        {
          "key": 3072,
          "doc_count": 13
        },
        {
          "key": 4096,
          "doc_count": 15
        },
        {
          "key": 5120,
          "doc_count": 131
        },
        ……
      ]
    }
  }
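Each key in the response is the lower bound of a bucket of width interval. The Elasticsearch histogram documentation describes the rounding as bucket_key = Math.floor((value - offset) / interval) * interval + offset, with offset defaulting to 0. A Python rendering of that formula, useful for checking which bucket a given bytes value falls into; bucket_key is a hypothetical helper.

```python
import math


def bucket_key(value: float, interval: float, offset: float = 0) -> float:
    """Lower bound of the histogram bucket that value falls into."""
    return math.floor((value - offset) / interval) * interval + offset


# With "interval": 1024 as in the query above:
for size in (0, 1023, 1024, 1500, 5200):
    print(size, "->", bucket_key(size, 1024))
```

For example, a 1500-byte response lands in the bucket with key 1024, which covers the range [1024, 2048).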

Visualization

4. cardinality aggregation (unique count)

Use case

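Count values after de-duplication: the aggregation returns the cardinality of a field, i.e. the number of distinct (unique) values it contains. Elasticsearch computes this count approximately, using the HyperLogLog++ algorithm, so on high-cardinality fields the result can differ slightly from the exact value. The SQL equivalent is: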

SELECT COUNT(DISTINCT color) FROM cars

query DSL

{
  "size": 0,
  "aggs": {
    "per_30s": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30s",
        "time_zone": "Asia/Shanghai",
        "min_doc_count": 1,
        "extended_bounds": {
          "min": 1493881370817,
          "max": 1493882270817
        }
      },
      "aggs": {
        "unique_uri_path": {
          "cardinality": {
            "field": "uri_path"
          }
        }
      }
    }
  }
}
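The query above counts distinct uri_path values in each 30-second bucket over a 15-minute window (the two extended_bounds values are 900,000 ms apart). Because Elasticsearch's cardinality result is approximate, an exact count over a small sample can serve as a cross-check. A minimal Python sketch, assuming documents are available as (timestamp in epoch milliseconds, uri_path) pairs; exact_unique_per_bucket is a hypothetical helper and the sample data is made up for illustration.

```python
from collections import defaultdict
from typing import Iterable


def exact_unique_per_bucket(docs: Iterable[tuple[int, str]], interval_ms: int = 30_000) -> dict[int, int]:
    """Exact COUNT(DISTINCT uri_path) per fixed time bucket, keyed by bucket start (epoch ms)."""
    seen: dict[int, set[str]] = defaultdict(set)
    for ts_ms, uri_path in docs:
        # Fixed-width buckets aligned to the epoch; with a whole-hour UTC offset
        # such as +08:00, 30-second boundaries are the same in either time zone.
        key = ts_ms // interval_ms * interval_ms
        seen[key].add(uri_path)
    return {key: len(paths) for key, paths in sorted(seen.items())}


# Illustrative data only: three requests in one bucket, one in the next.
docs = [
    (1493881380000, "/index.html"),
    (1493881385000, "/index.html"),
    (1493881390000, "/api/login"),
    (1493881410000, "/index.html"),
]
print(exact_unique_per_bucket(docs))  # {1493881380000: 2, 1493881410000: 1}
```

On the Elasticsearch side, the cardinality aggregation also accepts a precision_threshold option (default 3000, maximum 40000) that trades memory for accuracy; counts below the threshold are expected to be close to exact.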

Visualization

Tags: elasticsearch
