Parsing and Centralizing Elasticsearch Logs with Logstash

No, it’s not an endless loop waiting to happen. The plan here is to use Logstash to parse Elasticsearch logs and send them to another Elasticsearch cluster or to a log analytics service like Logsene (which conveniently exposes the Elasticsearch API, so you can use it without having to run and manage your own Elasticsearch cluster).

If you’re looking for an ELK stack intro and you think you’re in the wrong place, try our 5-minute Logstash tutorial. Still, if you have non-trivial amounts of data, you might end up here again, because you’ll probably need to centralize Elasticsearch logs for the same reasons you centralize other logs:

  • to avoid SSH-ing into each server to figure out why something went wrong
  • to better understand issues such as slow indexing or searching (via slowlogs, for instance)
  • to search quickly in big logs

In this post, we’ll describe how to use Logstash’s file input to tail the main Elasticsearch log and the slowlogs. We’ll use grok and other filters to parse different parts of those logs into their own fields and we’ll send the resulting structured events to Logsene/Elasticsearch via the elasticsearch output. In the end, you’ll be able to do things like slowlog slicing and dicing with Kibana:

[Image: logstash_elasticsearch]

TL;DR note: scroll down to the FAQ section for the whole config with comments.

Tailing Files

First, we’ll point the file input to *.log in Elasticsearch’s log directory. This works nicely with the default rotation, which renames old logs to something like cluster-name.log.SOMEDATE, so rotated files no longer match the pattern. We’ll use start_position => "beginning" to index existing content as well. We’ll add the multiline codec to parse exceptions nicely, telling it that every line not starting with a [ sign belongs to the same event as the previous line.

input {
  file {
    path => "/var/log/elasticsearch/*.log"
    type => "elasticsearch"
    start_position => "beginning"
    codec => multiline {
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}
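
To make the multiline rule concrete, here is a minimal Python sketch of the same grouping logic. It is not how Logstash implements the codec, only an illustration of the rule; the function name and the exception lines in the sample are made up.

```python
import re

# Same rule as the codec above: a line that does NOT start with "[" is
# appended to the previous event.
EVENT_START = re.compile(r"^\[")

def group_events(lines):
    """Merge continuation lines into the event that precedes them."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if EVENT_START.match(line) or not events:
            events.append(line)          # a new event begins here
        else:
            events[-1] += "\n" + line    # continuation, e.g. a stack trace line
    return events

# Hypothetical excerpt: the second event and its stack trace are invented.
sample = [
    "[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...",
    "[2015-01-13 15:42:25,001][WARN ][example ] [Atleza] something failed",
    "java.lang.IllegalStateException: made-up exception",
    "\tat com.example.Fake.method(Fake.java:1)",
]
events = group_events(sample)
print(len(events))
```

The four input lines collapse into two events, with the exception and its stack trace attached to the WARN line.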

Parsing Generic Content

A typical Elasticsearch log comes in the form of:

[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...

while a slowlog is a bit more structured, like:

[2015-01-13 15:43:17,160][WARN ][index.search.slowlog.query] [Atleza] [aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

But fields from the beginning, like timestamp and severity, are common, so we’ll parse them first:

grok {
  match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
  overwrite => [ "message" ]
}
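
If you want to check what this pattern extracts without starting Logstash, a rough Python approximation helps. The sketch below is an assumption-laden stand-in, not grok itself: DATA is written as a lazy match, SPACE as optional whitespace, and the timestamp regex is narrower than grok’s TIMESTAMP_ISO8601. The function name is hypothetical.

```python
import re

# Rough Python stand-in for the grok pattern above. DATA is treated as a lazy
# ".*?", SPACE as "\s*", and the timestamp regex is a narrower assumption than
# grok's TIMESTAMP_ISO8601.
LOG_LINE = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<severity>.*?)\s*\]"
    r"\[(?P<log_source>.*?)\s*\]\s*"
    r"\[(?P<node>.*?)\]\s*"
    r"(?P<message>(?:.|\r|\n)*)"   # everything else, including later lines
)

def parse_common(line):
    """Return the common fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

fields = parse_common("[2015-01-13 15:42:24,624][INFO ][node ] [Atleza] starting ...")
print(fields)
```

Because the severity and log source are padded with spaces inside their brackets, the optional whitespace before each closing bracket is what keeps the captured values trimmed.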

For the main Elasticsearch logs, the message field now contains the actual message, without the timestamp, severity, and log source, which are now in their own fields.

Parsing Slowlogs

For slowlogs, the message field now looks like this:

[aa][3] took[19.9ms], took_millis[19], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"term":{"a":2}}}], extra_source[],

First we’ll parse the index name and the shard number via grok, then the kv filter will take care of the name-value pairs that follow:

if "slowlog" in [path] {
  grok {
    match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
  }
  kv {
    source => "kv_pairs"
    field_split => " \],"
    value_split => "\["
  }
}
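
The two steps can be sketched in Python to show which fields come out of the sample slowlog line. This is only an approximation of the grok and kv filters under simplifying assumptions: it splits on the same three characters (space, closing bracket, comma) and skips pairs with empty values. The helper names are hypothetical.

```python
import re

# Step 1: index name and shard number, then the rest as one string.
SLOWLOG = re.compile(r"\[(?P<index>.*?)\]\[(?P<shard>.*?)\](?P<kv_pairs>.*)")

def parse_kv(text):
    """Split on space, "]" and ","; then split each token on the first "["."""
    fields = {}
    for token in re.split(r"[ \],]+", text):
        key, sep, value = token.partition("[")
        if sep and key and value:   # assumption: pairs with empty values are skipped
            fields[key] = value
    return fields

message = ('[aa][3] took[19.9ms], took_millis[19], types[], stats[], '
           'search_type[QUERY_THEN_FETCH], total_shards[5], '
           'source[{"query":{"term":{"a":2}}}], extra_source[],')

event = SLOWLOG.match(message).groupdict()
event.update(parse_kv(event["kv_pairs"]))
print(event["index"], event["shard"], event["took_millis"])
```

Note that every value is still a string at this point. Also, because this sketch splits on commas, spaces and closing brackets, a query body containing any of those characters would be cut short here, so it is worth checking how your Logstash version handles such slowlog entries.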

Some Cleanup

Now our logs are fully parsed, but there are still some niggles to take care of. One is that each log’s timestamp (the time logged by the application) is in the timestamp field, while the standard @timestamp was added by Logstash when it read that event. If you want @timestamp to hold the application-generated timestamp, you can do it with the date filter:

date {
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]  # dd is day of month; DD would mean day of year
  target => "@timestamp"
}
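
As a quick sanity check of the timestamp layout (year, month, day of month, 24-hour time, then milliseconds after a comma), here is a small Python sketch that parses the sample value. It only mirrors the idea of the date filter; the function name is hypothetical, and the result carries no time zone because the log line does not include one.

```python
from datetime import datetime

def parse_es_timestamp(text):
    """Parse an Elasticsearch log timestamp such as '2015-01-13 15:42:24,624'."""
    # %f accepts 1 to 6 digits, so the 3-digit milliseconds are read as a
    # fraction of a second (624 -> 624000 microseconds).
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")

parsed = parse_es_timestamp("2015-01-13 15:42:24,624")
print(parsed.isoformat())
```

Since the timestamp has no zone information, whatever parses it has to assume one; if your Elasticsearch nodes and your Logstash host run in different time zones, that is worth double-checking.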

Other potentially annoying things:

  • at this point, timestamp contains the same data as @timestamp
  • the content of kv_pairs from slowlogs is already parsed by the kv filter
  • the log type (for example, index.search.slowlog.query) is in a field called log_source, to make room for a field called source which stores other things (the JSON query, in this case). I would rather store index.search.slowlog.query in source, especially if I’m using the Logsene UI, where I can filter on sources by clicking on them
  • the grok and kv filters parse all fields as strings, even though some of them, like took_millis, are numbers

To fix all of the above (remove, rename and convert fields) we’ll use the mutate filter:

mutate {
  remove_field => [ "kv_pairs", "timestamp" ]
  rename => {
    "source" => "source_body"
    "log_source" => "source"
  }
  convert => {
    "took_millis" => "integer"
    "total_shards" => "integer"
    "shard" => "integer"
  }
}
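
The effect of this cleanup on a slowlog event can be sketched with a plain Python dict. This is an illustration under assumptions, not the mutate filter’s implementation: the function name is hypothetical, and the sketch simply applies the renames in the order they are listed, so that source is freed up before log_source takes its place.

```python
def tidy(event):
    """Rename, convert and remove fields, mirroring the mutate block above."""
    event = dict(event)  # work on a copy

    # Rename in the listed order: move the query out of "source" first,
    # then put the log type there.
    for old, new in (("source", "source_body"), ("log_source", "source")):
        if old in event:
            event[new] = event.pop(old)

    # Convert numeric fields, which arrive as strings.
    for name in ("took_millis", "total_shards", "shard"):
        if name in event:
            event[name] = int(event[name])

    # Drop fields that are no longer needed.
    for name in ("kv_pairs", "timestamp"):
        event.pop(name, None)
    return event

raw = {
    "timestamp": "2015-01-13 15:43:17,160",
    "log_source": "index.search.slowlog.query",
    "source": '{"query":{"term":{"a":2}}}',
    "kv_pairs": " took[19.9ms], took_millis[19],",
    "took_millis": "19",
    "total_shards": "5",
    "shard": "3",
}
clean = tidy(raw)
print(clean)
```

If the two renames ran in the opposite order, the log type would overwrite the query body before it could be moved, which is why the order matters in this sketch.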

Sending Events to Logsene/Elasticsearch

Below is an elasticsearch output configuration that works well with Logsene and Logstash 1.5.0 beta 1. For an external Elasticsearch cluster, you can simply specify the host name and protocol (we recommend HTTP because it’s easier to upgrade both Logstash and Elasticsearch):

output {
  elasticsearch {
    host => "logsene-receiver.sematext.com"
    ssl => true
    port => 443
    index => "LOGSENE-TOKEN-GOES-HERE"
    protocol => "http"
    manage_template => false
  }
}

If you’re using Logstash 1.4.2 or earlier, there’s no SSL support, so you’ll have to remove the ssl line and set port to 80.
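
To get a feel for what ends up on the wire, here is a hedged Python sketch that builds a request body in the format of Elasticsearch’s bulk API (one action line, then one document line, each newline-terminated). It assumes the output batches events to the bulk endpoint over HTTP and that the event type is used as the document type; the function name and the sample event are hypothetical, and nothing is actually sent.

```python
import json

def bulk_body(events, index, doc_type="elasticsearch"):
    """Build a newline-delimited bulk request body for the given events."""
    lines = []
    for event in events:
        # Action line: says where the next document should be indexed.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Document line: the parsed event itself.
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the bulk format requires a trailing newline

# Assumption: the index name is whatever you put in the index setting above.
body = bulk_body(
    [{"message": "starting ...", "severity": "INFO"}],
    index="LOGSENE-TOKEN-GOES-HERE",
)
print(body)
```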

FAQ

Q: Cool, this works well for logs. How about monitoring Elasticsearch metrics like how much heap is used or how many cache hits I get?
A: Check out our SPM, which can monitor lots of applications, including Elasticsearch. If you’re a Logsene user, too, you’ll be able to correlate logs and metrics.

Q: I find this logging and parsing stuff is really exciting.
A: Me too. If you want to join us, we’re hiring worldwide.

Q: I’m here from the TL;DR note. Can I get the complete config?
A: Here you go (please check the comments for things you might want to change):

input {
  file { 
    path => "/var/log/elasticsearch/*.log"  # tail ES log and slowlogs
    type => "elasticsearch"
    start_position => "beginning"  # parse existing logs, too
    codec => multiline {   # put the whole exception in a single event
      pattern => "^\["
      negate => true
      what => "previous"
    }
  }
}

filter {
  if [type] == "elasticsearch" {
    grok {  # parses the common bits
      match => [ "message", "\[%{TIMESTAMP_ISO8601:timestamp}\]\[%{DATA:severity}%{SPACE}\]\[%{DATA:log_source}%{SPACE}\]%{SPACE}\[%{DATA:node}\]%{SPACE}(?<message>(.|\r|\n)*)" ]
      overwrite => [ "message" ]
    }

    if "slowlog" in [path] {  # slowlog-specific parsing
      grok {  # parse the index name and the shard number
        match => [ "message", "\[%{DATA:index}\]\[%{DATA:shard}\]%{GREEDYDATA:kv_pairs}" ]
      }
      kv {    # parses named fields
        source => "kv_pairs"
        field_split => " \],"
        value_split => "\["
      }
    }

    date {  # use timestamp from the log
      match => [ "timestamp", "yyyy-MM-dd HH:mm:ss,SSS" ]  # dd is day of month; DD would mean day of year
      target => "@timestamp"
    }

    mutate {
      remove_field => [ "kv_pairs", "timestamp" ]  # remove unused stuff
      rename => {  # nicer field names (especially good for Logsene)
        "source" => "source_body"
        "log_source" => "source"
      }
      convert => {  # type numeric fields (they're strings by default)
        "took_millis" => "integer"
        "total_shards" => "integer"
        "shard" => "integer"
      }
    }

  }
}

output {
  elasticsearch {   # send everything to Logsene
    host => "logsene-receiver.sematext.com"
    ssl => true  # works with Logstash 1.5+
    port => 443  # use 80 for plain HTTP
    index => "LOGSENE-APP-TOKEN-GOES-HERE"  # fill in your token (click Integration from your Logsene app)
    protocol => "http"
    manage_template => false
  }
}

Filed under: Logging Tagged: elasticsearch, grok, kibana, log analytics, log management, logging, logsene, logstash, parsing, slowlog
