
ECS structure logging is not compatible with all collectors as it does not use the nested format #45063


Closed
kajh opened this issue Apr 10, 2025 · 12 comments
Labels: type: bug A general bug

@kajh

kajh commented Apr 10, 2025

It would be nice with an option to get ecs logging as nested json like

{ "ecs":  {"version": "8.4"}, "service": {"name": "myservice"}}

instead of what we get today

{ "ecs.version": "8.4", "service.name" : "myservice"}

The reason for asking is that collecting and processing of logs would be easier.

We are using Spring Boot 3.4.4.

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Apr 10, 2025
@nosan
Contributor

nosan commented Apr 10, 2025

It would be nice with an option to get ecs logging as nested json like

I'm not sure if this is advisable, as it conflicts with the ECS Schema. Currently, the ECS Schema does
not support a format such as "ecs": {"version":"8.11"}. Moreover, according to the ECS Schema, the
ecs.version field is required and must exist in all events, so supporting this functionality
would not provide any value from the user's perspective.

UPDATE: My previous statement was incorrect.

The reason for asking is that collecting and processing of logs would be easier.

For your specific case, you should create a custom implementation of JsonWriterStructuredLogFormatter with a format that exactly suits your needs. This custom formatter can then be configured using the logging.structured.format.console property.

For more details, please refer to the Spring Boot documentation.

The following NestedElasticCommonSchemaStructuredLogFormatter might suit your needs:

package task.gh45063;

import ch.qos.logback.classic.pattern.ThrowableProxyConverter;
import ch.qos.logback.classic.spi.ILoggingEvent;
import ch.qos.logback.classic.spi.IThrowableProxy;
import org.slf4j.event.KeyValuePair;
import org.springframework.boot.json.JsonWriter;
import org.springframework.boot.json.JsonWriter.PairExtractor;
import org.springframework.boot.logging.structured.ElasticCommonSchemaProperties;
import org.springframework.boot.logging.structured.ElasticCommonSchemaProperties.Service;
import org.springframework.boot.logging.structured.JsonWriterStructuredLogFormatter;
import org.springframework.boot.logging.structured.StructuredLoggingJsonMembersCustomizer;
import org.springframework.core.env.Environment;

import java.util.Objects;

class NestedElasticCommonSchemaStructuredLogFormatter extends
        JsonWriterStructuredLogFormatter<ILoggingEvent> {

    private static final PairExtractor<KeyValuePair> keyValuePairExtractor = PairExtractor.of(
            (pair) -> pair.key,
            (pair) -> pair.value);

    NestedElasticCommonSchemaStructuredLogFormatter(Environment environment,
            ThrowableProxyConverter throwableProxyConverter,
            StructuredLoggingJsonMembersCustomizer<?> customizer) {
        super((members) -> jsonMembers(environment, throwableProxyConverter, members), customizer);
    }

    private static void jsonMembers(Environment environment,
            ThrowableProxyConverter throwableProxyConverter,
            JsonWriter.Members<ILoggingEvent> members) {
        members.add("@timestamp", ILoggingEvent::getInstant);
        members.add("log").usingMembers((logMembers) -> {
                    logMembers.add("logger", ILoggingEvent::getLoggerName);
                    logMembers.add("level", ILoggingEvent::getLevel);
                }
        );
        members.add("process").usingMembers((processMembers) -> {
            processMembers.add("pid", environment.getProperty("spring.application.pid", Long.class))
                    .when(Objects::nonNull);
            processMembers.add("thread")
                    .usingMembers((threadMembers) -> threadMembers.add("name",
                            ILoggingEvent::getThreadName));
        });
        Service service = ElasticCommonSchemaProperties.get(environment).service();
        members.add("service").usingMembers((serviceMembers) -> {
            serviceMembers.add("name", service::name).whenHasLength();
            serviceMembers.add("version", service::version).whenHasLength();
            serviceMembers.add("environment", service::environment).whenHasLength();
            serviceMembers.add("node.name", service::nodeName).whenHasLength();
        });
        members.add("message", ILoggingEvent::getFormattedMessage);
        members.addMapEntries(ILoggingEvent::getMDCPropertyMap);
        members.from(ILoggingEvent::getKeyValuePairs)
                .whenNotEmpty()
                .usingExtractedPairs(Iterable::forEach, keyValuePairExtractor);
        members.add("error").whenNotNull(ILoggingEvent::getThrowableProxy)
                .usingMembers((throwableMembers) -> {
                    throwableMembers.add("type", ILoggingEvent::getThrowableProxy)
                            .as(IThrowableProxy::getClassName);
                    throwableMembers.add("message", ILoggingEvent::getThrowableProxy)
                            .as(IThrowableProxy::getMessage);
                    throwableMembers.add("stack_trace", throwableProxyConverter::convert);
                });
        members.add("ecs").usingMembers((ecsMembers) -> ecsMembers.add("version", "8.11"));
    }

}

Example output:
{
  "@timestamp": "2025-04-10T14:35:20.081935Z",
  "log": {
    "logger": "task.gh45063.Gh45063Application",
    "level": "INFO"
  },
  "process": {
    "pid": 25264,
    "thread": {
      "name": "main"
    }
  },
  "service": {
    "name": "gh-45063"
  },
  "message": "Started Gh45063Application in 0.257 seconds (process running for 0.418)",
  "ecs": {
    "version": "8.11"
  }
}

@snicoll

This comment has been minimized.

@snicoll snicoll closed this as not planned Apr 10, 2025
@snicoll snicoll added status: declined A suggestion or change that we don't feel we should currently apply and removed status: waiting-for-triage An issue we've not yet triaged labels Apr 10, 2025
@rafaelma

rafaelma commented Apr 10, 2025

Hello, thanks for looking at this :)

ECS stores data in a nested format: our example will be saved as { "ecs": {"version": "8.4"} }, and the mapping definition of the ECS standard needed by e.g. Elasticsearch or OpenSearch also defines the nested format. It's important to note that the translation from { "ecs.version": "8.4" } to { "ecs": {"version": "8.4"} } before the data is saved according to the ECS standard is done by the Elastic Agent using the decode_json_fields processor, ref: https://www.elastic.co/guide/en/fleet/current/decode-json-fields.html.

For reference, you can check the mapping representation of the ECS standard for our example here: https://github.com/elastic/ecs/blob/main/generated/elasticsearch/composable/component/ecs.json. The ECS documentation (https://www.elastic.co/guide/en/ecs/current/ecs-ecs.html) refers to attributes like ecs.version, but this is merely a simplified representation of a nested JSON attribute, used in the documentation for presentation purposes.

If you use an agent other than the Elastic Agent (e.g. Fluent Bit) to process logs and Spring delivers them as e.g. { "ecs.version": "8.4" }, you have to transform this into the nested version { "ecs": {"version": "8.4"} } before you can save it according to the standard.

Spring doesn't deliver the logs in the ECS standard format, so this only works without problems if you collect the logs with the Elastic Agent, which converts them to the correct ECS format before sending them to storage.

regards,
Rafael Martinez Guerrero

@snicoll snicoll reopened this Apr 10, 2025
@snicoll snicoll added status: waiting-for-triage An issue we've not yet triaged and removed status: declined A suggestion or change that we don't feel we should currently apply labels Apr 10, 2025
@snicoll snicoll marked this as a duplicate of #45132 Apr 10, 2025
@nosan
Contributor

nosan commented Apr 10, 2025

I set up Elasticsearch via https://www.elastic.co/cloud and performed another round of testing.

It's important to clarify that Spring does not directly produce logs in the ECS standard; typically, logs require the Elastic Agent to properly transform them into ECS format before indexing.

However, in practice, the JSON format currently produced by ElasticCommonSchemaStructuredLogFormatter works seamlessly. Elasticsearch accepts them without issue, correctly parsing all fields.

Flat JSON example:

{
  "@timestamp": "2025-04-10T18:03:42.230354Z",
  "log.level": "ERROR",
  "process.pid": 30859,
  "process.thread.name": "main",
  "service.name": "gh-45063",
  "log.logger": "task.gh45063.Gh45063Application",
  "message": "MyError",
  "error.type": "java.lang.IllegalStateException",
  "error.message": "Error Test",
  "error.stack_trace": "java.lang.IllegalStateException: Error Test\n\tat task.gh45063.Gh45063Application.main(Gh45063Application.java:14)\n",
  "ecs.version": "8.11"
}

Additionally, Elasticsearch fully supports and parses nested JSON structures, like the one below:

Nested JSON example:

{
  "@timestamp": "2025-04-10T18:04:14.257693Z",
  "log": {
    "logger": "task.gh45063.Gh45063Application",
    "level": "ERROR"
  },
  "process": {
    "pid": 30877,
    "thread": {
      "name": "main"
    }
  },
  "service": {
    "name": "gh-45063"
  },
  "message": "MyError",
  "error": {
    "type": "java.lang.IllegalStateException",
    "message": "Error Test",
    "stack_trace": "java.lang.IllegalStateException: Error Test\n\tat task.gh45063.Gh45063Application.main(Gh45063Application.java:14)\n"
  },
  "ecs": {
    "version": "8.11"
  }
}

In summary, both flat and nested formats are compatible.


@rafaelma

rafaelma commented Apr 10, 2025

Thank you, @nosan, for investigating this issue further.

"...In summary, both flat and nested formats are compatible..." as a source, but only when using Elasticsearch as the storage infrastructure, because Elasticsearch transforms the flat format into the standard nested format before saving the log. If you examine the JSON of the document saved by Elasticsearch after sending the flat version, you'll see it has been converted into a nested JSON document to adhere to the standard.

UPDATE:


If Elasticsearch behaves like OpenSearch, this statement is not true: "If you examine the JSON of the document saved by Elasticsearch after sending the flat version, you'll see it has been converted into a nested JSON document to adhere to the standard."

OpenSearch accepts both flat and nested JSON with a nested mapping definition now, but the saved flat JSON document is not converted to nested format.

Check #45063 (comment) for full details.


The ECS standard is now utilized by many products following the merger of the OpenTelemetry and Elastic standards into a unified open standard. Refer to the following links for more information:

It would greatly enhance the Spring project if it could deliver logs in the ECS nested format. This would enable standardized log storage without the need to convert from flat to nested format before saving the log when using products other than Elasticsearch, such as OpenSearch.

The conversion from flat to nested format required to save a standardized log can be quite tedious and resource-intensive when processing millions of logs if you don't use Elasticsearch.

regards

@nosan
Contributor

nosan commented Apr 10, 2025

According to the ECS Reference:

The document structure should be nested JSON objects. If you use Beats or Logstash, the nesting of JSON objects is done for you automatically. If you’re ingesting to Elasticsearch using the API, your fields must be nested objects, not strings containing dots.

I also checked the ECS Logging Java repository elastic/ecs-logging-java and found they have similar issues:

However, the ECS Java logging library continues to use dot notation and relies on this processor afterward. It seems they chose to keep dot notation because they can't ensure that all fields are correctly nested, especially fields provided through ILoggingEvent.getKeyValuePairs() or ILoggingEvent.getMDCPropertyMap(), since these may contain dots as well. For example, if you set a custom field using MDC like MDC.put("geo.continent_name", "North America"),
then according to the ECS Reference, the ElasticCommonSchemaStructuredLogFormatter should ideally convert this into nested JSON, as shown below:

{
  "geo": {
    "continent_name": "North America"
  }
}
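
The dotted-key-to-nested transformation discussed here can be sketched in a few lines. This is a minimal illustration, not Spring Boot or Elastic Agent API; the class name and helper are hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: expands dotted keys ("geo.continent_name") into nested
// maps ({"geo": {"continent_name": ...}}), the same transformation that the
// decode_json_fields processor performs on ingest.
public class DotKeyExpander {

    @SuppressWarnings("unchecked")
    public static Map<String, Object> expand(Map<String, ?> flat) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, ?> entry : flat.entrySet()) {
            String[] parts = entry.getKey().split("\\.");
            Map<String, Object> node = root;
            for (int i = 0; i < parts.length - 1; i++) {
                // Note: throws ClassCastException if a prefix ("geo") was
                // already written as a leaf value - one reason key nesting
                // cannot be guaranteed for arbitrary MDC keys.
                node = (Map<String, Object>) node.computeIfAbsent(parts[i],
                        (key) -> new LinkedHashMap<String, Object>());
            }
            node.put(parts[parts.length - 1], entry.getValue());
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(expand(Map.of("geo.continent_name", "North America")));
        // {geo={continent_name=North America}}
    }
}
```

The caveat in the comment is exactly the ambiguity mentioned above: keys like "geo" and "geo.continent_name" in the same event cannot both be represented in the nested form.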

Personally, if Spring Boot chooses to support the nested JSON format, I don't see any advantage in maintaining support for the current format, as only the nested JSON format fully complies with the ECS specification.

when used with products other than Elasticsearch, such as Opensearch.

I set up OpenSearch using Docker Compose and submitted several JSON documents to it. Both formats are supported by OpenSearch.


I hope I haven't overlooked anything, and I apologize once again for my earlier incorrect comment: #45063 (comment)

@kajh
Author

kajh commented Apr 11, 2025

I hope I haven't overlooked anything, and I apologize once again for my earlier incorrect comment: #45063 (comment)

No problem at all! My original description of this issue was incomplete. I'm a Java developer and my knowledge of OpenSearch is very limited. @rafaelma is one of our log experts, and I'm very grateful he stepped in and explained the problem.

@mhalbritter mhalbritter changed the title Ecs logging as nested json Support ECS nested format Apr 11, 2025
@mhalbritter mhalbritter added type: enhancement A general enhancement and removed status: waiting-for-triage An issue we've not yet triaged labels Apr 11, 2025
@mhalbritter mhalbritter modified the milestones: 3.x, 4.x Apr 11, 2025
@mhalbritter
Contributor

mhalbritter commented Apr 11, 2025

Thanks for letting us know. I had totally overlooked this sentence from the docs when implementing this format:

The document structure should be nested JSON objects. If you use Beats or Logstash, the nesting of JSON objects is done for you automatically. If you’re ingesting to Elasticsearch using the API, your fields must be nested objects, not strings containing dots.

@rafaelma

rafaelma commented Apr 11, 2025

Hi @nosan, and thanks again for your feedback.

Interesting. Last year, we faced mapping issues with OpenSearch when using the flat format and the nested mapping definition from ECS. It appears that this has been resolved, and it now behaves like Elasticsearch.

I have tested it again, and while OpenSearch accepts both flat and nested JSON with a nested mapping definition, the saved JSON document is not converted to nested format.

This is unfortunate, as you can have both flat and nested documents within the same index when the same type of logs comes from different sources. When operating solely within Elasticsearch or OpenSearch, it works because they have implemented an abstraction that facilitates the conversion between formats, allowing for searches across different formats.

However, this approach fails when using data from these indexes for further external processing in another system, as those documents can contain both formats. Here is a simple check I ran with jq to search a JSON file with two documents saved in the same OpenSearch index with the same mapping, one in flat format and the other in nested:

$ cat test3.json

{
    "docs":[
	{
	    "_index": "test_ecs",
	    "_id": "1",
	    "_version": 1,
	    "_score": null,
	    "_source": {
		"@timestamp": "2025-04-10T18:03:42.230354Z",
		"log.level": "ERROR",
		"process.pid": 30859,
		"process.thread.name": "main",
		"service.name": "gh-45063",
		"log.logger": "task.gh45063.Gh45063Application",
		"message": "MyError",
		"error.type": "java.lang.IllegalStateException",
		"error.message": "Error Test",
		"error.stack_trace": "java.lang.IllegalStateException: Error Test\n\tat task.gh45063.Gh45063Application.main(Gh45063Application.java:14)\n",
		"ecs.version": "8.11"
	    },
	    "fields": {
		"@timestamp": [
		    "2025-04-10T18:03:42.230Z"
		]
	    },
	    "sort": [
		1744308222230
	    ]
	},
	{
	    "_index": "test_ecs",
	    "_id": "2",
	    "_version": 1,
	    "_score": null,
	    "_source": {
		"@timestamp": "2025-04-10T18:04:14.257693Z",
		"log": {
		    "logger": "task.gh45063.Gh45063Application",
		    "level": "ERROR"
		},
		"process": {
		    "pid": 30877,
		    "thread": {
			"name": "main"
		    }
		},
		"service": {
		    "name": "gh-45063"
		},
		"message": "MyError",
		"error": {
		    "type": "java.lang.IllegalStateException",
		    "message": "Error Test",
		    "stack_trace": "java.lang.IllegalStateException: Error Test\n\tat task.gh45063.Gh45063Application.main(Gh45063Application.java:14)\n"
		},
		"ecs": {
		    "version": "8.11"
		}
	    },
	    "fields": {
		"@timestamp": [
		    "2025-04-10T18:04:14.257Z"
		]
	    },
	    "sort": [
		1744308254257
	    ]
	}
    ]
}

The two formats are not compatible; you must search using different methods because they have different structures.

$ cat test3.json |jq '.docs[]|select(._source.log.level == "ERROR") |._source.process.pid'
30877
$ cat test3.json |jq '.docs[]|select(._source."log.level" == "ERROR") |._source."process.pid"'
30859

As you can see, these are two incompatible representations of the same type of object, resulting in a lack of unified results. You have to search in different ways because they are different data structures.
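
The "abstraction" that Elasticsearch/OpenSearch implement internally to search across both representations could be sketched roughly as follows. This is a hypothetical illustration (class and method names are made up), not their actual implementation:

```java
import java.util.Map;

// Hypothetical sketch: resolve a dotted path against a document regardless of
// whether it is stored flat ("log.level" as a literal key) or nested
// ({"log": {"level": ...}}).
public class DottedPathLookup {

    public static Object lookup(Map<?, ?> doc, String dottedPath) {
        // Try the flat representation first: the whole path as a single key.
        if (doc.containsKey(dottedPath)) {
            return doc.get(dottedPath);
        }
        // Fall back to walking the nested representation segment by segment.
        Object current = doc;
        for (String segment : dottedPath.split("\\.")) {
            if (!(current instanceof Map<?, ?> map) || !map.containsKey(segment)) {
                return null;
            }
            current = map.get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> flat = Map.of("log.level", "ERROR");
        Map<String, Object> nested = Map.of("log", Map.of("level", "ERROR"));
        System.out.println(lookup(flat, "log.level"));   // ERROR
        System.out.println(lookup(nested, "log.level")); // ERROR
    }
}
```

Every consumer outside Elasticsearch/OpenSearch would need to reimplement something like this, which is precisely the burden being described.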

We could continue discussing the pros and cons of using flat versus nested JSON formats when defining a JSON structure. Personally, I prefer the nested format because it eliminates the need to interpret the document's structure and how to use it.

One thing is certain: one should never mix flat and nested formats in the same document, as it can lead to confusion. Although both formats conform to the JSON RFC 8259 standard (see: https://www.rfc-editor.org/rfc/rfc8259.txt), the entire issue surrounding flat versus nested formats contributes to a lack of standardization and various problems. This is primarily because interpreting a JSON dataset with attributes that contain dots (.) and nested format is arbitrary and depends on the developer's implementation.

Although it may be an unspoken convention that dots (.) signify the flat version of a nested structure, it's crucial to recognize that this is not a formal standard. The consequences of this ambiguity contribute to the challenges we are discussing.

In our case, processing logs in flat format with the Fluent Bit agent and Logstash is problematic, as they deliver JSON in nested format. If the flat parts are not converted to nested format before saving, the resulting data structure becomes messy, which leads to issues depending on the software used for further data processing.

regards and good weekend :)

@areguru

areguru commented Apr 11, 2025

Just want to mention for those interested that there's additional interesting info about this case in the closed duplicate issue ECS standard

@rafaelma

rafaelma commented Apr 14, 2025

Good morning,

We have further investigated this issue and uncovered some important findings. First, consider this statement:

"Elasticsearch and its derivatives, such as OpenSearch, do not permit single attributes with dots (.) in their key names. This violates the JSON standard format, as they automatically interpret dots as indicators of a nested definition. You cannot have 'single' key definitions that use dots (.) in their name"

This is highly unfortunate, and we are still very surprised that this has gone on for so long without more reactions from other communities and projects adhering to the standard. This is a summary of the consequences, which include the following:

  1. Elasticsearch/OpenSearch interprets dots (.) as indicators of a nested definition but does not convert the JSON structure into a unified format when saving it. Consequently, you can have documents in both flat and nested formats in the same index, and searching within these documents will only work correctly in Elasticsearch/OpenSearch. Exporting this data to other systems will pose a big challenge and require post-processing.

  2. Systems using a flat format to define a nested structure cannot send data to Elasticsearch/OpenSearch without preprocessing it if they want to ensure all data is saved in a standardized and homogeneous form for further processing if needed.

  3. Systems using a flat format to define a nested structure cannot expect the same behavior from other systems conforming to the standard as they do with Elasticsearch/OpenSearch.

  4. Systems that conform to the standard and use dots (.) in their key names without intending to define a nested structure will encounter problems unless they pre-process the data to replace dots (.) with underscores (_) before delivering the data to Elasticsearch/OpenSearch. Elasticsearch's 'integrations' replace dots in key names with underscores (_) to prevent issues. However, many systems use dots in their key definitions without intending to define a nested structure, which requires pre-processing of the data to avoid problems. (Refer to [1], the Kubernetes Annotations documentation, for an actual example of this.)

  5. Systems that conform to the standard cannot receive data from Elasticsearch/OpenSearch for further processing without significant post-processing to unify the two incompatible representations of the same object type within the same index. Alternatively, they would need to create an abstraction (similar to what Elasticsearch/OpenSearch does) to facilitate searches across different formats.

[1] Kubernetes Annotations documentation: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/

This is an example for item 4, where Kubernetes does not intend to define a nested format with its use of dots (.). Check openshift.io/scc, app.kubernetes.io/instance, app.kubernetes.io/name:

$ oc get pods fluent-bit-4ktkt -o json

{
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "annotations": {
            "checksum/config": "ec26be86917281976633b4aa480f9638648f9cf2e66e66891bdfb1e600d045e7",
            "openshift.io/scc": "fluent-bit-scc"
        },
        "creationTimestamp": "2025-04-14T07:25:15Z",
        "generateName": "fluent-bit-",
        "labels": {
            "app.kubernetes.io/instance": "fluent-bit",
            "app.kubernetes.io/name": "fluent-bit",
            "controller-revision-hash": "7cc98f8dd8",
            "pod-template-generation": "5"
        }
        ....

$ oc get pods fluent-bit-4ktkt -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    checksum/config: ec26be86917281976633b4aa480f9638648f9cf2e66e66891bdfb1e600d045e7
    openshift.io/scc: fluent-bit-scc
  creationTimestamp: "2025-04-14T07:25:15Z"
  generateName: fluent-bit-
  labels:
    app.kubernetes.io/instance: fluent-bit
    app.kubernetes.io/name: fluent-bit
    controller-revision-hash: 7cc98f8dd8
    pod-template-generation: "5"
  name: fluent-bit-4ktkt
  namespace: fluent-bit
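
The dots-to-underscores workaround described in item 4 can be sketched as below. A minimal hypothetical illustration (the class name is made up), not what Elastic's integrations actually run:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the item-4 workaround: replace dots in key names
// with underscores before shipping to Elasticsearch/OpenSearch, so a label
// like "app.kubernetes.io/name" is not misread as a nested structure.
public class KeySanitizer {

    public static Map<String, Object> sanitize(Map<String, ?> labels) {
        Map<String, Object> out = new LinkedHashMap<>();
        labels.forEach((key, value) -> out.put(key.replace('.', '_'), value));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sanitize(Map.of("app.kubernetes.io/name", "fluent-bit")));
        // {app_kubernetes_io/name=fluent-bit}
    }
}
```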


regards,

@philwebb philwebb self-assigned this Apr 14, 2025
@philwebb philwebb changed the title Support ECS nested format ECS structure logging should use nested format Apr 14, 2025
@philwebb philwebb changed the title ECS structure logging should use nested format ECS structure logging is not compatible with all collectors as it does not use the nested format Apr 14, 2025
@philwebb philwebb added type: bug A general bug and removed type: enhancement A general enhancement labels Apr 14, 2025
@philwebb philwebb modified the milestones: 4.x, 3.5.0-RC1 Apr 14, 2025
@philwebb
Member

@rafaelma I think we should consider this a bug, although it's too risky to fix in 3.4. I've pushed something to main so that we can hopefully fix this for our 3.5.0-RC1 release. If you get a chance, could you please try the latest SNAPSHOT (once https://github.com/spring-projects/spring-boot/actions/runs/14457193995 has finished)?
