New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
promtail syslog receiver aborts on non-UTF8 logs #1783
Comments
Possible resolutions I can see:
2 is fully reversible. The problem with 2 is that there might be some devices out there already sending UTF8 inside syslog but not declaring it with a BOM, and these would be mis-decoded (and this would be a change in behaviour). The problem with 3 is that there is data loss. The problem with 4 is that surrogate escapes are not widely used, and are technically still invalid unicode and hence cannot safely be carried in JSON. 5 is not safely reversible, unless you also replace a bare My inclination is a combination of 1 and 2:
EDIT: reference https://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences |
Until this is fixed, here is a workaround when using rsyslog: you can use the mmutf8fix module to replace invalid utf8 characters with a single ASCII character (unfortunately the Unicode replacement character is not possible). Put this before the action which forwards to promtail:
This module is available as part of rsyslog from the Adiscon v8-stable repository, in separate package "rsyslog-mmutf8fix". |
Thank you Brian for the workaround and possible solutions, we'll look into it ! 🥇 |
We started seeing this too. Thanks for the workaround! The problem is related to influxdata/go-syslog#21 and I started fixing upstream: influxdata/go-syslog#35. The idea is to relax the parser to accept any encoding and add an option to enforce a compliant The encoding could then be fixed or reencoded with a fallback encoding in the syslog receiver part of promtail or more generically in a later pipeline stage (new transform stage?). Something like: scrape_configs:
- job_name: syslog
syslog:
listen_address: 127.0.0.1:5140
relabel_configs:
- source_labels: [__syslog_message_hostname]
target_label: host
pipeline_stages:
- match:
selector: '{host=~"ex-.*"}'
stages:
- encoding:
reencode_with_fallback: iso-latin-1 |
Describe the bug
When promtail receives a message which is not valid UTF8 over the TCP syslog receiver, it logs an error and drops the TCP connection on the floor.
To Reproduce
This is tested with promtail-linux-amd64 1.3.0 from binary release.
Configure rsyslog to forward to promtail
Send a packet containing binary goop, e.g. this one captured from a Netgear switch
What happens (as recorded in
/var/log/syslog
by rsyslog itself):Expected behavior
The log to be received and processed. Note that RFC5424 section 6 explicitly allows non-UTF8 messages as long as they don't begin with the Unicode BOM, and therefore rsyslog is not violating the protocol.
Possibly, the non-UTF8 text may be converted to replacement character uFFFD, or codepoints
u80
touFF
, or perhaps surrogate escape sequences in the rangeuDC80
touDCFF
, before forwarding to loki (since it has to be JSON-wrapped). But storing the log in a partially modified form is better than not storing it and aborting.Environment:
Screenshots, Promtail config, or terminal output
Promtail config:
The text was updated successfully, but these errors were encountered: