mirror of
https://github.com/protocolbuffers/protobuf-go.git
synced 2025-01-29 18:32:46 +00:00
2d80e9b3ab
The following changes are made:
* Permit invalid UTF-8 in proto2. This goes against specified behavior,
  but matches functional behavior in wire marshaling (not just in Go,
  but in the other major language implementations as well).
* The Format function is specified as ignoring errors since its intended
  purpose is to surface information to the human user even if it's not
  exactly parsable back into a message. As such, add an unexported
  allowInvalidUTF8 option that is used specifically by Format.
* Add an EmitASCII option that forces the formatting of strings and
  bytes to always be encoded as ASCII. This ensures that the entire
  output is always ASCII as well.

Note that we do not replicate this behavior for protojson since:
* The JSON format fundamentally has a stricter and well-specified
  grammar for exactly what is valid/invalid, while the text format has
  not had a well-specified grammar for the longest time, leading to all
  sorts of weird usages due to Hyrum's law.
* This change is meant to ease migration from the legacy implementation,
  which did permit invalid UTF-8 in proto2.
* The EmitASCII option relies on the ability to always escape Unicode
  characters using ASCII escape sequences, but this is not possible in
  JSON since the grammar only has an escape sequence defined for Unicode
  characters \u0000 to \uffff, inclusive. However, Unicode v12.0.0
  defines characters up to \U0010FFFF, which is beyond what the JSON
  grammar provides escape sequences for.

Change-Id: I2b524a904e9ec59f9ed5500e299613bc27c31a14
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/233077
Reviewed-by: Herbie Ong <herbie@google.com>