Commit Graph

18 Commits

Author SHA1 Message Date
Herbie Ong
4e6b903e61 internal/encoding/text: fix eof crash when parsing list of scalars
Need to check for EOF and return proper error.

Bug caught by fuzz test: https://oss-fuzz.com/testcase-detail/6258064955277312.

Change-Id: I63d5c12c301f2ddefc9a0813c13abef40d745e91
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/218258
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-02-06 19:20:11 +00:00
Herbie Ong
2eb18f0e62 internal/encoding/text: fix error construction in parseTypeName
Fuzz test caught the following issue --
https://oss-fuzz.com/testcase-detail/6288731021770752

Change-Id: Idcbce7953b465d1b83c01b1d123c9d43907d402a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/218037
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-02-05 23:01:15 +00:00
Herbie Ong
9b3d97c473 encoding/prototext: rewrite of internal/encoding/text
* Fixes golang/protobuf#842. Unmarshal can now parse singular or
  repeated message fields without the field separator.
* Fixes golang/protobuf#1011. Handles negative 0 properly.
* For unknown fields with fixed 32-bit and 64-bit wire types, output is
  now in hex format with 0x prefix similar to C++ lib output. Previous
  Go implementation simply outputs these as decimal numbers %d.
* All parsing errors, except for unexpected EOF should now contain line
  and column number info.
* Fixed following conformance-related features:
  * Parse nan,inf,-inf,infinity,-infinity as case-insensitive.
  * Interpret float32 overflows as inf or -inf.
  * Parse large int-like number as proto float.
* Discard unknown map field if DiscardUnknown=true.
* Allow whitespaces/comments in Any type URL and extension field names per spec.
* Improves performance and memory usage. It is now as fast and efficient as
  protojson, if not better on most benchmarks.

name                                     old time/op    new time/op    delta
Text/Unmarshal/google_message1_proto2-4    14.1µs ±43%     8.7µs ±12%  -38.27%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    11.6µs ±18%     7.7µs ± 9%  -33.69%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           6.20ms ±27%    4.10ms ± 5%  -33.95%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      12.8µs ± 6%    10.3µs ±23%  -19.54%  (p=0.000 n=9+10)
Text/Marshal/google_message1_proto3-4      11.9µs ±16%     8.6µs ±10%  -27.45%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.59ms ± 5%    5.30ms ±22%     ~     (p=0.356 n=9+10)
JSON/Unmarshal/google_message1_proto2-4    12.3µs ±61%    13.9µs ±26%     ~     (p=0.190 n=10+10)
JSON/Unmarshal/google_message1_proto3-4    7.51µs ± 6%    7.86µs ± 1%   +4.66%  (p=0.010 n=10+9)
JSON/Unmarshal/google_message2-4           3.74ms ± 2%    3.94ms ± 2%   +5.32%  (p=0.000 n=10+10)
JSON/Marshal/google_message1_proto2-4      9.90µs ±12%    9.95µs ± 4%     ~     (p=0.315 n=9+10)
JSON/Marshal/google_message1_proto3-4      7.55µs ± 4%    7.93µs ± 3%   +4.98%  (p=0.000 n=10+10)
JSON/Marshal/google_message2-4             4.29ms ± 5%    4.49ms ± 2%   +4.53%  (p=0.001 n=10+10)

name                                     old alloc/op   new alloc/op   delta
Text/Unmarshal/google_message1_proto2-4    12.5kB ± 0%     2.0kB ± 0%  -83.87%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    12.2kB ± 0%     1.8kB ± 0%  -85.33%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           5.35MB ± 0%    0.89MB ± 0%  -83.28%  (p=0.000 n=10+9)
Text/Marshal/google_message1_proto2-4      12.0kB ± 0%     1.4kB ± 0%  -88.15%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4      12.4kB ± 0%     1.9kB ± 0%  -84.91%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.64MB ± 0%    1.02MB ± 0%  -81.85%  (p=0.000 n=10+9)
JSON/Unmarshal/google_message1_proto2-4    2.29kB ± 0%    2.29kB ± 0%     ~     (all equal)
JSON/Unmarshal/google_message1_proto3-4    2.08kB ± 0%    2.08kB ± 0%     ~     (all equal)
JSON/Unmarshal/google_message2-4            899kB ± 0%     899kB ± 0%     ~     (p=1.000 n=10+10)
JSON/Marshal/google_message1_proto2-4      1.46kB ± 0%    1.46kB ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto3-4      1.36kB ± 0%    1.36kB ± 0%     ~     (all equal)
JSON/Marshal/google_message2-4             1.19MB ± 0%    1.19MB ± 0%     ~     (p=0.197 n=10+10)

name                                     old allocs/op  new allocs/op  delta
Text/Unmarshal/google_message1_proto2-4       133 ± 0%        89 ± 0%  -33.08%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4       108 ± 0%        67 ± 0%  -37.96%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4            60.0k ± 0%     38.7k ± 0%  -35.52%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4        65.0 ± 0%      25.0 ± 0%  -61.54%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4        59.0 ± 0%      22.0 ± 0%  -62.71%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4              27.4k ± 0%      7.3k ± 0%  -73.39%  (p=0.000 n=10+10)
JSON/Unmarshal/google_message1_proto2-4      95.0 ± 0%      95.0 ± 0%     ~     (all equal)
JSON/Unmarshal/google_message1_proto3-4      74.0 ± 0%      74.0 ± 0%     ~     (all equal)
JSON/Unmarshal/google_message2-4            36.3k ± 0%     36.3k ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto2-4        27.0 ± 0%      27.0 ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto3-4        30.0 ± 0%      30.0 ± 0%     ~     (all equal)
JSON/Marshal/google_message2-4              11.3k ± 0%     11.3k ± 0%     ~     (p=1.000 n=10+10)

Change-Id: I377925facde5535f06333b6f25e9c9b358dc062f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/204602
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-02-05 02:11:08 +00:00
Joe Tsai
3c4ab8c6f1 encoding/prototext: drop trailing newline for empty
This is more consistent with the indent documentation:
	If indent is a non-empty string, it causes every entry in a List or Message
	to be preceded by the indent and trailed by a newline.

Since an empty message has no entries, there should be no newlines.

Change-Id: I5d57165aaf94ca6b184bb35bf05d5d68f5ee9dd5
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/194877
Reviewed-by: Herbie Ong <herbie@google.com>
2019-09-14 21:08:43 +00:00
Herbie Ong
4eb4d61b0c internal/encoding/text: minor tweak in inserting random whitespace
Simply move logic into similar code block.  Maintains the same logic.

Change-Id: I7b5a3f3d57f6102c7919cdc03dd105f08d21aca3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/194039
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-09-07 00:43:37 +00:00
Joe Tsai
1799d1111a all: rename tag and flags for legacy support
Rename build tag "proto1_legacy" -> "protolegacy"
to be consistent with the "protoreflect" tag.

Rename flag constant "Proto1Legacy" -> "ProtoLegacy" since
it covers more than simply proto1 legacy features.
For example, it covers alpha-features of proto3 that
were eventually removed from the final proto3 release.

Change-Id: I0f4fcbadd4b5a61c87645e2e5be11d187e59157c
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/189345
Reviewed-by: Damien Neil <dneil@google.com>
2019-08-08 20:49:00 +00:00
Herbie Ong
a3369c5dc2 internal/encoding/text: replace use of regular expression in decoding
Improve performance by replacing use of regular expressions with direct
parsing code.

Compared to latest version:

name                                     old time/op    new time/op    delta
Text/Unmarshal/google_message1_proto2-4    21.8µs ± 5%    14.0µs ± 9%  -35.69%  (p=0.000 n=10+9)
Text/Unmarshal/google_message1_proto3-4    19.6µs ± 4%    13.8µs ±10%  -29.47%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           13.4ms ± 4%     4.9ms ± 4%  -63.44%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      13.8µs ± 2%    14.1µs ± 4%   +2.42%  (p=0.011 n=9+10)
Text/Marshal/google_message1_proto3-4      11.6µs ± 2%    11.8µs ± 8%     ~     (p=0.573 n=8+10)
Text/Marshal/google_message2-4             8.01ms ±48%    5.97ms ± 5%  -25.44%  (p=0.000 n=10+10)

name                                     old alloc/op   new alloc/op   delta
Text/Unmarshal/google_message1_proto2-4    13.0kB ± 0%    12.6kB ± 0%   -3.40%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    13.0kB ± 0%    12.5kB ± 0%   -3.50%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           5.67MB ± 0%    5.50MB ± 0%   -3.13%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      12.0kB ± 0%    12.1kB ± 0%   +0.02%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4      11.7kB ± 0%    11.7kB ± 0%   +0.01%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.68MB ± 0%    5.68MB ± 0%   +0.01%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
Text/Unmarshal/google_message1_proto2-4       142 ± 0%       142 ± 0%     ~     (all equal)
Text/Unmarshal/google_message1_proto3-4       156 ± 0%       156 ± 0%     ~     (all equal)
Text/Unmarshal/google_message2-4            70.1k ± 0%     65.4k ± 0%   -6.76%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4        91.0 ± 0%      91.0 ± 0%     ~     (all equal)
Text/Marshal/google_message1_proto3-4        80.0 ± 0%      80.0 ± 0%     ~     (all equal)
Text/Marshal/google_message2-4              36.4k ± 0%     36.4k ± 0%     ~     (all equal)

Change-Id: Ia5d3c16e9e33961aae03bac0d53fcfc5b1943d2a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/173360
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-07-23 22:08:16 +00:00
Joe Tsai
36dc22ddb8 encoding: use strs.UnsafeString to avoid duplicated code
The strs.UnsafeString casts a []byte as a string.
This allows us to avoid duplicated functionality.

Change-Id: I9930b94bae35eac0f98c0fa62963b300bc8d7e49
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185459
Reviewed-by: Herbie Ong <herbie@google.com>
2019-07-10 07:01:20 +00:00
Damien Neil
8c86fc5e7d all: remove non-fatal UTF-8 validation errors (and non-fatal in general)
Immediately abort (un)marshal operations when encountering invalid UTF-8
data in proto3 strings. No other proto implementation supports non-UTF-8
data in proto3 strings (and many reject it in proto2 strings as well).
Producing invalid output is an interoperability threat (other
implementations won't be able to read it).

The case where existing string data is found to contain non-UTF8 data is
better handled by changing the field to the `bytes` type, which (aside
from UTF-8 validation) is wire-compatible with `string`.

Remove the errors.NonFatal type, since there are no remaining cases
where it is needed. "Non-fatal" errors which produce results and a
non-nil error are problematic because they compose poorly; the better
approach is to take an option like AllowPartial indicating which
conditions to check for.

Change-Id: I9d189ec6ffda7b5d96d094aa1b290af2e3f23736
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/183098
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-06-20 20:55:13 +00:00
Damien Neil
e89e6244e0 all: change module to google.golang.org/protobuf
Temporarily remove go.mod, since we can't generate an accurate one until
the corresponding v1 change is submitted.

Change-Id: I1e1ad97f2b455e33f61ffaeb8676289795e47e72
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/177000
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-05-14 17:28:29 +00:00
Herbie Ong
1e09691415 internal/encoding/{json,text}: improve string parsing
Previous calls to indexNeedEscape with a type conversion from []byte
to string incurs allocation.

Make 2 different calls instead, one for string and one for bytes.

Type converting string to []byte does not incur extra allocation,
however, the benchmark results still show it to be slower by ~3% for
textpb and 6+% for jsonpb, hence decided to go with 2 separate calls
instead.

Results over current head:
name          old time/op    new time/op    delta
TextEncode-4    18.1ms ± 2%    18.3ms ± 2%     ~     (p=0.065 n=10+9)
TextDecode-4     233ms ± 3%     102ms ± 1%  -56.34%  (p=0.000 n=9+10)
JSONEncode-4    10.4ms ± 2%    10.5ms ± 0%   +0.56%  (p=0.019 n=9+9)
JSONDecode-4     870ms ± 2%     354ms ± 4%  -59.33%  (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
TextEncode-4    28.9MB ± 0%    28.9MB ± 0%   +0.00%  (p=0.000 n=10+9)
TextDecode-4    1.16GB ± 0%    0.03GB ± 0%  -97.44%  (p=0.000 n=9+10)
JSONEncode-4    3.94MB ± 0%    3.94MB ± 0%   +0.00%  (p=0.000 n=10+10)
JSONDecode-4    3.35GB ± 0%    0.01GB ± 0%  -99.83%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
TextEncode-4     73.5k ± 0%     73.5k ± 0%     ~     (all equal)
TextDecode-4      278k ± 0%      255k ± 0%   -8.26%  (p=0.000 n=9+10)
JSONEncode-4     63.8k ± 0%     63.8k ± 0%     ~     (all equal)
JSONDecode-4      247k ± 0%      210k ± 0%  -14.92%  (p=0.000 n=10+10)

Change-Id: Ibc64e9a7827ec1fffa213eb79f60497950203700
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/172239
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-17 00:26:13 +00:00
Herbie Ong
250c6eaf92 internal/encoding/text: change Value.Float{32,64} to Value.Float
Collapse Value.Float32 and Value.Float64 into single API to keep it
consistent with Value.{Int,Uint}.

Change-Id: I07737e72715fe3cc3f6bcad579cf5d6cfe3757d5
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/167317
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-03-13 04:35:13 +00:00
Herbie Ong
84f0960b04 internal/encoding/text: format using 32 bitsize when encoding float32
When encoding/textpb marshals out float32 values, it was previously
formatting it as float64 bitsize since both float types are stored as
float64 and internal/encoding/text only has one Float type.  A
consequence of this is that the output may display a different value
than expected, e.g.  1.02 becomes 1.0199999809265137.

This CL splits Float type into Float32 and Float64 to keep track of
which bitsize to use when formatting.  Values of both types are still
stored as float64 to keep the logic simple.

Decoding will always use Float64, but users can ask for a float32 value
from it.

Change-Id: Iea5b14b283fec2236a0c3946fac34d4d79b95274
Reviewed-on: https://go-review.googlesource.com/c/158497
Reviewed-by: Damien Neil <dneil@google.com>
2019-01-18 17:54:23 +00:00
Joe Tsai
71acbc7b7d internal/detrand: support disabling detrand
Since detrand is an internal package, we can safely provide a function
that can be called to disable its functionality for testing purposes.

Change-Id: I26383e12a5832eb5af01952898a4c73f627d7aa5
Reviewed-on: https://go-review.googlesource.com/c/151678
Reviewed-by: Herbie Ong <herbie@google.com>
2018-11-29 07:49:45 +00:00
Joe Tsai
492a476312 internal/detrand: new package for deterministically random functionality
The use of math/rand in serialization is to provide some form of instability
to the output to provide a clear signal to the user that the should not
depend on the the property of stability. However, it is reasonable that users
expect the output for these to be deterministic.

As such, add a detrand package that provides deterministic, yet unstable
randomization functionality.

Since this package hashes the binary, it does impose a small initialization cost:
	Benchmark    100000    20712 ns/op    480 B/op    6 allocs/op

Change-Id: I232d0fea1789a4278079837a67ee2f63474a4364
Reviewed-on: https://go-review.googlesource.com/c/151340
Reviewed-by: Herbie Ong <herbie@google.com>
2018-11-27 02:14:04 +00:00
Herbie Ong
c3f4d48629 internal/encoding/text: add extra random space to make output unstable.
Make output deliberately unstable so users don't rely on exactness.

For multi-line output, add another extra random space after <key>: for
at most one field per message.

-- example --
key1: field1
key2:  {
    foo:  bar
}

For single-line output, add another extra random space after a field per
message.

-- example --
key1:field1  key2:{foo:bar}

Change-Id: I3ab25d4d970fdebb88bbd9dd8fa6d73af84338ea
Reviewed-on: https://go-review.googlesource.com/c/150977
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2018-11-27 00:46:16 +00:00
Joe Tsai
01ab29648e go.mod: rename google.golang.org/proto as github.com/golang/protobuf/v2
This change was created by running:
	git ls-files | xargs sed -i "s|google.golang.org/proto|github.com/golang/protobuf/v2|g"

This change is *not* an endorsement of "github.com/golang/protobuf/v2" as the
final import path when the v2 API is eventually released as stable.
We continue to reserve the right to make breaking changes as we see fit.

This change enables us to host the v2 API on a repository that is go-gettable
(since go.googlesource.com is not a known host by the "go get" tool;
and google.golang.org/proto was just a stub URL that is not currently served).
Thus, we can start work on a forked version of the v1 API that explores
what it would take to implement v1 in terms of v2 in a backwards compatible way.

Change-Id: Ia3ebc41ac4238af62ee140200d3158b53ac9ec48
Reviewed-on: https://go-review.googlesource.com/136736
Reviewed-by: Damien Neil <dneil@google.com>
2018-09-24 16:11:50 +00:00
Joe Tsai
27c2a76c85 internal/encoding/text: initial commit of proto text format parser/serializer
Package text provides a parser and serializer for the proto text format.
This focuses on the grammar of the format and is agnostic towards specific
semantics of protobuf types.

High-level API:
	func Marshal(v Value, indent string, delims [2]byte, outputASCII bool) ([]byte, error)
	func Unmarshal(b []byte) (Value, error)
	type Type uint8
		const Bool Type ...
	type Value struct{ ... }
		func ValueOf(v interface{}) Value
		func (v Value) Type() Type
		func (v Value) Bool() (x bool, ok bool)
		func (v Value) Int(b64 bool) (x int64, ok bool)
		func (v Value) Uint(b64 bool) (x uint64, ok bool)
		func (v Value) Float(b64 bool) (x float64, ok bool)
		func (v Value) Name() (protoreflect.Name, bool)
		func (v Value) String() string
		func (v Value) List() []Value
		func (v Value) Message() [][2]Value
		func (v Value) Raw() []byte

Change-Id: I4a78ec4474c160d0de4d32120651edd931ea2c1e
Reviewed-on: https://go-review.googlesource.com/127455
Reviewed-by: Herbie Ong <herbie@google.com>
2018-08-07 22:44:06 +00:00