68 Commits

Author SHA1 Message Date
Herbie Ong
2eb18f0e62 internal/encoding/text: fix error construction in parseTypeName
Fuzz test caught the following issue --
https://oss-fuzz.com/testcase-detail/6288731021770752

Change-Id: Idcbce7953b465d1b83c01b1d123c9d43907d402a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/218037
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-02-05 23:01:15 +00:00
Herbie Ong
9b3d97c473 encoding/prototext: rewrite of internal/encoding/text
* Fixes golang/protobuf#842. Unmarshal can now parse singular or
  repeated message fields without the field separator.
* Fixes golang/protobuf#1011. Handles negative 0 properly.
* For unknown fields with fixed 32-bit and 64-bit wire types, output is
  now in hex format with 0x prefix similar to C++ lib output. Previous
  Go implementation simply outputs these as decimal numbers %d.
* All parsing errors, except for unexpected EOF should now contain line
  and column number info.
* Fixed following conformance-related features:
  * Parse nan,inf,-inf,infinity,-infinity as case-insensitive.
  * Interpret float32 overflows as inf or -inf.
  * Parse large int-like number as proto float.
* Discard unknown map field if DiscardUnknown=true.
* Allow whitespaces/comments in Any type URL and extension field names per spec.
* Improves performance and memory usage. It is now as fast and efficient as
  protojson, if not better on most benchmarks.

name                                     old time/op    new time/op    delta
Text/Unmarshal/google_message1_proto2-4    14.1µs ±43%     8.7µs ±12%  -38.27%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    11.6µs ±18%     7.7µs ± 9%  -33.69%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           6.20ms ±27%    4.10ms ± 5%  -33.95%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      12.8µs ± 6%    10.3µs ±23%  -19.54%  (p=0.000 n=9+10)
Text/Marshal/google_message1_proto3-4      11.9µs ±16%     8.6µs ±10%  -27.45%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.59ms ± 5%    5.30ms ±22%     ~     (p=0.356 n=9+10)
JSON/Unmarshal/google_message1_proto2-4    12.3µs ±61%    13.9µs ±26%     ~     (p=0.190 n=10+10)
JSON/Unmarshal/google_message1_proto3-4    7.51µs ± 6%    7.86µs ± 1%   +4.66%  (p=0.010 n=10+9)
JSON/Unmarshal/google_message2-4           3.74ms ± 2%    3.94ms ± 2%   +5.32%  (p=0.000 n=10+10)
JSON/Marshal/google_message1_proto2-4      9.90µs ±12%    9.95µs ± 4%     ~     (p=0.315 n=9+10)
JSON/Marshal/google_message1_proto3-4      7.55µs ± 4%    7.93µs ± 3%   +4.98%  (p=0.000 n=10+10)
JSON/Marshal/google_message2-4             4.29ms ± 5%    4.49ms ± 2%   +4.53%  (p=0.001 n=10+10)

name                                     old alloc/op   new alloc/op   delta
Text/Unmarshal/google_message1_proto2-4    12.5kB ± 0%     2.0kB ± 0%  -83.87%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    12.2kB ± 0%     1.8kB ± 0%  -85.33%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           5.35MB ± 0%    0.89MB ± 0%  -83.28%  (p=0.000 n=10+9)
Text/Marshal/google_message1_proto2-4      12.0kB ± 0%     1.4kB ± 0%  -88.15%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4      12.4kB ± 0%     1.9kB ± 0%  -84.91%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.64MB ± 0%    1.02MB ± 0%  -81.85%  (p=0.000 n=10+9)
JSON/Unmarshal/google_message1_proto2-4    2.29kB ± 0%    2.29kB ± 0%     ~     (all equal)
JSON/Unmarshal/google_message1_proto3-4    2.08kB ± 0%    2.08kB ± 0%     ~     (all equal)
JSON/Unmarshal/google_message2-4            899kB ± 0%     899kB ± 0%     ~     (p=1.000 n=10+10)
JSON/Marshal/google_message1_proto2-4      1.46kB ± 0%    1.46kB ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto3-4      1.36kB ± 0%    1.36kB ± 0%     ~     (all equal)
JSON/Marshal/google_message2-4             1.19MB ± 0%    1.19MB ± 0%     ~     (p=0.197 n=10+10)

name                                     old allocs/op  new allocs/op  delta
Text/Unmarshal/google_message1_proto2-4       133 ± 0%        89 ± 0%  -33.08%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4       108 ± 0%        67 ± 0%  -37.96%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4            60.0k ± 0%     38.7k ± 0%  -35.52%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4        65.0 ± 0%      25.0 ± 0%  -61.54%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4        59.0 ± 0%      22.0 ± 0%  -62.71%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4              27.4k ± 0%      7.3k ± 0%  -73.39%  (p=0.000 n=10+10)
JSON/Unmarshal/google_message1_proto2-4      95.0 ± 0%      95.0 ± 0%     ~     (all equal)
JSON/Unmarshal/google_message1_proto3-4      74.0 ± 0%      74.0 ± 0%     ~     (all equal)
JSON/Unmarshal/google_message2-4            36.3k ± 0%     36.3k ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto2-4        27.0 ± 0%      27.0 ± 0%     ~     (all equal)
JSON/Marshal/google_message1_proto3-4        30.0 ± 0%      30.0 ± 0%     ~     (all equal)
JSON/Marshal/google_message2-4              11.3k ± 0%     11.3k ± 0%     ~     (p=1.000 n=10+10)

Change-Id: I377925facde5535f06333b6f25e9c9b358dc062f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/204602
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2020-02-05 02:11:08 +00:00
Herbie Ong
886c32637f internal/encoding/json: add tests for negative zeros.
Updates golang/protobuf#1011.

Copied tests from http://golang.org/cl/217501.

Change-Id: I58ea1111beccee9691929b062eb87a1a752f81e0
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/217578
Reviewed-by: Damien Neil <dneil@google.com>
2020-02-03 23:03:51 +00:00
Damien Neil
ce8f7f6353 internal/impl: inline small tag decoding
Inline varint decoding of small (1- and 2-byte) field tags in the
fast-path unmarshaler.

name                             old time/op  new time/op  delta
EmptyMessage/Wire/Unmarshal      40.6ns ± 1%  40.2ns ± 1%   -1.02%  (p=0.000 n=37+35)
EmptyMessage/Wire/Unmarshal-12   6.77ns ± 2%  7.13ns ± 5%   +5.32%  (p=0.000 n=37+37)
RepeatedInt32/Wire/Unmarshal     9.46µs ± 1%  6.57µs ± 1%  -30.56%  (p=0.000 n=38+39)
RepeatedInt32/Wire/Unmarshal-12  1.50µs ± 2%  1.05µs ± 2%  -30.00%  (p=0.000 n=39+37)
Required/Wire/Unmarshal           371ns ± 1%   258ns ± 1%  -30.44%  (p=0.000 n=38+32)
Required/Wire/Unmarshal-12       60.3ns ± 1%  44.3ns ± 2%  -26.45%  (p=0.000 n=38+36)

Change-Id: Ie80415dea8cb6b840eafa52f0572046a1910a9b1
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/216419
Reviewed-by: Joe Tsai <joetsai@google.com>
2020-01-26 22:23:05 +00:00
Herbie Ong
b02b6d1da5 internal/encoding/json: fix performance cliff when decoding large integers that will go out of range.
For large positive integers, add check for number of decimal digits
before converting number to plain integer w/o exponent.

If exponent value is large, previous implementation may end up
constructing a large string with lots of zeroes that is not useful as it
will fail later on when called with strconv.Parse{Uint,Int} anyways.

Fixes golang/protobuf#1002.

Change-Id: I65bfad304401e076743853d7501786b7231b083b
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/213717
Reviewed-by: Damien Neil <dneil@google.com>
2020-01-08 18:44:43 +00:00
Joe Tsai
6c26a04a51 internal/filedesc: use jsonName.Init method over JSONName constructor
The JSONName constructor returns a struct value which shallow copies
a sync.Once within it; this is a dubious pattern.
Instead, add a jsonName.Init method to initialize the value.

Change-Id: I190a7239b1b62a8041ee7e4e09c0fe37b64ff623
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/213237
Reviewed-by: Damien Neil <dneil@google.com>
2020-01-06 18:42:14 +00:00
Damien Neil
5ba0c29655 internal/encoding/json: fix crash in parsing
Fuzzer-detected crash when parsing: {""

Change-Id: I019c667f48e6a1237858b5abf7d34f43593fb3b6
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/212357
Reviewed-by: Herbie Ong <herbie@google.com>
2019-12-22 04:14:01 +00:00
Damien Neil
fe15dd4cdd all: don't allow invalid field numbers when legacy support is on
The deprecated messageset format permits extension fields with numbers
greater than the usual maximum (1<<29-1). To support this, the
internal/encoding/wire package has disabled field number validation when
legacy support is enabled.

We shouldn't skip validating all field numbers for validity just because
we support larger ones in messagesets.

This change drops range validation from the wire package (other than
checking that numbers fit in an int32) and adds it to the wire
unmarshalers instead. This gives us validation where we care
about it (when unmarshaling a wire-format message) and allows for
best-effort handling of out-of-range numbers everywhere else.

Fixes golang/protobuf#996

Change-Id: I4e11b8a8aa177dd60e89723570af074a317c2451
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/210290
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-12-09 18:35:13 +00:00
Damien Neil
ce3384cd34 proto, internal/impl: store unknown MessageSet items in non-mset format
In the v1 implementation, unknown MessageSet items are stored in a
message's unknown fields section in non-MessageSet format. For example,
consider a MessageSet containing an item with type_id T and value V.
If the type_id is not resolvable, the item will be placed in the unknown
fields as a bytes-valued field with number T and contents V. This
conversion is then reversed when marshaling a MessageSet containing
unknown fields.

Preserve this behavior in v2.

One consequence of this change is that actual unknown fields in a
MessageSet (any field other than 1) are now discarded. This matches
the previous behavior.

Change-Id: I3d913613f84e0ae82481078dbc91cb25628651cc
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/205697
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-11-11 19:40:27 +00:00
Joe Tsai
84177c9bf3 all: use typed variant of protoreflect.ValueOf
Change-Id: I7479632b57e7c8efade12a2eb2b855e9c321adb1
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/196037
Reviewed-by: Damien Neil <dneil@google.com>
2019-09-17 21:33:16 +00:00
Joe Tsai
3c4ab8c6f1 encoding/prototext: drop trailing newline for empty
This is more consistent with the indent documentation:
	If indent is a non-empty string, it causes every entry in a List or Message
	to be preceded by the indent and trailed by a newline.

Since an empty message has no entries, there should be no newlines.

Change-Id: I5d57165aaf94ca6b184bb35bf05d5d68f5ee9dd5
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/194877
Reviewed-by: Herbie Ong <herbie@google.com>
2019-09-14 21:08:43 +00:00
Herbie Ong
582ab3de42 encoding/protojson: add random whitespaces in encoding output
This is meant to deter users from doing byte for byte comparison.

Change-Id: If005d2dc1eba45eaa4254171d2f247820db109e4
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/194037
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-09-07 00:48:34 +00:00
Herbie Ong
4eb4d61b0c internal/encoding/text: minor tweak in inserting random whitespace
Simply move logic into similar code block.  Maintains the same logic.

Change-Id: I7b5a3f3d57f6102c7919cdc03dd105f08d21aca3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/194039
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-09-07 00:43:37 +00:00
Damien Neil
79bfdbe45b all: rename ExtensionType Descriptor method to TypeDescriptor (1/2)
Descriptor methods generally return a Descriptor with no Go type
information. ExtensionType's Descriptor is an exception, returning an
ExtensionTypeDescriptor containing both the proto descriptor and a
reference back to the ExtensionType. The pure descriptor is accessed
by xt.Descriptor().Descriptor().

Rename ExtensionType's Descriptor method to TypeDescriptor to make it
clear that it behaves a bit differently.

Change 1/2: Add the TypeDescriptor method and deprecate Descriptor.

Change-Id: I1806095044d35a474d60f94d2a28bdf528f12238
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/192139
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-08-28 18:34:29 +00:00
Joe Tsai
1799d1111a all: rename tag and flags for legacy support
Rename build tag "proto1_legacy" -> "protolegacy"
to be consistent with the "protoreflect" tag.

Rename flag constant "Proto1Legacy" -> "ProtoLegacy" since
it covers more than simply proto1 legacy features.
For example, it covers alpha-features of proto3 that
were eventually removed from the final proto3 release.

Change-Id: I0f4fcbadd4b5a61c87645e2e5be11d187e59157c
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/189345
Reviewed-by: Damien Neil <dneil@google.com>
2019-08-08 20:49:00 +00:00
Damien Neil
92f76189a3 all: refactor extensions, add proto.GetExtension etc.
Change protoiface.ExtensionDescV1 to implement protoreflect.ExtensionType.

ExtensionDescV1's Name field conflicts with the Descriptor Name method,
so change the protoreflect.{Message,Enum,Extension}Type types to no
longer implement the corresponding Descriptor interface. This also leads
to a clearer distinction between the two types.

Introduce a protoreflect.ExtensionTypeDescriptor type which bridges
between ExtensionType and ExtensionDescriptor.

Add extension accessor functions to the proto package:
proto.{Has,Clear,Get,Set}Extension. These functions take a
protoreflect.ExtensionType parameter, which allows writing the
same function call using either the old or new API:

  proto.GetExtension(message, somepb.E_ExtensionFoo)

Fixes golang/protobuf#908

Change-Id: Ibc65d12a46666297849114fd3aefbc4a597d9f08
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/189199
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-08-08 18:20:51 +00:00
Herbie Ong
a3369c5dc2 internal/encoding/text: replace use of regular expression in decoding
Improve performance by replacing use of regular expressions with direct
parsing code.

Compared to latest version:

name                                     old time/op    new time/op    delta
Text/Unmarshal/google_message1_proto2-4    21.8µs ± 5%    14.0µs ± 9%  -35.69%  (p=0.000 n=10+9)
Text/Unmarshal/google_message1_proto3-4    19.6µs ± 4%    13.8µs ±10%  -29.47%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           13.4ms ± 4%     4.9ms ± 4%  -63.44%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      13.8µs ± 2%    14.1µs ± 4%   +2.42%  (p=0.011 n=9+10)
Text/Marshal/google_message1_proto3-4      11.6µs ± 2%    11.8µs ± 8%     ~     (p=0.573 n=8+10)
Text/Marshal/google_message2-4             8.01ms ±48%    5.97ms ± 5%  -25.44%  (p=0.000 n=10+10)

name                                     old alloc/op   new alloc/op   delta
Text/Unmarshal/google_message1_proto2-4    13.0kB ± 0%    12.6kB ± 0%   -3.40%  (p=0.000 n=10+10)
Text/Unmarshal/google_message1_proto3-4    13.0kB ± 0%    12.5kB ± 0%   -3.50%  (p=0.000 n=10+10)
Text/Unmarshal/google_message2-4           5.67MB ± 0%    5.50MB ± 0%   -3.13%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4      12.0kB ± 0%    12.1kB ± 0%   +0.02%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto3-4      11.7kB ± 0%    11.7kB ± 0%   +0.01%  (p=0.000 n=10+10)
Text/Marshal/google_message2-4             5.68MB ± 0%    5.68MB ± 0%   +0.01%  (p=0.000 n=10+10)

name                                     old allocs/op  new allocs/op  delta
Text/Unmarshal/google_message1_proto2-4       142 ± 0%       142 ± 0%     ~     (all equal)
Text/Unmarshal/google_message1_proto3-4       156 ± 0%       156 ± 0%     ~     (all equal)
Text/Unmarshal/google_message2-4            70.1k ± 0%     65.4k ± 0%   -6.76%  (p=0.000 n=10+10)
Text/Marshal/google_message1_proto2-4        91.0 ± 0%      91.0 ± 0%     ~     (all equal)
Text/Marshal/google_message1_proto3-4        80.0 ± 0%      80.0 ± 0%     ~     (all equal)
Text/Marshal/google_message2-4              36.4k ± 0%     36.4k ± 0%     ~     (all equal)

Change-Id: Ia5d3c16e9e33961aae03bac0d53fcfc5b1943d2a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/173360
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-07-23 22:08:16 +00:00
Damien Neil
9429392195 internal/encoding/pack: fix tests on armv7a
Golden test output doesn't match when math.NaN() has different bits from
the test's NaNs. Drop the NaN-related tests as too fiddly to be worth
keeping.

Change-Id: I89cf961273c2afab3b6b9f6c63878816314e9f43
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/186639
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-07-19 15:13:41 +00:00
Joe Tsai
5ae10aa9f0 encoding: unify MessageSet extension handling logic
This CL unifies common MessageSet logic in prototext and protojson
into the messageset package. While we are at it, also enable
MessageSet support only if the proto1_legacy build flag is enabled.

Change-Id: I1a7d475e8bb1dad61ecd286df45e4239e5bef072
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185898
Reviewed-by: Damien Neil <dneil@google.com>
2019-07-15 21:07:58 +00:00
Damien Neil
302cb325fb proto: support message_set_wire_format
MessageSets are a deprecated proto1 feature, long since superseded by
extensions. Add disabled-by-default support behind flags.Proto1Legacy.

Change-Id: I7d3ace07f3b0efd59673034f3dc633b908345a88
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185538
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-07-15 19:32:30 +00:00
Joe Tsai
3d8e369c4e all: implement proto1 weak fields
This implements generation of and reflection support for weak fields.
Weak fields are a proto1 feature where the "weak" option can be specified
on a singular message field. A weak reference results in generated code
that does not directly link in the dependency containing the weak message.

Weak field support is not added to any of the serialization logic.

Change-Id: I08ccfa72bc80b2ffb6af527a1677a0a81dcf33fb
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185399
Reviewed-by: Damien Neil <dneil@google.com>
2019-07-15 18:44:12 +00:00
Joe Tsai
36dc22ddb8 encoding: use strs.UnsafeString to avoid duplicated code
The strs.UnsafeString casts a []byte as a string.
This allows us to avoid duplicated functionality.

Change-Id: I9930b94bae35eac0f98c0fa62963b300bc8d7e49
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185459
Reviewed-by: Herbie Ong <herbie@google.com>
2019-07-10 07:01:20 +00:00
Joe Tsai
ace50e2b5e internal/encoding/tag: support marshaling weak fields
We already support unmarshaling weak fields, support the other direction.

Change-Id: I514e6b7b18cf9a3567fb1775a1e389fe64f22d22
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185340
Reviewed-by: Herbie Ong <herbie@google.com>
2019-07-09 22:25:24 +00:00
Joe Tsai
e5900a6a90 internal/encoding/tag: fix handling of JSON name
When decoding, only treat the name as being explicitly set if the
name differs from what the JSON name would have been had it been
automatically derived from the protobuf field name.

Change-Id: Ida9256eafe7af20c7a06be2d4fb298be44276104
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185398
Reviewed-by: Herbie Ong <herbie@google.com>
2019-07-09 22:16:29 +00:00
Damien Neil
e91877de26 internal/impl: add fast-path unmarshal
Benchmarks run with:
  go test ./benchmarks/ -bench=Wire  -benchtime=500ms -benchmem -count=8

Fast-path vs. parent commit:

  name                                      old time/op    new time/op    delta
  Wire/Unmarshal/google_message1_proto2-12    1.35µs ± 2%    0.45µs ± 4%  -67.01%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message1_proto3-12    1.07µs ± 1%    0.31µs ± 1%  -71.04%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message2-12            691µs ± 2%     188µs ± 2%  -72.78%  (p=0.000 n=7+8)

  name                                      old allocs/op  new allocs/op  delta
  Wire/Unmarshal/google_message1_proto2-12      60.0 ± 0%      25.0 ± 0%  -58.33%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message1_proto3-12      42.0 ± 0%       7.0 ± 0%  -83.33%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message2-12            28.6k ± 0%      8.5k ± 0%  -70.34%  (p=0.000 n=8+8)

Fast-path vs. -v1:

  name                                      old time/op    new time/op    delta
  Wire/Unmarshal/google_message1_proto2-12     702ns ± 1%     445ns ± 4%   -36.58%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message1_proto3-12     604ns ± 1%     311ns ± 1%   -48.54%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message2-12            179µs ± 3%     188µs ± 2%    +5.30%  (p=0.000 n=7+8)

  name                                      old allocs/op  new allocs/op  delta
  Wire/Unmarshal/google_message1_proto2-12      26.0 ± 0%      25.0 ± 0%    -3.85%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message1_proto3-12      8.00 ± 0%      7.00 ± 0%   -12.50%  (p=0.000 n=8+8)
  Wire/Unmarshal/google_message2-12            8.49k ± 0%     8.49k ± 0%    -0.01%  (p=0.000 n=8+8)

Change-Id: I6247ac3fd66a63d9acb902cbd192094ee3d151c3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185147
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-07-09 19:56:42 +00:00
Joe Tsai
9aeddcdcf0 internal/encoding/wire: fix trivial misspelling
Change-Id: I71f1715145ebc58aadec1d5ecb5f1900998f65a0
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/184277
Reviewed-by: Damien Neil <dneil@google.com>
2019-07-08 20:14:46 +00:00
Joe Tsai
15076350e8 reflect/protodesc: enforce strict validation
Hyrum's Law dictates that if we do not prevent naughty behavior,
people will rely on it. If we do not validate that the provided
file descriptor is correct today, it will be near impossible
to add proper validation checks later on.

The logic added validates that the provided file descriptor is
correct according to the same semantics as protoc,
which was reversed engineered to derive the set of rules implemented here.
The rules are unfortunately complicated because protobuf is a language
full of many non-orthogonal features. While our logic is complicated,
it is still 1/7th the size of the equivalent C++ code!

Change-Id: I6acc5dc3bd2e4c6bea6cd9e81214f8104402602a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/184837
Reviewed-by: Damien Neil <dneil@google.com>
2019-07-03 20:46:51 +00:00
Joe Tsai
b2107fbd8d reflect/protodesc: robustify dependency resolution implementation
Overview of changes:
* Add an option that specifies whether to replace unresolvable references
with a placeholder instead of producing an error. Since the prior behavior
produced placeholders (not always), we default to that behavior for now,
but will enable strict resolving in a future CL.
* The option is not yet exported because there is concern about what the
public API should look like. This will be exposed in a future CL.
* Unlike before, we now permit placeholders for unresolvable enum values.
* We implement relative name resolution logic.
* We handle the case where the type is unknown, but type_name is specified.
In such a case, we populate both FieldDescriptor.{Enum,Message} and leave
the FieldDescriptor.Kind with the zero value. If the type_name happened
to resolve, we use that to determine the type.
* If a placeholder is used to represent a relative name,
the FullName reports an invalid full name with a "*." prefix.

Change-Id: Ifa8c750423c488fb9324eec4d033a2f251505fda
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/184317
Reviewed-by: Damien Neil <dneil@google.com>
2019-07-03 19:17:36 +00:00
Damien Neil
cedb595154 proto, internal/impl: avoid string->[]byte conversions
We trust the compiler to optimize away the string->[]byte conversion in
code like:

	b = wire.AppendBytes(b, []byte(s))

In testing (go 1.12.5 linux/amd64), this optimization is not happening.
Perhaps newer versions of the compiler will optimize this, but we
shouldn't rely on it; avoid unnecessary conversions.

Benchmark differences vs https://golang.org/cl/171462:

  name                                   old time/op    new time/op    delta
  Wire/Marshal/google_message1_proto2-6     310ns ± 2%     189ns ± 3%  -39.20%  (p=0.000 n=8+8)
  Wire/Marshal/google_message1_proto3-6     389ns ± 8%     261ns ± 2%  -33.03%  (p=0.000 n=8+8)
  Wire/Marshal/google_message2-6            103µs ±11%      59µs ± 4%  -42.17%  (p=0.000 n=8+8)

  name                                   old alloc/op   new alloc/op   delta
  Wire/Marshal/google_message1_proto2-6      592B ± 0%      240B ± 0%  -59.46%  (p=0.000 n=8+8)
  Wire/Marshal/google_message1_proto3-6      576B ± 0%      224B ± 0%  -61.11%  (p=0.000 n=8+8)
  Wire/Marshal/google_message2-6            196kB ± 0%      90kB ± 0%  -54.05%  (p=0.000 n=8+8)

  name                                   old allocs/op  new allocs/op  delta
  Wire/Marshal/google_message1_proto2-6      5.00 ± 0%      1.00 ± 0%  -80.00%  (p=0.000 n=8+8)
  Wire/Marshal/google_message1_proto3-6      5.00 ± 0%      1.00 ± 0%  -80.00%  (p=0.000 n=8+8)
  Wire/Marshal/google_message2-6            1.66k ± 0%     0.00k ± 0%  -99.94%  (p=0.000 n=8+8)

Change-Id: Idab7634b8c86604dffa46895ba2e61be38c9bd9c
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/183380
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-06-24 17:41:45 +00:00
Damien Neil
8c86fc5e7d all: remove non-fatal UTF-8 validation errors (and non-fatal in general)
Immediately abort (un)marshal operations when encountering invalid UTF-8
data in proto3 strings. No other proto implementation supports non-UTF-8
data in proto3 strings (and many reject it in proto2 strings as well).
Producing invalid output is an interoperability threat (other
implementations won't be able to read it).

The case where existing string data is found to contain non-UTF8 data is
better handled by changing the field to the `bytes` type, which (aside
from UTF-8 validation) is wire-compatible with `string`.

Remove the errors.NonFatal type, since there are no remaining cases
where it is needed. "Non-fatal" errors which produce results and a
non-nil error are problematic because they compose poorly; the better
approach is to take an option like AllowPartial indicating which
conditions to check for.

Change-Id: I9d189ec6ffda7b5d96d094aa1b290af2e3f23736
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/183098
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-06-20 20:55:13 +00:00
Joe Tsai
d888139e7b internal/filedesc, internal/filetype: initial commit
The internal/fileinit package is split apart into two packages:
* internal/filedesc constructs descriptors from the raw proto.
It is very similar to the previous internal/fileinit package.
* internal/filetype wraps descriptors with Go type information

Overview:
* The internal/fileinit package will be deleted in a future CL.
It is kept around since the v1 repo currently depends on it.
* The internal/prototype package is deleted. All former usages of it
are now using internal/filedesc instead. Most significantly,
the reflect/protodesc package was almost entirely re-written.
* The internal/impl package drops support for messages that do not
have a Descriptor method (pre-2016). This removes a significant amount
of technical debt.
filedesc.Builder to parse raw descriptors.
* The internal/encoding/defval package now handles enum values by name.

Change-Id: I3957bcc8588a70470fd6c7de1122216b80615ab7
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/182360
Reviewed-by: Damien Neil <dneil@google.com>
2019-06-20 02:06:11 +00:00
Damien Neil
e89e6244e0 all: change module to google.golang.org/protobuf
Temporarily remove go.mod, since we can't generate an accurate one until
the corresponding v1 change is submitted.

Change-Id: I1e1ad97f2b455e33f61ffaeb8676289795e47e72
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/177000
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-05-14 17:28:29 +00:00
Joe Tsai
ac31a352ca reflect/protoreflect: add helper methods to FieldDescriptor
Added API:
	FieldDescriptor.IsExtension
	FieldDescriptor.IsList
	FieldDescriptor.MapKey
	FieldDescriptor.MapValue
	FieldDescriptor.ContainingOneof
	FieldDescriptor.ContainingMessage

Deprecated API (to be removed in subsequent CL):
	FieldDescriptor.Oneof
	FieldDescriptor.Extendee

These methods help cleanup several common usage patterns.

Change-Id: I9a3ffabc2edb2173c536509b22f330f98bba7cf3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/176977
Reviewed-by: Damien Neil <dneil@google.com>
2019-05-14 17:14:05 +00:00
Joe Tsai
972d873b59 internal/encoding/wire: increase maximum field number
The protobuf documentation explicitly specifies (1<<29)-1 as the maximum
field number, but the C++ implementation itself has a special-case where it
allows field numbers up to MaxInt32 for MessageSet fields, but continues
to apply the former limit in all non-MessageSet cases.

To avoid complicated branching logic, we use the larger limit for all cases
if MessageSet is supported, otherwise, we impose the documented limit.

Change-Id: I710a2a21aa3beba161c3e6ca2f2ed9a266920a5a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/175817
Reviewed-by: Damien Neil <dneil@google.com>
2019-05-07 21:39:02 +00:00
Herbie Ong
decef41dcc internal/encoding/json: improve decoding speed and memory allocation
Change use of regexp for matching literals true,false,null to simple
bytes comparison. Small gain from doing this.

Remove computing for position in Value as that is only needed in error
messages. In order to preserve ability to compute for position later,
store the original input in Value instead of just the slice containing
the value, however, need to also store the start index and size of the
parsed value.

Using benchmark in encoding/bench_test.go now shows faster time and less
memory usage than V1.

name          old time/op    new time/op    delta
JSONEncode-4    30.3ms ± 1%    10.3ms ± 1%  -66.02%  (p=0.000 n=9+8)
JSONDecode-4    54.4ms ± 3%    18.9ms ± 2%  -65.33%  (p=0.000 n=10+10)

name          old alloc/op   new alloc/op   delta
JSONEncode-4    10.3MB ± 0%     3.9MB ± 0%  -61.74%  (p=0.000 n=10+9)
JSONDecode-4    19.0MB ± 0%     3.6MB ± 0%  -81.29%  (p=0.000 n=10+9)

name          old allocs/op  new allocs/op  delta
JSONEncode-4      465k ± 0%       64k ± 0%  -86.30%  (p=0.000 n=10+8)
JSONDecode-4      289k ± 0%      163k ± 0%  -43.69%  (p=0.000 n=10+9)

Change-Id: I0a3108d675d6442674facb065aaebd14051f6c5d
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/172662
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-29 22:13:25 +00:00
Joe Tsai
d24bc72368 reflect/protoreflect: rename methods with Type suffix
The protobuf type system uses the word "descriptor" instead of "type".
We should avoid the "type" verbage when we aren't talking about Go types.
The old names are temporarily kept around for compatibility reasons.

Change-Id: Icc99c913528ead011f7a74aa8399d9c5ec6dc56e
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/172238
Reviewed-by: Herbie Ong <herbie@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
2019-04-20 06:35:24 +00:00
Herbie Ong
1e09691415 internal/encoding/{json,text}: improve string parsing
Previous calls to indexNeedEscape with a type conversion from []byte
to string incurs allocation.

Make 2 different calls instead, one for string and one for bytes.

Type converting string to []byte does not incur extra allocation,
however, the benchmark results still show it to be slower by ~3% for
textpb and 6+% for jsonpb, hence decided to go with 2 separate calls
instead.

Results over current head:
name          old time/op    new time/op    delta
TextEncode-4    18.1ms ± 2%    18.3ms ± 2%     ~     (p=0.065 n=10+9)
TextDecode-4     233ms ± 3%     102ms ± 1%  -56.34%  (p=0.000 n=9+10)
JSONEncode-4    10.4ms ± 2%    10.5ms ± 0%   +0.56%  (p=0.019 n=9+9)
JSONDecode-4     870ms ± 2%     354ms ± 4%  -59.33%  (p=0.000 n=9+10)

name          old alloc/op   new alloc/op   delta
TextEncode-4    28.9MB ± 0%    28.9MB ± 0%   +0.00%  (p=0.000 n=10+9)
TextDecode-4    1.16GB ± 0%    0.03GB ± 0%  -97.44%  (p=0.000 n=9+10)
JSONEncode-4    3.94MB ± 0%    3.94MB ± 0%   +0.00%  (p=0.000 n=10+10)
JSONDecode-4    3.35GB ± 0%    0.01GB ± 0%  -99.83%  (p=0.000 n=10+10)

name          old allocs/op  new allocs/op  delta
TextEncode-4     73.5k ± 0%     73.5k ± 0%     ~     (all equal)
TextDecode-4      278k ± 0%      255k ± 0%   -8.26%  (p=0.000 n=9+10)
JSONEncode-4     63.8k ± 0%     63.8k ± 0%     ~     (all equal)
JSONDecode-4      247k ± 0%      210k ± 0%  -14.92%  (p=0.000 n=10+10)

Change-Id: Ibc64e9a7827ec1fffa213eb79f60497950203700
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/172239
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-17 00:26:13 +00:00
Damien Neil
fb8bd3a65a internal/legacy, internal/encoding/tag: don't synthesize options
Stop adding options to legacy MessageDescriptors and FieldDescriptors.
We would synthesize options messages containing semantic options
(FieldOptions.is_packed, FieldOptions.is_weak, MessageOptions.map_entry).
This information is already contained in the protoreflect descriptors,
so there is no real need to define a correct options message.

If we do want to include options in legacy descriptors, then we should
get the original value out of the FileDescriptorProto, since it may
include additional extensions or other information.

This change completely removes the dependency from internal/legacy to
descriptor.proto.

Change-Id: Ib6bbe4ca6e0fe7ae501f3e9b11d5fa0222808410
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/171458
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-10 21:29:14 +00:00
Herbie Ong
8fa64d98d8 internal/encoding/json: improve Value.Int,Uint by reducing allocations
parseNumber does not need to construct new slices for numberParts, it
simply needs to reference the correct subset from the input.

normalizeToString may need to allocate but only if there's a positive
exponent.

name      old time/op    new time/op    delta
Float-4      308ns ± 0%     291ns ± 0%   ~     (p=1.000 n=1+1)
Int-4        498ns ± 0%     341ns ± 0%   ~     (p=1.000 n=1+1)
String-4     262ns ± 0%     250ns ± 0%   ~     (p=1.000 n=1+1)
Bool-4       212ns ± 0%     210ns ± 0%   ~     (p=1.000 n=1+1)

name      old alloc/op   new alloc/op   delta
Float-4      48.0B ± 0%     48.0B ± 0%   ~     (all equal)
Int-4         160B ± 0%       99B ± 0%   ~     (p=1.000 n=1+1)
String-4      176B ± 0%      176B ± 0%   ~     (all equal)
Bool-4       0.00B          0.00B        ~     (all equal)

name      old allocs/op  new allocs/op  delta
Float-4       1.00 ± 0%      1.00 ± 0%   ~     (all equal)
Int-4         9.00 ± 0%      4.00 ± 0%   ~     (p=1.000 n=1+1)
String-4      3.00 ± 0%      3.00 ± 0%   ~     (all equal)
Bool-4        0.00           0.00        ~     (all equal)

Change-Id: If083e18a5914b15e794d34722cbb6539cbd73a53
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/170788
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-06 04:16:05 +00:00
Herbie Ong
8ac9dd257b encoding/jsonpb: add support for unmarshaling Any
Also added json.Decoder.Clone API for unmarshaling Any to look
ahead remaining bytes for @type field.

Change-Id: I2f803743534dfb64f9092d716805b115faa5975a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/170102
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-02 04:49:03 +00:00
Herbie Ong
670d808a6d internal/encoding/json: improve allocation of Value for JSON strings
Substitute interface Value.value field with per type field boo and str
instead.

name      old time/op    new time/op    delta
String-4     286ns ± 0%     254ns ± 0%   ~     (p=1.000 n=1+1)
Bool-4       209ns ± 0%     211ns ± 0%   ~     (p=1.000 n=1+1)

name      old alloc/op   new alloc/op   delta
String-4      192B ± 0%      176B ± 0%   ~     (p=1.000 n=1+1)
Bool-4       0.00B          0.00B        ~     (all equal)

name      old allocs/op  new allocs/op  delta
String-4      4.00 ± 0%      3.00 ± 0%   ~     (p=1.000 n=1+1)
Bool-4        0.00           0.00        ~     (all equal)

Change-Id: Ib0167d22e60d63c221c303b79c75b9e96d432fe7
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/170277
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-04-01 20:47:36 +00:00
Herbie Ong
a3421952ac internal/encoding/json: improve decoding of JSON numbers for floats
Per Joe's suggestion, remove producing numberParts when parsing a JSON
number to produce corresponding Value. This saves having to store it
inside Value as well. Only produce numberParts for calls to
Value.{Int,Uint} call.

numberParts is only used for producing integers and removing the logic to
produce numberParts improves overall decoding speed for floats, and shows no
change for integers.

name     old time/op  new time/op  delta
Float-4   559ns ± 0%   288ns ± 0%   ~     (p=1.000 n=1+1)
Int-4     471ns ± 0%   466ns ± 0%   ~     (p=1.000 n=1+1)

Change-Id: I21bf304ca67dda8d41a4ea0022dcbefd51058c1c
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/168781
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-03-23 22:02:55 +00:00
Herbie Ong
c96a79da29 encoding/jsonpb: add support for basic unmarshaling
Unmarshaling of scalar, messages, repeated, and maps.

Need to further improve on error messages for consistency, some error
messages contain the position info while some currently do not.  There
are cases where position info is wrong as well when a value is decoded
in another pass, e.g. numbers in string value, or map keys.

Change-Id: I6f9e903c499b5e87fb258dbdada7434389fc7522
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/166338
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-03-15 18:53:18 +00:00
Joe Tsai
990b9f5919 internal/prototype: move from reflect/prototype
The prototype package was initially used by generated reflection support,
but has now been replaced by internal/fileinit.
Eventually, this functionality should be deleted and re-written in terms
of other components in the repo.

Usages that prototype currently provides (but should be moved) are:
* Constructing standalone messages and enums, which is behavior we should
provide in reflect/protodesc. The google.protobuf.{Enum,Type} are well-known
proto messages designed for this purpose.
* Constructing placeholder files, enums, and messages.
* Consructing protoreflect.{Message,Enum,Extension}Types, which are protobuf
descriptors with associated Go type information.

Change-Id: Id7dbefff952682781b439aa555508c59b2629f9e
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/167383
Reviewed-by: Damien Neil <dneil@google.com>
2019-03-13 20:17:00 +00:00
Herbie Ong
250c6eaf92 internal/encoding/text: change Value.Float{32,64} to Value.Float
Collapse Value.Float32 and Value.Float64 into single API to keep it
consistent with Value.{Int,Uint}.

Change-Id: I07737e72715fe3cc3f6bcad579cf5d6cfe3757d5
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/167317
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-03-13 04:35:13 +00:00
Herbie Ong
87608a75eb encoding/jsonpb: switch MarshalOptions to use new JSON encoder
Delete temporary copy of old JSON encoder/decoder internal/encoding/jsonx.

Change-Id: I8b23d7907370d069d0930c360979a2d8b62adc93
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/165778
Reviewed-by: Damien Neil <dneil@google.com>
2019-03-11 21:54:53 +00:00
Herbie Ong
d3f8f2d412 internal/encoding/json: rewrite to a token-based encoder and decoder
Previous decoder decodes a JSON number into a float64, which lacks
64-bit integer precision.

I attempted to retrofit it with storing the raw bytes and parsed out
number parts, see golang.org/cl/164377.  While that is possible, the
encoding logic for Value is not symmetrical with the decoding logic and
can be confusing since both utilizes the same Value struct.

Joe and I decided that it would be better to rewrite the JSON encoder
and decoder to be token-based instead, removing the need for sharing a
model type plus making it more efficient.

Change-Id: Ic0601428a824be4e20141623409ab4d92b6167c7
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/165677
Reviewed-by: Damien Neil <dneil@google.com>
2019-03-11 21:53:21 +00:00
Herbie Ong
1e1e78b71b internal/encoding/jsonx: copy internal/encoding/json
We're rewriting internal/encoding/json. So, make a copy of it first in
order not to break encoding/jsonpb package.

Change-Id: I8b63c468d3f432102d2af4db22a7549998ce3876
Reviewed-on: https://go-review.googlesource.com/c/164642
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
2019-03-02 00:23:28 +00:00
Damien Neil
3fa6d3f003 internal/encoding/pack: don't depend on exact math.NaN bits
This test examines the result of converting math.NaN() to a fixed byte
string. Change it to use a specific NaN value instead, since the value
returned by math.NaN is specified only as being a NaN, not a specific
one.

Use specific float32 and float64 NaN values, since the result of
converting a float64 NaN to a float32 can and does vary.

Fixes test failure on ARM.

Change-Id: Ia1517fdba768cdd88e5ee5f5af37f0b481e651b4
Reviewed-on: https://go-review.googlesource.com/c/162117
Reviewed-by: Herbie Ong <herbie@google.com>
2019-02-12 21:38:35 +00:00
Herbie Ong
84f0960b04 internal/encoding/text: format using 32 bitsize when encoding float32
When encoding/textpb marshals out float32 values, it was previously
formatting it as float64 bitsize since both float types are stored as
float64 and internal/encoding/text only has one Float type.  A
consequence of this is that the output may display a different value
than expected, e.g.  1.02 becomes 1.0199999809265137.

This CL splits Float type into Float32 and Float64 to keep track of
which bitsize to use when formatting.  Values of both types are still
stored as float64 to keep the logic simple.

Decoding will always use Float64, but users can ask for a float32 value
from it.

Change-Id: Iea5b14b283fec2236a0c3946fac34d4d79b95274
Reviewed-on: https://go-review.googlesource.com/c/158497
Reviewed-by: Damien Neil <dneil@google.com>
2019-01-18 17:54:23 +00:00