When a message (within an extension) is lazily decoded, its size cache is
initialized to 0 (the zero value for an int32). This doesn’t mean the size cache
reads 0, but rather that it was not initialized.
This fixes TestExtensionGetRace being flaky since CL 580015.
related to golang/protobuf#1609
Change-Id: Ia305badadd300679975f230005c3e33c94050e4a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/586396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lasse Folger <lassefolger@google.com>
Extensions will be kept in wire format over proto.Size and proto.Marshal.
This change is a significant performance optimization for jobs that read and
write Protobuf messages of the same type, but do not need to process extensions.
This change is based on work by Patrik Nyblom.
Note that the proto.Size semantics for lazy messages might be surprising;
see https://protobuf.dev/reference/go/size/ for details.
We have been running this change for about two weeks in Google;
all known breakages have already been addressed with CL 579995.
related to golang/protobuf#1609
Change-Id: I16be78d15304d775bb30e76356a1a61d61300b43
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/580015
Reviewed-by: Lasse Folger <lassefolger@google.com>
Auto-Submit: Michael Stapelberg <stapelberg@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This CL adds runtime support for unknown fields to be represented
as *[]byte in addition to the current representation as []byte.
This CL does not change generated code to use *[]byte.
Comparison between using *[]byte and []byte:
• Every message supports unknown fields, so use of []byte
expands a message size by 24B (for 64-bit systems).
In contrast, *[]byte only expands a message by 8B.
This has significant memory implications for small messages.
• If unknown fields are encountered, *[]byte has extra overhead
allocating the 24B slice header. However, it is assumed
that messages rarely see any unknown fields at runtime,
or generally do so for a temporary period of time.
Change-Id: I81935e4ea7394166e61ff4579f76f59fa792dfc9
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/244937
Reviewed-by: Damien Neil <dneil@google.com>
The size cache is an int32. Store a -1 in it if the message size
overflows, and fall back to recomputing the size if the value is
negative. This means lamentable O(N^2) costs in marshaling,
but that's better than silently producing invalid output.
Also considered: Return an error. Avoids O(N^2) behavior, but gives the
user no good choices if they don't care about the output being slow. Encoding
costs of messages this large are likely to be dominated by copying the
bytes rather than the size operation anyway, so slow-but-correct seems
like the most generally useful option.
We could store valid values for the range (0x7fffffff,0xfffffffe)
reserving only 0xffffffff as the overflow sentinel, but optimizing this
case seems less important than the code being obviously correct.
Fixes golang/protobuf#970.
Change-Id: I44f59ff81fdfbc8672dd5aec959d5153a081aab9
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/220593
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Move all fast-path inputs and outputs into the Input/Output structs.
Collapse all booleans into bitfields.
Change-Id: I79ebfbac9cd1d8ef5ec17c4f955311db007391ca
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/219505
Reviewed-by: Joe Tsai <joetsai@google.com>
Refactor the fast-path size, marshal, unmarshal, and isinit functions to
take the *coderFieldInfo for the field as input.
This replaces a number of closures capturing field-specific information
with functions taking that information as an explicit parameter.
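
The difference between the two styles can be sketched as follows. The fieldInfo type is a much-reduced, hypothetical stand-in for coderFieldInfo, and the size computation (tag bytes plus one byte for a bool) is only illustrative:

```go
package main

import "fmt"

// fieldInfo is a hypothetical, reduced stand-in for coderFieldInfo.
type fieldInfo struct {
	tagSize int // encoded size of the field's tag
}

// Closure style: each field gets its own closure capturing tagSize.
func makeSizeBool(tagSize int) func() int {
	return func() int { return tagSize + 1 }
}

// Explicit-parameter style: one shared function reads the field info
// it is handed, so nothing needs to be captured.
func sizeBool(f *fieldInfo) int {
	return f.tagSize + 1
}

func main() {
	f := &fieldInfo{tagSize: 2}
	fmt.Println(makeSizeBool(f.tagSize)(), sizeBool(f))
}
```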
Change-Id: I8cb39701265edb7b673f6f04a0152d5f4dbb4d5d
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/218937
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Change the representation of option flags in protoiface from bools to a
bitfield. This brings the representation of options in protoiface in
sync with that in internal/impl.
This change has several benefits:
1. We will probably find that we need to add more option flags over time.
Converting to the more efficient representation of these flags as high
in the call stack as possible minimizes the performance implication of
the struct growing.
2. On a similar note, this avoids the need to convert from the compact
representation to the larger one when passing from internal/impl to
proto, since the {Marshal,Unmarshal}State methods take the compact form.
3. This removes unused options from protoiface. Instead of documenting
that AllowPartial is always set, we can just not include an AllowPartial
flag in the protoiface options.
4. Conversely, this provides a way to add option flags to protoiface
that we don't want to expose in the proto package.
name                             old time/op  new time/op  delta
EmptyMessage/Wire/Marshal-12     11.1ns ± 7%  10.1ns ± 1%   -9.35%  (p=0.000 n=8+8)
EmptyMessage/Wire/Unmarshal-12   7.07ns ± 0%  6.74ns ± 1%   -4.58%  (p=0.000 n=8+8)
EmptyMessage/Wire/Validate-12    4.30ns ± 1%  3.80ns ± 8%  -11.45%  (p=0.000 n=7+8)
RepeatedInt32/Wire/Marshal-12    1.17µs ± 1%  1.21µs ± 7%   +4.09%  (p=0.000 n=8+8)
RepeatedInt32/Wire/Unmarshal-12   938ns ± 0%   942ns ± 3%     ~     (p=0.178 n=7+8)
RepeatedInt32/Wire/Validate-12    521ns ± 4%   543ns ± 7%     ~     (p=0.157 n=7+8)
Required/Wire/Marshal-12         97.2ns ± 1%  95.3ns ± 1%   -1.98%  (p=0.001 n=7+7)
Required/Wire/Unmarshal-12       41.0ns ± 9%  38.6ns ± 3%   -5.73%  (p=0.048 n=8+8)
Required/Wire/Validate-12        25.4ns ±11%  21.4ns ± 3%  -15.62%  (p=0.000 n=8+7)
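
A minimal sketch of converting bool options to a bitfield. The type, the flag names, and the toFlags helper are hypothetical and are not the identifiers used in protoiface:

```go
package main

import "fmt"

// boolOptions is a hypothetical options struct using one bool per option.
type boolOptions struct {
	Deterministic bool
	UseCachedSize bool
}

// flags packs the same options into a single byte.
type flags uint8

const (
	flagDeterministic flags = 1 << iota
	flagUseCachedSize
)

// toFlags converts once, as high in the call stack as possible, so that
// everything below passes a single small value around.
func toFlags(o boolOptions) flags {
	var f flags
	if o.Deterministic {
		f |= flagDeterministic
	}
	if o.UseCachedSize {
		f |= flagUseCachedSize
	}
	return f
}

// has reports whether all bits in bit are set in f.
func (f flags) has(bit flags) bool { return f&bit == bit }

func main() {
	f := toFlags(boolOptions{Deterministic: true})
	fmt.Println(f.has(flagDeterministic), f.has(flagUseCachedSize))
}
```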
Change-Id: I3ac1b00ab36cfdf61316ec087a5dd20d9248e4f6
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/216760
Reviewed-by: Joe Tsai <joetsai@google.com>
Add functions to the proto package which plumb through the fast-path state.
As a sample use case: A followup CL adds an Initialized field to
protoiface.UnmarshalOutput, permitting the unmarshaller to report back
when it can confirm that a message is fully initialized. We want to
preserve that information when an unmarshal operation threads through
the proto package (such as when unmarshaling extensions).
To allow these functions to be added as methods of MarshalOptions and
UnmarshalOptions rather than top-level functions, separate the options
from the input structs.
Also update options passed to fast-path methods to set AllowPartial and
Merge to reflect the expected behavior of those methods. (Always allow
partial, never merge.)
Change-Id: I482477b0c9340793be533e75a86d0bb88708716a
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/215877
Reviewed-by: Joe Tsai <joetsai@google.com>
We may want to make changes to the inputs and outputs of the fast-path
functions in the future. For example, we likely want to add the ability
for the fast-path unmarshal to report back whether the unmarshaled
message is known to be initialized.
Change the signatures of these functions to take in and return struct
types which can be extended with whatever fields we want in the future.
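
A sketch of why struct-typed inputs and outputs are easier to extend. All names here are hypothetical; the Initialized field only illustrates the kind of later addition described above:

```go
package main

import "fmt"

// unmarshalInput and unmarshalOutput are hypothetical stand-ins for the
// fast-path structs.
type unmarshalInput struct {
	Buf []byte
}

type unmarshalOutput struct {
	// Initialized models a field added after the signature was fixed.
	// Its zero value means "not known to be initialized".
	Initialized bool
}

// unmarshalFunc keeps the same signature no matter how the structs grow.
type unmarshalFunc func(unmarshalInput) (unmarshalOutput, error)

// olderUnmarshal was written before Initialized existed and still compiles.
func olderUnmarshal(in unmarshalInput) (unmarshalOutput, error) {
	return unmarshalOutput{}, nil
}

// newerUnmarshal reports the extra information through the new field.
func newerUnmarshal(in unmarshalInput) (unmarshalOutput, error) {
	return unmarshalOutput{Initialized: true}, nil
}

func main() {
	for _, f := range []unmarshalFunc{olderUnmarshal, newerUnmarshal} {
		out, err := f(unmarshalInput{Buf: []byte{0x08, 0x01}})
		fmt.Println(out.Initialized, err)
	}
}
```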
Change-Id: Idead360785df730283a4630ea405265b72482e62
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/215719
Reviewed-by: Joe Tsai <joetsai@google.com>
Change size, marshal, and isinit operations on oneofs to look up the
currently-set oneof type in a map rather than testing for each possible
oneof field in turn.
Significantly improves oneof encoding speed for oneofs with a
substantial number of fields:
go test ./proto -bench=./oneof.*string.*test.TestAll -benchmem -count=8 -cpu=1
name                                        old time/op  new time/op  delta
Encode/oneof_(string)_(*test.TestAllTypes)  911ns ± 1%   397ns ± 3%   -56.45%  (p=0.000 n=8+7)
Decode/oneof_(string)_(*test.TestAllTypes)  899ns ± 1%   922ns ± 1%    +2.49%  (p=0.001 n=7+7)
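
The map-based dispatch can be sketched as follows. The wrapper types, the sizers table, and the returned sizes are hypothetical toys, not real wire sizes or the actual coder tables:

```go
package main

import (
	"fmt"
	"reflect"
)

// isValue models the interface type of a oneof field; each wrapper type
// below models one possible member of the oneof.
type isValue interface{ isValue() }

type valueString struct{ S string }
type valueInt struct{ I int64 }

func (*valueString) isValue() {}
func (*valueInt) isValue()    {}

// sizers maps the concrete wrapper type to its size function, so the
// lookup cost does not grow with the number of oneof members.
var sizers = map[reflect.Type]func(isValue) int{
	reflect.TypeOf((*valueString)(nil)): func(v isValue) int { return len(v.(*valueString).S) },
	reflect.TypeOf((*valueInt)(nil)):    func(v isValue) int { return 8 },
}

// sizeOneof looks up the currently set member by its dynamic type
// instead of testing each possible member in turn.
func sizeOneof(v isValue) int {
	if v == nil {
		return 0 // oneof not set
	}
	if f, ok := sizers[reflect.TypeOf(v)]; ok {
		return f(v)
	}
	return 0
}

func main() {
	fmt.Println(sizeOneof(&valueString{S: "abc"}), sizeOneof(&valueInt{I: 7}), sizeOneof(nil))
}
```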
Fixes golang/protobuf#950
Change-Id: I9393a87975ce09011d885a8af4a63a639ea8452f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/210281
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Stash fast-path information for extensions on the ExtensionInfo. In
the usual case where an ExtensionType's underlying implementation is
an *ExtensionInfo, fetching the fast-path information becomes a type
assertion rather than a mutex-guarded map access.
Maintain a global sync.Map for the case where an ExtensionType isn't an
*ExtensionInfo.
Substantially improves performance for fast-path operations on
extensions:
name                                                    old time/op  new time/op  delta
Encode/MessageSet_type_id_before_message_content-12      267ns ± 1%   185ns ± 1%  -30.44%  (p=0.001 n=7+7)
Encode/basic_scalar_types_(*test.TestAllExtensions)-12  1.94µs ± 1%  0.40µs ± 1%  -79.32%  (p=0.000 n=8+7)
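
A rough sketch of the lookup strategy described above, using hypothetical lowercase types rather than the real ExtensionType and ExtensionInfo. How the coder information is built lazily (sync.Once here) is an assumption of the sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// extensionType, extensionInfo, and coderInfo are hypothetical stand-ins.
type extensionType interface{ Name() string }

type coderInfo struct{ name string }

type extensionInfo struct {
	name  string
	once  sync.Once
	coder *coderInfo // fast-path information stashed on the info itself
}

func (xi *extensionInfo) Name() string { return xi.name }

// otherExt models an extension type that is not an *extensionInfo.
type otherExt struct{ name string }

func (x otherExt) Name() string { return x.name }

// fallbackCoders holds coder information for all other implementations.
var fallbackCoders sync.Map // map[extensionType]*coderInfo

func getCoder(xt extensionType) *coderInfo {
	// Usual case: a type assertion, with no shared map access.
	if xi, ok := xt.(*extensionInfo); ok {
		xi.once.Do(func() { xi.coder = &coderInfo{name: xi.name} })
		return xi.coder
	}
	// Rare case: consult the global map.
	if c, ok := fallbackCoders.Load(xt); ok {
		return c.(*coderInfo)
	}
	c, _ := fallbackCoders.LoadOrStore(xt, &coderInfo{name: xt.Name()})
	return c.(*coderInfo)
}

func main() {
	fmt.Println(getCoder(&extensionInfo{name: "fast"}).name)
	fmt.Println(getCoder(otherExt{name: "fallback"}).name)
}
```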
Change-Id: If048b521deb3665a090ea3d0a178c61691d4201e
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/210540
Reviewed-by: Joe Tsai <joetsai@google.com>
In the v1 implementation, unknown MessageSet items are stored in a
message's unknown fields section in non-MessageSet format. For example,
consider a MessageSet containing an item with type_id T and value V.
If the type_id is not resolvable, the item will be placed in the unknown
fields as a bytes-valued field with number T and contents V. This
conversion is then reversed when marshaling a MessageSet containing
unknown fields.
Preserve this behavior in v2.
One consequence of this change is that actual unknown fields in a
MessageSet (any field other than 1) are now discarded. This matches
the previous behavior.
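
The two encodings of an item with type_id T and value V can be sketched with plain varint helpers. The function names are hypothetical; the tag bytes follow the standard wire format, where a MessageSet item is a group with field number 1 holding type_id (field 2, varint) and message (field 3, bytes):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendUnknownItem stores the item in the non-MessageSet form used for
// unknown fields: a bytes-valued field with number typeID and contents v.
func appendUnknownItem(b []byte, typeID uint64, v []byte) []byte {
	b = binary.AppendUvarint(b, typeID<<3|2) // field typeID, wire type 2 (bytes)
	b = binary.AppendUvarint(b, uint64(len(v)))
	return append(b, v...)
}

// appendMessageSetItem reverses the conversion when marshaling.
func appendMessageSetItem(b []byte, typeID uint64, v []byte) []byte {
	b = append(b, 0x0b) // field 1, start group
	b = append(b, 0x10) // field 2 (type_id), varint
	b = binary.AppendUvarint(b, typeID)
	b = append(b, 0x1a) // field 3 (message), bytes
	b = binary.AppendUvarint(b, uint64(len(v)))
	b = append(b, v...)
	return append(b, 0x0c) // field 1, end group
}

func main() {
	fmt.Printf("% x\n", appendUnknownItem(nil, 5, []byte("hi")))
	fmt.Printf("% x\n", appendMessageSetItem(nil, 5, []byte("hi")))
}
```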
Change-Id: I3d913613f84e0ae82481078dbc91cb25628651cc
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/205697
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Change the storage type of ExtensionField from interface{} to
protoreflect.Value.
Replace the codec functions operating on interface{}s with ones
operating on Values.
Values are potentially more efficient, since they can represent
non-pointer types without allocation. This also reduces the number of
types used to represent field values.
Additionally, this change lays groundwork for changing the
user-visible representation of repeated extension fields from
*[]T to []T. The storage type for extension fields must support mutation
(thus *[]T currently); changing the storage type to a Value permits this
without the need to introduce yet another view on field values.
Change-Id: Ida336be14112bb940f655236eb58df21bf312525
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/192218
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The following adjustments were made:
* The pragma.NoUnkeyedLiterals is moved to be the first field.
This is done to keep the options struct smaller. Even if the last
field is zero-length, Go GC implementation details force the struct
to be padded at the end.
* Methods are documented as always treating AllowPartial as true.
* Added a support flag for UnmarshalOptions.DiscardUnknown.
Change-Id: I1f75d226542ab2bb0123d9cea143c7060df226d8
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185998
Reviewed-by: Damien Neil <dneil@google.com>
Move data used by the fast-path implementations into a substructure of
MessageInfo and initialize it separately.
Change-Id: Ib855ee8ea5cb0379528b52ba0e191319aa5e2dff
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/184077
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Immediately abort (un)marshal operations when encountering invalid UTF-8
data in proto3 strings. No other proto implementation supports non-UTF-8
data in proto3 strings (and many reject it in proto2 strings as well).
Producing invalid output is an interoperability threat (other
implementations won't be able to read it).
The case where existing string data is found to contain non-UTF-8 data is
better handled by changing the field to the `bytes` type, which (aside
from UTF-8 validation) is wire-compatible with `string`.
Remove the errors.NonFatal type, since there are no remaining cases
where it is needed. "Non-fatal" errors which produce results and a
non-nil error are problematic because they compose poorly; the better
approach is to take an option like AllowPartial indicating which
conditions to check for.
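
A minimal sketch of failing immediately on invalid UTF-8 instead of returning both a result and a non-fatal error. The appendString helper, its enforceUTF8 parameter, and the error value are hypothetical:

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"unicode/utf8"
)

var errInvalidUTF8 = errors.New("string field contains invalid UTF-8")

// appendString appends a length-prefixed string. When enforceUTF8 is set
// (as it would be for a proto3 string field), invalid data aborts the
// operation: the caller gets an error and the buffer is left unchanged.
func appendString(b []byte, s string, enforceUTF8 bool) ([]byte, error) {
	if enforceUTF8 && !utf8.ValidString(s) {
		return b, errInvalidUTF8
	}
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...), nil
}

func main() {
	_, err := appendString(nil, "\xff", true)
	fmt.Println(err)

	// A bytes-typed field skips validation and shares the same encoding.
	b, err := appendString(nil, "\xff", false)
	fmt.Printf("% x %v\n", b, err)
}
```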
Change-Id: I9d189ec6ffda7b5d96d094aa1b290af2e3f23736
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/183098
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Add ExtensionField.{SetType,GetType} to hide the fact that the underlying
descriptor is actually an ExtensionDescV1.
Change-Id: I1d0595484ced0a88d2df0852a732fdf0fe9aa232
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/180538
Reviewed-by: Damien Neil <dneil@google.com>
The name MessageType is easily confused with protoreflect.MessageType.
Rename it as MessageInfo, which follows the pattern set by v1,
where the equivalent data structure is called InternalMessageInfo.
Change-Id: I535956e1f7c6e9b07e9585e889d5e93388d0d2ce
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/178478
Reviewed-by: Damien Neil <dneil@google.com>