The size cache is an int32. Store a -1 in it if the message size
overflows, and fall back to recomputing the size if the value is
negative. This means lamentable O(N^2) costs in marshaling,
but that's better than silently producing invalid output.
Also considered: Return an error. Avoids O(N^2) behavior, but gives the
user no good choices if they don't care the output being slow. Encoding
costs of messages this large are likely to be dominated by copying the
bytes rather than the size operation anyway, so slow-but-correct seems
like the most generally useful option.
We could store valid values for the range (0x7fffffff,0xfffffffe)
reserving only 0xffffffff as the overflow sentinel, but optimizing this
case seems less important than the code being obviously correct.
Fixesgolang/protobuf#970.
Change-Id: I44f59ff81fdfbc8672dd5aec959d5153a081aab9
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/220593
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The Format function and MarshalOptions.Format method are helper
functions for directly obtaining the formatted string for a message
without having to deal with errors or convert a []byte to string.
It is only intended for human consumption (e.g., debugging or logging).
We also add a MarshalOptions.Multiline option to specify that the output
should use some default indentation in a multiline output.
This assists in the v1 to v2 migration where:
protoV1.CompactTextString(m) => prototext.MarshalOptions{}.Format(m)
protoV1.MarshalTextString(m) => prototext.Format(m)
At Google, there are approximately 10x more usages of MarshalTextString than
CompactTextString, so it makes sense that the top-level Format function
does multiline expansion by default.
Fixes#850
Change-Id: I149c9e190a6d99b985d3884df675499a3313e9b3
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/213460
Reviewed-by: Damien Neil <dneil@google.com>
Reviewed-by: Herbie Ong <herbie@google.com>
This adds a experimental function to the internal/impl package which
validates a wire-format message against a message type. The validator
reports whether the message can be successfully unmarshaled, and whether
the result is initialized (all required fields are set). In some cases,
the validator returns ambiguous results when full validation would be
expensive.
The validator is unused outside of tests. In the future, it may be used
to permit lazy unmarshaling of some data. It is being added now for
testing; in particular, the wire fuzzer now checks the validator output
for consistency with the unmarshaler.
The validator adds a small amount of unused per-MessageType state. If
this becomes a concern, we could conditionalize it with a build tag.
Change-Id: I4216ef81d6a9ed975302eed189b02d08608858b4
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/212302
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
The v1 proto.Equal function treats (*Message)(nil) and new(Message)
as being different, while v2 proto.Equal treated them as equal since
a typed nil pointer is functionally an empty message since the
protobuf data model has no concept of presence as a first-class
property of messages.
Unfortunately, a significant amount of code depends on this distinction
that it would be difficult to migrate users from v1 to v2 unless we
preserved similar semantics in the v2 proto.Equal.
Also, double down on these semantics for protocmp.Transform.
Fixes#965
Change-Id: I21e78ba6251401a0ac0ccf495188093973cd7f3f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/213238
Reviewed-by: Damien Neil <dneil@google.com>
Move the test inputs for the wire marshaler and unmarshaler out of
decode_test.go and into a new file. Consolidate some tests for invalid
messages (UTF-8 validation failures, field numbers out of range) into
a single list of invalid messages. Break out the no-enforce-utf8 test
into a separate file, since it is both complicated and conditional on
legacy support.
Change-Id: Ide80fa9d3aec2b6d42a57e6f9265358aa5e661a7
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/211557
Reviewed-by: Joe Tsai <joetsai@google.com>
The v1 wire marshaler sorts fields as follows:
- All extensions, sorted by field number.
- All non-oneof fields, sorted by field number.
- All oneof fields, in indeterminate order.
We already make some steps toward supporting this ordering: The
fast path encoder places extensions in sorted order at the start
of the message.
This commit moves oneof fields to the end of the message, makes the
reflection-based encoder use this ordering when deterministic marshaling
is enabled, and adds a test to catch unintentional changes to the
ordering.
Users SHOULD NOT depend on stability of the marshal output. It is
subject to change over time. Without deterministic marshaling enabled,
it is subject to change over calls to Marshal.
Change-Id: I6cfd89090d790a3bb50785f32b94d2781d7d08db
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/206800
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
When allocating more space for the destination message in MarshalAppend,
use the same slice growth algorithm as the Go runtime's append rather
than allocating precisely the desired space.
Change-Id: If6033f6f7abdca473bc5188c4d3938ce57d3bdd2
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/197758
Reviewed-by: Joe Tsai <joetsai@google.com>
The old implementation had the behavior where a nil wrapper value:
m := new(foopb.Message)
m.OneofField = (*foopb.Message_OneofUint32)(nil)
was functionally equivalent to it being directly set to nil:
m := new(foopb.Message)
m.OneofField = nil
preserve this semantic in both the table-drive implementation
and the reflection implementation.
Change-Id: Ie44d51e044d4822e61d0e646fbc44aa8d9b90c1f
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/189559
Reviewed-by: Herbie Ong <herbie@google.com>
Rename build tag "proto1_legacy" -> "protolegacy"
to be consistent with the "protoreflect" tag.
Rename flag constant "Proto1Legacy" -> "ProtoLegacy" since
it covers more than simply proto1 legacy features.
For example, it covers alpha-features of proto3 that
were eventually removed from the final proto3 release.
Change-Id: I0f4fcbadd4b5a61c87645e2e5be11d187e59157c
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/189345
Reviewed-by: Damien Neil <dneil@google.com>
In 2014, when proto3 was being developed, there were a number of early
adopters of the new syntax. Before the finalization of proto3 when
it was released in open-source in July 2016, a decision was made to
strictly validate strings in proto3. However, some of the early adopters
were already using invalid UTF-8 with string fields.
The google.protobuf.FieldOptions.enforce_utf8 option only exists to support
those grandfathered users where they can opt-out of the validation logic.
Practical use of that option in open source is impossible even if a user
specifies the proto1_legacy build tag since it requires a hacked
variant of descriptor.proto that is not externally available.
This CL supports enforce_utf8 by modifiyng internal/filedesc to
expose the flag if it detects it in the raw descriptor.
We add an strs.EnforceUTF8 function as a centralized place to determine
whether to perform validation. Validation opt-out is supported
only in builds with legacy support.
We implement support for validating UTF-8 in all proto3 string fields,
even if they are backed by a Go []byte.
Change-Id: I9c0628b84909bc7181125f09db730c80d490e485
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/186002
Reviewed-by: Damien Neil <dneil@google.com>
The invalidExtensions flag no longer seems necessary. Tests pass without it.
Change-Id: Ieb35e26912b047718ccbfcdc926625aec1cd8c87
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/185937
Reviewed-by: Damien Neil <dneil@google.com>
This does not remove all dependencies,
but all of the cases where it can now be implemented in terms of v2.
Change-Id: Idc5b0273f0d35c284bf2141eb9cce998692ceb15
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/184878
Reviewed-by: Herbie Ong <herbie@google.com>
Immediately abort (un)marshal operations when encountering invalid UTF-8
data in proto3 strings. No other proto implementation supports non-UTF-8
data in proto3 strings (and many reject it in proto2 strings as well).
Producing invalid output is an interoperability threat (other
implementations won't be able to read it).
The case where existing string data is found to contain non-UTF8 data is
better handled by changing the field to the `bytes` type, which (aside
from UTF-8 validation) is wire-compatible with `string`.
Remove the errors.NonFatal type, since there are no remaining cases
where it is needed. "Non-fatal" errors which produce results and a
non-nil error are problematic because they compose poorly; the better
approach is to take an option like AllowPartial indicating which
conditions to check for.
Change-Id: I9d189ec6ffda7b5d96d094aa1b290af2e3f23736
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/183098
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Added API:
Message.Len
Message.Range
Message.Has
Message.Clear
Message.Get
Message.Set
Message.Mutable
Message.NewMessage
Message.WhichOneof
Message.GetUnknown
Message.SetUnknown
Deprecated API (to be removed in subsequent CL):
Message.KnownFields
Message.UnknownFields
The primary difference with the new API is that the top-level
Message methods are keyed by FieldDescriptor rather than FieldNumber
with the following semantics:
* For known fields, the FieldDescriptor must exactly match the
field descriptor known by the message.
* For extension fields, the FieldDescriptor must implement ExtensionType,
where ContainingMessage.FullName matches the message name, and
the field number is within the message's extension range.
When setting an extension field, it automatically stores
the extension type information.
* Extension fields are always considered nullable,
implying that repeated extension fields are nullable.
That is, you can distinguish between a unpopulated list and an empty list.
* Message.Get always returns a valid Value even if unpopulated.
The behavior is already well-defined for scalars, but for unpopulated
composite types, it now returns an empty read-only version of it.
Change-Id: Ia120630b4db221aeaaf743d0f64160e1a61a0f61
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/175458
Reviewed-by: Damien Neil <dneil@google.com>
Temporarily remove go.mod, since we can't generate an accurate one until
the corresponding v1 change is submitted.
Change-Id: I1e1ad97f2b455e33f61ffaeb8676289795e47e72
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/177000
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Add support for basic equality comparison of messages.
Messages are equal if they have the same type and marshal to the
same bytes with deterministic serialization, with some exceptions:
- Messages with different registered extensions are unequal.
- NaN is not equal to itself.
Unlike the v1 Equal, a nil message is equal to an empty message of
the same type.
Change-Id: Ibabdadd8c767b801051b8241aeae1ba077e58121
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/174277
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>
Was overwriting the output buffer rather than appending to it.
Change-Id: I6ffb72a440f464f4259cfebc42c1dc75b73fb5ae
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/171117
Reviewed-by: Joe Tsai <thebrokentoaster@gmail.com>