Skip to content

feat(rust): add configurable size guardrails#3579

Open
ayush00git wants to merge 3 commits intoapache:mainfrom
ayush00git:feat/rust-sizeguards
Open

feat(rust): add configurable size guardrails#3579
ayush00git wants to merge 3 commits intoapache:mainfrom
ayush00git:feat/rust-sizeguards

Conversation

@ayush00git
Copy link
Copy Markdown
Contributor

@ayush00git ayush00git commented Apr 16, 2026

Why?

To prevent excessive allocation from malicious untrusted payloads in the Rust runtime.

What does this PR do?

This brings the Rust implementation into parity with the C++ runtime by introducing configurable guardrails for binary sizes and collection counts.

Related issues

#3409

AI Contribution Checklist

  • Substantial AI assistance was used in this PR: yes / no
  • If yes, I included a completed AI Contribution Checklist in this PR description and the required AI Usage Disclosure.
  • Substantial AI assistance was used in this PR: yes / no
  • If yes, I included the standardized AI Usage Disclosure block below.
  • If yes, I can explain and defend all important changes without AI help.
  • If yes, I reviewed AI-assisted code changes line by line before submission.
  • If yes, I completed line-by-line self-review first and fixed issues before requesting AI review.
  • If yes, I ran two fresh AI review agents on the current PR diff or current HEAD after the latest code changes: one using .claude/skills/fory-code-review/SKILL.md and one without that skill.
  • If yes, I addressed all AI review comments and repeated the review loop until both ai reviewers reported no further actionable comments.
  • If yes, I attached screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes in this PR body.
  • If yes, I ran adequate human verification and recorded evidence (checks run locally or in CI, pass/fail summary, and confirmation I reviewed results).
  • If yes, I added/updated tests and specs where required.
  • If yes, I validated protocol/performance impacts with evidence when applicable.
  • If yes, I verified licensing and provenance compliance.

AI Usage Disclosure (only when substantial AI assistance = yes):

  • If yes, my PR description includes the required ai_review summary and screenshot evidence of the final clean AI review results from both fresh reviewers on the current PR diff or current HEAD after the latest code changes.
The implementation looks solid, matches the C++ reference implementation accurately, and is ready for the main branch. I've reviewed your code changes across config.rs, context.rs, and the serializer module, verifying edge cases and consistency.

Here are the key aspects that verify the correctness of the Rust implementation:

Guardrail Logic Parity with C++:

max_collection_size correctly intercepts untrusted payload lengths before large iterations or memory allocations occur in Vec, HashMap, BTreeMap, and other collection variants. This perfectly encapsulates the C++ read_collection_data_slow patterns.
max_binary_size accurately targets raw-byte slice operations across primitive_list.rs without inappropriately restricting types like String, mirroring how C++ omits this check inside string_serializer.h.
Accurate Error Propagation:

The addition of Error::SizeLimitExceeded works brilliantly with the new Fory configuration model. You're properly halting deserialization and passing clear payload dimensions to Error::size_limit_exceeded, ensuring graceful failure without panics or memory exhaustion.
Validation and Pre-Allocation Consistency:

Checking max_collection_size just prior to enforcing buffer limits inside check_collection_len and check_map_len is well-ordered. By prioritizing the runtime limit checks, you avoid false OOMs or deep iterations if Fory receives heavily corrupted payload headers.

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change? ((Yes, adds Fory configuration methods max_binary_size() and max_collection_size(), as well as Error::SizeLimitExceeded))
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

@ayush00git
Copy link
Copy Markdown
Contributor Author

Hey @chaokunyang
Have a look at the implementation and let me know the changes.
I don't write rust, but these issues (including streaming deserialization support) were pending from a long time, so I used AI to code but I have reviewed it.
Duplicate PR - #3421

@chaokunyang
Copy link
Copy Markdown
Collaborator

@ayush00git Could you run benchmarks/rust and compare with main branch?

@ayush00git
Copy link
Copy Markdown
Contributor Author

main branch -
image

feat/rust-sizeguards branch -
image

@ayush00git
Copy link
Copy Markdown
Contributor Author

some areas like MediaContentList serialization/deserialization, Sample serialization and StructList deserialization are showing regressions averagely of around 20%, i'll investigate these ones. most probably this is due to field type validations.

@ayush00git
Copy link
Copy Markdown
Contributor Author

StructList and MediaContentList serialize calls still shows around 10% regression

feat/rust-sizeguards

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 68.2      | 122.5         | fory    |
| Struct           | Deserialize | 37.9      | 64.8          | fory    |
| Sample           | Serialize   | 102.9     | 566.3         | fory    |
| Sample           | Deserialize | 162.6     | 868.7         | fory    |
| MediaContent     | Serialize   | 219.4     | 332.2         | fory    |
| MediaContent     | Deserialize | 280.4     | 599.8         | fory    |
| StructList       | Serialize   | 192.0     | 606.2         | fory    |
| StructList       | Deserialize | 143.2     | 444.3         | fory    |
| SampleList       | Serialize   | 391.3     | 4002.1        | fory    |
| SampleList       | Deserialize | 1279.0    | 4939.9        | fory    |
| MediaContentList | Serialize   | 856.0     | 2501.9        | fory    |
| MediaContentList | Deserialize | 1676.1    | 3206.9        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,665,552 | 8,161,267    | fory    |
| Struct           | Deserialize | 26,369,222 | 15,434,242   | fory    |
| Sample           | Serialize   | 9,721,007  | 1,765,880    | fory    |
| Sample           | Deserialize | 6,151,575  | 1,151,198    | fory    |
| MediaContent     | Serialize   | 4,558,716  | 3,010,144    | fory    |
| MediaContent     | Deserialize | 3,565,952  | 1,667,167    | fory    |
| StructList       | Serialize   | 5,208,605  | 1,649,702    | fory    |
| StructList       | Deserialize | 6,985,191  | 2,250,883    | fory    |
| SampleList       | Serialize   | 2,555,323  | 249,869      | fory    |
| SampleList       | Deserialize | 781,861    | 202,433      | fory    |
| MediaContentList | Serialize   | 1,168,170  | 399,696      | fory    |
| MediaContentList | Deserialize | 596,623    | 311,828      | fory    |

main

## Benchmark Results

### Timing Results (nanoseconds)

| Datatype         | Operation   | fory (ns) | protobuf (ns) | Fastest |
| ---------------- | ----------- | --------- | ------------- | ------- |
| Struct           | Serialize   | 67.5      | 123.3         | fory    |
| Struct           | Deserialize | 38.3      | 63.4          | fory    |
| Sample           | Serialize   | 101.4     | 561.7         | fory    |
| Sample           | Deserialize | 165.6     | 919.2         | fory    |
| MediaContent     | Serialize   | 213.0     | 332.2         | fory    |
| MediaContent     | Deserialize | 281.9     | 568.0         | fory    |
| StructList       | Serialize   | 175.2     | 678.8         | fory    |
| StructList       | Deserialize | 141.8     | 453.0         | fory    |
| SampleList       | Serialize   | 448.6     | 3831.5        | fory    |
| SampleList       | Deserialize | 1347.9    | 4977.6        | fory    |
| MediaContentList | Serialize   | 759.1     | 2429.7        | fory    |
| MediaContentList | Deserialize | 1665.3    | 3674.4        | fory    |

### Throughput Results (ops/sec)

| Datatype         | Operation   | fory TPS   | protobuf TPS | Fastest |
| ---------------- | ----------- | ---------- | ------------ | ------- |
| Struct           | Serialize   | 14,815,693 | 8,109,642    | fory    |
| Struct           | Deserialize | 26,132,177 | 15,766,902   | fory    |
| Sample           | Serialize   | 9,864,852  | 1,780,215    | fory    |
| Sample           | Deserialize | 6,040,471  | 1,087,903    | fory    |
| MediaContent     | Serialize   | 4,695,056  | 3,009,782    | fory    |
| MediaContent     | Deserialize | 3,547,861  | 1,760,563    | fory    |
| StructList       | Serialize   | 5,707,437  | 1,473,231    | fory    |
| StructList       | Deserialize | 7,052,684  | 2,207,652    | fory    |
| SampleList       | Serialize   | 2,229,008  | 260,994      | fory    |
| SampleList       | Deserialize | 741,895    | 200,900      | fory    |
| MediaContentList | Serialize   | 1,317,402  | 411,573      | fory    |
| MediaContentList | Deserialize | 600,492    | 272,153      | fory    |

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants