Add pet clustering Rust modules #10207
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e46cbc26e8
    let emb = match vdb.get_vector(face.vector_id as u64) {
        Ok(v) => v,
        Err(_) => continue,
    };
Treat missing face vectors as unclustered or error
When vdb.get_vector fails, this loop silently continues, so those new_faces never make it into inputs; later, n_unclustered is derived from inputs (or set to 0 when inputs is empty). In stale/corrupt index scenarios this drops faces from both assignments and n_unclustered, so callers get a successful response that undercounts unclustered items and loses data. Please either fail fast on missing vectors or explicitly count skipped faces as unclustered (the same pattern appears in run_pet_clustering_incremental_exemplars_from_index).
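A minimal sketch of the second option (counting skipped faces explicitly) might look like the following. `VectorIndex`, `Face`, `CollectedInputs`, and `collect_inputs` are hypothetical stand-ins written for illustration, not the types or functions in this PR; only `get_vector`, `vector_id`, `new_faces`, and `inputs` come from the snippet above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the vector index; not the real `vdb` type.
struct VectorIndex {
    vectors: HashMap<u64, Vec<f32>>,
}

impl VectorIndex {
    /// Assumed shape of the lookup: an error when the ID is absent.
    fn get_vector(&self, id: u64) -> Result<Vec<f32>, String> {
        self.vectors
            .get(&id)
            .cloned()
            .ok_or_else(|| format!("vector {} not found", id))
    }
}

/// Hypothetical face record with only the field used in the snippet.
struct Face {
    vector_id: i64,
}

/// Embeddings that were found, plus the IDs whose vectors were missing.
struct CollectedInputs {
    inputs: Vec<(i64, Vec<f32>)>,
    missing: Vec<i64>,
}

fn collect_inputs(vdb: &VectorIndex, new_faces: &[Face]) -> CollectedInputs {
    let mut inputs = Vec::new();
    let mut missing = Vec::new();
    for face in new_faces {
        match vdb.get_vector(face.vector_id as u64) {
            Ok(v) => inputs.push((face.vector_id, v)),
            // Record the miss instead of silently dropping the face.
            Err(_) => missing.push(face.vector_id),
        }
    }
    CollectedInputs { inputs, missing }
}
```

With this shape, the caller can add `missing.len()` to `n_unclustered`, or return an error when `missing` is non-empty if failing fast is preferred.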
    debug_assert_eq!(a.len(), b.len());
    let mut score = 0.0_f64;
    unsafe {
        simsimd_dot_f32(a.as_ptr(), b.as_ptr(), a.len() as u64, &mut score);
Validate embedding lengths before unsafe SIMD dot
The only length check here is debug_assert_eq!, which is compiled out in release builds, but simsimd_dot_f32 is then called with n = a.len() unconditionally. If any centroid/exemplar from FFI has a different dimension than a (including empty/truncated vectors), this can read past b and trigger undefined behavior or crashes in production. Add a runtime length guard before the unsafe call and handle mismatches as a non-match/error.
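One way the runtime guard could be structured is sketched below. `checked_dot` and `dot_kernel` are hypothetical names, and `dot_kernel` is a scalar stand-in used here so the example has no dependencies; in the PR, the guarded call would be the existing `simsimd_dot_f32` inside the `unsafe` block. Treating empty input as a mismatch is an assumption, not something the PR specifies.

```rust
/// Scalar stand-in for the SIMD kernel (hypothetical, for illustration).
fn dot_kernel(a: &[f32], b: &[f32]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| f64::from(*x) * f64::from(*y))
        .sum()
}

/// Returns `None` when the dimensions differ or the input is empty,
/// so the kernel is never called with a length that overruns `b`.
fn checked_dot(a: &[f32], b: &[f32]) -> Option<f64> {
    // This check runs in release builds, unlike `debug_assert_eq!`.
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    Some(dot_kernel(a, b))
}
```

Callers can then map `None` to a non-match, or convert it into an error if a dimension mismatch should abort the run.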