Add pet clustering Rust modules #10207

Open
Amrithesh-Kakkoth wants to merge 1 commit into ente-io:main from Amrithesh-Kakkoth:pet-clustering-rust-pr
Conversation

@Amrithesh-Kakkoth
Contributor

Summary

  • add the Rust pet clustering module to the shared photos crate
  • add the V2 pet clustering path and wire the mobile Rust API entrypoints to it
  • export the pet clustering module from the pet ML package

Verification

  • `cargo fmt --all --check` (rust/photos)
  • `cargo test ml::pet::cluster_v2 -- --nocapture` (rust/photos)
  • `cargo clippy --all-targets -- -D warnings` (rust/photos)
  • `cargo test api::ml_indexing_api::tests -- --nocapture` (mobile/apps/photos/rust)
  • `cargo clippy --all-targets -- -D warnings` (mobile/apps/photos/rust)

Notes

  • `cargo fmt --all --check` in mobile/apps/photos/rust still fails because of pre-existing formatting in usearch_api.rs on main, so that file was intentionally left out of this PR

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e46cbc26e8

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +690 to +693
let emb = match vdb.get_vector(face.vector_id as u64) {
    Ok(v) => v,
    Err(_) => continue,
};

P1: Treat missing face vectors as unclustered or error

When `vdb.get_vector` fails, this loop silently continues, so those `new_faces` never make it into `inputs`; later, `n_unclustered` is derived from `inputs` (or set to 0 when `inputs` is empty). In stale or corrupt index scenarios this drops faces from both `assignments` and `n_unclustered`, so callers get a successful response that undercounts unclustered items and loses data. Please either fail fast on missing vectors or explicitly count skipped faces as unclustered (the same pattern appears in `run_pet_clustering_incremental_exemplars_from_index`).
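
One way to apply the second option (counting skipped faces) is sketched below. This is only an illustration under assumptions: `Face`, `VectorDb`, and `collect_inputs` are hypothetical stand-ins, not the types or functions from this PR, and the in-memory `HashMap` replaces the real vector index.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the PR's types; names and fields are assumptions.
struct Face {
    vector_id: i64,
}

struct VectorDb {
    vectors: HashMap<u64, Vec<f32>>,
}

impl VectorDb {
    // Mimics a lookup that can fail when the index is stale or corrupt.
    fn get_vector(&self, id: u64) -> Result<Vec<f32>, String> {
        self.vectors
            .get(&id)
            .cloned()
            .ok_or_else(|| format!("vector {id} not found"))
    }
}

/// Loads embeddings for `faces`, returning the usable inputs together with
/// the number of faces whose vector could not be read.
fn collect_inputs(vdb: &VectorDb, faces: &[Face]) -> (Vec<Vec<f32>>, usize) {
    let mut inputs = Vec::with_capacity(faces.len());
    let mut n_missing = 0;
    for face in faces {
        match vdb.get_vector(face.vector_id as u64) {
            Ok(v) => inputs.push(v),
            // Count the face instead of dropping it silently.
            Err(_) => n_missing += 1,
        }
    }
    (inputs, n_missing)
}
```

The caller could then add `n_missing` to whatever count it derives from `inputs`, so the reported `n_unclustered` still covers every face passed in. Failing fast would instead propagate the error from `get_vector` with `?`.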

Comment on lines +16 to +19
debug_assert_eq!(a.len(), b.len());
let mut score = 0.0_f64;
unsafe {
    simsimd_dot_f32(a.as_ptr(), b.as_ptr(), a.len() as u64, &mut score);

P1: Validate embedding lengths before unsafe SIMD dot

The only length check here is `debug_assert_eq!`, which is compiled out in release builds, but `simsimd_dot_f32` is then called with `n = a.len()` unconditionally. If any centroid or exemplar from FFI has a different dimension than `a` (including empty or truncated vectors), this can read past the end of `b` and trigger undefined behavior or crashes in production. Add a runtime length guard before the unsafe call and handle mismatches as a non-match or an error.
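
A minimal sketch of such a guard follows. The names `checked_dot` and `dot_unchecked` are hypothetical, and `dot_unchecked` is a scalar stand-in so the example compiles without external crates; in the PR the guarded call would be the existing `unsafe` `simsimd_dot_f32` invocation.

```rust
/// Scalar stand-in for the SIMD kernel (an assumption made for this sketch);
/// the PR calls `simsimd_dot_f32` at this point instead.
fn dot_unchecked(a: &[f32], b: &[f32]) -> f64 {
    a.iter().zip(b).map(|(x, y)| f64::from(*x) * f64::from(*y)).sum()
}

/// Returns `None` when the slices differ in length, so the check also runs
/// in release builds and callers can treat the pair as a non-match.
fn checked_dot(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    Some(dot_unchecked(a, b))
}
```

Returning `Option` pushes the decision to the call site: a clustering loop might skip the candidate on `None`, while an API boundary might convert it into an error.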
