Skip to content

fix(migration): v200 thread_entity guard, PostgreSQL jsonb cast, self-loop cycle detection#27756

Open
yan-3005 wants to merge 3 commits intomainfrom
fix/v200-migration-thread-entity-jsonb
Open

fix(migration): v200 thread_entity guard, PostgreSQL jsonb cast, self-loop cycle detection#27756
yan-3005 wants to merge 3 commits intomainfrom
fix/v200-migration-thread-entity-jsonb

Conversation

@yan-3005
Copy link
Copy Markdown
Contributor

@yan-3005 yan-3005 commented Apr 27, 2026

Context

This PR fixes five bugs introduced or exposed during the v2.0.0 migration, all of which caused CI failures on both MySQL and PostgreSQL. The bugs compound each other: the workflow seeding failures (#4 and #5) were only visible because the migration bugs (#1–3) caused the server to restart mid-migration and hit edge cases on re-run.


Bug 1: thread_entity crash on migration re-run (MySQL + PostgreSQL)

Root cause: OpenMetadata uses two independent tracking systems for migrations:

  • Flyway tracks DDL SQL scripts in flyway_schema_history — each script is committed atomically and cannot be rolled back once applied.
  • Java migration state is tracked in server_change_log inside a transaction — if a later migration step fails, the transaction rolls back and the Java state reverts to "v200 not yet run."

The v2.0.0 postDataMigrationSQLScript.sql renames thread_entity → thread_entity_legacy after the Java data migration runs. If any subsequent migration throws and rolls back the server_change_log transaction, Flyway has already committed the rename permanently. On the next server startup, Java re-executes migrateThreadTasksToTaskEntity, which tries to SELECT FROM thread_entity — a table that no longer exists — and crashes.

This also affects upgrades from certain versions where thread_entity was never present.

Fix: Guard migrateThreadTasksToTaskEntity with a tableExists(handle, "thread_entity") check at the top of the method and return early if the table is absent.


Bug 2: PostgreSQL jsonb type mismatch — all 1032 suggestion inserts fail

Root cause: The v2.0.0 PostgreSQL schema declares the following columns as jsonb NOT NULL:

  • task_entity.json
  • activity_stream.json
  • activity_stream.domains

JDBI's .bind("fieldName", someString) always binds a Java String as SQL type varchar. PostgreSQL enforces strict type checking and does not implicitly cast varchar to jsonb, throwing:

ERROR: column "json" is of type jsonb but expression is of type character varying
  Hint: You will need to rewrite or cast the expression.

MySQL's json type is lenient and accepts string input without an explicit cast, so MySQL never surfaced this bug. PostgreSQL's strictness caused every single suggestion insert and every activity event insert to fail.

Fix: Pass ConnectionType through the call chain into insertTask and insertActivityEvent. When ConnectionType.POSTGRES, use an explicit ::jsonb cast in the SQL string (e.g., :json::jsonb). MySQL continues to use the plain :json binding.


Bug 3: MySQL migrateThreadTasksToTaskEntity silently disabled

Root cause: The MySQL migration had migrateThreadTasksToTaskEntity commented out as a temporary workaround after it caused CI failures:

// Causing issues with collate CI, needs to be fixed before enabling this migration @harshach
// migrateThreadTasksToTaskEntity(handle);

The actual root cause was the thread_entity crash described in Bug 1 — not the method itself. This meant MySQL deployments silently skipped migrating thread-based tasks to task_entity, resulting in missing task data for all MySQL installations.

Fix: Re-enable the method now that Bug 1's tableExists guard makes it safe to call. The commented-out code is removed.


Bug 4: BPMN self-loop edges falsely detected as cycles

Root cause: hasCycleDFS in WorkflowDefinitionRepository iterates over all outgoing edges of a node and recurses into each neighbor. It does not skip self-loop edges — edges where a node transitions to itself (e.g., AssignedStage → AssignedStage).

BPMN self-loops are a valid and intentional pattern for reassignment transitions: an incident can be reassigned to a different team while remaining in the AssignedStage state. The DFS treated this as a cycle and threw a validation exception when IncidentLifecycleWorkflow was seeded.

Fix: Skip the recursion when neighbor.equals(node) — a self-loop cannot form a multi-node cycle.


Bug 5: One failing seed entity aborts all subsequent seeds

Root cause: initSeedDataFromResources in EntityRepository iterates over workflow definitions alphabetically. Seed files are processed in alphabetical order, so IncidentLifecycleWorkflow is initialized before TableCertificationWorkflow. When Bug 4 caused IncidentLifecycleWorkflow initialization to throw an exception, the exception propagated out of the loop uncaught, aborting the entire seed operation. TableCertificationWorkflow and every other definition that came after were never seeded.

Fix: Wrap each entity initialization in a try-catch. Log the failure with context (entity type, FQN, error message and stack trace) and continue to the next entity. One bad workflow definition should not prevent all others from loading.


Tests

Added 7 new unit tests in MigrationUtilTest covering:

  • tableExists guard skips gracefully for migrateThreadTasksToTaskEntity (MySQL + PostgreSQL)
  • tableExists guard skips gracefully for migrateSuggestionsToTaskEntity (MySQL + PostgreSQL)
  • tableExists guard skips gracefully for migrateLegacyActivityThreadsToActivityStream (MySQL + PostgreSQL)
  • migrateThreadTasksToTaskEntity uses ::jsonb cast for PostgreSQL
  • migrateThreadTasksToTaskEntity does not use ::jsonb cast for MySQL

All 13 tests in the file pass.

Test plan

  • MySQL CI: v200 migration completes without thread_entity does not exist crash
  • PostgreSQL CI: v200 migration completes without column "json" is of type jsonb but expression is of type character varying
  • All unit tests in MigrationUtilTest pass (13 total)
  • IncidentLifecycleWorkflow seeds successfully on fresh install
  • TableCertificationWorkflow seeds successfully on fresh install
  • mvn spotless:apply -pl openmetadata-service reports no formatting issues

🤖 Generated with Claude Code

…-loop cycle detection

- Add tableExists guard to migrateThreadTasksToTaskEntity so it skips safely when
  thread_entity has been renamed to thread_entity_legacy by postDataMigration SQL,
  preventing crashes on fresh installs and migration re-runs
- Fix PostgreSQL jsonb type mismatch in migrateSuggestionsToTaskEntity and
  migrateLegacyActivityThreadsToActivityStream: JDBI binds String as varchar which
  PostgreSQL rejects for jsonb columns; use ::jsonb cast when ConnectionType is POSTGRES
- Re-enable migrateThreadTasksToTaskEntity for MySQL (was commented out as workaround)
  now that the tableExists guard makes it safe
- Fix hasCycleDFS in WorkflowDefinitionRepository to skip self-loop edges so
  BPMN reassignment cycles (AssignedStage -> AssignedStage) are not flagged as cycles,
  allowing IncidentLifecycleWorkflow to seed without error
- Fix initSeedDataFromResources in EntityRepository to continue seeding remaining
  entities when one entity fails, instead of aborting the entire seed operation
- Add unit tests covering MySQL/PostgreSQL variants for all three migration methods,
  tableExists guard, and jsonb cast behavior (13 tests total, all passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 27, 2026 09:31
@yan-3005 yan-3005 added backend safe to test Add this label to run secure Github workflows on PRs labels Apr 27, 2026
@yan-3005 yan-3005 self-assigned this Apr 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens v200 migrations and seed/workflow initialization by making migrations resilient to schema drift (renamed tables), fixing PostgreSQL JSONB inserts, and relaxing workflow validation to allow intentional self-loop transitions.

Changes:

  • Add v200 migration guards and PostgreSQL ::jsonb casts for task/activity inserts via ConnectionType.
  • Re-enable MySQL thread-task migration with a thread_entity existence guard.
  • Improve workflow seeding robustness by skipping self-loop edges in cycle detection and continuing seed initialization after per-entity failures.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
openmetadata-service/src/main/java/org/openmetadata/service/migration/utils/v200/MigrationUtil.java Adds ConnectionType-aware ::jsonb binding and a thread_entity existence guard; threads/suggestions/activity migrations now pass connection type into inserts.
openmetadata-service/src/main/java/org/openmetadata/service/migration/postgres/v200/Migration.java Updates v200 Postgres migration to pass POSTGRES into updated MigrationUtil APIs.
openmetadata-service/src/main/java/org/openmetadata/service/migration/mysql/v200/Migration.java Updates v200 MySQL migration to pass MYSQL and re-enables thread-task migration call.
openmetadata-service/src/test/java/org/openmetadata/service/migration/utils/v200/MigrationUtilTest.java Adds tests for table-missing guards and attempts to cover Postgres/MySQL jsonb casting behavior.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/WorkflowDefinitionRepository.java Skips self-loop edges during DFS cycle detection to allow valid reassignment transitions.
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java Makes seed initialization resilient by catching/logging per-entity failures instead of aborting all seeding.

Comment on lines +114 to +127
void migrateThreadTasksUsesJsonbCastForPostgres() {
when(handle.createQuery("SELECT 1 FROM thread_entity LIMIT 1").mapTo(Integer.class).one())
.thenReturn(1);
when(handle
.createQuery(
"SELECT json FROM thread_entity WHERE type = 'Task' ORDER BY createdAt ASC")
.mapToMap()
.list())
.thenReturn(java.util.List.of());

assertDoesNotThrow(() -> MigrationUtil.migrateThreadTasksToTaskEntity(handle, POSTGRES));

verify(handle, never()).createUpdate(contains(":json,"));
}
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

migrateThreadTasksUsesJsonbCastForPostgres doesn’t currently validate the ::jsonb insert behavior because the mocked thread list is empty, so insertTask(...) is never reached and Handle.createUpdate(...) is never called. Consider returning a non-empty threads result (one row with a minimal json payload) and then asserting that the SQL passed to createUpdate contains :json::jsonb for POSTGRES.

Copilot generated this review using guidance from repository custom instructions.
… tests

- Fix tableExists to use findFirst() instead of one(): one() throws on an
  empty result set (0 rows from an existing but empty table), causing the
  guard to incorrectly return false and log "table does not exist"
- Replace vacuous jsonb cast tests with direct insertTask reflection tests:
  the previous tests mocked an empty thread list so insertTask was never
  reached, making assertions trivially true regardless of cast logic; now
  call insertTask directly via reflection so the ::jsonb cast path is
  actually exercised for POSTGRES and absent for MYSQL
- Update tableExists mock stubs in tests from .one() to .findFirst()
  to match the corrected implementation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@yan-3005
Copy link
Copy Markdown
Contributor Author

@gitar-bot Intentional design choice — each failure is already individually logged with full context (entity type, FQN, message, and full stack trace) via LOG.warn. Re-throwing only lastException would surface a single arbitrary failure while silently discarding the others, which is more misleading than the current behavior. If there's a systemic failure (e.g., DB connectivity loss), the N consecutive WARN lines in the logs make that obvious. The try-catch is specifically to prevent one bad workflow definition from blocking all subsequent ones from seeding — not to suppress errors.

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 27, 2026

@gitar-bot Intentional design choice — each failure is already individually logged with full context (entity type, FQN, message, and full stack trace) via LOG.warn. Re-throwing only lastException would surface a single arbitrary failure while silently discarding the others, which is more misleading than the current behavior. If there's a systemic failure (e.g., DB connectivity loss), the N consecutive WARN lines in the logs make that obvious. The try-catch is specifically to prevent one bad workflow definition from blocking all subsequent ones from seeding — not to suppress errors.

That's a fair point — individual WARN logs with full context per entity do provide better diagnostic coverage than surfacing a single arbitrary lastException. The suggestion was meant to guard against the silent-all-fail scenario, but you're right that re-throwing one exception while discarding the rest would be misleading in the general case, and consecutive WARN lines are sufficient signal for systemic failures. Thanks for the clarification.

Copilot AI review requested due to automatic review settings April 27, 2026 12:14
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 27, 2026

Code Review 👍 Approved with suggestions 1 resolved / 2 findings

Implements v200 thread entity guards and robust PostgreSQL jsonb casting, resolving previous issues with vacuous test results. Consider refining the seed data initialization to avoid swallowing errors that could mask underlying system failures.

💡 Edge Case: Seed data failures silently swallowed may hide systemic issues

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:1231-1240

The new try/catch in initSeedDataFromResources (line 1231) logs a warning and continues when any entity fails to initialize. While this is the stated intent (one bad workflow shouldn't block others), a systemic failure (e.g., database connectivity loss) would cause every entity to fail silently, and the system would start with no seed data and only WARN-level logs. Consider adding a threshold: if all entities fail, re-throw the last exception to surface the systemic issue.

Suggested fix
int failCount = 0;
Exception lastException = null;
for (T entity : entities) {
  try {
    initializeEntity(entity);
  } catch (Exception e) {
    failCount++;
    lastException = e;
    LOG.warn("Failed to initialize ...", ...);
  }
}
if (failCount == entities.size() && !entities.isEmpty()) {
  throw new RuntimeException(
    "All seed entities failed to initialize", lastException);
}
✅ 1 resolved
Quality: Jsonb cast tests pass vacuously with empty input data

📄 openmetadata-service/src/test/java/org/openmetadata/service/migration/utils/v200/MigrationUtilTest.java:114-128
The tests migrateThreadTasksUsesJsonbCastForPostgres (line 114) and migrateThreadTasksDoesNotUseJsonbCastForMysql (line 130) set up an empty list of threads via java.util.List.of(). Since no rows are returned, the migration loop body never executes and insertTask is never called. The verify(handle, never()).createUpdate(contains(...)) assertions pass trivially regardless of whether the jsonb cast logic is correct.

To meaningfully test the cast behavior, provide at least one valid thread row so insertTask is actually invoked, then assert on the SQL string passed to createUpdate.

🤖 Prompt for agents
Code Review: Implements v200 thread entity guards and robust PostgreSQL jsonb casting, resolving previous issues with vacuous test results. Consider refining the seed data initialization to avoid swallowing errors that could mask underlying system failures.

1. 💡 Edge Case: Seed data failures silently swallowed may hide systemic issues
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityRepository.java:1231-1240

   The new try/catch in `initSeedDataFromResources` (line 1231) logs a warning and continues when any entity fails to initialize. While this is the stated intent (one bad workflow shouldn't block others), a systemic failure (e.g., database connectivity loss) would cause *every* entity to fail silently, and the system would start with no seed data and only WARN-level logs. Consider adding a threshold: if all entities fail, re-throw the last exception to surface the systemic issue.

   Suggested fix:
   int failCount = 0;
   Exception lastException = null;
   for (T entity : entities) {
     try {
       initializeEntity(entity);
     } catch (Exception e) {
       failCount++;
       lastException = e;
       LOG.warn("Failed to initialize ...", ...);
     }
   }
   if (failCount == entities.size() && !entities.isEmpty()) {
     throw new RuntimeException(
       "All seed entities failed to initialize", lastException);
   }

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@sonarqubecloud
Copy link
Copy Markdown

@github-actions
Copy link
Copy Markdown
Contributor

🔴 Playwright Results — 2 failure(s), 17 flaky

✅ 3957 passed · ❌ 2 failed · 🟡 17 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 295 0 4 4
🔴 Shard 2 737 2 5 8
🟡 Shard 3 749 0 1 7
🟡 Shard 4 758 0 1 18
✅ Shard 5 687 0 0 41
🟡 Shard 6 731 0 6 8

Genuine Failures (failed on all attempts)

Features/Glossary/GlossaryWorkflow.spec.ts › should display correct status badge color and icon (shard 2)
Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveText�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  locator('[data-row-key*="StatusBadgeTerm1777298857730"]').locator('.status-badge')
Expected: �[32m"Draft"�[39m
Received: �[31m"In Review"�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveText" with timeout 15000ms�[22m
�[2m  - waiting for locator('[data-row-key*="StatusBadgeTerm1777298857730"]').locator('.status-badge')�[22m
�[2m    18 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'305b18c0.Quick65384eff".StatusBadgeTerm1777298857730-status">…</div>�[22m
�[2m       - unexpected value "In Review"�[22m

Features/Glossary/GlossaryWorkflow.spec.ts › should start term as Draft when glossary has reviewers (shard 2)
Error: �[2mexpect(�[22m�[31mlocator�[39m�[2m).�[22mtoHaveText�[2m(�[22m�[32mexpected�[39m�[2m)�[22m failed

Locator:  locator('[data-row-key*="DraftTerm1777298930604"]').locator('.status-badge')
Expected: �[32m"Draft"�[39m
Received: �[31m"In Review"�[39m
Timeout:  15000ms

Call log:
�[2m  - Expect "toHaveText" with timeout 15000ms�[22m
�[2m  - waiting for locator('[data-row-key*="DraftTerm1777298930604"]').locator('.status-badge')�[22m
�[2m    18 × locator resolved to <div class="status-badge inReview" data-testid=""PW%'6450bff8.Lively23ce3589".DraftTerm1777298930604-status">…</div>�[22m
�[2m       - unexpected value "In Review"�[22m

🟡 17 flaky test(s) (passed on retry)
  • Features/TeamsDragAndDrop.spec.ts › Should drag and drop team on table level (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Flow/SearchRBAC.spec.ts › User without permission (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DataQuality/DataQuality.spec.ts › TestCase filters (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Search suggestions should be filtered by selected domain (shard 2, 1 retry)
  • Features/Glossary/GlossaryHierarchy.spec.ts › should cancel move operation (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Pages/Entity.spec.ts › Inactive Announcement create & delete (shard 4, 1 retry)
  • Pages/Lineage/DataAssetLineage.spec.ts › verify create lineage for entity - Topic (shard 6, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Login.spec.ts › Refresh should work (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › User as Owner Add, Update and Remove (shard 6, 1 retry)
  • Pages/TasksUIFlow.spec.ts › Verify task lifecycle in activity feed (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend safe to test Add this label to run secure Github workflows on PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants