Feature: dry run support for deploy command #106

Open
vasu2856 wants to merge 16 commits into aws:main from vasu2856:feature/dry_run

@vasu2856 (Contributor)

feat(dry-run): add deploy dry-run validation engine with comprehensive checkers

  • Add DryRunEngine orchestrating phase-specific validation checkers
  • Implement 12 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow)
  • Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation
  • Add DependencyChecker validating pre-existing AWS resources and DataZone types
  • Add DryRunReport model collecting findings classified as OK/WARNING/ERROR
  • Integrate dry-run as pre-deployment validation step in deploy command
  • Add --dry-run flag for standalone validation mode
  • Add --skip-validation flag to bypass pre-deployment checks
  • Add --output option for JSON report generation
  • Include comprehensive unit and integration tests for all checkers
  • Add design, requirements, and testing documentation
  • Update CLI and project dependencies for dry-run support
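
For readers who have not opened the diff, a report model that collects findings classified as OK/WARNING/ERROR and serializes them for the `--output` option could look roughly like the sketch below. This is an illustration only: the class layout, the `Finding` fields, the phase names, and the JSON shape are assumptions and are not taken from the PR's actual `DryRunReport`.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class Severity(str, Enum):
    OK = "OK"
    WARNING = "WARNING"
    ERROR = "ERROR"


@dataclass
class Finding:
    phase: str          # hypothetical phase label, e.g. "manifest" or "storage"
    severity: Severity
    message: str


@dataclass
class DryRunReport:
    """Hypothetical stand-in for the PR's report model."""
    findings: list = field(default_factory=list)

    def add(self, phase, severity, message):
        self.findings.append(Finding(phase, severity, message))

    def has_errors(self):
        # A single ERROR finding is enough to mark the dry run as failed.
        return any(f.severity is Severity.ERROR for f in self.findings)

    def counts(self):
        totals = {s.value: 0 for s in Severity}
        for f in self.findings:
            totals[f.severity.value] += 1
        return totals

    def to_json(self):
        # Assumed JSON layout: a per-severity summary plus the raw findings.
        payload = {
            "summary": self.counts(),
            "findings": [
                {"phase": f.phase, "severity": f.severity.value, "message": f.message}
                for f in self.findings
            ],
        }
        return json.dumps(payload, indent=2)
```

Each checker would call `add(...)` as it runs, and the deploy command could refuse to continue when `has_errors()` is true.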

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

vasu2856 and others added 15 commits February 25, 2026 15:31
- Add comprehensive catalog import/export guide with step-by-step instructions
- Add quick reference guide for common catalog operations
- Refine design.md to clarify manifest schema with separate subsections for assets, glossaries, data products, and metadata forms
- Remove schedule asset special handling from design (deferred to future implementation)
- Update architecture diagrams to reflect new manifest configuration structure
- Clarify CatalogExporter API routing for data products and metadata forms
- Improve docstring documentation for export_catalog function with parameter descriptions
- Simplify manifest schema to use single `enabled` boolean instead of granular resource type filters
- Add `publish` flag to automatically publish assets and data products during deployment
- Update CatalogExporter to export ALL project-owned resources with optional `--updated-after` filtering
- Enhance IdentifierMapper to use externalIdentifier with normalization as primary lookup, falling back to name-based matching
- Add support for asset and data product publishing in CatalogImporter
- Clarify dependency ordering and include delete operations in import summary
- Update architecture documentation to reflect simplified configuration approach
- Streamline design diagrams to show complete resource flow and identifier mapping strategy
… support

- Expand Multi-Environment section to explain independent project/domain targeting per stage
- Update architecture diagram to show optional separate domains for dev/test/prod stages
- Add new "Multi-Domain and Multi-Project Architecture" section with use cases
- Include configuration example showing domain_id per stage
- Document use cases: organizational boundaries, compliance, multi-tenant, cross-account
- Update DataZone Helper documentation to reflect multi-domain support
- Enhance sequence and flow diagrams to show domain resolution per stage
- Clarify that each deployment stage can target independent projects in independent domains
- Add multi-domain configuration section with YAML examples
- Improve validation phase documentation to include multi-domain verification
… filtering

- Add .config.kiro file to establish spec metadata and workflow type
- Clarify that manifest contains NO filter options — only enabled, publish, and assets.access
- Document that --updated-after is a CLI-only flag on bundle command, not a manifest field
- Update design diagrams to show CLI flag as separate input to CatalogExporter
- Emphasize uniform filtering across ALL resource types via CLI timestamp
- Refine CatalogExporter docstring to clarify filter source and scope
- Update internal helper documentation to note filters come from CLI only
- Improve quick reference and guide documentation for clarity on filtering behavior
… graph and simplify tasks

- Update dependency graph to include Data Products as final resource type
- Revise creation order to place Data Products after Assets
- Revise deletion order to place Data Products first (reverse dependency)
- Add clarification that Data Products reference Assets
- Consolidate and simplify task descriptions for catalog export/import implementation
- Add new examples directory with README for catalog import/export workflows
- Update quick reference guide with streamlined information
- Reflect simplified manifest schema with only enabled, publish, and assets.access fields
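
The creation/deletion ordering described in this commit can be illustrated with a small topological sort. In the sketch below, only the "Data Products reference Assets" edge comes from the commit message; every other edge, and the resource type names themselves, are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Map each resource type to the types it depends on.
# Only "data_product depends on asset" is stated in the commit message;
# the remaining edges are assumed for the sake of the example.
DEPENDS_ON = {
    "glossary": set(),
    "glossary_term": {"glossary"},
    "form_type": set(),
    "asset_type": {"form_type"},
    "asset": {"asset_type"},
    "data_product": {"asset"},
}


def creation_order(graph=DEPENDS_ON):
    # static_order() yields dependencies before the types that need them.
    return list(TopologicalSorter(graph).static_order())


def deletion_order(graph=DEPENDS_ON):
    # Deleting in reverse creation order removes dependents first.
    return list(reversed(creation_order(graph)))
```

With this graph, Assets always precede Data Products on creation, and Data Products always precede Assets on deletion. The relative position of unrelated types (for example glossaries versus form types) is not fixed by the sort.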
…CI/CD integration

- Add CatalogExporter helper to query and serialize DataZone resources (Glossaries, GlossaryTerms, FormTypes, AssetTypes, Assets, Data Products)
- Add CatalogImporter helper to import and optionally publish exported catalog resources with identifier mapping
- Extend application manifest schema with catalog configuration (enabled, skipPublish, assets.access)
- Add --updated-after CLI flag to filter exported resources by modification timestamp
- Integrate catalog export into bundle command and import into deploy command
- Add catalog-import-export GitHub Actions workflow for automated deployment
- Add comprehensive integration tests for export, import, and round-trip scenarios
- Add unit tests for catalog helpers and manifest configuration
- Add example manifest and seed data script for catalog import/export demonstration
- Update documentation with catalog import/export guide and quick reference
- Preserve source publish state (listingStatus) during export and conditionally republish on import
…ith API filtering and edge case handling

- Update design documentation to clarify Search API ownership filtering and SearchTypes API client-side filtering requirements
- Document get_asset API enrichment for full asset details including formsOutput
- Correct listingStatus value from "LISTED" to "ACTIVE" for published state detection
- Add comprehensive testing guide covering export, import, and round-trip scenarios
- Expand integration tests with edge case coverage including disabled catalog and skip-publish manifests
- Add sample test fixtures (connections, workflows, code) for integration test scenarios
- Enhance unit tests for catalog export/import properties and DataZone property handling
- Update CLI, bundle, and deploy commands to support refined catalog operations
- Improve catalog export and import helper implementations with better error handling and filtering logic
- Update example documentation and seed data scripts with latest catalog patterns
…orm normalization specs

- Add `_resolve_target_data_source()` helper to match data sources by type and database name with fallback priority
- Add `_normalize_forms_input_for_api()` helper to remap form identifiers and data source references for target domain
- Add Requirement 5.15 for DataSourceReferenceForm remapping during import with database name extraction from GlueTableForm
- Add Property 18 validation for data source remapping with matching priority and fallback behavior
- Add edge case handling for missing data sources and JSON parse failures in error scenarios table
- Update task requirements list to include Requirement 5.15
- Update multilingual README translations (fr, he, it, ja, pt, zh) to reflect new functionality
- Update catalog import/export guides with data source remapping documentation
- Implement form normalization in `catalog_import.py` and deploy command integration
- Add "Back to Main README" navigation link to French README
- Add "Back to Main README" navigation link to Hebrew README
- Add "Back to Main README" navigation link to Italian README
- Add "Back to Main README" navigation link to Japanese README
- Add "Back to Main README" navigation link to Chinese README
- Improves navigation between main and translated documentation pages
…utility

- Rename _check_import_permissions to _ensure_import_permissions to reflect new behavior
- Add _POLICY_DETAIL_KEY mapping for policy type to detail key conversion
- Implement automatic policy grant creation via add_policy_grant when grants are missing
- Update permission check logic to attempt adding missing grants before failing
- Change return value from missing grants list to failed grants list
- Add comprehensive logging for grant checking and addition attempts
- Create cleanup_catalog_resources.py integration test utility to remove project-owned resources
- Update error messaging to clarify that grants are added automatically when possible
- Improve docstrings to document the new auto-grant behavior
…e checkers

- Add DryRunEngine orchestrating phase-specific validation checkers
- Implement 12 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow)
- Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation
- Add DependencyChecker validating pre-existing AWS resources and DataZone types
- Add DryRunReport model collecting findings classified as OK/WARNING/ERROR
- Integrate dry-run as pre-deployment validation step in deploy command
- Add --dry-run flag for standalone validation mode
- Add --skip-validation flag to bypass pre-deployment checks
- Add --output option for JSON report generation
- Include comprehensive unit and integration tests for all checkers
- Add design, requirements, and testing documentation
- Update CLI and project dependencies for dry-run support
@vasu2856 requested review from Shnekit and abaror75 March 25, 2026 01:45

### Requirement 1: Dry Run CLI Option

**User Story:** As a DevOps engineer, I want to pass a `--dry-run` flag to the deploy command, so that I can preview the deployment without making changes.

Contributor comment:

It would be nice to make sure the `--dry-run` option works with read-only permissions. Large customers with strict permission management might ask for that.

Contributor comment:

But I guess it is fine, since the tool is also validating IAM permissions for deployment.


## Introduction

The Deploy Dry Run feature adds a `--dry-run` option to the existing `smus-cicd deploy` command. When enabled, the CLI walks through every phase of the deployment pipeline — manifest loading, bundle exploration, project initialization, storage deployment, git deployment, catalog import, QuickSight dashboard deployment, workflow creation, and bootstrap actions — without creating, modifying, or deleting any actual resources. It also proactively verifies IAM permissions, S3 bucket accessibility, DataZone domain/project reachability, and catalog asset availability, producing a structured report of what would happen and any issues detected. The goal is to let operators confirm a deployment will succeed before committing to it, avoiding partial deployment failures.

Contributor comment:

Hmmm, are we saying "if the `--dry-run` deploy does not fail, the actual deploy will not fail"?

We might want to stay away from such claims, because many things (notebooks, for example) are under the customer's control and we cannot predict whether they will execute successfully.

#### Acceptance Criteria

1. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL load and parse the Manifest file and report any YAML syntax or schema validation errors.
2. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL resolve the Target_Stage and verify that the specified domain, project, and Deployment_Configuration sections are present and well-formed.
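
As a rough illustration of criterion 2, the check below takes an already-parsed manifest (a plain dict) and reports which required sections are missing for a stage. The key names `stages`, `domain`, `project`, and `deployment_configuration` are assumptions for this sketch and may not match the real manifest schema.

```python
# Hypothetical section names; the real manifest schema may differ.
REQUIRED_SECTIONS = ("domain", "project", "deployment_configuration")


def validate_target_stage(manifest, stage_name):
    """Return a list of error messages; an empty list means the stage looks well-formed."""
    stages = manifest.get("stages") or {}
    stage = stages.get(stage_name)
    if not isinstance(stage, dict):
        return [f"stage '{stage_name}' is not defined in the manifest"]

    errors = []
    for section in REQUIRED_SECTIONS:
        value = stage.get(section)
        # A section must be a non-empty mapping to count as well-formed.
        if not isinstance(value, dict) or not value:
            errors.append(
                f"stage '{stage_name}' is missing a well-formed '{section}' section"
            )
    return errors
```

Returning messages instead of raising lets the engine record every problem in one pass rather than stopping at the first one.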

Contributor comment:

Hmmm, I am wondering whether the describe command is still useful at this point, haha. It might be doing a subset of what `--dry-run` is going to do. It might be worth exploring deleting that command in the future.

1. WHEN a bundle archive path is provided, THE Dry_Run_Engine SHALL open the bundle archive and enumerate all files contained within it.
2. WHEN a bundle archive path is not provided, THE Dry_Run_Engine SHALL attempt to locate the bundle in the `./artifacts` directory using the same resolution logic as the deploy command.
3. THE Dry_Run_Engine SHALL verify that each storage item referenced in the Deployment_Configuration has corresponding files in the bundle or on the local filesystem.
4. THE Dry_Run_Engine SHALL verify that each git item referenced in the Deployment_Configuration has corresponding content in the bundle or is accessible via the configured repository URL.
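
Criteria 1 and 3 could be sketched as follows, assuming the bundle is a ZIP archive and that each storage item maps to a directory prefix inside it. Both of those are assumptions for illustration, and the function names are hypothetical.

```python
import zipfile


def list_bundle_files(bundle_path):
    """Enumerate file entries in the bundle, skipping directory entries."""
    with zipfile.ZipFile(bundle_path) as archive:
        return [name for name in archive.namelist() if not name.endswith("/")]


def find_missing_items(bundle_files, item_prefixes):
    """Return the item prefixes that have no corresponding file in the bundle."""
    missing = []
    for prefix in item_prefixes:
        normalized = prefix.rstrip("/") + "/"
        if not any(name.startswith(normalized) for name in bundle_files):
            missing.append(prefix)
    return missing
```

A checker could turn each entry returned by `find_missing_items` into an ERROR finding. The fallback to the local filesystem mentioned in criterion 3 is not covered here.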

Contributor comment:

I think we still do not have a GitHub workflow for the git items. We should add one to verify this.

2. WHEN dry-run mode is active, THE Permission_Checker SHALL verify that the current IAM identity has DataZone permissions (`datazone:GetDomain`, `datazone:GetProject`, `datazone:SearchListings`) required for the target domain and project.
3. WHEN the Deployment_Configuration includes catalog assets, THE Permission_Checker SHALL verify that the current IAM identity has catalog import permissions (`datazone:CreateAsset`, `datazone:CreateGlossary`, `datazone:CreateGlossaryTerm`, `datazone:CreateFormType`).
4. WHEN the manifest configures IAM role creation or update, THE Permission_Checker SHALL verify that the current IAM identity has `iam:CreateRole`, `iam:AttachRolePolicy`, and `iam:PutRolePolicy` permissions.
5. WHEN the manifest configures QuickSight dashboard deployment, THE Permission_Checker SHALL verify that the current IAM identity has QuickSight permissions (`quicksight:DescribeDashboard`, `quicksight:CreateDashboard`, `quicksight:UpdateDashboard`).
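
The PR description says the PermissionChecker relies on `iam:SimulatePrincipalPolicy`. A minimal sketch of that idea is shown below; it takes any client object exposing boto3's `simulate_principal_policy` call. The helper name and the default of `"*"` for resources are assumptions, and the sketch ignores truncated (paginated) responses.

```python
def find_denied_actions(iam_client, principal_arn, actions, resource_arns=("*",)):
    """Return the actions that the simulation does not report as allowed.

    iam_client is expected to behave like boto3.client("iam").
    """
    response = iam_client.simulate_principal_policy(
        PolicySourceArn=principal_arn,
        ActionNames=list(actions),
        ResourceArns=list(resource_arns),
    )
    # EvalDecision is "allowed", "explicitDeny", or "implicitDeny".
    return [
        result["EvalActionName"]
        for result in response["EvaluationResults"]
        if result["EvalDecision"] != "allowed"
    ]
```

For example, calling it with `boto3.client("iam")`, the deployer's role ARN, and `["iam:CreateRole", "iam:AttachRolePolicy", "iam:PutRolePolicy"]` would cover criterion 4. Note that the calling identity itself needs permission to run the simulation.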

Contributor comment:

There is actually another role, QuickSightServiceRole, which is used by default to perform the dashboard refresh and which caused issues for me last time: https://github.com/aws/CICD-for-SageMakerUnifiedStudio/tree/main/examples/analytic-workflow/dashboard-glue-quick#quicksight-dataset-refresh-fails

We might want to figure out whether we can set a different role for it in the examples.

2. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate storage deployment and report the target S3 bucket, prefix, and file count for each storage item.
3. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate git deployment and report the target connection, repository, and file count for each git item.
4. WHEN dry-run mode is active AND the bundle contains catalog export data, THE Dry_Run_Engine SHALL simulate catalog import and report the count and types of catalog resources that would be created, updated, or deleted.
5. WHEN dry-run mode is active AND the manifest configures QuickSight dashboards, THE Dry_Run_Engine SHALL simulate QuickSight deployment and report which dashboards would be exported and imported.

Contributor comment:

Wondering: do we have any more "special" resources like QuickSight? Does it make sense to create dedicated logic for handling them?


**User Story:** As a DevOps engineer, I want the dry run to verify that target AWS resources are reachable, so that I can detect network or configuration issues before deployment.

#### Acceptance Criteria

Contributor comment:

Should we add the necessary connection checks to this as well?

- Add form type status validation in target domain to detect DISABLED form types
- Implement _check_disabled_form_types method to query DataZone API for form type status
- Resolve target_domain_id and target_region in dry-run engine after manifest validation
- Add target_domain_id and target_region fields to DryRunContext for downstream checkers
- Update catalog_import to re-enable DISABLED form types during import via create_form_type upsert
- Add add_policy_grants integration test utility for catalog import workflows
- Enhance cleanup_catalog_resources with improved resource deletion handling
- Emit WARNING findings when form types exist but are DISABLED in target environment
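
The DISABLED form type check in this commit could be approximated as below. This is a sketch, not the PR's `_check_disabled_form_types`: it assumes a client that behaves like boto3's DataZone client, whose `get_form_type` response is assumed to carry a `status` field, and it does not handle form types that are missing from the target domain.

```python
def find_disabled_form_types(datazone_client, domain_id, form_type_names):
    """Return one warning message per form type that exists but is DISABLED."""
    warnings = []
    for name in form_type_names:
        response = datazone_client.get_form_type(
            domainIdentifier=domain_id,
            formTypeIdentifier=name,
        )
        if response.get("status") == "DISABLED":
            warnings.append(
                f"form type '{name}' exists but is DISABLED in domain {domain_id}"
            )
    return warnings
```

Each returned message would map to a WARNING finding rather than an ERROR, since the commit notes that the import step re-enables such form types.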