Compare commits

..

95 Commits

Author SHA1 Message Date
DXC
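b2435d66c3 fix: final state rollback + short-path import fixes for three panels
- water_quality_gui_v2.py: the rollback mechanism no longer calls panel_registry.get_tab_index; it now does a reverse index lookup based on PANEL_REGISTRY[current_tab_idx]['step_id']. This removes the root cause of the phantom step jump: get_tab_index does not exist, so the rollback itself crashed a second time and the error was silently swallowed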
2026-06-18 14:55:14 +08:00
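The reverse lookup described in this commit might look roughly like the sketch below. Only `PANEL_REGISTRY`, `step_id`, and `current_tab_idx` come from the commit message; the registry contents, `NAV_STEP_IDS`, and the helper name `row_for_tab_index` are hypothetical.

```python
# Hypothetical registry: one entry per tab, in tab order.
PANEL_REGISTRY = [
    {"step_id": "step1_water_mask"},
    {"step_id": "step7_index"},
    {"step_id": "step11_map"},
]

# Hypothetical sidebar contents: step ids in the order shown in the left list.
NAV_STEP_IDS = ["step1_water_mask", "step7_index", "step11_map"]


def row_for_tab_index(current_tab_idx):
    """Return the sidebar row matching the tab that is currently shown, or None."""
    if not 0 <= current_tab_idx < len(PANEL_REGISTRY):
        return None  # Nothing sensible to roll back to.
    step_id = PANEL_REGISTRY[current_tab_idx]["step_id"]
    try:
        return NAV_STEP_IDS.index(step_id)
    except ValueError:
        return None  # The step is not listed in the sidebar.
```

Returning `None` instead of raising keeps the rollback path itself from failing, which is the failure mode the commit describes.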
DXC
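e5bb9c5cd9 fix: lazy-load navigation fault tolerance + state rollback mechanism + Step 11 NameError fix
Navigation fault tolerance (v2._on_step_list_changed)

- Wrap lazy loading + setCurrentIndex in try-except so that a navigation crash no longer leaves the left and right panes out of sync

- Add a state rollback mechanism: when switching the right pane fails, force the left navigation selection back to the step that corresponds to the currently displayed tab_index, eliminating "phantom jumps / freezes" (blockSignals prevents setCurrentRow from triggering the callback a second time)

Step 11 NameError

- step11_map_panel: add import get_step_output_path at the top of the module, fixing the NameError in update_from_config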
2026-06-18 13:59:30 +08:00
DXC
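f93dbeb848 fix: Step 7 routing + sender-side string alignment (fixes KeyError) + supporting defensive hardening
Core fixes

- panel_registry: rebind the step7_index class_ref to Step7View (minimally invasive; keeping the step_id protects 18 downstream references across the project)

- step7_view._on_run_single_clicked: 'step7' → 'step7_index' (wrapped_config key and step_name aligned together, fixing the PipelineScheduler KeyError "未注册的步骤: 'step7'", i.e. "unregistered step")

Supporting defensive hardening

- pipeline_executor.run_single_step_handler: when the backend is_running, show a QMessageBox warning + LogMessage instead, preventing a deadlock from repeated clicks

- worker_thread.run_single_step: accept both nested and flat config formats; fall back to flat reads when the nested sub-dict is empty

- Formula ListWidget layout fix: setUniformItemSizes + stretch=1 + update(), fixing the collapse / missing refresh after step7_view loads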
2026-06-18 13:59:20 +08:00
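The nested/flat fallback mentioned for worker_thread.run_single_step could be sketched as follows. The helper name `read_step_config` and the example keys are assumptions for illustration; the real function may differ.

```python
def read_step_config(config, step_name):
    """Return the settings for step_name from a nested or a flat config.

    Nested form: {"step7_index": {"csv_path": "a.csv"}}
    Flat form:   {"csv_path": "a.csv"}
    """
    nested = config.get(step_name)
    if isinstance(nested, dict) and nested:
        return dict(nested)
    # Nested sub-dict is missing or empty: fall back to the flat keys.
    return {key: value for key, value in config.items() if not isinstance(value, dict)}
```

Copying the nested dict keeps the caller from mutating the original config by accident.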
DXC
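f61a3dfb1d feat: catch-up state replay for lazily loaded panels + Previous/Next wizard navigation buttons 2026-06-18 11:18:37 +08:00
DXC
d6c003a211 fix: Step7 UI collapse fix + EventBus wired through + spxy/ks extracted for DRY + GridSearchCV→RandomizedSearchCV + smoke test dead-link fix 2026-06-18 11:18:27 +08:00
DXC
3ee4e90b31 fix: step_default_outputs accepts a list of candidate file names, fixing the broken OutputUpdated chain caused by dynamic naming 2026-06-18 10:36:52 +08:00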
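A minimal sketch of resolving a default output from a list of candidate file names, assuming the first existing candidate wins. The function name `find_default_output` is hypothetical; the two file names are the old and new Step5 names mentioned elsewhere in this log.

```python
import os
import tempfile


def find_default_output(step_dir, candidates):
    """Return the first existing candidate under step_dir, or None.

    `candidates` may be one file name or a list of file names.
    """
    if isinstance(candidates, str):
        candidates = [candidates]
    for name in candidates:
        path = os.path.join(step_dir, name)
        if os.path.isfile(path):
            return path
    return None


def demo():
    """Create one of two candidate files and show which one is picked."""
    with tempfile.TemporaryDirectory() as step_dir:
        open(os.path.join(step_dir, "processed_data.csv"), "w").close()
        found = find_default_output(
            step_dir, ["cleaned_sampling_data.csv", "processed_data.csv"]
        )
        return os.path.basename(found)
```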
DXC
3f217e95b0 fix: fix three root causes of broken auto-fill in downstream panels + remove the outdated pipeline→panel mapping 2026-06-18 10:28:14 +08:00
DXC
2261b4b30e feat: EventBus decoupling of the single-step buttons in the Step1~Step14 panels + missing Handlers added (Step8~Step14) + old god class deleted
- In 9 panels (step1~step6/step8_ml_train/step8_qaa/step9_ml_predict/step10), the single-step run button no longer walks up the parent chain; it calls global_event_bus.publish('RequestRunSingleStep')

- PipelineExecutor adds an _on_request_run_single_step subscription

- New Handlers: step8_ml_train / step9_ml_predict / step10_qaa_inversion / step11_concentration / step12_kriging / step13_visualization / step14_report

- Delete the old water_quality_inversion_pipeline_GUI.py (the god class is now fully dismantled)
2026-06-18 09:19:51 +08:00
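The publish/subscribe decoupling above can be illustrated with a very small in-process event bus. This is a sketch only: the `EventBus` class, its method signatures, and the payload keys are assumptions, not the project's actual implementation.

```python
from collections import defaultdict


class EventBus:
    """Minimal synchronous publish/subscribe bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, callback):
        self._subscribers[event_name].append(callback)

    def publish(self, event_name, **payload):
        # Iterate over a copy so callbacks may subscribe during dispatch.
        for callback in list(self._subscribers[event_name]):
            callback(**payload)


global_event_bus = EventBus()
requests = []


def _on_request_run_single_step(step_id, config):
    """Stand-in for the executor's handler: just record the request."""
    requests.append((step_id, config))


global_event_bus.subscribe("RequestRunSingleStep", _on_request_run_single_step)

# A panel button only needs the bus, not a reference to its parent window.
global_event_bus.publish(
    "RequestRunSingleStep", step_id="step8_ml_train", config={"work_dir": "."}
)
```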
DXC
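2d45610aa6 fix: navigation state-lock debounce + intermediate slot for the run button (fixes both the erratic-jump and the unresponsive-button bugs)
Bug 1, erratic navigation jumps: _on_step_list_changed now calls get_panel first to trigger lazy loading and only then calls setCurrentIndex, so the index shift caused by removeTab/insertTab no longer sends the user to the wrong page. _on_tab_changed gains an _is_syncing guard that breaks the ping-pong effect. Bug 2, unresponsive button: a new intermediate slot _on_run_all_clicked replaces the direct connection and includes a print probe + try/except fallback.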
2026-06-18 09:01:25 +08:00
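The `_is_syncing` guard can be shown without Qt. In the sketch below, `NavigationSync` and its setter methods are hypothetical stand-ins: each setter calls the opposite handler directly, imitating a widget that emits a change signal.

```python
class NavigationSync:
    """Keeps a list row and a tab index in step without ping-pong callbacks."""

    def __init__(self):
        self.list_row = 0
        self.tab_index = 0
        self._is_syncing = False
        self.calls = []  # Handlers that actually ran, for inspection.

    def on_step_list_changed(self, row):
        if self._is_syncing:
            return  # Triggered by our own update: ignore.
        self._is_syncing = True
        try:
            self.calls.append(("list", row))
            self.list_row = row
            self._set_tab_index(row)
        finally:
            self._is_syncing = False

    def on_tab_changed(self, index):
        if self._is_syncing:
            return
        self._is_syncing = True
        try:
            self.calls.append(("tab", index))
            self.tab_index = index
            self._set_list_row(index)
        finally:
            self._is_syncing = False

    def _set_tab_index(self, index):
        self.tab_index = index
        self.on_tab_changed(index)  # Stand-in for the widget's change signal.

    def _set_list_row(self, row):
        self.list_row = row
        self.on_step_list_changed(row)  # Stand-in for the widget's change signal.
```

The `try/finally` ensures the guard is released even if an update raises, so a failure cannot leave navigation permanently locked.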
DXC
f1cc339d4a feat: batch-generate the Step2~Step7 Handlers + connect WorkerThread to the new scheduler
- Add 6 Handlers: Step2GlintDetection / Step3GlintRemoval / Step4Sampling / Step5ProcessCsv / Step6ExtractSpectra / Step7CalcIndices

- Add register_handlers.py: register_all_handlers() registers Step1~Step7 in one call

- Update __init__.py: export all 7 Handlers

- Refactor worker_thread.py: remove the old WaterQualityInversionPipeline import and use PipelineScheduler + register_all_handlers instead

- run_single_step now calls scheduler.run_step(), keeping the external-model pass-through logic
2026-06-17 18:02:31 +08:00
DXC
f6455b71ba fix: PanelFactory signal-storm fix + backend god class dismantled (BaseStepHandler / scheduler / Step1 as the pilot) 2026-06-17 17:48:40 +08:00
DXC
39e8c29913 refactor: implement the second batch of Managers (LogManager/ConfigManager/DialogService/TrainingModeManager)
- log_manager.py: wraps the log area + progress bar + clear button; subscribes to LogMessage/ProgressUpdate internally

- config_manager.py: config read/write (new/load/save/get_current_config), lazy-load safe (panels not yet loaded return {})

- dialog_service.py: wrappers for display-only dialogs (Pipeline status / About / AI settings)

- training_mode_manager.py: training-mode switching; publishes the TrainingModeChanged event

- water_quality_gui_v2.py: 725→605 lines; all menu callbacks delegate to Managers; _create_log_panel/_create_progress_panel/_on_log_message/_on_progress_update removed
2026-06-17 17:35:27 +08:00
DXC
19c86e6e44 refactor: implement the pure-shell main window + first batch of Managers (PanelFactory/PipelineExecutor/WorkspaceInitializer)
- water_quality_gui_v2.py: pure-shell main window (725 lines), dependency-injection chain + EventBus-driven, 7 menus wired up

- panel_factory.py: PanelFactory lazy-loading factory (placeholder-page replacement + adjacent preloading)

- pipeline_executor.py: PipelineExecutor core scheduling (takes over run_full_pipeline/run_single_step/stop_pipeline)

- workspace_initializer.py: WorkspaceInitializer environment setup (takes over init_workspace/set_work_directory/auto_populate_all)
2026-06-17 17:18:15 +08:00
DXC
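bb5c2a50f8 refactor: introduce the EventBus so step panels pass parameters to each other automatically without a central coordinator, completing the final decoupling 2026-06-17 16:27:26 +08:00
DXC
a58744cfbb refactor: build a dynamic panel registry, remove hard-coding, and make step UI rendering and dependency routing data-driven 2026-06-17 16:02:17 +08:00
DXC
1949711cda refactor: extract WorkspaceManager, decoupling file scanning and path logic from the main GUI 2026-06-17 15:35:02 +08:00
DXC
191a4b681d refactor: remove duplicated code from the main window, reuse existing components, and fully extract the image widgets 2026-06-17 15:16:19 +08:00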
DXC
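91881d564a fix: fix the wrong Step8 model output path and the 0-model bug caused by feature separation not filtering out coordinate columns
- View layer: Step8View.update_work_directory no longer generates <work_dir>/indices/<basename>_indices.csv; it now generates the standard model directory <work_dir>/8_Modeling/. The FileSelectWidget label and file filter are switched to directory semantics (output model directory / All Files (*.*)), so the save directory is no longer stored as a csv file, which had caused train_models to be skipped.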
2026-06-17 14:15:34 +08:00
DXC
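c2740c2bde fix(step7): fix Step7 not loading the built-in water-quality index formula table by default
- init_ui now ends by calling _auto_load_formulas, which reads src/gui/model/waterindex.csv
  and fills formula_list (all selected by default), so formulas are visible and selectable as soon as the user opens the tab.
- Add the _resolve_formula_csv_path helper: handles three locations, PyInstaller _MEIPASS /
  frozen _internal / development environment (walking up 3 levels from __file__ to src/).
- get_config gains the formula_csv_file field. It was missing before, so step7_service
  received an empty path and always reported "未提供公式 CSV 路径(formula_csv_file)" (formula CSV path not provided),
  which in turn left downstream step8/9 without features and produced 0 models.
- Add 4 _on_select_* button callbacks (enabled after _auto_load_formulas);
  set_config also accepts the formula_csv_file key.
- Delete the unused _resolve_subdir helper (carried over earlier from the old panel but never referenced by the new view).

Verification:
  AST parse of 320 lines OK
  helper resolves to D:\...\src\gui\model\waterindex.csv exists=True
  pandas parses the CSV, 63 rows (45 ratio + 18 concentration) OK
  BaseView contract methods init_ui/get_config/set_config/update_work_directory/
  _on_run_clicked all retained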
2026-06-17 14:06:15 +08:00
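A possible shape for a three-location resolver like _resolve_formula_csv_path is sketched below. The bundle layouts (`<_MEIPASS>/src/...` and `<exe dir>/_internal/src/...`), the function signature, and the fake module path in `demo` are assumptions; only `sys._MEIPASS`, the `_internal` folder name, and the "3 levels up from `__file__`" rule come from the commit message.

```python
import os
import sys
import tempfile


def resolve_formula_csv_path(module_file):
    """Return the first existing waterindex.csv among three candidate locations."""
    relative = os.path.join("gui", "model", "waterindex.csv")
    candidates = []

    # 1. PyInstaller sets sys._MEIPASS while a bundled app is running.
    meipass = getattr(sys, "_MEIPASS", None)
    if meipass:
        candidates.append(os.path.join(meipass, "src", relative))

    # 2. Frozen app with an _internal folder next to the executable (assumed layout).
    if getattr(sys, "frozen", False):
        exe_dir = os.path.dirname(sys.executable)
        candidates.append(os.path.join(exe_dir, "_internal", "src", relative))

    # 3. Development checkout: walk up 3 levels from the module file to src/.
    src_dir = os.path.abspath(module_file)
    for _ in range(3):
        src_dir = os.path.dirname(src_dir)
    candidates.append(os.path.join(src_dir, relative))

    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def demo():
    """Build a throwaway src/ tree and resolve the CSV from a fake module path."""
    with tempfile.TemporaryDirectory() as root:
        csv_dir = os.path.join(root, "src", "gui", "model")
        os.makedirs(csv_dir)
        csv_path = os.path.join(csv_dir, "waterindex.csv")
        open(csv_path, "w").close()
        fake_module = os.path.join(root, "src", "new", "views", "step7_view.py")
        found = resolve_formula_csv_path(fake_module)
        return found is not None and os.path.samefile(found, csv_path)
```

Returning `None` when nothing matches lets the caller report a clear "formula CSV not found" error instead of passing an empty path downstream.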
DXC
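b3a6855881 fix: add the missing automatic water-mask and glint-mask propagation to Step6
MainView._sync_dependencies previously did not push the two links below, so users
had to pick the water/glint mask files by hand every time they opened Step 6:

1. step1 → step6 (boundary_path)
   In NDWI mode Step 1 outputs water_mask_out.dat → pushed via boundary_path
   to Step 6's self.water_mask_file.
   Note: the key Step 6 accepts is boundary_path (a legacy alias), not
   water_mask_path.

2. step2 → step6 (glint_mask_path)
   Step 2 outputs severe_glint_area.dat → pushed via glint_mask_path to
   Step 6's self.glint_mask_file.

Also trimmed the _sync_dependencies docstring (the per-link push list is replaced
by a one-line summary) and several inline comments.

Other methods / fields / class structure are unchanged.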
2026-06-17 13:41:50 +08:00
DXC
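6a962f5e8f feat(new-arch): main-window feature upgrade (icon system + full-chain parameter sync + unified service output resolution + Step12 categorized browsing)
1. main_view.py: icon system + automatic full-chain parameter propagation
   - Add _res() to resolve paths relative to the project root; compatible with sys._MEIPASS after PyInstaller packaging.
   - Add QListWidgetItem / QMessageBox imports; the left navigation list supports a context menu + error dialogs.
   - All 12 ROUTES entries gain an icon field ("1.png" etc.); the sidebar shows business icons.
   - Add the step_outputs cache: after each step finishes, its output_path is written to self.step_outputs.
   - Add the _sync_dependencies() sync function + _safe_set_config() wrapper,
     which push upstream outputs to downstream views following the dependency graph:
       step1 → step6 water_mask_path
       step3 → step4 / step6 / step10 deglint_img_path / bsq_path
       step4 → step9 sampling_csv_path
       step5 → step6 csv_path
       step6 → step7 / step8 training_csv_path
       step8 → step9 models_dir (parent directory)
       step9 → step11 prediction_csv_dir / prediction_csv_path (both pushed)
       step10 → step11 geotiff_dir / geotiff_path (both pushed)

2. services/step1-13: unified output resolver integration
   - Add src/new/services/_output_resolver.py, which provides four shared helpers: resolve_output_dir /
     copy_to_user_path / get_user_output_path / is_user_specified.
   - Each service replaces its private _resolve_xxx_dir with a call to resolve_output_dir,
     enforcing the "user first" rule (if the user specifies output_path, use its parent directory; otherwise use work_dir/<subdir>).
   - A mismatch between the user-specified file name and the file name hard-coded in the underlying step is resolved after the fact by copy_to_user_path
     (covers step2, step4, step7, step8 and other underlying steps that do not accept an output_path keyword).

3. views/step12_view.py: restore the ImageCategoryTree + ImageViewerWidget advanced components
   - Remove the simplified placeholder Label and re-attach the old ImageCategoryTree (automatically sorts image files under the working directory into five categories: model evaluation / spectral analysis /
     statistical charts / processing results / concentration distribution maps).
   - Re-attach ImageViewerWidget (wheel zoom 0.1x-5x + 50ms debounce + smart switching between FastTransformation and
     SmoothTransformation + Ctrl+Wheel + toolbar).
   - The scan button is wired to image_tree.scan_directory(); selecting a node loads it into image_viewer immediately.
   - Button styles switch to ModernStylesheet (success/primary) for a consistent look.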
2026-06-17 13:28:58 +08:00
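The "user first" rule stated for resolve_output_dir can be sketched in a few lines. The signature below is an assumption; the commit names the helper but does not show its parameters.

```python
import os


def resolve_output_dir(config, work_dir, subdir):
    """Pick the output directory, preferring a user-specified output_path.

    If config["output_path"] is set, its parent directory is used;
    otherwise the default is <work_dir>/<subdir>.
    """
    user_path = config.get("output_path")
    if user_path:
        return os.path.dirname(os.path.abspath(user_path))
    return os.path.join(work_dir, subdir)
```

For example, `resolve_output_dir({}, "work", "8_Modeling")` falls back to `work/8_Modeling`, while a config carrying `output_path` wins regardless of `work_dir`.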
DXC
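9cb3c8ed0d fix(ui): fix 3 UI interaction pain points (output path not shown / Step4 preview missing / Step5 CSV NaN error)
1. base_view.py: update_from_config routing fix
   The original default implementation only cached work_dir + pipeline and never called update_work_directory,
   so the automatic output-path fill-in of every derived view never ran.
   Adding one line, self.update_work_directory(work_dir), fixes all 13 views.

2. step4_view.py: restore the interactive sampling-point preview
   Ports the preview_btn button + QTimer 2s heartbeat (_check_csv_exists)
   + SamplingViewerDialog from the old panel.
   After running Step 4, the user can click the button and then click scatter points to view each sampling point's spectral curve.

3. step5_view.py: CSV preview NaN crash fix
   pd.read_csv(csv_path, nrows=n) → pd.read_csv(csv_path, nrows=n).fillna("").
   Prevents the underlying Qt model from crashing when it parses float64 null values (required on the PandasTableModel path).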
2026-06-17 13:28:10 +08:00
DXC
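48668c9e74 services/step10-13: the final four standalone services are in place, covering spatial interpolation, visualization output, and report generation 2026-06-17 09:57:13 +08:00
DXC
6fc0394fe2 services/step6-9: the core standalone services for spectral computation and machine-learning prediction are in place 2026-06-17 09:34:21 +08:00
DXC
f8d5ea2eb8 services/step2-5: real standalone backend services for the first four preprocessing steps are in place
Adds four standalone backend services, src/new/services/{step2,step3,step4,step5}_service.py:
2026-06-17 09:15:22 +08:00
DXC
ef3de632d3 smoke: the L3 step2 assertion changes from PlaceholderView to Step2View (reflecting that 12 views have been migrated) 2026-06-17 08:58:29 +08:00
DXC
3d4462f4e9 main_view: all 12 ROUTES entries now point at real views (business name + path + class); Boot log updated 2026-06-17 08:58:24 +08:00
DXC
84f0f6058f views/step2-view13: 12 frontend views migrated (inherit BaseView, UI only, services still placeholders) 2026-06-17 08:58:17 +08:00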
DXC
61bd8582e5 Router shell upgrade: TaskWorker three signals + main_router→main_view migration (54/54 smoke checks pass) 2026-06-16 18:23:38 +08:00
DXC
bd4263d2ca Old GUI mix-up fix: the step6/step8 ML training CSV is forced to read the Step 6 feature results + step3 default algorithm switched to goodman 2026-06-16 17:53:55 +08:00
DXC
afe9eaff2c README_new_arch.md + _smoke_new_arch.py: run guide for the end-to-end new architecture and a three-layer smoke test (service/view/e2e, 54 assertions in total) 2026-06-16 17:53:46 +08:00
DXC
e993a184bd views/* + main_router.py: routing shell for 13 steps (QListWidget+QStackedWidget+TaskWorker background executor) 2026-06-16 17:53:35 +08:00
DXC
2a89fdc62c services/placeholder_service.py: placeholder service for step2-step13 (execute_placeholder returns a not_implemented status) 2026-06-16 17:53:19 +08:00
DXC
e62f53bf77 services/step1_service.py: pure-function water-mask service (execute_step1, no PyQt, no globals, all exceptions converted to a dict) 2026-06-16 17:53:11 +08:00
DXC
1e0e7d1973 End-to-end new-architecture skeleton: src/new package entry + BaseView interface contract (dispatch_execute walks up the parent chain to run_single_step) 2026-06-16 17:53:01 +08:00
DXC
15547bddfb .gitignore: explicitly un-ignore src/new/ (a negated exception to the broad new/ rule, which is kept) 2026-06-16 17:52:51 +08:00
DXC
027981e9a6 ContentMapper boundary reading supports raster water masks (.dat/.bsq/.tif/.tiff/.img) 2026-06-16 15:15:10 +08:00
DXC
5084f7d049 Step10 Kriging output path forced to 14_visualization + Step11 mask auto-filled 2026-06-16 14:12:10 +08:00
DXC
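0238aa66ab Path normalization: unify the helper interface for the 14 subdirectories + fix mismatched getattr lookups
Add _step_path_resolver.py (STEP_DATA_SOURCE mapping table + _FALLBACK_DIR_TABLE with 40+ keys + a three-layer API: resolve_subdir / get_step_output_path / resolve_step_widget). It complements pipeline.get_step_output_dir, and neither depends on the other.

pipeline adds get_step_output_dir(step_name) as the single authoritative interface (class-level _STEP_OUTPUT_DIR_MAP built lazily + unknown keys fall back to work_dir + debug logging).

Full refactor of src/gui/panels/step*.py (17 files)

* Remove every hard-coded os.path.join(wp, "X_subdir") (14 predefined subdirectories)

* Fix all 8 dead-code getattr(main_window.stepXX_panel, ...) mismatches (wrong attribute names → mapped through STEP_DATA_SOURCE to the correct long-name attributes on main_window)

* Delete the dead code block for self.step11_ml_panel / step11_panel / step12_panel in step12_viz_panel.py

* Hint text / label dictionaries / logs keep their original wording; only the actual path computation is replaced

Smoke test: 39 fallback keys + 14 path mappings + 14 numeric step keys + 17/17 panels AST-parsed + 17/17 imports all in place.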
2026-06-16 12:54:18 +08:00
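The "unknown keys fall back to work_dir" behaviour of get_step_output_dir could look like this sketch. The map contents and step keys are hypothetical (the subdirectory names are ones mentioned in this log), and the real method is class-level on the pipeline rather than a free function.

```python
import logging
import os

logger = logging.getLogger(__name__)

# Hypothetical subset of the step → subdirectory map.
_STEP_OUTPUT_DIR_MAP = {
    "step4": "4_sampling",
    "step8": "8_Modeling",
    "step14": "14_visualization",
}


def get_step_output_dir(work_dir, step_name):
    """Return the output directory for step_name, or work_dir for unknown keys."""
    subdir = _STEP_OUTPUT_DIR_MAP.get(step_name)
    if subdir is None:
        logger.debug("Unknown step %r, falling back to work_dir", step_name)
        return work_dir
    return os.path.join(work_dir, subdir)
```

Keeping the map in one place is what lets panels drop their hard-coded `os.path.join(wp, "X_subdir")` calls.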
DXC
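03c788a16c Step6 wavelength reading: when spectral parsing fails, fall back to brute-force text parsing of the .hdr file, eliminating the band_1 fallback 2026-06-16 11:07:30 +08:00
DXC
d41262aa18 Step5 output file name unified as processed_data.csv (fixes the GUI/algorithm disconnect)
The file actually written to disk (data_preparation_step.py:32, runner.py:101) has always been processed_data.csv, but the GUI still used the old name cleaned_sampling_data.csv in three places: the registry entry step_default_outputs['step5_clean'], the panel placeholder, and the default output path generation. This commit replaces all three so they match the real output of PipelineRunner and the algorithm.
2026-06-15 17:32:07 +08:00
DXC
0a0ede2e02 Step3 interpolation: second patch for the multiprocessing memory blow-up (mask copy + workers capped at 6) 2026-06-15 17:10:36 +08:00
DXC
60a9d7d922 Step3 interpolation OOM fix + multiprocessing speed-up + accumulated full-chain changes (14 files) 2026-06-15 16:49:17 +08:00
DXC
82e0b92af6 Mega-1.1 full-chain path normalization wrap-up (18 files) 2026-06-15 15:20:50 +08:00
DXC
a9e77d2ad0 Add formula methods 2026-06-15 14:55:32 +08:00
DXC
f73a7d8999 Add formula methods 2026-06-12 16:48:20 +08:00
DXC
be47b70594 Step4 heartbeat refresh + Step10 output directory rename and smarter path lookup 2026-06-12 10:27:47 +08:00
DXC
4c9ca2aa03 Full-chain path alignment: registry rewritten in string format, 10_sampling→4_sampling, water_quality_indices→training_spectra_indices 2026-06-12 09:59:35 +08:00
DXC
89bdcbc27a Step7 panel: remove the output-mode selection UI; output_mode hard-coded to 0 (full output) 2026-06-12 09:27:16 +08:00
DXC
04669bdee8 Step7 panel: radio buttons restyled with a solid blue fill; dead code removed (np/Tuple imports, _get_coord_cols); run_step routed 2026-06-12 09:24:16 +08:00
DXC
e59703f163 Structural changes: backend files adapted to the frontend 2026-06-11 17:44:24 +08:00
DXC
3584c07b67 Align GUI panel IDs with pipeline method routing 2026-06-11 15:35:47 +08:00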
DXC
1ad4c54b80 Fix step4_panel variable name inconsistency causing AttributeError 2026-06-11 15:14:26 +08:00
DXC
5d75d3371b Step5: lock down the GUI routing dictionary and replace all old step_id values with the new names 2026-06-11 15:09:35 +08:00
DXC
d3262ae80d Rename pipeline method names to match step numbers (Step4) 2026-06-11 14:58:39 +08:00
DXC
7c7a31ce00 Fix panel internal titles and step calls (Step3) 2026-06-11 14:56:33 +08:00
DXC
604886abb3 fix(gui): sync sidebar/tab text with the route mapping; remove the regression prediction tab 2026-06-11 11:24:28 +08:00
DXC
3c4d4081a4 refactor(gui): renumber panels step4-11; sampling-point layout moves to step4, ML modeling moves to step9 2026-06-11 11:13:16 +08:00
DXC
184f5fe9f4 fix(step14): unique file names for batch rendering + Colorbar styling + 2σ stretch 2026-06-11 10:29:32 +08:00
DXC
aa539db9bd chore: .gitignore excludes _archive_panels_backup_/ 2026-06-10 17:14:29 +08:00
DXC
016c895803 feat(qaa): add the QAA algorithm module src/core/algorithms/qaa/ 2026-06-10 17:14:08 +08:00
DXC
16fc92648b chore: add the QAA check script _check_qaa.py and the CSV generation script _run_gen_csv.py 2026-06-10 17:14:02 +08:00
DXC
0493ba7916 fix(map): full-chain fix for GeoTIFF visualization 2026-06-10 17:13:51 +08:00
DXC
2671c0837a feat(step8): add the Step8 water-color index inversion GUI panel step8_waterindex_panel 2026-06-10 17:13:37 +08:00
DXC
320f2f18f2 feat(step8): add the water-color index inversion module waterindex_inversion + CSV formula-driven architecture 2026-06-10 17:13:25 +08:00
DXC
cfe4c50c31 feat(step8→step9): pass coordinate metadata through from the source, so spatial coordinates flow through the whole chain 2026-06-10 09:55:28 +08:00
DXC
7571762e63 fix(step9): keep the original coordinate columns in final_concentrations.csv to prevent a Step14 crash 2026-06-10 09:54:00 +08:00
DXC
04a321d225 fix(step14): correct the pipeline method name step9_generate_distribution_map → step14_distribution_map 2026-06-10 09:46:14 +08:00
DXC
fa9c940074 feat(visualization+report): feed Step9 concentration inversion data into the visualization panel and the report generator 2026-06-10 09:41:39 +08:00
DXC
c3cc2ef77e feat(step9): add the concentration inversion module and GUI panel 2026-06-09 17:55:25 +08:00
dxc
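4ca90b0e79 fix: wrong CSV column indexes in get_spectral.py; skip the measurement-point ID column so latitude (41.66°) and longitude (124.22°) are read correctly
Column order in input.csv: 时间 (time), 测量点 (measurement point), 纬度 (latitude), 经度 (longitude), water-quality parameters...
The original code wrongly treated the measurement-point ID (col0) as latitude and latitude (col1) as longitude
After the fix: lat=col1 (latitude), lon=col2 (longitude)
Before the fix, all 14815 sampling points fell outside the image extent after coordinate conversion, so 0 spectra were extracted
After the fix: all 14815 sampling points yield valid spectra (314~717)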
2026-06-09 15:02:28 +08:00
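One way to avoid this class of off-by-one column bug is to read coordinates by header name rather than by position. The sketch below is not the fix that was committed (which corrects the indexes); the sample row, the `chl_a` column, and the function name are hypothetical, while the four leading headers and the coordinate values follow the commit message.

```python
import csv
import io

# Hypothetical one-row sample using the column order from the commit message.
SAMPLE = "时间,测量点,纬度,经度,chl_a\n2026-06-01,P001,41.66,124.22,3.5\n"


def read_coordinates(text):
    """Return (lat, lon) pairs, looked up by header name instead of index."""
    reader = csv.DictReader(io.StringIO(text))
    points = []
    for row in reader:
        lat = float(row["纬度"])
        lon = float(row["经度"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"coordinate out of range: lat={lat}, lon={lon}")
        points.append((lat, lon))
    return points
```

The range check would have flagged the original bug early, since a measurement-point ID is unlikely to parse as a plausible latitude.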
dxc
6d49e80c7e fix(gui): step8_panel now uses DataPreparationStep to compute water-quality indices, unifying the pipeline path and the panel's standalone run path 2026-06-09 13:38:28 +08:00
dxc
9ebe4fe4d3 fix(gui): step8_panel adds Formula_Type/Coefficient UI support; get_config outputs formula_coefficients 2026-06-09 13:31:50 +08:00
dxc
41c6a64628 fix(gui): step9_panel reads pipeline.indices_path, fixing the broken hand-off of step8 output 2026-06-09 13:31:12 +08:00
dxc
2872788cc3 fix(pipeline): remove the wrong step8→step11_ml mapping from STEP_MAP so resolve_step_id('step8') no longer returns step11_ml 2026-06-09 13:30:52 +08:00
dxc
90ba5a5fe2 fix(pipeline): remove the unused WaterQualityIndexCalculator import and instantiation 2026-06-09 13:30:36 +08:00
DXC
c9b9eded84 fix(gui): step8_panel QBrush crash fix + step9_panel back-fill link realigned from step5 to step8_panel 2026-06-09 13:23:17 +08:00
DXC
47cbb4a013 refactor(pipeline): step8 output file name unified as training_spectra_indices.csv; produces adds trad_indices_dir 2026-06-09 13:18:15 +08:00
DXC
593719e7d0 fix(gui): step8 QBrush crash fix + step9 auto-detects the Traditional_Indices directory for back-fill 2026-06-09 13:13:01 +08:00
DXC
bf2496badc feat(data): waterindex.csv gains 19 empirical concentration formulas of type concentration 2026-06-09 11:45:20 +08:00
DXC
28394f2eda feat(gui): full-workflow panel merge + one-click run GUI entry integrated 2026-06-09 11:30:42 +08:00
DXC
aefc9d5aac feat(pipeline): one-click run: core scheduling engine + pre-check/exemption system + thread bridge 2026-06-09 11:29:11 +08:00
DXC
624a5bdcd4 refactor(water_index): drive formula computation from waterindex.csv; remove 45 hard-coded methods 2026-06-09 11:24:15 +08:00
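A CSV-driven formula engine of this kind can be sketched with the standard `ast` module, which evaluates arithmetic without calling `eval`. The function name and the plain-dict band lookup are assumptions; the project most likely operates on whole data columns, and its actual evaluator is not shown in this log. The formula strings follow the style of the Formula column in waterindex.csv.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, bands):
    """Evaluate an arithmetic band formula such as "(w686 - w658) / (w686 + w658)".

    `bands` maps lower-case band names (e.g. "w686") to reflectance values.
    Only +, -, *, /, unary minus, numbers, and band names are allowed.
    """

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            # Some rows write band names in upper case (W686), so normalize.
            return bands[node.id.lower()]
        raise ValueError(f"unsupported element in formula: {ast.dump(node)}")

    return _eval(ast.parse(formula, mode="eval"))
```

Whitelisting node types means a malformed or malicious CSV row raises `ValueError` instead of executing arbitrary code.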
DXC
371e7a2745 fix(PipelineRunner): fix the broken hand-off between steps + dependency-cascade auto-wake engine 2026-06-09 09:07:59 +08:00
DXC
d22414bf7d feat(sampling): add adaptive sampling toggle + interactive sampling point viewer 2026-06-08 15:39:43 +08:00
DXC
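e57fdb4f75 feat(report): support the Minimax AI backend + unified AI settings dialog; fix the bug where the figure_counter return value was dropped 2026-06-08 14:58:16 +08:00
DXC
d5dd2ba1da chore: remove frontend/ and the icon resource directories; fully clean up leftover scaffolding 2026-06-08 12:13:37 +08:00
DXC
1cbd38a8e0 chore: remove runtime artifacts, personal config, and old scaffolding from the index; improve .gitignore 2026-06-08 12:12:11 +08:00
DXC
e3debbcb15 fix(step8): fix the broken pass-through of the external model dictionary + normalize loaded_model_data to prevent the Ridge subscriptable crash 2026-06-08 11:36:36 +08:00
DXC
2b76d7908f feat(step8): external models upgraded from a single file to scanning a parent folder for a multi-model dictionary 2026-06-08 09:56:02 +08:00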
DXC
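4efe5b871e feat(gui): smart pre-checks for one-click run
Four pre-check stages resolve the TypeError / silent-skip problems that appeared after switching to PipelineRunner, and improve the one-click run UX:

- Pre-check 1: work_path + log + scan + auto_populate (no dialog, silent back-fill)

- Pre-check 2: step3 band out-of-range dialog with a 60s countdown (BandConfirmDialog) + gdal reads RasterCount synchronously on the main thread; when out of range, the SpinBox value is written back to the UI

- Pre-check 3: img_path hard check (warning + jump to step1 + return)

- Pre-check 4: csv_path soft prompt (information + no return, so the user decides in the QMessageBox.question confirmation whether to skip training)

Add src/gui/dialogs.py: BandConfirmDialog (QDialog subclass, 60s countdown)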
2026-06-04 10:38:46 +08:00
DXC
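2139715829 fix(runner): step5 strictly depends on the step4 output + no silent skipping
- step5.requires adds processed_csv_path (the step4 output) with an explicit parameter_map to the csv_path parameter; step5.skip_when_missing=False works together with the Facade **kwargs catch-all

- A two-way parameter_map avoids the L2 ordered-injection conflict: processed_csv_path→csv_path (primary), csv_path→_raw_csv_ignored (placeholder, lands in **kwargs)

- The skip_when_missing block in PipelineRunner.run() adds a _notify call so the GUI knows exactly what is missing (no silent skipping)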
2026-06-04 10:38:33 +08:00
DXC
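64aa5b8f40 fix(runner): **kwargs catch-all on 14 Facades + 4 spec parameter_map corrections + step6_75 routing switched to indices
- Append **kwargs to the parameter list of the 14 stepX_... Facades, preventing a TypeError when the Runner injects an undeclared key (typical case: step3 receiving glint_mask_path)

- The user_overrides merge in runner._invoke adds a v is not None and v != '' filter, so empty values from GUI panels no longer overwrite valid paths already written to ctx

- PIPELINE_STEPS adds 4 parameter_map entries to correct ctx field name → parameter name mismatches: step6_5/6_75: training_csv_path→csv_path; step8_5: models_dir→non_empirical_models_dir; step8_75: models_dir→custom_regression_dir

- step6_75 routing switches from training_csv_path to indices_path (requires + parameter_map updated together); with skip_when_missing, it is skipped automatically when step5_5 has not run

- worker_thread.py: mode='full' switches to PipelineRunner + PipelineContext scheduling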
2026-06-04 09:15:04 +08:00
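The user_overrides filter described above amounts to a guarded dictionary merge. A minimal sketch, assuming plain dicts for both the context values and the overrides (the function name is hypothetical):

```python
def merge_user_overrides(ctx_values, user_overrides):
    """Merge GUI overrides into context values, ignoring None and empty strings."""
    merged = dict(ctx_values)
    for key, value in user_overrides.items():
        if value is not None and value != "":
            merged[key] = value
    return merged
```

Note that `0` and `False` still count as real overrides here; only `None` and `""` are treated as "the user left this blank".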
DXC
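343e316799 refactor(pipeline): direct path passing: unify ctx field names / panel keys / step parameter names 2026-06-03 17:29:41 +08:00
DXC
517bb28611 snapshot: original state before the Route B refactor (pipeline package enters git for the first time) 2026-06-03 16:31:45 +08:00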
185 changed files with 27404 additions and 10554 deletions

50
.gitignore vendored

@ -155,3 +155,53 @@ tmp/
*.bak
*.backup
*~
# ============================================================
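# File types that should not be version controlled
# ============================================================
# Qwen Code user config (personal environment, differs on every clone)
.qwen/settings.json
.qwen/settings.json.orig
# Skill files auto-generated by Qwen Code (regenerated every session)
.qwen/skills/
# Files generated by the GUI at runtime
src/gui/scaler_params.pkl
src/gui/crash_dump.txt
# Temporary/debug scripts (repository root)
降采样光谱.py
1.py
tset.py
# Reports and documents (local work products)
封装问题分析报告.md
软件说明.md
软件说明2.md
# Generated files other than .gitkeep in the data subdirectory
data/sub/waterindex*.csv
data/sub/waterindex*.xlsx
data/sub/png/watermask.png
# Icon files (keep only vector/svg; delete the raster icon archive copies)
data/icons-1/
data/icons/
# Old scaffolding (leftover experimental code)
new/
# Explicitly un-ignore src/new/ (end-to-end modular new architecture)
!/src/new/
!/src/new/**
!/src/new/core/**
!/src/new/services/**
!/src/new/views/**
# Frontend scaffolding (standalone Vue project, not integrated)
frontend/
# Panel backup directory (generated automatically at runtime)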
_archive_panels_backup_/

4
1.py

@ -1,4 +0,0 @@
new_wavelengths = [np.mean(wavelengths[i:i+3]) for i in range(0, len(wavelengths), 3)]
print(new_wavelengths)

83
README_new_arch.md Normal file

@ -0,0 +1,83 @@
# End-to-end modular new architecture (src/new/)
## Directory structure
```
src/new/
├── __init__.py
├── core/
│ ├── __init__.py
│ └── base_view.py # Base communication interface (inherits QWidget + dispatch_execute)
├── services/ # Standalone backend "brain"
│ ├── __init__.py
│ ├── step1_service.py # Real Step 1 service (execute_step1)
│ └── placeholder_service.py # Placeholder service for step2-step13
├── views/ # Standalone frontend "skin"
│ ├── __init__.py
│ ├── step1_view.py # Real Step 1 view (inherits BaseView)
│ └── placeholder_view.py # Placeholder view for step2-step13
└── main_view.py # Routing and scheduling shell (QMainWindow + QThread)
```
## End-to-end call chain
```
Step1View._on_run_clicked (green button)
│ self.dispatch_execute("step1", self.get_config())
BaseView.dispatch_execute (walks up the parent chain)
│ ancestor.run_single_step(step_id, config)
MainView.run_single_step (looks up the ROUTES table → injects work_dir)
│ TaskWorker(service_func, config).start()
services.step1_service.execute_step1(config)
│ calls WaterMaskStep.run(...) → wraps the result in a dict and returns it
MainView._on_step_done (writes the log according to status)
```
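The parent-chain lookup in `BaseView.dispatch_execute` can be illustrated without Qt. In this sketch `Node` is a hypothetical stand-in for `QWidget` (it only provides `parent()`), and the bodies of both classes are assumptions rather than the project's code.

```python
class Node:
    """Stand-in for QWidget: only knows its parent."""

    def __init__(self, parent=None):
        self._parent = parent

    def parent(self):
        return self._parent


class BaseView(Node):
    def dispatch_execute(self, step_id, config):
        """Walk up the parent chain until something offers run_single_step."""
        ancestor = self.parent()
        while ancestor is not None:
            handler = getattr(ancestor, "run_single_step", None)
            if callable(handler):
                return handler(step_id, config)
            ancestor = ancestor.parent()
        raise RuntimeError(f"no ancestor can run step {step_id!r}")


class MainView(Node):
    def __init__(self):
        super().__init__()
        self.received = []

    def run_single_step(self, step_id, config):
        self.received.append((step_id, config))
        return "dispatched"


main = MainView()
container = Node(parent=main)      # e.g. a stacked widget between view and window
view = BaseView(parent=container)
result = view.dispatch_execute("step1", {"img_path": "scene.bsq"})
```

Because the view only relies on duck typing, it never needs to import the main window class.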
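## Run and verify
### 1. Three-layer smoke test (recommended first)
```cmd
cd D:\111\office\ZHLduijie\1.WQ\WQ_GUI
python _smoke_new_arch.py
```
Expected output: `汇总54/54 通过` (summary: 54/54 passed)
### 2. Launch the routing main window
```cmd
cd D:\111\office\ZHLduijie\1.WQ\WQ_GUI
python -m src.new.main_view
```
Or:
```cmd
python src\new\main_view.py
```
After launch:
* The left `QListWidget` shows 13 steps (step1 is real, the rest are placeholders)
* Click `执行 Step 1: 水域掩膜` (Run Step 1: Water Mask) → green button → `dispatch_execute`
* The bottom `QTextEdit` prints `[Router]` / `[Service]` logs in real time
## Key design principles
1. **Views contain no business logic**: `src/new/views/*.py` never imports anything from `src/core/` or `src/services/`
2. **Services contain no PyQt**: `src/new/services/*.py` imports no PyQt and does not read or write globals
3. **Single cross-boundary channel**: `BaseView.dispatch_execute` pushes (step_id, config) to the main window
4. **Background execution does not block the UI**: the `TaskWorker(QThread)` worker thread runs the service
5. **Error safety net**: any exception raised by a service is caught by TaskWorker and converted to `{status: "error", ...}`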
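The error safety net in principle 5 can be sketched as a plain wrapper function, leaving out the `QThread` part. The function names and the `message`/`traceback` keys are assumptions; only the `status: "error"` field comes from this README.

```python
import traceback


def run_service_safely(service_func, config):
    """Run a service and always return a dict, even when the service raises."""
    try:
        result = service_func(config)
    except Exception as exc:
        return {
            "status": "error",
            "message": str(exc),
            "traceback": traceback.format_exc(),
        }
    if not isinstance(result, dict):
        return {"status": "error", "message": "service returned a non-dict result"}
    return result


def failing_service(config):
    """Hypothetical service that fails on missing input."""
    raise ValueError("missing img_path")


outcome = run_service_safely(failing_service, {})
```

With this shape, the code that handles a finished step only has to branch on `status` and never needs its own `try/except`.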
## Current status
| step | view | service | Status |
|--------|---------------------|------------------------|---------------------|
| step1 | `Step1View` real | `execute_step1` real | ✅ Migrated |
| step2-13 | `PlaceholderView` | `execute_placeholder` | 🚧 Placeholder, migration pending |

4
_check_qaa.py Normal file

@ -0,0 +1,4 @@
import sys
sys.path.insert(0, r'D:\111\office\ZHLduijie\1.WQ\WQ_GUI')
from src.core.algorithms.qaa import QAABaselineSolver
print("QAABaselineSolver imported OK")

0
_run_gen_csv.py Normal file

6
check_lines.py Normal file

@ -0,0 +1,6 @@
import sys
with open(r'D:\111\office\ZHLduijie\1.WQ\WQ_GUI\src\gui\water_quality_gui.py', 'rb') as f:
content = f.read()
lines = content.split(b'\r\n')
for i, line in enumerate(lines[2918:2955], start=2919):
sys.stdout.buffer.write(f'{i}: {repr(line[:120])}'.encode('utf-8') + b'\n')

28 binary files not shown (only a "Before" version is listed for each). Sizes: 67 KiB, 1.4 MiB, 3.4 MiB, 70 KiB, 55 KiB, 51 KiB, 46 KiB, 51 KiB, 78 KiB, 59 KiB, 76 KiB, 950 KiB, 30 KiB, 1.6 MiB, 2.2 MiB, 5.3 MiB, 6.2 MiB, 978 KiB, 1.9 MiB, 300 KiB, 3.1 MiB, 16 MiB, 6.4 MiB, 250 KiB, 2.9 MiB, 3.1 MiB, 884 KiB, 18 KiB.


@ -1,46 +0,0 @@
Formula_Name,Category,Formula,Reference
BGA_Am09KBBI,Phycocyanin (BGA_PC),(w686 - w658) / (w686 + w658),"Amin, R.; Zhou, J.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S.; Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery, Optics Express, 2009, 17, 11, 1-13."
BGA_Be162B643sub629,Phycocyanin (BGA_PC),w644 - w629,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be162B700sub601,Phycocyanin (BGA_PC),w700 - w601,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be162BsubPhy,Phycocyanin (BGA_PC),w715 - w615,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 540."
BGA_Be16FLHBlueRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w458 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be16FLHGreenRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w558 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be16FLHVioletRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w444 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be16MPI,Phycocyanin (BGA_PC),(w615 - w601) - (w644 - w601),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be16NDPhyI,Phycocyanin (BGA_PC),(w700 - w622) / (w700 + w622),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 540."
BGA_Be16NDPhyI644over615,Phycocyanin (BGA_PC),(w644 - w615) / (w644 + w615),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 541."
BGA_Be16NDPhyI644over629,Phycocyanin (BGA_PC),(w644 - w629) / (w644 + w629),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 542."
BGA_Be16Phy2BDA644over629,Phycocyanin (BGA_PC),w644 / w629,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 545."
BGA_Da052BDA,Phycocyanin (BGA_PC),w714 / w672,"Wynne, T. T., Stumpf, R. P., Tomlinson, M. C., Warner, R. A., Tester, P. A., Dyble, J.; Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes. Int. J. Remote Sens., 2008, 29, 3665-3672."
BGA_Go04MCI,Phycocyanin (BGA_PC),w709 - w681 - (w753 - w681),"Gower, J.F.R.; Brown,L.; Borstad, G.A.; Observation of chlorophyll fluorescence in west coast waters of Canada using the MODIS satellite sensor. Can. J. Remote Sens., 2004, 30 (1), 17-25."
BGA_HU103BDA,Phycocyanin (BGA_PC),(((1 / w615) - (1 / w600)) - w725),"Hunter, P.D.; Tyler, A.N.; Willby, N.J.; Gilvear, D.J.; The spatial dynamics of vertical migration by Microcystis aeruginosa in a eutrophic shallow lake: A case study using high spatial resolution time-series airborne remote sensing. Limn. Oceanogr. 2008, 53, 2391-2406"
BGA_Ku15PhyCI,Phycocyanin (BGA_PC),(-1 * (W681 - W665 - (W709 - W665))),"Kudela, R.M., Palacios, S.L., Austerberry, D.C., Accorsi, E.K., Guild, L.S.; Application of hyperspectral remote sensing to cyanobacterial blooms in inland waters, Torres-Perez, J., 2015, Remote Sens. Environ., 2015, 167, 1-10."
BGA_Ku15SLH,Phycocyanin (BGA_PC),(w715 - w658) + (w715 - w658),"Kudela, R.M., Palacios, S.L., Austerberry, D.C., Accorsi, E.K., Guild, L.S.; Application of hyperspectral remote sensing to cyanobacterial blooms in inland waters, Torres-Perez, J., 2015, Remote Sens. Environ., 2015, 167, 1-11."
BGA_MI092BDA,Phycocyanin (BGA_PC),w700 / w600,"Mishra, S.; Mishra, D.R.; Schluchter, W. M., A novel algorithm for predicting PC concentrations in cyanobacteria: A proximal hyperspectral remote sensing approach. Remote Sens., 2009, 1, 758-775."
BGA_MM092BDA,Phycocyanin (BGA_PC),w724 / w600,"Mishra, S.; Mishra, D.R.; Schluchter, W. M., A novel algorithm for predicting PC concentrations in cyanobacteria: A proximal hyperspectral remote sensing approach. Remote Sens., 2009, 1, 758-776."
BGA_MM12NDCIalt,Phycocyanin (BGA_PC),(w700 - w658) / (w700 + w658),"Mishra, S.; Mishra, D.R.; A novel remote sensing algorithm to quantify phycocyanin in cyanobacterial algal blooms, Env. Res. Lett., 2014, 9 (11), DOI:10.1088/1748-9326/9/11/114003"
BGA_MM143BDAopt,Phycocyanin (BGA_PC),((1 / w629) - (1 / w659)) * w724,"Mishra, S.; Mishra, D.R.; A novel remote sensing algorithm to quantify phycocyanin in cyanobacterial algal blooms, Env. Res. Lett., 2014, 9 (11), DOI:10.1088/1748-9326/9/11/114004"
BGA_SI052BDA,Phycocyanin (BGA_PC),w709 / w620,"Simis, S. G. H.; Peters, S.W. M.; Gons, H. J.; Remote sensing of the cyanobacteria pigment phycocyanin in turbid inland water. Limn. Oceanogr., 2005, 50, 237-245"
BGA_SM122BDA,Phycocyanin (BGA_PC),w709 / w600,"Mishra, S. Remote sensing of cyanobacteria in turbid productive waters, PhD Dissertation. Mississippi State University, USA. 2012."
BGA_SY002BDA,Phycocyanin (BGA_PC),w650 / w625,"Schalles, J.; Yacobi, Y. Remote detection and seasonal patterns of phycocyanin, carotenoid and chlorophyll-a pigments in eutrophic waters. Archiv fur Hydrobiologie, Special Issues Advances in Limnology, 2000, 55,153-168"
BGA_Wy08CI,Phycocyanin (BGA_PC),(-1 * (W686 - W672 - (W715 - W672))),"Wynne, T. T., Stumpf, R. P., Tomlinson, M. C., Warner, R. A., Tester, P. A., Dyble, J.; Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes. Int. J. Remote Sens., 2008, 29, 3665-3672."
Chl_Al10SABI,chlorophyll_a,(w857 - w644) / (w458 + w529),"Alawadi, F. Detection of surface algal blooms using the newly developed algorithm surface algal bloom index (SABI). Proc. SPIE 2010, 7825."
Chl_Am092Bsub,chlorophyll_a,w681 - w665,"Amin, R.; Zhou, J.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S. Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery. Opt. Express 2009, 17, 9126-9144."
Chl_Be16FLHblue,chlorophyll_a,w529 - (w644 + (w458 - w644)),"Beck, R.A. and 22 others; Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations, Remote Sens. Environ., 2016, 178, 15-30."
Chl_Be16FLHviolet,chlorophyll_a,w529 - (w644 + (w429 - w644)),"Beck, R.A. and 22 others; Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations, Remote Sens. Environ., 2016, 178, 15-30."
Chl_Be16NDTIblue,chlorophyll_a,(w658 - w458) / (w658 + w458),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 543."
Chl_Be16NDTIviolet,chlorophyll_a,(w658 - w444) / (w658 + w444),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 544."
Chl_De933BDA,chlorophyll_a,w600 - w648 - w625,"Dekker, A.; Detection of the optical water quality parameters for eutrophic waters by high resolution remote sensing, Ph.D. thesis, 1993, Free University, Amsterdam."
Chl_Gi033BDA,chlorophyll_a,((1 / w672) - (1 / w715)) * w757,"Gitelson, A.A.; U. Gritz, and M. N. Merzlyak.; Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Phys. 2003, 160, 271-282."
Chl_Kn07KIVU,chlorophyll_a,(w458 - w644) / w529,"Kneubuhler, M.; Frank T.; Kellenberger, T.W; Pasche N.; Schmid M.; Mapping chlorophyll-a in Lake Kivu with remote sensing methods. 2007, Proceedings of the Envisat Symposium 2007, Montreux, Switzerland 23-27 April 2007 (ESA SP-636, July 2007)."
Chl_MM12NDCI,chlorophyll_a,(w715 - w686) / (w715 + w686),"Mishra, S.; and Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters, Remote Sens. Environ., 2012, 117, 394-406"
Chl_Zh10FLH,chlorophyll_a,w686 - (w715 + (w672 - w751)),"Zhao, D.Z.; Xing, X.G.; Liu, Y.G.; Yang, J.H.; Wang, L. The relation of chlorophyll-a concentration with the reflectance peak near 700 nm in algae-dominated waters and sensitivity of fluorescence algorithms for detecting algal bloom. Int. J. Remote Sens. 2010, 31, 39-48"
Turb_Be16GreenPlusRedBothOverViolet,Turbidity,(w558 + w658) / w444,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538"
Turb_Be16RedOverViolet,Turbidity,w658 / w444,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539"
Turb_Bow06RedOverGreen,Turbidity,w658 / w558,"Bowers, D. G., and C. E. Binding. 2006. The Optical Properties of Mineral Suspended Particles: A Review and Synthesis. Estuarine Coastal and Shelf Science 67 (1-2): 219-230. doi:10.1016/j.ecss.2005.11.010"
Turb_Chip09NIROverGreen,Turbidity,w857 / w558,"Chipman, J. W.; Olmanson, L.G.; Gitelson, A.A.; Remote sensing methods for lake management: A guide for resource managers and decision-makers. 2009."
Turb_Dox02NIRoverRed,Turbidity,w857 / w658,"Doxaran, D., Froidefond, J.-M.; Castaing, P. ; A reflectance band ratio used to estimate suspended matter concentrations in sediment-dominated coastal waters, Remote Sens., 2002, 23, 5079-5085"
Turb_Frohn09GreenPlusRedBothOverBlue,Turbidity,(w558 + w658) / w458,"Frohn, R. C., & Autrey, B. C. (2009). Water quality assessment in the Ohio River using new indices for turbidity and chlorophyll-a with Landsat-7 Imagery. Draft Internal Report, US Environmental Protection Agency."
Turb_Harr92NIR,Turbidity,w857,"Schiebe F.R., Harrington J.A., Ritchie J.C. Remote-Sensing of Suspended Sediments - the Lake Chicot, Arkansas Project. Int. J. Remote Sens. 1992;13:1487-1509"
Turb_Lath91RedOverBlue,Turbidity,w658 / w458,"Lathrop, R. G., Jr., T. M. Lillesand, and B. S. Yandell, 1991. Testing the utility of simple multi-date Thematic Mapper calibration algorithms for monitoring turbid inland waters. International Journal of Remote Sensing"
Turb_Moore80Red,Turbidity,w658,"Moore, G.K., Satellite remote sensing of water turbidity, Hydrological Sciences, 1980, 25, 4, 407-422"
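
Note on the Formula column: each entry is a plain arithmetic expression over band names of the form `wNNN` (reflectance at NNN nm, sometimes written with an uppercase `W`). The sketch below shows one way such a string could be evaluated without calling `eval`. It is only an illustration under stated assumptions: the function name `evaluate_formula` and the `reflectance` dict keyed by integer wavelength are hypothetical and are not taken from this repository.

```python
import ast
import operator
import re

# Arithmetic operators that appear in the Formula column.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_formula(formula, reflectance):
    """Evaluate a band-math string such as "(w686 - w658) / (w686 + w658)".

    `reflectance` is a hypothetical mapping from wavelength in nm (int)
    to a reflectance value; it is an assumption made for this example.
    """

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            # Band names are matched case-insensitively (w681 and W681).
            match = re.fullmatch(r"[wW](\d+)", node.id)
            if match:
                return reflectance[int(match.group(1))]
        # Anything else (calls, attributes, unknown names) is rejected.
        raise ValueError(f"unsupported expression element: {ast.dump(node)}")

    return walk(ast.parse(formula.strip(), mode="eval"))


if __name__ == "__main__":
    bands = {658: 0.04, 686: 0.06}
    print(evaluate_formula("(w686 - w658) / (w686 + w658)", bands))
```

Walking the parsed tree and allowing only the four arithmetic operators, unary minus, numeric constants, and `wNNN` names means a malformed or hostile row raises `ValueError` instead of executing arbitrary code. A missing band raises `KeyError`, which a caller could surface as "band not available for this sensor".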

Binary file not shown.

@ -1,46 +0,0 @@
Formula_Name,Category,Formula,Reference
BGA_Am09KBBI,Phycocyanin (BGA_PC),(w686 - w658) / (w686 + w658),"Amin, R.; Zhou, J.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S.; Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery, Optics Express, 2009, 17, 11, 1-13."
BGA_Be162B643sub629,Phycocyanin (BGA_PC),w644 - w629,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be162B700sub601,Phycocyanin (BGA_PC),w700 - w601,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be162BsubPhy,Phycocyanin (BGA_PC),w715 - w615,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 540."
BGA_Be16FLHBlueRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w458 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be16FLHGreenRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w558 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be16FLHVioletRedNIR,Phycocyanin (BGA_PC),w658 - (w857 + (w444 - w857)),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538."
BGA_Be16MPI,Phycocyanin (BGA_PC),(w615 - w601) - (w644 - w601),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539."
BGA_Be16NDPhyI,Phycocyanin (BGA_PC),(w700 - w622) / (w700 + w622),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 540."
BGA_Be16NDPhyI644over615,Phycocyanin (BGA_PC),(w644 - w615) / (w644 + w615),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 541."
BGA_Be16NDPhyI644over629,Phycocyanin (BGA_PC),(w644 - w629) / (w644 + w629),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 542."
BGA_Be16Phy2BDA644over629,Phycocyanin (BGA_PC),w644 / w629,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 545."
BGA_Da052BDA,Phycocyanin (BGA_PC),w714 / w672,"Wynne, T. T., Stumpf, R. P., Tomlinson, M. C., Warner, R. A., Tester, P. A., Dyble, J.; Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes. Int. J. Remote Sens., 2008, 29, 3665-3672."
BGA_Go04MCI,Phycocyanin (BGA_PC),w709 - w681 - (w753 - w681),"Gower, J.F.R.; Brown,L.; Borstad, G.A.; Observation of chlorophyll fluorescence in west coast waters of Canada using the MODIS satellite sensor. Can. J. Remote Sens., 2004, 30 (1), 17-25."
BGA_HU103BDA,Phycocyanin (BGA_PC),(((1 / w615) - (1 / w600)) - w725),"Hunter, P.D.; Tyler, A.N.; Willby, N.J.; Gilvear, D.J.; The spatial dynamics of vertical migration by Microcystis aeruginosa in a eutrophic shallow lake: A case study using high spatial resolution time-series airborne remote sensing. Limn. Oceanogr. 2008, 53, 2391-2406"
BGA_Ku15PhyCI,Phycocyanin (BGA_PC),-1 * (W681 - W665 - (W709 - W665)),"Kudela, R.M., Palacios, S.L., Austerberry, D.C., Accorsi, E.K., Guild, L.S.; Application of hyperspectral remote sensing to cyanobacterial blooms in inland waters, Torres-Perez, J., 2015, Remote Sens. Environ., 2015, 167, 1-10."
BGA_Ku15SLH,Phycocyanin (BGA_PC),(w715 - w658) + (w715 - w658),"Kudela, R.M., Palacios, S.L., Austerberry, D.C., Accorsi, E.K., Guild, L.S.; Application of hyperspectral remote sensing to cyanobacterial blooms in inland waters, Torres-Perez, J., 2015, Remote Sens. Environ., 2015, 167, 1-11."
BGA_MI092BDA,Phycocyanin (BGA_PC),w700 / w600,"Mishra, S.; Mishra, D.R.; Schluchter, W. M., A novel algorithm for predicting PC concentrations in cyanobacteria: A proximal hyperspectral remote sensing approach. Remote Sens., 2009, 1, 758-775."
BGA_MM092BDA,Phycocyanin (BGA_PC),w724 / w600,"Mishra, S.; Mishra, D.R.; Schluchter, W. M., A novel algorithm for predicting PC concentrations in cyanobacteria: A proximal hyperspectral remote sensing approach. Remote Sens., 2009, 1, 758-776."
BGA_MM12NDCIalt,Phycocyanin (BGA_PC),(w700 - w658) / (w700 + w658),"Mishra, S.; Mishra, D.R.; A novel remote sensing algorithm to quantify phycocyanin in cyanobacterial algal blooms, Env. Res. Lett., 2014, 9 (11), DOI:10.1088/1748-9326/9/11/114003"
BGA_MM143BDAopt,Phycocyanin (BGA_PC),((1 / w629) - (1 / w659)) * w724,"Mishra, S.; Mishra, D.R.; A novel remote sensing algorithm to quantify phycocyanin in cyanobacterial algal blooms, Env. Res. Lett., 2014, 9 (11), DOI:10.1088/1748-9326/9/11/114004"
BGA_SI052BDA,Phycocyanin (BGA_PC),w709 / w620,"Simis, S. G. H.; Peters, S.W. M.; Gons, H. J.; Remote sensing of the cyanobacteria pigment phycocyanin in turbid inland water. Limn. Oceanogr., 2005, 50, 237-245"
BGA_SM122BDA,Phycocyanin (BGA_PC),w709 / w600,"Mishra, S. Remote sensing of cyanobacteria in turbid productive waters, PhD Dissertation. Mississippi State University, USA. 2012."
BGA_SY002BDA,Phycocyanin (BGA_PC),w650 / w625,"Schalles, J.; Yacobi, Y. Remote detection and seasonal patterns of phycocyanin, carotenoid and chlorophyll-a pigments in eutrophic waters. Archiv fur Hydrobiologie, Special Issues Advances in Limnology, 2000, 55, 153-168"
BGA_Wy08CI,Phycocyanin (BGA_PC),-1 * (W686 - W672 - (W715 - W672)),"Wynne, T. T., Stumpf, R. P., Tomlinson, M. C., Warner, R. A., Tester, P. A., Dyble, J.; Relating spectral shape to cyanobacterial blooms in the Laurentian Great Lakes. Int. J. Remote Sens., 2008, 29, 3665-3672."
Chl_Al10SABI,chlorophyll_a,(w857 - w644) / (w458 + w529),"Alawadi, F. Detection of surface algal blooms using the newly developed algorithm surface algal bloom index (SABI). Proc. SPIE 2010, 7825."
Chl_Am092Bsub,chlorophyll_a,w681 - w665,"Amin, R.; Zhou, J.; Gilerson, A.; Gross, B.; Moshary, F.; Ahmed, S. Novel optical techniques for detecting and classifying toxic dinoflagellate Karenia brevis blooms using satellite imagery. Opt. Express 2009, 17, 9126-9144."
Chl_Be16FLHblue,chlorophyll_a,w529 - (w644 + (w458 - w644)),"Beck, R.A. and 22 others; Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations, Remote Sens. Environ., 2016, 178, 15-30."
Chl_Be16FLHviolet,chlorophyll_a,w529 - (w644 + (w429 - w644)),"Beck, R.A. and 22 others; Comparison of satellite reflectance algorithms for estimating chlorophyll-a in a temperate reservoir using coincident hyperspectral aircraft imagery and dense coincident surface observations, Remote Sens. Environ., 2016, 178, 15-30."
Chl_Be16NDTIblue,chlorophyll_a,(w658 - w458) / (w658 + w458),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 543."
Chl_Be16NDTIviolet,chlorophyll_a,(w658 - w444) / (w658 + w444),"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 544."
Chl_De933BDA,chlorophyll_a,w600 - w648 - w625,"Dekker, A.; Detection of the optical water quality parameters for eutrophic waters by high resolution remote sensing, Ph.D. thesis, 1993, Free University, Amsterdam."
Chl_Gi033BDA,chlorophyll_a,((1 / w672) - (1 / w715)) * w757,"Gitelson, A.A.; U. Gritz, and M. N. Merzlyak.; Relationships between leaf chlorophyll content and spectral reflectance and algorithms for non-destructive chlorophyll assessment in higher plant leaves. J. Plant Phys. 2003, 160, 271-282."
Chl_Kn07KIVU,chlorophyll_a,(w458 - w644) / w529,"Kneubuhler, M.; Frank T.; Kellenberger, T.W; Pasche N.; Schmid M.; Mapping chlorophyll-a in Lake Kivu with remote sensing methods. 2007, Proceedings of the Envisat Symposium 2007, Montreux, Switzerland 23-27 April 2007 (ESA SP-636, July 2007)."
Chl_MM12NDCI,chlorophyll_a,(w715 - w686) / (w715 + w686),"Mishra, S.; and Mishra, D.R. Normalized difference chlorophyll index: A novel model for remote estimation of chlorophyll-a concentration in turbid productive waters, Remote Sens. Environ., 2012, 117, 394-406"
Chl_Zh10FLH,chlorophyll_a,w686 - (w715 + (w672 - w751)),"Zhao, D.Z.; Xing, X.G.; Liu, Y.G.; Yang, J.H.; Wang, L. The relation of chlorophyll-a concentration with the reflectance peak near 700 nm in algae-dominated waters and sensitivity of fluorescence algorithms for detecting algal bloom. Int. J. Remote Sens. 2010, 31, 39-48"
Turb_Be16GreenPlusRedBothOverViolet,Turbidity,(w558 + w658) / w444,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 538"
Turb_Be16RedOverViolet,Turbidity,w658 / w444,"Beck, R.; Xu, M.; Zhan, S.; Liu, H.; Johansen, R.A.; Tong, S.; Yang, B.; Shu, S.; Wu, Q.; Wang, S.; Berling, K.; Murray, A.; Emery, E.; Reif, M.; Harwood, J.; Young, J.; Martin, M.; Stillings, G.; Stumpf, R.; Su, H.; Ye, Z.; Huang, Y. Comparison of Satellite Reflectance Algorithms for Estimating Phycocyanin Values and Cyanobacterial Total Biovolume in a Temperate Reservoir Using Coincident Hyperspectral Aircraft Imagery and Dense Coincident Surface Observations. Remote Sens. 2017, 9, 539"
Turb_Bow06RedOverGreen,Turbidity,w658 / w558,"Bowers, D. G., and C. E. Binding. 2006. “The Optical Properties of Mineral Suspended Particles: A Review and Synthesis.” Estuarine Coastal and Shelf Science 67 (1-2): 219-230. doi:10.1016/j.ecss.2005.11.010"
Turb_Chip09NIROverGreen,Turbidity,w857 / w558,"Chipman, J. W.; Olmanson, L.G.; Gitelson, A.A.; Remote sensing methods for lake management: A guide for resource managers and decision-makers. 2009."
Turb_Dox02NIRoverRed,Turbidity,w857 / w658,"Doxaran, D., Froidefond, J.-M.; Castaing, P. ; A reflectance band ratio used to estimate suspended matter concentrations in sediment-dominated coastal waters, Remote Sens., 2002, 23, 5079-5085"
Turb_Frohn09GreenPlusRedBothOverBlue,Turbidity,(w558 + w658) / w458,"Frohn, R. C., & Autrey, B. C. (2009). Water quality assessment in the Ohio River using new indices for turbidity and chlorophyll-a with Landsat-7 Imagery. Draft Internal Report, US Environmental Protection Agency."
Turb_Harr92NIR,Turbidity,w857,"Schiebe F.R., Harrington J.A., Ritchie J.C. Remote-Sensing of Suspended Sediments - the Lake Chicot, Arkansas Project. Int. J. Remote Sens. 1992;13:1487-1509"
Turb_Lath91RedOverBlue,Turbidity,w658 / w458,"Lathrop, R. G., Jr., T. M. Lillesand, and B. S. Yandell, 1991. Testing the utility of simple multi-date Thematic Mapper calibration algorithms for monitoring turbid inland waters. International Journal of Remote Sensing"
Turb_Moore80Red,Turbidity,w658,"Moore, G.K., Satellite remote sensing of water turbidity, Hydrological Sciences, 1980, 25, 4, 407-422"

data/格式转化.py (new file)

@@ -0,0 +1,85 @@
from pathlib import Path

from PIL import Image


def batch_convert_to_ico(source_dirs, output_dir, target_size=(256, 256)):
    """
    Batch-convert the image files in the given directories to ICO format.

    :param source_dirs: list of source folder paths
    :param output_dir: directory where the converted ICO files are saved
    :param target_size: size of the output ICO, 256x256 by default
    """
    # Common input image extensions that are supported
    supported_extensions = {'.png', '.jpg', '.jpeg', '.bmp', '.webp', '.tiff'}

    # Make sure the output directory exists; create it if it does not
    out_path = Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    total_converted = 0
    total_failed = 0

    print("=" * 50)
    print("🚀 Starting batch conversion to ICO icons...")
    print(f"📁 Output directory: {out_path}")
    print("=" * 50)

    # Walk through every source directory that was passed in
    for folder in source_dirs:
        folder_path = Path(folder)
        if not folder_path.exists():
            print(f"⚠️ Warning: source directory does not exist, skipped -> {folder_path}")
            continue

        print(f"\n📂 Scanning directory: {folder_path}")

        # Iterate over the files directly inside the directory (not recursive)
        for file_path in folder_path.iterdir():
            # Only handle regular files with a supported extension (case-insensitive)
            if file_path.is_file() and file_path.suffix.lower() in supported_extensions:
                try:
                    with Image.open(file_path) as img:
                        # Transparency handling:
                        # images with an alpha channel (RGBA/LA, or P with transparency)
                        # are converted to RGBA so the transparent background survives;
                        # everything else (e.g. JPG) is converted to plain RGB.
                        if img.mode in ('RGBA', 'LA') or (img.mode == 'P' and 'transparency' in img.info):
                            img_clean = img.convert('RGBA')
                        else:
                            img_clean = img.convert('RGB')

                        # Output file name: <original stem>.ico
                        new_filename = f"{file_path.stem}.ico"
                        save_path = out_path / new_filename

                        # A file with the same name in the output folder is overwritten.
                        # For an icon library that is usually what is wanted, so no
                        # de-duplicating suffix is added here.
                        img_clean.save(save_path, format="ICO", sizes=[target_size])
                        print(f"  ✅ OK: {file_path.name} -> {new_filename}")
                        total_converted += 1
                except Exception as e:
                    print(f"  ❌ Failed: could not convert {file_path.name}, error: {e}")
                    total_failed += 1

    print("\n" + "=" * 50)
    print("🎉 Conversion finished!")
    print(f"Summary: {total_converted} file(s) converted, {total_failed} failed.")
    print("=" * 50)


if __name__ == "__main__":
    # 1. The two source folders to read from
    SOURCES = [
        r"D:\111\office\ZHLduijie\1.WQ\WQ_GUI\data\icons",
        r"D:\111\office\ZHLduijie\1.WQ\WQ_GUI\data\icons\word"
    ]
    # 2. The single output folder
    OUTPUT = r"D:\111\office\ZHLduijie\1.WQ\WQ_GUI\data\icons-1"

    # Run the conversion
    batch_convert_to_ico(SOURCES, OUTPUT)

@@ -0,0 +1,350 @@
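# Smoke Test: Route B MVP (PipelineContext + AutoML + soft cancel + GUI integration)
> Scope: an end-to-end smoke-test checklist to run once the four parts of the Route B refactor (pipeline package / AutoML trainer / WorkerThread soft cancel / GUI one-click full automation) have landed.
> Goal: **verify the whole chain end to end in 10-20 minutes with a minimal dataset (1 BSQ + 1 CSV)**.
---
## 0. Prerequisites (5 minutes)
### 0.1 Install Optuna
`environment.yml` currently **does not list** optuna; it is a new dependency introduced by this refactor. Without it, Step 6 automatically falls back to the legacy GridSearchCV path (the run still completes, but a fallback log line is emitted).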
```bash
call venv\Scripts\activate.bat
pip install "optuna>=3.6,<4.0"
```
Patch for `environment.yml` (apply when committing):
```yaml
# Route B AutoML anti-blow-up engine (optional; when not installed, Step 6 takes the legacy GridSearchCV fallback path)
- optuna>=3.6
```
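### 0.2 Prepare a minimal dataset
```text
work_dir_smoke/
├── raw/
│   ├── sample.b         # false-color BSQ (any small resolution works; 50×50×6 bands suggested)
│   ├── sample_mask.tif  # (optional) water mask; if omitted, Step 1 generates one from NDWI
│   └── sample.csv       # 3-6 water-quality target columns (Chl-a / TSS / SD / TN / TP / COD…) + 6 band-reflectance columns
└── (other files are generated by the pipeline)
```
**CSV template example** (`feature_start_column` defaults to the first column; target columns must come **before the feature columns**):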
```csv
Chl-a,TSS,SD,B1,B2,B3,B4,B5,B6
12.3,15.1,0.8,0.045,0.052,0.038,0.061,0.072,0.085
11.8,14.2,0.9,0.044,0.051,0.037,0.060,0.071,0.084
... (≥ 200 行AutoML 智能子采样 N>5000 时才生效)
```
### 0.3 启动 venv
```bash
cd /d "D:\111\office\ZHLduijie\1.WQ\WQ_GUI"
call venv\Scripts\activate.bat
set PYTHONPATH=src;%PYTHONPATH%
```
---
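## 1. CLI smoke test, fastest path (3 minutes) - **Grade A: must run**
Skip the GUI and directly verify that `automl_trainer.py` runs standalone, that Optuna subsampling works, and that the fallback path works: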
```bash
python -m src.core.prediction.automl_trainer ^
--csv work_dir_smoke/raw/sample.csv ^
--feature-start 6 ^
--n-trials 5 ^
--timeout 60.0 ^
--out work_dir_smoke/7_Supervised_Model_Training_AutoML
```
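**Pass criteria**:
- [ ] Process exit code is 0
- [ ] The console prints `AutoML: 目标列 X 共尝试 N 个 trial最佳 CV R²=…` (target column X, N trials attempted, best CV R²)
- [ ] `<out>/<preprocess>/<target>_<preprocess>_<model>_AUTOML.joblib` exists
- [ ] `<out>/automl_summary.json` exists and has `success=true`
**If Optuna is not installed**, expect to see: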
```
[AutoML] optuna 未安装,全目标列回退老 GridSearchCV
```
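In that case the logic that adds the `_AUTOML` suffix to artifact file names is **not triggered** (the fallback takes the legacy path); this is expected.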
---
## 2. GUI end-to-end, 9-step core scenario (10-20 minutes) - **Grade S: must run**
### 2.1 Launch the GUI
```bash
call venv\Scripts\activate.bat
set PYTHONPATH=src;%PYTHONPATH%
python -m src.gui.water_quality_gui
```
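### 2.2 UI configuration
| Step | Action | Expected |
| ----- | -------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| 1/9 | Click "Select working directory" → choose `work_dir_smoke/` | The step list on the left is highlighted; the UI reports no error |
| 2/9 | In the Step 1 panel select `sample.b`; **leave the mask empty** (exercises the automatic NDWI generation path) | The mask text box stays blank |
| 3/9 | In the Step 4 panel select `sample.csv` | The CSV path is shown correctly |
| 4/9 | **Key**: leave the other steps (2/3/5/5.5/6/7/8/9) at their defaults; do not change any parameter | AutoML is on by default (use_automl=True) |
| 5/9 | Click **▶ Run full pipeline** (do not use the old `run_full_pipeline` slot) | A **confirmation dialog** appears, showing:<br>• Mask: `未指定(将自动生成 NDWI 水域掩膜)` (not specified; an NDWI water mask will be generated automatically)<br>• Deglint: on<br>• AutoML: on (Optuna subsampled search) |
| 6/9 | Click "Yes (Y)" | The "Run" button greys out and the "Stop" button lights up; the progress bar resets to zero |
### 2.3 Watch the log (4 key checkpoints)
#### ✅ Checkpoint 1: ctx path propagation
Within the **first second** after launch you should see something like: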
```
[Runner] ctx 已构造14 路径字段4 目录字段
[Runner] 步骤 1/14step1_generate_water_maskrequires=['raw_img_path', 'water_mask_path']
[Runner] 步骤 2/14step2_find_glint_arearequires=['raw_img_path', 'water_mask_path', 'output_dir']
...
[Runner] ctx 路径校准water_mask_path = ...\work_dir_smoke\2_Glint_Area_Mask\glint_mask.tif
```
**If there are no `[Runner]` log lines**, the legacy v1 path was taken, meaning **the `inspect.signature` duck-type probe did not detect v2**; go back and check `worker_thread.py:run()`.
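The kind of probe referred to here can be pictured roughly as follows. This is a minimal sketch and not the project's actual `worker_thread.py`: the helper name `supports_v2`, the two stand-in pipeline classes, and the assumption that v2 is recognised by a `cancel_event` parameter on `run_full_pipeline_v2` are all hypothetical.

```python
import inspect


def supports_v2(pipeline) -> bool:
    """Return True if `pipeline` exposes a v2 entry point accepting `cancel_event`.

    Assumption: v2 is recognised by a callable `run_full_pipeline_v2` whose
    signature has a `cancel_event` parameter. The real rule may differ.
    """
    entry = getattr(pipeline, "run_full_pipeline_v2", None)
    if not callable(entry):
        return False
    try:
        params = inspect.signature(entry).parameters
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature.
        return False
    return "cancel_event" in params


class LegacyPipeline:  # hypothetical v1-only pipeline
    def run_full_pipeline(self, config):
        return "v1"


class NewPipeline:  # hypothetical v2-capable pipeline
    def run_full_pipeline_v2(self, config, cancel_event=None):
        return "v2"


print(supports_v2(NewPipeline()), supports_v2(LegacyPipeline()))
```

If a probe of this shape returns `False` for an object that should be v2, the usual suspects are a renamed method or a parameter that was dropped from the signature.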
#### ✅ Checkpoint 2: Step 1 automatic NDWI generation
```
[Step1] 未指定 mask_path自动基于 NDWI 生成水域掩膜
[Step1] NDWI 阈值=0.4,写入 1_Water_Mask/water_mask.tif
```
→ Verify that `<work_dir>/1_Water_Mask/water_mask.tif` exists and is non-empty.
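For orientation, the NDWI thresholding that Step 1 performs can be sketched in plain Python. The formula is the standard McFeeters NDWI, (green - NIR) / (green + NIR); everything else is an assumption made for illustration: the function names, the use of nested lists instead of raster arrays, the strict `>` comparison, and which of the six bands the project treats as green and NIR.

```python
def ndwi(green: float, nir: float) -> float:
    """McFeeters NDWI; returns 0.0 when green + nir is zero to avoid dividing by zero."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total


def water_mask(green_band, nir_band, threshold=0.4):
    """Return a 0/1 mask (nested lists): 1 where NDWI > threshold (assumed rule)."""
    return [
        [1 if ndwi(g, n) > threshold else 0 for g, n in zip(g_row, n_row)]
        for g_row, n_row in zip(green_band, nir_band)
    ]


def water_fraction(mask) -> float:
    """Share of pixels flagged as water; 0.0 for an empty mask."""
    cells = [v for row in mask for v in row]
    return sum(cells) / len(cells) if cells else 0.0


# Tiny 2x2 example with made-up reflectance values.
green = [[0.08, 0.03], [0.09, 0.05]]
nir = [[0.02, 0.06], [0.03, 0.05]]
mask = water_mask(green, nir, threshold=0.4)
print(mask, water_fraction(mask))
```

Raising the threshold to `0.9` on the same toy data leaves no water pixels at all, which is the situation the low-coverage warning in scenario 4.1 is meant to catch.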
#### ✅ Checkpoint 3: AutoML enabled
```
[Step6] AutoML 启用 Optuna 子采样寻优timeout=300s, n_trials=20, max_samples=5000
[Step6] 目标列 'Chl-a' 共 3 个候选模型,最佳 R²=0.812model=RandomForest
[Step6] 目标列 'TSS' 共 3 个候选模型,最佳 R²=0.745model=XGBoost
[Step6] 训练完成,产物写入 7_Supervised_Model_Training_AutoML/
[Step6] automl_summary.json 写入完成
```
→ Verify the artifacts:
- [ ] `7_Supervised_Model_Training_AutoML/<preprocess>/<target>_<preprocess>_<model>_AUTOML.joblib`: at least 1 file
- [ ] `7_Supervised_Model_Training_AutoML/automl_summary.json` contains the field `automl: true`
- [ ] The legacy directory `7_Supervised_Model_Training/` **must not be created** (the AutoML path is separate)
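The three artifact checks above can be automated with a few lines of standard-library Python. This is a sketch under stated assumptions: the function name `check_automl_artifacts` is hypothetical, and `automl_summary.json` is assumed to be a JSON object with a top-level boolean `automl` key; adjust if the real file nests it differently.

```python
import json
from pathlib import Path


def check_automl_artifacts(work_dir) -> list:
    """Return a list of problems; an empty list means all three checks passed."""
    work_dir = Path(work_dir)
    automl_dir = work_dir / "7_Supervised_Model_Training_AutoML"
    problems = []

    # 1. At least one model file, one directory level below automl_dir.
    if not list(automl_dir.glob("*/*_AUTOML.joblib")):
        problems.append("no *_AUTOML.joblib model found")

    # 2. Summary file exists and carries automl == true (assumed top-level key).
    summary_path = automl_dir / "automl_summary.json"
    if not summary_path.is_file():
        problems.append("automl_summary.json missing")
    else:
        summary = json.loads(summary_path.read_text(encoding="utf-8"))
        if not (isinstance(summary, dict) and summary.get("automl") is True):
            problems.append("automl_summary.json lacks automl: true")

    # 3. The legacy output directory must not have been created.
    if (work_dir / "7_Supervised_Model_Training").exists():
        problems.append("legacy directory 7_Supervised_Model_Training/ exists")

    return problems
```

Calling `check_automl_artifacts("work_dir_smoke")` after a run and asserting that the result is empty turns this checkpoint into a one-line scripted check.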
#### ✅ Checkpoint 4: AutoML fallback (only when Optuna is not installed)
```
[AutoML] optuna 未安装,全目标列回退老 GridSearchCV
[Step6] 降级路径:调用 WaterQualityModelingBatch.train_models_batch132 组 GridSearchCV
```
→ It is enough that the run completes (model files are still produced), but the **fallback** is not the preferred path.
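A common way to implement this kind of optional-dependency fallback is to test whether the module can be found before choosing a code path. The sketch below shows only that pattern; `module_available` and `choose_trainer` are hypothetical names, the returned labels are placeholders, and the project's own detection logic may be written differently (for example with `try: import optuna`).

```python
import importlib.util


def module_available(module_name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def choose_trainer(have_optuna: bool) -> str:
    """Pick the training path: AutoML when Optuna is present, otherwise the legacy grid search."""
    return "automl" if have_optuna else "legacy_grid_search"


print(choose_trainer(module_available("optuna")))
```

Running this in the smoke-test venv before launching the GUI tells you in advance which of the two checkpoints (3 or 4) to expect.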
### 2.4 Full 9-step observation checklist
| Step | Expected artifact (path relative to `work_dir`) | Expected duration (50×50 test data) |
| ---- | -------------------------------------------------------------- | -------------------------- |
| 1 | `1_Water_Mask/water_mask.tif` | < 5 s |
| 2 | `2_Glint_Area_Mask/glint_mask.tif` | < 5 s |
| 3 | `3_Remove_Glint_Image/deglint_image.tif` | < 5 s |
| 4 | `4_Process_CSV/processed_data.csv` | < 2 s |
| 5 | `5_Training_Sample/training_spectra.csv` | < 5 s |
| 5.5 | `5_5_Calculate_Indices/indices.csv` (if enabled) | < 2 s |
| **6** | `7_Supervised_Model_Training_AutoML/` (**new path**) | **< 5 min (Optuna, 5 trials)** |
| 6.5 | `6_5_Non_Empirical_Modeling/` (if enabled) | 1-2 min |
| 6.75 | `6_75_Custom_Regression/` (if enabled) | 1-2 min |
| 7 | `7_Sampling_Points/sampling_points.csv` | < 3 s |
| 8 | `8_Prediction/predicted_values.csv` | < 5 s |
| 8.5 | `8_5_Prediction_Non_Empirical/predicted.csv` (if enabled) | < 5 s |
| 8.75 | `8_75_Prediction_Custom/predicted.csv` (if enabled) | < 5 s |
| 9 | `9_Kriging_Distribution_Map/distribution_map.tif` | 5-30 s (Python) |
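### 2.5 End of the run
- [ ] The progress bar reaches 100%
- [ ] The "Run" button is clickable again
- [ ] The "Stop" button is greyed out
- [ ] The last log line is `=== 流程执行完成 ===` (run completed) or `=== 流程被取消 ===` (run cancelled), depending on whether Stop was clicked
- [ ] On the console, `on_pipeline_finished` fires and the UI state is restored in one place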
---
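## 3. Soft-cancel test (3 minutes) - **Grade A: must run**
Verify the `threading.Event` soft-cancel chain (`terminate()` is no longer used).
### 3.1 Start the full pipeline
Start the pipeline as in 2.2.
### 3.2 Click "Stop" midway
**Timing**: partway through the Step 6 AutoML trials (any time after you see `[Step6] 目标列 'Chl-a' 共 N 个候选模型`), click "Stop".
**Expected output**: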
```
[STOP] 用户请求软取消
[Step6] 检测到 cancel_event本 trial 完成后退出
[Step6] AutoML 在 trial #X 中止,已完成 5/20 trial
[Runner] 软取消:跳过剩余 8 个 step
=== 流程被取消 ===
```
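UI state:
- [ ] The "Run" button lights up again
- [ ] The "Stop" button is greyed out
- [ ] The progress bar stays at the percentage reached when interrupted and is **not** reset to zero
- [ ] `on_pipeline_finished` fires, and the outcome can be told apart via `success=False, cancelled=True`
- [ ] **The Python process does not exit**, and the GUI can still start a new run via "Run"
**Counter-examples (must not happen)**:
- A `QThread: Destroyed while thread is still running` warning
- The Python interpreter crashes outright
- The UI hangs forever (`run_all_btn` stays grey)
### 3.3 Regression check for the legacy `stop()` path
To guard against legacy code that was never updated, temporarily change `water_quality_gui.py:stop_pipeline` back to `self.worker.stop()`, run the full pipeline once, and check whether the following appears: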
```
[DEPRECATED] WorkerThread.stop() 已弃用,请改用 soft_stop()。
```
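**This is expected behavior**: the deprecated method is kept but logs a warning. The check passes as long as the pipeline still completes.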
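
A deprecated method that keeps working while warning is usually a thin shim over its replacement. The sketch below assumes a hypothetical `Worker` class, not the project's `WorkerThread`, and uses Python's `warnings` module rather than the project's own log format.

```python
import threading
import warnings


class Worker:
    """Hypothetical stand-in for the GUI worker; not the project's WorkerThread."""

    def __init__(self) -> None:
        self.cancel_event = threading.Event()

    def soft_stop(self) -> None:
        """Request a cooperative stop by setting the shared event."""
        self.cancel_event.set()

    def stop(self) -> None:
        """Deprecated entry point: warn, then delegate to soft_stop()."""
        warnings.warn(
            "stop() is deprecated, use soft_stop() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        self.soft_stop()
```

Delegating instead of removing the method keeps old call sites working until the project-wide replacement listed in section 6 is done.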
---
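## 4. Failure / fallback scenarios (5 minutes): **Level B, optional**
### 4.1 No mask provided + extreme NDWI threshold
Set the NDWI threshold to `0.9` (almost no water area). Step 1 should log a warning but not crash: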
```
[Step1] NDWI 阈值=0.9,水域覆盖率 < 1%,请检查影像
```
### 4.2 CSV with no target column at all
Prepare a CSV with **no target column** (feature columns only) and click Run:
```
[AutoML] 训练 CSV 不存在或无目标列:未识别出目标列
[Step6] AutoML 全部失败,所有目标列返回 success=False
```
The UI does not crash; `automl_summary.json` records `error: "未识别出目标列"`.
### 4.3 Step 1 path does not exist
In Step 1, select a `.bsq` file that **does not exist**:
```
[Runner] step1_generate_water_mask 异常FileNotFoundError
[STOP] 流程中止在 step 1
```
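The UI shows an error dialog and calls `setCurrentRow(0)` on the left-hand step list to jump to Step 1 (`_focus_step` takes effect).
### 4.4 Optuna version conflict
Install `optuna==2.10` (its API differs substantially), then start the GUI: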
```
[AutoML] optuna API 不兼容(>=3.6 要求):<error>
[AutoML] 全目标列回退老 GridSearchCV
```
The check passes once the fallback path takes effect.
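
One way to detect an incompatible Optuna before training is to compare the installed version with a minimum. This is a sketch under assumptions: the `(3, 6)` floor is read off the `>=3.6` in the log line above, and `major_minor`, `optuna_compatible` and `installed_optuna_version` are hypothetical helpers. The project may instead detect the mismatch by catching an API error.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_OPTUNA: Tuple[int, int] = (3, 6)  # assumption, taken from ">=3.6" in the log


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.10' or '3.6.1'."""
    match = re.match(r"(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2) or 0)


def optuna_compatible(version: Optional[str]) -> bool:
    """True only if a version is present and meets the minimum."""
    return version is not None and major_minor(version) >= MIN_OPTUNA


def installed_optuna_version() -> Optional[str]:
    """Return the installed optuna version, or None if it is not installed."""
    try:
        return metadata.version("optuna")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(optuna_compatible(installed_optuna_version()))
```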
---
## 5. Verification matrix checklist
Copy the following into the PR description / acceptance form:
```markdown
## Route B MVP verification matrix
### Code landed
- [ ] src/core/pipeline/__init__.py (17 lines, 4 exports)
- [ ] src/core/pipeline/context.py (PipelineContext dataclass)
- [ ] src/core/pipeline/runner.py (StepSpec + PIPELINE_STEPS + PipelineRunner)
- [ ] src/core/prediction/__init__.py (adds the train_with_automl export)
- [ ] src/core/prediction/automl_trainer.py (AutoMLResult + train_with_automl + CLI)
- [ ] src/core/steps/modeling_step.py (use_automl branch + _train_models_automl)
- [ ] src/core/water_quality_inversion_pipeline_GUI.py (run_full_pipeline_v2 + LEGACY_ATTR_MAP + _sync_legacy_attrs_from_context)
- [ ] src/gui/core/worker_thread.py (cancel_event + soft_stop + run() duck-typing)
- [ ] src/gui/water_quality_gui.py (on_run_all_clicked + _collect_minimal_config + button rewiring)
### CLI self-test
- [ ] A.1 `python -m src.core.prediction.automl_trainer --csv ...` exits with code 0
- [ ] A.2 Output .joblib files carry the `_AUTOML` suffix
- [ ] A.3 automl_summary.json contains success=true
### GUI end to end
- [ ] B.1 Starts without ImportError
- [ ] B.2 Confirmation dialog text includes the mask hint + AutoML status
- [ ] B.3 Logs contain the [Runner] prefix (v2 path in effect)
- [ ] B.4 Step 1 NDWI auto-generation path takes effect
- [ ] B.5 All 9-step output paths exist
- [ ] B.6 UI state is restored after the run (Run button lit, Stop button grey)
### Soft cancel
- [ ] C.1 Clicking Stop mid-run sets cancel_event
- [ ] C.2 The pipeline is cancelled rather than crashing
- [ ] C.3 UI state is restored in one place by on_pipeline_finished
- [ ] C.4 Calling the legacy stop() logs a [DEPRECATED] warning
### Fallback
- [ ] D.1 Optuna not installed → all target columns fall back to legacy GridSearchCV
- [ ] D.2 CSV without a target column → error written to the summary, UI does not crash
- [ ] D.3 Nonexistent file → _focus_step jumps to the corresponding step
```
---
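## 6. Known gaps (out of scope for this change)
- [ ] Multiprocess parallelism for Kriging (currently backend="loop" in Python)
- [ ] Memory optimization for Step 5 when radius==0 (whole bands are read in)
- [ ] Sub-step granularity for the progress bar (currently per step only)
- [ ] Step 8 full-image prediction (currently predicts at sampling points only)
- [ ] Project-wide search-and-replace of legacy `self.worker.stop()` calls (this session only changed stop_pipeline in `water_quality_gui.py`)
- [ ] Sync Optuna between `requirements.txt` and `environment.yml`
- [ ] Unit test suite (the `tests/` directory is empty; pytest coverage of train_with_automl / PipelineRunner is suggested)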
---
## 7. Where to look when something breaks
| Symptom | Where to look |
| --------------------------------------------- | ------------------------------------------------------- |
| No `[Runner]` log lines | The `inspect.signature` probe in `worker_thread.py:run()` |
| No `[AutoML]` output at all | Whether the `if use_automl` branch at `modeling_step.py:170` was entered |
| AutoML reports `optuna API 不兼容` | The `try import` at `automl_trainer.py:236` |
| Soft cancel has no effect | The `cancel_event.is_set()` check at the end of `worker_thread.py:run()` |
| Confirmation dialog does not appear | `water_quality_gui.py:on_run_all_clicked`, line ~2848 |
| 9-step output paths misplaced | The `output` field of `PIPELINE_STEPS` in `pipeline/runner.py` |
| The v1 path is taken | `_sync_legacy_attrs_from_context` was not called, or v2 raised an exception |
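
The `inspect.signature` probe named in the first row is a duck-typing check: pass `cancel_event` only to callables that can accept it. The sketch below illustrates the idea with hypothetical helpers `accepts_kwarg` and `call_pipeline`; it is not the code in `worker_thread.py`.

```python
import inspect
from typing import Any, Callable


def accepts_kwarg(func: Callable[..., Any], name: str) -> bool:
    """True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_pipeline(func: Callable[..., Any], cancel_event: Any) -> Any:
    """Call the new-style entry point with cancel_event, or the old one without it."""
    if accepts_kwarg(func, "cancel_event"):
        return func(cancel_event=cancel_event)
    return func()
```

If the probe returns False for the v2 entry point, the worker silently takes the old path, which matches the "no `[Runner]` log lines" symptom.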
---
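> **Author's note**: this checklist corresponds to the acceptance scenario in which **all 4 parts of the Route B one-click fully automated refactor have landed**; the numbering is in sync with todo 8.
> Passing §1 + §2 + §3 counts as MVP acceptance; §4 is a robustness spot check.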

license.lic Normal file

@ -0,0 +1,8 @@
{
"version": "1.0",
"product": "WaterQualityInversion",
"machine_code": "76E4992A5CF08BA570D6150908E04755",
"generated_at": "2026-05-28 14:21:35",
"expiry": "2099-12-31",
"signature": "DC9AB900D7033A281E54F41F3F76D026FFA75D635484D40C7F6FC1F6023E02AB"
}

run_smoke.bat Normal file

@ -0,0 +1,6 @@
@echo off
cd /d "D:\111\office\ZHLduijie\1.WQ\WQ_GUI"
call venv\Scripts\activate.bat
set PYTHONPATH=new\app\api;%PYTHONPATH%
python -c "import _smoke_test_train; _smoke_test_train.test_load_train_df(); _smoke_test_train.test_get_model_pipeline_all_types(); _smoke_test_train.test_run_train_sync_linearregression_fast(); _smoke_test_train.test_run_train_sync_bad_csv(); _smoke_test_train.test_run_train_sync_bad_target(); print('OK')" > %TEMP%\smoke_log.txt 2>&1
type %TEMP%\smoke_log.txt


@ -16,6 +16,15 @@ from src.core.algorithms.glint_detection.detectors import (
remove_shoreline_buffer,
calculate_glint_mask,
)
from src.core.algorithms.qaa.qaas_baseline import QAABaselineSolver
from src.core.algorithms.concentration_inversion import (
ChlorophyllInversion,
CDOMInversion,
TurbidityInversion,
TotalNitrogenInversion,
TotalPhosphorusInversion,
ConcentrationPipeline,
)
__all__ = [
# 插值
@ -33,4 +42,13 @@ __all__ = [
'create_shoreline_buffer',
'remove_shoreline_buffer',
'calculate_glint_mask',
# QAA
'QAABaselineSolver',
# 浓度反演
'ChlorophyllInversion',
'CDOMInversion',
'TurbidityInversion',
'TotalNitrogenInversion',
'TotalPhosphorusInversion',
'ConcentrationPipeline',
]


@ -0,0 +1,670 @@
# -*- coding: utf-8 -*-
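"""
Water-quality concentration inversion module.
Uses the spectral absorption/scattering coefficients (a_lambda, bb_lambda)
output by QAA Step 8 and inverts water-quality parameter concentrations
through bio-optical models.
Main inversion targets:
- Chlorophyll-a (Chl-a): 675 nm absorption-peak method
- Turbidity: backscattering-coefficient method
- CDOM absorption coefficient a_dg(440): exponential-decay method
- Total nitrogen (TN) / total phosphorus (TP): optical-proxy regression framework
References:
- Lee, Z.P. et al. (2002/2010/2014), QAA series
- Bricaud, A. et al. (1998), Limnol. Oceanogr.: chlorophyll-specific absorption coefficient
- Carder, K.L. et al. (1999), Marine Technology Society: CDOM exponential decay
"""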
from __future__ import annotations
import os
from typing import Dict, List, Optional, Tuple, Union
import numpy as np
import pandas as pd
# ------------------------------------------------------------------
# 公共系数表(来自 Bricaud et al. 1998 等文献,内陆水体典型值)
# ------------------------------------------------------------------
# 叶绿素比吸收系数 a*_ph(675) 单位m²/mg
# 随叶绿素浓度范围变化Bricaud 经验值
CHLA_SPECIFIC_ABSORPTION: Dict[str, float] = {
"low": 0.055, # 寡营养水体Chla < 5 mg/m³
"medium": 0.040, # 中营养Chla 5-30 mg/m³
"high": 0.028, # 富营养Chla 30-100 mg/m³
"bloom": 0.020, # 藻华Chla > 100 mg/m³
}
# CDOM 指数衰减斜率 S单位nm⁻¹内陆水体典型范围 0.010-0.025
CDOM_S_LOOKUP: Dict[str, float] = {
"low_turbidity": 0.010, # 清澈寡营养
"medium_turbidity": 0.015, # 中等浊度
"high_turbidity": 0.020, # 高浊度富营养
"bloom": 0.025, # 藻华主导
}
# 纯水吸收系数表400-800nmBabin et al. 2003 简化值单位m⁻¹
PURE_WATER_A: Dict[int, float] = {
400: 0.0064, 410: 0.0066, 420: 0.0068, 430: 0.0072,
440: 0.0080, 450: 0.0092, 460: 0.0105, 470: 0.0120,
480: 0.0135, 490: 0.0155, 500: 0.0175, 510: 0.0200,
520: 0.0230, 530: 0.0270, 540: 0.0315, 550: 0.0370,
560: 0.0435, 570: 0.0510, 580: 0.0600, 590: 0.0710,
600: 0.0830, 610: 0.0960, 620: 0.1110, 630: 0.1280,
640: 0.1470, 650: 0.1680, 660: 0.1920, 670: 0.2180,
675: 0.2450, 680: 0.2750, 690: 0.3100, 700: 0.3500,
710: 0.3950, 720: 0.4450, 730: 0.5000, 740: 0.5600,
750: 0.6250, 760: 0.6950, 770: 0.7700, 780: 0.8500,
790: 0.9300, 800: 1.0100,
}
def _interp_pure_water_a(wavelength: float) -> float:
    """Linearly interpolate the pure-water absorption coefficient (m⁻¹).

    Wavelengths outside the table range are clamped to the nearest endpoint.
    """
    keys = sorted(PURE_WATER_A)
    values = [PURE_WATER_A[k] for k in keys]
    # Interpolate on the float wavelength itself: truncating it with int()
    # skipped interpolation for values such as 675.5 nm.
    return float(np.interp(wavelength, keys, values))
# ------------------------------------------------------------------
# 叶绿素反演器
# ------------------------------------------------------------------
class ChlorophyllInversion:
"""
基于 675nm 吸收峰法的叶绿素 A 浓度反演。
原理:
总吸收 a(675) = a_w(675) + a_ph(675) + a_dg(675)
其中 a_ph(675) 是叶绿素特征吸收峰,
a_dg(675) ≈ a_dg(440) * exp(-S * (675-440))
步骤:
1. 从 a(λ) 减去纯水吸收 a_w(λ)
2. 用线性基线法估算 a_dg(675)baseline(675) = mean[a(665), a(685)]
3. a_ph(675) = a(675) - a_w(675) - baseline(675)
4. Chla = a_ph(675) / a*_ph(675)
Parameters
----------
specific_absorption : float, optional
叶绿素比吸收系数 a*_ph(675),单位 m²/mg。
若为 None使用浓度自适应估算逻辑。
lake_case : str, optional
水体类型标识,用于自动选择比吸收系数,
支持 "oligotrophic_clear" / "medium" / "bloom_dominant" / "turbid_mixed"
"""
def __init__(
self,
specific_absorption: Optional[float] = None,
lake_case: Optional[str] = None
):
self.specific_absorption = specific_absorption
self.lake_case = lake_case or "medium"
def run_inversion(
self,
wavelengths: np.ndarray,
a_lambda: np.ndarray,
bb_lambda: Optional[np.ndarray] = None
) -> Dict:
"""
执行叶绿素 A 反演。
Parameters
----------
wavelengths : np.ndarray
波长数组nm形状 (n_bands,)。
a_lambda : np.ndarray
总吸收系数 a(λ),形状 (n_bands,)。
bb_lambda : np.ndarray, optional
后向散射系数(暂未使用,保留扩展接口)。
Returns
-------
dict
包含键:
- chla_mg_m3 : 叶绿素 A 浓度mg/m³
- a_ph_675 : 675nm 处叶绿素吸收m⁻¹
- baseline_675 : 675nm 处 CDOM+NAP 基线m⁻¹
- a_w_675 : 纯水吸收m⁻¹
"""
wavelengths = np.asarray(wavelengths, dtype=np.float64)
a_lambda = np.asarray(a_lambda, dtype=np.float64)
aw_675 = _interp_pure_water_a(675.0)
wl_arr = wavelengths
a_arr = a_lambda
a_665 = float(np.interp(665, wl_arr, a_arr, left=np.nan, right=np.nan))
a_675 = float(np.interp(675, wl_arr, a_arr, left=np.nan, right=np.nan))
a_685 = float(np.interp(685, wl_arr, a_arr, left=np.nan, right=np.nan))
if not np.isfinite(a_665) or not np.isfinite(a_675) or not np.isfinite(a_685):
return {
"chla_mg_m3": np.nan,
"a_ph_675": np.nan,
"baseline_675": np.nan,
"a_w_675": aw_675,
"warning": "675nm 波段缺失,无法进行叶绿素反演",
}
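        # Remove pure water at 665/685 nm before forming the baseline (step 1 of
        # the docstring); otherwise water absorption is subtracted twice, once in
        # the baseline and once via aw_675, which drives a_ph(675) to zero.
        baseline_675 = (
            (a_665 - _interp_pure_water_a(665.0))
            + (a_685 - _interp_pure_water_a(685.0))
        ) / 2.0
        a_ph_675 = max(a_675 - aw_675 - baseline_675, 0.0)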
if self.specific_absorption is not None:
a_star = self.specific_absorption
else:
a_star = self._adaptive_specific_absorption(a_ph_675)
if a_star <= 0:
return {
"chla_mg_m3": np.nan,
"a_ph_675": a_ph_675,
"baseline_675": baseline_675,
"a_w_675": aw_675,
"warning": "比吸收系数为非正值",
}
chla = a_ph_675 / a_star
return {
"chla_mg_m3": chla,
"a_ph_675": a_ph_675,
"baseline_675": baseline_675,
"a_w_675": aw_675,
}
def _adaptive_specific_absorption(self, a_ph_675: float) -> float:
"""根据 a_ph(675) 量级自适应选择比吸收系数"""
if a_ph_675 < 0.05:
return CHLA_SPECIFIC_ABSORPTION["low"]
elif a_ph_675 < 0.2:
return CHLA_SPECIFIC_ABSORPTION["medium"]
elif a_ph_675 < 0.5:
return CHLA_SPECIFIC_ABSORPTION["high"]
else:
return CHLA_SPECIFIC_ABSORPTION["bloom"]
def invert_to_csv(
self,
input_csv: str,
output_csv: str,
sample_id_col: str = "sample_id"
) -> str:
"""
从 a_lambda_results.csv 批量反演叶绿素并保存结果。
Parameters
----------
input_csv : str
Step 8 输出的 a_lambda_results.csv 路径。
output_csv : str
保存路径。
sample_id_col : str
样本 ID 列名。
Returns
-------
str
输出文件路径。
"""
df = pd.read_csv(input_csv, encoding="utf-8-sig")
df = df.sort_values([sample_id_col, "Wavelength"])
results = []
for sid, group in df.groupby(sample_id_col, sort=False):
wl = group["Wavelength"].values.astype(np.float64)
a = group["a_lambda"].values.astype(np.float64)
res = self.run_inversion(wl, a)
res[sample_id_col] = sid
results.append(res)
out_df = pd.DataFrame(results)
cols = [sample_id_col, "chla_mg_m3", "a_ph_675", "baseline_675", "a_w_675"]
cols = [c for c in cols if c in out_df.columns]
out_df = out_df[cols]
os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)
out_df.to_csv(output_csv, index=False, float_format="%.6f")
return output_csv
# ------------------------------------------------------------------
# CDOM 反演器
# ------------------------------------------------------------------
class CDOMInversion:
"""
基于指数衰减模型的 CDOM 吸收系数反演。
原理:
a_dg(λ) = a_dg(λ₀) * exp(-S * (λ - λ₀))
取 λ₀ = 440nm蓝光峰S 由水体类型决定,
通过 a(550) ≈ a_w(550) + a_dg(550) 反推 a_dg(440)。
Parameters
----------
S : float, optional
CDOM 指数衰减斜率nm⁻¹。若为 None根据 lake_case 自动选择。
reference_wavelength : int
参考波长,默认 440nm。
"""
def __init__(
self,
S: Optional[float] = None,
reference_wavelength: int = 440
):
self.S = S
self.ref_wl = reference_wavelength
def run_inversion(
self,
wavelengths: np.ndarray,
a_lambda: np.ndarray
) -> Dict:
"""
执行 CDOM 反演。
Parameters
----------
wavelengths : np.ndarray
波长数组。
a_lambda : np.ndarray
总吸收系数 a(λ)。
Returns
-------
dict
包含键:
- a_dg_440 : 440nm 处 CDOM 吸收m⁻¹
- S : 使用的衰减斜率
"""
wavelengths = np.asarray(wavelengths, dtype=np.float64)
a_lambda = np.asarray(a_lambda, dtype=np.float64)
if self.S is None:
S = CDOM_S_LOOKUP["medium_turbidity"]
else:
S = self.S
a_440 = float(np.interp(440, wavelengths, a_lambda, left=np.nan, right=np.nan))
a_550 = float(np.interp(550, wavelengths, a_lambda, left=np.nan, right=np.nan))
aw_440 = _interp_pure_water_a(440.0)
aw_550 = _interp_pure_water_a(550.0)
a_dg_550 = max(a_550 - aw_550, 0.0)
delta_wl = 550 - self.ref_wl
a_dg_440 = a_dg_550 * np.exp(S * delta_wl)
return {
"a_dg_440": a_dg_440,
"a_dg_550": a_dg_550,
"S": S,
}
def invert_to_csv(
self,
input_csv: str,
output_csv: str,
sample_id_col: str = "sample_id"
) -> str:
"""从 a_lambda_results.csv 批量反演 CDOM 并保存结果。"""
df = pd.read_csv(input_csv, encoding="utf-8-sig")
df = df.sort_values([sample_id_col, "Wavelength"])
results = []
for sid, group in df.groupby(sample_id_col, sort=False):
wl = group["Wavelength"].values.astype(np.float64)
a = group["a_lambda"].values.astype(np.float64)
res = self.run_inversion(wl, a)
res[sample_id_col] = sid
results.append(res)
out_df = pd.DataFrame(results)
cols = [sample_id_col, "a_dg_440", "a_dg_550", "S"]
cols = [c for c in cols if c in out_df.columns]
out_df = out_df[cols]
os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)
out_df.to_csv(output_csv, index=False, float_format="%.6f")
return output_csv
# ------------------------------------------------------------------
# 浊度反演器
# ------------------------------------------------------------------
class TurbidityInversion:
"""
基于后向散射系数的光学浊度反演。
原理(简化模型):
Turbidity (NTU) ≈ k * b_b(550)
其中 b_b(550) 是 550nm 处的后向散射系数,
k 为经验系数(内陆水体典型值 1.0-3.0)。
Parameters
----------
k : float
经验系数。默认值 2.0。
reference_wavelength : int
参考波段,默认 550nm。
"""
def __init__(self, k: float = 2.0, reference_wavelength: int = 550):
self.k = k
self.ref_wl = reference_wavelength
def run_inversion(
self,
wavelengths: np.ndarray,
bb_lambda: np.ndarray
) -> Dict:
"""
执行浊度反演。
Parameters
----------
wavelengths : np.ndarray
波长数组。
bb_lambda : np.ndarray
后向散射系数 b_b(λ)。
Returns
-------
dict
包含键:
- turbidity_ntu : 浊度NTU
- bb_ref : 参考波段处的 b_b 值
"""
wavelengths = np.asarray(wavelengths, dtype=np.float64)
bb_lambda = np.asarray(bb_lambda, dtype=np.float64)
bb_ref = float(np.interp(
self.ref_wl, wavelengths, bb_lambda, left=np.nan, right=np.nan
))
turbidity = self.k * bb_ref
return {
"turbidity_ntu": turbidity,
"bb_ref": bb_ref,
}
def invert_to_csv(
self,
input_csv: str,
output_csv: str,
sample_id_col: str = "sample_id"
) -> str:
"""从 a_lambda_results.csv 批量反演浊度并保存结果。"""
df = pd.read_csv(input_csv, encoding="utf-8-sig")
if "bb_lambda" not in df.columns:
raise ValueError("输入 CSV 中缺少 bb_lambda 列")
df = df.sort_values([sample_id_col, "Wavelength"])
results = []
for sid, group in df.groupby(sample_id_col, sort=False):
wl = group["Wavelength"].values.astype(np.float64)
bb = group["bb_lambda"].values.astype(np.float64)
res = self.run_inversion(wl, bb)
res[sample_id_col] = sid
results.append(res)
out_df = pd.DataFrame(results)
cols = [sample_id_col, "turbidity_ntu", "bb_ref"]
cols = [c for c in cols if c in out_df.columns]
out_df = out_df[cols]
os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)
out_df.to_csv(output_csv, index=False, float_format="%.6f")
return output_csv
# ------------------------------------------------------------------
# 总氮 / 总磷反演器(光学代理回归框架)
# ------------------------------------------------------------------
class TotalNitrogenInversion:
"""
总氮 (TN) 光学代理回归模型。
框架说明:
TN 与 Chla 之间通常存在正相关R² ≈ 0.5-0.7
本类提供回归框架,实际系数需由实测数据标定。
公式(线性代理):
TN (mg/L) = α * Chla + β * Turbidity + γ
Parameters
----------
alpha : float
Chla 系数。默认 0.05。
beta : float
浊度系数。默认 0.10。
gamma : float
截距。默认 0.20。
"""
def __init__(
self,
alpha: float = 0.05,
beta: float = 0.10,
gamma: float = 0.20
):
self.alpha = alpha
self.beta = beta
self.gamma = gamma
def run_inversion(
self,
chla_mg_m3: float,
turbidity_ntu: float
) -> Dict:
"""执行总氮反演(光学代理法)。"""
tn = self.alpha * chla_mg_m3 + self.beta * turbidity_ntu + self.gamma
return {"tn_mg_L": tn}
def calibrate(
self,
samples: List[Dict]
) -> None:
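        """
        Calibrate the regression coefficients from measured samples.
        Parameters
        ----------
        samples : list[dict]
            Sample list; each item contains the keys 'chla', 'turbidity', 'tn'.
        """
        try:
            # Fit alpha, beta and the intercept gamma jointly: the column of ones
            # makes this an ordinary least-squares fit with intercept, instead of
            # a no-intercept fit followed by a mean-residual offset.
            X = np.array(
                [[s["chla"], s["turbidity"], 1.0] for s in samples],
                dtype=np.float64,
            )
            y = np.array([s["tn"] for s in samples], dtype=np.float64)
            coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
            self.alpha, self.beta, self.gamma = (float(c) for c in coeffs)
        except Exception as e:
            raise RuntimeError(f"标定失败: {e}") from e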
class TotalPhosphorusInversion:
"""
总磷 (TP) 光学代理回归模型。
框架说明:
TP 与 Chla / 浊度均相关(湖泊富营养化阶段尤为明显),
提供双变量线性回归框架,实际系数需由实测数据标定。
公式(线性代理):
TP (mg/L) = α * Chla + β * Turbidity + γ
Parameters
----------
alpha : float
Chla 系数。默认 0.002。
beta : float
浊度系数。默认 0.005。
gamma : float
截距。默认 0.010。
"""
def __init__(
self,
alpha: float = 0.002,
beta: float = 0.005,
gamma: float = 0.010
):
self.alpha = alpha
self.beta = beta
self.gamma = gamma
def run_inversion(
self,
chla_mg_m3: float,
turbidity_ntu: float
) -> Dict:
"""执行总磷反演(光学代理法)。"""
tp = self.alpha * chla_mg_m3 + self.beta * turbidity_ntu + self.gamma
return {"tp_mg_L": tp}
def calibrate(
self,
samples: List[Dict]
) -> None:
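        """Calibrate the regression coefficients from measured samples.

        Each item in ``samples`` contains the keys 'chla', 'turbidity', 'tp'.
        """
        try:
            # Fit alpha, beta and the intercept gamma jointly: the column of ones
            # makes this an ordinary least-squares fit with intercept.
            X = np.array(
                [[s["chla"], s["turbidity"], 1.0] for s in samples],
                dtype=np.float64,
            )
            y = np.array([s["tp"] for s in samples], dtype=np.float64)
            coeffs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
            self.alpha, self.beta, self.gamma = (float(c) for c in coeffs)
        except Exception as e:
            raise RuntimeError(f"标定失败: {e}") from e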
# ------------------------------------------------------------------
# 一站式浓度反演流水线
# ------------------------------------------------------------------
class ConcentrationPipeline:
"""
整合 Chlorophyll / CDOM / Turbidity / TN / TP 反演的一站式流水线。
接收 Step 8 输出的 a_lambda_results.csv
输出 final_concentrations.csv含所有水质参数浓度列
Parameters
----------
lake_case : str, optional
水体类型,用于 Chla 比吸收系数自适应选择。
S_cdom : float, optional
CDOM 衰减斜率(若为 None自动选择
k_turbidity : float
浊度经验系数。
tn_params : dict, optional
总氮反演初始参数。
tp_params : dict, optional
总磷反演初始参数。
"""
def __init__(
self,
lake_case: str = "medium",
S_cdom: Optional[float] = None,
k_turbidity: float = 2.0,
tn_params: Optional[Dict] = None,
tp_params: Optional[Dict] = None,
):
self.lake_case = lake_case
self.chla_inv = ChlorophyllInversion(lake_case=lake_case)
self.cdom_inv = CDOMInversion(S=S_cdom)
self.turb_inv = TurbidityInversion(k=k_turbidity)
self.tn_inv = TotalNitrogenInversion(**(tn_params or {}))
self.tp_inv = TotalPhosphorusInversion(**(tp_params or {}))
def run_pipeline(
self,
input_csv: str,
output_csv: str,
sample_id_col: str = "sample_id"
) -> str:
"""
执行完整浓度反演流水线。
Parameters
----------
input_csv : str
Step 8 输出的 a_lambda_results.csv 路径。
output_csv : str
输出 final_concentrations.csv 路径。
sample_id_col : str
样本 ID 列名。
Returns
-------
str
输出文件路径。
"""
df = pd.read_csv(input_csv, encoding="utf-8-sig")
if "bb_lambda" not in df.columns:
df["bb_lambda"] = np.nan
# ── 保留原始坐标列:按 sample_id 取第一条记录的非光谱列 ───────────
wl_col = "Wavelength"
coord_meta_cols = [c for c in df.columns if c not in (sample_id_col, wl_col, "a_lambda", "bb_lambda")]
coord_df = df.groupby(sample_id_col, sort=False)[coord_meta_cols].first().reset_index()
df = df.sort_values([sample_id_col, "Wavelength"])
results = []
for sid, group in df.groupby(sample_id_col, sort=False):
wl = group["Wavelength"].values.astype(np.float64)
a = group["a_lambda"].values.astype(np.float64)
bb = group["bb_lambda"].values.astype(np.float64) \
if "bb_lambda" in group.columns and group["bb_lambda"].notna().any() \
else None
chla_res = self.chla_inv.run_inversion(wl, a)
cdom_res = self.cdom_inv.run_inversion(wl, a)
if bb is not None and np.any(np.isfinite(bb)):
turb_res = self.turb_inv.run_inversion(wl, bb)
else:
turb_res = {"turbidity_ntu": np.nan, "bb_ref": np.nan}
chla_val = chla_res.get("chla_mg_m3", np.nan)
turb_val = turb_res.get("turbidity_ntu", np.nan)
tn_res = self.tn_inv.run_inversion(chla_val, turb_val)
tp_res = self.tp_inv.run_inversion(chla_val, turb_val)
row = {
sample_id_col: sid,
"Chla_mg_m3": chla_val,
"a_ph_675_m1": chla_res.get("a_ph_675", np.nan),
"CDOM_a_dg_440_m1": cdom_res.get("a_dg_440", np.nan),
"Turbidity_NTU": turb_val,
"TN_mg_L": tn_res.get("tn_mg_L", np.nan),
"TP_mg_L": tp_res.get("tp_mg_L", np.nan),
}
results.append(row)
out_df = pd.DataFrame(results)
# ── 将原始坐标列按 sample_id 合并到浓度结果左侧 ───────────────────
if not coord_df.empty and sample_id_col in coord_df.columns:
out_df = coord_df.merge(out_df, on=sample_id_col, how="left")
os.makedirs(os.path.dirname(output_csv) or ".", exist_ok=True)
out_df.to_csv(output_csv, index=False, float_format="%.6f")
return output_csv


@ -3,8 +3,24 @@
提供对影像中所有波段都为0的像素点进行插值的核心数学逻辑。
支持多种插值方法nearest, bilinear, spline (RBF), kriging。
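This module uses multiprocess-parallel, block-wise I/O for speed (Plan A):
- ProcessPoolExecutor opens the source image once per worker process (in the initializer),
  avoiding the cost of a repeated gdal.Open for every block (~50 ms each on Windows)
- The main process alone writes the output file, avoiding write-lock contention between processes
- The block size (block_size) defaults to 1024; with enough memory it can be raised to 2048 / 4096
Notes:
- GDAL Dataset / Rasterio Dataset objects cannot be passed across processes (they cannot be pickled),
  so each worker must open the source file itself during init
- Each worker forces ``GDAL_NUM_THREADS=1`` to avoid the CPU oversubscription caused by
  8 workers × GDAL multithreading
- To disable multiprocessing, pass ``use_multiprocessing=False`` or ``n_workers=1``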
"""
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
import numpy as np
from typing import Optional, Union, Tuple, List
from pathlib import Path
@ -24,6 +40,9 @@ except ImportError:
GDAL_AVAILABLE = False
_worker_dataset: Optional["gdal.Dataset"] = None
def interpolate_pixels(
image_stack: np.ndarray,
zero_coords: np.ndarray,
@ -52,7 +71,6 @@ def interpolate_pixels(
height, width, n_bands = image_stack.shape
result = image_stack.copy()
# 兼容中文和各种格式的method参数
raw_method = str(interpolation_method).lower()
if 'nearest' in raw_method or '邻近' in raw_method or '最邻近' in raw_method:
method = 'nearest'
@ -181,39 +199,271 @@ def _interpolate_single_band(
return np.zeros(len(zero_coords))
def _normalize_interpolation_method(method: str) -> str:
"""将中文/英文混用的插值方法名归一化为内部标准名
支持: 'nearest'/'邻近'/'最邻近''bilinear'/'线性'/'双线性'
'spline'/'样条'/'rbf''kriging'/'克里金'
"""
raw = str(method).lower()
if 'nearest' in raw or '邻近' in raw or '最邻近' in raw:
return 'nearest'
if 'bilinear' in raw or '线性' in raw or '双线性' in raw:
return 'bilinear'
if 'spline' in raw or '样条' in raw or 'rbf' in raw:
return 'spline'
if 'kriging' in raw or '克里金' in raw:
return 'kriging'
return 'nearest'
def _read_water_mask_to_array(
water_mask: Optional[Union[str, np.ndarray]],
expected_height: int,
expected_width: int,
) -> Optional[np.ndarray]:
"""读取水域掩膜为 numpy 数组单波段bool/int 均可)
None 或空字符串直接返回 None。形状不匹配时给出告警但不抛错
让调用方按"无掩膜"路径继续。
"""
if water_mask is None:
return None
if isinstance(water_mask, str):
if not water_mask.strip():
return None
mask_ds = gdal.Open(water_mask, gdal.GA_ReadOnly)
if mask_ds is None:
print(f" [warn] 无法打开水域掩膜 {water_mask},按无掩膜处理")
return None
try:
mask_array = mask_ds.GetRasterBand(1).ReadAsArray()
finally:
mask_ds = None
elif isinstance(water_mask, np.ndarray):
mask_array = water_mask
else:
return None
if mask_array.shape != (expected_height, expected_width):
print(
f" [warn] 水域掩膜形状 {mask_array.shape} 与影像 "
f"({expected_height}, {expected_width}) 不匹配,按无掩膜处理"
)
return None
return mask_array
def _init_worker(img_path: str) -> None:
"""ProcessPoolExecutor initializer: 每个 worker 进程只调用一次
在 worker 进程启动时打开源影像 dataset 并缓存在模块全局变量
``_worker_dataset`` 中。后续所有块处理直接复用这个 dataset
避免每块重复 ``gdal.Open``Windows 上约 50ms/次100 块即 5s
同时设置 ``GDAL_NUM_THREADS=1``,避免 8 worker × GDAL 默认多线程
造成的 CPU 过订阅。
"""
global _worker_dataset
gdal.SetConfigOption('GDAL_NUM_THREADS', '1')
if hasattr(gdal, 'UseExceptions'):
gdal.UseExceptions()
_worker_dataset = gdal.Open(img_path, gdal.GA_ReadOnly)
if _worker_dataset is None:
raise RuntimeError(f"Worker failed to open source image: {img_path}")
def _interpolate_block_worker(task: tuple) -> tuple:
"""ProcessPoolExecutor worker: 处理单个块并返回结果
该函数必须保持模块级(可被 pickle不持有任何外部状态——
源 dataset 通过 ``_worker_dataset`` 模块全局变量获取。
Returns:
``(x0, y0, inner_bands, zero_count, error_msg)`` 元组:
- x0, y0: 块在影像中的写入起点
- inner_bands: ``List[np.ndarray]``,每个元素是 (inner_h, inner_w)
float32 数组(每个波段一个),或失败时为 None
- zero_count: 该扩展块中识别到的零像素数(含 halo 范围)
- error_msg: None 表示成功str 表示错误信息
"""
(
x0, y0, ey0, ex0, ey1, ex1,
row_offset, col_offset, inner_h, inner_w,
mask_segment_ext, method,
) = task
if _worker_dataset is None:
return (x0, y0, None, 0, "Worker dataset not initialized")
try:
inner_bands, zero_count = _process_one_block(
_worker_dataset, x0, y0, ey0, ex0, ey1, ex1,
row_offset, col_offset, inner_h, inner_w,
mask_segment_ext, method,
)
return (x0, y0, inner_bands, zero_count, None)
except Exception as e:
return (x0, y0, None, 0, str(e))
def _process_one_block(
dataset: "gdal.Dataset",
x0: int, y0: int,
ey0: int, ex0: int, ey1: int, ex1: int,
row_offset: int, col_offset: int,
inner_h: int, inner_w: int,
mask_segment_ext: Optional[np.ndarray],
method: str,
) -> Tuple[List[np.ndarray], int]:
"""处理单个扩展块纯计算核心dataset 显式传入)
串行模式和并行模式共用此函数。并行模式下 dataset 来自 worker 的
缓存(``_worker_dataset``),串行模式下 dataset 由主函数传入。
Args:
dataset: 已打开的源影像 dataset
x0, y0: 内部块左上角(写入位置)
ey0, ex0, ey1, ex1: 扩展块(含 halo坐标
row_offset, col_offset: 内部块在扩展块中的偏移
inner_h, inner_w: 内部块尺寸
mask_segment_ext: 扩展块对应的水域掩膜None 表示不应用)
method: 插值方法(已归一化)
Returns:
``(inner_bands, zero_count)`` 元组:
- inner_bands: ``List[np.ndarray]``,长度 = n_bands每个元素形状为
``(inner_h, inner_w)`` 的 float32 数组
- zero_count: 扩展块中识别到的零像素数
"""
n_bands = dataset.RasterCount
ext_bands: List[np.ndarray] = []
for b in range(1, n_bands + 1):
band = dataset.GetRasterBand(b)
ext_bands.append(
band.ReadAsArray(ex0, ey0, ex1 - ex0, ey1 - ey0).astype(np.float32)
)
band = None
try:
ext_h, ext_w = ey1 - ey0, ex1 - ex0
all_zero_ext = np.ones((ext_h, ext_w), dtype=bool)
for b_data in ext_bands:
all_zero_ext &= (b_data == 0)
if mask_segment_ext is not None:
all_zero_ext &= (mask_segment_ext > 0)
zero_count = int(np.sum(all_zero_ext))
if zero_count == 0:
inner_bands = [
ext_bands[b][
row_offset:row_offset + inner_h,
col_offset:col_offset + inner_w,
]
for b in range(n_bands)
]
return inner_bands, 0
zero_y, zero_x = np.where(all_zero_ext)
zero_coords = np.column_stack([zero_x, zero_y])
valid_mask = ~all_zero_ext
valid_y, valid_x = np.where(valid_mask)
valid_coords = np.column_stack([valid_x, valid_y])
if len(valid_coords) == 0:
print(
f" [warn] 块 (y={y0}-{y0 + inner_h}, x={x0}-{x0 + inner_w}) "
f"无有效像素可作插值上下文,已跳过"
)
inner_bands = [
ext_bands[b][
row_offset:row_offset + inner_h,
col_offset:col_offset + inner_w,
]
for b in range(n_bands)
]
return inner_bands, zero_count
for b in range(n_bands):
ext_band = ext_bands[b]
valid_values_band = ext_band[valid_mask]
if len(valid_values_band) == 0:
continue
band_result = _interpolate_single_band(
zero_coords, valid_coords, valid_values_band, method
)
ext_band[zero_y, zero_x] = band_result
inner_bands = [
ext_bands[b][
row_offset:row_offset + inner_h,
col_offset:col_offset + inner_w,
]
for b in range(n_bands)
]
return inner_bands, zero_count
finally:
del ext_bands
def interpolate_zero_pixels_batch(
img_path: str,
interpolation_method: str = 'nearest',
output_path: Optional[str] = None,
water_mask: Optional[Union[str, np.ndarray]] = None,
deglint_dir: Optional[str] = None,
callback_progress: Optional[callable] = None
callback_progress: Optional[callable] = None,
block_size: int = 1024,
halo_size: int = 64,
n_workers: Optional[int] = None,
use_multiprocessing: bool = True,
) -> Tuple[str, Optional[np.ndarray]]:
"""
对影像中所有波段都为0的像素点进行插值完整流程含文件I/O
对影像中所有波段都为0的像素点进行插值完整流程含文件I/O
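    Uses a **block-wise I/O + multiprocess parallel** strategy:
    1. The image is split into ``block_size`` × ``block_size`` blocks; each block is extended by
       ``halo_size`` pixels beyond its border as interpolation context, so interpolation does not
       degrade at block edges
    2. All blocks are processed concurrently by multiple processes (``ProcessPoolExecutor`` by default;
       when ``n_workers`` is None the worker count is the CPU core count, capped at 6). A GDAL Dataset
       cannot be passed across processes, so each worker opens the source file once in the
       ``initializer`` stage and caches it
    3. The main process receives results in block order and writes the output file itself,
       avoiding write-lock contention
    4. This avoids the OOM risk of reading a whole 50-band scene at once
       (50 bands × 4000×4000 × float32 ≈ 3 GB for the np.dstack)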
Args:
img_path: 输入影像文件路径
interpolation_method: 插值方法,支持 'nearest', 'bilinear', 'spline', 'kriging'
output_path: 输出文件路径如果为None自动生成
water_mask: 水域掩膜(文件路径或数组
interpolation_method: 插值方法,支持 'nearest', 'bilinear', 'spline',
'kriging' 及其中文别名('邻近'/'最邻近'/'线性'/'双线性'/'样条'/'克里金'
output_path: 输出文件路径(如果为 None 且 deglint_dir 提供,自动生成
water_mask: 水域掩膜(文件路径或数组),形状须与影像高宽一致
deglint_dir: 去耀斑目录(用于生成默认输出路径)
callback_progress: 进度回调函数
callback_progress: 进度回调函数,签名 ``callback(msg: str)``
block_size: 分块大小(像素),默认 1024内存充足可调 2048/4096
halo_size: 上下文 halo 宽度(像素),默认 64
        n_workers: number of parallel worker processes; None = ``min(6, multiprocessing.cpu_count())``;
            passing 1 is equivalent to serial mode
use_multiprocessing: 是否启用多进程False 时强制串行
Returns:
(output_path, interpolated_image_stack) 元组
``(output_path, None)`` 元组。第二个值固定为 ``None``(与原版语义保留
兼容;返回完整内存堆叠会重新引入 OOM 风险,故不再提供)。
"""
if not SCIPY_AVAILABLE:
raise ImportError("scipy未安装无法进行0值像素插值")
if not GDAL_AVAILABLE:
raise ImportError("GDAL未安装无法读取影像文件")
# 确定输出路径
if output_path is None and deglint_dir is not None:
output_path = str(Path(deglint_dir) / f"interpolated_{interpolation_method}.bsq")
method = _normalize_interpolation_method(interpolation_method)
# 检查文件是否已存在
if output_path and Path(output_path).exists():
if output_path is None and deglint_dir is not None:
output_path = str(Path(deglint_dir) / f"interpolated_{method}.bsq")
if output_path is None:
raise ValueError("output_path 和 deglint_dir 至少需要指定一个")
if Path(output_path).exists():
return output_path, None
dataset = gdal.Open(img_path, gdal.GA_ReadOnly)
@ -227,94 +477,126 @@ def interpolate_zero_pixels_batch(
geotransform = dataset.GetGeoTransform()
projection = dataset.GetProjection()
# 读取所有波段数据
all_bands = []
for band_idx in range(1, n_bands + 1):
band = dataset.GetRasterBand(band_idx)
band_data = band.ReadAsArray().astype(np.float32)
all_bands.append(band_data)
image_stack = np.dstack(all_bands)
# 读取水域掩膜
mask_array = None
if water_mask is not None:
if isinstance(water_mask, str):
mask_dataset = gdal.Open(water_mask, gdal.GA_ReadOnly)
if mask_dataset:
mask_array = mask_dataset.GetRasterBand(1).ReadAsArray()
mask_dataset = None
elif isinstance(water_mask, np.ndarray):
mask_array = water_mask
# 找出所有波段都为0的像素点
all_bands_zero = np.all(image_stack == 0, axis=2)
if mask_array is not None:
all_bands_zero = all_bands_zero & (mask_array > 0)
zero_pixel_count = np.sum(all_bands_zero)
if zero_pixel_count == 0:
# 无需插值,直接保存
if output_path:
driver = gdal.GetDriverByName('ENVI')
if driver is None:
driver = gdal.GetDriverByName('GTiff')
out_dataset = driver.Create(output_path, width, height, n_bands, gdal.GDT_Float32)
out_dataset.SetGeoTransform(geotransform)
out_dataset.SetProjection(projection)
for i, band_data in enumerate(all_bands):
out_band = out_dataset.GetRasterBand(i + 1)
out_band.WriteArray(band_data)
out_band.FlushCache()
out_dataset = None
return output_path, image_stack
# 获取坐标
zero_y, zero_x = np.where(all_bands_zero)
zero_coords = np.column_stack([zero_x, zero_y])
valid_mask = ~all_bands_zero
valid_y, valid_x = np.where(valid_mask)
valid_coords = np.column_stack([valid_x, valid_y])
if len(valid_coords) == 0:
raise ValueError("没有有效像素可用于插值")
# 逐波段插值
interpolated_bands = []
for band_idx in range(n_bands):
if callback_progress:
callback_progress(f"处理波段 {band_idx + 1}/{n_bands}...")
band_data = all_bands[band_idx].copy()
valid_values_band = band_data[valid_mask]
if len(valid_values_band) == 0:
interpolated_bands.append(band_data)
continue
band_result = _interpolate_single_band(
zero_coords, valid_coords, valid_values_band, interpolation_method
if width <= 0 or height <= 0 or n_bands <= 0:
raise ValueError(
f"影像尺寸异常: width={width}, height={height}, n_bands={n_bands}"
)
band_data[all_bands_zero] = band_result
interpolated_bands.append(band_data)
# 保存结果
if output_path:
mask_array = _read_water_mask_to_array(water_mask, height, width)
driver = gdal.GetDriverByName('ENVI')
if driver is None:
driver = gdal.GetDriverByName('GTiff')
out_dataset = driver.Create(output_path, width, height, n_bands, gdal.GDT_Float32)
if driver is None:
raise RuntimeError("No usable raster driver found (neither ENVI nor GTiff is available)")
out_dataset = driver.Create(
output_path, width, height, n_bands, gdal.GDT_Float32
)
if out_dataset is None:
raise RuntimeError(f"无法创建输出文件: {output_path}")
out_dataset.SetGeoTransform(geotransform)
out_dataset.SetProjection(projection)
for i, band_data in enumerate(interpolated_bands):
out_band = out_dataset.GetRasterBand(i + 1)
out_band.WriteArray(band_data)
out_band.FlushCache()
try:
if not use_multiprocessing:
effective_workers = 1
elif n_workers is not None and n_workers >= 1:
effective_workers = int(n_workers)
else:
try:
cpu_count = multiprocessing.cpu_count() or 1
except (NotImplementedError, OSError):
cpu_count = 1
# 为了内存安全,强制将物理进程数限制在最高 6 个
effective_workers = min(6, max(1, cpu_count))
n_blocks_y = (height + block_size - 1) // block_size
n_blocks_x = (width + block_size - 1) // block_size
total_blocks = n_blocks_y * n_blocks_x
tasks = []
for by in range(n_blocks_y):
y0 = by * block_size
y1 = min(y0 + block_size, height)
inner_h = y1 - y0
ey0 = max(0, y0 - halo_size)
ey1 = min(height, y1 + halo_size)
for bx in range(n_blocks_x):
x0 = bx * block_size
x1 = min(x0 + block_size, width)
inner_w = x1 - x0
ex0 = max(0, x0 - halo_size)
ex1 = min(width, x1 + halo_size)
row_offset = y0 - ey0
col_offset = x0 - ex0
mask_segment_ext = None
if mask_array is not None:
mask_segment_ext = mask_array[ey0:ey1, ex0:ex1].copy()
tasks.append((
x0, y0, ey0, ex0, ey1, ex1,
row_offset, col_offset, inner_h, inner_w,
mask_segment_ext, method,
))
if callback_progress:
callback_progress(
f"Block interpolation started: {total_blocks} blocks "
f"(block_size={block_size}, halo={halo_size}, method={method}, "
f"workers={effective_workers})"
)
total_zero_pixels = 0
if effective_workers <= 1:
for block_idx, task in enumerate(tasks, 1):
x0_t, y0_t = task[0], task[1]
if callback_progress:
callback_progress(
f"Block {block_idx}/{total_blocks} "
f"y=[{y0_t},{y0_t + task[8]}) x=[{x0_t},{x0_t + task[9]})"
)
inner_bands, zero_count = _process_one_block(
dataset, *task
)
for b_idx, band_data in enumerate(inner_bands):
out_dataset.GetRasterBand(b_idx + 1).WriteArray(
band_data, xoff=x0_t, yoff=y0_t
)
total_zero_pixels += zero_count
else:
with ProcessPoolExecutor(
max_workers=effective_workers,
initializer=_init_worker,
initargs=(img_path,),
) as executor:
futures = [
executor.submit(_interpolate_block_worker, task)
for task in tasks
]
for block_idx, future in enumerate(futures, 1):
x0_t, y0_t, inner_bands, zero_count, error = future.result()
if error is not None:
raise RuntimeError(
f"块 (y={y0_t}, x={x0_t}) 处理失败: {error}"
)
if inner_bands is not None:
for b_idx, band_data in enumerate(inner_bands):
out_dataset.GetRasterBand(b_idx + 1).WriteArray(
band_data, xoff=x0_t, yoff=y0_t
)
total_zero_pixels += zero_count
if callback_progress:
callback_progress(f"已写入块 {block_idx}/{total_blocks}")
if callback_progress:
callback_progress(
f"Block interpolation finished: {total_zero_pixels} zero pixels processed "
f"in {total_blocks} blocks, method={method}, workers={effective_workers}"
)
return output_path, None
finally:
out_dataset = None
result_stack = np.dstack(interpolated_bands)
return output_path, result_stack
finally:
dataset = None
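
The block loop above pads each block with a halo so that interpolation near a block edge can see neighbouring valid pixels, and then writes back only the inner window. A minimal sketch of that index arithmetic in plain Python, assuming a hypothetical helper name `iter_halo_blocks` (not part of the source) and the same `block_size` / `halo_size` semantics:

```python
from typing import Iterator, Tuple


def iter_halo_blocks(
    height: int, width: int, block_size: int, halo_size: int
) -> Iterator[Tuple[int, int, int, int, int, int, int, int]]:
    """Yield (y0, y1, x0, x1, ey0, ey1, ex0, ex1) for every block.

    (y0:y1, x0:x1) is the inner window that gets written back;
    (ey0:ey1, ex0:ex1) is the halo-extended window that gets read.
    Hypothetical helper for illustration only.
    """
    n_blocks_y = (height + block_size - 1) // block_size  # ceiling division
    n_blocks_x = (width + block_size - 1) // block_size
    for by in range(n_blocks_y):
        y0 = by * block_size
        y1 = min(y0 + block_size, height)
        ey0 = max(0, y0 - halo_size)        # clamp the halo at the image edge
        ey1 = min(height, y1 + halo_size)
        for bx in range(n_blocks_x):
            x0 = bx * block_size
            x1 = min(x0 + block_size, width)
            ex0 = max(0, x0 - halo_size)
            ex1 = min(width, x1 + halo_size)
            yield y0, y1, x0, x1, ey0, ey1, ex0, ex1


blocks = list(iter_halo_blocks(height=5, width=7, block_size=4, halo_size=1))
for y0, y1, x0, x1, ey0, ey1, ex0, ex1 in blocks:
    # Offsets of the inner window inside the extended window.
    row_offset, col_offset = y0 - ey0, x0 - ex0
    print((y0, y1, x0, x1), (ey0, ey1, ex0, ex1), (row_offset, col_offset))
```

The inner windows tile the image exactly once, so blocks can be written in any order; only the extended windows overlap.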

@ -0,0 +1,7 @@
# -*- coding: utf-8 -*-
"""
QAA 准解析反演算法模块
"""
from src.core.algorithms.qaa.qaas_baseline import QAABaselineSolver
__all__ = ['QAABaselineSolver']

@ -0,0 +1,345 @@
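# -*- coding: utf-8 -*-
"""
QAA (Quasi-Analytical Algorithm) baseline solver (QAABaselineSolver)
Implements the core QAA-v5 / QAA-v6 steps:
1. Rrs(λ) → r_rs(λ) (conversion to below-surface remote-sensing reflectance)
2. Compute the intermediate variable u(λ) (ratio of inherent optical properties)
3. Look up pure-water absorption aw(λ₀) and backscattering bbw(λ₀) at the anchor λ₀
4. Estimate b_b(λ) for all bands (backscattering coefficient)
5. Derive a(λ) for all bands (total absorption coefficient)
References: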
- Lee, Z.P. et al. (2002) JGR-Oceans, 107(C4), 9-1~9-18 (QAA-v4)
- Lee, Z.P. et al. (2010) Applied Optics, 49(4), 617-623 (QAA-v5)
- Lee, Z.P. et al. (2014) Applied Optics, 53(4), 598-611 (QAA-v6)
"""
import os
import warnings
from typing import Optional, Union, Tuple
import numpy as np
import pandas as pd
class QAABaselineSolver:
    """
    Baseline solver for the Quasi-Analytical Algorithm (QAA).

    Parameters
    ----------
    pure_water_csv : str, optional
        Path to the pure-water IOPs table. Defaults to src/utils/pure_water_iops.csv.
    qaa_version : str, default "QAA-v6"
        Algorithm version; "QAA-v5" and "QAA-v6" are supported.

    Attributes
    ----------
    iops_df : pd.DataFrame
        Pure-water IOPs table with three columns: Wavelength / aw / bbw.
    """

    def __init__(
        self,
        pure_water_csv: Optional[str] = None,
        qaa_version: str = "QAA-v6"
    ):
        if pure_water_csv is None:
            # Three levels up from this file is src/; the table lives in src/utils/.
            utils_dir = os.path.abspath(
                os.path.join(os.path.dirname(__file__), '..', '..', '..', 'utils')
            )
            pure_water_csv = os.path.join(utils_dir, 'pure_water_iops.csv')
        if not os.path.exists(pure_water_csv):
            raise FileNotFoundError(f"Pure-water IOPs table not found: {pure_water_csv}")
        self.iops_df = pd.read_csv(pure_water_csv)
        self.qaa_version = qaa_version

    # ------------------------------------------------------------------
    # Core QAA steps
    # ------------------------------------------------------------------
    @staticmethod
    def _rrs_to_rrs_subsurface(rrs: np.ndarray) -> np.ndarray:
        """
        Convert above-surface remote-sensing reflectance Rrs to the
        below-surface remote-sensing reflectance r_rs.

        Conversion formula (Lee et al. 1999):
            r_rs = Rrs / (0.52 + 1.7 * Rrs)

        Parameters
        ----------
        rrs : np.ndarray
            Above-surface remote-sensing reflectance Rrs, shape (N,) or (N, n_bands).

        Returns
        -------
        np.ndarray
            Below-surface remote-sensing reflectance r_rs.
        """
        rrs = np.asarray(rrs, dtype=np.float64)
        denom = 0.52 + 1.7 * rrs
        with np.errstate(divide='ignore', invalid='ignore'):
            result = rrs / denom
        # Non-finite values (e.g. from a zero denominator) become NaN.
        result[~np.isfinite(result)] = np.nan
        return result

    @staticmethod
    def _compute_u(rrs_subsurface: np.ndarray) -> np.ndarray:
        """
        Compute the intermediate variable u = b_b / (a + b_b).

        QAA-v5/v6 empirical relation (Lee et al. 2002), as implemented here:
            u = r_rs / (0.5 * r_rs + sqrt(0.25 * r_rs^2 + 0.1 * r_rs))

        Parameters
        ----------
        rrs_subsurface : np.ndarray
            Below-surface remote-sensing reflectance r_rs.

        Returns
        -------
        np.ndarray
            u values, in the range [0, 1).
        """
        rs = np.asarray(rrs_subsurface, dtype=np.float64)
        with np.errstate(divide='ignore', invalid='ignore'):
            result = rs / (0.5 * rs + np.sqrt(0.25 * rs ** 2 + 0.1 * rs))
        # r_rs = 0 gives 0/0, and negative r_rs can give sqrt of a negative number.
        result[~np.isfinite(result)] = np.nan
        return result

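    def _get_pure_water_iops(self, wavelength: Union[int, float]) -> Tuple[float, float]:
        """
        Interpolate aw and bbw from the pure-water IOPs table at a given wavelength.

        Parameters
        ----------
        wavelength : float
            Wavelength (nm); should lie within 400-800 nm.

        Returns
        -------
        (aw, bbw) : tuple
            Pure-water absorption coefficient (m^-1) and backscattering coefficient (m^-1).
        """
        df = self.iops_df
        wl_arr = df['Wavelength'].values
        aw_arr = df['aw'].values
        bbw_arr = df['bbw'].values
        # np.interp expects the Wavelength column to be increasing and clamps
        # to the end values for wavelengths outside the table range.
        aw = float(np.interp(wavelength, wl_arr, aw_arr))
        bbw = float(np.interp(wavelength, wl_arr, bbw_arr))
        return aw, bbw
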
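    @staticmethod
    def _compute_bb(
        u: np.ndarray,
        bbw_0: float,
        wavelength: np.ndarray,
        lambda_0: int
    ) -> np.ndarray:
        """
        Estimate the backscattering coefficient b_b(λ) for all bands.

        As implemented, the empirical spectral shape (Lee et al. 2002, QAA-v4)
        is scaled by u / (1 - u):
            b_b(λ) = u(λ) / (1 - u(λ)) * b_bw(λ₀) * (λ₀ / λ)^S
        where S is an empirical spectral slope: 0.5 when λ₀ < 600 nm,
        otherwise 0.0. Negative results are clipped to 0.

        Parameters
        ----------
        u : np.ndarray
            Intermediate variable u.
        bbw_0 : float
            Pure-water backscattering coefficient at λ₀.
        wavelength : np.ndarray
            Wavelength array covering all bands.
        lambda_0 : int
            Reference wavelength (anchor).

        Returns
        -------
        np.ndarray
            Backscattering coefficient b_b for all bands.
        """
        S = 0.5 if lambda_0 < 600 else 0.0
        wavelength = np.asarray(wavelength, dtype=np.float64)
        ratio = (float(lambda_0) / wavelength) ** S
        bb = u * bbw_0 / (1.0 - u) * ratio
        bb = np.maximum(bb, 0.0)
        return bb
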
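    @staticmethod
    def _compute_a(
        u: np.ndarray,
        aw_0: float,
        bbw_0: float,
        wavelength: np.ndarray,
        lambda_0: int
    ) -> np.ndarray:
        """
        Derive the total absorption coefficient a(λ) for all bands.

        From u = b_b / (a + b_b) it follows that
            a = b_b * (1 - u) / u
        As implemented, b_b is taken as the pure-water term
        b_bw(λ₀) * (λ₀ / λ)^S and the pure-water absorption at λ₀ is added:
            a(λ) = b_bw(λ₀) * (λ₀ / λ)^S * (1 - u) / u + a_w(λ₀)

        Parameters
        ----------
        u : np.ndarray
            Intermediate variable u.
        aw_0 : float
            Pure-water absorption coefficient at λ₀.
        bbw_0 : float
            Pure-water backscattering coefficient at λ₀.
        wavelength : np.ndarray
            Wavelength array covering all bands.
        lambda_0 : int
            Reference wavelength (anchor).

        Returns
        -------
        np.ndarray
            Total absorption coefficient a for all bands.
        """
        S = 0.5 if lambda_0 < 600 else 0.0
        wavelength = np.asarray(wavelength, dtype=np.float64)
        ratio = (float(lambda_0) / wavelength) ** S
        bbw = bbw_0 * ratio
        with np.errstate(divide='ignore', invalid='ignore'):
            a = bbw * (1.0 - u) / u + aw_0
        # u = 0 or u = NaN yields a non-finite value; report it as NaN.
        a[~np.isfinite(a)] = np.nan
        return a
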
    # ------------------------------------------------------------------
    # Main entry point
    # ------------------------------------------------------------------
    def run_inversion(
        self,
        wavelengths: np.ndarray,
        Rrs_spectrum: np.ndarray,
        lambda_0: int
    ) -> Union[dict, list]:
        """
        Run the core QAA inversion.

        Parameters
        ----------
        wavelengths : np.ndarray
            Spectral wavelengths (nm), shape (n_bands,) or (n_samples, n_bands).
        Rrs_spectrum : np.ndarray
            Above-surface remote-sensing reflectance spectra, shape (n_bands,)
            or (n_samples, n_bands). If 2D, each row is one sample's spectrum.
        lambda_0 : int
            Reference wavelength (anchor) used to look up the pure-water IOPs.

        Returns
        -------
        dict or list of dict
            One dict for a single spectrum, otherwise a list with one dict per
            sample. Each dict has the keys:
            - wavelengths : wavelength array
            - Rrs : input Rrs
            - r_rs_subsurface : below-surface remote-sensing reflectance
            - u : intermediate variable
            - a_lambda : total absorption coefficient a(λ)
            - bb_lambda : backscattering coefficient b_b(λ)
            - aw_0 : pure-water absorption at λ₀
            - bbw_0 : pure-water backscattering at λ₀
        """
        wavelengths = np.asarray(wavelengths, dtype=np.float64)
        Rrs_spectrum = np.asarray(Rrs_spectrum, dtype=np.float64)
        if Rrs_spectrum.ndim == 1:
            Rrs_spectrum = Rrs_spectrum[np.newaxis, :]
        aw_0, bbw_0 = self._get_pure_water_iops(lambda_0)
        results = []
        for row in Rrs_spectrum:
            rrs_sub = self._rrs_to_rrs_subsurface(row)
            u = self._compute_u(rrs_sub)
            bb = self._compute_bb(u, bbw_0, wavelengths, lambda_0)
            a = self._compute_a(u, aw_0, bbw_0, wavelengths, lambda_0)
            results.append({
                'wavelengths': wavelengths,
                'Rrs': row,
                'r_rs_subsurface': rrs_sub,
                'u': u,
                'a_lambda': a,
                'bb_lambda': bb,
                'aw_0': aw_0,
                'bbw_0': bbw_0,
            })
        if len(results) == 1:
            return results[0]
        return results

    def invert_to_csv(
        self,
        wavelengths: np.ndarray,
        Rrs_spectrum: np.ndarray,
        lambda_0: int,
        output_csv: str,
        wavelength_col: str = "Wavelength",
        sample_ids: Optional[list] = None
    ) -> str:
        """
        Run the inversion and save the result as a CSV file (one row per
        sample and wavelength).

        Parameters
        ----------
        wavelengths : np.ndarray
            Wavelength array, shape (n_bands,).
        Rrs_spectrum : np.ndarray
            Spectral data, shape (n_bands,) or (n_samples, n_bands).
        lambda_0 : int
            Reference wavelength.
        output_csv : str
            Output CSV file path.
        wavelength_col : str
            Name of the wavelength column in the output CSV.
        sample_ids : list, optional
            Sample IDs (if None, sample_0, sample_1, ... are used).

        Returns
        -------
        str
            Output file path.
        """
        wavelengths = np.asarray(wavelengths, dtype=np.float64)
        Rrs_spectrum = np.asarray(Rrs_spectrum, dtype=np.float64)
        if Rrs_spectrum.ndim == 1:
            Rrs_spectrum = Rrs_spectrum[np.newaxis, :]
        n_samples = Rrs_spectrum.shape[0]
        if sample_ids is None:
            sample_ids = [f"sample_{i}" for i in range(n_samples)]
        aw_0, bbw_0 = self._get_pure_water_iops(lambda_0)
        rows_out = []
        for i, row in enumerate(Rrs_spectrum):
            rrs_sub = self._rrs_to_rrs_subsurface(row)
            u = self._compute_u(rrs_sub)
            bb = self._compute_bb(u, bbw_0, wavelengths, lambda_0)
            a = self._compute_a(u, aw_0, bbw_0, wavelengths, lambda_0)
            for j, wl in enumerate(wavelengths):
                rows_out.append({
                    'sample_id': sample_ids[i],
                    wavelength_col: wl,  # honour the caller-supplied column name
                    'Rrs': row[j],
                    'r_rs': rrs_sub[j],
                    'u': u[j],
                    'a_lambda': a[j],
                    'bb_lambda': bb[j],
                })
        df = pd.DataFrame(rows_out)
        # dirname is '' for a bare file name; fall back to the current directory.
        os.makedirs(os.path.dirname(output_csv) or '.', exist_ok=True)
        df.to_csv(output_csv, index=False, float_format='%.8f')
        return output_csv
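
The first two steps of the solver can be checked by hand on a single reflectance value. The sketch below re-implements, in plain Python and under hypothetical function names (`rrs_to_subsurface`, `compute_u`), the two expressions exactly as they appear in the source; it is a scalar illustration, not the class's NumPy code path, and the input value is made up.

```python
import math


def rrs_to_subsurface(rrs: float) -> float:
    """r_rs = Rrs / (0.52 + 1.7 * Rrs), as in _rrs_to_rrs_subsurface."""
    return rrs / (0.52 + 1.7 * rrs)


def compute_u(r_rs: float) -> float:
    """u = r_rs / (0.5*r_rs + sqrt(0.25*r_rs^2 + 0.1*r_rs)), as in _compute_u."""
    return r_rs / (0.5 * r_rs + math.sqrt(0.25 * r_rs ** 2 + 0.1 * r_rs))


rrs = 0.01                      # above-surface Rrs in sr^-1 (illustrative value)
r_rs = rrs_to_subsurface(rrs)
u = compute_u(r_rs)
print(f"r_rs = {r_rs:.6f}, u = {u:.4f}")
```

For any positive `r_rs` the square-root term exceeds `0.5 * r_rs`, so the denominator is larger than `r_rs` and `u` stays below 1, which matches the `[0, 1)` range stated in the docstring.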

@ -0,0 +1,22 @@
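# -*- coding: utf-8 -*-
"""
Water-colour index inversion module (package entry point)
Reads formulas from waterindex.csv, applies whole-image matrix operations to the
deglinted BSQ hyperspectral image, and writes GeoTIFFs with full georeferencing.
Formula format (waterindex.csv):
- Wavelength placeholder: w{nm}, e.g. w686, w708, w665
- Mixed case is accepted: w686 / W665 both work
- Example: NDCI = (w708 - w665) / (w708 + w665)
Output:
- GeoTIFF (Float32), LZW-compressed, tiled
- Full clone of the original BSQ GeoTransform / Projection / NoData
- Step 14 can read the array and spatial extent directly with rasterio
"""
# Re-export WaterIndexProcessor for backward compatibility with all existing imports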
from src.core.algorithms.waterindex_inversion import WaterIndexProcessor
__all__ = ['WaterIndexProcessor']
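
The `w{nm}` placeholder convention described in the docstring above can be illustrated without any raster I/O. The sketch below applies the same kind of regex substitution to plain floats; the function name `eval_index_formula` is hypothetical, and `eval` with empty `__builtins__` is only a convenience for trusted formula files, not a security boundary.

```python
import re
from typing import Dict


def eval_index_formula(formula: str, band_values: Dict[int, float]) -> float:
    """Evaluate a waterindex-style formula such as '(w708 - w665) / (w708 + w665)'.

    Hypothetical helper: band_values maps wavelength (nm) to a reflectance value.
    """
    # w708 / W708 -> _B708, so each placeholder becomes a valid Python name.
    processed = re.sub(r'[wW](\d+)', r'_B\1', formula)
    local_vars = {f"_B{wv}": value for wv, value in band_values.items()}
    # Empty __builtins__ limits accidental name lookups; it is not a sandbox.
    return eval(processed, {"__builtins__": {}}, local_vars)


ndci = eval_index_formula("(w708 - W665) / (w708 + w665)", {708: 0.03, 665: 0.02})
print(round(ndci, 6))
```

A placeholder whose wavelength is missing from `band_values` surfaces as a `NameError`, which makes typos in the CSV easy to spot.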

@ -0,0 +1,646 @@
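# -*- coding: utf-8 -*-
"""
Water-colour index inversion module
Reads the deglinted hyperspectral BSQ image directly, applies the formulas in waterindex.csv,
and writes one GeoTIFF raster per water-quality index.
Formula format (waterindex.csv):
- Wavelength placeholder: w{nm}, e.g. w686, w708, w665
- Mixed case is accepted: w686 / W665 both work
- Examples: NDCI = (w708 - w665) / (w708 + w665)
BGA_Am09KBBI = (w686 - w658) / (w686 + w658)
Output:
- GeoTIFF (Float32), LZW-compressed, tiled
- Full clone of the original BSQ GeoTransform / Projection / NoData
- Step 14 can read the result directly with rasterio for kriging interpolation
"""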
from __future__ import annotations
import csv
import os
import re
import sys
import time
import traceback
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple
import numpy as np
from osgeo import gdal, osr
# GDAL 驱动注册
gdal.UseExceptions()
# ------------------------------------------------------------------
# 公共工具
# ------------------------------------------------------------------
def _get_resource_path(relative_path: str) -> str:
"""获取 waterindex.csv 等资源的绝对路径,兼容 PyInstaller 打包。"""
if hasattr(sys, '_MEIPASS'):
base = sys._MEIPASS
else:
base = os.path.abspath(
os.path.join(os.path.dirname(os.path.dirname(__file__)), '..', '..', '..')
)
return os.path.join(base, relative_path)
# ------------------------------------------------------------------
# WaterIndexProcessor
# ------------------------------------------------------------------
class WaterIndexProcessor:
"""
水色指数处理器
读取 waterindex.csv 中的公式,应用于 BSQ 高光谱影像,
输出带完整坐标信息的 GeoTIFF 指数图。
核心能力:
- 公式解析w{nm} 占位符 → 实际波段 2D numpy 数组
- 矩阵运算:全影像批量计算,无需逐点循环
- 地理信息保持:克隆原始 BSQ 的 GeoTransform / Projection
- NoData 处理:运算中产生的 NaN/Inf 统一标记为 -9999
"""
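# Names exposed to formula eval (whitelist). Removing builtins is not a sandbox: formulas must come from a trusted waterindex.csv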
_SAFE_NS: Dict[str, Any] = {
'np': np,
'nan': np.nan,
'inf': np.inf,
'pi': np.pi,
'e': np.e,
}
def __init__(self, waterindex_csv_path: Optional[str] = None):
"""
Parameters
----------
waterindex_csv_path : str, optional
waterindex.csv 路径。
若为 None尝试从默认位置加载
1. src/gui/model/waterindex.csv开发环境
2. _MEIPASS/src/gui/model/waterindex.csv打包环境
"""
self.csv_path: Optional[str] = None
self.formulas: List[Dict[str, Any]] = []
if waterindex_csv_path:
self.csv_path = waterindex_csv_path
else:
candidates = [
os.path.join(os.path.dirname(__file__), '..', '..', 'gui', 'model', 'waterindex.csv'),
os.path.join(os.path.dirname(__file__), '..', '..', '..', 'gui', 'model', 'waterindex.csv'),
]
for p in candidates:
if os.path.isfile(p):
self.csv_path = p
break
if self.csv_path:
self._parse_csv()
else:
self.formulas = []
# ------------------------------------------------------------------
# 公式加载
# ------------------------------------------------------------------
def _parse_csv(self) -> None:
"""解析 waterindex.csv加载所有公式。"""
if not os.path.isfile(self.csv_path):
raise FileNotFoundError(f"公式配置文件不存在: {self.csv_path}")
# ★★★ 防止多次调用时公式翻倍叠加 ★★★
self.formulas.clear()
with open(self.csv_path, 'r', encoding='utf-8-sig') as f:
reader = csv.DictReader(f)
for row in reader:
self.formulas.append(dict(row))
print(f"[WaterIndexProcessor] 加载 {len(self.formulas)} 条公式 ← {self.csv_path}")
def reload(self, waterindex_csv_path: str) -> None:
"""重新加载公式配置文件。"""
self.csv_path = waterindex_csv_path
self._parse_csv()
# ------------------------------------------------------------------
# 公式查询
# ------------------------------------------------------------------
def list_formulas(self) -> List[Dict[str, Any]]:
"""返回所有公式的列表。"""
return list(self.formulas)
def list_formula_names(self) -> List[str]:
"""返回所有公式名称列表。"""
return [f.get('Formula_Name', '') for f in self.formulas]
def get_formula(self, name: str) -> Optional[Dict[str, Any]]:
"""按名称查找单个公式。"""
for f in self.formulas:
if f.get('Formula_Name', '').strip() == name.strip():
return f
return None
def list_categories(self) -> List[str]:
"""返回所有公式类别(去重排序)。"""
cats = set()
for f in self.formulas:
c = f.get('Category', '').strip()
if c:
cats.add(c)
return sorted(cats)
def get_formulas_by_category(self, category: str) -> List[Dict[str, Any]]:
"""按类别筛选公式。"""
return [f for f in self.formulas
if f.get('Category', '').strip().lower() == category.strip().lower()]
# ------------------------------------------------------------------
# 影像元数据
# ------------------------------------------------------------------
def get_image_metadata(self, bsq_path: str, hdr_path: Optional[str] = None) -> Dict[str, Any]:
"""获取影像元数据GDAL + ENVI HDR 双重保障)。
Parameters
----------
bsq_path : str
BSQ 影像路径
hdr_path : str, optional
ENVI HDR 路径None → 自动构造)
Returns
-------
dict
含 keys: width, height, bands, wavelengths, wavelength_range,
geotransform, projection, driver
"""
meta: Dict[str, Any] = {}
# 1. GDAL 优先(获取空间信息)
try:
ds = gdal.Open(bsq_path, gdal.GA_ReadOnly)
if ds is not None:
meta['width'] = ds.RasterXSize
meta['height'] = ds.RasterYSize
meta['bands'] = ds.RasterCount
meta['driver'] = ds.GetDriver().ShortName
gt = ds.GetGeoTransform()
proj = ds.GetProjection()
if gt and gt != (0, 1, 0, 0, 0, 1):
meta['geotransform'] = gt
if proj:
meta['projection'] = proj
ds = None
except Exception:
pass
# 2. HDR 补充波长信息
if hdr_path is None:
hdr_path = os.path.splitext(bsq_path)[0] + '.hdr'
if not os.path.isfile(hdr_path):
hdr_path_alt = os.path.splitext(bsq_path)[0] + '.HDR'
if os.path.isfile(hdr_path_alt):
hdr_path = hdr_path_alt
if os.path.isfile(hdr_path):
wl = self._parse_wavelengths_from_hdr(hdr_path)
if wl:
meta['wavelengths'] = wl
if len(wl) >= 2:
meta['wavelength_range'] = f"{wl[0]:.1f}-{wl[-1]:.1f} nm ({len(wl)} bands)"
elif meta.get('bands', 0) > 0:
meta['wavelength_range'] = f"{meta['bands']} bands (wavelength information missing)"
return meta
@staticmethod
def _parse_wavelengths_from_hdr(hdr_path: str) -> Optional[List[float]]:
"""从 ENVI .hdr 文件中解析波长列表。"""
try:
with open(hdr_path, 'r', encoding='utf-8', errors='ignore') as f:
content = f.read()
# 格式1wavelength = { 400, 401, ... }
m = re.search(r'wavelength\s*=\s*\{([^}]+)\}', content, re.DOTALL)
if m:
vals = [float(v) for v in re.findall(r'[\d.]+', m.group(1)) if v.strip()]
if vals:
return vals
# 格式2逐行罗列
wavelengths: List[float] = []
in_wl = False
for line in content.split('\n'):
line = line.strip()
if line.startswith('wavelength'):
in_wl = True
continue
if in_wl:
if line.startswith('{'):
continue
try:
wavelengths.append(float(line))
except ValueError:
if '}' in line:
in_wl = False
return wavelengths if wavelengths else None
except Exception:
return None
# ------------------------------------------------------------------
# 公式解析w{nm} 占位符 → 实际波段数据
# ------------------------------------------------------------------
def _find_nearest_band_index(self, target_wv: float,
wavelengths: List[float]) -> int:
"""Return the GDAL band index (1-based) closest to the target wavelength."""
if not wavelengths:
raise ValueError("波长列表为空,无法匹配波段")
nearest = min(range(len(wavelengths)),
key=lambda i: abs(wavelengths[i] - target_wv))
return nearest + 1 # GDAL 波段从 1 开始
def _parse_formula_wavelengths(self, formula: str) -> List[int]:
"""Extract all wavelength values from a formula string (deduplicated, as int)."""
raw = re.findall(r'[wW](\d+)', formula)
seen = set()
result: List[int] = []
for r in raw:
v = int(r)
if v not in seen:
seen.add(v)
result.append(v)
return result
def _eval_formula_fast(self, formula: str,
band_data: Dict[int, np.ndarray]) -> Optional[np.ndarray]:
"""Fast formula evaluation (preprocess, then eval directly).
band_data: {wavelength (int): 2D array}
formula example: "(w708 - w665) / (w708 + w665)"
"""
# Preprocess: w708 → _B708 (avoids clashes with Python keywords)
processed = re.sub(r'[wW](\d+)', r'_B\1', formula)
# Build the local variable table: _B708 = band_data[708]
local_vars = {f"_B{wv}": arr for wv, arr in band_data.items()}
local_vars.update(self._SAFE_NS)
try:
result = eval(processed, {"__builtins__": {}}, local_vars)
return result
except Exception as e:
print(f" ⚠ 公式求值失败 [{formula}]: {e}")
return None
# ------------------------------------------------------------------
# 单波段读取(带 NoData 处理)
# ------------------------------------------------------------------
@staticmethod
def _read_band_as_float(bsq_path: str, band_idx: int) -> np.ndarray:
"""Read the given BSQ band (1-based) as float64, replacing NoData with NaN."""
ds = gdal.Open(bsq_path, gdal.GA_ReadOnly)
if ds is None:
raise RuntimeError(f"无法用 GDAL 打开影像: {bsq_path}")
band = ds.GetRasterBand(band_idx)
arr = band.ReadAsArray()
nodata = band.GetNoDataValue()
ds = None
arr = arr.astype(np.float64)
if nodata is not None:
arr = np.where(arr == nodata, np.nan, arr)
return arr
# ------------------------------------------------------------------
# 核心处理:逐公式矩阵运算 + GeoTIFF 输出
# ------------------------------------------------------------------
def process_bsq(
self,
bsq_path: str,
hdr_path: Optional[str] = None,
output_dir: Optional[str] = None,
formula_names: Optional[List[str]] = None,
water_mask: Optional[np.ndarray] = None,
nodata_value: float = -9999.0,
progress_callback: Optional[Callable[[str, float], None]] = None,
) -> Dict[str, str]:
"""逐公式处理 BSQ 影像,输出 GeoTIFF。
Parameters
----------
bsq_path : str
去耀斑 BSQ 影像路径
hdr_path : str, optional
ENVI HDR 文件路径None → 自动构造)
output_dir : str, optional
输出目录None → 与 bsq_path 同目录下的 10_WaterIndex_Images/
formula_names : list, optional
要处理的公式名列表None → 处理全部)
water_mask : np.ndarray, optional
水域掩膜数组(与 BSQ 同形状),掩膜值为 0 表示陆地,
将被强制赋值为 nodata_value
nodata_value : float
NoData 标记值
progress_callback : callable, optional
回调 (msg: str, pct: float)
Returns
-------
dict
{公式名: 输出 GeoTIFF 路径}
"""
# ── 自动构造 HDR 路径 ────────────────────────────────────────────
if hdr_path is None:
hdr_path = os.path.splitext(bsq_path)[0] + '.hdr'
if not os.path.isfile(hdr_path):
hdr_path_alt = os.path.splitext(bsq_path)[0] + '.HDR'
if os.path.isfile(hdr_path_alt):
hdr_path = hdr_path_alt
# ── 自动构造输出目录 ────────────────────────────────────────────
if output_dir is None:
output_dir = os.path.join(os.path.dirname(bsq_path), '10_WaterIndex_Images')
os.makedirs(output_dir, exist_ok=True)
def progress(msg: str, pct: float):
if progress_callback:
progress_callback(msg, pct)
# ── 获取影像元数据 ───────────────────────────────────────────────
progress("正在打开影像并读取元数据…", 2)
meta = self.get_image_metadata(bsq_path, hdr_path)
width = meta.get('width', 0)
height = meta.get('height', 0)
n_bands = meta.get('bands', 0)
wavelengths = meta.get('wavelengths', [])
geotransform = meta.get('geotransform')
projection = meta.get('projection')
if n_bands == 0 or width == 0 or height == 0:
raise ValueError(f"影像元数据无效,无法处理: {bsq_path}")
if not wavelengths:
raise ValueError(f"无法从 {hdr_path} 读取波长信息,公式无法解析")
progress(
f"Image: {width}×{height} pixels, {n_bands} bands, "
f"wavelength {wavelengths[0]:.1f}-{wavelengths[-1]:.1f} nm",
5
)
# ── 过滤要处理的公式 ──────────────────────────────────────────────
if formula_names:
formulas_to_run = [
f for f in self.formulas
if f.get('Formula_Name', '').strip() in formula_names
]
else:
formulas_to_run = list(self.formulas)
results: Dict[str, str] = {}
total = len(formulas_to_run)
# ── 逐公式处理 ───────────────────────────────────────────────────
for i, formula_row in enumerate(formulas_to_run):
fname = formula_row.get('Formula_Name', '').strip()
fstr = formula_row.get('Formula', '').strip()
category = formula_row.get('Category', '').strip()
ftype = formula_row.get('Formula_Type', '').strip()
if not fname or not fstr:
continue
progress(
f"[{i + 1}/{total}] {fname} ({category})",
5 + 90 * i / total
)
try:
# 1) 提取公式所需的波长列表
required_wvs = self._parse_formula_wavelengths(fstr)
# 2) 按需读取波段数据(相同波长只读一次)
band_data: Dict[int, np.ndarray] = {}
for wv in required_wvs:
if wv not in band_data:
band_idx = self._find_nearest_band_index(wv, wavelengths)
if not (0 < band_idx <= n_bands):
print(f" ⚠ 公式 '{fname}' 引用波段 {band_idx},超出范围 ({n_bands}),跳过")
raise ValueError(f"波段 {band_idx} 超出影像范围")
band_data[wv] = self._read_band_as_float(bsq_path, band_idx)
# 3) 矩阵运算
index_arr = self._eval_formula_fast(fstr, band_data)
if index_arr is None:
print(f" ⚠ 公式 '{fname}' 计算失败,跳过")
continue
# 4) NoData 处理NaN / Inf → nodata_value
index_arr = np.where(np.isfinite(index_arr), index_arr, nodata_value)
# 4b) 水域掩膜拦截陆地像素mask==0强制赋 NoData
if water_mask is not None:
land_pixels = (water_mask == 0)
land_count = int(land_pixels.sum())
if land_count > 0:
index_arr = np.where(land_pixels, nodata_value, index_arr)
print(f" 🗺 掩膜处理:陆地像素 {land_count:,} 个已设为 NoData")
# 5) 输出 GeoTIFF
safe_fname = re.sub(r'[^\w\u4e00-\u9fff-]', '_', fname)
out_tif = os.path.join(output_dir, f"{safe_fname}.tif")
self._write_geotiff(
out_path=out_tif,
data=index_arr,
reference_bsq=bsq_path,
nodata_value=nodata_value,
description=f"{fname}|{category}|{ftype}|{fstr}",
)
results[fname] = out_tif
valid = index_arr[index_arr != nodata_value]
mean_val = float(np.mean(valid)) if valid.size else np.nan
print(f"  {fname} -> {out_tif} (mean={mean_val:.4f})")
except ValueError as ve:
print(f" ⏭ 跳过 '{fname}': {ve}")
continue
except Exception as e:
print(f" ❌ 公式 '{fname}' 失败: {e}\n{traceback.format_exc()}")
continue
progress(f"完成!共输出 {len(results)} / {total} 个指数图", 100)
return results
def _write_geotiff(
self,
out_path: str,
data: np.ndarray,
reference_bsq: str,
nodata_value: float = -9999.0,
description: str = "",
) -> None:
"""将数组写入 GeoTIFF克隆原始 BSQ 的地理信息。
Parameters
----------
out_path : str
输出 GeoTIFF 路径
data : np.ndarray
2D 数据数组height, width
reference_bsq : str
参考 BSQ 影像路径(用于克隆 GeoTransform / Projection
nodata_value : float
NoData 标记值
description : str
GDAL 数据集描述
"""
height, width = data.shape
driver = gdal.GetDriverByName('GTiff')
if driver is None:
raise RuntimeError("GDAL GTiff 驱动不可用")
out_ds = driver.Create(
out_path,
width, height,
1,
gdal.GDT_Float32,
options=['COMPRESS=LZW', 'TILED=YES', 'BIGTIFF=IF_SAFER'],
)
if out_ds is None:
raise RuntimeError(f"无法创建 GeoTIFF: {out_path}")
# 写入数据
out_band = out_ds.GetRasterBand(1)
out_band.SetNoDataValue(nodata_value)
out_band.WriteArray(data)
out_band.FlushCache()
# 写入描述
if description:
out_band.SetDescription(description)
# ★★★ 克隆原始 BSQ 的 GeoTransform 和 Projection ★★★
ref_ds = gdal.Open(reference_bsq, gdal.GA_ReadOnly)
if ref_ds is not None:
gt = ref_ds.GetGeoTransform()
proj = ref_ds.GetProjection()
if gt and gt != (0, 1, 0, 0, 0, 1):
out_ds.SetGeoTransform(gt)
if proj:
out_ds.SetProjection(proj)
ref_ds = None
out_ds = None
# ------------------------------------------------------------------
# Pipeline 入口(供 PipelineRunner 调用)
# ------------------------------------------------------------------
def run_inversion(
self,
deglint_img_path: str,
work_dir: str,
formula_csv_path: Optional[str] = None,
selected_formulas: Optional[List[str]] = None,
water_mask_path: Optional[str] = None,
nodata_value: float = -9999.0,
callback: Optional[Callable] = None,
**kwargs,
) -> Dict[str, str]:
"""Pipeline 入口方法。
Parameters
----------
deglint_img_path : str
去耀斑影像 BSQ 路径
work_dir : str
工作目录
formula_csv_path : str, optional
waterindex.csv 路径None → 使用初始化时的路径)
selected_formulas : list, optional
要处理的公式列表
water_mask_path : str, optional
水域掩膜路径(如 1_water_mask/water_mask.dat
掩膜中为 0 的像素视为陆地区域,其指数值将被强制设为 NoData。
nodata_value : float
NoData 标记值,默认 -9999.0
callback : callable, optional
进度回调
Returns
-------
dict
{公式名: 输出 GeoTIFF 路径}
"""
# 重新加载公式(如指定了新路径)
if formula_csv_path:
self.reload(formula_csv_path)
elif not self.formulas:
raise RuntimeError("WaterIndexProcessor 未加载公式,请指定 formula_csv_path")
def notify(msg: str, pct: float):
if callback:
callback(msg, pct)
notify("开始水色指数反演", 0)
bsq_path = deglint_img_path
hdr_path = os.path.splitext(bsq_path)[0] + '.hdr'
if not os.path.isfile(hdr_path):
hdr_path_alt = os.path.splitext(bsq_path)[0] + '.HDR'
if os.path.isfile(hdr_path_alt):
hdr_path = hdr_path_alt
output_dir = os.path.join(work_dir, "10_WaterIndex_Images")
# ── 加载水域掩膜(可选)───────────────────────────────────────
water_mask: Optional[np.ndarray] = None
if water_mask_path:
if os.path.isfile(water_mask_path):
try:
import rasterio
with rasterio.open(water_mask_path) as msrc:
water_mask = msrc.read(1)
print(f"[run_inversion] Water mask loaded: {water_mask_path}, "
f"shape={water_mask.shape}, "
f"land pixels (0)={int((water_mask == 0).sum())}, "
f"water pixels (>0)={int((water_mask > 0).sum())}")
except Exception as mask_err:
print(f"[run_inversion] ⚠ 掩膜加载失败,跳过掩膜处理: {mask_err}")
water_mask = None
else:
print(f"[run_inversion] ⚠ 水域掩膜文件不存在: {water_mask_path},跳过掩膜处理")
notify("水色指数处理中…", 20)
results = self.process_bsq(
bsq_path=bsq_path,
hdr_path=hdr_path,
output_dir=output_dir,
formula_names=selected_formulas,
water_mask=water_mask,
nodata_value=nodata_value,
progress_callback=lambda m, p: notify(m, 20 + 70 * p / 100),
)
notify("水色指数反演完成", 100)
return results
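
Two details of `process_bsq` are easy to get wrong: band indices are 1-based for GDAL, and non-finite results are replaced by the NoData value before writing. A small sketch with hypothetical names (`nearest_band_index`, `to_nodata`) and plain lists in place of raster arrays; the band centres are illustrative, not taken from any real sensor:

```python
import math
from typing import List, Sequence


def nearest_band_index(target_nm: float, wavelengths: Sequence[float]) -> int:
    """Return the 1-based index of the band closest to target_nm."""
    if not wavelengths:
        raise ValueError("wavelength list is empty")
    nearest = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return nearest + 1  # GDAL bands are numbered from 1


def to_nodata(values: Sequence[float], nodata: float = -9999.0) -> List[float]:
    """Replace NaN / Inf with the NoData marker, as step 4 of process_bsq does."""
    return [v if math.isfinite(v) else nodata for v in values]


wavelengths = [660.2, 664.9, 669.6, 703.1, 707.8, 712.5]  # illustrative band centres
print(nearest_band_index(708, wavelengths))
print(to_nodata([0.2, float("nan"), float("inf")]))
```

Nearest matching always returns an index inside the band range, so a requested wavelength far outside the sensor's coverage is silently mapped to the first or last band; a distance tolerance would be needed to catch that case.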

@ -899,11 +899,11 @@ def get_spectral_in_coor(imgpath, coorpath, outpath, radius=0, flare_path=None,
if __name__ == '__main__':
# 在这里直接设置参数
imgpath = r"D:\BaiduNetdiskDownload\yaobao\result3.bsq"# BIL格式影像文件路径
coorpath = r"E:\code\WQ\封装\work_dir\4_processed_data\processed_data.csv"# CSV格式坐标文件路径第1、2列为纬度和经度
coorpath = r"E:\code\WQ\封装\work_dir\5_Data_Cleaning\processed_data.csv"# CSV格式坐标文件路径第1、2列为纬度和经度
output_path = r"E:\code\WQ\封装\test/yangdian_output.csv" # CSV格式输出文件路径
radius = 5 # 采样半径像素0表示单点采样>0表示半径内平均
flare_path = r"E:\code\WQ\封装\work_dir\2_glint\severe_glint_area.dat" # 耀斑掩膜文件路径可选None表示不使用
flare_path = r"E:\code\WQ\封装\work_dir\2_Glint_Detection\severe_glint_area.dat" # 耀斑掩膜文件路径可选None表示不使用
boundary_path = r"D:\BaiduNetdiskDownload\yaobao\water_mask.dat" # Boundary mask file path (optional; None means not used)
source_epsg = 4326 # 源坐标系EPSG代码默认为4326 (WGS84地理坐标系)

@ -2,6 +2,7 @@ from osgeo import gdal, osr
import numpy as np
import pandas as pd
import os
import re
import spectral
from math import sin, cos, tan, sqrt, radians
@ -222,6 +223,73 @@ def get_hdr_file_path(file_path):
return os.path.splitext(file_path)[0] + ".hdr"
def load_wavelength_columns(imgpath, num_bands):
"""
加载 wavelength 列名(鲁棒版:三级回退)
优先级:
1) spectral.envi.read_envi_header标准库解析依赖 ENVI 头完整性)
2) 纯文本暴力解析 .hdr兜底绕过 spectral 对 band names / 波段数一致性的校验)
—— 解决 .hdr 中 band names 数量与 bands 不符导致的标准库解析失败问题
3) 最后回退band_1, band_2, ..., band_N
Args:
imgpath: 影像文件路径(.bsq / .bil / .bip 等)
num_bands: 影像实际波段数(用于回退列名长度 & 不一致警告)
Returns:
spectral_columns: 长度为 num_bands 的字符串列表(与原代码列名格式一致:纯数字字符串)
"""
hdr_path = get_hdr_file_path(imgpath)
# 1) 标准库解析
try:
in_hdr_dict = spectral.envi.read_envi_header(hdr_path)
wavelengths = np.array(in_hdr_dict['wavelength']).astype('float64')
spectral_columns = [str(wl) for wl in wavelengths]
print(f"[wavelength] 标准库解析成功,从 {hdr_path} 提取 {len(spectral_columns)} 个波长")
if len(spectral_columns) != num_bands:
print(f"[wavelength] 警告: 解析波长数 ({len(spectral_columns)}) 与影像波段数 ({num_bands}) 不一致,将以 num_bands 为准截断/补齐")
if len(spectral_columns) > num_bands:
spectral_columns = spectral_columns[:num_bands]
elif len(spectral_columns) < num_bands:
spectral_columns = spectral_columns + [f"band_{j+1}" for j in range(len(spectral_columns), num_bands)]
return spectral_columns
except Exception as e_std:
print(f"[wavelength] 标准库解析失败: {str(e_std)},将尝试文本兜底解析")
# 2) 兜底:纯文本暴力解析
try:
if not os.path.isfile(hdr_path):
print(f"[wavelength] 文本兜底失败: {hdr_path} 不存在")
else:
with open(hdr_path, 'r', encoding='utf-8', errors='ignore') as f:
hdr_text = f.read()
pattern = r'wavelength\s*=\s*\{([^}]+)\}'
m = re.search(pattern, hdr_text, flags=re.IGNORECASE | re.DOTALL)
if m:
inner = m.group(1)
tokens = [t.strip() for t in inner.split(',') if t.strip()]
if tokens:
if len(tokens) != num_bands:
print(f"[wavelength] 文本解析波长数 ({len(tokens)}) 与影像波段数 ({num_bands}) 不一致,将以 num_bands 为准截断/补齐")
if len(tokens) > num_bands:
tokens = tokens[:num_bands]
elif len(tokens) < num_bands:
tokens = tokens + [f"band_{j+1}" for j in range(len(tokens), num_bands)]
print(f"[wavelength] 文本暴力解析成功,从 {hdr_path} 提取 {len(tokens)} 个真实波长")
return tokens
print(f"[wavelength] 文本兜底: 已匹配到 wavelength = {{ ... }},但内部为空")
else:
print(f"[wavelength] 文本兜底: 未在 {hdr_path} 中匹配到 wavelength = {{ ... }} 字段")
except Exception as e_txt:
print(f"[wavelength] 文本兜底解析异常: {str(e_txt)}")
# 3) 全部失败,最后回退
print(f"[wavelength] 所有解析路径均失败,回退到 band_1..band_{num_bands}")
return ["band_" + str(j + 1) for j in range(num_bands)]
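
The text fallback in `load_wavelength_columns` can be exercised on a header string without touching the file system. A minimal stand-alone sketch, assuming a hypothetical function name `parse_wavelength_tokens` and a made-up header; it keeps the same regex and the same truncate-or-pad rule:

```python
import re
from typing import List


def parse_wavelength_tokens(hdr_text: str, num_bands: int) -> List[str]:
    """Extract the wavelength list from ENVI header text, then truncate or pad.

    Hypothetical stand-alone version of the text fallback; returns
    band_1..band_N when no 'wavelength = { ... }' field is found.
    """
    m = re.search(r'wavelength\s*=\s*\{([^}]+)\}', hdr_text,
                  flags=re.IGNORECASE | re.DOTALL)
    tokens = [t.strip() for t in m.group(1).split(',') if t.strip()] if m else []
    if not tokens:
        return [f"band_{j + 1}" for j in range(num_bands)]
    if len(tokens) > num_bands:
        return tokens[:num_bands]
    # Pad missing entries with positional names so the length matches num_bands.
    return tokens + [f"band_{j + 1}" for j in range(len(tokens), num_bands)]


hdr = ("ENVI\nbands = 4\nwavelength units = Nanometers\n"
       "wavelength = {\n 400.5, 410.0,\n 420.5\n}\n")
print(parse_wavelength_tokens(hdr, 4))
```

Because the pattern requires `=` right after `wavelength` (apart from whitespace), the `wavelength units` line is skipped and only the real list is captured.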
def calculate_utm_zone(longitude):
"""
根据经度计算UTM分区号
@ -473,9 +541,56 @@ def get_spectral_in_coor(imgpath, coorpath, outpath, radius=0, flare_path=None,
for i in range(min(3, coor_data.shape[0])):
print(f"{i + 1}: {coor_data[i, :min(5, coor_data.shape[1])]}") # 只显示前5列
# 提取原始坐标
lat_array = coor_data[:, 0] # 第1列是纬度
lon_array = coor_data[:, 1] # 第2列是经度
# 提取原始坐标(使用智能坐标列检测)
lon_patterns = [
r'^lon', r'^lng', r'^longitude', r'经度', r'^x$', r'^utm_x$', r'^pixel_x$'
]
lat_patterns = [
r'^lat', r'^latitude', r'纬度', r'^y$', r'^utm_y$', r'^pixel_y$'
]
x_col_name, y_col_name = None, None
if coor_df is not None and hasattr(coor_df, 'columns'):
for col in coor_df.columns:
col_str = str(col).lower().strip()
if x_col_name is None and any(re.search(p, col_str) for p in lon_patterns):
x_col_name = col
if y_col_name is None and any(re.search(p, col_str) for p in lat_patterns):
y_col_name = col
if x_col_name and y_col_name and x_col_name in coor_df.columns and y_col_name in coor_df.columns:
lon_array = coor_df[x_col_name].values
lat_array = coor_df[y_col_name].values
print(f"💡 坐标列名检测: X/经度=[{x_col_name}], Y/纬度=[{y_col_name}]")
else:
numeric_cols = coor_df.select_dtypes(include=[np.number]).columns.tolist() if coor_df is not None else []
if len(numeric_cols) >= 2:
col1, col2 = numeric_cols[0], numeric_cols[1]
mean1 = coor_df[col1].head(10).mean()
mean2 = coor_df[col2].head(10).mean()
if abs(mean1) <= 90 and abs(mean2) > 90:
y_col_name, x_col_name = col1, col2
lon_array = coor_df[x_col_name].values
lat_array = coor_df[y_col_name].values
elif abs(mean2) <= 90 and abs(mean1) > 90:
x_col_name, y_col_name = col1, col2
lon_array = coor_df[x_col_name].values
lat_array = coor_df[y_col_name].values
else:
if mean1 > mean2:
x_col_name, y_col_name = col1, col2
else:
x_col_name, y_col_name = col2, col1
lon_array = coor_df[x_col_name].values
lat_array = coor_df[y_col_name].values
print(f"💡 触发智能数值推断坐标列: X/经度=[{x_col_name}], Y/纬度=[{y_col_name}]")
else:
if coor_data is not None and coor_data.shape[1] >= 3:
lat_array = coor_data[:, 1]
lon_array = coor_data[:, 2]
else:
raise Exception("Invalid coordinate file format: at least 2 columns are required, preferably with coordinate column names (e.g. lon/lat/经度/纬度)")
print(f"\n=== 原始坐标信息 ===")
print(f"原始坐标范围: 经度 {np.min(lon_array):.6f} ~ {np.max(lon_array):.6f}, 纬度 {np.min(lat_array):.6f} ~ {np.max(lat_array):.6f}")
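Reviewer note: the header-name detection in this hunk can be exercised in isolation. The sketch below copies the two pattern lists from the diff; the function name `detect_coordinate_columns` is hypothetical and is not part of the commit. It covers only the name-based branch, not the numeric-magnitude fallback.

```python
import re

# Pattern lists copied from the hunk above.
LON_PATTERNS = [r'^lon', r'^lng', r'^longitude', r'经度', r'^x$', r'^utm_x$', r'^pixel_x$']
LAT_PATTERNS = [r'^lat', r'^latitude', r'纬度', r'^y$', r'^utm_y$', r'^pixel_y$']


def detect_coordinate_columns(column_names):
    """Return (x_col, y_col); either entry is None when no header matches.

    Hypothetical helper: the first matching header wins for each axis,
    mirroring the loop in get_spectral_in_coor.
    """
    x_col, y_col = None, None
    for col in column_names:
        name = str(col).lower().strip()
        if x_col is None and any(re.search(p, name) for p in LON_PATTERNS):
            x_col = col
        if y_col is None and any(re.search(p, name) for p in LAT_PATTERNS):
            y_col = col
    return x_col, y_col


headers = ["ID", "Latitude", "Longitude", "Chl-a"]
x_col, y_col = detect_coordinate_columns(headers)
print(x_col, y_col)
```

Because `^lon` and `^lat` are prefix patterns, headers such as `lateral_offset` would also match; stricter anchoring may be worth considering if such columns can appear.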
@ -711,17 +826,8 @@ def get_spectral_in_coor(imgpath, coorpath, outpath, radius=0, flare_path=None,
else:
original_columns = []
# Read the wavelength information to use as spectral column names
wavelengths = None
try:
in_hdr_dict = spectral.envi.read_envi_header(get_hdr_file_path(imgpath))
wavelengths = np.array(in_hdr_dict['wavelength']).astype('float64')
# Convert the wavelength values to strings to use as column names
spectral_columns = [str(wl) for wl in wavelengths]
print(f"成功读取波长信息,共 {len(spectral_columns)} 个波段")
except Exception as e:
print(f"警告: 无法读取波长信息 ({str(e)}),使用默认列名 band_1, band_2, ...")
spectral_columns = ["band_" + str(j + 1) for j in range(num_bands)]
# Read the wavelength information to use as spectral column names (three-level fallback: spectral parsing → brute-force text parsing → band_N default)
spectral_columns = load_wavelength_columns(imgpath, num_bands)
# Build the output column names (excluding the first two coordinate columns and the UTM columns)
all_columns = original_columns + spectral_columns
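Reviewer note: the body of `load_wavelength_columns` is only partly visible in this diff, so the following is a minimal sketch of the last two fallback levels (text parsing of the header, then `band_N` names), not the commit's implementation. The function name `wavelengths_from_header_text` and the exact regular expression are assumptions.

```python
import re


def wavelengths_from_header_text(header_text, num_bands):
    """Pull the tokens out of an ENVI-style `wavelength = { ... }` field.

    Hypothetical helper. Falls back to band_1..band_N when the field is
    missing or empty, as the log messages in the diff describe.
    """
    match = re.search(r'wavelength\s*=\s*\{([^}]*)\}', header_text, flags=re.IGNORECASE)
    if match:
        # The field may span several lines; split on commas and drop blanks.
        tokens = [t.strip() for t in match.group(1).replace('\n', ' ').split(',') if t.strip()]
        if tokens:
            return tokens
    return ["band_" + str(j + 1) for j in range(num_bands)]


sample_header = (
    "ENVI\n"
    "bands = 3\n"
    "wavelength units = Nanometers\n"
    "wavelength = {\n 450.0, 550.5,\n 650.25}\n"
)
print(wavelengths_from_header_text(sample_header, 3))
print(wavelengths_from_header_text("ENVI\nbands = 2\n", 2))
```

The pattern requires `=` directly after `wavelength`, so a `wavelength units = ...` line is not mistaken for the wavelength list.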
@ -758,11 +864,11 @@ def get_spectral_in_coor(imgpath, coorpath, outpath, radius=0, flare_path=None,
if __name__ == '__main__':
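# Set the parameters directly here
imgpath = r"E:\code\WQ\封装\work_dir\3_deglint\deglint_goodman.bsq" # path to the BIL-format image file
coorpath = r"E:\code\WQ\封装\work_dir\4_processed_data\processed_data.csv"# path to the CSV coordinate file (columns 1 and 2 are latitude and longitude)
output_path = r"E:\code\WQ\封装\work_dir\5_training_spectra/yangdian_output.csv" # path to the CSV output file
coorpath = r"E:\code\WQ\封装\work_dir\5_Data_Cleaning\processed_data.csv"# path to the CSV coordinate file (columns 1 and 2 are latitude and longitude)
output_path = r"E:\code\WQ\封装\work_dir\6_Spectral_Feature_Extraction/yangdian_output.csv" # path to the CSV output file
radius = 5 # sampling radius in pixels; 0 means single-point sampling, >0 means averaging within the radius
flare_path = r"E:\code\WQ\封装\work_dir\2_glint\severe_glint_area.dat" # path to the glint mask file (optional; None disables it)
flare_path = r"E:\code\WQ\封装\work_dir\2_Glint_Detection\severe_glint_area.dat" # path to the glint mask file (optional; None disables it)
boundary_path = r"D:\BaiduNetdiskDownload\yaobao\water_mask.dat" # path to the boundary mask file (optional; None disables it)
source_epsg = 4326 # EPSG code of the source coordinate system; defaults to 4326 (WGS84 geographic coordinates)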


@ -0,0 +1,46 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
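Step handler package.
Splits the 14 giant step* methods of WaterQualityInversionPipeline
into independent Handler classes, each implementing the BaseStepHandler interface.
The scheduler (PipelineScheduler) only maintains the execution context and, given a step_key,
looks up the matching Handler in the registry and runs it; it no longer contains any algorithm logic itself.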
"""
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.handlers.step1_water_mask import Step1WaterMaskHandler
from src.core.handlers.step2_glint_detection import Step2GlintDetectionHandler
from src.core.handlers.step3_glint_removal import Step3GlintRemovalHandler
from src.core.handlers.step4_sampling import Step4SamplingHandler
from src.core.handlers.step5_process_csv import Step5ProcessCsvHandler
from src.core.handlers.step6_extract_spectra import Step6ExtractSpectraHandler
from src.core.handlers.step7_calc_indices import Step7CalcIndicesHandler
from src.core.handlers.step8_ml_train import Step8MlTrainHandler
from src.core.handlers.step9_ml_predict import Step9MlPredictHandler
from src.core.handlers.step10_qaa_inversion import Step10QaaInversionHandler
from src.core.handlers.step11_concentration import Step11ConcentrationHandler
from src.core.handlers.step12_kriging import Step12KrigingHandler
from src.core.handlers.step13_visualization import Step13VisualizationHandler
from src.core.handlers.step14_report import Step14ReportHandler
__all__ = [
'BaseStepHandler',
'PipelineContext',
'Step1WaterMaskHandler',
'Step2GlintDetectionHandler',
'Step3GlintRemovalHandler',
'Step4SamplingHandler',
'Step5ProcessCsvHandler',
'Step6ExtractSpectraHandler',
'Step7CalcIndicesHandler',
'Step8MlTrainHandler',
'Step9MlPredictHandler',
'Step10QaaInversionHandler',
'Step11ConcentrationHandler',
'Step12KrigingHandler',
'Step13VisualizationHandler',
'Step14ReportHandler',
]

src/core/handlers/base.py (new file, 282 lines)

@ -0,0 +1,282 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
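Handler base class and pipeline execution context.
BaseStepHandler: abstract base class for all step Handlers; defines the unified execute interface.
PipelineContext: shared-state container passed between Handlers (paths, timings, callbacks, etc.).
Design principles:
- A Handler is only responsible for "executing the algorithm logic of one step"; it does not manage scheduling, dependencies, or skipping.
- The Context is the only shared-state channel between Handlers.
- The scheduler (PipelineScheduler) iterates over the config, looks up Handlers, and calls execute.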
"""
from __future__ import annotations
import time
from abc import ABC, abstractmethod
from datetime import datetime
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
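class PipelineContext:
"""Pipeline execution context: the sole carrier of state shared between Handlers.
Contains:
- the working directory and its subdirectories
- intermediate result paths (water_mask_path, glint_mask_path, ...)
- per-step timing records
- the callback function (for GUI progress notifications)
- visualization / report generator instances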
"""
def __init__(self, work_dir: str = "./work_dir"):
self.work_dir = Path(work_dir)
self.work_dir.mkdir(parents=True, exist_ok=True)
# ── Subdirectories ──
self.water_mask_dir = self.work_dir / "1_water_mask"
self.glint_dir = self.work_dir / "2_Glint_Detection"
self.deglint_dir = self.work_dir / "3_deglint"
self.processed_data_dir = self.work_dir / "5_Data_Cleaning"
self.training_spectra_dir = self.work_dir / "6_Spectral_Feature_Extraction"
self.indices_dir = self.work_dir / "7_Water_Quality_Indices"
self.models_dir = self.work_dir / "8_Supervised_Model_Training"
self.non_empirical_models_dir = self.work_dir / "8_Non_Empirical_Regression"
self.custom_regression_dir = self.work_dir / "13_Custom_Regression"
self.sampling_dir = self.work_dir / "4_sampling"
self.prediction_dir = self.work_dir / "11_12_13_predictions"
self.visualization_dir = self.work_dir / "14_visualization"
self.reports_dir = self.work_dir / "reports"
for d in [self.water_mask_dir, self.glint_dir, self.deglint_dir,
self.processed_data_dir, self.training_spectra_dir,
self.indices_dir, self.models_dir, self.non_empirical_models_dir,
self.custom_regression_dir, self.sampling_dir, self.prediction_dir,
self.visualization_dir, self.reports_dir]:
d.mkdir(parents=True, exist_ok=True)
# ── Intermediate result paths ──
self.water_mask_path: Optional[str] = None
self.glint_mask_path: Optional[str] = None
self.interpolated_img_path: Optional[str] = None
self.deglint_img_path: Optional[str] = None
self.processed_csv_path: Optional[str] = None
self.training_csv_path: Optional[str] = None
self.indices_path: Optional[str] = None
self.custom_regression_path: Optional[str] = None
self.sampling_csv_path: Optional[str] = None
self.prediction_files: Dict[str, str] = {}
self.distribution_map_path: Optional[str] = None
self.qaa_output_path: Optional[str] = None
self.concentration_output_path: Optional[str] = None
# ── Timing ──
self.step_timings: Dict[str, dict] = {}
self.pipeline_start_time: Optional[float] = None
self.pipeline_end_time: Optional[float] = None
# ── Callback ──
self._callback: Optional[Callable] = None
# ── Visualization components (imported lazily to avoid circular imports) ──
self._visualizer = None
self._report_generator = None
self._scatter_batch = None
# ── matplotlib fonts for Chinese text ──
plt.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei',
'DejaVu Sans', 'Arial Unicode MS']
plt.rcParams['axes.unicode_minus'] = False
# ═══════════════════════════════════════════════════════════
# Callback
# ═══════════════════════════════════════════════════════════
def set_callback(self, callback: Callable):
"""Set the callback used to report progress to the GUI.
Args:
callback: signature callback(step_name, status, message="")
status: 'start' | 'completed' | 'skipped' | 'error' | 'info' | 'warning'
"""
self._callback = callback
def notify(self, step_name: str, status: str, message: str = ""):
"""Invoke the callback, if one is set."""
if self._callback:
try:
self._callback(step_name, status, message)
except Exception as e:
print(f"回调函数执行失败: {e}")
# ═══════════════════════════════════════════════════════════
# Timing
# ═══════════════════════════════════════════════════════════
def record_step_time(self, step_name: str, start_time: float, end_time: float,
status: str = "completed", error: Optional[str] = None):
elapsed = end_time - start_time
self.step_timings[step_name] = {
'start_time': datetime.fromtimestamp(start_time).strftime('%Y-%m-%d %H:%M:%S'),
'end_time': datetime.fromtimestamp(end_time).strftime('%Y-%m-%d %H:%M:%S'),
'elapsed_seconds': elapsed,
'elapsed_formatted': self._format_time(elapsed),
'status': status,
'error': error,
}
@staticmethod
def _format_time(seconds: float) -> str:
if seconds < 60:
return f"{seconds:.2f} s"
elif seconds < 3600:
minutes = int(seconds // 60)
secs = seconds % 60
return f"{minutes} min {secs:.2f} s"
else:
hours = int(seconds // 3600)
minutes = int((seconds % 3600) // 60)
secs = seconds % 60
return f"{hours} h {minutes} min {secs:.2f} s"
# ═══════════════════════════════════════════════════════════
# Visualization components (lazy import)
# ═══════════════════════════════════════════════════════════
@property
def visualizer(self):
if self._visualizer is None:
from src.postprocessing.visualization_reports import WaterQualityVisualization
self._visualizer = WaterQualityVisualization(str(self.visualization_dir))
return self._visualizer
@property
def report_generator(self):
if self._report_generator is None:
from src.postprocessing.visualization_reports import ReportGenerator
self._report_generator = ReportGenerator(str(self.reports_dir))
return self._report_generator
@property
def scatter_batch(self):
if self._scatter_batch is None:
from src.core.prediction.sctter_batch import WaterQualityScatterBatch
self._scatter_batch = WaterQualityScatterBatch()
return self._scatter_batch
# ═══════════════════════════════════════════════════════════
# Step output directory lookup (kept for compatibility with the old interface)
# ═══════════════════════════════════════════════════════════
_STEP_OUTPUT_DIR_MAP: Optional[Dict[str, Path]] = None
def _ensure_step_dir_map(self) -> Dict[str, Path]:
# Cache on the instance: a class-level cache would keep the first context's work_dir for every later context
if self._STEP_OUTPUT_DIR_MAP is not None:
return self._STEP_OUTPUT_DIR_MAP
wp = self.work_dir
m = {
'step1': wp / '1_water_mask',
'step2': wp / '2_Glint_Detection',
'step3': wp / '3_deglint',
'step4_sampling': wp / '4_sampling',
'step5_clean': wp / '5_Data_Cleaning',
'step6_feature': wp / '6_Spectral_Feature_Extraction',
'step7_index': wp / '7_Water_Quality_Indices',
'step8_ml_train': wp / '8_Supervised_Model_Training',
'step9_ml_predict': wp / '8_Non_Empirical_Regression',
'step10_watercolor': wp / '10_WaterIndex_Images',
'step11_map': wp / '14_visualization',
'step12_viz': wp / '14_visualization',
'step13_report': wp / '14_visualization',
'step11_predictions': wp / '11_12_13_predictions',
'step12_predictions': wp / '11_12_13_predictions',
'step13_predictions': wp / '11_12_13_predictions',
'custom_regression': wp / '13_Custom_Regression',
'prediction_dir': wp / '11_12_13_predictions',
'visualization': wp / '14_visualization',
'reports': wp / 'reports',
'step8': wp / '8_Supervised_Model_Training',
'step9': wp / '8_Non_Empirical_Regression',
'step10': wp / '10_WaterIndex_Images',
'step11': wp / '11_12_13_predictions',
'step12': wp / '13_Custom_Regression',
'step13': wp / 'reports',
'step14': wp / '14_visualization',
}
self._STEP_OUTPUT_DIR_MAP = m
return m
def get_step_output_dir(self, step_name: str) -> Path:
mapping = self._ensure_step_dir_map()
key = (step_name or '').strip()
if key in mapping:
return mapping[key]
print(f"[PipelineContext.get_step_output_dir] 未知 step_name={key!r},回退到 work_dir")
return self.work_dir
class BaseStepHandler(ABC):
"""Abstract base class for step handlers.
Every step Handler must provide:
- step_key: class attribute matching the key in config ('step1', 'step2', ...)
- execute(context, config): runs the step logic and returns a result dict
Usage example::
class Step1WaterMaskHandler(BaseStepHandler):
step_key = 'step1'
def execute(self, ctx, config):
result = WaterMaskStep.run(...)
ctx.water_mask_path = result
return {'water_mask_path': result}
"""
# Subclasses must define this: the matching key in the config dict
step_key: Optional[str] = None
@abstractmethod
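def execute(self, context: PipelineContext, config: dict) -> dict:
"""Run the step logic.
Args:
context: pipeline execution context (shared state)
config: this step's config dict (i.e. config[self.step_key])
Returns:
A result dict with the output paths and other information produced by this step.
The scheduler merges the return value into the global results.
Raises:
Exception: during a full-pipeline run, any exception is caught and recorded by the scheduler.
"""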
...
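def _resolve_path(self, explicit: Optional[str], fallback: Optional[str],
label: str = "path") -> Optional[str]:
"""Resolve a path: prefer the explicitly passed value, otherwise fall back to the value cached in the context.
Args:
explicit: path passed explicitly by the caller
fallback: cached path from the context
label: label intended for log messages (currently unused)
Returns:
The resolved path, or None if both are None
"""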
if explicit is not None:
return explicit
if fallback is not None:
return fallback
return None
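Reviewer note: the callback contract described in `set_callback` and `notify` can be tried without the rest of the package. The sketch below uses a hypothetical `MiniContext` stand-in that reproduces only the callback part of `PipelineContext`; the English log text is an assumption, the control flow follows the diff.

```python
from typing import Callable, List, Optional, Tuple


class MiniContext:
    """Hypothetical stand-in for PipelineContext, reduced to the callback part."""

    def __init__(self) -> None:
        self._callback: Optional[Callable[..., None]] = None

    def set_callback(self, callback: Callable[..., None]) -> None:
        self._callback = callback

    def notify(self, step_name: str, status: str, message: str = "") -> None:
        # A failing callback must never break the pipeline, so errors are
        # logged and swallowed, as in the diff.
        if self._callback:
            try:
                self._callback(step_name, status, message)
            except Exception as exc:
                print(f"callback failed: {exc}")


events: List[Tuple[str, str, str]] = []
ctx = MiniContext()
ctx.set_callback(lambda step, status, msg="": events.append((step, status, msg)))
ctx.notify("step1", "start")
ctx.notify("step1", "completed", "mask written")


def broken_callback(step, status, msg=""):
    raise RuntimeError("GUI went away")


ctx.set_callback(broken_callback)
ctx.notify("step2", "start")  # the exception is swallowed, execution continues
```

Swallowing callback errors keeps a closed or crashed GUI from aborting a long run, at the cost of progress updates being lost silently apart from the printed message.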


@ -0,0 +1,199 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
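Minimal pipeline scheduler.
Replaces the scheduling core of the former WaterQualityInversionPipeline (a 2598-line god class).
The scheduler itself contains no algorithm logic; it is only responsible for:
1. Maintaining the PipelineContext (shared state)
2. Looking up the handler for each config key in the Handler registry
3. Calling handler.execute(ctx, config) in order and collecting the results
4. On an exception, recording the error and continuing with the remaining steps
The Handler registry maps step_key → BaseStepHandler.
Adding a step only requires writing a Handler class and adding one line to the registry.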
"""
from __future__ import annotations
import time
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional
from src.core.handlers.base import BaseStepHandler, PipelineContext
class PipelineScheduler:
"""Minimal pipeline scheduler.
Usage::
scheduler = PipelineScheduler(work_dir="./work_dir")
scheduler.register_handler(Step1WaterMaskHandler())
scheduler.register_handler(Step2GlintDetectionHandler())
# ... register all steps ...
scheduler.set_callback(my_callback) # optional: GUI progress callback
result = scheduler.run_full_pipeline(config)
# result['step_results']['step1'] → {'water_mask_path': ...}
# result['step_results']['step2'] → {'glint_mask_path': ...}
# ...
"""
def __init__(self, work_dir: str = "./work_dir"):
self.ctx = PipelineContext(work_dir)
self._handlers: Dict[str, BaseStepHandler] = {}
# ═══════════════════════════════════════════════════════════
# Handler registration
# ═══════════════════════════════════════════════════════════
def register_handler(self, handler: BaseStepHandler):
"""Register a step handler.
Args:
handler: a BaseStepHandler instance (its step_key class attribute determines the key in config)
"""
if handler.step_key is None:
raise ValueError(
f"Handler {type(handler).__name__} 未定义 step_key 类属性"
)
self._handlers[handler.step_key] = handler
def register_handlers(self, handlers: List[BaseStepHandler]):
"""Register step handlers in bulk."""
for h in handlers:
self.register_handler(h)
# ═══════════════════════════════════════════════════════════
# Callback
# ═══════════════════════════════════════════════════════════
def set_callback(self, callback: Callable):
"""Set the GUI progress callback; delegates to PipelineContext."""
self.ctx.set_callback(callback)
# ═══════════════════════════════════════════════════════════
# Single-step execution
# ═══════════════════════════════════════════════════════════
def run_step(self, step_key: str, config: dict) -> Dict[str, Any]:
"""Run a single step.
Args:
step_key: step key ('step1', 'step2', ...)
config: this step's config dict
Returns:
The step's result dict
Raises:
KeyError: if no Handler is registered for step_key
Exception: any exception raised while the step runs
"""
handler = self._handlers.get(step_key)
if handler is None:
raise KeyError(
f"未注册的步骤: {step_key!r}"
f"已注册: {list(self._handlers.keys())}"
)
self.ctx.notify(handler.step_key, 'start')
result = handler.execute(self.ctx, config)
self.ctx.notify(handler.step_key, 'completed')
return result
# ═══════════════════════════════════════════════════════════
# Full-pipeline execution
# ═══════════════════════════════════════════════════════════
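def run_full_pipeline(self, config: Dict[str, dict]) -> Dict[str, Any]:
"""Run the full pipeline in the order of the keys in config.
Iterates over the top-level keys of config; for each key:
- if a Handler is registered → run it and collect the result
- if none is registered → skip it and notify
- if execution fails → record the error and continue with later steps (no abort)
Args:
config: full-pipeline config dict of the form {step_key: step_config, ...}
e.g. {'step1': {...}, 'step2': {...}, ...}
Returns:
{
'step_results': {step_key: result_dict, ...},
'step_timings': {...},
'total_elapsed': float,
'total_elapsed_formatted': str,
'errors': {step_key: error_message, ...},
}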
"""
self.ctx.pipeline_start_time = time.time()
step_results: Dict[str, Any] = {}
errors: Dict[str, str] = {}
# Iterate in config order (dicts preserve insertion order in Python 3.7+)
for step_key, step_config in config.items():
handler = self._handlers.get(step_key)
if handler is None:
self.ctx.notify(step_key, 'skipped', '未注册 Handler')
continue
try:
result = handler.execute(self.ctx, step_config)
step_results[step_key] = result
self.ctx.notify(step_key, 'completed', str(result))
except Exception as e:
error_msg = f"{type(e).__name__}: {e}"
errors[step_key] = error_msg
step_results[step_key] = {'error': error_msg}
self.ctx.notify(step_key, 'error', error_msg)
# Do not abort; continue with the remaining steps
self.ctx.pipeline_end_time = time.time()
total_elapsed = self.ctx.pipeline_end_time - self.ctx.pipeline_start_time
return {
'step_results': step_results,
'step_timings': self.ctx.step_timings,
'total_elapsed': total_elapsed,
'total_elapsed_formatted': self.ctx._format_time(total_elapsed),
'errors': errors,
}
# ═══════════════════════════════════════════════════════════
# Convenience properties (delegating to PipelineContext)
# ═══════════════════════════════════════════════════════════
@property
def work_dir(self) -> Path:
return self.ctx.work_dir
@property
def water_mask_path(self) -> Optional[str]:
return self.ctx.water_mask_path
@property
def glint_mask_path(self) -> Optional[str]:
return self.ctx.glint_mask_path
@property
def deglint_img_path(self) -> Optional[str]:
return self.ctx.deglint_img_path
@property
def processed_csv_path(self) -> Optional[str]:
return self.ctx.processed_csv_path
@property
def training_csv_path(self) -> Optional[str]:
return self.ctx.training_csv_path
@property
def indices_path(self) -> Optional[str]:
return self.ctx.indices_path
def get_step_output_dir(self, step_name: str) -> Path:
return self.ctx.get_step_output_dir(step_name)
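Reviewer note: the dispatch rules of `run_full_pipeline` (look up by key, skip unregistered keys, record errors and keep going) can be checked with a self-contained sketch. `EchoHandler`, `FailingHandler`, `run_all`, and the `skipped` list are hypothetical; the real scheduler notifies the callback instead of returning skipped keys.

```python
from typing import Any, Dict, List


class EchoHandler:
    """Hypothetical handler that succeeds."""
    step_key = "step1"

    def execute(self, ctx, config):
        return {"echo": config.get("value")}


class FailingHandler:
    """Hypothetical handler that always raises."""
    step_key = "step2"

    def execute(self, ctx, config):
        raise ValueError("bad input")


def run_all(handlers, config: Dict[str, dict]) -> Dict[str, Any]:
    """Dispatch each config key to its handler, in config order."""
    registry = {h.step_key: h for h in handlers}
    step_results: Dict[str, Any] = {}
    errors: Dict[str, str] = {}
    skipped: List[str] = []
    for step_key, step_config in config.items():
        handler = registry.get(step_key)
        if handler is None:
            skipped.append(step_key)  # no handler registered for this key
            continue
        try:
            step_results[step_key] = handler.execute(None, step_config)
        except Exception as exc:
            # Record the failure and continue with the next step.
            msg = f"{type(exc).__name__}: {exc}"
            errors[step_key] = msg
            step_results[step_key] = {"error": msg}
    return {"step_results": step_results, "errors": errors, "skipped": skipped}


summary = run_all(
    [EchoHandler(), FailingHandler()],
    {"step1": {"value": 7}, "step2": {}, "step99": {}},
)
print(summary)
```

Continuing after a failure means a later step may run against stale or missing inputs; callers that need fail-fast behaviour would have to check `errors` themselves or call steps one at a time.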


@ -0,0 +1,57 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Handler registration helper.
Registers every step Handler with the PipelineScheduler in one call.
Adding a step only requires one more register_handler() call in this function.
"""
from __future__ import annotations
from typing import TYPE_CHECKING
from src.core.handlers.step1_water_mask import Step1WaterMaskHandler
from src.core.handlers.step2_glint_detection import Step2GlintDetectionHandler
from src.core.handlers.step3_glint_removal import Step3GlintRemovalHandler
from src.core.handlers.step4_sampling import Step4SamplingHandler
from src.core.handlers.step5_process_csv import Step5ProcessCsvHandler
from src.core.handlers.step6_extract_spectra import Step6ExtractSpectraHandler
from src.core.handlers.step7_calc_indices import Step7CalcIndicesHandler
from src.core.handlers.step8_ml_train import Step8MlTrainHandler
from src.core.handlers.step9_ml_predict import Step9MlPredictHandler
from src.core.handlers.step10_qaa_inversion import Step10QaaInversionHandler
from src.core.handlers.step11_concentration import Step11ConcentrationHandler
from src.core.handlers.step12_kriging import Step12KrigingHandler
from src.core.handlers.step13_visualization import Step13VisualizationHandler
from src.core.handlers.step14_report import Step14ReportHandler
if TYPE_CHECKING:
from src.core.handlers.pipeline_scheduler import PipelineScheduler
def register_all_handlers(scheduler: PipelineScheduler):
"""Register every implemented step Handler with the scheduler.
Usage::
scheduler = PipelineScheduler(work_dir="./work_dir")
register_all_handlers(scheduler)
result = scheduler.run_full_pipeline(config)
When adding a step, append one register_handler() line to this function.
"""
scheduler.register_handler(Step1WaterMaskHandler())
scheduler.register_handler(Step2GlintDetectionHandler())
scheduler.register_handler(Step3GlintRemovalHandler())
scheduler.register_handler(Step4SamplingHandler())
scheduler.register_handler(Step5ProcessCsvHandler())
scheduler.register_handler(Step6ExtractSpectraHandler())
scheduler.register_handler(Step7CalcIndicesHandler())
scheduler.register_handler(Step8MlTrainHandler())
scheduler.register_handler(Step9MlPredictHandler())
scheduler.register_handler(Step10QaaInversionHandler())
scheduler.register_handler(Step11ConcentrationHandler())
scheduler.register_handler(Step12KrigingHandler())
scheduler.register_handler(Step13VisualizationHandler())
scheduler.register_handler(Step14ReportHandler())
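Reviewer note: `register_handler` overwrites an existing entry when two handlers share a `step_key`. If that is a concern, a table-driven registration with a duplicate check is one option. This is a sketch only; `build_registry` and the stub classes are hypothetical and not part of the commit.

```python
def build_registry(handler_classes):
    """Instantiate each class and index it by step_key, rejecting duplicates."""
    registry = {}
    for cls in handler_classes:
        handler = cls()
        key = handler.step_key
        if key is None:
            raise ValueError(f"{cls.__name__} does not define step_key")
        if key in registry:
            raise ValueError(f"duplicate step_key {key!r}: {cls.__name__}")
        registry[key] = handler
    return registry


# Hypothetical stand-ins for the real handler classes.
class StubStep1:
    step_key = "step1"


class StubStep2:
    step_key = "step2"


class StubStep1Again:
    step_key = "step1"


registry = build_registry([StubStep1, StubStep2])
print(list(registry))
```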


@ -0,0 +1,137 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step10 handler: QAA (quasi-analytical algorithm) inversion.
Extracts the former WaterQualityInversionPipeline.step8_qaa_inversion() method
into the standalone Step10QaaInversionHandler.
"""
import os
import time
from typing import Any, Dict
import numpy as np
import pandas as pd
from src.core.handlers.base import BaseStepHandler, PipelineContext
class Step10QaaInversionHandler(BaseStepHandler):
"""Step 10: QAA (quasi-analytical algorithm) inversion (non-empirical model).
Config key: 'step10_qaa'
Uses QAABaselineSolver directly for the physical derivation.
"""
step_key = 'step10_qaa'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
from src.core.algorithms.qaa.qaas_baseline import QAABaselineSolver
from src.utils.water_owt_config import get_lambda_0
step_start_time = time.time()
lake_name = config.get('lake_name', 'Unknown')
lambda_0 = config.get('lambda_0', get_lambda_0(lake_name))
output_dir = os.path.join(context.work_dir, "10_QAA_Inversion")
os.makedirs(output_dir, exist_ok=True)
output_path = config.get('output_path') or os.path.join(output_dir, "a_lambda_results.csv")
spectrum_csv = config.get('spectrum_csv_path')
if not spectrum_csv:
spectrum_csv = context.training_csv_path
if not spectrum_csv or not os.path.exists(spectrum_csv):
fallback_candidates = []
step6_dir = os.path.join(context.work_dir, "6_Spectral_Feature_Extraction")
if os.path.isdir(step6_dir):
for f in sorted(os.listdir(step6_dir)):
if f.lower().endswith('.csv'):
fallback_candidates.append(os.path.join(step6_dir, f))
if fallback_candidates:
spectrum_csv = fallback_candidates[0]
context.notify('step10_qaa', 'info',
f'spectrum_csv_path 为空,已自动回退到 step6 产物: {spectrum_csv}')
else:
msg = f'训练光谱 CSV 不存在或路径为空: {spectrum_csv}'
context.notify('step10_qaa', 'error', msg)
step_end_time = time.time()
context.record_step_time(
"步骤10: QAA 反演", step_start_time, step_end_time,
status="failed", error=msg
)
return {'error': msg}
try:
df = pd.read_csv(spectrum_csv, encoding="utf-8-sig")
col_names = df.columns.tolist()
wavelength_col_idx = None
for i, col in enumerate(col_names):
try:
float(col)
wavelength_col_idx = i
break
except (ValueError, TypeError):
pass
if wavelength_col_idx is None:
msg = "无法从 CSV 列名中识别波长信息"
context.notify('step10_qaa', 'error', msg)
step_end_time = time.time()
context.record_step_time(
"步骤10: QAA 反演", step_start_time, step_end_time,
status="failed", error=msg
)
return {'error': msg}
meta_df = df.iloc[:, :wavelength_col_idx].copy()
wavelengths = np.array([float(c) for c in col_names[wavelength_col_idx:]], dtype=np.float64)
data_matrix = df.iloc[:, wavelength_col_idx:].values.astype(np.float64)
if data_matrix.ndim == 1:
data_matrix = data_matrix[np.newaxis, :]
solver = QAABaselineSolver()
raw_result = solver.run_inversion(wavelengths, data_matrix, lambda_0)
if isinstance(raw_result, list):
sample_results = raw_result
else:
sample_results = [raw_result]
rows_out = []
for i, sample_result in enumerate(sample_results):
wl_arr = wavelengths
a_arr = sample_result['a_lambda']
bb_arr = sample_result['bb_lambda']
meta_row = meta_df.iloc[i].to_dict() if i < len(meta_df) else {}
for j, wl in enumerate(wl_arr):
rows_out.append({
'sample_id': f"sample_{i}",
'Wavelength': wl,
'a_lambda': a_arr[j],
'bb_lambda': bb_arr[j],
**meta_row,
})
result_df = pd.DataFrame(rows_out)
result_df.to_csv(output_path, index=False, float_format='%.8f')
context.qaa_output_path = output_path
step_end_time = time.time()
context.record_step_time(
"步骤10: QAA 反演", step_start_time, step_end_time
)
context.notify('step10_qaa', 'completed',
f"QAA 反演完毕,水域={lake_name},λ₀={lambda_0}nm")
return {'qaa_output_path': output_path}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤10: QAA 反演", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
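Reviewer note: the handler splits the CSV into metadata and spectral columns at the first header that parses as a float. The sketch below isolates that rule in pure Python; `first_numeric_column` is a hypothetical name, and the sample headers are invented for illustration.

```python
def first_numeric_column(col_names):
    """Return the index of the first header that float() accepts, else None."""
    for i, col in enumerate(col_names):
        try:
            float(col)
            return i
        except (ValueError, TypeError):
            pass
    return None


# Invented sample headers: three metadata columns, then two wavelength columns.
cols = ["lon", "lat", "chl_a", "400.5", "410.5"]
idx = first_numeric_column(cols)
meta_cols, band_cols = cols[:idx], cols[idx:]
wavelengths = [float(c) for c in band_cols]
print(meta_cols, wavelengths)
```

Two caveats follow from using `float()` as the test: headers such as `nan`, `inf`, or `1e3` also parse, and every column after the first numeric header is assumed to be a wavelength, so a non-numeric header further right would raise when the wavelengths are converted.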


@ -0,0 +1,71 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step11 handler: concentration inversion.
Extracts the former WaterQualityInversionPipeline.step9_concentration_inversion() method
into the standalone Step11ConcentrationHandler.
"""
import os
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
class Step11ConcentrationHandler(BaseStepHandler):
"""Step 11: concentration inversion (based on the a_lambda/bb_lambda output of QAA Step10).
Config key: 'step11_concentration'
Uses ConcentrationPipeline directly for the concentration inversion.
"""
step_key = 'step11_concentration'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
from src.core.algorithms.concentration_inversion import ConcentrationPipeline
step_start_time = time.time()
input_csv = config.get('input_csv') or context.qaa_output_path
output_csv = config.get('output_csv')
lake_case = config.get('lake_case', 'medium')
if not input_csv or not os.path.exists(input_csv):
msg = f"QAA 结果文件不存在或路径为空: {input_csv}"
context.notify('step11_concentration', 'error', msg)
step_end_time = time.time()
context.record_step_time(
"步骤11: 浓度反演", step_start_time, step_end_time,
status="failed", error=msg
)
return {'error': msg}
if not output_csv:
output_dir = os.path.join(context.work_dir, "11_Concentration")
os.makedirs(output_dir, exist_ok=True)
output_csv = os.path.join(output_dir, "final_concentrations.csv")
try:
pipeline = ConcentrationPipeline(lake_case=lake_case)
result_csv = pipeline.run_pipeline(input_csv, output_csv)
context.concentration_output_path = result_csv
step_end_time = time.time()
context.record_step_time(
"步骤11: 浓度反演", step_start_time, step_end_time
)
context.notify('step11_concentration', 'completed',
f"浓度反演完毕,结果保存于: {result_csv}")
return {'concentration_output_path': result_csv}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤11: 浓度反演", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,81 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step12 handler: kriging spatial interpolation and distribution map generation.
Extracts the former WaterQualityInversionPipeline.step10_map() method
into the standalone Step12KrigingHandler.
"""
import time
from pathlib import Path
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.mapping_step import MappingStep
class Step12KrigingHandler(BaseStepHandler):
"""Step 12: kriging spatial interpolation and distribution map generation.
Config key: 'step12_kriging'
Delegates to: MappingStep.generate_distribution_map()
"""
step_key = 'step12_kriging'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
prediction_csv_path = config.get('prediction_csv_path')
boundary_shp_path = config.get('boundary_shp_path')
# Force the output into visualization_dir
csv_name = Path(prediction_csv_path).stem if prediction_csv_path else "distribution"
forced_image_path = str(context.visualization_dir / f"{csv_name}_distribution.png")
viz_dir_resolved = str(context.visualization_dir)
output_image_path = config.get('output_image_path')
if output_image_path and output_image_path != forced_image_path:
norm_user = output_image_path.replace('\\', '/').rstrip('/')
norm_viz = viz_dir_resolved.replace('\\', '/').rstrip('/')
if not norm_user.startswith(norm_viz + '/') and norm_user != norm_viz:
output_image_path = forced_image_path
else:
output_image_path = forced_image_path
try:
result = MappingStep.generate_distribution_map(
prediction_csv_path=prediction_csv_path,
boundary_shp_path=boundary_shp_path,
output_image_path=output_image_path,
resolution=config.get('resolution', 30),
input_crs=config.get('input_crs', 'EPSG:32651'),
output_crs=config.get('output_crs', 'EPSG:4326'),
show_sample_points=config.get('show_sample_points', False),
base_map_tif=config.get('base_map_tif'),
use_distance_diffusion=config.get('use_distance_diffusion', True),
max_diffusion_distance=config.get('max_diffusion_distance'),
diffusion_power=config.get('diffusion_power', 2),
diffusion_n_neighbors=config.get('diffusion_n_neighbors', 15),
cmap=config.get('cmap'),
expand_ratio=config.get('expand_ratio', 0.05),
output_dir=str(context.visualization_dir),
)
context.distribution_map_path = result
step_end_time = time.time()
context.record_step_time(
"步骤12: 克里金插值与分布图", step_start_time, step_end_time
)
return {'distribution_map_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤12: 克里金插值与分布图", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
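Reviewer note: the output-path rule in `execute` (keep a user path only if it lies under `visualization_dir`, otherwise use the forced path) reduces to a small pure function. The sketch below restates it under that reading; `confine_to_dir` and the sample paths are hypothetical.

```python
def confine_to_dir(user_path, allowed_dir, forced_path):
    """Return user_path if it lies under allowed_dir, else forced_path.

    Hypothetical helper: the comparison is purely textual, after
    normalising backslashes and trailing slashes, as in the handler.
    """
    if not user_path:
        return forced_path
    norm_user = user_path.replace('\\', '/').rstrip('/')
    norm_dir = allowed_dir.replace('\\', '/').rstrip('/')
    if norm_user.startswith(norm_dir + '/') or norm_user == norm_dir:
        return user_path
    return forced_path


viz_dir = "work_dir/14_visualization"
forced = viz_dir + "/pred_distribution.png"
print(confine_to_dir("work_dir/14_visualization/custom.png", viz_dir, forced))
print(confine_to_dir("C:\\tmp\\custom.png", viz_dir, forced))
print(confine_to_dir(None, viz_dir, forced))
```

Because the check is a string prefix test, a path containing `..` segments (for example `work_dir/14_visualization/../other.png`) still passes; resolving both paths first would close that gap if it matters.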


@ -0,0 +1,349 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
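Step13 handler: visualization plots.
Extracts the visualization methods of the former WaterQualityInversionPipeline
(scatter plots, box plots, spectral curves, statistical charts, glint previews)
into the standalone Step13VisualizationHandler.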
"""
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from src.core.handlers.base import BaseStepHandler, PipelineContext
class Step13VisualizationHandler(BaseStepHandler):
"""Step 13: visualization plots.
Config key: 'step13_visualization'
Includes: scatter plots, box plots, spectral curves, statistical charts, glint previews.
"""
step_key = 'step13_visualization'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
output_files: Dict[str, Any] = {}
try:
# ── Scatter plots ──
if config.get('generate_scatter', True):
if context.training_csv_path and context.models_dir.exists():
try:
scatter_config = config.get('scatter_config', {})
scatter_paths = self._generate_scatter_plots(context, scatter_config)
output_files['scatter_plots'] = scatter_paths
except Exception as e:
context.notify('step13_visualization', 'warning',
f"生成散点图时出错: {e}")
# ── Box plots ──
if config.get('generate_boxplots', True):
if context.processed_csv_path:
try:
boxplot_config = config.get('boxplot_config', {})
boxplot_paths = self._generate_boxplots(context, boxplot_config)
output_files['boxplots'] = boxplot_paths
except Exception as e:
context.notify('step13_visualization', 'warning',
f"生成箱型图时出错: {e}")
# ── Spectral curves ──
if config.get('generate_spectrum', True):
if context.training_csv_path:
try:
spectrum_paths = self._generate_spectrum_plots(context, config)
output_files['spectrum_plots'] = spectrum_paths
except Exception as e:
context.notify('step13_visualization', 'warning',
f"生成光谱曲线图时出错: {e}")
# ── Statistical charts ──
if config.get('generate_statistics', True):
if context.processed_csv_path:
try:
stat_charts = self._generate_statistics(context)
output_files['statistical_charts'] = stat_charts
except Exception as e:
context.notify('step13_visualization', 'warning',
f"生成统计图表时出错: {e}")
# ── Glint previews ──
if config.get('generate_glint_previews', True):
try:
glint_config = config.get('glint_preview_config', {})
preview_paths = context.visualizer.generate_glint_deglint_previews(
work_dir=glint_config.get('work_dir') or str(context.work_dir),
output_subdir=glint_config.get('output_subdir', 'glint_deglint_previews'),
generate_glint=glint_config.get('generate_glint', True),
generate_deglint=glint_config.get('generate_deglint', True),
)
output_files['glint_deglint_previews'] = preview_paths
except Exception as e:
context.notify('step13_visualization', 'warning',
f"生成耀斑预览图时出错: {e}")
step_end_time = time.time()
context.record_step_time(
"步骤13: 可视化成图", step_start_time, step_end_time
)
return {'visualization_outputs': output_files}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤13: 可视化成图", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
# ── Scatter plots ──
def _generate_scatter_plots(self, context: PipelineContext,
scatter_config: dict) -> Dict[str, str]:
training_csv_path = context.training_csv_path
models_dir = str(context.models_dir)
metric = scatter_config.get('metric', 'test_r2')
use_enhanced = scatter_config.get('use_enhanced', True)
feature_start_column = scatter_config.get('feature_start_column', 13)
test_size = scatter_config.get('test_size', 0.2)
random_state = scatter_config.get('random_state', 42)
scatter_paths = {}
if use_enhanced:
try:
results = context.scatter_batch.batch_plot_scatter(
models_root_dir=models_dir,
csv_path=training_csv_path,
output_dir=str(context.visualization_dir / "scatter_plots"),
metric=metric,
target_column=None,
feature_start_column=feature_start_column,
test_size=test_size,
random_state=random_state,
)
for target_name, result in results.items():
if result.get('status') == 'success':
scatter_paths[target_name] = result.get('save_path', '')
except Exception:
use_enhanced = False
if not use_enhanced or not scatter_paths:
from src.core.prediction.inference_batch import WaterQualityInference
models_path = Path(models_dir)
for target_folder in models_path.iterdir():
if not target_folder.is_dir():
continue
target_name = target_folder.name
try:
inferencer = WaterQualityInference(str(target_folder))
eval_result = inferencer.evaluate_with_split(
data_csv_path=training_csv_path,
split_method="spxy",
test_size=test_size,
random_state=random_state,
metric=metric,
)
predictions = eval_result.get('predictions', {})
if predictions:
y_train_true = predictions.get('y_train_true')
y_train_pred = predictions.get('y_train_pred')
y_test_true = predictions.get('y_test_true')
y_test_pred = predictions.get('y_test_pred')
metrics = eval_result.get('test_metrics', {})
if y_train_true is not None and y_test_true is not None:
y_all_true = np.concatenate([y_train_true, y_test_true])
y_all_pred = np.concatenate([y_train_pred, y_test_pred])
train_indices = np.arange(len(y_train_true))
test_indices = np.arange(len(y_train_true), len(y_all_true))
scatter_path = context.visualizer.plot_scatter_true_vs_pred(
y_true=y_all_true,
y_pred=y_all_pred,
target_name=target_name,
train_indices=train_indices,
test_indices=test_indices,
metrics={
'train_r2': eval_result.get('train_metrics', {}).get('r2', 0),
'test_r2': metrics.get('r2', 0),
'train_rmse': eval_result.get('train_metrics', {}).get('rmse', 0),
'test_rmse': metrics.get('rmse', 0),
}
)
scatter_paths[target_name] = scatter_path
except Exception:
continue
return scatter_paths
# ── Box plots ──
def _generate_boxplots(self, context: PipelineContext,
boxplot_config: dict) -> Dict[str, str]:
csv_path = context.processed_csv_path
parameter_columns = boxplot_config.get('parameter_columns')
data_start_column = boxplot_config.get('data_start_column', 4)
save_individual = boxplot_config.get('save_individual', True)
use_seaborn = boxplot_config.get('use_seaborn', True)
df = pd.read_csv(csv_path)
if parameter_columns is None:
data_columns = df.iloc[:, data_start_column:]
parameter_columns = list(data_columns.columns)
else:
parameter_columns = [col for col in parameter_columns if col in df.columns]
if not parameter_columns:
return {}
boxplot_dir = context.visualization_dir / "boxplots"
boxplot_dir.mkdir(parents=True, exist_ok=True)
boxplot_paths = {}
if save_individual:
for column in parameter_columns:
if column not in df.columns:
continue
clean_data = df[column].dropna()
if len(clean_data) == 0:
continue
try:
plt.figure(figsize=(8, 6))
if use_seaborn:
plot_data = pd.DataFrame({'参数': [column] * len(clean_data), '数值': clean_data})
sns.boxplot(data=plot_data, x='参数', y='数值', palette='Set2')
sns.stripplot(data=plot_data, x='参数', y='数值',
color='red', alpha=0.6, size=5, jitter=True)
else:
box_plot = plt.boxplot([clean_data], labels=[column],
patch_artist=True, showfliers=False)
box_plot['boxes'][0].set_facecolor('lightblue')
box_plot['boxes'][0].set_alpha(0.7)
x_pos = np.random.normal(1, 0.04, size=len(clean_data))
plt.scatter(x_pos, clean_data, alpha=0.6, s=30, color='red',
edgecolors='black', linewidth=0.5, zorder=3)
plt.title(f'{column} - 箱型图', fontsize=14, fontweight='bold')
plt.xlabel('参数', fontsize=12)
plt.ylabel('数值', fontsize=12)
stats_text = (f'数据点数: {len(clean_data)}\n'
f'均值: {clean_data.mean():.2f}\n'
f'中位数: {clean_data.median():.2f}\n'
f'标准差: {clean_data.std():.2f}')
plt.text(0.02, 0.98, stats_text, transform=plt.gca().transAxes,
verticalalignment='top',
bbox=dict(boxstyle='round',
facecolor='wheat' if not use_seaborn else 'lightgreen',
alpha=0.8))
plt.grid(True, alpha=0.3, linestyle='--')
plt.tight_layout()
safe_name = column.replace('/', '_').replace('\\', '_').replace(':', '_')
save_path = boxplot_dir / f'{safe_name}_boxplot.png'
plt.savefig(save_path, dpi=300, bbox_inches='tight')
plt.close()
boxplot_paths[column] = str(save_path)
except Exception:
continue
# Combined box plot
try:
plt.figure(figsize=(max(12, len(parameter_columns) * 0.8), 8))
box_data = []
labels = []
for column in parameter_columns:
if column in df.columns:
clean_data = df[column].dropna()
if len(clean_data) > 0:
box_data.append(clean_data)
labels.append(column)
if box_data:
if use_seaborn:
melted_data = pd.melt(df[labels], var_name='参数', value_name='数值')
melted_data = melted_data.dropna()
sns.boxplot(data=melted_data, x='参数', y='数值', palette='Set3')
sns.stripplot(data=melted_data, x='参数', y='数值',
color='red', alpha=0.6, size=4, jitter=True)
else:
box_plot = plt.boxplot(box_data, labels=labels, patch_artist=True, showfliers=False)
colors = plt.cm.Set3(np.linspace(0, 1, len(box_data)))
for patch, color in zip(box_plot['boxes'], colors):
patch.set_facecolor(color)
patch.set_alpha(0.7)
for i, data in enumerate(box_data):
x_pos = np.random.normal(i + 1, 0.04, size=len(data))
plt.scatter(x_pos, data, alpha=0.6, s=20, color='red',
edgecolors='black', linewidth=0.5, zorder=3)
plt.title('水质参数箱型图(综合)', fontsize=16, fontweight='bold')
plt.xlabel('参数', fontsize=12)
plt.ylabel('数值', fontsize=12)
plt.xticks(rotation=45, ha='right')
plt.grid(True, alpha=0.3, linestyle='--')
plt.tight_layout()
combined_path = boxplot_dir / 'all_parameters_boxplot.png'
plt.savefig(combined_path, dpi=300, bbox_inches='tight')
plt.close()
boxplot_paths['all_parameters'] = str(combined_path)
except Exception:
pass
return boxplot_paths
# ── Spectral curves ──
def _generate_spectrum_plots(self, context: PipelineContext,
config: dict) -> Dict[str, str]:
csv_path = context.training_csv_path
wavelength_start_column = config.get('feature_start_column', 'UTM_Y')
df = pd.read_csv(csv_path)
if isinstance(wavelength_start_column, str):
try:
wavelength_start_idx = df.columns.get_loc(wavelength_start_column)
except KeyError:
wavelength_start_idx = 13
else:
wavelength_start_idx = wavelength_start_column
parameter_columns = list(df.columns[:wavelength_start_idx])
if len(parameter_columns) > 2:
parameter_columns = parameter_columns[2:]
spectrum_paths = {}
for param_col in parameter_columns:
if param_col not in df.columns:
continue
try:
spectrum_path = context.visualizer.plot_spectrum_by_parameter(
csv_path=csv_path,
parameter_column=param_col,
wavelength_start_column=wavelength_start_column,
n_groups=5,
)
spectrum_paths[param_col] = spectrum_path
except Exception:
continue
return spectrum_paths
# ── Statistical charts ──
def _generate_statistics(self, context: PipelineContext) -> Dict[str, str]:
csv_path = context.processed_csv_path
df = pd.read_csv(csv_path)
parameter_columns = list(df.columns[2:])
parameter_columns = [col for col in parameter_columns
if df[col].dtype in [np.float64, np.int64]]
return context.visualizer.plot_statistical_charts(
csv_path=csv_path,
parameter_columns=parameter_columns,
)
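
Reviewer note on the fallback scatter path above: it concatenates the train and test predictions into one array and records which positions belong to each split. A minimal sketch of that bookkeeping follows, using plain lists instead of the NumPy arrays the handler works with; `merge_splits` is a hypothetical name used only for illustration.

```python
from typing import List, Sequence, Tuple


def merge_splits(
    train: Sequence[float], test: Sequence[float]
) -> Tuple[List[float], List[int], List[int]]:
    """Concatenate two splits and return the index ranges of each one."""
    merged = list(train) + list(test)
    # Train samples occupy the first len(train) positions.
    train_idx = list(range(len(train)))
    # Test samples follow immediately after the train samples.
    test_idx = list(range(len(train), len(merged)))
    return merged, train_idx, test_idx


y_all, train_idx, test_idx = merge_splits([1.0, 2.0, 3.0], [4.0, 5.0])
```

Both index lists refer to positions in the merged sequence, which is what lets a plotting call colour the two splits differently from a single `y_true` / `y_pred` pair.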


@ -0,0 +1,142 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 14 handler: report generation.
Extracts the former WaterQualityInversionPipeline.generate_pipeline_report() method
into a standalone Step14ReportHandler.
"""
import time
from datetime import datetime
from pathlib import Path
from typing import Any, Dict
import numpy as np
import pandas as pd
from src.core.handlers.base import BaseStepHandler, PipelineContext
class Step14ReportHandler(BaseStepHandler):
"""Step 14: pipeline execution report generation.
Config key: 'step14_report'
Generates the pipeline execution report in CSV and TXT formats.
"""
step_key = 'step14_report'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
try:
output_path = config.get('output_path')
if output_path is None:
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
output_path = str(context.reports_dir / f"pipeline_report_{timestamp}.csv")
report_data = []
total_time = 0.0
step_order = [
"步骤1: 水域掩膜生成",
"步骤2: 耀斑区域检测",
"步骤3: 耀斑去除",
"步骤4: 数据预处理",
"步骤5: 光谱提取",
"步骤6: 水质光谱指数计算",
"步骤7: 机器学习建模与训练",
"步骤8: 非经验模型训练",
"步骤9: 自定义回归",
"步骤10: 采样点生成",
"步骤11: 参数预测",
"步骤12: 分布图生成",
]
for step_name in step_order:
if step_name in context.step_timings:
timing_info = context.step_timings[step_name]
report_data.append({
'步骤': step_name,
'开始时间': timing_info['start_time'],
'结束时间': timing_info['end_time'],
'耗时(秒)': f"{timing_info['elapsed_seconds']:.2f}",
'耗时(格式化)': timing_info['elapsed_formatted'],
'状态': timing_info['status'],
'错误信息': timing_info.get('error', '')
})
if timing_info['status'] == 'completed':
total_time += timing_info['elapsed_seconds']
if context.pipeline_start_time and context.pipeline_end_time:
pipeline_total = context.pipeline_end_time - context.pipeline_start_time
report_data.append({
'步骤': '总计',
'开始时间': datetime.fromtimestamp(context.pipeline_start_time).strftime('%Y-%m-%d %H:%M:%S'),
'结束时间': datetime.fromtimestamp(context.pipeline_end_time).strftime('%Y-%m-%d %H:%M:%S'),
'耗时(秒)': f"{pipeline_total:.2f}",
'耗时(格式化)': context._format_time(pipeline_total),
'状态': 'completed',
'错误信息': ''
})
df_report = pd.DataFrame(report_data)
df_report.to_csv(output_path, index=False, encoding='utf-8-sig')
txt_output_path = str(Path(output_path).with_suffix('.txt'))
with open(txt_output_path, 'w', encoding='utf-8') as f:
f.write("=" * 80 + "\n")
f.write("水质参数反演流程执行报告\n")
f.write("=" * 80 + "\n\n")
if context.pipeline_start_time and context.pipeline_end_time:
f.write(f"流程开始时间: {datetime.fromtimestamp(context.pipeline_start_time).strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"流程结束时间: {datetime.fromtimestamp(context.pipeline_end_time).strftime('%Y-%m-%d %H:%M:%S')}\n")
f.write(f"总耗时: {context._format_time(context.pipeline_end_time - context.pipeline_start_time)}\n\n")
f.write("-" * 80 + "\n")
f.write("各步骤执行详情:\n")
f.write("-" * 80 + "\n\n")
for step_name in step_order:
if step_name in context.step_timings:
timing_info = context.step_timings[step_name]
f.write(f"{step_name}\n")
f.write(f" 开始时间: {timing_info['start_time']}\n")
f.write(f" 结束时间: {timing_info['end_time']}\n")
f.write(f" 耗时: {timing_info['elapsed_formatted']} ({timing_info['elapsed_seconds']:.2f}秒)\n")
f.write(f" 状态: {timing_info['status']}\n")
if timing_info.get('error'):
f.write(f" 错误: {timing_info['error']}\n")
f.write("\n")
f.write("-" * 80 + "\n")
f.write("统计摘要:\n")
f.write("-" * 80 + "\n")
completed_steps = [s for s in context.step_timings.values() if s['status'] == 'completed']
failed_steps = [s for s in context.step_timings.values() if s['status'] == 'failed']
skipped_steps = [s for s in context.step_timings.values() if s['status'] == 'skipped']
f.write(f"成功完成的步骤: {len(completed_steps)}\n")
f.write(f"失败的步骤: {len(failed_steps)}\n")
f.write(f"跳过的步骤: {len(skipped_steps)}\n")
if completed_steps:
completed_times = [s['elapsed_seconds'] for s in completed_steps]
f.write(f"平均耗时: {context._format_time(np.mean(completed_times))}\n")
f.write(f"最长耗时: {context._format_time(np.max(completed_times))}\n")
f.write(f"最短耗时: {context._format_time(np.min(completed_times))}\n")
step_end_time = time.time()
context.record_step_time(
"步骤14: 报告生成", step_start_time, step_end_time
)
return {'report_csv': output_path, 'report_txt': txt_output_path}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤14: 报告生成", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
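
Reviewer note on the summary section of the TXT report above: it counts steps by status and reports mean, max and min elapsed time over the completed steps only. The sketch below reproduces that aggregation with the standard library. The shape of `step_timings` (a dict of dicts with `status` and `elapsed_seconds`) is inferred from how the handler reads it, and `summarize_timings` is a hypothetical helper, not part of the project.

```python
from statistics import mean
from typing import Any, Dict


def summarize_timings(step_timings: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Count steps per status and aggregate elapsed time of completed steps."""
    statuses = [t["status"] for t in step_timings.values()]
    completed = [
        t["elapsed_seconds"]
        for t in step_timings.values()
        if t["status"] == "completed"
    ]
    summary: Dict[str, Any] = {
        "completed": statuses.count("completed"),
        "failed": statuses.count("failed"),
        "skipped": statuses.count("skipped"),
    }
    # Aggregates are only defined when at least one step completed.
    if completed:
        summary["mean_seconds"] = mean(completed)
        summary["max_seconds"] = max(completed)
        summary["min_seconds"] = min(completed)
    return summary


example_timings = {
    "step_a": {"status": "completed", "elapsed_seconds": 2.0},
    "step_b": {"status": "failed", "elapsed_seconds": 1.0},
    "step_c": {"status": "completed", "elapsed_seconds": 4.0},
}
summary = summarize_timings(example_timings)
```

Because the sketch iterates over whatever keys are present in `step_timings`, it does not depend on a hard-coded list of step names.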


@ -0,0 +1,83 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
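"""
Step 1 handler: water mask generation.
Extracts the former WaterQualityInversionPipeline.step1_generate_water_mask() method
into a standalone Step1WaterMaskHandler.
This is the **reference template** for the 14 step handlers; the remaining steps are split out the same way:
1. Subclass BaseStepHandler and set the step_key class attribute
2. Implement execute(ctx, config), which calls the static method of the corresponding Step class
3. Write the output path to ctx (shared across steps through the context)
4. Record the step's elapsed time
5. Return a result dict
"""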
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.water_mask_step import WaterMaskStep
class Step1WaterMaskHandler(BaseStepHandler):
"""Step 1: water mask generation.
Config key: 'step1'
Delegates to: WaterMaskStep.run()
Usage::
handler = Step1WaterMaskHandler()
result = handler.execute(ctx, config['step1'])
# ctx.water_mask_path has been updated
"""
step_key = 'step1'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
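"""Run water mask generation.
Keys that config may contain (all passed through to WaterMaskStep.run()):
- mask_path: path to the water mask file (.shp / .dat / .tif)
- img_path: input image path (needed for shapefile rasterization or NDWI)
- ndwi_threshold: NDWI threshold (default 0.4)
- use_ndwi: whether to use the NDWI method (default False)
- generate_png: whether to generate a PNG preview (default True)
- output_path: explicit output path (optional)
Returns:
{'water_mask_path': str}
"""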
step_start_time = time.time()
try:
result = WaterMaskStep.run(
mask_path=config.get('mask_path'),
img_path=config.get('img_path'),
ndwi_threshold=config.get('ndwi_threshold', 0.4),
use_ndwi=config.get('use_ndwi', False),
generate_png=config.get('generate_png', True),
output_path=config.get('output_path'),
water_mask_dir=str(context.water_mask_dir),
callback=context.notify,
)
# Write the output path to the context for downstream steps
context.water_mask_path = result
step_end_time = time.time()
context.record_step_time(
"步骤1: 水域掩膜生成", step_start_time, step_end_time
)
return {'water_mask_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤1: 水域掩膜生成", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
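
Reviewer note on the template above: every handler repeats the same start-time / try / record / except / record-failed / raise scaffolding. One way to express that pattern once is a context manager, sketched below. `MiniContext` is a stand-in for the real `PipelineContext`, and the default status string `"completed"` is an assumption based on how the Step 14 report reads `status`; none of these names come from the project.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, Optional


class MiniContext:
    """Hypothetical stand-in that only stores step timings."""

    def __init__(self) -> None:
        self.step_timings: Dict[str, Dict[str, Any]] = {}

    def record_step_time(
        self,
        name: str,
        start: float,
        end: float,
        status: str = "completed",
        error: Optional[str] = None,
    ) -> None:
        self.step_timings[name] = {
            "elapsed_seconds": end - start,
            "status": status,
            "error": error,
        }


@contextmanager
def timed_step(ctx: MiniContext, name: str) -> Iterator[None]:
    """Record elapsed time on success and on failure, re-raising errors."""
    start = time.time()
    try:
        yield
    except Exception as exc:
        ctx.record_step_time(name, start, time.time(), status="failed", error=str(exc))
        raise
    else:
        ctx.record_step_time(name, start, time.time())


ctx = MiniContext()
with timed_step(ctx, "ok_step"):
    pass
try:
    with timed_step(ctx, "bad_step"):
        raise ValueError("bad input")
except ValueError:
    pass  # the error still propagates to the caller, as in the handlers
```

The behaviour matches the handlers shown here: a failure is recorded with its message and then re-raised rather than swallowed.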


@ -0,0 +1,67 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 2 handler: glint region detection.
Extracts the former WaterQualityInversionPipeline.step2_find_glint_area() method
into a standalone Step2GlintDetectionHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.glint_detection_step import GlintDetectionStep
class Step2GlintDetectionHandler(BaseStepHandler):
"""Step 2: glint region detection.
Config key: 'step2'
Delegates to: GlintDetectionStep.run()
"""
step_key = 'step2'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
water_mask_path = self._resolve_path(
config.get('water_mask_path'), context.water_mask_path, 'water_mask'
)
try:
result = GlintDetectionStep.run(
img_path=config.get('img_path'),
glint_wave=config.get('glint_wave', 750.0),
method=config.get('method', 'otsu'),
z_threshold=config.get('z_threshold', 2.5),
percentile=config.get('percentile', 95.0),
iqr_multiplier=config.get('iqr_multiplier', 1.5),
window_size=config.get('window_size', 15),
multi_band_waves=config.get('multi_band_waves'),
sub_method=config.get('sub_method', 'zscore'),
weights=config.get('weights'),
max_area=config.get('max_area'),
buffer_size=config.get('buffer_size'),
water_mask_path=water_mask_path,
glint_dir=str(context.glint_dir),
callback=context.notify,
)
context.glint_mask_path = result
step_end_time = time.time()
context.record_step_time(
"步骤2: 耀斑区域检测", step_start_time, step_end_time
)
return {'glint_mask_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤2: 耀斑区域检测", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
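
Reviewer note on `_resolve_path`: its body is not part of this diff, so the sketch below is only a guess at the contract its call sites suggest, namely that an explicit config value wins, the context value is the fallback, and the label names the missing input when neither is set. `resolve_path` and the error message are hypothetical.

```python
from typing import Optional


def resolve_path(explicit: Optional[str], fallback: Optional[str], label: str) -> str:
    """Return the explicit path if given, else the fallback, else fail loudly."""
    if explicit:
        return explicit
    if fallback:
        return fallback
    # Assumed behaviour: a missing input is an error, not a silent None.
    raise ValueError(f"missing required path: {label}")


chosen = resolve_path(None, "work_dir/water_mask.dat", "water_mask")
```

Failing at resolution time keeps the error close to the step that lacks its input, instead of surfacing later inside the delegated Step class.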


@ -0,0 +1,85 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 3 handler: glint removal.
Extracts the former WaterQualityInversionPipeline.step3_remove_glint() method
into a standalone Step3GlintRemovalHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.glint_removal_step import GlintRemovalStep
class Step3GlintRemovalHandler(BaseStepHandler):
"""Step 3: glint removal.
Config key: 'step3'
Delegates to: GlintRemovalStep.run()
"""
step_key = 'step3'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
water_mask_path = self._resolve_path(
config.get('water_mask_path'), context.water_mask_path, 'water_mask'
)
try:
result = GlintRemovalStep.run(
img_path=config.get('img_path'),
method=config.get('method', 'subtract_nir'),
start_wave=config.get('start_wave'),
end_wave=config.get('end_wave'),
json_path=config.get('json_path'),
left_shoulder_wave=config.get('left_shoulder_wave'),
valley_wave=config.get('valley_wave'),
right_shoulder_wave=config.get('right_shoulder_wave'),
water_mask=water_mask_path,
interpolate_zeros=config.get('interpolate_zeros', False),
interpolation_method=config.get('interpolation_method', 'nearest'),
enabled=config.get('enabled', True),
kutser_shp_path=config.get('kutser_shp_path'),
oxy_band=config.get('oxy_band', 38),
lower_oxy=config.get('lower_oxy', 36),
upper_oxy=config.get('upper_oxy', 49),
nir_band=config.get('nir_band', 47),
nir_lower=config.get('nir_lower', 25),
nir_upper=config.get('nir_upper', 37),
goodman_A=config.get('goodman_A', 0.000019),
goodman_B=config.get('goodman_B', 0.1),
hedley_shp_path=config.get('hedley_shp_path'),
hedley_nir_band=config.get('hedley_nir_band', 47),
sugar_bounds=config.get('sugar_bounds'),
sugar_sigma=config.get('sugar_sigma', 1.0),
sugar_estimate_background=config.get('sugar_estimate_background', True),
sugar_glint_mask_method=config.get('sugar_glint_mask_method', 'cdf'),
sugar_iter=config.get('sugar_iter', 3),
sugar_termination_thresh=config.get('sugar_termination_thresh', 20.0),
deglint_dir=str(context.deglint_dir),
water_mask_dir=str(context.water_mask_dir),
callback=context.notify,
output_path=config.get('output_path'),
)
context.deglint_img_path = result
step_end_time = time.time()
context.record_step_time(
"步骤3: 耀斑去除", step_start_time, step_end_time
)
return {'deglint_img_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤3: 耀斑去除", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,64 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 4 handler: prediction sampling point generation.
Extracts the former WaterQualityInversionPipeline.step4_sampling() method
into a standalone Step4SamplingHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.prediction_step import PredictionStep
class Step4SamplingHandler(BaseStepHandler):
"""Step 4: generate prediction sampling points and extract their spectra.
Config key: 'step4_sampling'
Delegates to: PredictionStep.generate_sampling_points()
"""
step_key = 'step4_sampling'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
deglint_img_path = self._resolve_path(
config.get('deglint_img_path'), context.deglint_img_path, 'deglint_img'
)
water_mask_path = self._resolve_path(
config.get('water_mask_path'), context.water_mask_path, 'water_mask'
)
glint_mask_path = self._resolve_path(
config.get('glint_mask_path'), context.glint_mask_path, 'glint_mask'
)
try:
result = PredictionStep.generate_sampling_points(
deglint_img_path=deglint_img_path,
interval=config.get('interval', 50),
sample_radius=config.get('sample_radius', 5),
chunk_size=config.get('chunk_size', 1000),
water_mask_path=water_mask_path,
glint_mask_path=glint_mask_path,
output_dir=str(context.sampling_dir),
use_adaptive_sampling=config.get('use_adaptive_sampling', True),
)
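# Publish the result so downstream steps (e.g. step9_ml_predict) can resolve it from the context
context.sampling_csv_path = result
step_end_time = time.time()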
context.record_step_time(
"步骤4: 生成预测采样点", step_start_time, step_end_time
)
return {'sampling_csv_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤4: 生成预测采样点", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise
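
Reviewer note on Step 4: the internals of `PredictionStep.generate_sampling_points()` are not shown in this diff. As a rough illustration of what `interval`, `water_mask_path` and `glint_mask_path` suggest, the sketch below places candidate points on a regular pixel grid and keeps those that fall on water and outside glint. The function name, the nested-list masks and the fixed (non-adaptive) grid are all assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Mask = Sequence[Sequence[int]]


def grid_sample_points(
    water_mask: Mask, interval: int, glint_mask: Optional[Mask] = None
) -> List[Tuple[int, int]]:
    """Return (row, col) grid points that are water and not glint."""
    points: List[Tuple[int, int]] = []
    for row in range(0, len(water_mask), interval):
        for col in range(0, len(water_mask[row]), interval):
            if not water_mask[row][col]:
                continue  # land or no-data pixel
            if glint_mask is not None and glint_mask[row][col]:
                continue  # pixel flagged as sun glint
            points.append((row, col))
    return points


water = [
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
]
glint = [[0] * 4 for _ in range(4)]
glint[2][2] = 1

all_water_points = grid_sample_points(water, interval=2)
clean_points = grid_sample_points(water, interval=2, glint_mask=glint)
```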


@ -0,0 +1,50 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 5 handler: CSV data processing.
Extracts the former WaterQualityInversionPipeline.step5_process_csv() method
into a standalone Step5ProcessCsvHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.data_preparation_step import DataPreparationStep
class Step5ProcessCsvHandler(BaseStepHandler):
"""Step 5: process the CSV file and filter out outliers.
Config key: 'step5_clean'
Delegates to: DataPreparationStep.process_csv()
"""
step_key = 'step5_clean'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
try:
result = DataPreparationStep.process_csv(
csv_path=config.get('csv_path'),
output_dir=str(context.processed_data_dir),
)
context.processed_csv_path = result
step_end_time = time.time()
context.record_step_time(
"步骤5: 处理CSV文件", step_start_time, step_end_time
)
return {'processed_csv_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤5: 处理CSV文件", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,66 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 6 handler: spectra extraction at training sample points.
Extracts the former WaterQualityInversionPipeline.step6_extract_spectra() method
into a standalone Step6ExtractSpectraHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.data_preparation_step import DataPreparationStep
class Step6ExtractSpectraHandler(BaseStepHandler):
"""Step 6: extract mean spectra from the deglinted image at the sampling point coordinates.
Config key: 'step6_feature'
Delegates to: DataPreparationStep.extract_training_spectra()
"""
step_key = 'step6_feature'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
deglint_img_path = self._resolve_path(
config.get('deglint_img_path'), context.deglint_img_path, 'deglint_img'
)
csv_path = self._resolve_path(
config.get('csv_path'), context.processed_csv_path, 'csv'
)
glint_mask_path = self._resolve_path(
config.get('glint_mask_path'), context.glint_mask_path, 'glint_mask'
)
try:
result = DataPreparationStep.extract_training_spectra(
deglint_img_path=deglint_img_path,
radius=config.get('radius', 5),
source_epsg=config.get('source_epsg', 4326),
csv_path=csv_path,
boundary_path=config.get('boundary_path'),
glint_mask_path=glint_mask_path,
water_mask_path=context.water_mask_path,
output_dir=str(context.training_spectra_dir),
)
context.training_csv_path = result
step_end_time = time.time()
context.record_step_time(
"步骤6: 提取训练样本点光谱", step_start_time, step_end_time
)
return {'training_csv_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤6: 提取训练样本点光谱", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,58 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 7 handler: water quality spectral index calculation.
Extracts the former WaterQualityInversionPipeline.step7_calc_indices() method
into a standalone Step7CalcIndicesHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.data_preparation_step import DataPreparationStep
class Step7CalcIndicesHandler(BaseStepHandler):
"""Step 7: compute water quality spectral indices from the training spectra.
Config key: 'step7_index'
Delegates to: DataPreparationStep.calculate_water_quality_indices()
"""
step_key = 'step7_index'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
training_csv_path = self._resolve_path(
config.get('training_csv_path'), context.training_csv_path, 'training_csv'
)
try:
result = DataPreparationStep.calculate_water_quality_indices(
training_csv_path=training_csv_path,
formula_csv_file=config.get('formula_csv_file'),
formula_names=config.get('formula_names'),
output_file=config.get('output_file'),
enabled=config.get('enabled', True),
output_dir=str(context.indices_dir),
)
context.indices_path = result
step_end_time = time.time()
context.record_step_time(
"步骤7: 计算水质光谱指数", step_start_time, step_end_time
)
return {'indices_path': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤7: 计算水质光谱指数", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,58 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 8 handler: machine learning modeling and training.
Extracts the former WaterQualityInversionPipeline.step8_train_ml() method
into a standalone Step8MlTrainHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.modeling_step import ModelingStep
class Step8MlTrainHandler(BaseStepHandler):
"""Step 8: machine learning modeling and training.
Config key: 'step8_ml_train'
Delegates to: ModelingStep.train_models()
"""
step_key = 'step8_ml_train'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
training_csv_path = self._resolve_path(
config.get('training_csv_path'), context.training_csv_path, 'training_csv'
)
try:
result = ModelingStep.train_models(
feature_start_column=config.get('feature_start_column', '374.285004'),
preprocessing_methods=config.get('preprocessing_methods'),
model_names=config.get('model_names'),
split_methods=config.get('split_methods'),
cv_folds=config.get('cv_folds', 5),
training_csv_path=training_csv_path,
output_dir=str(context.models_dir),
_report_generator=context.report_generator,
)
step_end_time = time.time()
context.record_step_time(
"步骤8: 机器学习建模与训练", step_start_time, step_end_time
)
return {'models_dir': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤8: 机器学习建模与训练", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -0,0 +1,64 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Step 9 handler: machine learning inference.
Extracts the former WaterQualityInversionPipeline.step9_predict_ml() method
into a standalone Step9MlPredictHandler.
"""
import time
from typing import Any, Dict
from src.core.handlers.base import BaseStepHandler, PipelineContext
from src.core.steps.prediction_step import PredictionStep
class Step9MlPredictHandler(BaseStepHandler):
"""Step 9: machine learning inference.
Config key: 'step9_ml_predict'
Delegates to: PredictionStep.predict_water_quality()
"""
step_key = 'step9_ml_predict'
def execute(self, context: PipelineContext, config: dict) -> Dict[str, Any]:
step_start_time = time.time()
sampling_csv_path = self._resolve_path(
config.get('sampling_csv_path'), context.sampling_csv_path, 'sampling_csv'
)
models_dir = config.get('models_dir') or str(context.models_dir)
try:
result = PredictionStep.predict_water_quality(
sampling_csv_path=sampling_csv_path,
models_dir=models_dir,
metric=config.get('metric', 'test_r2'),
prediction_column=config.get('prediction_column', 'prediction'),
output_dir=str(context.prediction_dir / "9_ML_Prediction"),
_report_generator=context.report_generator,
_external_model=config.get('_external_model'),
_external_model_path=config.get('_external_model_path'),
_external_models_dict=config.get('_external_models_dict'),
_external_model_dir=config.get('_external_model_dir'),
)
context.prediction_files.update(result)
step_end_time = time.time()
context.record_step_time(
"步骤9: 机器学习推理预测", step_start_time, step_end_time
)
return {'prediction_files': result}
except Exception as e:
step_end_time = time.time()
context.record_step_time(
"步骤9: 机器学习推理预测", step_start_time, step_end_time,
status="failed", error=str(e)
)
raise


@ -13,7 +13,7 @@ from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold, train_test_split
from sklearn.model_selection import RandomizedSearchCV, cross_val_score, KFold, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.cross_decomposition import PLSRegression
from sklearn.ensemble import GradientBoostingRegressor, AdaBoostRegressor, ExtraTreesRegressor
@ -45,6 +45,7 @@ is_frozen_env = getattr(sys, 'frozen', False)
safe_n_jobs = 1 if is_frozen_env else -1
from src.preprocessing.spectral_Preprocessing import Preprocessing
from src.core.utils.split_methods import spxy, ks
class WaterQualityModelingBatch:
@ -288,11 +289,24 @@ class WaterQualityModelingBatch:
# Extract all target columns (column 0 through feature_start_index - 1)
y_dict = {}
target_columns = data.columns[:feature_start_index]
print(f"检测到的潜在目标列: {list(target_columns)}")
print(f"检测到的目标列: {list(target_columns)}")
# New: skip system-reserved columns that are not prediction targets
ignore_cols = {'ID', 'id', 'Id', 'Longitude', 'Latitude', 'Lon', 'Lat', 'longitude', 'latitude', 'lon', 'lat', 'Station', 'station'}
for col_name in target_columns:
# Filter out blacklisted columns
if col_name in ignore_cols:
print(f" 跳过目标列 '{col_name}': 属于系统保留列或空间坐标")
continue
y_series = data[col_name]
# Filter out non-numeric columns (so free-text notes and the like are not used as regression targets)
if not pd.api.types.is_numeric_dtype(y_series):
print(f" 跳过目标列 '{col_name}': 非数值类型")
continue
# Check that the column has at least one non-null value
if not y_series.isna().all():
y_dict[col_name] = y_series
@ -407,159 +421,12 @@ class WaterQualityModelingBatch:
return X_train, X_test, y_train, y_test
def spxy(self, data, label, test_size=0.2):
"""
Split the dataset with the SPXY algorithm (considers distances in both X and Y space)
Args:
data: shape (n_samples, n_features)
label: shape (n_samples, )
test_size: test set fraction, default: 0.2
Returns:
X_train: (n_samples, n_features)
X_test: (n_samples, n_features)
y_train: (n_samples, )
y_test: (n_samples, )
"""
# Ensure data and label are NumPy arrays
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
# Back up the original data and labels
x_backup = data
y_backup = label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
# Standardize the labels
label = (label - np.mean(label)) / np.std(label)
D = np.zeros((M, M))
Dy = np.zeros((M, M))
# Compute pairwise distances between samples
for i in range(M - 1):
xa = data[i, :]
ya = label[i]
for j in range((i + 1), M):
xb = data[j, :]
yb = label[j]
D[i, j] = np.linalg.norm(xa - xb)
Dy[i, j] = np.linalg.norm(ya - yb)
# Normalize the distances
Dmax = np.max(D)
Dymax = np.max(Dy)
D = D / Dmax + Dy / Dymax
# Find the two farthest points
maxD = D.max(axis=0)
index_row = D.argmax(axis=0)
index_column = maxD.argmax()
m = np.zeros(N, dtype=int)
m[0] = index_row[index_column]
m[1] = index_column
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
# Select the training set by distance
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros(M - i)
for j in range(M - i):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(samples, m)
# Split into training and test sets
X_train = data[m, :]
y_train = y_backup[m]
X_test = data[m_complement, :]
y_test = y_backup[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the SPXY algorithm; delegates to src.core.utils.split_methods.spxy."""
return spxy(data, label, test_size=test_size)
def ks(self, data, label, test_size=0.2):
"""
Split the dataset with the Kennard-Stone algorithm
Args:
data: shape (n_samples, n_features)
label: shape (n_sample, )
test_size: test set fraction, default: 0.2
Returns:
X_train: (n_samples, n_features)
X_test: (n_samples, n_features)
y_train: (n_samples, )
y_test: (n_samples, )
"""
# Ensure data and label are NumPy arrays
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
D = np.zeros((M, M))
for i in range((M - 1)):
xa = data[i, :]
for j in range((i + 1), M):
xb = data[j, :]
D[i, j] = np.linalg.norm(xa - xb)
maxD = np.max(D, axis=0)
index_row = np.argmax(D, axis=0)
index_column = np.argmax(maxD)
m = np.zeros(N)
m[0] = np.array(index_row[index_column])
m[1] = np.array(index_column)
m = m.astype(int)
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros((M - i))
for j in range((M - i)):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(np.arange(data.shape[0]), m)
X_train = data[m, :]
y_train = label[m]
X_test = data[m_complement, :]
y_test = label[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the Kennard-Stone algorithm; delegates to src.core.utils.split_methods.ks."""
return ks(data, label, test_size=test_size)
def split_data(self, X: np.ndarray, y: pd.Series, method: str = "random",
test_size: float = 0.2, random_state: int = 42) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
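
Reviewer note on the hunk above: the removed `ks` method implements Kennard-Stone selection, which starts from the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. The sketch below restates that procedure with the standard library for readability. It is not the code in `src.core.utils.split_methods`, and `kennard_stone` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone(
    points: Sequence[Sequence[float]], n_select: int
) -> Tuple[List[int], List[int]]:
    """Return (selected, remaining) sample indices in selection order."""
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of points")
    dist = [[math.dist(a, b) for b in points] for a in points]
    # Seed with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [k for k in range(n) if k not in (first, second)]
    while len(selected) < n_select:
        # Max-min criterion: farthest from its closest selected sample.
        best = max(remaining, key=lambda k: min(dist[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)
    return selected, remaining


samples = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
train_idx, test_idx = kennard_stone(samples, n_select=3)
```

SPXY follows the same selection loop but uses a combined distance, the X-space distance plus the label distance, each divided by its maximum, as in the removed `spxy` method.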
@ -639,21 +506,20 @@ class WaterQualityModelingBatch:
elif model_name == 'LightGBM':
base_model.set_params(verbose=-1)
# Grid search - use KFold instead of StratifiedKFold
# Randomized search: replaces the exhaustive GridSearchCV and greatly reduces tuning time
cv_strategy = KFold(n_splits=cv_folds, shuffle=True, random_state=random_state)
grid_search = GridSearchCV(
grid_search = RandomizedSearchCV(
base_model,
config['params'],
n_iter=10,
cv=cv_strategy,
scoring=scoring,
n_jobs=safe_n_jobs,
verbose=1
random_state=random_state,
verbose=1,
)
# Fit the model on the training set
# with parallel_backend("threading", n_jobs=-1):
# grid_search.fit(X_train, y_train)
grid_search.fit(X_train, y_train)
# Get the best model
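
Reviewer note on the GridSearchCV to RandomizedSearchCV switch: with `n_iter=10`, at most ten parameter combinations are evaluated per model, regardless of how large the grid is. The stdlib sketch below illustrates the difference between enumerating a grid and drawing a seeded sample from it. It does not reproduce scikit-learn's sampler, and the grid values are made up for the example.

```python
import itertools
import random
from typing import Any, Dict, List, Sequence


def sample_param_grid(
    grid: Dict[str, Sequence[Any]], n_iter: int, seed: int
) -> List[Dict[str, Any]]:
    """Draw up to n_iter distinct combinations from a discrete grid."""
    keys = sorted(grid)
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]
    # A grid smaller than n_iter is simply evaluated in full.
    if n_iter >= len(combos):
        return combos
    return random.Random(seed).sample(combos, n_iter)


# Hypothetical grid: 3 * 3 * 3 = 27 combinations under exhaustive search.
grid = {
    "n_estimators": [50, 100, 200],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 2, 4],
}
candidates = sample_param_grid(grid, n_iter=10, seed=42)
```

The trade-off is coverage: the best combination in the grid is no longer guaranteed to be tried, so results can vary with `random_state`, which the new code passes through.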


@ -315,7 +315,7 @@ def main():
# Example 1: analyze spectral indices with all regression methods
print("\n1. 光谱指数与叶绿素a的回归分析:")
sample_data = pd.read_csv(r"E:\code\WQ\pipeline_result\work_dir\5_training_spectra\water_quality_results.csv")
sample_data = pd.read_csv(r"E:\code\WQ\pipeline_result\work_dir\6_Spectral_Feature_Extraction\water_quality_results.csv")
spectral_indices = ['Al10SABI','Am092Bsub']
results1 = analyzer.batch_single_variable_regression(
@@ -323,7 +323,7 @@ def main():
x_columns=spectral_indices,
y_column='Chlorophyll',
methods='all',
output_file=r'E:\code\WQ\pipeline_result\work_dir\5_training_spectra\spectral_indices_regression.csv'
output_file=r'E:\code\WQ\pipeline_result\work_dir\6_Spectral_Feature_Extraction\spectral_indices_regression.csv'
)
# # Example 2: analyze reflectance bands with specific methods
@@ -343,7 +343,7 @@ def main():
best_models = analyzer.get_best_models_summary()
if not best_models.empty:
print(best_models[['x_variable', 'regression_method', 'r_squared', 'equation']].to_string(index=False))
best_models.to_csv(r'E:\code\WQ\pipeline_result\work_dir\5_training_spectra\best_models_summary.csv', index=False)
best_models.to_csv(r'E:\code\WQ\pipeline_result\work_dir\6_Spectral_Feature_Extraction\best_models_summary.csv', index=False)
print("\nBest-model summary saved to 'best_models_summary.csv'")
#
# def advanced_usage_example():


@@ -246,8 +246,8 @@ def non_empirical_retrieval(algorithm, model_info_path, coor_spectral_path, outp
if __name__ == "__main__":
algorithm= "chl_a"
model_info_path= r"E:\code\WQ\pipeline_result\work_dir\5_training_spectra\8_non_empirical_models\SS\SS_chl_a.json"
coor_spectral_path= r"E:\code\WQ\pipeline_result\work_dir\10_sampling\sampling_spectra.csv"
model_info_path= r"E:\code\WQ\pipeline_result\work_dir\6_Spectral_Feature_Extraction\8_non_empirical_models\SS\SS_chl_a.json"
coor_spectral_path= r"E:\code\WQ\pipeline_result\work_dir\4_sampling\sampling_spectra.csv"
output_path= r"E:\code\WQ\pipeline_result\work_dir\11_12_13_predictions\SS_chl_a.csv"
wave_radius=5.0
non_empirical_retrieval(algorithm, model_info_path, coor_spectral_path, output_path, wave_radius)


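@@ -0,0 +1,24 @@
# -*- coding: utf-8 -*-
"""
Pipeline scheduling core: in-memory dependency injection built on a Context object.
Design goals:
- Replace scattered dict arguments with PipelineContext (the 9-step main path and all 14 steps share one ctx)
- Describe the 14 steps declaratively (StepSpec) so Web / async / unit-test callers can reuse them
- Stay decoupled from any concrete Pipeline implementation (duck-typed); WorkerThread / Web API / unit tests can share it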
"""
from .context import (
PipelineContext,
STEP_MAP_OLD_TO_NEW, STEP_MAP_NEW_TO_OLD,
resolve_step_id, ALL_STEP_IDS,
)
from .runner import (
StepSpec, PIPELINE_STEPS, PipelineRunner, PipelineHalt,
)
__all__ = [
"PipelineContext", "StepSpec", "PIPELINE_STEPS", "PipelineRunner", "PipelineHalt",
"STEP_MAP_OLD_TO_NEW", "STEP_MAP_NEW_TO_OLD",
"resolve_step_id", "ALL_STEP_IDS",
]


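@@ -0,0 +1,148 @@
# -*- coding: utf-8 -*-
"""
PipelineContext: in-memory data carrier that passes paths and metadata across the 14 steps.
Design principles:
- All path fields use the `_path` suffix (matching the parameter naming convention of the step methods)
- Field values may be omitted (None); StepSpec.requires injects them at scheduling time
- dataclass + field(default_factory=dict) supports in-place additions and removals
- No GUI state is stored here (avoids circular dependencies)
- Not bound to concrete step methods (duck-typed cancellation / log append)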
"""
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
# ============================================================
# Step naming map (defined in a leaf module to break circular dependencies)
# ============================================================
STEP_MAP_OLD_TO_NEW: Dict[str, str] = {
"step5_5": "step7",
"step6_5": "step8_non_empirical_modeling",
"step6_75": "step9",
"step8_5": "step11",
"step7": "step8",
"step8": "step7",
"step9": "step14",
"step10": "step4",
"step11_ml": "step10",
"step11": "step11",
}
STEP_MAP_NEW_TO_OLD: Dict[str, str] = {v: k for k, v in STEP_MAP_OLD_TO_NEW.items()}
ALL_STEP_IDS: Set[str] = set(STEP_MAP_OLD_TO_NEW.keys()) | set(STEP_MAP_OLD_TO_NEW.values())
def resolve_step_id(step_id: str) -> str:
"""Convert any step_id to the canonical new format."""
if step_id in STEP_MAP_OLD_TO_NEW:
return STEP_MAP_OLD_TO_NEW[step_id]
return step_id
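Two properties of this mapping are easy to miss, and a short check makes them visible. The sketch below copies `STEP_MAP_OLD_TO_NEW` from the file above and uses only the standard library. It shows that the dict-comprehension inverse silently drops entries when two legacy names map to the same new name, and that resolving an ID twice is not a no-op because `step7` and `step8` swap. The variable names `naive_inverse`, `grouped` and `collisions` are introduced here for illustration.

```python
from collections import defaultdict

STEP_MAP_OLD_TO_NEW = {
    "step5_5": "step7",
    "step6_5": "step8_non_empirical_modeling",
    "step6_75": "step9",
    "step8_5": "step11",
    "step7": "step8",
    "step8": "step7",
    "step9": "step14",
    "step10": "step4",
    "step11_ml": "step10",
    "step11": "step11",
}


def resolve_step_id(step_id: str) -> str:
    """Same lookup as above: legacy name to new name, unknown names pass through."""
    return STEP_MAP_OLD_TO_NEW.get(step_id, step_id)


# The naive inverse keeps only the last legacy name seen for each new name.
naive_inverse = {new: old for old, new in STEP_MAP_OLD_TO_NEW.items()}

# Grouping by new name exposes which legacy names collide.
grouped = defaultdict(list)
for old, new in STEP_MAP_OLD_TO_NEW.items():
    grouped[new].append(old)
collisions = {new: olds for new, olds in grouped.items() if len(olds) > 1}

# Resolving twice undoes the step7/step8 swap, so an ID should be resolved exactly once.
twice = resolve_step_id(resolve_step_id("step7"))
```

Callers that need the reverse direction may therefore prefer a one-to-many inverse such as `grouped`, and should track whether an ID has already been converted.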
@dataclass
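class PipelineContext:
"""Pipeline run context (the in-memory record passed between the 14 steps).
Field naming conventions:
- Path field name = panel key name = step parameter name (no translation anywhere in the chain)
- Training/output CSVs use the `_path` suffix (e.g. training_csv_path / water_mask_path)
- Input image/CSV fields keep the panel's original names (img_path / csv_path) with no extra `_path` suffix
- Directory fields have no `_path` suffix (e.g. models_dir / prediction_dir)
- Metadata fields have no suffix (e.g. user_config / status / log)
"""
# ── Inputs/outputs of the 11 steps (in step order; field name = panel key = step parameter) ──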
img_path: Optional[str] = None # Step 1/2/3 input: raw image
water_mask_path: Optional[str] = None # Step 1 output → Step 2/3/7 input
glint_mask_path: Optional[str] = None # Step 2 output → Step 3/7 input
deglint_img_path: Optional[str] = None # Step 3 output → Step 5/7 input
csv_path: Optional[str] = None # Step 4/5/6_5/6_75 input: raw/training CSV
processed_csv_path: Optional[str] = None # Step 4 output → Step 5 input
training_csv_path: Optional[str] = None # Step 5 output → Step 5_5/6/6_5/6_75 input
boundary_path: Optional[str] = None # Step 5 input: boundary SHP (panel step5 name)
indices_path: Optional[str] = None # Step 5.5 output
sampling_csv_path: Optional[str] = None # Step 7 output → Step 8/8_5/8_75/9 input
prediction_csv_path: Optional[str] = None # Step 8 output → Step 9 input
distribution_map_path: Optional[str] = None # Step 9 output
boundary_shp_path: Optional[str] = None # Step 9 input: boundary SHP (panel step9 name)
formula_csv_path: Optional[str] = None # Step 8_75 input: formula CSV
# ── Directories (named without _path to set them apart) ──
models_dir: Optional[str] = None
prediction_dir: Optional[str] = None
work_dir: Optional[str] = None
# ── Step 6 training artifacts (populated in AutoML mode, empty in regular mode) ──
model_files: List[str] = field(default_factory=list)
# ── Metadata (the trio: user-supplied config / cancel event / status) ──
user_config: Dict[str, Any] = field(default_factory=dict)
cancel_event: Optional[Any] = None # duck-typed threading.Event / asyncio.Event
status: Dict[str, str] = field(default_factory=dict) # {step_id: 'start'/'completed'/'skipped'/'error'}
log: List[str] = field(default_factory=list)
# ── Diagnostics ──
step_timings: Dict[str, float] = field(default_factory=dict)
pipeline_start_time: Optional[float] = None
pipeline_end_time: Optional[float] = None
last_error: Optional[str] = None
# ── Error summary (available after the whole pipeline finishes) ──
error_summary: List[tuple[str, str]] = field(default_factory=list)
# ── Stop the whole pipeline immediately on error (default False: continue with later steps) ──
breakpoint_on_error: bool = False
# ── ★ Steps locked by smart auto-fill (steps enabled automatically by _auto_fill_missing_steps) ──
# The GUI layer reads this field and disables the enable checkbox of the matching panels during a run
locked_steps: List[str] = field(default_factory=list)
# ============================================================
# Read/write helpers
# ============================================================
def step_id(self, step_id: str) -> str:
"""Convert any step_id (possibly a legacy name) to the canonical new format.
Usage example:
ctx.status[ctx.step_id('step6_5')] # 'step8_non_empirical_modeling'
ctx.user_config[ctx.step_id('step8_5')] # 'step11'
"""
if step_id in STEP_MAP_OLD_TO_NEW:
return STEP_MAP_OLD_TO_NEW[step_id]
return step_id
def set(self, key: str, value: Any) -> None:
"""Write any attribute in place.
Dynamic fields (e.g. 'report_path') may be attached directly to __dict__,
so a missing static field does not raise AttributeError.
"""
object.__setattr__(self, key, value)
def get(self, key: str, default: Any = None) -> Any:
"""Read an attribute in place; a missing key does not raise."""
return getattr(self, key, default)
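def is_cancelled(self) -> bool:
"""Single entry point for soft-cancel checks (duck-typed).
Supports:
- threading.Event.is_set()
- asyncio.Event (loop-bound; the synchronous is_set interface exists)
- custom objects exposing .is_set() or a .cancelled attribute
"""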
ev = self.cancel_event
if ev is None:
return False
is_set = getattr(ev, "is_set", None)
if callable(is_set):
return bool(is_set())
return bool(getattr(ev, "cancelled", False))
def append_log(self, msg: str) -> None:
"""Append to the log list (also used for stdout debugging in the main process)."""
self.log.append(msg)
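The duck-typed cancellation check in `is_cancelled` can be exercised outside the dataclass. The sketch below restates the same lookup order as a free function and tries it against a `threading.Event`, a custom token and `None`. The `FlagToken` class is hypothetical and exists only to stand in for "any object with a `.cancelled` attribute".

```python
import threading
from dataclasses import dataclass
from typing import Any, Optional


def is_cancelled(cancel_event: Optional[Any]) -> bool:
    """Same lookup order as PipelineContext.is_cancelled: is_set() first, then .cancelled."""
    if cancel_event is None:
        return False
    is_set = getattr(cancel_event, "is_set", None)
    if callable(is_set):
        return bool(is_set())
    return bool(getattr(cancel_event, "cancelled", False))


@dataclass
class FlagToken:
    """Hypothetical cancel token that only exposes a .cancelled attribute."""
    cancelled: bool = False


event = threading.Event()
before = is_cancelled(event)      # event not set yet
event.set()
after = is_cancelled(event)       # event set by another thread or the GUI
token_result = is_cancelled(FlagToken(cancelled=True))
none_result = is_cancelled(None)  # no cancel mechanism configured
```

Because the check relies only on attribute lookup, the context does not need to import `threading` or `asyncio`.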

650
src/core/pipeline/runner.py Normal file

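@@ -0,0 +1,650 @@
# -*- coding: utf-8 -*-
"""
PipelineRunner: declarative scheduling of the 14 steps based on StepSpec.
Design notes:
- StepSpec declares requires (list of ctx field names) + produces (list of ctx field names)
- Naming convention: ctx field name == panel key name == step parameter name (no translation anywhere in the chain)
- Step naming: step_id has the form stepN or stepN_suffix (no decimals); method_name is aligned with step_id
- Scheduling order: follows the PIPELINE_STEPS list; a step whose requires are missing is skipped
- Soft cancel: ctx.is_cancelled() is checked before every step
- Resume from checkpoint: execution is skipped when spec.output_file already exists on disk
- Error summary: after the run, error_summary records the exceptions raised by steps
- Preflight: run() hard-validates img_path for step1 at entry; other dependencies are handled by smart auto-fill + soft warnings
- PipelineHalt: handled by its own except branch in run(), which breaks the loop for a hard stop
- STEP_MAP: legacy step_id → new step_id mapping in both directions, for GUI config compatibility
- duck-typed pipeline: the runner only calls getattr(pipeline, method_name) and does not depend on a class hierarchy
"""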
from __future__ import annotations
import inspect
import logging
import os
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Sequence
from .context import PipelineContext, STEP_MAP_OLD_TO_NEW, STEP_MAP_NEW_TO_OLD, resolve_step_id
logger = logging.getLogger(__name__)
# ============================================================
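# Halt exception (handled by its own except branch in run(), which breaks the loop)
# ============================================================
class PipelineHalt(Exception):
"""Unrecoverable error: when raised inside the run() loop it breaks out directly instead of going through the generic Exception branch.
Intended use:
- A hard-stop signal raised on purpose after the GUI layer intercepts the run through a _notify dialog
"""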
pass
# ============================================================
# StepSpec declarative description
# ============================================================
@dataclass
class StepSpec:
"""Metadata for a single step (declarative, avoids hard-coding)"""
step_id: str
method_name: str
requires: List[str] # list of PipelineContext field names
produces: List[str] = field(default_factory=list) # list of field names written to ctx
enabled: bool = True
parameter_map: Dict[str, str] = field(default_factory=dict)
# Whether to skip when any field in requires is None; default True (skip on missing input)
skip_when_missing: bool = True
# Note (used only for doc generation / debug output)
description: str = ""
# ★ Resume from checkpoint: output file path; supports the {work_dir} placeholder (resolved at run time)
output_file: Optional[str] = None
# ★ Preflight: ctx keys whose files must actually exist on disk
required_input_files: List[str] = field(default_factory=list)
# ============================================================
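# Declaration table for the 14 steps (list order is scheduling order)
# Neither step_id nor method_name contains decimals, matching the front-end display
# output_file / required_input_files use the {work_dir} placeholder, expanded by _resolve_path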
# ============================================================
PIPELINE_STEPS: List[StepSpec] = [
StepSpec(
step_id="step1", method_name="step1_generate_water_mask",
requires=["img_path"], produces=["water_mask_path"],
required_input_files=["img_path"],
output_file="{work_dir}/1_water_mask/water_mask.dat",
description="Water mask generation (NDWI or SHP)",
),
StepSpec(
step_id="step2", method_name="step2_find_glint_area",
requires=["img_path", "water_mask_path"], produces=["glint_mask_path"],
required_input_files=["img_path", "water_mask_path"],
output_file="{work_dir}/2_Glint_Detection/severe_glint_area.dat",
description="Glint area detection",
),
StepSpec(
step_id="step3", method_name="step3_remove_glint",
requires=["img_path", "water_mask_path", "glint_mask_path"],
produces=["deglint_img_path"],
required_input_files=["img_path", "water_mask_path", "glint_mask_path"],
output_file="{work_dir}/3_deglint/deglint.bsq",
description="Glint removal",
),
StepSpec(
step_id="step4", method_name="step5_process_csv",
requires=["csv_path"], produces=["processed_csv_path"],
required_input_files=["csv_path"],
output_file="{work_dir}/5_Data_Cleaning/processed_data.csv",
description="CSV outlier cleaning",
),
StepSpec(
step_id="step5", method_name="step6_extract_spectra",
requires=["deglint_img_path", "processed_csv_path", "csv_path", "boundary_path", "glint_mask_path"],
produces=["training_csv_path"],
parameter_map={
"processed_csv_path": "csv_path",
"csv_path": "_raw_csv_ignored",
},
skip_when_missing=False,
required_input_files=["deglint_img_path", "processed_csv_path", "boundary_path", "glint_mask_path"],
output_file="{work_dir}/6_Spectral_Feature_Extraction/training_spectra.csv",
description="Spectrum extraction at measured sample points",
),
StepSpec(
step_id="step7", method_name="step7_calc_indices",
requires=["training_csv_path"], produces=["indices_path", "trad_indices_dir"],
required_input_files=["training_csv_path"],
output_file="{work_dir}/7_Water_Quality_Indices/training_spectra_indices.csv",
description="Water-quality index calculation (dual-track output: track A wide table + track B single file)",
),
StepSpec(
step_id="step8", method_name="step8_train_ml",
requires=["training_csv_path"], produces=["models_dir"],
required_input_files=["training_csv_path"],
output_file="{work_dir}/8_Supervised_Model_Training/best_models.pkl",
description="ML modeling (GridSearchCV / AutoML)",
),
StepSpec(
step_id="step8_non_empirical_modeling",
method_name="step8_non_empirical_modeling",
requires=["training_csv_path"], produces=["models_dir"],
parameter_map={"training_csv_path": "csv_path"},
required_input_files=["training_csv_path"],
output_file="{work_dir}/8_Non_Empirical_Regression/non_empirical_models.pkl",
description="Non-empirical statistical regression",
),
StepSpec(
step_id="step9", method_name="step9_watercolor_inversion",
requires=["deglint_img_path", "water_mask_path"], produces=["watercolor_index_dir"],
required_input_files=["deglint_img_path"],
output_file="{work_dir}/9_WaterColor_Index_Images",
description="Water-color index inversion (processes BSQ images directly)",
),
StepSpec(
step_id="step10", method_name="step4_sampling",
requires=["deglint_img_path", "water_mask_path"], produces=["sampling_csv_path"],
required_input_files=["deglint_img_path", "water_mask_path"],
output_file="{work_dir}/4_sampling/sampling_spectra.csv",
description="Full-scene dense sampling point generation + spectrum extraction",
),
StepSpec(
step_id="step11_ml", method_name="step9_predict_ml",
requires=["sampling_csv_path", "models_dir"], produces=["prediction_csv_path"],
required_input_files=["sampling_csv_path", "models_dir"],
output_file="{work_dir}/11_12_13_predictions/prediction_results.csv",
description="ML model prediction (sampling points)",
),
StepSpec(
step_id="step11", method_name="step11_non_empirical_prediction",
requires=["sampling_csv_path", "models_dir"], produces=["prediction_dir"],
parameter_map={"models_dir": "non_empirical_models_dir"},
required_input_files=["sampling_csv_path", "models_dir"],
output_file="{work_dir}/11_12_13_predictions/non_empirical_predictions",
description="Non-empirical model prediction",
),
StepSpec(
step_id="step14", method_name="step10_map",
requires=["prediction_csv_path", "boundary_shp_path"],
produces=["distribution_map_path"],
required_input_files=["prediction_csv_path", "boundary_shp_path"],
output_file="{work_dir}/distribution_map.png",
description="Kriging interpolation mapping",
),
]
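The `requires` / `produces` declarations in this table are what make dependency wake-up possible: each produced field points back to the step that can supply it. The sketch below reproduces that idea on three trimmed-down specs. `MiniSpec` and `wake_providers` are hypothetical names, and the loop is a simplified reading of the wake-up logic in `run()`, not a copy of it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class MiniSpec:
    """Hypothetical, trimmed-down StepSpec: only the fields the wake-up needs."""
    step_id: str
    requires: List[str]
    produces: List[str] = field(default_factory=list)


steps = [
    MiniSpec("step1", ["img_path"], ["water_mask_path"]),
    MiniSpec("step2", ["img_path", "water_mask_path"], ["glint_mask_path"]),
    MiniSpec("step3", ["img_path", "water_mask_path", "glint_mask_path"], ["deglint_img_path"]),
]


def wake_providers(steps: List[MiniSpec], available: Set[str],
                   enabled: Dict[str, bool]) -> List[str]:
    """Enable every disabled step that produces a field an enabled step still lacks."""
    provider = {key: spec for spec in steps for key in spec.produces}
    enabled = dict(enabled)  # work on a copy; the caller's dict stays untouched
    woken: List[str] = []
    changed = True
    while changed:  # repeat until a full pass wakes nothing new
        changed = False
        for spec in steps:
            if not enabled.get(spec.step_id, True):
                continue
            for req in spec.requires:
                prov = provider.get(req)
                if req not in available and prov and not enabled.get(prov.step_id, True):
                    enabled[prov.step_id] = True
                    woken.append(prov.step_id)
                    changed = True
    return woken


# Only step3 is enabled and only the raw image is available.
woken = wake_providers(
    steps,
    available={"img_path"},
    enabled={"step1": False, "step2": False, "step3": True},
)
```

Here `step3` needs both masks, so `step1` and `step2` are woken in that order. The fixed-point loop matters when a woken step has missing inputs of its own.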
# ============================================================
# PipelineRunner: the executor
# ============================================================
class PipelineRunner:
"""Schedule the 14 step methods by StepSpec, with soft cancel + resume from checkpoint + error summary.
Usage:
ctx = PipelineContext(img_path=..., work_dir=..., user_config=config)
runner = PipelineRunner(pipeline_instance)
result_ctx = runner.run(ctx, config=config) # execution starts once preflight passes
print(result_ctx.error_summary) # [(step_id, error_msg), ...]
"""
def __init__(self, pipeline, steps: Optional[Sequence[StepSpec]] = None):
self.pipeline = pipeline
self.steps: List[StepSpec] = list(steps) if steps else list(PIPELINE_STEPS)
# ------------------------------------------------------------------
# Main entry point
# ------------------------------------------------------------------
def run(self, ctx: PipelineContext, config=None, skip_list: Optional[List[str]] = None) -> PipelineContext:
self.config = config or {}
skip_list = skip_list or []
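logger.info("Starting the full pipeline run (Runner scheduling mode)...")
ctx.pipeline_start_time = time.time()
error_summary: List[tuple[str, str]] = []
skip_set = set(skip_list) if skip_list else set()
# ── ★ Hard check on Step1 img_path (abort the whole pipeline immediately if missing) ──
if not ctx.get("img_path"):
msg = "[Pipeline preflight failed] Missing reference image path (img_path); the pipeline cannot start."
ctx.append_log(f"[RUNNER] {msg}")
self._notify_step("全流程", "error", msg)
ctx.last_error = msg
ctx.pipeline_end_time = time.time()
return ctx
# ── ★ Smart auto-fill: scan the default output paths under work_dir and backfill ctx ──
self._scan_workdir_outputs(ctx)
# ── ★ Auto-fill missing steps: if work_dir already holds the output, force-enable the step + backfill paths ──
self._auto_fill_missing_steps(ctx)
# ── Soft preflight warnings (no longer blocking; logged only) ──
self._preflight_warnings(ctx)
# Resume pre-scan: log a diagnostic for outputs already present in ctx
self._restore_outputs_from_ctx(ctx)
# 1. Brute-force context injection: push every GUI config parameter into ctx (prevents loss)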
for step_id, cfg in self.config.items():
if isinstance(cfg, dict):
for k, v in cfg.items():
if k != 'enabled' and v:
setattr(ctx, k, v)
# 2. Build the dependency provider map (Provider Map)
provider_map = {}
for step in self.steps:
for prod in step.produces:
provider_map[prod] = step
# 3. Cascading dependency wake-up (Auto-Wakeup Engine)
changed = True
woke_up_steps = []
while changed:
changed = False
for step in self.steps:
if step.step_id in skip_set:
continue # Explicitly skipped by the user; never wake it up
step_cfg = self.config.setdefault(step.step_id, {})
if not step_cfg.get('enabled', True):
continue
for req in step.requires:
# The context lacks this parameter
if not (hasattr(ctx, req) and getattr(ctx, req)):
provider = provider_map.get(req)
if provider and provider.step_id not in skip_set:
prov_cfg = self.config.setdefault(provider.step_id, {})
if not prov_cfg.get('enabled', True):
prov_cfg['enabled'] = True
changed = True
woke_up_steps.append(provider.step_id)
logger.info(f"[*] Auto wake-up: {provider.step_id} (provides {req} to downstream steps)")
if woke_up_steps:
logger.info(f"★ Dependency wake-up finished; woke up {len(woke_up_steps)} step(s)")
# 4. Run the pipeline
for step in self.steps:
# ── Soft cancel ──
if ctx.is_cancelled():
ctx.append_log(f"[RUNNER] Cancel signal received; stopping early @ {step.step_id}")
break
if step.step_id in skip_set:
ctx.status[step.step_id] = "user_skipped"
ctx.append_log(
f"\n{'='*60}\n"
f" ⚠ Skipped by user: {step.step_id} ({step.description})\n"
f" Reason: the user ticked 'Ignore' in the preflight dialog and confirmed the skip\n"
f"{'='*60}\n"
)
self._notify_step(step.step_id, "skipped", "Skipped by user (preflight dialog)")
continue
step_cfg = self.config.get(step.step_id, {})
if not step_cfg.get('enabled', True):
continue
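# 4.1 Check the on-disk output: if it already exists, restore the context and skip (never skip silently; always log)
# Expand the {work_dir} placeholder first; the raw template never matches a real path
resolved_output = self._resolve_path(step.output_file, ctx)
if resolved_output and os.path.exists(resolved_output):
for prod in step.produces:
if not (hasattr(ctx, prod) and getattr(ctx, prod)):
setattr(ctx, prod, resolved_output)
ctx.status[step.step_id] = "skipped"
ctx.append_log(f"[CACHE] Output already exists; skipping execution and restoring context: {step.step_id}")
self._notify_step(step.step_id, "skipped", "Output already exists (resume from checkpoint)")
continue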
# 4.2 Final dependency check
missing = [req for req in step.requires if not (hasattr(ctx, req) and getattr(ctx, req))]
if missing:
ctx.status[step.step_id] = "skipped"
reason = f"Required context parameters are missing; skipping automatically: {missing}"
ctx.append_log(f"[RUNNER] Skipping {step.step_id}; required parameters are still missing: {missing}")
self._notify_step(step.step_id, "skipped", reason)
continue
# 4.3 Actual execution
ctx.append_log(f"[START] Running step: {step.step_id}")
self._notify_step(step.step_id, "running", f"Running: {step.description}")
try:
method = getattr(self.pipeline, step.method_name)
sig = inspect.signature(method)
kwargs = {}
current_step_cfg = self.config.get(step.step_id, {})
for param_name in sig.parameters:
# Priority 1: use the value from the current step's own config
if param_name in current_step_cfg:
kwargs[param_name] = current_step_cfg[param_name]
continue
# Priority 1.5 (core fix): hard-isolate output_file so a same-named value from another step cannot contaminate it
if param_name == 'output_file' and hasattr(step, 'output_file') and step.output_file:
work_dir = getattr(ctx, 'work_dir', '')
kwargs[param_name] = step.output_file.format(work_dir=work_dir)
continue
# Priority 2: apply the cross-step parameter mapping
ctx_key = param_name
if hasattr(step, 'parameter_map') and step.parameter_map:
for k, v in step.parameter_map.items():
if v == param_name:
ctx_key = k
break
# Priority 3: read from the shared ctx (lowest priority)
if hasattr(ctx, ctx_key):
kwargs[param_name] = getattr(ctx, ctx_key)
# Call the underlying function with the unpacked keyword arguments
result = method(**kwargs)
# [Output relay 1] If the underlying function returned a dict, merge it into the context
if isinstance(result, dict):
for k, v in result.items():
setattr(ctx, k, v)
# [Output relay 2] Register outputs from the StepSpec output_file template
if hasattr(step, 'output_file') and step.output_file:
work_dir = getattr(ctx, 'work_dir', '')
actual_out_path = step.output_file.format(work_dir=work_dir)
for prod in step.produces:
if not hasattr(ctx, prod) or not getattr(ctx, prod):
setattr(ctx, prod, actual_out_path)
logger.info(f"[Output relay] Registered {prod} = {actual_out_path}")
except PipelineHalt:
ctx.status[step.step_id] = "error"
ctx.append_log(f"[RUNNER] PipelineHalt hard stop @ {step.step_id}")
self._notify_step(step.step_id, "error", "Preflight failed; hard stop")
break
except Exception as e:
ctx.status[step.step_id] = "error"
error_summary.append((step.step_id, str(e)))
ctx.last_error = f"{step.step_id}: {e!r}"
ctx.append_log(f"[ERROR] Step {step.step_id} crashed: {str(e)}")
self._notify_step(step.step_id, "error", str(e))
break
ctx.pipeline_end_time = time.time()
ctx.error_summary = error_summary
return ctx
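The argument binding inside `run()` follows a fixed priority: the step's own config first, then the `parameter_map` translation, then a same-named field on the context. The sketch below isolates that binding with `inspect.signature`. `bind_kwargs`, `extract_spectra` and the file names are hypothetical, and the `output_file` special case is left out for brevity.

```python
import inspect
from types import SimpleNamespace
from typing import Any, Callable, Dict


def bind_kwargs(method: Callable, step_cfg: Dict[str, Any],
                parameter_map: Dict[str, str], ctx: Any) -> Dict[str, Any]:
    """Build keyword arguments for method from step config, parameter_map and ctx."""
    kwargs: Dict[str, Any] = {}
    for name in inspect.signature(method).parameters:
        # Priority 1: a value in the step's own config wins.
        if name in step_cfg:
            kwargs[name] = step_cfg[name]
            continue
        # Priority 2: parameter_map maps ctx key -> parameter name, so look it up in reverse.
        ctx_key = next((k for k, v in parameter_map.items() if v == name), name)
        # Priority 3: fall back to the ctx attribute, if there is one.
        if hasattr(ctx, ctx_key):
            kwargs[name] = getattr(ctx, ctx_key)
    return kwargs


def extract_spectra(deglint_img_path, csv_path, radius=3):
    """Hypothetical step method used only for this illustration."""
    return deglint_img_path, csv_path, radius


ctx = SimpleNamespace(
    deglint_img_path="deglint.bsq",
    processed_csv_path="clean.csv",
    csv_path="raw.csv",
)
kwargs = bind_kwargs(
    extract_spectra,
    step_cfg={"radius": 5},
    parameter_map={"processed_csv_path": "csv_path"},
    ctx=ctx,
)
```

The `csv_path` parameter receives the cleaned CSV rather than the raw one, which is the effect the `parameter_map` on `step5` is after. Parameters with no match anywhere are simply omitted, so the method's own defaults apply.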
# ------------------------------------------------------------------
# ★ Smart auto-fill: scan work_dir for existing outputs
# ------------------------------------------------------------------
def _scan_workdir_outputs(self, ctx: PipelineContext) -> None:
"""Scan the default output path of every step under work_dir and backfill ctx when it exists.
The {work_dir} placeholder in spec.output_file is expanded into an actual path.
If the path exists, it is written to the matching ctx fields (produces) for later steps to use.
Fields that already have a value in ctx are not overwritten.
"""
work_dir = ctx.get("work_dir") or ""
if not work_dir:
return
for spec in self.steps:
if not spec.produces:
continue
for produce_key in spec.produces:
if ctx.get(produce_key):
continue # Already has a manually supplied value; do not overwrite
resolved = self._resolve_path(spec.output_file, ctx)
if resolved and os.path.exists(resolved):
ctx.set(produce_key, resolved)
ctx.append_log(
f"[AUTO_FILL] Existing output detected; backfilled {produce_key} = {resolved}"
)
# ------------------------------------------------------------------
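# ★ Smart auto-fill: force-enable steps that would be skipped silently
# ------------------------------------------------------------------
def _auto_fill_missing_steps(self, ctx: PipelineContext) -> None:
"""Check every disabled step.
If a step's output_file already exists under work_dir (resume from checkpoint),
the step finished earlier but was disabled by the user in the GUI.
The system then re-enables the step automatically (forced=True) and adds it to locked_steps.
The on-disk output path is also backfilled into the matching ctx fields
so downstream steps can obtain their inputs.
Blocking omissions (step1 img_path) are hard-checked at the entry of run() and are not handled here.
"""
newly_locked: List[str] = []
for spec in self.steps:
if spec.enabled:
continue # Steps the user enabled are left untouched
skip_set = getattr(ctx, '_skip_set', set())
if spec.step_id in skip_set:
continue # Steps the user ignored manually in PreflightDialog are not auto-filled
resolved = self._resolve_path(spec.output_file, ctx)
if resolved and os.path.exists(resolved):
# ── The step already has output but is disabled → enable it automatically ──
spec.enabled = True
ctx.locked_steps.append(spec.step_id)
newly_locked.append(spec.step_id)
# Backfill every output field into ctx
for produce_key in spec.produces:
if not ctx.get(produce_key):
ctx.set(produce_key, resolved)
ctx.append_log(
f"[AUTO_FILL] Force-enabled {spec.step_id} and backfilled output {produce_key} = {resolved}"
)
ctx.append_log(
f"\n{'='*60}\n"
f" ⚡ Smart auto-fill: step {spec.step_id} ({spec.description})\n"
f" Reason: this step already has output in work_dir but you disabled it in the GUI.\n"
f" Action: the system enabled the step automatically and backfilled the output path.\n"
f" Note: the step is locked for the duration of the run and cannot be turned off.\n"
f"{'='*60}\n"
)
if newly_locked:
self._notify_step(
"全流程",
"info",
f"Smart auto-fill automatically enabled {len(newly_locked)} step(s): {newly_locked}"
)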
def _resolve_output_for_key(
self, produce_key: str, ctx: PipelineContext
) -> Optional[str]:
"""Find the step that produces the given key and expand its output_file path."""
for spec in self.steps:
if produce_key in spec.produces:
return self._resolve_path(spec.output_file, ctx)
return None
def _scan_single_step_outputs(
self, spec: StepSpec, ctx: PipelineContext
) -> None:
"""Scan the work_dir output of a single step and backfill ctx (existing values are not overwritten)."""
if not spec.produces:
return
for produce_key in spec.produces:
if ctx.get(produce_key):
continue
resolved = self._resolve_path(spec.output_file, ctx)
if resolved and os.path.exists(resolved):
ctx.set(produce_key, resolved)
ctx.append_log(
f"[AUTO_FILL] Output detected after dependency wake-up; backfilled {produce_key} = {resolved}"
)
# ------------------------------------------------------------------
# Soft preflight warnings (no longer blocking)
# ------------------------------------------------------------------
def _preflight_warnings(self, ctx: PipelineContext) -> None:
"""Soft preflight warnings: walk every step and detect foreseeable run-time skips.
Every omission is logged as a warning; nothing is raised and execution is not blocked.
The GUI layer can show the warning list to the user through the _notify_step callback.
"""
warnings: List[str] = []
for spec in self.steps:
if not spec.enabled:
continue
# ── Warning for a missing Step4 csv_path ──
if spec.step_id == "step4":
if not ctx.get("csv_path"):
warnings.append(
f"[{spec.step_id}] Missing measured water-quality data (csv_path); "
"steps 5-9 will be skipped automatically"
)
# ── Warning for files missing on disk (ctx is filled but the file does not exist) ──
for ctx_key in spec.required_input_files:
value = ctx.get(ctx_key)
if not value:
continue
if not os.path.exists(value):
warnings.append(
f"[{spec.step_id}] File missing on disk (although ctx is filled): {ctx_key} = {value}"
)
if warnings:
detail = "\n".join(f" - {w}" for w in warnings)
ctx.append_log(
f"[RUNNER] [Soft preflight warnings] (the pipeline continues; missing items will be skipped automatically)\n{detail}"
)
self._notify_step("全流程", "warning", f"Preflight warnings: {len(warnings)}\n{detail}")
# ------------------------------------------------------------------
# Single-step invocation
# ------------------------------------------------------------------
def _invoke(self, spec: StepSpec, ctx: PipelineContext) -> None:
"""Call one step method: ctx paths → parameters; outputs → ctx fields."""
ctx.append_log(
f"[DEBUG] Step {spec.step_id} requires: {spec.requires}, "
f"actual ctx data: {[ctx.get(k) for k in spec.requires]}"
)
method = getattr(self.pipeline, spec.method_name, None)
if method is None:
ctx.append_log(f"[RUNNER] Step method missing: {spec.method_name} (skipped)")
ctx.status[spec.step_id] = "skipped"
return
# 1) Inject ctx paths as parameters
kwargs: Dict[str, Any] = {}
for ctx_key in spec.requires:
param_name = spec.parameter_map.get(ctx_key, self._default_param_name(ctx_key))
kwargs[param_name] = ctx.get(ctx_key)
# 2) Let ctx.user_config[step_id] override/extend the arguments (only non-empty values override)
user_overrides = ctx.user_config.get(spec.step_id) or {}
if isinstance(user_overrides, dict):
for k, v in user_overrides.items():
if v is not None and v != "":
kwargs[k] = v
# 3) Set the status to start
ctx.append_log(
f"[RUNNER] -> {spec.method_name}({list(kwargs.keys())})"
)
ctx.status[spec.step_id] = "start"
self._notify_step(spec.step_id, "start", spec.method_name)
# 4) Execute (exceptions are caught centrally by the outer run())
t0 = time.time()
result = method(**kwargs)
ctx.status[spec.step_id] = "completed"
ctx.step_timings[spec.step_id] = time.time() - t0
# 5) Harvest outputs
self._harvest(spec, result, ctx)
self._notify_step(
spec.step_id, "completed",
str(result)[:200] if result is not None else "",
)
# ------------------------------------------------------------------
# Output harvesting
# ------------------------------------------------------------------
def _harvest(self, spec: StepSpec, result: Any, ctx: PipelineContext) -> None:
"""Write the step method's return value into the produces fields of ctx."""
if not spec.produces:
return
if isinstance(result, dict):
for produce_key in spec.produces:
if produce_key in result:
ctx.set(produce_key, result[produce_key])
elif result is not None:
ctx.set(spec.produces[0], result)
# ------------------------------------------------------------------
# Resume-from-checkpoint helpers
# ------------------------------------------------------------------
def _resolve_path(
self, template: Optional[str], ctx: PipelineContext
) -> Optional[str]:
"""Expand the {work_dir} placeholder in the template; return the expanded path, or None for an empty template."""
if not template:
return None
work_dir = ctx.get("work_dir") or ""
try:
return template.format(work_dir=work_dir)
except (KeyError, ValueError):
return template
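Any existence check has to run on the expanded path, because the raw template still contains the literal `{work_dir}` text. The sketch below restates `_resolve_path` as a free function and demonstrates the difference inside a temporary directory. `resolve_path` is a hypothetical standalone name, and the template is the `step1` entry from `PIPELINE_STEPS`.

```python
import os
import tempfile
from typing import Optional


def resolve_path(template: Optional[str], work_dir: str) -> Optional[str]:
    """Expand {work_dir}; return None for an empty template, the raw template on a format error."""
    if not template:
        return None
    try:
        return template.format(work_dir=work_dir)
    except (KeyError, ValueError):
        return template


template = "{work_dir}/1_water_mask/water_mask.dat"

with tempfile.TemporaryDirectory() as work_dir:
    resolved = resolve_path(template, work_dir)
    # Simulate a previous run that already wrote the step output.
    os.makedirs(os.path.dirname(resolved))
    with open(resolved, "w"):
        pass
    raw_exists = os.path.exists(template)       # checks the unexpanded template
    resolved_exists = os.path.exists(resolved)  # checks the real file
```

Only the second check sees the file, so a resume-from-checkpoint test on the unexpanded `output_file` would never skip a step.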
def _restore_outputs_from_ctx(self, ctx: PipelineContext) -> None:
"""Diagnostic log: record the non-empty outputs already present in ctx."""
for spec in self.steps:
if not (spec.enabled and spec.produces):
continue
for key in spec.produces:
val = ctx.get(key)
if val:
ctx.append_log(
f"[RUNNER] Resume check: {spec.step_id} already has {key} = {val}"
)
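def _restore_ctx_from_output(
self, spec: StepSpec, resolved_path: str, ctx: PipelineContext
) -> None:
"""On a checkpoint skip: write the existing output_file back to every produces field in ctx for downstream use.
Relay-break fix: iterate over spec.produces and register each key so no downstream dependency is missed.
"""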
if not spec.produces:
return
for produce_key in spec.produces:
ctx.set(produce_key, resolved_path)
# ------------------------------------------------------------------
# Utilities
# ------------------------------------------------------------------
@staticmethod
def _default_param_name(ctx_key: str) -> str:
"""Return the ctx key unchanged as the parameter name by default. Special abbreviations are handled explicitly by parameter_map."""
return ctx_key
def _notify_step(self, step_id: str, status: str, message: str) -> None:
"""Notify the GUI of the current step status through the pipeline's _notify callback."""
notify = getattr(self.pipeline, "_notify", None)
if callable(notify):
try:
notify(step_id, status, message)
except Exception:
pass


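@@ -0,0 +1,544 @@
# -*- coding: utf-8 -*-
"""
Optuna + smart-subsampling AutoML trainer (route B, the blow-up-proof engine).
Why this is needed:
- Legacy path: 11 preprocessing methods × 4 models × 3 split methods = 132 GridSearchCV runs;
10+ minutes on small and medium datasets, and an outright OOM on large datasets (50,000+ rows)
- AutoML path: 1 preprocessing method × N models (hyperparameters tuned by Optuna), using smart subsampling to avoid OOM,
then a refit on the **full data** with the best hyperparameters; a single model is saved at the end
Design notes:
- Entry point: train_with_automl(csv, feature_start_column, model_names, ...)
- Returns an AutoMLResult dataclass (one per target column)
- smart_subsample: random downsampling when N > max_samples
- Failure fallback: optuna not installed / all trials failed → fall back to WaterQualityModelingBatch
- File naming convention: {target}_{preprocess}_{model}_AUTOML.joblib
- Marked with save_data["metadata"]["automl"] = True
Usage: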
from src.core.prediction.automl_trainer import train_with_automl
results = train_with_automl(
training_csv_path=".../training_spectra.csv",
feature_start_column="374.285004",
model_names=["RF", "SVR", "Ridge"],
n_trials=20,
timeout_sec=300,
)
"""
from __future__ import annotations
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Tuple
import numpy as np
import pandas as pd
# ============================================================
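# Constants
# ============================================================
# Maximum number of samples allowed during the AutoML search phase (avoids OOM)
# 5000 samples are enough for the Optuna search over RF/SVR/Ridge to yield a stable CV score
DEFAULT_MAX_SAMPLES = 5000
# Default timeout (seconds) for a single Optuna trial
DEFAULT_TIMEOUT = 300.0
# Default number of trials
DEFAULT_N_TRIALS = 20
# Suffix of the AutoML output directory name
AUTOML_DIR_SUFFIX = "_AutoML"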
# ============================================================
# Data classes
# ============================================================
@dataclass
class AutoMLResult:
"""AutoML training result for a single target column"""
success: bool = False
model_path: Optional[str] = None
cv_score: float = -float("inf")
best_params: Optional[Dict[str, Any]] = None
target_column: str = ""
preprocessing: str = ""
model_name: str = ""
n_trials_done: int = 0
n_samples_used: int = 0
fallback_used: bool = False
elapsed_sec: float = 0.0
error: Optional[str] = None
metadata: Dict[str, Any] = field(default_factory=dict)
# ============================================================
# Smart subsampling
# ============================================================
def smart_subsample(
X: np.ndarray,
y: np.ndarray,
max_samples: int = DEFAULT_MAX_SAMPLES,
random_state: int = 42,
) -> Tuple[np.ndarray, np.ndarray, bool]:
"""Randomly downsample when N > max_samples; otherwise return the inputs unchanged.
Returns:
(X_sub, y_sub, was_subsampled)
"""
n = X.shape[0]
if n <= max_samples:
return X, y, False
rng = np.random.default_rng(random_state)
idx = rng.choice(n, size=max_samples, replace=False)
return X[idx], y[idx], True
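The subsampling rule itself does not depend on NumPy. As a dependency-free illustration, the sketch below applies the same threshold logic to row indices with the standard `random` module. `subsample_indices` is a hypothetical helper, it draws different rows than `np.random.default_rng` would for the same seed, and sorting the indices is a choice made here for readability.

```python
import random
from typing import List, Tuple


def subsample_indices(n: int, max_samples: int, seed: int = 42) -> Tuple[List[int], bool]:
    """Return row indices to keep and whether any rows were dropped (hypothetical helper)."""
    if n <= max_samples:
        return list(range(n)), False
    # Sample without replacement so no row is used twice during the search.
    kept = sorted(random.Random(seed).sample(range(n), max_samples))
    return kept, True


search_idx, was_subsampled = subsample_indices(n=20000, max_samples=5000)
small_idx, small_flag = subsample_indices(n=3, max_samples=5000)
```

In the flow described by the module docstring, only the hyperparameter search would use `search_idx`; the final refit still sees every row.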
# ============================================================
# Model factory
# ============================================================
def _build_model(model_name: str, random_state: int = 42):
"""Return a builder callable for an sklearn-compatible model, looked up by its English model key (factory); None if unsupported."""
from sklearn.ensemble import (
AdaBoostRegressor, ExtraTreesRegressor, GradientBoostingRegressor,
RandomForestRegressor,
)
from sklearn.linear_model import (
ElasticNet, Lasso, LinearRegression, Ridge,
)
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
factory = {
"RF": lambda **kw: RandomForestRegressor(random_state=random_state, n_jobs=1, **kw),
"ET": lambda **kw: ExtraTreesRegressor(random_state=random_state, n_jobs=1, **kw),
"GradientBoosting": lambda **kw: GradientBoostingRegressor(random_state=random_state, **kw),
"AdaBoost": lambda **kw: AdaBoostRegressor(random_state=random_state, **kw),
"Ridge": lambda **kw: Ridge(**kw),
"Lasso": lambda **kw: Lasso(max_iter=5000, **kw),
"ElasticNet": lambda **kw: ElasticNet(max_iter=5000, **kw),
"LinearRegression": lambda **kw: LinearRegression(**kw),
"SVR": lambda **kw: SVR(**kw),
"KNN": lambda **kw: KNeighborsRegressor(n_jobs=1, **kw),
"MLP": lambda **kw: MLPRegressor(max_iter=500, random_state=random_state, **kw),
"DecisionTree": lambda **kw: DecisionTreeRegressor(random_state=random_state, **kw),
"PLS": None, # sklearn.cross_decomposition.PLSRegression is not integrated yet
}
builder = factory.get(model_name)
if builder is None:
return None
return builder
# ============================================================
# Optuna hyperparameter search space
# ============================================================
def _get_search_space(model_name: str, trial) -> Dict[str, Any]:
"""Return the Optuna hyperparameter search space for the given model name."""
sp: Dict[str, Any] = {}
if model_name == "RF":
sp["n_estimators"] = trial.suggest_int("n_estimators", 50, 300, step=50)
sp["max_depth"] = trial.suggest_int("max_depth", 3, 20)
sp["min_samples_split"] = trial.suggest_int("min_samples_split", 2, 10)
sp["min_samples_leaf"] = trial.suggest_int("min_samples_leaf", 1, 5)
elif model_name == "ET":
sp["n_estimators"] = trial.suggest_int("n_estimators", 50, 300, step=50)
sp["max_depth"] = trial.suggest_int("max_depth", 3, 20)
elif model_name == "GradientBoosting":
sp["n_estimators"] = trial.suggest_int("n_estimators", 50, 300, step=50)
sp["max_depth"] = trial.suggest_int("max_depth", 3, 8)
sp["learning_rate"] = trial.suggest_float("learning_rate", 0.01, 0.3, log=True)
elif model_name == "SVR":
sp["C"] = trial.suggest_float("C", 0.1, 100.0, log=True)
sp["epsilon"] = trial.suggest_float("epsilon", 0.001, 1.0, log=True)
sp["kernel"] = trial.suggest_categorical("kernel", ["rbf", "linear"])
elif model_name == "KNN":
sp["n_neighbors"] = trial.suggest_int("n_neighbors", 3, 20)
sp["weights"] = trial.suggest_categorical("weights", ["uniform", "distance"])
elif model_name in ("Ridge", "Lasso", "ElasticNet"):
sp["alpha"] = trial.suggest_float("alpha", 0.01, 100.0, log=True)
if model_name == "ElasticNet":
sp["l1_ratio"] = trial.suggest_float("l1_ratio", 0.0, 1.0)
elif model_name == "MLP":
sp["hidden_layer_sizes"] = trial.suggest_categorical(
"hidden_layer_sizes", [(50,), (100,), (50, 50), (100, 50)]
)
sp["alpha"] = trial.suggest_float("alpha", 1e-5, 1e-1, log=True)
sp["learning_rate_init"] = trial.suggest_float("learning_rate_init", 1e-4, 1e-2, log=True)
elif model_name == "DecisionTree":
sp["max_depth"] = trial.suggest_int("max_depth", 3, 20)
sp["min_samples_split"] = trial.suggest_int("min_samples_split", 2, 10)
elif model_name == "AdaBoost":
sp["n_estimators"] = trial.suggest_int("n_estimators", 30, 200, step=30)
sp["learning_rate"] = trial.suggest_float("learning_rate", 0.01, 1.0, log=True)
else:
sp["n_estimators"] = trial.suggest_int("n_estimators", 50, 200, step=50)
return sp
def _make_objective(model_name: str, X: np.ndarray, y: np.ndarray,
cv_folds: int, random_state: int):
"""Build the Optuna objective: mean R² over cv_folds-fold cross-validation."""
from sklearn.model_selection import KFold, cross_val_score
def objective(trial):
params = _get_search_space(model_name, trial)
try:
builder = _build_model(model_name, random_state=random_state)
if builder is None:
return -1.0
model = builder(**params)
kf = KFold(n_splits=cv_folds, shuffle=True, random_state=random_state)
scores = cross_val_score(model, X, y, cv=kf, scoring="r2", n_jobs=1)
return float(np.mean(scores))
except Exception:
return -1.0
return objective
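
The objective above turns any exception into the score -1.0, and the caller later treats `best_value <= -1.0` as "every trial failed". A minimal stdlib sketch of that sentinel pattern follows; `with_sentinel` and `toy_score` are hypothetical names used only for illustration and are not part of the module.

```python
from typing import Callable

FAILED_SCORE = -1.0  # same sentinel value the objective above returns on failure


def with_sentinel(score_fn: Callable[[dict], float]) -> Callable[[dict], float]:
    """Wrap a scoring function so that any exception becomes FAILED_SCORE."""
    def wrapped(params: dict) -> float:
        try:
            return float(score_fn(params))
        except Exception:
            return FAILED_SCORE
    return wrapped


def toy_score(params: dict) -> float:
    # Hypothetical scorer: rejects non-positive alpha, otherwise returns a fake score.
    if params["alpha"] <= 0:
        raise ValueError("alpha must be positive")
    return 1.0 / (1.0 + params["alpha"])


safe_score = with_sentinel(toy_score)
all_scores = [safe_score({"alpha": a}) for a in (-1.0, 0.0)]
# Mirrors the caller's check: the best score never rose above the sentinel.
all_failed = max(all_scores) <= FAILED_SCORE
```

One consequence worth keeping in mind: a genuine cross-validated R² can fall below -1.0, and with this scheme such a score is indistinguishable from a failed trial.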
def _refit_full(model_name: str, best_params: Dict[str, Any],
X: np.ndarray, y: np.ndarray, random_state: int):
"""Refit on the **full dataset** using the best parameters."""
builder = _build_model(model_name, random_state=random_state)
if builder is None:
return None
model = builder(**best_params)
model.fit(X, y)
return model
# ============================================================
# Failure fallback (reverts to the legacy GridSearchCV path)
# ============================================================
def _fallback_train(
training_csv_path: str,
feature_start_column,
preprocessing: str,
model_name: str,
split_method: str,
cv_folds: int,
output_dir: Path,
target_column: str,
) -> AutoMLResult:
"""Call the legacy WaterQualityModelingBatch when AutoML fails.
The returned AutoMLResult has fallback_used=True.
"""
try:
from src.core.modeling.modeling_batch import WaterQualityModelingBatch
except ImportError as e:
return AutoMLResult(
success=False, error=f"fallback 导入失败: {e!r}", fallback_used=True,
target_column=target_column, preprocessing=preprocessing, model_name=model_name,
)
try:
out_dir = output_dir / preprocessing
out_dir.mkdir(parents=True, exist_ok=True)
modeler = WaterQualityModelingBatch(str(out_dir))
modeler.train_models_batch(
csv_path=training_csv_path,
feature_start_column=feature_start_column,
preprocessing_methods=[preprocessing],
model_names=[model_name],
split_methods=[split_method],
cv_folds=cv_folds,
)
# Locate the produced model file
candidates = list(out_dir.rglob(f"{target_column}_{preprocessing}_{model_name}.joblib"))
model_path = str(candidates[0]) if candidates else None
return AutoMLResult(
success=model_path is not None,
model_path=model_path,
target_column=target_column, preprocessing=preprocessing, model_name=model_name,
fallback_used=True,
metadata={"source": "WaterQualityModelingBatch"},
)
except Exception as e:
return AutoMLResult(
success=False, error=f"fallback 失败: {e!r}", fallback_used=True,
target_column=target_column, preprocessing=preprocessing, model_name=model_name,
)
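
The fallback locates its output by a recursive filename search and reports success only when a match exists. A small sketch of that lookup, assuming the `<target>_<preprocessing>_<model>.joblib` naming seen above; `find_model_file` is a hypothetical helper, and sorting the candidates is an addition to make the pick deterministic.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_model_file(out_dir: Path, target: str, preprocessing: str, model_name: str) -> Optional[str]:
    """Return the first matching .joblib path under out_dir, or None if there is none."""
    pattern = f"{target}_{preprocessing}_{model_name}.joblib"
    candidates = sorted(out_dir.rglob(pattern))  # sorted so the choice is deterministic
    return str(candidates[0]) if candidates else None


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    nested = root / "MMS" / "run1"
    nested.mkdir(parents=True)
    (nested / "Chla_MMS_RF.joblib").write_bytes(b"")  # empty placeholder file
    found = find_model_file(root, "Chla", "MMS", "RF")
    missing = find_model_file(root, "TSS", "MMS", "RF")
    found_name = Path(found).name
```

Because the name is used as a glob pattern, a target column containing characters such as `*`, `?` or `[` would be interpreted as wildcards rather than matched literally.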
# ============================================================
# Main entry point
# ============================================================
def train_with_automl(
training_csv_path: str,
feature_start_column,
preprocessing_methods: Optional[List[str]] = None,
model_names: Optional[List[str]] = None,
split_methods: Optional[List[str]] = None,
cv_folds: int = 5,
output_dir: Optional[str] = None,
n_trials: int = DEFAULT_N_TRIALS,
timeout_sec: float = DEFAULT_TIMEOUT,
max_samples: int = DEFAULT_MAX_SAMPLES,
random_state: int = 42,
callback: Optional[Callable[[str, str, str], None]] = None,
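) -> List[AutoMLResult]:
"""Run AutoML with Optuna plus subsampling. Falls back to GridSearchCV automatically on failure.
Args:
training_csv_path: training CSV (the Step 5 output, training_spectra.csv)
feature_start_column: name or index of the first feature column (numeric columns before it are treated as targets y)
preprocessing_methods: candidate preprocessing methods (**only the first is used**, to avoid a Cartesian explosion)
model_names: candidate models (each one gets its own Optuna run)
split_methods: candidate data-split methods (AutoML uses only the first)
cv_folds: number of cross-validation folds
output_dir: output directory (default: <models_dir>_AutoML)
n_trials: number of Optuna trials per model
timeout_sec: search time budget in seconds per target column, divided evenly across the models (at least 10 s each); a model's search stops when its share expires
max_samples: maximum number of samples allowed during the search phase
random_state: random seed shared by the models, CV splits, sampler, and subsampling
callback: status callback, called as callback(step_name, status, message)
Returns:
List[AutoMLResult], one result per target column
"""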
def notify(status: str, msg: str = "") -> None:
if callback:
callback("步骤6_AutoML", status, msg)
# ---- 1) Parameter defaults ----
if preprocessing_methods is None:
preprocessing_methods = ["MMS"]
if model_names is None:
model_names = ["RF", "SVR", "Ridge"]
if split_methods is None:
split_methods = ["spxy"]
# Decision: use only the first preprocessing method and the first split method, to avoid a Cartesian explosion
preproc = preprocessing_methods[0]
split_method = split_methods[0]
if output_dir is None:
output_dir = "./8_Supervised_Model_Training_AutoML"
out_dir = Path(output_dir)
out_dir.mkdir(parents=True, exist_ok=True)
preproc_dir = out_dir / preproc
preproc_dir.mkdir(parents=True, exist_ok=True)
# ---- 2) Load data ----
notify("start", f"AutoML training started (n_trials={n_trials}, timeout={timeout_sec}s, max_samples={max_samples})")
if not Path(training_csv_path).exists():
return [AutoMLResult(success=False, error=f"Training CSV not found: {training_csv_path}")]
df = pd.read_csv(training_csv_path)
# Extract target columns (all numeric columns before feature_start_column)
if isinstance(feature_start_column, int):
y_cols = [c for c in df.columns[:feature_start_column]
if pd.api.types.is_numeric_dtype(df[c])]
else:
try:
idx = list(df.columns).index(feature_start_column)
y_cols = [c for c in df.columns[:idx]
if pd.api.types.is_numeric_dtype(df[c])]
except ValueError:
y_cols = []
if not y_cols:
notify("error", "AutoML: no target columns found (expected numeric columns before feature_start_column)")
return [AutoMLResult(success=False, error="No target columns found")]
feat_cols = [c for c in df.columns if c not in y_cols]
X_all = df[feat_cols].values.astype(np.float64)
# ---- 3) Preprocessing (first method only) ----
if preproc != "None":
try:
from src.preprocessing.spectral_Preprocessing import Preprocessing
processed = Preprocessing(preproc, df[feat_cols])
if isinstance(processed, pd.DataFrame):
X_all = processed.values.astype(np.float64)
else:
X_all = np.asarray(processed, dtype=np.float64)
except Exception as e:
notify("warning", f"Preprocessing {preproc} failed: {e!r}; falling back to None")
preproc = "None"
# ---- 4) Check whether Optuna is available ----
try:
import optuna
optuna.logging.set_verbosity(optuna.logging.WARNING)
optuna_available = True
except ImportError:
optuna_available = False
notify("warning", "optuna is not installed; all target columns fall back to GridSearchCV (pip install \"optuna>=3.6\")")
# ---- 5) Run each target ----
results: List[AutoMLResult] = []
total = len(y_cols)
per_model_timeout = max(10.0, timeout_sec / max(1, len(model_names)))
for ti, tgt in enumerate(y_cols, 1):
t0 = time.time()
yv = df[tgt].values.astype(np.float64)
mask = ~np.isnan(yv)
X_t = X_all[mask]
y_t = yv[mask]
if X_t.shape[0] < cv_folds * 2:
notify("warning", f"Target {tgt}: only {X_t.shape[0]} valid samples, skipping")
results.append(AutoMLResult(
success=False, target_column=tgt, error=f"Insufficient samples ({X_t.shape[0]})",
preprocessing=preproc,
))
continue
X_sub, y_sub, was_sub = smart_subsample(X_t, y_t, max_samples=max_samples, random_state=random_state)
if was_sub:
notify("info", f"Target {tgt}: {X_t.shape[0]} samples subsampled to {X_sub.shape[0]} (for the search only)")
best_overall = AutoMLResult(success=False, target_column=tgt, preprocessing=preproc)
if not optuna_available:
# Optuna unavailable: every target column goes straight to the fallback
best_overall = _fallback_train(
training_csv_path, feature_start_column, preproc, model_names[0], split_method,
cv_folds, out_dir, tgt,
)
else:
for model_name in model_names:
try:
builder = _build_model(model_name, random_state=random_state)
if builder is None:
notify("warning", f"Model {model_name} is not supported by the AutoML search yet")
continue
study = optuna.create_study(
direction="maximize",
sampler=optuna.samplers.TPESampler(seed=random_state),
)
study.optimize(
_make_objective(model_name, X_sub, y_sub, cv_folds, random_state),
n_trials=n_trials,
timeout=per_model_timeout,
show_progress_bar=False,
)
if study.best_value is None or study.best_value <= -1.0:
notify("warning", f"{tgt}/{model_name}: all trials failed (every CV score <= -1)")
continue
# refit on FULL
final_model = _refit_full(model_name, study.best_params, X_t, y_t, random_state)
if final_model is None:
continue
# Save
import joblib
fname = f"{tgt}_{preproc}_{model_name}_AUTOML.joblib"
fpath = preproc_dir / fname
joblib.dump({
"model": final_model,
"target_column_name": tgt,
"preprocess_method": preproc,
"model_name": model_name,
"metadata": {
"automl": True,
"best_params": study.best_params,
"cv_score": float(study.best_value),
"n_trials_done": len(study.trials),
"n_samples_used_full": int(X_t.shape[0]),
"n_samples_used_for_search": int(X_sub.shape[0]),
"was_subsampled": was_sub,
"split_method": split_method,
},
}, fpath)
cand = AutoMLResult(
success=True,
model_path=str(fpath),
cv_score=float(study.best_value),
best_params=study.best_params,
target_column=tgt,
preprocessing=preproc,
model_name=model_name,
n_trials_done=len(study.trials),
n_samples_used=int(X_sub.shape[0]),
metadata={"refit_on_full": True, "n_samples_full": int(X_t.shape[0])},
)
if cand.cv_score > best_overall.cv_score:
best_overall = cand
except Exception as e:
notify("warning", f"Target {tgt} / model {model_name} failed: {e!r}")
continue
if not best_overall.success:
notify("warning", f"Target {tgt}: all Optuna trials failed, falling back to GridSearchCV")
best_overall = _fallback_train(
training_csv_path, feature_start_column, preproc, model_names[0], split_method,
cv_folds, out_dir, tgt,
)
best_overall.elapsed_sec = time.time() - t0
results.append(best_overall)
notify("info", f"AutoML target {tgt} done ({ti}/{total}) cv={best_overall.cv_score:.4f}")
# ---- 6) Summary JSON ----
summary_path = out_dir / "automl_summary.json"
try:
with open(summary_path, "w", encoding="utf-8") as f:
json.dump([asdict(r) for r in results], f, ensure_ascii=False, indent=2, default=str)
except Exception as e:
notify("warning", f"Failed to write automl_summary.json: {e!r}")
success_n = sum(1 for r in results if r.success)
fallback_n = sum(1 for r in results if r.fallback_used)
notify("completed", f"AutoML training finished: {success_n}/{len(results)} succeeded ({fallback_n} via fallback); summary: {summary_path}")
return results
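
The summary step serializes a list of dataclass results with `asdict` and passes `default=str` so that values `json` cannot encode natively are written as strings. A self-contained sketch of that step; `ToyResult` is a hypothetical stand-in for `AutoMLResult`, and its fields are illustrative rather than the real definition.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, Optional


@dataclass
class ToyResult:
    # Hypothetical stand-in for AutoMLResult; field names are illustrative only.
    success: bool
    target_column: str = ""
    model_path: Optional[Path] = None
    metadata: Dict[str, Any] = field(default_factory=dict)


results = [
    ToyResult(True, "Chla", Path("models/Chla_MMS_RF_AUTOML.joblib"), {"automl": True}),
    ToyResult(False, "TSS"),
]

# default=str converts values json cannot encode natively (here: Path) into strings.
text = json.dumps([asdict(r) for r in results], ensure_ascii=False, indent=2, default=str)
loaded = json.loads(text)
success_n = sum(1 for r in results if r.success)
```

The trade-off of `default=str` is that anything unexpected is stringified silently instead of raising, so the summary is always written but may need checking when a field's type changes.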
# ============================================================
# CLI self-test
# ============================================================
if __name__ == "__main__":
import argparse
p = argparse.ArgumentParser(description="AutoML trainer CLI self-test")
p.add_argument("--csv", required=True, help="Training CSV (columns before feature_start_column are the targets y)")
p.add_argument("--feature-start", default="0", help="Name or index of the first feature column (default: 0)")
p.add_argument("--n-trials", type=int, default=DEFAULT_N_TRIALS)
p.add_argument("--timeout", type=float, default=DEFAULT_TIMEOUT)
p.add_argument("--max-samples", type=int, default=DEFAULT_MAX_SAMPLES)
p.add_argument("--out", default="./8_Supervised_Model_Training_AutoML")
args = p.parse_args()
# Infer the type of feature_start_column (integer index or column name)
fsc: Any = args.feature_start
try:
fsc = int(fsc)
except ValueError:
pass
res = train_with_automl(
training_csv_path=args.csv,
feature_start_column=fsc,
n_trials=args.n_trials,
timeout_sec=args.timeout,
max_samples=args.max_samples,
output_dir=args.out,
)
print(f"\nTraining finished for {len(res)} target(s)")
for r in res:
marker = "[OK]" if r.success else "[FAIL]"
fb = " [fallback]" if r.fallback_used else ""
print(f" {marker} {r.target_column}: cv={r.cv_score:.4f} path={r.model_path}{fb}")
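
The CLI accepts `--feature-start` as either a column index or a column name by trying `int()` first. A minimal sketch of that parsing; `parse_feature_start` is a hypothetical helper name.

```python
from typing import Union


def parse_feature_start(raw: str) -> Union[int, str]:
    """Interpret a CLI value as a column index when it parses as an integer, else as a column name."""
    try:
        return int(raw)
    except ValueError:
        return raw


as_index = parse_feature_start("3")
as_name = parse_feature_start("B1")
```

Two side effects follow from this rule: a column literally named `"3"` can only be addressed by position, and with the default of `0` the slice `df.columns[:0]` is empty, so no target columns are found unless the option is set.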


@ -3,9 +3,9 @@
"""
自定义回归预测模块
该模块根据9_Custom_Regression_Modeling文件夹中的CSV信息批量预测水质指数。
该模块根据13_Custom_Regression文件夹中的CSV信息批量预测水质指数。
处理流程:
1. 读取9_Custom_Regression_Modeling文件夹中的CSV文件
1. 读取13_Custom_Regression文件夹中的CSV文件
2. 根据r_squared选择最佳模型指数公式+反演公式)
3. 使用指数公式计算光谱指数值
4. 使用反演公式计算水质参数值
@ -38,12 +38,12 @@ class CustomRegressionPredictor:
"""
自定义回归预测器
基于9_Custom_Regression_Modeling文件夹中的回归模型CSV文件
基于13_Custom_Regression文件夹中的回归模型CSV文件
进行水质参数的批量预测。
"""
def __init__(self,
regression_models_dir: str = "9_Custom_Regression_Modeling",
regression_models_dir: str = "13_Custom_Regression",
formula_csv_path: Optional[str] = None,
output_dir: str = "prediction_results",
log_level: int = logging.INFO):
@ -102,7 +102,7 @@ class CustomRegressionPredictor:
def load_regression_models(self) -> Dict[str, pd.DataFrame]:
"""
加载9_Custom_Regression_Modeling文件夹中的所有CSV文件
加载13_Custom_Regression文件夹中的所有CSV文件
支持的CSV格式
- 回归结果CSV包含列y_variable, x_variable, equation, r_squared
@ -621,7 +621,7 @@ def main():
parser = argparse.ArgumentParser(description='自定义回归预测模块')
parser.add_argument('--input_csv', required=True, help='输入的光谱采样CSV文件路径')
parser.add_argument('--models_dir', default='9_Custom_Regression_Modeling',
parser.add_argument('--models_dir', default='13_Custom_Regression',
help='回归模型CSV文件目录')
parser.add_argument('--output_dir', default='prediction_results',
help='预测结果输出目录')


@ -13,6 +13,7 @@ import sys
import os
from src.preprocessing.spectral_Preprocessing import Preprocessing
from src.core.utils.split_methods import spxy, ks
# try:
# from modeling import WaterQualityModeling
@ -26,18 +27,30 @@ from sklearn.model_selection import train_test_split
class WaterQualityInference:
"""水质参数反演推理类"""
def __init__(self, artifacts_dir: str = "models/artifacts"):
def __init__(self, artifacts_dir: str = "models/artifacts",
external_model=None, external_model_path=None):
"""
初始化推理类
Args:
artifacts_dir: 模型保存目录
external_model: 外部预训练模型对象(来自 GUI 导入,跳过磁盘加载)
external_model_path: 外部模型文件路径(仅用于日志)
"""
self.artifacts_dir = Path(artifacts_dir)
if not self.artifacts_dir.exists():
print(f"警告: 模型目录不存在: {artifacts_dir},将在需要时创建")
self.best_model_info = None
self.external_model = external_model
self.external_model_path = external_model_path
# Normalize loaded_model_data: always a dict, so ['model'] access never fails
if external_model is not None:
# A bare model object was passed in: wrap it in a dict so later .get('model') access is uniform
self.loaded_model_data = {'model': external_model, 'preprocess_method': 'None'}
print(f" External model normalized: type={type(external_model).__name__}")
else:
self.loaded_model_data = None
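
The constructor change above wraps a bare external model in the same dict layout that models loaded from disk use, so later code can call `.get(...)` uniformly. A sketch of that normalization under the assumption that the external object is a bare estimator and not already a dict payload; `normalize_model_data` and `DummyModel` are hypothetical names.

```python
from typing import Any, Dict, Optional


def normalize_model_data(external_model: Optional[Any]) -> Optional[Dict[str, Any]]:
    """Wrap a bare model object in the dict layout used for models loaded from disk."""
    if external_model is None:
        return None
    return {"model": external_model, "preprocess_method": "None"}


class DummyModel:
    # Hypothetical stand-in for a fitted estimator.
    def predict(self, rows):
        return [0.0 for _ in rows]


data = normalize_model_data(DummyModel())
# No 'model_name' key exists for external models, so fall back to the class name.
model_name = data.get("model_name", type(data["model"]).__name__)
```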
def load_sampling_data(self, csv_path: str) -> Tuple[pd.DataFrame, pd.DataFrame, pd.DataFrame]:
@ -126,159 +139,12 @@ class WaterQualityInference:
return X_train, X_test, y_train, y_test
def spxy(self, data, label, test_size=0.2):
"""
SPXY算法划分数据集考虑X和Y空间的距离
Args:
data: shape (n_samples, n_features)
label: shape (n_samples, )
test_size: 测试集比例,默认: 0.2
Returns:
X_train: (n_samples, n_features)
X_test: (n_samples, n_features)
y_train: (n_samples, )
y_test: (n_samples, )
"""
# 确保 data 和 label 是 NumPy 数组
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
# 备份原始数据和标签
x_backup = data
y_backup = label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
# 归一化标签数据
label = (label - np.mean(label)) / np.std(label)
D = np.zeros((M, M))
Dy = np.zeros((M, M))
# 计算样本之间的距离
for i in range(M - 1):
xa = data[i, :]
ya = label[i]
for j in range((i + 1), M):
xb = data[j, :]
yb = label[j]
D[i, j] = np.linalg.norm(xa - xb)
Dy[i, j] = np.linalg.norm(ya - yb)
# 距离归一化
Dmax = np.max(D)
Dymax = np.max(Dy)
D = D / Dmax + Dy / Dymax
# 找到最远的两个点
maxD = D.max(axis=0)
index_row = D.argmax(axis=0)
index_column = maxD.argmax()
m = np.zeros(N, dtype=int)
m[0] = index_row[index_column]
m[1] = index_column
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
# 根据距离选择训练集
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros(M - i)
for j in range(M - i):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(samples, m)
# 划分训练集和测试集
X_train = data[m, :]
y_train = y_backup[m]
X_test = data[m_complement, :]
y_test = y_backup[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the SPXY algorithm (delegates to src.core.utils.split_methods.spxy)."""
return spxy(data, label, test_size=test_size)
def ks(self, data, label, test_size=0.2):
"""
Kennard-Stone算法划分数据集
Args:
data: shape (n_samples, n_features)
label: shape (n_sample, )
test_size: 测试集比例,默认: 0.2
Returns:
X_train: (n_samples, n_features)
X_test: (n_samples, n_features)
y_train: (n_samples, )
y_test: (n_samples, )
"""
# 确保 data 和 label 是 NumPy 数组
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
D = np.zeros((M, M))
for i in range((M - 1)):
xa = data[i, :]
for j in range((i + 1), M):
xb = data[j, :]
D[i, j] = np.linalg.norm(xa - xb)
maxD = np.max(D, axis=0)
index_row = np.argmax(D, axis=0)
index_column = np.argmax(maxD)
m = np.zeros(N)
m[0] = np.array(index_row[index_column])
m[1] = np.array(index_column)
m = m.astype(int)
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros((M - i))
for j in range((M - i)):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(np.arange(data.shape[0]), m)
X_train = data[m, :]
y_train = label[m]
X_test = data[m_complement, :]
y_test = label[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the Kennard-Stone algorithm (delegates to src.core.utils.split_methods.ks)."""
return ks(data, label, test_size=test_size)
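
The method bodies removed above implement the Kennard-Stone max-min selection: seed with the two farthest samples, then repeatedly add the sample whose nearest selected neighbour is farthest away. The following is an illustrative pure-Python re-implementation of that rule, not the code in `src.core.utils.split_methods`; `kennard_stone_indices` is a hypothetical name, and it returns indices rather than the split arrays.

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone_indices(points: Sequence[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Return (train_idx, test_idx) chosen by the Kennard-Stone max-min rule."""
    n = len(points)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(points)")
    dist = [[math.dist(a, b) for b in points] for a in points]

    # Seed with the two mutually farthest samples.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: dist[ij[0]][ij[1]],
    )
    selected = [first, second]
    # min_dist[k] = distance from sample k to its nearest already-selected sample.
    min_dist = [min(dist[k][first], dist[k][second]) for k in range(n)]

    while len(selected) < n_train:
        remaining = [k for k in range(n) if k not in selected]
        nxt = max(remaining, key=lambda k: min_dist[k])
        selected.append(nxt)
        # Update the cached nearest-selected distances with the new member.
        min_dist = [min(min_dist[k], dist[k][nxt]) for k in range(n)]

    test = [k for k in range(n) if k not in selected]
    return selected, test


pts = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_idx, test_idx = kennard_stone_indices(pts, n_train=3)
```

Caching the nearest-selected distance per sample avoids recomputing it for every candidate at every step, which the nested loops in the removed methods did. SPXY follows the same selection loop but uses a distance that also includes the normalized label distance.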
def split_data(self, X: np.ndarray, y: pd.Series, method: str = "random",
test_size: float = 0.2, random_state: int = 42) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
@ -745,7 +611,10 @@ class WaterQualityInference:
# 1. 加载模型
print("\n步骤1: 加载模型")
print("-" * 40)
if model_file_path:
if self.external_model is not None:
# 已在 __init__ 中规范化,无需重复赋值
print(f" 使用外部预训练模型: type={type(self.external_model).__name__}")
elif model_file_path:
self.load_specific_model(model_file_path)
else:
self.load_best_model(metric=metric)
@ -793,8 +662,8 @@ class WaterQualityInference:
info = {
"status": "model_loaded",
"preprocess_method": self.loaded_model_data['preprocess_method'],
"model_name": self.loaded_model_data['model_name'],
"preprocess_method": self.loaded_model_data.get('preprocess_method', 'Unknown'),
"model_name": self.loaded_model_data.get('model_name', type(self.external_model).__name__ if self.external_model is not None else 'Unknown'),
"model_type": str(type(self.loaded_model_data['model'])),
"metadata": self.loaded_model_data.get('metadata', {})
}
@ -866,7 +735,10 @@ class WaterQualityInference:
def batch_inference_multi_models(self, models_root_dir: str, sampling_csv_path: str,
output_dir: str, metric: str = 'test_r2',
prediction_column: str = 'prediction',
output_format: str = 'csv'):
output_format: str = 'csv',
external_model=None,
external_model_path=None,
external_models_dict=None):
"""
使用多个子文件夹中的模型进行批量推理
@ -882,27 +754,61 @@ class WaterQualityInference:
output_path = Path(output_dir)
output_path.mkdir(parents=True, exist_ok=True)
# 查找所有子文件夹
subdirs = [d for d in models_root.iterdir() if d.is_dir()]
if not subdirs:
print(f"在目录 {models_root_dir} 中未找到子文件夹")
return
print(f"找到 {len(subdirs)} 个模型子文件夹进行批量推理")
print(f"输出格式: {output_format.upper()}")
all_results = {}
for subdir in subdirs:
# Priority 1: external_models_dict is non-empty, so use its keys as the targets without scanning the disk
print(f"[BatchInference] Received model dict: {list(external_models_dict.keys()) if external_models_dict else 'None'}")
if external_models_dict is not None and len(external_models_dict) > 0:
targets = list(external_models_dict.keys())
print(f"\n使用外部导入模型字典({len(targets)} 个模型)")
print(f"检测到外部导入模型,将预测以下参数: {targets}")
elif external_model is not None:
print(f"\n使用外部预训练模型: {external_model_path or 'unknown'}")
subdirs = [d for d in models_root.iterdir() if d.is_dir()]
if not subdirs:
print(f"在目录 {models_root_dir} 中未找到子文件夹")
return {}
print(f"找到 {len(subdirs)} 个模型子文件夹进行批量推理")
targets = [d.name for d in subdirs]
else:
subdirs = [d for d in models_root.iterdir() if d.is_dir()]
if not subdirs:
print(f"在目录 {models_root_dir} 中未找到子文件夹")
return {}
print(f"找到 {len(subdirs)} 个模型子文件夹进行批量推理")
targets = [d.name for d in subdirs]
print(f"输出格式: {output_format.upper()}")
for subdir_name in targets:
try:
subdir_name = subdir.name
print(f"\n{'='*60}")
print(f"处理模型文件夹: {subdir_name}")
print(f"处理模型: {subdir_name}")
print(f"{'='*60}")
# 创建新的推理实例使用当前子文件夹作为artifacts_dir
model_inferencer = WaterQualityInference(str(subdir))
# Priority: this target's model in the dict > shared single model > load from disk
effective_model = None
if external_models_dict and subdir_name in external_models_dict:
effective_model = external_models_dict[subdir_name]
print(f" → 使用字典中模型: {type(effective_model).__name__}")
elif external_model is not None:
effective_model = external_model
print(f" → 使用共享外部模型: {type(effective_model).__name__}")
# artifacts_dir: use the target's real subdirectory when it exists, otherwise fall back to models_root
artifacts_dir = (
str(models_root / subdir_name)
if (models_root / subdir_name).is_dir()
else str(models_root)
)
if effective_model is not None:
model_inferencer = WaterQualityInference(
artifacts_dir,
external_model=effective_model,
external_model_path=external_model_path or "",
)
else:
model_inferencer = WaterQualityInference(artifacts_dir)
# 根据输出格式设置文件扩展名
file_ext = f".{output_format}"
@ -931,10 +837,10 @@ class WaterQualityInference:
}
}
print(f"子文件夹 {subdir_name} 处理完成")
print(f"模型 {subdir_name} 处理完成")
except Exception as e:
print(f"处理子文件夹 {subdir_name} 失败: {e}")
print(f"处理模型 {subdir_name} 失败: {e}")
all_results[subdir_name] = {
'status': 'error',
'error': str(e)
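
The loop above resolves the model for each target in a fixed order: the per-target entry in the external dict, then the shared external model, then loading from disk. A compact sketch of that resolution; `resolve_model` and the source labels it returns are hypothetical, introduced only to make the order testable.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_model(target: str,
                  models_dict: Optional[Dict[str, Any]] = None,
                  shared_model: Optional[Any] = None) -> Tuple[Optional[Any], str]:
    """Pick the model for one target: per-target dict entry, then shared model, then disk."""
    if models_dict and target in models_dict:
        return models_dict[target], "dict"
    if shared_model is not None:
        return shared_model, "shared"
    return None, "disk"  # caller falls back to loading from the artifacts directory


from_dict = resolve_model("Chla", {"Chla": "m1"}, "m0")
from_shared = resolve_model("TSS", {"Chla": "m1"}, "m0")
from_disk = resolve_model("TSS")
```

Keeping the decision in one function makes it easier to see that a target missing from the dict silently uses the shared model, which may or may not be the intended behaviour.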


@ -24,6 +24,7 @@ from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold, train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
from sklearn.cross_decomposition import PLSRegression
from src.core.utils.split_methods import spxy, ks
# 第三方模型导入
# try:
@ -256,133 +257,12 @@ class WaterQualityScatterBatch:
return X_train, X_test, y_train, y_test
def spxy(self, data, label, test_size=0.2):
"""SPXY算法划分数据集"""
# 确保 data 和 label 是 NumPy 数组
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
# 备份原始数据和标签
x_backup = data
y_backup = label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
# 归一化标签数据
label = (label - np.mean(label)) / np.std(label)
D = np.zeros((M, M))
Dy = np.zeros((M, M))
# 计算样本之间的距离
for i in range(M - 1):
xa = data[i, :]
ya = label[i]
for j in range((i + 1), M):
xb = data[j, :]
yb = label[j]
D[i, j] = np.linalg.norm(xa - xb)
Dy[i, j] = np.linalg.norm(ya - yb)
# 距离归一化
Dmax = np.max(D)
Dymax = np.max(Dy)
D = D / Dmax + Dy / Dymax
# 找到最远的两个点
maxD = D.max(axis=0)
index_row = D.argmax(axis=0)
index_column = maxD.argmax()
m = np.zeros(N, dtype=int)
m[0] = index_row[index_column]
m[1] = index_column
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
# 根据距离选择训练集
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros(M - i)
for j in range(M - i):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(samples, m)
# 划分训练集和测试集
X_train = data[m, :]
y_train = y_backup[m]
X_test = data[m_complement, :]
y_test = y_backup[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the SPXY algorithm (delegates to src.core.utils.split_methods.spxy)."""
return spxy(data, label, test_size=test_size)
def ks(self, data, label, test_size=0.2):
"""Kennard-Stone算法划分数据集"""
# 确保 data 和 label 是 NumPy 数组
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
D = np.zeros((M, M))
for i in range((M - 1)):
xa = data[i, :]
for j in range((i + 1), M):
xb = data[j, :]
D[i, j] = np.linalg.norm(xa - xb)
maxD = np.max(D, axis=0)
index_row = np.argmax(D, axis=0)
index_column = np.argmax(maxD)
m = np.zeros(N)
m[0] = np.array(index_row[index_column])
m[1] = np.array(index_column)
m = m.astype(int)
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros((M - i))
for j in range((M - i)):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(np.arange(data.shape[0]), m)
X_train = data[m, :]
y_train = label[m]
X_test = data[m_complement, :]
y_test = label[m_complement]
return X_train, X_test, y_train, y_test
"""Split the dataset with the Kennard-Stone algorithm (delegates to src.core.utils.split_methods.ks)."""
return ks(data, label, test_size=test_size)
def split_data(self, X: np.ndarray, y: pd.Series, method: str = "random",
test_size: float = 0.2, random_state: int = 42) -> Tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:


@ -2,7 +2,7 @@
"""
数据准备步骤
包含 step4_process_csv, step5_extract_training_spectra, step5_5_calculate_water_quality_indices
包含 step5_process_csv, step6_extract_spectra, step5_5_calculate_water_quality_indices
"""
import time
@ -21,7 +21,7 @@ class DataPreparationStep:
@staticmethod
def process_csv(
csv_path: str,
output_dir: Union[str, Path] = "./4_processed_data",
output_dir: Union[str, Path] = "./5_Data_Cleaning",
callback: Optional[Callable] = None,
) -> str:
"""处理CSV文件筛选剔除异常值"""
@ -61,7 +61,7 @@ class DataPreparationStep:
boundary_path: Optional[str] = None,
glint_mask_path: Optional[str] = None,
water_mask_path: Optional[str] = None,
output_dir: Union[str, Path] = "./5_training_spectra",
output_dir: Union[str, Path] = "./6_Spectral_Feature_Extraction",
callback: Optional[Callable] = None,
) -> str:
"""根据采样点坐标在去耀斑影像中提取平均光谱"""
@ -126,12 +126,12 @@ class DataPreparationStep:
@staticmethod
def calculate_water_quality_indices(
training_spectra_path: Optional[str] = None,
training_csv_path: Optional[str] = None,
formula_csv_file: Optional[str] = None,
formula_names: Optional[List[str]] = None,
output_file: Optional[str] = None,
enabled: bool = True,
output_dir: Union[str, Path] = "./6_water_quality_indices",
output_dir: Union[str, Path] = "./7_Water_Quality_Indices",
callback: Optional[Callable] = None,
) -> Optional[str]:
"""根据训练光谱计算水质光谱指数(使用 band_math 方法)"""
@ -153,15 +153,15 @@ class DataPreparationStep:
notify("skipped", "跳过水质指数计算")
return None
if training_spectra_path is None:
raise ValueError("必须提供 training_spectra_path 参数")
if training_csv_path is None:
raise ValueError("必须提供 training_csv_path 参数")
if formula_csv_file is None:
raise ValueError("必须提供 formula_csv_file 参数")
if output_file:
output_path = str(Path(output_file))
else:
output_path = str(output_dir / "water_quality_indices.csv")
output_path = str(output_dir / "training_spectra_indices.csv")
if Path(output_path).exists():
print(f"检测到已存在的水质指数文件,直接使用: {output_path}")
@ -170,7 +170,7 @@ class DataPreparationStep:
from src.utils.band_math import BandMathCalculator
calculator = BandMathCalculator(training_spectra_path)
calculator = BandMathCalculator(training_csv_path)
result_df = calculator.process_formulas_from_csv(
formula_csv_file=formula_csv_file,
formula_names=formula_names,


@ -28,7 +28,7 @@ class GlintDetectionStep:
max_area: Optional[int] = None,
buffer_size: Optional[int] = None,
water_mask_path: Optional[str] = None,
glint_dir: Union[str, Path] = "./2_glint",
glint_dir: Union[str, Path] = "./2_Glint_Detection",
callback: Optional[callable] = None,
) -> str:
"""


@ -135,7 +135,7 @@ class ModelingStep:
split_methods: Optional[List[str]] = None,
cv_folds: int = 5,
training_csv_path: Optional[str] = None,
output_dir: Union[str, Path] = "./7_Supervised_Model_Training",
output_dir: Union[str, Path] = "./8_Supervised_Model_Training",
callback: Optional[Callable] = None,
_report_generator=None,
) -> str:
@ -251,7 +251,7 @@ class ModelingStep:
if output_dir is not None:
non_empirical_dir = Path(output_dir)
else:
non_empirical_dir = Path.cwd() / "8_Regression_Modeling"
non_empirical_dir = Path.cwd() / "8_Non_Empirical_Regression"
non_empirical_dir.mkdir(parents=True, exist_ok=True)
if preprocessing_methods is None:
@ -362,7 +362,7 @@ class ModelingStep:
raise ValueError(f"因变量列不存在: {missing_y}")
if output_dir is None:
custom_regression_dir = Path(work_dir) / "9_Custom_Regression_Modeling"
custom_regression_dir = Path(work_dir) / "13_Custom_Regression"
else:
custom_regression_dir = Path(work_dir) / output_dir
custom_regression_dir.mkdir(parents=True, exist_ok=True)
@ -430,7 +430,7 @@ def _apply_preprocessing_internal(
save_path = None
if preprocess_method == "SS":
models_dir = output_dir.parent.parent / "7_Supervised_Model_Training"
models_dir = output_dir.parent.parent / "8_Supervised_Model_Training"
models_dir.mkdir(parents=True, exist_ok=True)
save_path = str(models_dir / "scaler_params.pkl")
print(f"SS预处理: scaler模型将保存到 {save_path}")


@ -24,8 +24,9 @@ class PredictionStep:
chunk_size: int = 1000,
water_mask_path: Optional[str] = None,
glint_mask_path: Optional[str] = None,
output_dir: Union[str, Path] = "./10_sampling",
output_dir: Union[str, Path] = "./4_sampling",
callback: Optional[Callable] = None,
use_adaptive_sampling: bool = True,
) -> str:
"""生成水域掩膜内且耀斑掩膜外的采样点,统计平均光谱"""
from pathlib import Path
@ -83,10 +84,14 @@ class PredictionStep:
if glint_mask_to_use is None:
print("未检测到耀斑掩膜,将在采样点生成时不做耀斑区域剔除。")
# 传递极度安全的 deglint_img_str 进底层
# 传递极度安全的 deglint_img_str 进底层(关键字传参,避免 positional 参数顺序陷阱)
get_spectral_sampling_points_chunked(
deglint_img_str, water_mask_path, glint_mask_to_use,
output_path, interval, sample_radius, chunk_size
output_path,
interval=interval,
sample_radius=sample_radius,
chunk_size=chunk_size,
use_adaptive_sampling=use_adaptive_sampling,
)
notify("completed", f"采样点光谱数据已保存: {output_path}")
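
The call above was switched from positional to keyword arguments to avoid passing values in the wrong order. One way to enforce this at the function boundary is a keyword-only marker (`*`), sketched below; `sample_points` and its signature are hypothetical, since the real signature of `get_spectral_sampling_points_chunked` is not shown in this diff.

```python
def sample_points(image, water_mask, glint_mask, output_path, *,
                  interval=50, sample_radius=3, chunk_size=1000,
                  use_adaptive_sampling=True):
    # Hypothetical stand-in: returns the options it received so the effect is visible.
    return {
        "interval": interval,
        "sample_radius": sample_radius,
        "chunk_size": chunk_size,
        "use_adaptive_sampling": use_adaptive_sampling,
    }


# Keyword arguments can be given in any order without changing their meaning.
cfg = sample_points("img.bsq", "water.tif", None, "out.csv",
                    chunk_size=500, interval=20, sample_radius=2)

# Extra positional values are rejected instead of being silently misassigned.
try:
    sample_points("img.bsq", "water.tif", None, "out.csv", 20, 2, 500)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```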
@ -100,9 +105,13 @@ class PredictionStep:
models_dir: Optional[str] = None,
metric: str = "test_r2",
prediction_column: str = "prediction",
output_dir: Union[str, Path] = "./11_12_13_predictions/Machine_Learning_Prediction",
output_dir: Union[str, Path] = "./9_ML_Prediction",
callback: Optional[Callable] = None,
_report_generator=None,
_external_model=None,
_external_model_path=None,
_external_models_dict=None,
_external_model_dir=None,
) -> Dict[str, str]:
"""将训练好的最佳机器学习模型应用到采样点光谱上,预测水质参数"""
from src.core.prediction.inference_batch import WaterQualityInference
@ -114,6 +123,8 @@ class PredictionStep:
print("\n" + "=" * 80)
print("步骤8: 预测水质参数")
print("=" * 80)
print(f"[PredictionStep] 准备执行预测,字典状态: {'Yes' if _external_models_dict else 'No'}"
f", 单模型状态: {'Yes' if _external_model else 'No'}")
step_start_time = time.time()
@ -149,7 +160,44 @@ class PredictionStep:
else:
print(f"检测到部分预测结果文件,缺少: {missing_targets},将继续生成...")
inferencer = WaterQualityInference(models_dir)
all_results = {}
if _external_models_dict:
# An external model dict takes priority: its keys are used directly as the list of targets,
# and an inference instance is created for each model to call inference_pipeline.
print(f"\n使用外部导入模型字典({len(_external_models_dict)} 个模型)...")
for target_name, model_obj in _external_models_dict.items():
try:
output_file = ml_prediction_dir / f"{target_name}.csv"
model_inferencer = WaterQualityInference(
models_dir or "./",
external_model=model_obj,
external_model_path=_external_model_dir or "",
)
predictions, result_df = model_inferencer.inference_pipeline(
sampling_csv_path=sampling_csv_path,
output_csv_path=str(output_file),
metric=metric,
prediction_column=prediction_column,
)
prediction_files[target_name] = str(output_file)
all_results[target_name] = {
"status": "success",
"output_file": str(output_file),
"sample_count": len(predictions),
}
print(f"{target_name}: {len(predictions)} 个预测值")
except Exception as e:
print(f"{target_name}: 失败 — {type(e).__name__}: {e}")
prediction_files[target_name] = None
all_results[target_name] = {"status": "error", "error": str(e)}
else:
# The dict is empty or missing: fall back to the legacy logic that scans the models_dir subdirectories
inferencer = WaterQualityInference(
models_dir,
external_model=_external_model,
external_model_path=_external_model_path,
)
all_results = inferencer.batch_inference_multi_models(
models_root_dir=models_dir,
sampling_csv_path=sampling_csv_path,
@@ -157,8 +205,12 @@ class PredictionStep:
metric=metric,
prediction_column=prediction_column,
output_format="csv",
external_model=_external_model,
external_model_path=_external_model_path,
external_models_dict=_external_models_dict,
)
# batch_inference_multi_models is guaranteed to return a dict and never None
if all_results:
for target_name, result in all_results.items():
if result.get("status") == "success":
prediction_files[target_name] = result["output_file"]
@@ -207,7 +259,7 @@ class PredictionStep:
if non_empirical_models_dir is not None:
final_models_dir = non_empirical_models_dir
else:
default_models_dir = str(Path(work_dir) / "8_Regression_Modeling")
default_models_dir = str(Path(work_dir) / "8_Non_Empirical_Regression")
if Path(default_models_dir).exists():
final_models_dir = default_models_dir
else:
@@ -311,14 +363,14 @@ class PredictionStep:
if custom_regression_dir is not None:
final_regression_dir = custom_regression_dir
else:
final_regression_dir = str(Path(work_dir) / "9_Custom_Regression_Modeling")
final_regression_dir = str(Path(work_dir) / "13_Custom_Regression")
if not Path(final_regression_dir).exists():
raise ValueError(
"请先执行步骤6.75: 自定义回归分析,或提供 custom_regression_dir 参数"
)
if output_dir is None:
custom_regression_prediction_dir = Path(work_dir) / "11_12_13_predictions" / "Custom_Regression_Prediction"
custom_regression_prediction_dir = Path(work_dir) / "13_Custom_Regression" / "Custom_Regression_Prediction"
custom_regression_prediction_dir.mkdir(parents=True, exist_ok=True)
prediction_output_dir = str(custom_regression_prediction_dir)
else:


@@ -0,0 +1,158 @@
# -*- coding: utf-8 -*-
"""
Dataset splitting algorithms: SPXY / Kennard-Stone
Extracted from modeling_batch.py / inference_batch.py / sctter_batch.py
to remove three identical duplicate implementations.
"""
import numpy as np
import pandas as pd
def spxy(data, label, test_size=0.2):
"""
Split a dataset with the SPXY algorithm, which considers distances in both X and Y space.
Args:
data: shape (n_samples, n_features), np.ndarray or pd.DataFrame
label: shape (n_samples, ), np.ndarray or pd.Series
test_size: fraction of samples assigned to the test set, default: 0.2
Returns:
X_train: (n_train, n_features)
X_test: (n_test, n_features)
y_train: (n_train, )
y_test: (n_test, )
"""
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
x_backup = data
y_backup = label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
label = (label - np.mean(label)) / np.std(label)
D = np.zeros((M, M))
Dy = np.zeros((M, M))
for i in range(M - 1):
xa = data[i, :]
ya = label[i]
for j in range((i + 1), M):
xb = data[j, :]
yb = label[j]
D[i, j] = np.linalg.norm(xa - xb)
Dy[i, j] = np.linalg.norm(ya - yb)
Dmax = np.max(D)
Dymax = np.max(Dy)
D = D / Dmax + Dy / Dymax
maxD = D.max(axis=0)
index_row = D.argmax(axis=0)
index_column = maxD.argmax()
m = np.zeros(N, dtype=int)
m[0] = index_row[index_column]
m[1] = index_column
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros(M - i)
for j in range(M - i):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(samples, m)
X_train = data[m, :]
y_train = y_backup[m]
X_test = data[m_complement, :]
y_test = y_backup[m_complement]
return X_train, X_test, y_train, y_test
def ks(data, label, test_size=0.2):
"""
Split a dataset with the Kennard-Stone algorithm.
Args:
data: shape (n_samples, n_features), np.ndarray or pd.DataFrame
label: shape (n_samples, ), np.ndarray or pd.Series
test_size: fraction of samples assigned to the test set, default: 0.2
Returns:
X_train: (n_train, n_features)
X_test: (n_test, n_features)
y_train: (n_train, )
y_test: (n_test, )
"""
data = data.to_numpy() if isinstance(data, pd.DataFrame) else data
label = label.to_numpy() if isinstance(label, pd.Series) else label
M = data.shape[0]
N = round((1 - test_size) * M)
samples = np.arange(M)
D = np.zeros((M, M))
for i in range((M - 1)):
xa = data[i, :]
for j in range((i + 1), M):
xb = data[j, :]
D[i, j] = np.linalg.norm(xa - xb)
maxD = np.max(D, axis=0)
index_row = np.argmax(D, axis=0)
index_column = np.argmax(maxD)
m = np.zeros(N, dtype=int)
m[0] = index_row[index_column]
m[1] = index_column
dminmax = np.zeros(N)
dminmax[1] = D[m[0], m[1]]
for i in range(2, N):
pool = np.delete(samples, m[:i])
dmin = np.zeros((M - i))
for j in range((M - i)):
indexa = pool[j]
d = np.zeros(i)
for k in range(i):
indexb = m[k]
if indexa < indexb:
d[k] = D[indexa, indexb]
else:
d[k] = D[indexb, indexa]
dmin[j] = np.min(d)
dminmax[i] = np.max(dmin)
index = np.argmax(dmin)
m[i] = pool[index]
m_complement = np.delete(np.arange(data.shape[0]), m)
X_train = data[m, :]
y_train = label[m]
X_test = data[m_complement, :]
y_test = label[m_complement]
return X_train, X_test, y_train, y_test
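
The triple loop in `ks` above recomputes the nearest-selected distance from scratch at every step. Below is a minimal sketch of the same Kennard-Stone selection that keeps a running minimum-distance vector instead. The names `kennard_stone_indices` and `ks_split` are hypothetical and not part of this commit. The sketch builds the full pairwise-difference tensor, so it assumes the dataset is small enough to fit in memory, and it should pick the same samples as `ks` only when there are no distance ties.

```python
import numpy as np


def kennard_stone_indices(X, n_select):
    """Return the indices of n_select rows chosen by max-min distance."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")
    # Full symmetric matrix of pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Seed with the two mutually farthest samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    # Distance from every sample to its nearest already-selected sample.
    min_dist = np.minimum(dist[i], dist[j])
    while len(selected) < n_select:
        min_dist[selected] = -1.0  # never re-pick a selected sample
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(min_dist, dist[nxt])
    return np.array(selected)


def ks_split(X, y, test_size=0.2):
    """Split X and y into train/test parts using the selection above."""
    X = np.asarray(X)
    y = np.asarray(y)
    n_train = round((1 - test_size) * len(X))
    train = kennard_stone_indices(X, n_train)
    test = np.setdiff1d(np.arange(len(X)), train)
    return X[train], X[test], y[train], y[test]
```

Updating `min_dist` in place makes each selection step linear in the number of samples, whereas the loop version rebuilds the candidate pool and rescans all selected samples every time.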


@@ -16,12 +16,12 @@ def generate_glint_deglint_previews(
output_dir: Optional[str] = None
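) -> Dict[str, str]:
"""
Generate PNG previews of the image files in the 2_glint and 3_deglint folders
Generate PNG previews of the image files in the 2_Glint_Detection and 3_deglint folders
Args:
work_dir: working directory
output_subdir: name of the output subdirectory
generate_glint: whether to process the 2_glint folder
generate_glint: whether to process the 2_Glint_Detection folder
generate_deglint: whether to process the 3_deglint folder
output_dir: output directory; None uses the default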

File diff suppressed because it is too large


@@ -0,0 +1,283 @@
#!/usr/bin/env python
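# -*- coding: utf-8 -*-
"""
Workspace manager
Handles business logic such as scanning the working directory, discovering step output paths, and pruning configs.
It is decoupled from GUI components and does not reference any UI class directly.
"""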
import copy
from pathlib import Path
from src.gui.core.event_bus import global_event_bus
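class WorkspaceManager:
"""Manages default step output paths, file scanning, and config pruning."""
# Whitelist: file extensions of scientific data formats
SCIENTIFIC_EXTENSIONS = {'.dat', '.tif', '.tiff', '.shp'}
# Blacklist: keywords that mark temporary files
TMP_KEYWORDS = ('__tmp', '_tmp')
# Set of mask output types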
MASK_TYPES = {'water_mask', 'glint_mask', 'boundary_mask'}
def __init__(self):
self.step_default_outputs = {
'step1': {'water_mask': [
"1_water_mask/water_mask_out.dat",
"1_water_mask/water_mask_from_ndwi.dat",
"1_water_mask/water_mask_from_shp.dat",
]},
'step2': {'glint_mask': "2_Glint_Detection/severe_glint_area.dat"},
'step3': {'deglint_image': [
"3_deglint/deglint_image.bsq",
"3_deglint/deglint_goodman.bsq",
]},
'step4_sampling': {'sampling_points': "4_sampling/sampling_spectra.csv"},
'step5_clean': {'processed_data': "5_Data_Cleaning/processed_data.csv"},
'step6_feature': {'training_spectra': "6_Spectral_Feature_Extraction/training_spectra.csv"},
'step7_index': {'training_spectra_indices': "7_Water_Quality_Indices/training_spectra_indices.csv"},
'step8_ml_train': {'Supervised_Model_Training': "8_Supervised_Model_Training/"},
'step9_ml_predict': {'9_ML_Prediction': "9_ML_Prediction/"},
'step10_watercolor': {'WaterIndex_Images': "10_WaterIndex_Images/"},
'step11_map': {'14_visualization': "14_visualization/"},
}
self.step_outputs = {}
def _publish_outputs(self, step_id: str, outputs: dict):
"""Publish discovered outputs to the EventBus.
Args:
step_id: panel step_id, e.g. 'step1', 'step5_clean'
outputs: {output_type: path_str}
"""
for output_type, path in outputs.items():
if path:
global_event_bus.publish('OutputUpdated', {
'step_id': step_id,
'output_type': output_type,
'path': path,
})
@staticmethod
def _is_scientific_mask(path_str):
"""Whitelist check: only .dat, .tif, .tiff and .shp count as scientific data formats."""
p = Path(path_str)
name_lower = str(path_str).lower()
if any(kw in name_lower for kw in WorkspaceManager.TMP_KEYWORDS):
return False
return p.suffix.lower() in WorkspaceManager.SCIENTIFIC_EXTENSIONS
def find_step_output(self, work_path, step_id, output_type, ref_img_path=None):
"""Find the output file of the given step.
Args:
work_path: working directory as a Path object
step_id: step ID
output_type: output type (e.g. 'water_mask', 'deglint_image')
ref_img_path: reference image path (only needed when output_type='reference_img')
Returns:
The path of the file that was found, as a string, or None
"""
if step_id not in self.step_default_outputs:
return None
raw = self.step_default_outputs[step_id]
rel_path = None
if isinstance(raw, str):
rel_path = raw
elif isinstance(raw, dict):
rel_path = raw.get(output_type) or list(raw.values())[0]
if not rel_path:
return None
# Special case: look up the actual output path in the step_outputs record first
if step_id in self.step_outputs:
actual_outputs = self.step_outputs[step_id]
if output_type in actual_outputs:
candidate = actual_outputs[output_type]
if output_type not in self.MASK_TYPES or self._is_scientific_mask(candidate):
return candidate
if output_type == 'water_mask':
if isinstance(rel_path, list):
for candidate in rel_path:
mask_path = work_path / candidate
if mask_path.exists():
return str(mask_path)
elif rel_path:
mask_path = work_path / rel_path
if mask_path.exists():
return str(mask_path)
elif output_type == 'reference_img':
if ref_img_path and Path(ref_img_path).exists():
return ref_img_path
elif output_type == 'deglint_image':
if isinstance(rel_path, list):
for candidate in rel_path:
deglint_path = work_path / candidate
if deglint_path.exists():
return str(deglint_path)
elif rel_path:
deglint_path = work_path / rel_path
if deglint_path.exists():
return str(deglint_path)
deglint_dir = work_path / "3_deglint"
if deglint_dir.exists():
for file_path in deglint_dir.glob("deglint_*.bsq"):
return str(file_path)
for file_path in deglint_dir.glob("interpolated_*.bsq"):
return str(file_path)
elif isinstance(rel_path, str):
if rel_path.endswith('/'):
output_path = work_path / rel_path.rstrip('/')
if output_path.exists() and output_path.is_dir():
return str(output_path)
else:
output_path = work_path / rel_path
if output_path.exists():
return str(output_path)
return None
def scan_work_directory_for_files(self, work_path):
"""Scan the working directory and auto-discover the output files of each step.
Returns:
discovered_outputs: dict, {step_id: {output_type: path_str}}
"""
discovered_outputs = {}
subdirs = {
'1_water_mask': 'step1',
'2_Glint_Detection': 'step2',
'3_deglint': 'step3',
'4_sampling': 'step4_sampling',
'5_Data_Cleaning': 'step5_clean',
'6_Spectral_Feature_Extraction': 'step6_feature',
'7_Water_Quality_Indices': 'step7_index',
'8_Supervised_Model_Training': 'step8_ml_train',
'8_Regression_Modeling': 'step8_ml_train',
'13_Custom_Regression': 'step13',
'9_ML_Prediction': 'step9_ml_predict',
'11_12_13_predictions/Non_Empirical_Prediction': 'step11_map',
'13_Custom_Regression/Custom_Regression_Prediction': 'step13',
'14_visualization': 'step13_report',
'10_geotiff_batch_rendering': 'step11_map'
}
for subdir, step_ids in subdirs.items():
subdir_path = work_path / subdir
if not subdir_path.exists():
continue
if isinstance(step_ids, str):
step_ids = [step_ids]
for file_path in subdir_path.rglob('*'):
if file_path.is_file():
file_name = file_path.name.lower()
for step_id in step_ids:
if step_id not in discovered_outputs:
discovered_outputs[step_id] = {}
if 'water_mask' in file_name and step_id == 'step1':
if self._is_scientific_mask(file_path):
discovered_outputs[step_id]['water_mask'] = str(file_path)
elif 'glint' in file_name and 'mask' in file_name and step_id == 'step2':
if self._is_scientific_mask(file_path):
discovered_outputs[step_id]['glint_mask'] = str(file_path)
elif 'deglint' in file_name and step_id == 'step3':
discovered_outputs[step_id]['deglint_image'] = str(file_path)
elif 'processed_data' in file_name and step_id == 'step5_clean':
discovered_outputs[step_id]['processed_data'] = str(file_path)
elif 'training_spectra' in file_name and step_id == 'step6_feature':
discovered_outputs[step_id]['training_spectra'] = str(file_path)
elif 'water_quality_indices' in file_name and step_id == 'step7_index':
discovered_outputs[step_id]['water_indices'] = str(file_path)
elif 'sampling_spectra' in file_name and step_id == 'step4_sampling':
discovered_outputs[step_id]['sampling_points'] = str(file_path)
elif file_name.endswith('.csv') and step_id in ['step9_ml_predict', 'step11_map', 'step12_viz']:
discovered_outputs[step_id]['predictions'] = str(file_path)
for step_id, outputs in discovered_outputs.items():
if step_id not in self.step_outputs:
self.step_outputs[step_id] = {}
self.step_outputs[step_id].update(outputs)
# ★ Publish EventBus events to drive auto-fill in the downstream panels
self._publish_outputs(step_id, outputs)
return discovered_outputs
def update_step_outputs(self, step_name, work_path):
"""Update the recorded output paths of the given step and publish EventBus events."""
if step_name not in self.step_default_outputs:
return
step_outputs = self.step_default_outputs[step_name]
published = {}
for output_type, relative_path in step_outputs.items():
if isinstance(relative_path, list):
for candidate in relative_path:
output_path = work_path / candidate
if output_path.exists():
path_str = str(output_path)
self.step_outputs.setdefault(step_name, {})[output_type] = path_str
published[output_type] = path_str
break
elif '*' in relative_path:
pattern_path = work_path / relative_path
matching_files = list(pattern_path.parent.glob(pattern_path.name))
if matching_files:
latest_file = max(matching_files, key=lambda p: p.stat().st_mtime)
path_str = str(latest_file)
self.step_outputs.setdefault(step_name, {})[output_type] = path_str
published[output_type] = path_str
else:
output_path = work_path / relative_path
if output_path.exists():
path_str = str(output_path)
self.step_outputs.setdefault(step_name, {})[output_type] = path_str
published[output_type] = path_str
if published:
self._publish_outputs(step_name, published)
@staticmethod
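def prune_config_for_prediction_mode(config: dict) -> dict:
"""Prediction-only mode: disable the training-related steps and keep the prediction and mapping steps.
Every disabled step dict gets 'enabled': False written into it.
These configs are eventually passed to PipelineRunner, and the Runner skips them.
In addition, the required_input_files of skipped steps are not checked in build_missing_items,
which naturally avoids training-mode false alarms such as "CSV missing".
Args:
config: full config dict (from get_current_config)
Returns:
The pruned config (a deep copy; the original config is not modified)
"""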
cfg = copy.deepcopy(config)
training_steps = [
"step4",
"step5",
"step7",
"step6",
"step8_non_empirical_modeling",
"step9",
]
for step_id in training_steps:
step_cfg = cfg.setdefault(step_id, {})
step_cfg["enabled"] = False
return cfg
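
The pruning logic above can be checked in isolation. The sketch below is a standalone copy of the same idea, not the project's method: `prune_for_prediction` and `TRAINING_STEPS` are hypothetical names, and the step keys are simply the ones listed in `training_steps` above. It shows that the deep copy keeps the caller's config untouched and that steps missing from the config are created as disabled.

```python
import copy

# Step keys copied from the training_steps list in the diff above.
TRAINING_STEPS = (
    "step4", "step5", "step6", "step7",
    "step8_non_empirical_modeling", "step9",
)


def prune_for_prediction(config):
    """Return a deep copy of config with every training step disabled."""
    cfg = copy.deepcopy(config)
    for step_id in TRAINING_STEPS:
        # setdefault creates the step dict when the config lacks it.
        cfg.setdefault(step_id, {})["enabled"] = False
    return cfg


original = {"step4": {"enabled": True, "n_points": 50}, "step11": {"enabled": True}}
pruned = prune_for_prediction(original)
print(pruned["step4"])    # training step is now disabled, other keys kept
print(original["step4"])  # the caller's dict is unchanged
```

Because the function returns a copy, the GUI can keep showing the user's full configuration while the runner receives the pruned one.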


@@ -0,0 +1,471 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Chart and interactive dialog module
Contains the ChartViewerDialog, ChartBrowserDialog and InteractiveViewerDialog classes.
"""
import numpy as np
import pandas as pd
from PyQt5.QtWidgets import (
QDialog, QVBoxLayout, QHBoxLayout, QPushButton,
QSizePolicy, QFileDialog, QMessageBox, QGroupBox,
QListWidget, QLabel, QComboBox, QCheckBox,
)
from PyQt5.QtCore import Qt, QAbstractTableModel
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.backends.backend_qt5agg import NavigationToolbar2QT as NavigationToolbar
from matplotlib.figure import Figure
class ChartViewerDialog(QDialog):
"""Chart viewer dialog."""
def __init__(self, title="图表查看器", parent=None):
super().__init__(parent)
self.setWindowTitle(title)
self.resize(1000, 700)
self.init_ui()
def init_ui(self):
layout = QVBoxLayout()
self.figure = Figure(figsize=(10, 7))
self.canvas = FigureCanvas(self.figure)
self.canvas.setSizePolicy(QSizePolicy.Expanding, QSizePolicy.Expanding)
self.toolbar = NavigationToolbar(self.canvas, self)
layout.addWidget(self.toolbar)
layout.addWidget(self.canvas)
btn_layout = QHBoxLayout()
self.save_btn = QPushButton("保存图表")
self.save_btn.clicked.connect(self.save_chart)
btn_layout.addWidget(self.save_btn)
btn_layout.addStretch()
self.close_btn = QPushButton("关闭")
self.close_btn.clicked.connect(self.close)
btn_layout.addWidget(self.close_btn)
layout.addLayout(btn_layout)
self.setLayout(layout)
def display_image(self, image_path):
"""显示图片"""
self.figure.clear()
ax = self.figure.add_subplot(111)
try:
import matplotlib.image as mpimg
img = mpimg.imread(image_path)
ax.imshow(img)
ax.axis('off')
self.figure.tight_layout()
self.canvas.draw()
self.current_image_path = image_path
except Exception as e:
ax.text(0.5, 0.5, f'加载图片失败:\n{str(e)}',
ha='center', va='center', transform=ax.transAxes)
self.canvas.draw()
def display_custom_plot(self, plot_func):
"""显示自定义绘图函数"""
self.figure.clear()
try:
plot_func(self.figure)
self.canvas.draw()
except Exception as e:
ax = self.figure.add_subplot(111)
ax.text(0.5, 0.5, f'绘图失败:\n{str(e)}',
ha='center', va='center', transform=ax.transAxes)
self.canvas.draw()
def save_chart(self):
"""保存图表"""
file_path, _ = QFileDialog.getSaveFileName(
self, "保存图表", "",
"PNG图片 (*.png);;JPG图片 (*.jpg);;PDF文件 (*.pdf);;所有文件 (*.*)"
)
if file_path:
try:
self.figure.savefig(file_path, dpi=300, bbox_inches='tight')
QMessageBox.information(self, "成功", f"图表已保存到:\n{file_path}")
except Exception as e:
QMessageBox.critical(self, "错误", f"保存失败:\n{str(e)}")
class ChartBrowserDialog(QDialog):
"""Chart browser dialog."""
def __init__(self, chart_files, parent=None):
super().__init__(parent)
self.chart_files = sorted(chart_files, key=lambda x: x.stat().st_mtime, reverse=True)
self.current_index = 0
self.setWindowTitle("图表浏览器")
self.resize(1200, 800)
self.init_ui()
self.show_chart(0)
def init_ui(self):
layout = QVBoxLayout()
list_group = QGroupBox(f"图表列表 (共 {len(self.chart_files)} 个)")
list_layout = QHBoxLayout()
self.chart_list = QListWidget()
self.chart_list.setMaximumHeight(150)
for chart_file in self.chart_files:
self.chart_list.addItem(chart_file.name)
self.chart_list.currentRowChanged.connect(self.show_chart)
list_layout.addWidget(self.chart_list)
list_group.setLayout(list_layout)
layout.addWidget(list_group)
self.figure = Figure(figsize=(12, 8))
self.canvas = FigureCanvas(self.figure)
self.canvas.setSizePolicy(QSizePolicy.Expanding, QSizePolicy.Expanding)
self.toolbar = NavigationToolbar(self.canvas, self)
layout.addWidget(self.toolbar)
layout.addWidget(self.canvas, 1)
btn_layout = QHBoxLayout()
self.prev_btn = QPushButton("◀ 上一个")
self.prev_btn.clicked.connect(self.prev_chart)
btn_layout.addWidget(self.prev_btn)
self.next_btn = QPushButton("下一个 ▶")
self.next_btn.clicked.connect(self.next_chart)
btn_layout.addWidget(self.next_btn)
btn_layout.addStretch()
self.save_btn = QPushButton("💾 保存当前图表")
self.save_btn.clicked.connect(self.save_current_chart)
btn_layout.addWidget(self.save_btn)
self.close_btn = QPushButton("关闭")
self.close_btn.clicked.connect(self.close)
btn_layout.addWidget(self.close_btn)
layout.addLayout(btn_layout)
self.setLayout(layout)
def show_chart(self, index):
"""显示指定索引的图表"""
if 0 <= index < len(self.chart_files):
self.current_index = index
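# Block signals so setCurrentRow() does not re-enter show_chart() via currentRowChanged.
self.chart_list.blockSignals(True)
self.chart_list.setCurrentRow(index)
self.chart_list.blockSignals(False)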
chart_file = self.chart_files[index]
self.figure.clear()
ax = self.figure.add_subplot(111)
try:
import matplotlib.image as mpimg
img = mpimg.imread(str(chart_file))
ax.imshow(img)
ax.axis('off')
ax.set_title(chart_file.name, fontsize=12, pad=10)
self.figure.tight_layout()
self.canvas.draw()
except Exception as e:
ax.text(0.5, 0.5, f'加载图片失败:\n{str(e)}',
ha='center', va='center', transform=ax.transAxes)
self.canvas.draw()
self.prev_btn.setEnabled(index > 0)
self.next_btn.setEnabled(index < len(self.chart_files) - 1)
def prev_chart(self):
"""上一个图表"""
if self.current_index > 0:
self.show_chart(self.current_index - 1)
def next_chart(self):
"""下一个图表"""
if self.current_index < len(self.chart_files) - 1:
self.show_chart(self.current_index + 1)
def save_current_chart(self):
"""保存当前图表"""
if 0 <= self.current_index < len(self.chart_files):
current_file = self.chart_files[self.current_index]
file_path, _ = QFileDialog.getSaveFileName(
self, "保存图表", current_file.name,
"PNG图片 (*.png);;JPG图片 (*.jpg);;所有文件 (*.*)"
)
if file_path:
try:
import shutil
shutil.copy(str(current_file), file_path)
QMessageBox.information(self, "成功", f"图表已保存到:\n{file_path}")
except Exception as e:
QMessageBox.critical(self, "错误", f"保存失败:\n{str(e)}")
class InteractiveViewerDialog(QDialog):
"""Interactive image preview dialog: shows the image, a scatter of reference points, and coordinates/values on click."""
def __init__(self, parent, img_path, ref_csv=None):
super().__init__(parent)
self.img_path = img_path
self.ref_csv = ref_csv
self.geotransform = None
self.fig = None
self.canvas = None
self.ax = None
self.status_label = None
self.init_ui()
def init_ui(self):
self.setWindowTitle("👁️ 交互式影像预览")
self.setMinimumSize(900, 700)
layout = QVBoxLayout()
toolbar = QHBoxLayout()
self.band_combo = QComboBox()
self.band_combo.currentIndexChanged.connect(self.on_band_changed)
toolbar.addWidget(QLabel("显示波段:"))
toolbar.addWidget(self.band_combo)
self.gray_check = QCheckBox("灰度显示")
self.gray_check.stateChanged.connect(self.on_band_changed)
toolbar.addWidget(self.gray_check)
toolbar.addStretch()
layout.addLayout(toolbar)
# Create the status label before loading the image: load_and_display() and
# load_ref_points() write to it, so it must not still be None at that point.
self.status_label = QLabel("点击影像查看像素坐标和经纬度")
self.status_label.setStyleSheet("background:#f0f0f0;padding:4px;font-size:12px;")
self.status_label.setWordWrap(True)
try:
from matplotlib.backends.backend_qt5agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
import matplotlib
matplotlib.use('Qt5Agg')
self.fig = Figure(figsize=(10, 8))
self.canvas = FigureCanvas(self.fig)
self.ax = self.fig.add_subplot(111)
self.fig.tight_layout()
layout.addWidget(self.canvas)
self.load_and_display()
except ImportError as e:
layout.addWidget(QLabel(f"Matplotlib 未安装: {e}"))
layout.addWidget(self.status_label)
close_btn = QPushButton("关闭")
close_btn.clicked.connect(self.close)
layout.addWidget(close_btn)
self.setLayout(layout)
def load_and_display(self):
"""加载影像并显示"""
from osgeo import gdal
dataset = gdal.Open(self.img_path)
if dataset is None:
self.status_label.setText(f"无法打开影像: {self.img_path}")
return
self.geotransform = dataset.GetGeoTransform()
self.projection = dataset.GetProjection()
n_bands = dataset.RasterCount
self.height = dataset.RasterYSize
self.width = dataset.RasterXSize
self.band_combo.clear()
if n_bands >= 3:
for i in range(1, n_bands + 1):
self.band_combo.addItem(f"RGB (B{i}, G{i-1}, R{i-2})" if i >= 3 else f"波段 {i}", i)
self.band_combo.addItem("单波段 (B1)", 0)
else:
for i in range(1, n_bands + 1):
self.band_combo.addItem(f"波段 {i}", i - 1)
self.band_combo.setCurrentIndex(0)
self.dataset = dataset
self.display_band(0, is_gray=False)
self.load_ref_points()
def display_band(self, band_idx, is_gray=False):
"""显示指定波段组合"""
from osgeo import gdal
import numpy as np
dataset = self.dataset
self.ax.clear()
if is_gray or (self.band_combo.currentData() == 0 and dataset.RasterCount == 1):
band = dataset.GetRasterBand(1 if band_idx == 0 else band_idx + 1)
data = band.ReadAsArray()
data = np.nan_to_num(data, nan=0.0)
self.ax.imshow(data, cmap='gray')
self.ax.set_title(f"波段 {band_idx + 1} (灰度)")
else:
n = min(3, dataset.RasterCount)
bands_data = []
for i in range(n):
b = dataset.GetRasterBand(i + 1)
bd = b.ReadAsArray()
bd = np.nan_to_num(bd, nan=0.0)
bands_data.append(bd)
rgb = np.dstack(bands_data)
for i in range(rgb.shape[2]):
p2, p98 = np.percentile(rgb[:, :, i], [2, 98])
if p98 > p2:
rgb[:, :, i] = np.clip((rgb[:, :, i] - p2) / (p98 - p2), 0, 1)
else:
rgb[:, :, i] = np.clip(rgb[:, :, i] / (p98 + 1e-6), 0, 1)
self.ax.imshow(rgb)
self.ax.set_title(f"RGB 显示")
self.ax.set_xlabel("列 (Column)")
self.ax.set_ylabel("行 (Row)")
self.fig.tight_layout()
self.canvas.draw()
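# Connect the click handler only once; reconnecting on every redraw would stack duplicate callbacks.
if getattr(self, 'cid', None) is None:
self.cid = self.canvas.mpl_connect('button_press_event', self.on_click)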
def on_band_changed(self):
"""波段选择变化时更新显示"""
if not hasattr(self, 'dataset'):
return
is_gray = self.gray_check.isChecked()
band_data = self.band_combo.currentData()
self.display_band(band_data or 0, is_gray=is_gray)
def load_ref_points(self):
"""加载并显示参考点"""
import os
if not self.ref_csv or not os.path.isfile(self.ref_csv):
return
try:
import csv
lon_list, lat_list = [], []
with open(self.ref_csv, 'r', encoding='utf-8-sig') as f:
reader = csv.DictReader(f)
for row in reader:
try:
lon = float(row.get('Lon', row.get('lon', row.get('LON', 0))))
lat = float(row.get('Lat', row.get('lat', row.get('LAT', 0))))
if lon and lat:
lon_list.append(lon)
lat_list.append(lat)
except (ValueError, TypeError):
continue
if not lon_list:
return
px_list, py_list = [], []
gt = self.geotransform
if gt and (gt[1] != 0 or gt[5] != 0):
for lon, lat in zip(lon_list, lat_list):
px = (lon - gt[0]) / gt[1]
py = (lat - gt[3]) / gt[5]
if 0 <= px < self.width and 0 <= py < self.height:
px_list.append(px)
py_list.append(py)
if px_list:
self.ax.scatter(px_list, py_list, c='red', s=40, marker='o',
edgecolors='white', linewidths=0.8, zorder=5, alpha=0.9,
label=f'参考点 ({len(px_list)}个)')
self.ax.legend(loc='upper right', fontsize=9)
self.fig.tight_layout()
self.canvas.draw()
self.status_label.setText(
f"已加载 {len(px_list)} 个参考点(仅显示在影像范围内的点)"
)
except Exception as e:
self.status_label.setText(f"加载参考点失败: {e}")
def pixel_to_geo(self, px, py):
"""像素坐标转经纬度"""
gt = self.geotransform
if gt is None:
return None, None
lon = gt[0] + px * gt[1] + py * gt[2]
lat = gt[3] + px * gt[4] + py * gt[5]
return lon, lat
def on_click(self, event):
"""鼠标点击事件"""
if event.inaxes != self.ax or event.xdata is None or event.ydata is None:
return
px, py = int(round(event.xdata)), int(round(event.ydata))
if not (0 <= px < self.width and 0 <= py < self.height):
return
from osgeo import gdal
import numpy as np
dataset = self.dataset
n_bands = dataset.RasterCount
vals = []
for b in range(1, n_bands + 1):
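# Read only the clicked pixel instead of loading the whole band on every click.
val = dataset.GetRasterBand(b).ReadAsArray(px, py, 1, 1)[0, 0]
vals.append(f"{val:.4f}" if isinstance(val, (float, np.floating)) else str(val))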
lon, lat = self.pixel_to_geo(px, py)
geo_str = f"Lon={lon:.6f}, Lat={lat:.6f}" if lon is not None else "无地理参考"
self.status_label.setText(
f"像素: (行={py}, 列={px}) | {geo_str} | "
f"波段值: {' | '.join(vals[:5])}" +
(f" ... ({n_bands}波段的更多信息)" if n_bands > 5 else "")
)
class PandasTableModel(QAbstractTableModel):
"""Table model backed by a DataFrame."""
def __init__(self, data_frame: pd.DataFrame):
super().__init__()
self._data = data_frame.copy()
if self._data.empty:
self._data = pd.DataFrame()
self._data.fillna("", inplace=True)
self._columns = [str(col) for col in self._data.columns]
def rowCount(self, parent=None):
return len(self._data)
def columnCount(self, parent=None):
return len(self._columns)
def data(self, index, role=Qt.DisplayRole):
if not index.isValid() or role != Qt.DisplayRole:
return None
value = self._data.iat[index.row(), index.column()]
if pd.isna(value):
return ""
return str(value)
def headerData(self, section, orientation, role=Qt.DisplayRole):
if role != Qt.DisplayRole:
return None
if orientation == Qt.Horizontal:
if section < len(self._columns):
return self._columns[section]
return str(section)
return str(section + 1)
def flags(self, index):
if not index.isValid():
return Qt.NoItemFlags
return Qt.ItemIsEnabled | Qt.ItemIsSelectable
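
In `InteractiveViewerDialog` above, `pixel_to_geo` applies all six geotransform coefficients, while `load_ref_points` inverts only the scale terms `gt[1]` and `gt[5]`. Those two agree only for north-up images where `gt[2]` and `gt[4]` are zero. The following is a minimal sketch of the general inverse, assuming the usual six-element GDAL-style geotransform tuple; `geo_to_pixel` is a hypothetical helper and not part of this commit.

```python
def pixel_to_geo(gt, px, py):
    """Map pixel (column, row) to georeferenced (x, y) with a 6-term affine."""
    x = gt[0] + px * gt[1] + py * gt[2]
    y = gt[3] + px * gt[4] + py * gt[5]
    return x, y


def geo_to_pixel(gt, x, y):
    """Invert the affine above, including the rotation terms gt[2] and gt[4]."""
    det = gt[1] * gt[5] - gt[2] * gt[4]
    if det == 0:
        raise ValueError("geotransform is not invertible")
    dx, dy = x - gt[0], y - gt[3]
    # Inverse of the 2x2 matrix [[gt[1], gt[2]], [gt[4], gt[5]]].
    px = (gt[5] * dx - gt[2] * dy) / det
    py = (-gt[4] * dx + gt[1] * dy) / det
    return px, py


# North-up example: origin (100, 40), 0.5-unit pixels, y decreasing downwards.
north_up = (100.0, 0.5, 0.0, 40.0, 0.0, -0.5)
x, y = pixel_to_geo(north_up, 4, 2)
col, row = geo_to_pixel(north_up, x, y)
```

For a north-up geotransform the inverse reduces to the two divisions already used in `load_ref_points`; the general form only matters for rotated or sheared rasters.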


@@ -0,0 +1,50 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Data model module
Contains data model classes such as PandasTableModel.
"""
import pandas as pd
from PyQt5.QtCore import Qt, QAbstractTableModel
class PandasTableModel(QAbstractTableModel):
"""Table model backed by a DataFrame."""
def __init__(self, data_frame: pd.DataFrame):
super().__init__()
self._data = data_frame.copy()
if self._data.empty:
self._data = pd.DataFrame()
self._data.fillna("", inplace=True)
self._columns = [str(col) for col in self._data.columns]
def rowCount(self, parent=None):
return len(self._data)
def columnCount(self, parent=None):
return len(self._columns)
def data(self, index, role=Qt.DisplayRole):
if not index.isValid() or role != Qt.DisplayRole:
return None
value = self._data.iat[index.row(), index.column()]
if pd.isna(value):
return ""
return str(value)
def headerData(self, section, orientation, role=Qt.DisplayRole):
if role != Qt.DisplayRole:
return None
if orientation == Qt.Horizontal:
if section < len(self._columns):
return self._columns[section]
return str(section)
return str(section + 1)
def flags(self, index):
if not index.isValid():
return Qt.NoItemFlags
return Qt.ItemIsEnabled | Qt.ItemIsSelectable


@@ -0,0 +1,374 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Image viewer widget module
Contains the ImageCategoryTree and ImageViewerWidget classes.
"""
import os
from pathlib import Path
from typing import Optional, List
from PyQt5.QtWidgets import (
QWidget, QVBoxLayout, QHBoxLayout, QPushButton,
QFrame, QScrollArea, QLabel, QFileDialog, QMessageBox,
QTreeWidget, QTreeWidgetItem,
)
from PyQt5.QtCore import Qt, QTimer
from PyQt5.QtGui import QPixmap
class ImageCategoryTree(QTreeWidget):
"""Image category tree: organizes image files by category."""
# Image category definitions: (category name, keyword list, icon)
CATEGORIES = [
("模型评估", ["scatter", "regression", "validation", "r2", "rmse"], "📊"),
("光谱分析", ["spectrum", "spectral", "band", "wavelength"], "📈"),
("统计图表", ["boxplot", "histogram", "heatmap", "statistics", "stats"], "📉"),
("处理结果", ["mask", "glint", "deglint", "preview", "overlay", "water_mask"], "🖼️"),
("含量分布图", [], "📁"),
]
def __init__(self, parent=None):
super().__init__(parent)
self.setHeaderLabel("图像目录")
self.setMaximumWidth(300)
self.setMinimumWidth(250)
self.setup_categories()
self.setStyleSheet("""
QTreeWidget {
border: 1px solid #ddd;
border-radius: 5px;
background-color: #f8f9fa;
}
QTreeWidget::item {
padding: 5px;
border-radius: 3px;
}
QTreeWidget::item:selected {
background-color: #0078D4;
color: white;
}
QTreeWidget::item:hover {
background-color: #e3f2fd;
}
""")
def setup_categories(self):
"""初始化类别节点"""
self.category_items = {}
for category_name, keywords, icon in self.CATEGORIES:
item = QTreeWidgetItem(self)
item.setText(0, f"{icon} {category_name}")
item.setData(0, Qt.UserRole, {"type": "category", "keywords": keywords, "name": category_name})
item.setExpanded(True)
self.category_items[category_name] = item
def clear_all_images(self):
"""清除所有图像项"""
for category_item in self.category_items.values():
# 删除所有子项
while category_item.childCount() > 0:
category_item.removeChild(category_item.child(0))
def add_image(self, file_path: Path, display_name: str = None):
"""添加图像到对应的类别"""
if display_name is None:
display_name = file_path.stem
# 根据文件名关键词确定类别
category = self._determine_category(file_path.name)
category_item = self.category_items.get(category, self.category_items["含量分布图"])
# 创建图像项
image_item = QTreeWidgetItem(category_item)
image_item.setText(0, f" └─ {display_name}")
image_item.setData(0, Qt.UserRole, {"type": "image", "path": str(file_path)})
image_item.setToolTip(0, str(file_path))
return image_item
def _determine_category(self, filename: str) -> str:
"""根据文件名确定类别"""
filename_lower = filename.lower()
for category_name, keywords, _ in self.CATEGORIES:
if any(keyword in filename_lower for keyword in keywords):
return category_name
return "含量分布图"
def scan_directory(self, work_dir: str):
"""扫描目录中的所有图像文件"""
self.clear_all_images()
work_path = Path(work_dir)
if not work_path.exists():
return
# Find all image files: mainly under 14_visualization, plus step output directories (e.g. previews/overlays under 1_water_mask)
image_extensions = ['*.png', '*.jpg', '*.jpeg', '*.tif', '*.tiff', '*.bmp']
scan_roots: List[Path] = []
_viz = work_path / "14_visualization"
if _viz.is_dir():
scan_roots.append(_viz)
_wm = work_path / "1_water_mask"
if _wm.is_dir():
scan_roots.append(_wm)
if not scan_roots:
scan_roots.append(work_path)
seen_norm: set = set()
image_files: List[Path] = []
for root in scan_roots:
for ext in image_extensions:
for p in root.glob(f"**/{ext}"):
key = os.path.normcase(os.path.normpath(str(p.resolve())))
if key in seen_norm:
continue
seen_norm.add(key)
image_files.append(p)
# 添加图像到树
for img_file in sorted(image_files):
# Skip thumbnails and temporary files
if img_file.name.startswith('.') or 'thumb' in img_file.name.lower():
continue
self.add_image(img_file)
# Update each category label to show its image count
for category_name, item in self.category_items.items():
count = item.childCount()
if count > 0:
for cat_name, _, icon in self.CATEGORIES:
if cat_name == category_name:
item.setText(0, f"{icon} {category_name} ({count})")
break
def get_selected_image_path(self) -> Optional[str]:
"""获取当前选中的图像路径"""
selected_item = self.currentItem()
if not selected_item:
return None
data = selected_item.data(0, Qt.UserRole)
if data and data.get("type") == "image":
return data.get("path")
return None
class ImageViewerWidget(QWidget):
"""Image viewer widget with zoom and pan support."""
def __init__(self, parent=None):
super().__init__(parent)
self.current_image_path = None
self.scale_factor = 1.0
self._update_timer = QTimer() # 防抖定时器
self._update_timer.setSingleShot(True)
self._update_timer.timeout.connect(self._do_update_display)
self._pending_scale = None # 待更新的缩放比例
self.setup_ui()
def setup_ui(self):
layout = QVBoxLayout()
layout.setContentsMargins(0, 0, 0, 0)
# 工具栏
toolbar = QHBoxLayout()
self.refresh_btn = QPushButton("🔄 刷新目录")
self.refresh_btn.setToolTip("重新扫描工作目录中的图像文件")
toolbar.addWidget(self.refresh_btn)
# 添加分隔线
separator = QFrame()
separator.setFrameShape(QFrame.VLine)
separator.setFrameShadow(QFrame.Sunken)
toolbar.addWidget(separator)
self.zoom_in_btn = QPushButton("🔍+")
self.zoom_in_btn.setToolTip("放大")
self.zoom_in_btn.setMaximumWidth(50)
toolbar.addWidget(self.zoom_in_btn)
self.zoom_out_btn = QPushButton("🔍-")
self.zoom_out_btn.setToolTip("缩小")
self.zoom_out_btn.setMaximumWidth(50)
toolbar.addWidget(self.zoom_out_btn)
self.fit_btn = QPushButton("⬜ 适应窗口")
self.fit_btn.setToolTip("适应窗口大小")
toolbar.addWidget(self.fit_btn)
self.original_btn = QPushButton("1:1 原始大小")
self.original_btn.setToolTip("原始大小")
toolbar.addWidget(self.original_btn)
toolbar.addStretch()
self.save_btn = QPushButton("💾 保存")
self.save_btn.setToolTip("保存当前图像")
toolbar.addWidget(self.save_btn)
layout.addLayout(toolbar)
# 图像显示区域 - 使用 QLabel + QScrollArea
self.scroll_area = QScrollArea()
self.scroll_area.setWidgetResizable(True)
self.scroll_area.setStyleSheet("background-color: white;")
self.image_label = QLabel()
self.image_label.setAlignment(Qt.AlignCenter)
self.image_label.setStyleSheet("background-color: white;")
self.scroll_area.setWidget(self.image_label)
layout.addWidget(self.scroll_area, 1)
# 状态栏
status_layout = QHBoxLayout()
self.status_label = QLabel("就绪")
self.status_label.setStyleSheet("color: #666; font-size: 11px;")
status_layout.addWidget(self.status_label)
status_layout.addStretch()
layout.addLayout(status_layout)
self.setLayout(layout)
# 连接信号
self.zoom_in_btn.clicked.connect(self.zoom_in)
self.zoom_out_btn.clicked.connect(self.zoom_out)
self.fit_btn.clicked.connect(self.fit_to_window)
self.original_btn.clicked.connect(self.original_size)
self.save_btn.clicked.connect(self.save_image)
def load_image(self, image_path: str):
"""加载并显示图像"""
if not image_path or not Path(image_path).exists():
self.image_label.setText("图像不存在")
self.status_label.setText("图像加载失败")
return
self.current_image_path = image_path
self.scale_factor = 1.0
# 加载图像
pixmap = QPixmap(image_path)
if pixmap.isNull():
self.image_label.setText("无法加载图像")
self.status_label.setText("图像格式不支持")
return
self.original_pixmap = pixmap
# 默认适应窗口显示
self.fit_to_window()
# 更新状态
file_info = Path(image_path).stat()
size_mb = file_info.st_size / (1024 * 1024)
self.status_label.setText(f"{pixmap.width()}x{pixmap.height()} | {size_mb:.2f} MB | {Path(image_path).name} | 适应窗口")
def update_image_display(self):
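"""Update the displayed image; debounced to avoid stutter from frequent redraws."""
# Cancel any pending update and restart the timer
self._update_timer.stop()
self._pending_scale = self.scale_factor
self._update_timer.start(50) # run the actual update after 50 ms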
def _do_update_display(self):
"""实际执行图像更新"""
if not hasattr(self, 'original_pixmap') or self.original_pixmap.isNull():
return
if self._pending_scale is None:
return
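# Pick the transformation mode from the scale factor: use the fast mode for strong magnification or reduction to keep redraws responsive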
if self._pending_scale > 2.0 or self._pending_scale < 0.5:
transform = Qt.FastTransformation
else:
transform = Qt.SmoothTransformation
scaled_pixmap = self.original_pixmap.scaled(
int(self.original_pixmap.width() * self._pending_scale),
int(self.original_pixmap.height() * self._pending_scale),
Qt.KeepAspectRatio,
transform
)
self.image_label.setPixmap(scaled_pixmap)
self._pending_scale = None
def wheelEvent(self, event):
"""鼠标滚轮缩放 - 实时响应"""
delta = event.angleDelta().y()
if delta > 0:
# 向上滚动 - 放大
if self.scale_factor < 5.0:
self.scale_factor = min(self.scale_factor * 1.1, 5.0)
self.update_image_display()
else:
# 向下滚动 - 缩小
if self.scale_factor > 0.1:
self.scale_factor = max(self.scale_factor / 1.1, 0.1)
self.update_image_display()
event.accept()
def zoom_in(self):
"""放大"""
if self.scale_factor < 5.0:
self.scale_factor = min(self.scale_factor * 1.25, 5.0)
self.update_image_display()
def zoom_out(self):
"""缩小"""
if self.scale_factor > 0.1:
self.scale_factor = max(self.scale_factor / 1.25, 0.1)
self.update_image_display()
def fit_to_window(self):
"""适应窗口"""
if not hasattr(self, 'original_pixmap') or self.original_pixmap.isNull():
return
# 计算适应窗口的缩放比例
view_size = self.scroll_area.viewport().size()
img_size = self.original_pixmap.size()
scale_w = view_size.width() / img_size.width()
scale_h = view_size.height() / img_size.height()
# 记录适应前的比例(用于后续恢复参考)
self._fit_scale = min(scale_w, scale_h)
self.scale_factor = self._fit_scale
self.update_image_display()
self.status_label.setText(f"适应窗口 | 缩放: {self.scale_factor:.1%}")
def original_size(self):
"""原始大小"""
self.scale_factor = 1.0
self._fit_scale = None # 清除适应记录
self.update_image_display()
self.status_label.setText("原始大小 | 缩放: 100%")
def save_image(self):
"""保存图像"""
if not self.current_image_path:
return
file_path, _ = QFileDialog.getSaveFileName(
self, "保存图像", Path(self.current_image_path).name,
"PNG图片 (*.png);;JPG图片 (*.jpg);;所有文件 (*.*)"
)
if file_path:
try:
import shutil
shutil.copy(self.current_image_path, file_path)
except Exception as e:
QMessageBox.critical(self, "错误", f"保存失败: {e}")
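The zoom handlers and `fit_to_window` above reduce to two small calculations. The sketch below restates them as plain functions so they can be checked without Qt. The helper names `clamp_zoom` and `fit_scale` are hypothetical; the limits 0.1 and 5.0 and the `min(scale_w, scale_h)` rule are taken from the code above.

```python
MIN_SCALE = 0.1  # lower zoom bound used by zoom_out/wheelEvent above
MAX_SCALE = 5.0  # upper zoom bound used by zoom_in/wheelEvent above


def clamp_zoom(scale: float, step: float, zoom_in: bool) -> float:
    """Multiply (zoom in) or divide (zoom out) by step, then clamp to the bounds."""
    new_scale = scale * step if zoom_in else scale / step
    return max(MIN_SCALE, min(new_scale, MAX_SCALE))


def fit_scale(view_w: int, view_h: int, img_w: int, img_h: int) -> float:
    """Largest scale at which the whole image still fits inside the viewport."""
    if img_w <= 0 or img_h <= 0:
        raise ValueError("image dimensions must be positive")
    # The tighter of the two axis ratios decides, which preserves the aspect ratio.
    return min(view_w / img_w, view_h / img_h)
```

Taking the smaller ratio is what keeps a wide image from overflowing horizontally even when it would fit vertically at a larger scale.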


@ -0,0 +1,351 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Image browsing widgets.
Contains the ImageCategoryTree and ImageViewerWidget classes.
"""
import os
from pathlib import Path
from typing import List, Optional
from PyQt5.QtWidgets import (
QTreeWidget, QTreeWidgetItem, QWidget, QVBoxLayout, QHBoxLayout,
QPushButton, QLabel, QScrollArea, QFrame, QGroupBox,
QFileDialog, QMessageBox,
)
from PyQt5.QtCore import Qt, QTimer
from PyQt5.QtGui import QPixmap
class ImageCategoryTree(QTreeWidget):
"""Image category tree: organises image files by category."""
CATEGORIES = [
("模型评估", ["scatter", "regression", "validation", "r2", "rmse"], "📊"),
("光谱分析", ["spectrum", "spectral", "band", "wavelength"], "📈"),
("统计图表", ["boxplot", "histogram", "heatmap", "statistics", "stats"], "📉"),
("处理结果", ["mask", "glint", "deglint", "preview", "overlay", "water_mask"], "🖼️"),
("含量分布图", [], "📁"),
]
def __init__(self, parent=None):
super().__init__(parent)
self.setHeaderLabel("图像目录")
self.setMaximumWidth(300)
self.setMinimumWidth(250)
self.setup_categories()
self.setStyleSheet("""
QTreeWidget {
border: 1px solid #ddd;
border-radius: 5px;
background-color: #f8f9fa;
}
QTreeWidget::item {
padding: 5px;
border-radius: 3px;
}
QTreeWidget::item:selected {
background-color: #0078D4;
color: white;
}
QTreeWidget::item:hover {
background-color: #e3f2fd;
}
""")
def setup_categories(self):
"""Create the top-level category nodes."""
self.category_items = {}
for category_name, keywords, icon in self.CATEGORIES:
item = QTreeWidgetItem(self)
item.setText(0, f"{icon} {category_name}")
item.setData(0, Qt.UserRole, {"type": "category", "keywords": keywords, "name": category_name})
item.setExpanded(True)
self.category_items[category_name] = item
def clear_all_images(self):
"""Remove all image items from every category."""
for category_item in self.category_items.values():
while category_item.childCount() > 0:
category_item.removeChild(category_item.child(0))
def add_image(self, file_path: Path, display_name: str = None):
"""Add an image under its matching category."""
if display_name is None:
display_name = file_path.stem
category = self._determine_category(file_path.name)
category_item = self.category_items.get(category, self.category_items["含量分布图"])
image_item = QTreeWidgetItem(category_item)
image_item.setText(0, f" └─ {display_name}")
image_item.setData(0, Qt.UserRole, {"type": "image", "path": str(file_path)})
image_item.setToolTip(0, str(file_path))
return image_item
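    def _determine_category(self, filename: str) -> str:
        """Return the first category whose keyword occurs in the file name."""
        filename_lower = filename.lower()
        # CATEGORIES is checked in declaration order, so the first match wins.
        for category_name, keywords, _ in self.CATEGORIES:
            if any(keyword in filename_lower for keyword in keywords):
                return category_name
        # No keyword matched: fall back to the catch-all category.
        return "含量分布图"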
    def scan_directory(self, work_dir: str):
        """Scan the work directory for image files and rebuild the tree."""
        self.clear_all_images()
        work_path = Path(work_dir)
        if not work_path.exists():
            return
        image_extensions = ['*.png', '*.jpg', '*.jpeg', '*.tif', '*.tiff', '*.bmp']
        # Prefer the known output folders; fall back to the whole work directory.
        scan_roots: List[Path] = []
        _viz = work_path / "14_visualization"
        if _viz.is_dir():
            scan_roots.append(_viz)
        _wm = work_path / "1_water_mask"
        if _wm.is_dir():
            scan_roots.append(_wm)
        if not scan_roots:
            scan_roots.append(work_path)
        # De-duplicate by normalised, resolved path.
        seen_norm: set = set()
        image_files: List[Path] = []
        for root in scan_roots:
            for ext in image_extensions:
                for p in root.glob(f"**/{ext}"):
                    key = os.path.normcase(os.path.normpath(str(p.resolve())))
                    if key in seen_norm:
                        continue
                    seen_norm.add(key)
                    image_files.append(p)
        for img_file in sorted(image_files):
            # Skip hidden files and thumbnails.
            if img_file.name.startswith('.') or 'thumb' in img_file.name.lower():
                continue
            self.add_image(img_file)
        # Refresh every category label. A category that is now empty gets its
        # plain label back, so a count from an earlier scan does not linger.
        for cat_name, _, icon in self.CATEGORIES:
            item = self.category_items[cat_name]
            count = item.childCount()
            label = f"{icon} {cat_name} ({count})" if count > 0 else f"{icon} {cat_name}"
            item.setText(0, label)
def get_selected_image_path(self) -> Optional[str]:
"""Return the path of the currently selected image, or None."""
selected_item = self.currentItem()
if not selected_item:
return None
data = selected_item.data(0, Qt.UserRole)
if data and data.get("type") == "image":
return data.get("path")
return None
class ImageViewerWidget(QWidget):
"""Image viewer widget with zoom and pan support."""
def __init__(self, parent=None):
super().__init__(parent)
self.current_image_path = None
self.scale_factor = 1.0
self._update_timer = QTimer()
self._update_timer.setSingleShot(True)
self._update_timer.timeout.connect(self._do_update_display)
self._pending_scale = None
self.setup_ui()
def setup_ui(self):
layout = QVBoxLayout()
layout.setContentsMargins(0, 0, 0, 0)
toolbar = QHBoxLayout()
self.refresh_btn = QPushButton("🔄 刷新目录")
self.refresh_btn.setToolTip("重新扫描工作目录中的图像文件")
toolbar.addWidget(self.refresh_btn)
separator = QFrame()
separator.setFrameShape(QFrame.VLine)
separator.setFrameShadow(QFrame.Sunken)
toolbar.addWidget(separator)
self.zoom_in_btn = QPushButton("🔍+")
self.zoom_in_btn.setToolTip("放大")
self.zoom_in_btn.setMaximumWidth(50)
toolbar.addWidget(self.zoom_in_btn)
self.zoom_out_btn = QPushButton("🔍-")
self.zoom_out_btn.setToolTip("缩小")
self.zoom_out_btn.setMaximumWidth(50)
toolbar.addWidget(self.zoom_out_btn)
self.fit_btn = QPushButton("⬜ 适应窗口")
self.fit_btn.setToolTip("适应窗口大小")
toolbar.addWidget(self.fit_btn)
self.original_btn = QPushButton("1:1 原始大小")
self.original_btn.setToolTip("原始大小")
toolbar.addWidget(self.original_btn)
toolbar.addStretch()
self.save_btn = QPushButton("💾 保存")
self.save_btn.setToolTip("保存当前图像")
toolbar.addWidget(self.save_btn)
layout.addLayout(toolbar)
self.scroll_area = QScrollArea()
self.scroll_area.setWidgetResizable(True)
self.scroll_area.setStyleSheet("background-color: white;")
self.image_label = QLabel()
self.image_label.setAlignment(Qt.AlignCenter)
self.image_label.setStyleSheet("background-color: white;")
self.scroll_area.setWidget(self.image_label)
layout.addWidget(self.scroll_area, 1)
status_layout = QHBoxLayout()
self.status_label = QLabel("就绪")
self.status_label.setStyleSheet("color: #666; font-size: 11px;")
status_layout.addWidget(self.status_label)
status_layout.addStretch()
layout.addLayout(status_layout)
self.setLayout(layout)
self.zoom_in_btn.clicked.connect(self.zoom_in)
self.zoom_out_btn.clicked.connect(self.zoom_out)
self.fit_btn.clicked.connect(self.fit_to_window)
self.original_btn.clicked.connect(self.original_size)
self.save_btn.clicked.connect(self.save_image)
def load_image(self, image_path: str):
"""Load and display an image."""
if not image_path or not Path(image_path).exists():
self.image_label.setText("图像不存在")
self.status_label.setText("图像加载失败")
return
self.current_image_path = image_path
self.scale_factor = 1.0
pixmap = QPixmap(image_path)
if pixmap.isNull():
self.image_label.setText("无法加载图像")
self.status_label.setText("图像格式不支持")
return
self.original_pixmap = pixmap
self.fit_to_window()
file_info = Path(image_path).stat()
size_mb = file_info.st_size / (1024 * 1024)
self.status_label.setText(f"{pixmap.width()}x{pixmap.height()} | {size_mb:.2f} MB | {Path(image_path).name} | 适应窗口")
    def update_image_display(self):
        """Update the displayed image; debounced to avoid stutter from frequent redraws."""
        # Cancel any pending update and restart the timer.
        self._update_timer.stop()
        self._pending_scale = self.scale_factor
        self._update_timer.start(50)  # run the actual update after 50 ms

    def _do_update_display(self):
        """Perform the actual image update."""
        if not hasattr(self, 'original_pixmap') or self.original_pixmap.isNull():
            return
        if self._pending_scale is None:
            return
        # Use the fast transformation for strong magnification or reduction,
        # and the smooth one for scales close to 1:1.
        if self._pending_scale > 2.0 or self._pending_scale < 0.5:
            transform = Qt.FastTransformation
        else:
            transform = Qt.SmoothTransformation
        scaled_pixmap = self.original_pixmap.scaled(
            int(self.original_pixmap.width() * self._pending_scale),
            int(self.original_pixmap.height() * self._pending_scale),
            Qt.KeepAspectRatio,
            transform
        )
        self.image_label.setPixmap(scaled_pixmap)
        self._pending_scale = None
def wheelEvent(self, event):
"""Zoom with the mouse wheel (the redraw itself is debounced by update_image_display)."""
delta = event.angleDelta().y()
if delta > 0:
if self.scale_factor < 5.0:
self.scale_factor = min(self.scale_factor * 1.1, 5.0)
self.update_image_display()
else:
if self.scale_factor > 0.1:
self.scale_factor = max(self.scale_factor / 1.1, 0.1)
self.update_image_display()
event.accept()
def zoom_in(self):
"""放大"""
if self.scale_factor < 5.0:
self.scale_factor = min(self.scale_factor * 1.25, 5.0)
self.update_image_display()
def zoom_out(self):
"""缩小"""
if self.scale_factor > 0.1:
self.scale_factor = max(self.scale_factor / 1.25, 0.1)
self.update_image_display()
    def fit_to_window(self):
        """Scale the image so that it fits the viewport."""
        if not hasattr(self, 'original_pixmap') or self.original_pixmap.isNull():
            return
        # Compute the scale that fits the image inside the viewport.
        view_size = self.scroll_area.viewport().size()
        img_size = self.original_pixmap.size()
        scale_w = view_size.width() / img_size.width()
        scale_h = view_size.height() / img_size.height()
        # Remember the fit scale for later reference.
        self._fit_scale = min(scale_w, scale_h)
        self.scale_factor = self._fit_scale
        self.update_image_display()
        self.status_label.setText(f"适应窗口 | 缩放: {self.scale_factor:.1%}")
def original_size(self):
"""原始大小"""
self.scale_factor = 1.0
self._fit_scale = None
self.update_image_display()
self.status_label.setText("原始大小 | 缩放: 100%")
    def save_image(self):
        """Save a copy of the current image to a user-selected path."""
        if not self.current_image_path:
            return
        file_path, _ = QFileDialog.getSaveFileName(
            self, "保存图像", Path(self.current_image_path).name,
            "PNG图片 (*.png);;JPG图片 (*.jpg);;所有文件 (*.*)"
        )
        if file_path:
            try:
                import shutil
                # The file is copied byte for byte: choosing a different
                # extension in the dialog does not convert the image format.
                shutil.copy(self.current_image_path, file_path)
            except Exception as e:
                QMessageBox.critical(self, "错误", f"保存失败: {e}")
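The keyword matching in `_determine_category` is order-sensitive: a file name that contains keywords from several categories lands in whichever category is declared first. The following Qt-free sketch reproduces that rule with the keyword table from `CATEGORIES` (icons omitted); the function name `determine_category` and the sample file names are illustrative assumptions.

```python
from typing import List, Tuple

# Same keyword table as ImageCategoryTree.CATEGORIES, without the icons.
CATEGORIES: List[Tuple[str, List[str]]] = [
    ("模型评估", ["scatter", "regression", "validation", "r2", "rmse"]),
    ("光谱分析", ["spectrum", "spectral", "band", "wavelength"]),
    ("统计图表", ["boxplot", "histogram", "heatmap", "statistics", "stats"]),
    ("处理结果", ["mask", "glint", "deglint", "preview", "overlay", "water_mask"]),
]
FALLBACK = "含量分布图"  # catch-all category with an empty keyword list


def determine_category(filename: str) -> str:
    """Return the first category with a keyword that is a substring of the name."""
    name = filename.lower()
    for category, keywords in CATEGORIES:
        if any(keyword in name for keyword in keywords):
            return category
    return FALLBACK
```

For example, `water_mask_histogram.png` contains both `histogram` and `mask`, and is filed under the statistics category because that entry comes earlier in the table.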


@ -0,0 +1,161 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
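"""
Configuration manager.
Takes over all configuration read/write logic from the main window:
- new_config() clear the configuration of every loaded panel
- load_config(file_path) load configuration from a JSON file and back-fill the panels
- save_config(file_path) save the current configuration to a JSON file
- get_current_config() collect configuration by iterating over PanelFactory (lazy-load safe)
Lazy-loading rules:
- get_current_config() only visits panels that are already loaded; panels that are not loaded have no entry in the result
- Never wake up or render every panel just to read its configuration
- Callers that need the full configuration (e.g. for saving) should call panel_factory.preload_all() first
"""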
import json
import os
from typing import Dict, Optional
from PyQt5.QtCore import QObject
from PyQt5.QtWidgets import QMessageBox, QFileDialog
from src.gui.core.event_bus import global_event_bus
class ConfigManager(QObject):
"""Configuration manager.
Usage::
cfg_mgr = ConfigManager(panel_factory, parent=self)
cfg_mgr.new_config() # clear the configuration
cfg_mgr.load_config(path) # load from JSON
cfg_mgr.save_config(path) # save to JSON
config = cfg_mgr.get_current_config() # collect the current configuration
"""
def __init__(self, panel_factory, parent=None):
"""
Args:
panel_factory: PanelFactory instance
parent: parent QObject, used to position dialogs
"""
super().__init__(parent)
self._panel_factory = panel_factory
# ═══════════════════════════════════════════════════════════
# Public API
# ═══════════════════════════════════════════════════════════
def new_config(self):
"""Clear the configuration of every loaded panel (after user confirmation)."""
reply = QMessageBox.question(
self.parent(), "新建配置", "是否清空当前配置?",
QMessageBox.Yes | QMessageBox.No
)
if reply != QMessageBox.Yes:
return
for panel in self._panel_factory.get_loaded_panels().values():
if hasattr(panel, 'clear_config'):
panel.clear_config()
global_event_bus.publish('LogMessage', {
'message': '已清空配置',
'level': 'info',
})
def load_config(self, file_path: str = None):
"""Load configuration from a JSON file and back-fill the loaded panels.
Args:
file_path: path to the JSON file. If None, a file-open dialog is shown.
"""
if file_path is None:
file_path, _ = QFileDialog.getOpenFileName(
self.parent(), "加载配置", "",
"JSON Files (*.json);;All Files (*.*)"
)
if not file_path:
return
try:
with open(file_path, 'r', encoding='utf-8') as f:
config = json.load(f)
except Exception as e:
QMessageBox.critical(
self.parent(), "加载失败",
f"无法读取配置文件:\n{file_path}\n\n错误: {e}"
)
return
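        # Back-fill the panels that are already loaded. Panels created later
        # are not updated by this call.
        loaded_count = 0
        for step_id, panel in self._panel_factory.get_loaded_panels().items():
            if step_id in config and hasattr(panel, 'set_config'):
                try:
                    panel.set_config(config[step_id])
                    loaded_count += 1
                except Exception as e:
                    # Report the failure instead of swallowing it silently.
                    global_event_bus.publish('LogMessage', {
                        'message': f'Failed to apply configuration to panel {step_id}: {e}',
                        'level': 'warning',
                    })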
global_event_bus.publish('LogMessage', {
'message': f'已加载配置: {file_path}(回填 {loaded_count} 个面板)',
'level': 'info',
})
def save_config(self, file_path: str = None):
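"""Save the current configuration to a JSON file.
Note: all panels are force-loaded (preload_all) before saving so that the configuration is complete.
Args:
file_path: target JSON file path. If None, a save dialog is shown.
"""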
if file_path is None:
file_path, _ = QFileDialog.getSaveFileName(
self.parent(), "保存配置", "config.json",
"JSON Files (*.json);;All Files (*.*)"
)
if not file_path:
return
# Force-load all panels before saving so that the configuration is complete
self._panel_factory.preload_all()
config = self.get_current_config()
try:
with open(file_path, 'w', encoding='utf-8') as f:
json.dump(config, f, indent=2, ensure_ascii=False)
except Exception as e:
QMessageBox.critical(
self.parent(), "保存失败",
f"无法保存配置文件:\n{file_path}\n\n错误: {e}"
)
return
global_event_bus.publish('LogMessage', {
'message': f'已保存配置: {file_path}',
'level': 'info',
})
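    def get_current_config(self) -> Dict[str, dict]:
        """Collect the configuration of every loaded step panel.

        Lazy-load safe: only panels that are already loaded are visited, and
        panels that are not loaded have no entry in the result. This method
        never wakes up or renders a panel just to read its configuration.

        Returns:
            {step_id: panel_config_dict}
        """
        config = {}
        for step_id, panel in self._panel_factory.get_loaded_panels().items():
            if hasattr(panel, 'get_config'):
                try:
                    config[step_id] = panel.get_config()
                except Exception:
                    # A panel whose get_config() fails is recorded as an empty dict.
                    config[step_id] = {}
        return config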
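One consequence of the lazy-loading rules is that `load_config` only reaches panels that already exist, so entries for panels created later are not applied by that call. The sketch below shows one possible way to defer them until the panel is first loaded. It is not part of the repository: `FakePanel`, `LazyConfigApplier` and `on_panel_loaded` are hypothetical names, and only `set_config`/`get_config` mirror the panel interface used above.

```python
from typing import Dict


class FakePanel:
    """Stand-in for a step panel; only set_config/get_config are modelled."""

    def __init__(self) -> None:
        self._config: dict = {}

    def set_config(self, config: dict) -> None:
        self._config = dict(config)

    def get_config(self) -> dict:
        return dict(self._config)


class LazyConfigApplier:
    """Hypothetical helper: apply config now if loaded, otherwise on first load."""

    def __init__(self) -> None:
        self.loaded: Dict[str, FakePanel] = {}
        self._pending: Dict[str, dict] = {}

    def load_config(self, config: Dict[str, dict]) -> int:
        """Apply entries to loaded panels; park the rest. Returns the applied count."""
        applied = 0
        for step_id, step_cfg in config.items():
            panel = self.loaded.get(step_id)
            if panel is not None:
                panel.set_config(step_cfg)
                applied += 1
            else:
                self._pending[step_id] = step_cfg
        return applied

    def on_panel_loaded(self, step_id: str, panel: FakePanel) -> None:
        """Call when a panel is instantiated; flushes any parked config for it."""
        self.loaded[step_id] = panel
        pending = self._pending.pop(step_id, None)
        if pending is not None:
            panel.set_config(pending)
```

In the real application a hook like `on_panel_loaded` would have to be invoked from wherever panels are instantiated; where exactly that belongs is a design decision this sketch does not settle.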


@ -0,0 +1,64 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
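"""
Dependency subscription mixin module.
Provides subscribe_panel_to_dependencies(), which lets a step panel subscribe to
OutputUpdated events on global_event_bus according to the dependencies declared in
PANEL_REGISTRY. When an upstream step produces its output, the panel fills the path
into the matching FileSelectWidget automatically, without the main window relaying it.
"""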
from src.gui.core.event_bus import global_event_bus
def subscribe_panel_to_dependencies(panel, step_id, dependencies):
    """Subscribe a panel to the output events of its upstream steps.

    When global_event_bus publishes an OutputUpdated event whose step_id and
    output_type match a declared dependency, the path is filled into the
    corresponding FileSelectWidget on the panel.

    Args:
        panel: the step panel instance (a QWidget subclass)
        step_id: step_id of this panel (intended for logging only; not used for matching)
        dependencies: dict, {input_field: (dep_step, output_type, panel_attr)}
    """
    if not dependencies:
        return
    for _input_field, (dep_step, output_type, panel_attr) in dependencies.items():
        _make_subscription(panel, dep_step, output_type, panel_attr)
def _make_subscription(panel, dep_step, output_type, panel_attr):
    """Create the subscription for one dependency.

    A factory function is used so that each callback captures its own values
    and is not affected by late binding of closure variables.
    """
    def callback(data):
        # Ignore events from other steps or of other output types.
        if data.get('step_id') != dep_step:
            return
        if data.get('output_type') != output_type:
            return
        widget = getattr(panel, panel_attr, None)
        if widget is None:
            return
        # Never overwrite a value that is already set.
        current = ''
        if hasattr(widget, 'get_path'):
            current = widget.get_path().strip()
        elif hasattr(widget, 'text'):
            current = widget.text().strip()
        if current:
            return
        path = data.get('path', '')
        if not path:
            return
        if hasattr(widget, 'set_path'):
            widget.set_path(path)
        elif hasattr(widget, 'setText'):
            widget.setText(path)
    global_event_bus.subscribe('OutputUpdated', callback)
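The docstring of `_make_subscription` mentions late binding of closure variables. As a minimal illustration of why a factory function is used, compare closures created directly in a loop with closures created by a factory. The function names below are illustrative only and do not appear in the repository.

```python
from typing import Callable, List


def make_callbacks_buggy(names: List[str]) -> List[Callable[[], str]]:
    """Closures created in the loop all read the same loop variable."""
    callbacks = []
    for name in names:
        callbacks.append(lambda: name)  # 'name' is looked up when called
    return callbacks


def make_callback(name: str) -> Callable[[], str]:
    """Factory: each call creates a new scope, so 'name' is fixed per callback."""
    def callback() -> str:
        return name
    return callback


def make_callbacks_fixed(names: List[str]) -> List[Callable[[], str]]:
    return [make_callback(name) for name in names]
```

With the loop version every callback reports the last name, whereas the factory version keeps one name per callback. This is the same reason each dependency gets its own `callback` via `_make_subscription`.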


@ -0,0 +1,67 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
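"""
Dialog service.
Wraps purely presentational dialogs, moving UI dialog logic out of the main window.
Responsibilities:
- show_pipeline_status() show the load status of the Pipeline module
- show_about() show the "About" dialog
- show_ai_settings() show the AI engine settings dialog
"""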
from PyQt5.QtCore import QObject
from PyQt5.QtWidgets import QMessageBox
class DialogService(QObject):
"""Dialog service.
Usage::
dlg_svc = DialogService(parent=self)
dlg_svc.show_about()
dlg_svc.show_pipeline_status()
dlg_svc.show_ai_settings()
"""
def __init__(self, parent=None):
super().__init__(parent)
# ═══════════════════════════════════════════════════════════
# Public API
# ═══════════════════════════════════════════════════════════
def show_pipeline_status(self):
"""Show the load status of the Pipeline module."""
from src.gui.core.worker_thread import PIPELINE_AVAILABLE, PIPELINE_ERROR_INFO
if PIPELINE_AVAILABLE:
QMessageBox.information(
self.parent(), "Pipeline状态",
"Pipeline模块: 正常加载"
)
else:
detail = "\n".join(PIPELINE_ERROR_INFO)
QMessageBox.warning(
self.parent(), "Pipeline状态",
f"Pipeline模块: 加载失败\n\n{detail}"
)
def show_about(self):
"""Show the "About" dialog."""
QMessageBox.about(
self.parent(), "关于",
"MegaCube-Water Quality V1.2\n\n"
"一个完整的水质参数反演工作流程工具\n\n"
"公司:北京依锐思遥感技术有限公司\n"
"地址:北京市海淀区清河安宁庄东路18号5号楼二层205\n"
"电话:010-51292601\n"
"邮箱:hanshanlong@iris-rs.cn"
)
def show_ai_settings(self):
"""Show the AI engine settings dialog."""
from src.gui.dialogs import AISettingsDialog
AISettingsDialog(self.parent()).exec_()

34
src/gui/core/event_bus.py Normal file

@ -0,0 +1,34 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Lightweight event bus.
Supports subscribe(event_name, callback) and publish(event_name, data),
used for decentralised parameter propagation between step panels.
"""
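import logging
from collections import defaultdict
from typing import Any, Callable, Dict, List

logger = logging.getLogger(__name__)


class EventBus:
    """Publish-subscribe event bus."""

    def __init__(self):
        self._subscribers: Dict[str, List[Callable]] = defaultdict(list)

    def subscribe(self, event_name: str, callback: Callable[[dict], None]):
        """Subscribe to an event. The callback receives the event data as a dict."""
        self._subscribers[event_name].append(callback)

    def publish(self, event_name: str, data: Dict[str, Any]):
        """Publish an event and notify every subscriber."""
        # Iterate over a copy so that a callback may subscribe during dispatch.
        for callback in list(self._subscribers.get(event_name, [])):
            try:
                callback(data)
            except Exception:
                # Keep notifying the remaining subscribers, but record the
                # failure instead of discarding it silently.
                logger.exception("Subscriber failed for event %r", event_name)


# Global singleton
global_event_bus = EventBus()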
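A useful property of this bus is that a failing subscriber does not stop delivery to the subscribers registered after it. The stand-in below demonstrates that behaviour without importing the application. `MiniBus`, its `errors` list and its `unsubscribe` method are assumptions added for illustration, while `subscribe` and `publish` follow the shape of `EventBus` above.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List


class MiniBus:
    """Illustrative publish-subscribe bus that records subscriber failures."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)
        self.errors: List[Exception] = []

    def subscribe(self, event_name: str, callback: Callable[[dict], None]) -> None:
        self._subscribers[event_name].append(callback)

    def unsubscribe(self, event_name: str, callback: Callable[[dict], None]) -> bool:
        """Remove a callback; returns False if it was not subscribed."""
        try:
            self._subscribers[event_name].remove(callback)
            return True
        except ValueError:
            return False

    def publish(self, event_name: str, data: Dict[str, Any]) -> int:
        """Notify all subscribers and return how many ran without raising."""
        delivered = 0
        # Copy the list so callbacks may subscribe or unsubscribe during dispatch.
        for callback in list(self._subscribers.get(event_name, [])):
            try:
                callback(data)
                delivered += 1
            except Exception as exc:
                self.errors.append(exc)
        return delivered
```

Recording the exception, or logging it, keeps the isolation between subscribers while still leaving a trace when one of them fails.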

187
src/gui/core/log_manager.py Normal file

@ -0,0 +1,187 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
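"""
Log and progress manager.
Fully encapsulates the UI creation and control logic for the main window's log view (QTextEdit),
progress bar (QProgressBar) and "Clear log" button.
Responsibilities:
- create_log_panel() → returns an assembled QWidget (log view + progress bar)
- Subscribes internally to the LogMessage / ProgressUpdate events and updates the UI automatically
- The main window no longer has to synchronise log or progress state
Subscribed events:
LogMessage → {message, level} appended to the log view
ProgressUpdate → {percentage, message} updates the progress bar
"""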
from datetime import datetime
from PyQt5.QtCore import QObject
from PyQt5.QtGui import QTextCursor
from PyQt5.QtWidgets import (
QWidget, QVBoxLayout, QHBoxLayout,
QGroupBox, QTextEdit, QProgressBar, QPushButton,
)
from src.gui.core.event_bus import global_event_bus
class LogManager(QObject):
"""Log and progress manager.
Usage::
log_mgr = LogManager(parent=self)
log_panel = log_mgr.create_log_panel()
layout.addWidget(log_panel)
# From here on, log and progress updates are driven by the EventBus; no manual calls are needed
"""
def __init__(self, parent=None):
super().__init__(parent)
self._log_text: QTextEdit = None
self._progress_bar: QProgressBar = None
# Subscribe to events
global_event_bus.subscribe('LogMessage', self._on_log_message)
global_event_bus.subscribe('ProgressUpdate', self._on_progress_update)
# ═══════════════════════════════════════════════════════════
# Public API
# ═══════════════════════════════════════════════════════════
def create_log_panel(self) -> QWidget:
"""Create and return the assembled log + progress widget.
Returns:
QWidget: a vertically laid-out container holding the log view (QGroupBox) and the progress bar (QGroupBox)
"""
from src.gui.styles import ModernStylesheet
container = QWidget()
layout = QVBoxLayout()
layout.setContentsMargins(0, 0, 0, 0)
layout.setSpacing(10)
# ── Log view ──
log_group = QGroupBox("执行日志")
log_group.setStyleSheet(f"""
QGroupBox {{
background-color: {ModernStylesheet.COLORS['panel_bg']};
border: 1px solid {ModernStylesheet.COLORS['border_light']};
border-radius: 5px; margin-top: 8px; padding-top: 15px;
padding-left: 9px; padding-right: 9px; padding-bottom: 9px;
}}
QGroupBox::title {{
subcontrol-origin: margin; subcontrol-position: top left;
padding: 0 5px; font-weight: bold;
color: {ModernStylesheet.COLORS['text_primary']};
}}
""")
log_layout = QVBoxLayout()
log_layout.setContentsMargins(5, 5, 5, 5)
self._log_text = QTextEdit()
self._log_text.setReadOnly(True)
self._log_text.setMaximumHeight(200)
self._log_text.setStyleSheet(f"""
QTextEdit {{
background-color: {ModernStylesheet.COLORS['panel_bg']};
color: {ModernStylesheet.COLORS['text_primary']};
border: 1px solid {ModernStylesheet.COLORS['border']};
border-radius: 4px; padding: 5px;
font-family: 'Courier New', monospace; font-size: 10px;
}}
""")
log_layout.addWidget(self._log_text)
clear_btn = QPushButton("清空日志")
clear_btn.setMaximumWidth(100)
clear_btn.setStyleSheet(ModernStylesheet.get_button_stylesheet('normal'))
clear_btn.clicked.connect(self.clear_log)
btn_row = QHBoxLayout()
btn_row.addWidget(clear_btn)
btn_row.addStretch()
log_layout.addLayout(btn_row)
log_group.setLayout(log_layout)
layout.addWidget(log_group, 1)
# ── Progress bar ──
progress_group = QGroupBox("执行进度")
progress_group.setStyleSheet(f"""
QGroupBox {{
background-color: {ModernStylesheet.COLORS['panel_bg']};
border: 1px solid {ModernStylesheet.COLORS['border_light']};
border-radius: 5px; margin-top: 8px; padding-top: 10px;
padding-left: 9px; padding-right: 9px; padding-bottom: 9px;
}}
QGroupBox::title {{
subcontrol-origin: margin; subcontrol-position: top left;
padding: 0 5px; font-weight: bold;
color: {ModernStylesheet.COLORS['text_primary']};
}}
""")
progress_layout = QVBoxLayout()
progress_layout.setContentsMargins(5, 5, 5, 5)
self._progress_bar = QProgressBar()
self._progress_bar.setValue(0)
self._progress_bar.setStyleSheet(f"""
QProgressBar {{
background-color: {ModernStylesheet.COLORS['panel_bg']};
border: 1px solid {ModernStylesheet.COLORS['border']};
border-radius: 4px; padding: 2px; text-align: center; height: 20px;
}}
QProgressBar::chunk {{
background-color: {ModernStylesheet.COLORS['success']}; border-radius: 3px;
}}
""")
progress_layout.addWidget(self._progress_bar)
progress_group.setLayout(progress_layout)
layout.addWidget(progress_group, 0)
container.setLayout(layout)
return container
def clear_log(self):
"""Clear the log view."""
if self._log_text is not None:
self._log_text.clear()
@property
def progress_bar(self) -> QProgressBar:
return self._progress_bar
@property
def log_text(self) -> QTextEdit:
return self._log_text
# ═══════════════════════════════════════════════════════════
# EventBus subscription callbacks
# ═══════════════════════════════════════════════════════════
    def _on_log_message(self, data: dict):
        """LogMessage callback: append the message to the log view."""
        if self._log_text is None:
            return
        message = data.get('message', '')
        level = data.get('level', 'info')
        timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
        # Errors are shown in red, warnings in orange, everything else in black.
        color_map = {'error': 'red', 'warning': 'orange'}
        color = color_map.get(level, 'black')
        formatted = f'<span style="color: {color};">[{timestamp}] {message}</span>'
        self._log_text.append(formatted)
        # Keep the view scrolled to the newest entry.
        cursor = self._log_text.textCursor()
        cursor.movePosition(QTextCursor.End)
        self._log_text.setTextCursor(cursor)

    def _on_progress_update(self, data: dict):
        """ProgressUpdate callback: update the progress bar."""
        if self._progress_bar is None:
            return
        percentage = data.get('percentage', 0)
        # QProgressBar.setValue expects an int; publishers may send a float.
        self._progress_bar.setValue(int(percentage))
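`_on_log_message` builds an HTML fragment from the raw message, so a message containing characters such as `<` or `&` could be interpreted as markup by the rich-text view. The sketch below shows one way to format the line with escaping, using only the standard library. `format_log_html` and its `now` parameter are hypothetical; the colour map and timestamp format are copied from the callback above.

```python
from datetime import datetime
from html import escape
from typing import Optional

COLOR_MAP = {"error": "red", "warning": "orange"}  # other levels fall back to black


def format_log_html(message: str, level: str = "info",
                    now: Optional[datetime] = None) -> str:
    """Return one log line as an HTML span with the message text escaped."""
    timestamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    color = COLOR_MAP.get(level, "black")
    # escape() turns &, < and > (and quotes) into entities so they show literally.
    return f'<span style="color: {color};">[{timestamp}] {escape(message)}</span>'
```

Whether escaping is desirable depends on whether any publisher deliberately sends HTML in its messages, which the source does not say.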


@ -0,0 +1,294 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
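"""
Panel registration and loading factory.
Lazily loads step panels on demand, replacing the old approach in create_content_area of instantiating all 14 panels up front.
The main window only holds a PanelFactory instance and calls factory.create_tab_widget() to obtain
a QTabWidget populated with placeholder pages; each panel is instantiated the first time the user switches to its tab.
Features:
- Lazy loading: a panel instance is created only when its tab is first activated
- Neighbour preloading: switching tabs also preloads the adjacent tabs (the preload window size is configurable)
- Registry driven: relies entirely on PANEL_REGISTRY, with no hard-coded panels
- Automatic event-bus wiring: subscribe_panel_to_dependencies is called after a panel is created
- Placeholder pages: an unloaded tab shows a blank QWidget, which is replaced in place by QScrollArea(panel) once loaded
"""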
from PyQt5.QtWidgets import QWidget, QTabWidget, QScrollArea
from PyQt5.QtCore import Qt
from src.gui.core.panel_registry import PANEL_REGISTRY
from src.gui.core.dependency_subscriber import subscribe_panel_to_dependencies
from src.gui.core.event_bus import global_event_bus
class PanelFactory:
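"""Panel registration and loading factory.
Usage::
factory = PanelFactory(registry=PANEL_REGISTRY, main_window=self)
tab_widget = factory.create_tab_widget(icons_dir="data/icons")
# tab_widget already contains every placeholder page and can be added to the main window layout directly
# Afterwards, use factory.get_panel(step_id) to obtain panel instances on demand
"""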
    def __init__(self, registry, main_window, preload_window=1):
        """
        Args:
            registry: the PANEL_REGISTRY list
            main_window: WaterQualityGUI instance (injected where a panel needs main_window)
            preload_window: size of the neighbour preload window. 0 = load only the
                current tab; 1 = current tab plus one neighbour on each side;
                -1 = preload everything (falls back to the old behaviour)
        """
        self._registry = registry
        self._main_window = main_window
        self._preload_window = preload_window
        # step_id → panel instance (loaded panels only)
        self._panels = {}
        # tab indices that have been loaded
        self._loaded = set()
        # tab_index → placeholder QWidget (replaced once the panel is loaded)
        self._placeholders = {}
        # QTabWidget exposed to callers
        self._tab_widget = None
# ── Public API ──────────────────────────────────────────────
def create_tab_widget(self, icons_dir="data/icons"):
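"""Create and return a QTabWidget populated with placeholder pages.
Each tab starts as a blank QWidget placeholder; its panel is lazily loaded on first activation.
The currentChanged signal is connected to drive lazy loading and neighbour preloading.
Args:
icons_dir: icon directory name (relative to the project root), passed to get_resource_path
Returns:
QTabWidget: the tab widget with all placeholder tabs added
"""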
from src.gui.water_quality_gui import get_resource_path
from PyQt5.QtGui import QIcon
self._tab_widget = QTabWidget()
self._tab_widget.setTabPosition(QTabWidget.North)
self._tab_widget.setTabsClosable(False)
for idx, entry in enumerate(self._registry):
step_id = entry['step_id']
title = entry['title']
icon_name = entry['icon']
# Create the placeholder page
placeholder = QWidget()
self._placeholders[idx] = placeholder
icon_path = get_resource_path(f"{icons_dir}/{icon_name}")
self._tab_widget.addTab(placeholder, QIcon(icon_path), title)
# Connect the tab-change signal → lazy loading
self._tab_widget.currentChanged.connect(self._on_tab_changed)
# Load the first tab immediately
if self._registry:
self._ensure_loaded(0)
return self._tab_widget
def get_panel(self, step_id):
"""Return the panel instance, lazily loading it if necessary.
Args:
step_id: step ID, e.g. 'step1' or 'step5_clean'
Returns:
QWidget or None: the panel instance, or None if the step_id is unknown
"""
tab_index = self._step_id_to_tab_index(step_id)
if tab_index < 0:
return None
self._ensure_loaded(tab_index)
return self._panels.get(step_id)
def get_loaded_panels(self):
"""Return a dict of all loaded panels, {step_id: panel}."""
return dict(self._panels)
def preload_all(self):
"""Force-load every panel (for cases such as saving configuration that must visit all panels)."""
for idx in range(len(self._registry)):
self._ensure_loaded(idx)
def get_tab_widget(self):
"""Return the internal QTabWidget."""
return self._tab_widget
# ── Internal methods ──────────────────────────────────────────────
def _on_tab_changed(self, index):
"""On tab change: load the current tab, then preload its neighbours."""
if index < 0:
return
self._ensure_loaded(index)
self._preload_neighbors(index)
    def _ensure_loaded(self, tab_index):
        """Make sure the given tab is loaded; otherwise instantiate its panel and replace the placeholder."""
        if tab_index in self._loaded:
            return
        if tab_index < 0 or tab_index >= len(self._registry):
            return
        entry = self._registry[tab_index]
        step_id = entry['step_id']
        cls = entry['class_ref']
        title = entry['title']
        kwargs = entry.get('constructor_kwargs')
        deps = entry.get('dependencies')
        # Resolve constructor arguments (only main_window is injected).
        resolved_kwargs = {}
        if kwargs:
            for k in kwargs:
                if k == 'main_window':
                    resolved_kwargs[k] = self._main_window
        # Instantiate the panel.
        panel = cls(**resolved_kwargs)
        # Wrap it in a QScrollArea.
        scroll = QScrollArea()
        scroll.setWidget(panel)
        scroll.setWidgetResizable(True)
        # Replace the placeholder. blockSignals suppresses the currentChanged
        # signals emitted by removeTab/insertTab/setCurrentIndex, which would
        # otherwise re-enter _on_tab_changed → _ensure_loaded recursively.
        placeholder = self._placeholders.get(tab_index)
        if placeholder is not None and self._tab_widget is not None:
            tab_title = self._tab_widget.tabText(tab_index)
            tab_icon = self._tab_widget.tabIcon(tab_index)
            # Remember the visible tab so that loading a neighbour (or a panel
            # requested through get_panel) does not switch the view to it.
            previous_index = self._tab_widget.currentIndex()
            self._tab_widget.blockSignals(True)
            try:
                self._tab_widget.removeTab(tab_index)
                self._tab_widget.insertTab(tab_index, scroll, tab_icon, tab_title)
                self._tab_widget.setCurrentIndex(previous_index)
            finally:
                self._tab_widget.blockSignals(False)
        # Register the panel.
        self._panels[step_id] = panel
        self._loaded.add(tab_index)
        # Wire the panel to the event bus.
        if deps:
            subscribe_panel_to_dependencies(panel, step_id, deps)
        # Catch-up: replay the accumulated state to the newly created panel.
        # The panel was instantiated after the OutputUpdated events had been
        # broadcast and therefore missed them; step_outputs and the global
        # inputs must be replayed, otherwise its input fields stay empty.
        self._replay_state_to_panel(panel)
# ── Catch-up state replay ────────────────────────────────────
def _replay_state_to_panel(self, panel):
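"""Replay accumulated state to a freshly instantiated lazy panel.
Three replay steps:
1. update_from_config: generate default output paths and read cross-panel parameters
2. Replay WorkspaceManager.step_outputs: output file paths of steps that have already run
3. Scan loaded panels live: read the attributes other panels depend on (e.g. Step1's img_file)
and publish them as OutputUpdated, so that dependency_subscriber back-fills the input fields
"""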
# 1. update_from_config (generates default output paths)
if hasattr(panel, 'update_from_config'):
try:
work_dir = self._get_current_work_dir()
panel.update_from_config(work_dir=work_dir, pipeline=None)
except Exception:
pass
# 2. Replay the step_outputs accumulated in WorkspaceManager
ws_manager = self._get_workspace_manager()
if ws_manager:
for src_step_id, outputs in ws_manager.step_outputs.items():
for output_type, path in outputs.items():
if not path:
continue
global_event_bus.publish('OutputUpdated', {
'step_id': src_step_id,
'output_type': output_type,
'path': path,
})
# 3. Scan the depended-on attributes of loaded panels live (covers global inputs such as reference_img)
self._replay_live_panel_inputs()
def _replay_live_panel_inputs(self):
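"""Walk the dependency declarations in PANEL_REGISTRY and read attribute values live from loaded panels.
If the source panel is already instantiated, read its widget's current value and publish it as OutputUpdated,
so that lazily loaded panels receive global inputs (e.g. Step1.img_file → reference_img).
"""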
for entry in self._registry:
deps = entry.get('dependencies')
if not deps:
continue
for _input_field, (dep_step, output_type, panel_attr) in deps.items():
src_panel = self._panels.get(dep_step)
if src_panel is None:
continue
widget = getattr(src_panel, panel_attr, None)
if widget is None:
continue
path = ''
if hasattr(widget, 'get_path'):
path = widget.get_path().strip()
elif hasattr(widget, 'text'):
path = widget.text().strip()
if not path:
continue
global_event_bus.publish('OutputUpdated', {
'step_id': dep_step,
'output_type': output_type,
'path': path,
})
    def _get_current_work_dir(self):
        """Return the current working directory from WorkspaceInitializer."""
        try:
            return self._main_window._workspace_initializer.work_dir
        except Exception:
            return None

    def _get_workspace_manager(self):
        """Return the WorkspaceManager instance from WorkspaceInitializer."""
        try:
            return self._main_window._workspace_initializer.workspace_manager
        except Exception:
            return None

    def _preload_neighbors(self, index):
        """Preload the neighbors of the current tab (as configured by preload_window)."""
        if self._preload_window < 0:
            # Preload everything
            for i in range(len(self._registry)):
                self._ensure_loaded(i)
            return
        if self._preload_window == 0:
            return
        start = max(0, index - self._preload_window)
        end = min(len(self._registry), index + self._preload_window + 1)
        for i in range(start, end):
            if i != index:
                self._ensure_loaded(i)

    def _step_id_to_tab_index(self, step_id):
        """Map step_id → tab_index; return -1 when the step is not registered."""
        for i, entry in enumerate(self._registry):
            if entry['step_id'] == step_id:
                return i
        return -1
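
The catch-up replay above addresses a general problem with lazily created subscribers: events published before a subscriber exists are lost unless the accumulated state is replayed. The following is a minimal sketch of that idea, not the project's implementation. `MiniEventBus` and `replay_step_outputs` are hypothetical stand-ins for `global_event_bus` and the replay loop, and the plain dict only assumes the `{step_id: {output_type: path}}` shape that the loop over `step_outputs` implies.

```python
from collections import defaultdict


class MiniEventBus:
    """Hypothetical stand-in for global_event_bus: synchronous publish/subscribe."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event, callback):
        self._subscribers[event].append(callback)

    def publish(self, event, payload):
        # Iterate over a copy so callbacks may subscribe while we publish.
        for callback in list(self._subscribers[event]):
            callback(payload)


def replay_step_outputs(bus, step_outputs):
    """Re-publish every non-empty accumulated output as an OutputUpdated event."""
    for step_id, outputs in step_outputs.items():
        for output_type, path in outputs.items():
            if not path:
                continue  # skip outputs that were never produced
            bus.publish('OutputUpdated', {
                'step_id': step_id,
                'output_type': output_type,
                'path': path,
            })


bus = MiniEventBus()
# Assumed shape: {step_id: {output_type: path}} (illustrative paths only).
step_outputs = {'step1': {'water_mask': '/tmp/mask.tif', 'reference_img': ''}}

# Published before the lazy "panel" exists: nobody receives it.
replay_step_outputs(bus, step_outputs)

received = []
bus.subscribe('OutputUpdated', received.append)  # the panel wakes up late
replay_step_outputs(bus, step_outputs)           # catch-up replay
print(received)
```

Because a replay of this kind is broadcast to every subscriber, not only to the new one, handlers should tolerate receiving the same path more than once.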


@ -0,0 +1,270 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
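Panel registry.
Central, structured configuration for every step panel, including:
- step ID / class reference / title / icon / stage / navigation display name
- inter-step dependencies (input field → upstream step / output type / panel attribute)
- constructor arguments (e.g. Step13ReportPanel needs main_window)
WaterQualityGUI iterates over PANEL_REGISTRY to build the navigation tree, the tab pages,
dependency propagation and config read/write dynamically, removing the hard-coded wiring.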
"""
from src.gui.panels.step1_panel import Step1Panel
from src.gui.panels.step2_panel import Step2Panel
from src.gui.panels.step3_panel import Step3Panel
from src.gui.panels.step4_sampling_panel import Step4SamplingPanel
from src.gui.panels.step5_clean_panel import Step5CleanPanel
from src.gui.panels.step6_feature_panel import Step6FeaturePanel
from src.new.views.step7_view import Step7View
from src.gui.panels.step8_ml_train_panel import Step8MlTrainPanel
from src.gui.panels.step9_ml_predict_panel import Step9MlPredictPanel
from src.gui.panels.step10_watercolor_panel import Step10WatercolorPanel
from src.gui.panels.step11_map_panel import Step11MapPanel
from src.gui.panels.step12_viz_panel import Step12VizPanel
from src.gui.panels.step13_report_panel import Step13ReportPanel
PANEL_REGISTRY = [
# ═══════════════════════════════════════════════════════════════
# Stage 1: image preprocessing
# ═══════════════════════════════════════════════════════════════
{
'step_id': 'step1',
'class_ref': Step1Panel,
'title': '水域掩膜',
'icon': '1.png',
'stage': '阶段一:影像预处理',
'display_name': '1. 水域掩膜生成',
'dependencies': None,
'constructor_kwargs': None,
},
{
'step_id': 'step2',
'class_ref': Step2Panel,
'title': '耀斑检测',
'icon': '2.png',
'stage': '阶段一:影像预处理',
'display_name': '2. 耀斑区域识别',
'dependencies': {
'img_path': ('step1', 'reference_img', 'img_file'),
'water_mask_path': ('step1', 'water_mask', 'water_mask_file'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step3',
'class_ref': Step3Panel,
'title': '耀斑去除',
'icon': '3.png',
'stage': '阶段一:影像预处理',
'display_name': '3. 耀斑去除与修复',
'dependencies': {
'img_path': ('step1', 'reference_img', 'img_file'),
'water_mask': ('step1', 'water_mask', 'water_mask_file'),
},
'constructor_kwargs': None,
},
# ═══════════════════════════════════════════════════════════════
# Stage 2: sample data preparation
# ═══════════════════════════════════════════════════════════════
{
'step_id': 'step4_sampling',
'class_ref': Step4SamplingPanel,
'title': '采样点布设',
'icon': '4.png',
'stage': '阶段二:样本数据准备',
'display_name': '4. 采样点布设',
'dependencies': {
'deglint_img_path': ('step3', 'deglint_image', 'deglint_img_file'),
'water_mask_path': ('step1', 'water_mask', 'water_mask_file'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step5_clean',
'class_ref': Step5CleanPanel,
'title': '数据清洗',
'icon': '5.png',
'stage': '阶段二:样本数据准备',
'display_name': '5. 数据清洗',
# Business requirement: keep the input source independent; do not auto-fetch the output of step4_sampling
'dependencies': None,
'constructor_kwargs': None,
},
{
'step_id': 'step6_feature',
'class_ref': Step6FeaturePanel,
'title': '光谱特征',
'icon': '6.png',
'stage': '阶段二:样本数据准备',
'display_name': '6. 光谱特征提取',
'dependencies': {
'deglint_img_path': ('step3', 'deglint_image', 'deglint_img_file'),
'csv_path': ('step5_clean', 'processed_data', 'csv_file'),
'boundary_mask_path': ('step1', 'water_mask', 'water_mask_file'),
'glint_mask_path': ('step2', 'glint_mask', 'glint_mask_file'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step7_index',
'class_ref': Step7View,
'title': '水质光谱指数计算',
'icon': '7.png',
'stage': '阶段二:样本数据准备',
'display_name': '7. 水质指数计算',
'dependencies': {
'training_csv_path': ('step6_feature', 'training_spectra', 'training_data_widget'),
},
'constructor_kwargs': None,
},
# ═══════════════════════════════════════════════════════════════
# Stage 3: model building and training
# ═══════════════════════════════════════════════════════════════
{
'step_id': 'step8_ml_train',
'class_ref': Step8MlTrainPanel,
'title': '机器学习建模',
'icon': '8.png',
'stage': '阶段三:模型构建与训练',
'display_name': '8. 机器学习建模',
'dependencies': {
'training_csv_file': ('step7_index', 'training_spectra_indices', 'training_csv_file'),
},
'constructor_kwargs': None,
},
# ═══════════════════════════════════════════════════════════════
# Stage 4: prediction and result output
# ═══════════════════════════════════════════════════════════════
{
'step_id': 'step9_ml_predict',
'class_ref': Step9MlPredictPanel,
'title': '机器学习预测',
'icon': '10.png',
'stage': '阶段四:预测与成果输出',
'display_name': '9. 机器学习预测',
'dependencies': {
'models_dir': ('step8_ml_train', 'Supervised_Model_Training', 'models_dir_file'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step10_watercolor',
'class_ref': Step10WatercolorPanel,
'title': '水色指数反演',
'icon': '10.png',
'stage': '阶段四:预测与成果输出',
'display_name': '10. 水色指数反演',
'dependencies': {
'bsq_file': ('step3', 'deglint_image', 'bsq_file'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step11_map',
'class_ref': Step11MapPanel,
'title': '专题图生成',
'icon': '10.png',
'stage': '阶段四:预测与成果输出',
'display_name': '11. 专题图生成',
'dependencies': {
'prediction_csv_dir_edit': ('step9_ml_predict', '9_ML_Prediction', 'prediction_csv_dir_edit'),
'geotiff_dir_edit': ('step10_watercolor', 'WaterIndex_Images', 'geotiff_dir_edit'),
},
'constructor_kwargs': None,
},
{
'step_id': 'step12_viz',
'class_ref': Step12VizPanel,
'title': '可视化',
'icon': '9.png',
'stage': '阶段四:预测与成果输出',
'display_name': '12. 可视化展示',
'dependencies': None,
'constructor_kwargs': None,
},
{
'step_id': 'step13_report',
'class_ref': Step13ReportPanel,
'title': '报告生成',
'icon': '10.png',
'stage': '阶段四:预测与成果输出',
'display_name': '13. 分析报告生成',
'dependencies': None,
'constructor_kwargs': {'main_window'}, # main_window=self must be injected
},
]
def build_step_dependencies():
    """Build the step_dependencies dict from PANEL_REGISTRY.

    Returns:
        dict: {step_id: {input_field: (dep_step, output_type, panel_attr)}}
    """
    deps = {}
    for entry in PANEL_REGISTRY:
        if entry['dependencies']:
            deps[entry['step_id']] = entry['dependencies']
    return deps


def build_stage_groups():
    """Build the stage-grouping dict from PANEL_REGISTRY.

    Returns:
        dict: {stage_name: [(step_id, display_name), ...]}
    """
    groups = {}
    for entry in PANEL_REGISTRY:
        stage = entry['stage']
        if stage not in groups:
            groups[stage] = []
        groups[stage].append((entry['step_id'], entry['display_name']))
    return groups


def get_tab_index(step_id):
    """Return the index of step_id in PANEL_REGISTRY (which is also its tab index), or -1."""
    for i, entry in enumerate(PANEL_REGISTRY):
        if entry['step_id'] == step_id:
            return i
    return -1


def get_step_id_by_tab_index(tab_index):
    """Return the step_id for a tab index, or None when the index is out of range."""
    if 0 <= tab_index < len(PANEL_REGISTRY):
        return PANEL_REGISTRY[tab_index]['step_id']
    return None


def get_entry(step_id):
    """Return the registry entry for step_id, or None when it is not registered."""
    for entry in PANEL_REGISTRY:
        if entry['step_id'] == step_id:
            return entry
    return None


def build_output_types():
    """Derive, in reverse, the output_type list each step produces from the registry's dependencies.

    Returns:
        dict: {step_id: [output_type, ...]}
    """
    output_types = {}
    for entry in PANEL_REGISTRY:
        if entry['dependencies']:
            for _input_field, (dep_step, output_type, _panel_attr) in entry['dependencies'].items():
                if dep_step not in output_types:
                    output_types[dep_step] = []
                if output_type not in output_types[dep_step]:
                    output_types[dep_step].append(output_type)
    return output_types
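
The registry helpers above all rely on two conventions: the position of an entry is its tab index, and the outputs a step must produce can be inferred by inverting the downstream dependency tuples. A small sketch on a toy registry may make that concrete. `TOY_REGISTRY`, `derive_output_types` and `tab_index_of` are hypothetical names used only for illustration; the entries carry just the two keys the helpers read.

```python
# Toy registry with the same entry shape as PANEL_REGISTRY (illustrative values only).
TOY_REGISTRY = [
    {'step_id': 'a', 'dependencies': None},
    {'step_id': 'b', 'dependencies': {
        'img_path': ('a', 'reference_img', 'img_file'),
        'mask_path': ('a', 'water_mask', 'mask_file'),
    }},
    {'step_id': 'c', 'dependencies': {
        'mask_path': ('a', 'water_mask', 'mask_file'),
        'glint_path': ('b', 'glint_mask', 'glint_file'),
    }},
]


def derive_output_types(registry):
    """Invert the dependency declarations: upstream step -> output types it must produce."""
    output_types = {}
    for entry in registry:
        for dep_step, output_type, _attr in (entry['dependencies'] or {}).values():
            produced = output_types.setdefault(dep_step, [])
            if output_type not in produced:  # keep first-seen order, no duplicates
                produced.append(output_type)
    return output_types


def tab_index_of(registry, step_id):
    """Position in the registry doubles as the tab index; -1 when unknown."""
    return next((i for i, e in enumerate(registry) if e['step_id'] == step_id), -1)


types_by_step = derive_output_types(TOY_REGISTRY)
print(types_by_step)
print(tab_index_of(TOY_REGISTRY, 'c'), tab_index_of(TOY_REGISTRY, 'missing'))
```

One consequence of this design is that a step's output types are only known if some downstream entry declares a dependency on them; an output nobody consumes never appears in the derived mapping.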


@ -0,0 +1,622 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
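Pipeline executor.
Takes over all pipeline-execution logic from WaterQualityGUI:
- run_full_pipeline()        run the complete flow
- run_single_step()          run a single step
- stop_pipeline()            stop execution
- _precheck_step3_bands()    step3 band out-of-range precheck
Key design principles:
- Every state change is published as an event through global_event_bus; UI widgets are never manipulated directly
- WorkerThread's Qt signals are connected to internal slots, and the slots only forward to the EventBus
- The precheck dialogs (PreflightDialog / PipelineModeDialog / BandConfirmDialog)
  remain modal (required for user interaction), but their results are published through the EventBus
Published events:
  PipelineStarted  → {}                             main window subscribes: disable the run buttons
  PipelineFinished → {success, message}             main window subscribes: restore buttons + show dialog
  PipelineStopped  → {}                             main window subscribes: restore buttons
  StepCompleted    → {step_name, success, message}  WorkspaceInitializer subscribes: scan outputs
  LogMessage       → {message, level}               LogManager subscribes: write to the log area
  ProgressUpdate   → {percentage, message}          LogManager subscribes: update the progress bar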
"""
import os
import copy
import traceback
from pathlib import Path
from typing import Dict, List, Optional
from PyQt5.QtCore import QObject, Qt
from PyQt5.QtWidgets import QMessageBox, QDialog
from src.gui.core.event_bus import global_event_bus
from src.gui.core.worker_thread import (
WorkerThread,
PIPELINE_AVAILABLE,
)
from src.gui.core.preflight_dialog import PreflightDialog
from src.gui.core.pipeline_mode_dialog import PipelineModeDialog
from src.gui.dialogs import BandConfirmDialog
from src.core.pipeline.runner import PipelineHalt
class PipelineExecutor(QObject):
    """Pipeline executor: a pure logic layer with no direct UI manipulation."""

    def __init__(self, panel_factory, workspace_initializer, parent=None):
        """
        Args:
            panel_factory: PanelFactory instance (used to obtain panels and config)
            workspace_initializer: WorkspaceInitializer instance (used to obtain work_dir)
            parent: parent QObject (usually WaterQualityGUI)
        """
        super().__init__(parent)
        self._panel_factory = panel_factory
        self._workspace_initializer = workspace_initializer
        self._worker: Optional[WorkerThread] = None
        # Subscribe to single-step run requests from panels (decouples panels from the executor)
        global_event_bus.subscribe('RequestRunSingleStep', self._on_request_run_single_step)

    # ═══════════════════════════════════════════════════════════
    # Public API
    # ═══════════════════════════════════════════════════════════
    @property
    def worker(self):
        return self._worker

    @property
    def is_running(self) -> bool:
        return self._worker is not None and self._worker.isRunning()

    def run_full_pipeline(self):
        """Run the complete pipeline.

        Flow:
          1. Check PIPELINE_AVAILABLE
          2. Get work_dir (from WorkspaceInitializer)
          3. Scan the working directory + auto back-fill
          4. step3 band out-of-range precheck
          5. Full-pipeline mode selection dialog
          6. Collect config + prune for the selected mode
          7. One-shot full precheck + user interaction
          8. Confirm execution → create WorkerThread → start

        Safeguards against silent failure:
          - Every return must be preceded by a LogMessage published on the EventBus
          - The whole body is wrapped in try/except so that the PyQt5 slot
            cannot swallow exceptions silently
        """
        print("==== [探针] run_full_pipeline 方法体已进入 ====", flush=True)
        try:
            self._run_full_pipeline_impl()
        except Exception as e:
            err_detail = traceback.format_exc()
            global_event_bus.publish('LogMessage', {
                'message': f'[致命错误] run_full_pipeline 异常: {e}',
                'level': 'error',
            })
            global_event_bus.publish('LogMessage', {
                'message': f'详细追踪:\n{err_detail}',
                'level': 'error',
            })
            QMessageBox.critical(
                self.parent(), "运行失败",
                f"启动流程时发生未预期的错误:\n\n{e}\n\n详细信息已输出到日志区。"
            )
def _run_full_pipeline_impl(self):
"""Implementation body of run_full_pipeline, guarded by the outer try/except."""
# ★ Immediate terminal feedback: visible even if the EventBus / log area is not ready yet
print("\n[PipelineExecutor] 收到「运行完整流程」指令,开始执行...")
if not PIPELINE_AVAILABLE:
global_event_bus.publish('LogMessage', {
'message': '无法导入 Pipeline 模块,请检查项目文件结构!',
'level': 'error',
})
QMessageBox.critical(
self.parent(), "错误",
"无法导入 Pipeline 模块,请检查 src/core/handlers/ 目录是否完整!"
)
return
# ── 1) Get work_dir ──
work_dir = self._workspace_initializer.work_dir
if not work_dir:
global_event_bus.publish('LogMessage', {
'message': '⚠ 未选择工作目录,流程中止。请先通过「工具 → 设置工作目录」选择工作目录。',
'level': 'warning',
})
QMessageBox.warning(self.parent(), "警告", "未选择工作目录,请先设置工作目录。")
return
work_path = Path(work_dir)
global_event_bus.publish('LogMessage', {
'message': f'[运行] 工作目录: {work_dir}',
'level': 'info',
})
# ── 2) Pre-run scan + auto back-fill ──
global_event_bus.publish('LogMessage', {
'message': '正在进行运行前环境预检与自动扫描...',
'level': 'info',
})
self._workspace_initializer.auto_populate_all()
global_event_bus.publish('LogMessage', {
'message': '✓ 预检完成:已扫描工作目录并自动回填已落盘的产物',
'level': 'info',
})
# ── 3) step3 band out-of-range precheck ──
if not self._precheck_step3_bands():
global_event_bus.publish('LogMessage', {
'message': '⚠ 流程中止:step3 波段越界预检未通过(用户取消或波段配置无效)',
'level': 'warning',
})
return
# ── 4) Full-pipeline mode selection dialog ──
mode_dlg = PipelineModeDialog(main_window=self.parent(), parent=self.parent())
if mode_dlg.exec() != QDialog.Accepted:
global_event_bus.publish('LogMessage', {
'message': '⚠ 流程中止:用户取消了模式选择对话框',
'level': 'warning',
})
return
selected_mode = mode_dlg.selected_mode
global_event_bus.publish('LogMessage', {
'message': (
f"[模式选择] 选定模式: "
f"{'训练新模型' if selected_mode == 'training' else '使用已有模型直接预测'}"
),
'level': 'info',
})
# ── 5) Collect config (★ preload every panel first so the config is complete) ──
global_event_bus.publish('LogMessage', {
'message': '[运行] 正在收集所有步骤面板的配置...',
'level': 'info',
})
self._panel_factory.preload_all()
config = self._get_current_config()
global_event_bus.publish('LogMessage', {
'message': f'[运行] 已收集 {len(config)} 个步骤的配置: {list(config.keys())}',
'level': 'info',
})
# ── 6) Prune the config for the selected mode ──
if selected_mode == "prediction_only":
from src.core.workspace_manager import WorkspaceManager
config = WorkspaceManager.prune_config_for_prediction_mode(config)
global_event_bus.publish('LogMessage', {
'message': '[模式选择] 已裁剪训练相关步骤(step4/5/7/8),进入仅预测模式',
'level': 'info',
})
# ── 7) One-shot full precheck + interactive user decision ──
missing_items = PreflightDialog.build_missing_items(config)
skip_list: List[str] = []
if missing_items:
global_event_bus.publish('LogMessage', {
'message': f'[预检] 发现 {len(missing_items)} 个缺失项,弹出预检对话框...',
'level': 'warning',
})
critical_items = [it for it in missing_items if it.is_critical]
if critical_items:
lines = "\n".join(f" - [{it.step_name}] {it.reason}" for it in critical_items)
global_event_bus.publish('LogMessage', {
'message': f'[预检] 阻断性错误 ({len(critical_items)} 项):\n{lines}',
'level': 'error',
})
QMessageBox.critical(
self.parent(), "预检失败(阻断性错误)",
f"以下为阻断性缺失,流程无法启动:\n\n{lines}\n\n请填写后重新运行。"
)
return
dialog = PreflightDialog(missing_items, parent=self.parent())
if dialog.exec() != QDialog.Accepted:
global_event_bus.publish('LogMessage', {
'message': '⚠ 流程中止:用户取消了预检对话框',
'level': 'warning',
})
return
result = dialog.get_result()
if result is None:
global_event_bus.publish('LogMessage', {
'message': '⚠ 流程中止:预检对话框返回空结果',
'level': 'warning',
})
return
action, *payload = result
if action == "fill":
_, step_id, tab_index = result
global_event_bus.publish('NavigateToTab', {
'tab_index': tab_index,
'step_id': step_id,
})
global_event_bus.publish('LogMessage', {
'message': f'[预检] 用户选择填写 {step_id},已切换到对应面板。流程暂停,填写完成后请重新运行。',
'level': 'info',
})
return
skip_list = payload[0] if payload else []
if skip_list:
global_event_bus.publish('LogMessage', {
'message': f'[预检] 用户强制跳过 {len(skip_list)} 个步骤: {skip_list}',
'level': 'warning',
})
else:
global_event_bus.publish('LogMessage', {
'message': '[预检] ✓ 所有必需项均已就绪,无需弹窗',
'level': 'info',
})
# ── 8) Confirm execution ──
reply = QMessageBox.question(
self.parent(), "确认",
"是否开始执行完整流程?\n\n这可能需要较长时间,请确保配置正确。",
QMessageBox.Yes | QMessageBox.No
)
if reply != QMessageBox.Yes:
global_event_bus.publish('LogMessage', {
'message': '⚠ 流程中止:用户取消了执行确认',
'level': 'warning',
})
return
# ── 9) Prepare worker_config ──
global_event_bus.publish('LogMessage', {
'message': f'初始化 Pipeline,工作目录: {work_dir}',
'level': 'info',
})
worker_config = copy.deepcopy(config)
step6_cfg = worker_config.get('step6_feature')
if step6_cfg:
enabled = step6_cfg.pop('enabled', True)
if not enabled:
worker_config.pop('step6_feature', None)
global_event_bus.publish('LogMessage', {
'message': f'[运行] 最终执行配置包含 {len(worker_config)} 个步骤: {list(worker_config.keys())}',
'level': 'info',
})
# ── 10) Create the WorkerThread and connect its signals ──
self._worker = WorkerThread(work_dir, worker_config, mode='full', skip_list=skip_list)
self._worker.log_message.connect(self._on_log_message, Qt.QueuedConnection)
self._worker.progress_update.connect(self._on_progress_update, Qt.QueuedConnection)
self._worker.step_completed.connect(self._on_step_completed, Qt.QueuedConnection)
self._worker.finished.connect(self._on_finished, Qt.QueuedConnection)
# ── 11) Publish the start events → the subscribed main window disables the buttons ──
global_event_bus.publish('PipelineStarted', {})
global_event_bus.publish('ProgressUpdate', {'percentage': 0, 'message': '准备执行...'})
global_event_bus.publish('LogMessage', {'message': '=' * 50, 'level': 'info'})
global_event_bus.publish('LogMessage', {'message': '开始执行完整流程...', 'level': 'info'})
global_event_bus.publish('LogMessage', {'message': '=' * 50, 'level': 'info'})
self._worker.start()
    def run_single_step(self, step_name: str, config: Optional[dict] = None):
        """Run a single step.

        Args:
            step_name: step name (e.g. 'step1', 'step5_clean')
            config: step config dict (optional; collected from the panels by default)
        """
        try:
            self._run_single_step_impl(step_name, config)
        except Exception as e:
            err_detail = traceback.format_exc()
            global_event_bus.publish('LogMessage', {
                'message': f'[致命错误] run_single_step 异常: {e}',
                'level': 'error',
            })
            global_event_bus.publish('LogMessage', {
                'message': f'详细追踪:\n{err_detail}',
                'level': 'error',
            })
            QMessageBox.critical(
                self.parent(), "运行失败",
                f"启动单步执行时发生未预期的错误:\n\n{e}\n\n详细信息已输出到日志区。"
            )
def _run_single_step_impl(self, step_name: str, config: dict = None):
if not PIPELINE_AVAILABLE:
global_event_bus.publish('LogMessage', {
'message': '无法导入 Pipeline 模块,请检查 src/core/handlers/ 目录是否完整!',
'level': 'error',
})
QMessageBox.critical(
self.parent(), "错误",
"无法导入 Pipeline 模块,请检查 src/core/handlers/ 目录是否完整!"
)
return
work_dir = self._workspace_initializer.work_dir or './work_dir'
if config is None:
global_event_bus.publish('LogMessage', {
'message': '[运行] 正在收集所有步骤面板的配置...',
'level': 'info',
})
self._panel_factory.preload_all()
config = self._get_current_config()
global_event_bus.publish('LogMessage', {
'message': f'[运行] 已收集 {len(config)} 个步骤的配置',
'level': 'info',
})
global_event_bus.publish('LogMessage', {
'message': f'初始化 Pipeline,工作目录: {work_dir}',
'level': 'info',
})
self._worker = WorkerThread(work_dir, config, mode='single_step', step_name=step_name)
self._worker.log_message.connect(self._on_log_message, Qt.QueuedConnection)
self._worker.progress_update.connect(self._on_progress_update, Qt.QueuedConnection)
self._worker.step_completed.connect(self._on_step_completed, Qt.QueuedConnection)
self._worker.finished.connect(self._on_finished, Qt.QueuedConnection)
global_event_bus.publish('PipelineStarted', {})
global_event_bus.publish('ProgressUpdate', {'percentage': 0, 'message': f'准备执行 {step_name}...'})
global_event_bus.publish('LogMessage', {'message': '=' * 50, 'level': 'info'})
global_event_bus.publish('LogMessage', {
'message': f'开始独立运行步骤 {step_name}...',
'level': 'info',
})
global_event_bus.publish('LogMessage', {'message': '=' * 50, 'level': 'info'})
self._worker.start()
    def stop_pipeline(self):
        """Stop the currently running pipeline after user confirmation."""
        if self._worker and self._worker.isRunning():
            reply = QMessageBox.question(
                self.parent(), "确认",
                "是否停止当前流程?",
                QMessageBox.Yes | QMessageBox.No
            )
            if reply == QMessageBox.Yes:
                self._worker.stop()
                global_event_bus.publish('LogMessage', {
                    'message': '用户取消执行',
                    'level': 'warning',
                })
                global_event_bus.publish('PipelineStopped', {})
    # ═══════════════════════════════════════════════════════════
    # EventBus subscription callbacks
    # ═══════════════════════════════════════════════════════════
    def _on_request_run_single_step(self, data: dict):
        """Handle a single-step run request published by a panel on the EventBus.

        data format: {'step_name': 'step1', 'config': {'step1': {...}}}
        Precondition checks (precheck / working directory) are handled uniformly
        inside run_single_step → _run_single_step_impl; this method only parses,
        forwards and acts as the last-resort exception handler.
        """
        # Bind step_name before the try block so the except handler can always
        # reference it, even when reading `data` itself raises.
        step_name = None
        try:
            step_name = data.get('step_name')
            config = data.get('config')
            print(f"==== Executor 收到单步请求: {step_name} ====", flush=True)
            if not step_name:
                global_event_bus.publish('LogMessage', {
                    'message': '[单步执行] 请求缺少 step_name,忽略',
                    'level': 'warning',
                })
                return
            # ★ Deadlock guard: if a worker is already running, tell the user
            # instead of dropping the request silently
            if self.is_running:
                global_event_bus.publish('LogMessage', {
                    'message': f'[单步执行] 后台正在运行中,无法启动 {step_name}。请等待当前任务完成或手动停止后再试。',
                    'level': 'warning',
                })
                QMessageBox.warning(
                    self.parent(), "后台忙碌",
                    f"后台正在运行中,无法启动 {step_name}\n\n请等待当前任务完成,或点击「停止」按钮后再试。"
                )
                return
            global_event_bus.publish('LogMessage', {
                'message': f'[单步执行] 收到 {step_name} 的执行请求',
                'level': 'info',
            })
            self.run_single_step(step_name, config)
        except Exception as e:
            err_detail = traceback.format_exc()
            global_event_bus.publish('LogMessage', {
                'message': f'[致命错误] _on_request_run_single_step({step_name}) 异常: {e}',
                'level': 'error',
            })
            global_event_bus.publish('LogMessage', {
                'message': f'详细追踪:\n{err_detail}',
                'level': 'error',
            })
    # ═══════════════════════════════════════════════════════════
    # WorkerThread signals → EventBus events (pure forwarding, no UI work)
    # ═══════════════════════════════════════════════════════════
    def _on_log_message(self, message: str, level: str):
        """WorkerThread log → EventBus LogMessage event."""
        global_event_bus.publish('LogMessage', {
            'message': message,
            'level': level,
        })

    def _on_progress_update(self, percentage: int, message: str):
        """WorkerThread progress → EventBus ProgressUpdate event."""
        global_event_bus.publish('ProgressUpdate', {
            'percentage': percentage,
            'message': message,
        })

    def _on_step_completed(self, step_name: str, success: bool, message: str):
        """WorkerThread step completion → EventBus StepCompleted event.

        WorkspaceInitializer subscribes to this event, scans the outputs and
        publishes OutputUpdated.
        """
        global_event_bus.publish('StepCompleted', {
            'step_name': step_name,
            'success': success,
            'message': message,
        })

    def _on_finished(self, success: bool, message: str):
        """WorkerThread finished → EventBus PipelineFinished event.

        The main window subscribes to this event, restores the button state and
        shows a dialog.
        """
        global_event_bus.publish('PipelineFinished', {
            'success': success,
            'message': message,
        })
    # ═══════════════════════════════════════════════════════════
    # Internal helpers
    # ═══════════════════════════════════════════════════════════
    def _get_current_config(self) -> dict:
        """Collect the config from every loaded panel.

        Note: only loaded panels are collected, so the result may be incomplete
        in lazy-loading mode. Callers that need the full config should call
        panel_factory.preload_all() first.
        """
        config = {}
        for step_id, panel in self._panel_factory.get_loaded_panels().items():
            if hasattr(panel, 'get_config'):
                config[step_id] = panel.get_config()
        return config
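def _precheck_step3_bands(self) -> bool:
"""Step 3 band out-of-range precheck (runs synchronously on the main thread to avoid cross-thread dialogs).
Reads RasterCount of the step1 image and checks whether any band index used by the
method currently selected in the step3 panel is out of range. If so, shows
BandConfirmDialog (60 s countdown) so the user can adjust the band or cancel.
Returns:
True: precheck passed or the band was adjusted; continue
False: the user clicked "cancel run"; abort
"""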
try:
step1_panel = self._panel_factory.get_panel('step1')
step3_panel = self._panel_factory.get_panel('step3')
img_path = step1_panel.img_file.get_path() if step1_panel else None
step3_cfg = step3_panel.get_config() if step3_panel else None
step3_enabled = step3_panel.enable_checkbox.isChecked() if step3_panel else False
except Exception as e:
global_event_bus.publish('LogMessage', {
'message': f'⚠ step3 波段预检:读取面板状态失败 - {e}',
'level': 'warning',
})
return True
if not step3_enabled:
return True
if not img_path or not os.path.isfile(img_path):
global_event_bus.publish('LogMessage', {
'message': '⚠ step3 波段预检:未找到参考影像,跳过',
'level': 'info',
})
return True
if not step3_cfg:
return True
try:
from osgeo import gdal
dataset = gdal.Open(img_path)
if dataset is None:
global_event_bus.publish('LogMessage', {
'message': f'⚠ step3 波段预检:gdal 无法打开影像 {img_path}',
'level': 'warning',
})
return True
max_band = dataset.RasterCount
dataset = None
except Exception as e:
global_event_bus.publish('LogMessage', {
'message': f'⚠ step3 波段预检:读取 RasterCount 失败 - {e}',
'level': 'warning',
})
return True
if max_band <= 0:
return True
method = step3_cfg.get('method', 'goodman')
if method == 'goodman':
band_fields = [
('nir_lower', 'nir_lower', 65, 'NIR下波段'),
('nir_upper', 'nir_upper', 91, 'NIR上波段'),
]
elif method == 'kutser':
band_fields = [
('oxy_band', 'oxy_band', 38, '氧吸收波段'),
('lower_oxy', 'lower_oxy', 36, '下氧吸收波段'),
('upper_oxy', 'upper_oxy', 49, '上氧吸收波段'),
('nir_band', 'nir_band', 47, 'NIR波段'),
]
elif method == 'hedley':
band_fields = [
('hedley_nir_band', 'hedley_nir_band', 47, 'NIR波段'),
]
else:
return True
for cfg_key, panel_attr, recommended, label in band_fields:
requested = step3_cfg.get(cfg_key)
if requested is None or requested <= max_band:
continue
global_event_bus.publish('LogMessage', {
'message': f'⚠ step3 波段越界:{label}={requested} > 影像波段数 {max_band}',
'level': 'warning',
})
dlg = BandConfirmDialog(
self.parent(),
requested_band=requested,
max_band=max_band,
recommended_band=recommended,
method_label=label,
)
result = dlg.exec_()
if result == QDialog.Rejected:
global_event_bus.publish('LogMessage', {
'message': '✗ 用户取消运行(step3 波段越界未解决)',
'level': 'warning',
})
return False
new_band = dlg.selected_band()
try:
spin = getattr(step3_panel, panel_attr)
spin.setValue(new_band)
except AttributeError:
global_event_bus.publish('LogMessage', {
'message': f'⚠ step3 panel 缺控件 {panel_attr},跳过回写',
'level': 'warning',
})
continue
global_event_bus.publish('LogMessage', {
'message': f'{label}: {requested} → {new_band}(影像最多 {max_band} 波段)',
'level': 'info',
})
return True
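
The core of the band precheck is independent of Qt and GDAL: given the configured band indices and the image's band count, find every index that exceeds it. A minimal sketch of that comparison as a pure function follows. `find_out_of_range_bands` is a hypothetical helper, not part of the project, and the field tuples are simplified to `(cfg_key, recommended, label)` with English labels.

```python
def find_out_of_range_bands(step_cfg, band_fields, max_band):
    """Return (cfg_key, label, requested, recommended) for each band index above max_band."""
    problems = []
    for cfg_key, recommended, label in band_fields:
        requested = step_cfg.get(cfg_key)
        # Same rule as the precheck loop: unset or in-range values pass.
        if requested is None or requested <= max_band:
            continue
        problems.append((cfg_key, label, requested, recommended))
    return problems


# Simplified copy of the 'goodman' field list, for illustration only.
GOODMAN_FIELDS = [
    ('nir_lower', 65, 'NIR lower band'),
    ('nir_upper', 91, 'NIR upper band'),
]
cfg = {'method': 'goodman', 'nir_lower': 65, 'nir_upper': 91}

narrow_image = find_out_of_range_bands(cfg, GOODMAN_FIELDS, max_band=80)
wide_image = find_out_of_range_bands(cfg, GOODMAN_FIELDS, max_band=120)
print(narrow_image)
print(wide_image)
```

In the 80-band case the recommended value (91) is itself above the band count, so a fixed recommendation is not guaranteed to fit either; a caller would still need to clamp whatever value it writes back.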


@ -0,0 +1,237 @@
# -*- coding: utf-8 -*-
"""
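PipelineModeDialog: the mode-selection dialog shown before a full-pipeline run.
After the user clicks "Run full pipeline", this dialog appears first to choose the execution mode:
- Option A: train a new model and predict (runs the complete modeling and prediction flow; requires measured water-quality CSV data)
- Option B: predict with an existing model (skips the training steps and predicts directly from an external model directory)
Dialog result:
- QDialog.Accepted + self.selected_mode = "training" or "prediction_only"
- QDialog.Rejected → the caller aborts run_full_pipeline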
"""
import os
from typing import Optional
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QFont
from PyQt5.QtWidgets import (
QDialog, QVBoxLayout, QHBoxLayout, QLabel, QPushButton,
QRadioButton, QGroupBox, QButtonGroup, QMessageBox, QSizePolicy,
)
def _is_valid_model_dir(path: str) -> bool:
    """Recursively check a model directory: return True as soon as any level contains a file."""
    if not path or not os.path.isdir(path):
        return False
    for _root, _dirs, files in os.walk(path):
        if files:
            return True
    return False
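class PipelineModeDialog(QDialog):
"""Full-pipeline mode selection dialog.
Two radio buttons cover the two business scenarios:
- A: train a new model (complete flow, requires the step4 CSV)
- B: predict only (skips step4/5/7/8 and uses an external model directory directly)
Attributes:
selected_mode: "training" | "prediction_only"
"""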
def __init__(self, main_window=None, parent=None):
super().__init__(parent)
self.main_window = main_window
self.selected_mode: Optional[str] = None
self.setWindowTitle("选择运行模式")
self.setMinimumSize(560, 340)
self.setModal(True)
self._setup_ui()
# ------------------------------------------------------------------
# UI construction
# ------------------------------------------------------------------
def _setup_ui(self):
layout = QVBoxLayout(self)
layout.setContentsMargins(28, 24, 28, 20)
layout.setSpacing(14)
# ── Title ──
title = QLabel("请选择全流程运行模式")
title_font = QFont()
title_font.setPointSize(13)
title_font.setBold(True)
title.setFont(title_font)
title.setAlignment(Qt.AlignCenter)
layout.addWidget(title)
layout.addSpacing(4)
# ── Option A: train a new model ──
group_a = QGroupBox()
group_a.setObjectName("groupA")
group_a.setMinimumHeight(100)
layout.addWidget(group_a)
self.radio_a = QRadioButton("【训练新模型并预测】")
self.radio_a.setChecked(True) # option A is the default
self.radio_a.setObjectName("radioTraining")
desc_a = QLabel(
"需要提供实测水质数据 (CSV),将执行完整建模与预测流程。\n"
"包括:水域掩膜 → 耀斑去除 → 光谱特征提取 → 模型训练 → 密集采样 → 预测 → 专题图"
)
desc_a.setWordWrap(True)
desc_a.setStyleSheet("color: #555555; background: transparent;")
desc_a.setObjectName("descA")
vbox_a = QVBoxLayout(group_a)
vbox_a.setContentsMargins(16, 20, 16, 14)
vbox_a.setSpacing(8)
vbox_a.addWidget(self.radio_a)
vbox_a.addWidget(desc_a)
# ── Option B: predict only ──
group_b = QGroupBox()
group_b.setObjectName("groupB")
group_b.setMinimumHeight(100)
layout.addWidget(group_b)
self.radio_b = QRadioButton("【使用已有模型直接预测】")
self.radio_b.setObjectName("radioPrediction")
desc_b = QLabel(
"跳过模型训练步骤,直接使用导入的外部模型目录进行预测。\n"
"前提条件:请在「监督预测」或「回归预测」面板中指定模型目录。\n"
"适用范围:已有预训练模型、或其他来源模型目录。"
)
desc_b.setWordWrap(True)
desc_b.setStyleSheet("color: #555555; background: transparent;")
desc_b.setObjectName("descB")
vbox_b = QVBoxLayout(group_b)
vbox_b.setContentsMargins(16, 20, 16, 14)
vbox_b.setSpacing(8)
vbox_b.addWidget(self.radio_b)
vbox_b.addWidget(desc_b)
# ── Enforce mutual exclusion (QButtonGroup) ──
self.mode_group = QButtonGroup(self)
self.mode_group.addButton(self.radio_a)
self.mode_group.addButton(self.radio_b)
# ── Hint bar (shows the models_dir status dynamically) ──
self.models_hint = QLabel()
self.models_hint.setObjectName("modelsHint")
self.models_hint.setWordWrap(True)
self.models_hint.setStyleSheet("color: #888888; font-size: 11px; padding: 4px 0;")
layout.addWidget(self.models_hint)
# ── Force the QRadioButton indicator to render as a solid dot ──
self.setStyleSheet("""
QRadioButton::indicator {
width: 14px;
height: 14px;
}
QRadioButton::indicator:checked {
background-color: #0078D7;
border: 2px solid #0078D7;
border-radius: 7px;
}
QRadioButton::indicator:unchecked {
background-color: white;
border: 2px solid #A0A0A0;
border-radius: 7px;
}
""")
# ── Buttons ──
btn_layout = QHBoxLayout()
btn_layout.addStretch()
cancel_btn = QPushButton("取消")
cancel_btn.setObjectName("cancelBtn")
cancel_btn.setMinimumWidth(90)
cancel_btn.clicked.connect(self.reject)
self.btn_confirm = QPushButton("确认")
self.btn_confirm.setObjectName("confirmBtn")
self.btn_confirm.setMinimumWidth(90)
self.btn_confirm.setDefault(True)
self.btn_confirm.clicked.connect(self._on_confirm)
btn_layout.addWidget(self.btn_confirm)
btn_layout.addWidget(cancel_btn)
layout.addLayout(btn_layout)
# Signal wiring: re-render the hint and the button state whenever either radio toggles
self.radio_a.toggled.connect(self._update_models_hint)
self.radio_b.toggled.connect(self._update_models_hint)
# Render the initial state
self._update_models_hint()
    def _update_models_hint(self, checked=False, *args) -> None:
        """Update the hint text and the confirm button from the selected mode and the models_dir status."""
        training_checked = self.radio_a.isChecked()
        # Read models_dir from the main window config (ml first, then reg)
        models_dir = ""
        if self.main_window:
            config = self.main_window.get_current_config()
            models_dir = config.get("step11_ml", {}).get("models_dir", "")
            if not models_dir:
                models_dir = config.get("step11", {}).get("models_dir", "")
        has_files = bool(models_dir and _is_valid_model_dir(models_dir))
        dir_exists = bool(models_dir and os.path.isdir(models_dir))
        if training_checked:
            # Training is always allowed; only the hint differs
            if hasattr(self, 'btn_confirm') and self.btn_confirm is not None:
                self.btn_confirm.setEnabled(True)
            if has_files:
                self.models_hint.setText(
                    f"⚠ 注意:当前模型目录已包含文件,继续训练将会【覆盖】原有模型!\n路径:{models_dir}"
                )
                self.models_hint.setStyleSheet("color: #e65100; font-size: 11px; padding: 4px 0;")
            else:
                if dir_exists:
                    label = f"✓ 模型将保存至该目录(当前为空,安全)。\n路径:{models_dir}"
                else:
                    label = "✓ 尚未指定模型目录,将使用默认路径创建新模型。"
                self.models_hint.setText(label)
                self.models_hint.setStyleSheet("color: #2e7d32; font-size: 11px; padding: 4px 0;")
        else:
            if has_files:
                self.models_hint.setText(
                    f"✓ 已检测到有效模型目录,可以直接预测。\n路径:{models_dir}"
                )
                self.models_hint.setStyleSheet("color: #2e7d32; font-size: 11px; padding: 4px 0;")
                if hasattr(self, 'btn_confirm') and self.btn_confirm is not None:
                    self.btn_confirm.setEnabled(True)
            else:
                # Prediction-only mode needs a non-empty model directory: block confirmation
                if dir_exists:
                    self.models_hint.setText(
                        f"❌ 错误:模型目录为空(未找到任何文件),无法进行预测!\n路径:{models_dir}"
                    )
                else:
                    self.models_hint.setText(
                        "❌ 错误:模型目录为空或不存在!请先返回对应面板配置有效路径。"
                    )
                self.models_hint.setStyleSheet("color: #c62828; font-size: 11px; padding: 4px 0;")
                if hasattr(self, 'btn_confirm') and self.btn_confirm is not None:
                    self.btn_confirm.setEnabled(False)
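
    def _on_confirm(self) -> None:
        """Confirm-button callback: store the selected mode and close.

        Note: the disabled state of the button is handled in _update_models_hint;
        this method only stores the result and does not show a second blocking dialog.
        """
        if self.radio_a.isChecked():
            self.selected_mode = "training"
        else:
            self.selected_mode = "prediction_only"
        self.accept()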
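
The confirm button in prediction-only mode depends entirely on the "any file at any depth" rule of `_is_valid_model_dir`. The sketch below exercises that rule against a temporary directory tree. `is_valid_model_dir` is a local re-statement of the same check so the example runs on its own, and the `rf/v1/model.pkl` layout is a made-up example, not a layout the project prescribes.

```python
import os
import tempfile


def is_valid_model_dir(path):
    """Same rule as _is_valid_model_dir: True once any file exists at any depth."""
    if not path or not os.path.isdir(path):
        return False
    return any(files for _root, _dirs, files in os.walk(path))


with tempfile.TemporaryDirectory() as root:
    empty_result = is_valid_model_dir(root)         # no files yet
    nested = os.path.join(root, 'rf', 'v1')         # hypothetical layout
    os.makedirs(nested)
    dirs_only_result = is_valid_model_dir(root)     # sub-directories alone do not count
    with open(os.path.join(nested, 'model.pkl'), 'wb') as fh:
        fh.write(b'placeholder')
    nested_file_result = is_valid_model_dir(root)   # a deeply nested file is enough

missing_result = is_valid_model_dir('')             # empty path is rejected up front
print(empty_result, dirs_only_result, nested_file_result, missing_result)
```

Since any file satisfies the check, an unrelated file in the directory would also enable the confirm button; filtering by expected model file extensions would be a stricter variant, if the project's model format allows it.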


@ -0,0 +1,429 @@
# -*- coding: utf-8 -*-
"""
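Interactive precheck dialog: one-shot full precheck + interactive user decision.
After the user clicks "Run", if anything is missing:
- list every missing item (step name + reason)
- offer "fill in" (jump to the panel) and "ignore" (add to skip_list) for each item
- three action buttons at the bottom decide how the flow continues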
"""
import os
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple
from PyQt5.QtWidgets import (
QDialog, QVBoxLayout, QHBoxLayout, QLabel, QPushButton,
QScrollArea, QWidget, QCheckBox, QGroupBox, QFrame,
QSizePolicy, QStyleFactory,
)
from PyQt5.QtCore import Qt
from PyQt5.QtGui import QFont, QColor, QPalette
from src.core.pipeline.runner import PIPELINE_STEPS
@dataclass
class MissingItem:
    """Structured description of a single missing item."""
    step_id: str               # step_id, e.g. "step1", "step8_non_empirical_modeling"
    step_name: str             # display name of the panel tab, e.g. "水域掩膜"
    reason: str                # why it is missing, e.g. "缺少参考影像路径"
    panel_tab_index: int       # tab index in step_stack (used for switching)
    is_critical: bool = False  # whether the gap is blocking (missing img_path = True)
# ============================================================
# PreflightDialog
# ============================================================
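class PreflightDialog(QDialog):
"""Preflight dialog.
For each MissingItem the user can either:
- tick "忽略" (ignore): the step_id is added to skip_list and skipped at run time
- click "填写" (fill in): the dialog closes and the matching panel tab is shown
Dialog result (return value of exec):
- QDialog.Accepted + self.result_data = ("fill", step_id, panel_tab_index)
→ fill in: switch to the target panel and stop the run
- QDialog.Accepted + self.result_data = ("skip", skip_list)
→ force skip: continue the run with skip_list
- QDialog.Rejected
→ cancel: stop completely
"""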
# step_id → (step_name, panel_tab_index)
STEP_TAB_MAP = {
"step1": ("水域掩膜", 0),
"step2": ("耀斑检测", 1),
"step3": ("耀斑去除", 2),
"step4": ("数据清洗", 3),
"step5": ("特征构建", 4),
"step7": ("水质指数", 5),
"step8_non_empirical_modeling": ("回归建模", 7),
"step9": ("水色指数反演", 8),
"step10": ("采样点布设", 10),
"step11_ml": ("监督预测", 11),
"step11": ("回归预测", 12),
"step14": ("专题图生成", 13),
}
def __init__(self, missing_items: List[MissingItem], parent=None):
super().__init__(parent)
self.missing_items = missing_items
self.result_data: Optional[tuple] = None # ("fill", step_id, tab_index) | ("skip", [step_id, ...])
self._skip_checkboxes: List[Tuple[str, QCheckBox]] = [] # (step_id, checkbox) pairs
self._fill_buttons: List[QPushButton] = []
self.setWindowTitle("⚠ 预检发现缺失项")
self.setMinimumSize(680, 420)
self.setModal(True)
self._setup_ui()
# ------------------------------------------------------------------
# UI construction
# ------------------------------------------------------------------
def _setup_ui(self):
main_layout = QVBoxLayout(self)
main_layout.setContentsMargins(20, 20, 20, 16)
main_layout.setSpacing(10)
# ── Header hint ──
header_label = QLabel(
f"检测到 <b>{len(self.missing_items)}</b> 个缺失项,请逐项处理后继续:"
)
header_label.setStyleSheet("font-size: 14px; color: #e67e22; font-weight: bold;")
main_layout.addWidget(header_label)
# ── Scroll area (list of missing items) ──
scroll = QScrollArea()
scroll.setWidgetResizable(True)
scroll.setFrameShape(QFrame.NoFrame)
scroll.setStyleSheet("background: transparent;")
container = QWidget()
container_layout = QVBoxLayout(container)
container_layout.setContentsMargins(0, 0, 8, 0)
container_layout.setSpacing(8)
for item in self.missing_items:
row = self._build_item_row(item)
container_layout.addWidget(row)
container_layout.addStretch()
scroll.setWidget(container)
main_layout.addWidget(scroll, 1)
# ── Bottom action buttons ──
btn_layout = QHBoxLayout()
btn_layout.setSpacing(12)
# Cancel run (left)
cancel_btn = QPushButton("取消运行")
cancel_btn.setCursor(Qt.PointingHandCursor)
cancel_btn.setMinimumHeight(38)
cancel_btn.setStyleSheet(
"QPushButton { background: #95a5a6; color: white; border-radius: 6px; "
"font-weight: bold; font-size: 13px; padding: 4px 16px; }"
"QPushButton:hover { background: #7f8c8d; }"
)
cancel_btn.clicked.connect(self._on_cancel)
btn_layout.addWidget(cancel_btn)
btn_layout.addStretch()
# Force skip and run (middle)
skip_btn = QPushButton("强制跳过运行")
skip_btn.setCursor(Qt.PointingHandCursor)
skip_btn.setMinimumHeight(38)
skip_btn.setStyleSheet(
"QPushButton { background: #3498db; color: white; border-radius: 6px; "
"font-weight: bold; font-size: 13px; padding: 4px 16px; }"
"QPushButton:hover { background: #2980b9; }"
)
skip_btn.clicked.connect(self._on_force_skip)
btn_layout.addWidget(skip_btn)
# Fill in pending items (primary)
fill_btn = QPushButton("填写待办")
fill_btn.setCursor(Qt.PointingHandCursor)
fill_btn.setMinimumHeight(38)
fill_btn.setDefault(True)
fill_btn.setAutoDefault(True)
fill_btn.setStyleSheet(
"QPushButton { background: #27ae60; color: white; border-radius: 6px; "
"font-weight: bold; font-size: 13px; padding: 4px 16px; }"
"QPushButton:hover { background: #1e8449; }"
)
fill_btn.clicked.connect(self._on_fill_first)
btn_layout.addWidget(fill_btn)
main_layout.addLayout(btn_layout)
def _build_item_row(self, item: MissingItem) -> QWidget:
"""Build the row widget for a single missing item."""
frame = QFrame()
frame.setFrameShape(QFrame.StyledPanel)
frame.setStyleSheet(
"QFrame { background: #2c3e50; border-radius: 8px; padding: 10px; }"
"QFrame[critical=true] { border: 2px solid #e74c3c; }"
"QFrame[critical=false] { border: 1px solid #34495e; }"
)
frame.setProperty("critical", item.is_critical)
layout = QVBoxLayout(frame)
layout.setContentsMargins(12, 10, 12, 10)
layout.setSpacing(6)
# ── Row 1: step label, blocking badge and fill button ──
top = QHBoxLayout()
top.setSpacing(8)
# Step-name label
name_label = QLabel(f"📌 {item.step_name}")
name_label.setStyleSheet(
"font-size: 13px; font-weight: bold; color: #f39c12; background: #1a252f; "
"border-radius: 4px; padding: 4px 10px;"
)
top.addWidget(name_label)
# Blocking badge
if item.is_critical:
critical_label = QLabel("阻断")
critical_label.setStyleSheet(
"background: #e74c3c; color: white; border-radius: 4px; "
"font-size: 11px; font-weight: bold; padding: 3px 8px;"
)
top.addWidget(critical_label)
top.addStretch()
# "填写" (fill in) button
fill_btn = QPushButton("填写")
fill_btn.setCursor(Qt.PointingHandCursor)
fill_btn.setFixedWidth(70)
fill_btn.setFixedHeight(28)
fill_btn.setStyleSheet(
"QPushButton { background: #27ae60; color: white; border-radius: 5px; "
"font-size: 12px; font-weight: bold; }"
"QPushButton:hover { background: #1e8449; }"
)
fill_btn.clicked.connect(lambda *a, sid=item.step_id, idx=item.panel_tab_index: self._on_fill(sid, idx))
self._fill_buttons.append(fill_btn)
top.addWidget(fill_btn)
layout.addLayout(top)
# ── Row 2: reason text ──
reason_label = QLabel(item.reason)
reason_label.setWordWrap(True)
reason_label.setStyleSheet(
"font-size: 12px; color: #bdc3c7; background: transparent; padding: 2px 4px;"
)
reason_label.setTextInteractionFlags(Qt.TextSelectableByMouse)
layout.addWidget(reason_label)
# ── Row 3: ignore checkbox ──
bottom = QHBoxLayout()
bottom.addStretch()
skip_cb = QCheckBox("忽略此项(强制跳过)")
skip_cb.setCursor(Qt.PointingHandCursor)
skip_cb.setStyleSheet(
"QCheckBox { color: #95a5a6; font-size: 12px; spacing: 6px; }"
"QCheckBox::indicator { width: 16px; height: 16px; }"
)
skip_cb.setChecked(False)
skip_cb.stateChanged.connect(
lambda state, cb=skip_cb: cb.setStyleSheet(
"QCheckBox { color: #27ae60; font-size: 12px; spacing: 6px; }"
"QCheckBox::indicator { width: 16px; height: 16px; }"
if state else
"QCheckBox { color: #95a5a6; font-size: 12px; spacing: 6px; }"
"QCheckBox::indicator { width: 16px; height: 16px; }"
)
)
self._skip_checkboxes.append((item.step_id, skip_cb))
bottom.addWidget(skip_cb)
layout.addLayout(bottom)
return frame
# ------------------------------------------------------------------
# Slots
# ------------------------------------------------------------------
def _on_cancel(self):
"""Cancel the run: stop completely."""
self.result_data = None
self.reject()
def _on_force_skip(self):
"""Force skip: collect every step_id whose "忽略" (ignore) box is ticked and continue with that skip_list."""
skip_list = [
step_id for step_id, cb in self._skip_checkboxes if cb.isChecked()
]
self.result_data = ("skip", skip_list)
self.accept()
def _on_fill_first(self):
"""Fill in: find the first missing item whose "忽略" (ignore) box is not ticked and switch to its panel."""
for step_id, cb in self._skip_checkboxes:
if not cb.isChecked():
item = self._find_item(step_id)
if item:
self.result_data = ("fill", step_id, item.panel_tab_index)
self.accept()
return
# Every item is ticked → same as force skip
self._on_force_skip()
def _on_fill(self, step_id: str, tab_index: int):
"""Fill in: switch straight to the given panel."""
self.result_data = ("fill", step_id, tab_index)
self.accept()
# ------------------------------------------------------------------
# Helpers
# ------------------------------------------------------------------
def _find_item(self, step_id: str) -> Optional[MissingItem]:
for item in self.missing_items:
if item.step_id == step_id:
return item
return None
def get_result(self) -> Optional[tuple]:
"""Return the dialog result for external callers."""
return self.result_data
@staticmethod
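def build_missing_items(config: dict) -> List[MissingItem]:
"""DAG-aware preflight: build the list of missing items from config.
Topological pre-check:
1. Walk the pipeline in order over every step with enabled=True, collect each
step's produces list, and build the set dynamically_produced_keys.
2. When checking a required_input_file:
- if it already exists on disk → OK (the user supplied it manually)
- if the key is in dynamically_produced_keys → OK (an enabled step is expected to generate it)
- otherwise → MissingItem (genuinely missing)
3. Exemption rules:
- formula_csv_path: fully optional in the underlying code, never checked.
- step5 boundary_path: if step1 is enabled or config contains water_mask_path,
trust the automatic derivation in the panel/underlying code and do not block.
- step14 boundary_shp_path: if step1 is enabled, trust the panel's automatic
back-fill and do not block.
Blocking item (is_critical=True): step1 img_path is missing.
"""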
items: List[MissingItem] = []
step1_cfg = config.get('step1', {})
step1_enabled = step1_cfg.get('enabled', False)
# ── ★ Build the "dynamically produced keys" set: collect produces from every enabled step, in pipeline order ──
dynamically_produced_keys: Set[str] = set()
enabled_step_ids: Set[str] = set()
for step_spec in PIPELINE_STEPS:
step_cfg = config.get(step_spec.step_id, {})
if not step_cfg.get('enabled', True):
continue
enabled_step_ids.add(step_spec.step_id)
dynamically_produced_keys.update(step_spec.produces)
# ── step1 img_path (blocking) ───────────────────────────────
img_path = step1_cfg.get('img_path')
if not img_path:
items.append(MissingItem(
step_id="step1", step_name="水域掩膜",
reason="缺少参考影像路径 → 请在「阶段一」中填写「参考影像」",
panel_tab_index=0, is_critical=True
))
elif not os.path.isfile(img_path):
items.append(MissingItem(
step_id="step1", step_name="水域掩膜",
reason=f"参考影像文件不存在:{img_path}",
panel_tab_index=0, is_critical=True
))
# ── step4 csv_path (purely external input, must be supplied manually) ───────────────
step4_cfg = config.get('step4', {})
step4_enabled = step4_cfg.get('enabled', True)
if step4_enabled:
csv_path = step4_cfg.get('csv_path')
if not csv_path:
items.append(MissingItem(
step_id="step4", step_name="数据清洗",
reason="请在「数据清洗」中填写「实测水质数据 CSV」",
panel_tab_index=3
))
elif not os.path.isfile(csv_path):
items.append(MissingItem(
step_id="step4", step_name="数据清洗",
reason=f"实测水质数据文件不存在:{csv_path}",
panel_tab_index=3
))
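# ── step12 formula_csv_path (always exempt, fully optional in the underlying code) ────────
# The underlying CustomRegressionPredictor runs without formula_csv_path;
# it only affects log output and blocks nothing, so no check is made here.
# ── ★ DAG-aware check: walk the required_input_files of every enabled step ──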
PURE_EXTERNAL_INPUT_KEYS: Set[str] = {'img_path', 'csv_path'}
_TAB_INDEX_MAP: Dict[str, int] = {
"step1": 0, "step2": 1, "step3": 2, "step4": 3,
"step5": 4, "step8": 5, "step7": 6,
"step8_non_empirical_modeling": 7, "step9": 8,
"step10": 9, "step11_ml": 10, "step11": 11,
"step12": 12, "step14": 13,
}
_STEP_NAME_MAP: Dict[str, str] = {
"step1": "水域掩膜", "step2": "耀斑检测", "step3": "耀斑去除",
"step4": "数据清洗", "step5": "特征构建", "step8": "水质指数",
"step7": "监督建模", "step8_non_empirical_modeling": "回归建模",
"step9": "自定义回归建模", "step10": "采样点布设",
"step11_ml": "监督预测", "step11": "回归预测",
"step12": "自定义回归预测", "step14": "专题图生成",
}
for step_spec in PIPELINE_STEPS:
if step_spec.step_id not in enabled_step_ids:
continue
step_cfg = config.get(step_spec.step_id, {})
tab_idx = _TAB_INDEX_MAP.get(step_spec.step_id, 0)
step_name = _STEP_NAME_MAP.get(step_spec.step_id, step_spec.step_id)
for req_key in step_spec.required_input_files:
# ★★★ High-priority hard-coded whitelist ★★★
# For a boundary file, pass as soon as step1 has an image (the underlying code
# can then derive the boundary automatically) or step1 is enabled.
if req_key in ('boundary_path', 'boundary_shp_path'):
if step1_cfg.get('img_path') or step1_cfg.get('enabled', True):
continue # skip directly, not treated as missing
if req_key in PURE_EXTERNAL_INPUT_KEYS:
continue
if req_key == 'formula_csv_path':
continue # ★ fully optional in the underlying code, exempt
if req_key == 'boundary_path' and step_spec.step_id == 'step5':
continue # ★ derived automatically by the panel/underlying code when step1 runs, exempt
if req_key == 'boundary_shp_path' and step_spec.step_id == 'step14':
continue # ★ back-filled automatically by the panel when step1 runs, exempt
cfg_val = step_cfg.get(req_key)
if cfg_val and os.path.isfile(cfg_val):
continue
if cfg_val and os.path.isdir(cfg_val):
continue
if req_key in dynamically_produced_keys:
continue # ★ an enabled step will generate it, topological pre-check passes
items.append(MissingItem(
step_id=step_spec.step_id,
step_name=step_name,
reason=f"缺少必需文件/目录 [{req_key}]",
panel_tab_index=tab_idx,
is_critical=(step_spec.step_id == "step1" and req_key == "img_path"),
))
return items
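
Review note: `dynamically_produced_keys` is filled from every enabled step before any requirement is checked, so a key produced only by a later step (or by the step itself) also counts as available. If the intent is "an earlier step will generate it", one option is to accumulate the produced keys while walking the pipeline. The sketch below is standalone and makes several assumptions: `StepSpec` is a hypothetical stand-in for the entries of `PIPELINE_STEPS` (only the attributes `step_id`, `required_input_files` and `produces` used above are modelled), steps are listed in execution order, and the whitelist rules are left out for brevity.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class StepSpec:
    """Hypothetical stand-in for one entry of PIPELINE_STEPS."""
    step_id: str
    required_input_files: List[str] = field(default_factory=list)
    produces: List[str] = field(default_factory=list)


def find_missing_inputs(
    steps: List[StepSpec],
    config: Dict[str, dict],
    path_exists: Callable[[str], bool] = os.path.exists,
) -> List[Tuple[str, str]]:
    """Return (step_id, key) pairs that no earlier enabled step can supply."""
    produced_so_far: Set[str] = set()
    missing: List[Tuple[str, str]] = []
    for spec in steps:  # assumed to be in execution order
        step_cfg = config.get(spec.step_id, {})
        if not step_cfg.get("enabled", True):
            continue  # a disabled step neither needs nor produces anything
        for key in spec.required_input_files:
            value = step_cfg.get(key)
            if value and path_exists(value):
                continue  # the user already supplied the file or directory
            if key in produced_so_far:
                continue  # an earlier enabled step will generate it
            missing.append((spec.step_id, key))
        # Only after the step is checked do its outputs become available.
        produced_so_far.update(spec.produces)
    return missing


demo_steps = [
    StepSpec("a", required_input_files=["x"], produces=["y"]),
    StepSpec("b", required_input_files=["y", "z"], produces=["x"]),
]
print(find_missing_inputs(demo_steps, {}, path_exists=lambda p: False))
```

In the demo, step `a` needs `x`, which only the later step `b` produces, so it is reported; the all-at-once set used in `build_missing_items` would let it through.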
