DocsAutoResponderProcess

AutoResponder Determination System - Complete Reference#

This document is the single, authoritative reference for the AutoResponderProcess determination system. It covers the end-to-end flow from email extraction through LLM classification to action execution, including all business rules, system update logic, Multipub validation, and implementation details.


Table of Contents#

  1. Executive Summary
  2. Architecture Overview
  3. Complete Process Flow
  4. LLM Classification Pipeline
  5. Category-to-Determination Mapping
  6. Determination Types — Detailed Reference
  7. Multipub Validation (Tarun's Process)
  8. System Update Rules
  9. Run Outputs (Spreadsheets & Logs)
  10. Notifications
  11. Special Cases & Edge Cases
  12. Configuration
  13. Dry-Run Mode
  14. Project Structure
  15. Integration History
  16. Error Handling
  17. Performance Considerations
  18. Testing
  19. Implementation Status & Enhancements
  20. Known Limitations & Risks
  21. Team Responsibilities

Executive Summary#

The AutoResponderProcess is a production-ready system that processes auto-response emails from a central MSSQL repository. It uses an LLM-based classification pipeline (OpenAI or Ollama) to determine the status of each contact, then executes appropriate actions across four database systems: Cupola, Hodor, Multipub, and Salesforce.

Key Principles:

Determination Types (6):

DeterminationPrimary Action
UNKNOWNSkip processing
INACTIVEMark inactive, check subscriptions, notify
REPLACEMENTMark original inactive, add replacement to Cupola, notify
EMAIL_UPDATEChanged Email (no alt) — update email/title in Cupola/Hodor/SF (skip Multipub/SF-if-Multipub)
TITLE_UPDATESame handler as EMAIL_UPDATE; not emitted by map_category_to_determination() (reserved for tests / future use)
ACTIVEEnsure active status in Cupola/Hodor (skip Multipub/SF-if-Multipub)

Architecture Overview#

Original Email Classifier#

Location: Auto_responder_email/unified_email_classifier.py (reference only — functionality fully integrated into main application)

The original program was a FastAPI-based service with a /classify/csv/ endpoint. Its LLM classification and QA agent logic has been extracted and integrated into the main application's determination services.

Current Application#

Location: src/auto_responder/

Workflow: Extract → Deduplicate → LLM determination (Step 4a) → resolve lookup email → contact lookup (Step 4c; stub in dry-run) → execute actions (Step 4d) → notify

(main.py runs classification before contact lookup so UNKNOWN emails skip database lookups.)

Key Features:

LLM Output Categories (9)#

  1. Not Relevant
  2. Flagged as Spam
  3. Undeliverable
  4. Retired
  5. Out of Office (with long-term leave flag)
  6. Deceased
  7. Left Company
  8. Changed Email
  9. Unknown

Application Determination Types (6)#

  1. ACTIVE — Person is active at organization
  2. INACTIVE — Person is no longer at organization
  3. REPLACEMENT — Person has been replaced by someone else
  4. EMAIL_UPDATEChanged Email (no alternate) — only determination the mapper produces for “changed contact info” today; _handle_title_update()
  5. TITLE_UPDATE — Same handler as EMAIL_UPDATE in ActionEngine; the live LLM path does not set this value (mapper never returns it). Use for fixtures/tests or future wiring.
  6. UNKNOWN — Cannot determine action type

Complete Process Flow#

PlantUML Diagram#

Source file: diagrams/business_flow_technical.puml. Pre-rendered exports sit alongside the source: business_flow_technical.svg and business_flow_technical.pdf. Re-render any of the .puml files with python docs/diagrams/_render_business_flow.py [stem] or with any PlantUML-compatible tool. The technical diagram calls out class / method entry points (EmailProcessor.deduplicate_emails(), AutoResponseDetermination.determine(), ContactLookup.lookup_with_enrichment(), ActionEngine.execute() and its _handle_* branches), the run artifacts written at each step, and the notify_* dispatches used for routing. The business-level, non-engineering view lives in diagrams/business_flow_simplified.puml — that is the file to share externally. The two diagrams are kept in sync (business_flow_simplified.puml / business_flow_technical.puml).

Processing Steps#

StepDescriptionKey Classes / notes
Step 1Extract emails from central MSSQL repositoryEmailRepository
Step 2Deduplicate emails by Message IDEmailProcessor
Step 3Initialize services (with graceful degradation)ContactLookup, AutoResponseDetermination, ActionEngine, Notifier
Step 4aLLM determination (classification, QA, map to Determination)AutoResponseDetermination.determine()
If determination is UNKNOWN, skip (no contact lookup, no actions)main.py
Step 4cContact lookup using resolved lookup_email (dry-run: stub Contact, no DB lookup)ContactLookup.lookup_with_enrichment()
Step 4dExecute actions based on determinationActionEngine.execute()

Data Flow Summary#

Email Repository (MSSQL - CBI_EmailExtraction.dbo.Emails)
    ↓
Step 1: Email Extraction (most recent N emails)
    ↓
Step 2: Deduplication (by Message ID)
    ↓
Step 3: Service Initialization
    ├─ ContactLookup (Cupola, Hodor, Multipub, Salesforce)
    │  (graceful degradation — continues if some connectors fail)
    ├─ AutoResponseDetermination (LLM: OpenAI or Ollama)
    ├─ ActionEngine (with all DB connectors + Notifier)
    └─ [Dry-Run: mock Cupola/Multipub/Salesforce + real HodorDatabase; optional mock email repo; ActionLogger + StageLogger]
    ↓
Step 4: Process Each Email
    ↓
┌───────────────────────────────────────────────────┐
│ Step 4a: DETERMINATION (LLM pipeline)               │
│                                                     │
│ 1. Classification Agent → Category + Contact Info │
│ 2. QA Agent → Verification + Corrections            │
│ 3. Email Validation → Clean addresses               │
│ 4. Normalize compound categories → single label     │
│ 5. Category Mapper → Determination enum             │
│ 6. Identify auto-response source email             │
│ 7. Post-Processing → replacement email ≠ sender   │
└───────────────────────────────────────────────────┘
    ↓
    UNKNOWN? → skip (no lookup, no actions)
    ↓
┌───────────────────────────────────────────────────┐
│ Step 4c: Contact Lookup                            │
│ Search: Cupola → Hodor → Multipub → Salesforce     │
│ (graceful degradation; dry-run: stub Contact)      │
│ Returns: Unified Contact with records from all     │
└───────────────────────────────────────────────────┘
    ↓
Step 4d: Action Engine Execution
    ├─→ UNKNOWN → Skip
    │
    ├─→ INACTIVE → ┌──────────────────────────────────────────┐
    │              │ MULTIPUB VALIDATION (Tarun's Process)     │
    │              │ 1. Match by email/user_id or name+org    │
    │              │ 2. Check active subscriptions            │
    │              │ 3. Check recently expired (12 mo)        │
    │              │ 4. Check single-issue purchases (12 mo)  │
    │              │ 5. Flag for review if subscriptions found│
    │              └──────────────────────────────────────────┘
    │              → Mark inactive (Cupola, Hodor only)
    │              → Skip Multipub & Salesforce-if-Multipub
    │              → Check subscriptions, notify if found
    │
    ├─→ REPLACEMENT → Mark original inactive (same logic)
    │                → Add replacement to Cupola
    │                → Notify Sai Teja
    │
    ├─→ EMAIL_UPDATE / TITLE_UPDATE → Update email/title in Cupola/Hodor/SF
    │                → Skip Multipub & Salesforce-if-Multipub
    │
    └─→ ACTIVE → Ensure active (Cupola/Hodor)
               → Add to Cupola if not present
               → Skip Multipub & Salesforce-if-Multipub
    ↓
Notifications (if needed)
    ├─→ Multipub validation flag notification → Client Services
    ├─→ Person left with subscription notification → Max / Client Services
    └─→ New replacement notification → Sai Teja
    ↓
Next Email / Summary Report

LLM Classification Pipeline#

Location in Code: src/auto_responder/main.py → Step 4a
Service: AutoResponseDetermination.determine()

Pipeline Steps#

  1. Classification Agent (LLM)

    • Analyzes email subject and body (truncated to 12,000 chars)
    • Extracts contact information (sender_new_email, alternate_contact, retired_personal_email)
    • Assigns category based on priority order
    • Returns initial classification result with confidence level

  2. QA Agent (LLM)

    • Verifies classification agent's output
    • Checks category correctness and priority order
    • Validates contact information extraction
    • Applies corrections if needed
    • Returns QA-verified result

  3. Email Address Validation

    • Validates sender_new_email and retired_personal_email using regex
    • Cleans and strips whitespace

  4. Normalize LLM category (normalize_llm_category)

    • Models sometimes return compound strings (e.g. "Out of Office, Changed Email").
    • The pipeline collapses these to one canonical label using pattern priority in category_mapper.py (highest business severity first: Undeliverable → Deceased → Retired → Left Company → Changed Email → Out of Office → …).
    • That single label drives map_category_to_determination and the person/email status columns.

  5. Category Mapping

    • Maps normalized LLM category to Determination enum
    • Parses alternate_contact into ReplacementInfo
    • If alternate_contact parsing fails for Retired/Deceased/Left Company → falls back to INACTIVE
    • Creates DeterminationResult (does not set updated_title or TITLE_UPDATE today — see EMAIL_UPDATE / TITLE_UPDATE under Determination Types — Detailed Reference)

  6. Source email identification (AutoResponseDetermination Step 5 in code)

    • identify_auto_response_source_email() sets source_email on the result; main.py prefers it over the raw sender when resolving lookup_email.

    • After Step 1b relay normalization (normalize_relay_email, optional multi-label split when the body is available), reconcile_email_with_body() corrects mangled person addresses (e.g. colleen.bivona.domail@maricopa.edu → signature colleen.bivona@domail.maricopa.edu) at ingest, during source identification (Step 3b before the cleaned-sender fallback), and again when resolving lookup_email.

  7. Post-Processing (alternate contacts vs sender)

    • Removes replacement entries whose replacement_email equals the sender
    • If no replacement entries remain and determination was REPLACEMENT:

      • Final category Retired, Deceased, or Left Company → change determination to INACTIVE
      • Otherwise (e.g. Changed Email with unusable alt) → change determination to UNKNOWN

Contact Information Fields#

FieldSourceUsageUpdates Systems?
sender_new_emailLLM extractionUpdate email in systemsYes
alternate_contactLLM extractionCreate replacement contactYes (Cupola)
retired_personal_emailLLM extractionStore in notes onlyNo
is_long_term_leaveLLM flagLog for reviewNo
new_org_details (new_org_* keys)LLM QA JSON (nested object or flat keys); mapper fallback via _enrich_new_org_detailsoutput_document_inactive_new_org when person movedNo (IP4 research list)
updated_titleNot populated by LLM pipelineWould drive update_contact_title() if setYes (if ever wired)

QA checklist (move / retire categories): Prefer a structured new_org_details object in QA JSON (new_org_name, new_org_email, etc.). When only final_sender_new_email is present, category_mapper._enrich_new_org_details still populates new_org_email for Left Company, Retired, Deceased, and Changed Email so output_document_inactive_new_org is not empty.


Category-to-Determination Mapping#

LLM CategoryPerson StatusEmail StatusDeterminationSpecial Handling
Not RelevantInconclusiveUnknownUNKNOWNSkip
Flagged as SpamInconclusiveUnknownUNKNOWNSkip
UndeliverableInconclusiveInvalidINACTIVEMark email invalid
Retired (no alt)InactiveInvalidINACTIVECheck subscriptions
Retired (with alt)InactiveInvalidREPLACEMENTMark original inactive + add replacement
Out of OfficeActiveValidOUT_OF_OFFICEFirst-class determination. Routed to Human Review digest (HUMAN_REVIEW_REASON_OUT_OF_OFFICE); no CUPOLA / Hodor / Multipub / SF writes — dedicated OOO extraction process will pick these up.
Deceased (no alt)InactiveInvalidINACTIVECheck subscriptions
Deceased (with alt)InactiveInvalidREPLACEMENTMark original inactive + add replacement
Left Company (no alt)InactiveInvalidINACTIVECheck subscriptions
Left Company (with alt)InactiveInvalidREPLACEMENTMark original inactive + add replacement
Changed Email (no alt)ActiveInvalidEMAIL_UPDATEUpdate email/title in Cupola/Hodor/SF (skip Multipub/SF-if-Multipub)
Changed Email (with alt)ActiveInvalidREPLACEMENTAdd replacement
UnknownInconclusiveUnknownUNKNOWNSkip

Compound categories: If the LLM returns multiple labels in one string, normalize_llm_category picks a single row using the severity order above — not necessarily the first phrase in the string.

Fallback: If alternate_contact is present for Retired/Deceased/Left Company but parsing fails, the determination falls back to INACTIVE.


Determination Types — Detailed Reference#

1. UNKNOWN#

Source Categories: Not Relevant, Flagged as Spam, Unknown

Business Logic: These emails are not actionable auto-responses. No database updates or notifications are required.

Process:

┌─────────────────────────────────┐
│ Skip Processing                 │
│ - Log warning                   │
│ - No database updates           │
│ - Continue to next email        │
└─────────────────────────────────┘

Spreadsheet landing (every UNKNOWN row):

Code Location: action_engine.pyexecute() → early return for UNKNOWN; main.py also calls OutputDocumentCollector.add_human_review() before the skip so UNKNOWN never "disappears" from the per-run ledger.


2. INACTIVE#

Source Categories: Undeliverable, Retired (without alt), Deceased (without alt), Left Company (without alt)

Business Rules:

Process:

┌─────────────────────────────────────────────────────────────────────────────┐
│ STEP 1: MULTIPUB VALIDATION (Tarun's Cross-Check Process)                   │
│    Before marking inactive, validate against Multipub:                      │
│                                                                             │
│    a. Match contact by:                                                     │
│       • Email address / User ID (PRIMARY)                                   │
│       • First Name + Last Name + Organization (SECONDARY - with caution)    │
│                                                                             │
│    b. Check subscription scenarios:                                         │
│       • Active subscriptions (multi-issue, eiss > today)                    │
│       • Subscriptions expired within last 12 months                         │
│       • Single-issue purchases within last 12 months                        │
│                                                                             │
│    c. If relevant subscriptions found:                                      │
│       → Flag for review                                                     │
│       → notify_multipub_subscriber_followup_batch() to Angel + Yogesh       │
│       → Proceed with marking inactive (after notification)                  │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 2: Update IPv4 Team Systems (Cupola and Hodor)                         │
│    ├─ Cupola: update_contact_status(False) + check_active_subscription()    │
│    └─ Hodor: update_contact_status(False) + check_active_subscription()     │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 3: ACE Team Systems (Read-Only Approach)                               │
│    ├─ Multipub: **SKIP status update**                                      │
│    │  (Multipub is subscription data only;                                  │
│    │   marketing team handles contact changes)                              │
│    │  Still check_active_subscription() for consistency                     │
│    │                                                                        │
│    └─ Salesforce:                                                           │
│       ├─ If contact IS in Multipub → **SKIP** Contact update               │
│       │  (Contact is related to Multipub subscriber)                        │
│       └─ If contact NOT in Multipub:                                        │
│          → update_contact_status(False)                                     │
│          → check_active_subscription()                                      │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 4: Notifications                                                       │
│    └─ If Active Subscription Found in any system:                           │
│       → notify_multipub_subscriber_followup_batch (catalog N02) │
│       → To: Max / Client Services                                           │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│ STEP 5: Store Additional Information                                        │
│    └─ Store retired_personal_email in notes (if provided, for reference)    │
│       (NOT updated in any system — logged only)                             │
└─────────────────────────────────────────────────────────────────────────────┘

Code Location: action_engine.py_handle_inactive() and _validate_multipub_before_inactive()


3. REPLACEMENT#

Source Categories:

Any INACTIVE category (Retired, Deceased, Left Company) with an alternate_contact present is automatically mapped to REPLACEMENT. If the alternate_contact string cannot be parsed, it falls back to INACTIVE.

Business Rules:

Process:

┌─────────────────────────────────────────────────────┐
│ 1. Validate replacement_info                        │
│    └─ If None → Log warning, return False           │
│                                                      │
│ 2. Mark Original Person INACTIVE                    │
│    └─ Calls _handle_inactive() for original         │
│       (Includes full Multipub validation process)   │
│                                                      │
│ 3. Use replacement_info from DeterminationResult    │
│    └─ Contains: name, email, title                  │
│       Parsed from alternate_contact string           │
│       Format: "Name: John, Email: john@abc.com"      │
│                                                      │
│ 4. Add Replacement to Cupola                        │
│    └─ check_contact_exists → add_contact() or      │
│       reuse existing org_person_id (see connections/cupola.html)  │
│       (IPv4 team system only)                       │
│                                                      │
│ 5. Notify Sai Teja                                  │
│    └─ notify_sai_action_items() (catalog N05/N06)       │
│       About: new replacement person                 │
└─────────────────────────────────────────────────────┘

Code Location: action_engine.py_handle_replacement()


4. EMAIL_UPDATE / TITLE_UPDATE#

Source Categories: Changed Email (without alternate_contact, and not normalized to another label) maps to EMAIL_UPDATE in category_mapper.py.
TITLE_UPDATE: The enum and ActionEngine branch exist and share _handle_title_update() with EMAIL_UPDATE, but map_category_to_determination() never returns TITLE_UPDATE, and create_determination_result() does not set updated_title. Production runs therefore only reach this handler as EMAIL_UPDATE (with sender_new_email from the LLM). Title-only updates via updated_title would require future LLM/schema wiring or manual construction of DeterminationResult.

Business Rules:

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Check for sender_new_email                               │
│                                                              │
│ 2. If sender_new_email Present:                              │
│    └─ Update Email in systems (with skip logic):            │
│       ├─ Multipub → **SKIP**                                │
│       │  (Marketing team handles contact info changes)      │
│       ├─ Salesforce (if in Multipub) → **SKIP**             │
│       │  (Contact is related to Multipub subscriber)        │
│       ├─ Cupola → update_contact_email()                    │
│       ├─ Hodor → update_contact_email()                     │
│       └─ Salesforce (if NOT in Multipub)                    │
│          → update_contact_email()                           │
│                                                              │
│ 3. Log Email Mapping                                        │
│    └─ old_email → new_email                                  │
│                                                              │
│ 4. If updated_title Present:                                │
│    └─ Same skip rules as email; then update_contact_title()  │
│       on Cupola / Hodor / Salesforce where applicable       │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py_handle_title_update()


5. ACTIVE#

Source categories: Left without a clear OOO signal — e.g. emails that still resolve to "active person, no change needed" after QA. OOO is now tracked separately as OUT_OF_OFFICE (see §6).

Business Rules (Active-only policy):

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Lookup CUPOLA rows for contact                           │
│    ├─ No rows                → add_inactive_no_cupola_match │
│    │                            + Human Review              │
│    │                            (ACTIVE_NEW_CONTACT)        │
│    └─ Rows found → for each row:                            │
│        ├─ active  → "Already active" (no-op)                │
│        └─ inactive → Human Review                           │
│                     (REACTIVATION_CANDIDATE)                │
│                                                              │
│ 2. All CUPOLA state changes require human verification;     │
│    no update_contact_status / add_contact calls fire on     │
│    this path.                                               │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py_handle_active()


6. OUT_OF_OFFICE#

Source Categories: Out of Office, Out of Office (Long-term Leave: is_long_term_leave=True)

Business Rules: OOO replies signal nothing actionable for CUPOLA / Hodor / Multipub / Salesforce on the current run. They are surfaced for the planned dedicated out-of-office extraction process.

Process:

┌─────────────────────────────────────────────────────────────┐
│ 1. Add row to Human Review digest                           │
│    reason = HUMAN_REVIEW_REASON_OUT_OF_OFFICE               │
│                                                              │
│ 2. NO writes to any system. NO status updates.              │
└─────────────────────────────────────────────────────────────┘

Code Location: action_engine.py_handle_out_of_office()


Multipub Validation (Tarun's Process)#

Purpose#

Ensure data accuracy across systems by verifying that inactive contacts from the Autoresponse process do not have active or recently relevant subscriptions in Multipub before marking them as inactive.

Applies to: INACTIVE determination (and the INACTIVE step within REPLACEMENT). Runs before marking inactive.

Matching Criteria#

PriorityMethodDescription
PRIMARYEmail / User IDDirect match on email address or user ID
SECONDARYName + OrgFirst Name + Last Name + Organization (use with caution, only if exactly one match found)

Subscription Scenarios Checked#

ScenarioCriteriaSQL Details
Active SubscriptionsMulti-issue orders with future end dateo.stat IN ('01','02','03','04','05','30') AND o.eiss > GETDATE() AND p.SM='M'
Recently ExpiredExpired within last 12 monthso.eiss BETWEEN DATEADD(YEAR,-1,GETDATE()) AND GETDATE() AND p.SM='M'
Single-Issue PurchasesPurchased within last 12 monthso.odte > DATEADD(YEAR,-1,GETDATE()) AND p.SM='S'

Active order queries include both direct orders and site license orders (via pub.site, pub.SITEORD, pub.SITEUSER joins, excluding IP-authenticated site licenses).

Important: The focus is on subscribers (contacts), not order-level criteria. Factors such as paid/unpaid status, comp, trial, or site license orders are NOT used to exclude contacts from the review.

Validation Result Actions#

Validation Flow#

Contact marked as INACTIVE by LLM
    ↓
_validate_multipub_before_inactive()
    ↓
┌─────────────────────────────────────────────────────────┐
│ 1. Find subscriber in Multipub                          │
│    a. Try email match                                   │
│    b. Try user_id match                                 │
│    c. Try name+org match (if single result only)        │
│                                                         │
│ 2. Check subscription scenarios:                        │
│    • _get_active_orders()                               │
│    • _get_recently_expired_orders()                     │
│    • _get_recent_single_issue_orders()                  │
│                                                         │
│ 3. If any relevant subscriptions found:                 │
│    • Set should_flag_for_review = True                  │
│    • Build review_reason summary                        │
│                                                         │
│ 4. Return MultipubValidationResult                      │
└─────────────────────────────────────────────────────────┘
    ↓
If flagged for review:
    → notify_multipub_subscriber_followup_batch()
    → Log detailed findings
    → Continue with inactive marking (with notification)
    ↓
Proceed with _handle_inactive()

SQL Queries#

Active Orders (Multi-issue + Site License):

sql
-- Direct orders
SELECT o.onum, o.subsnum, o.prdcde, p.prdtyp, p.gprdtyp,
       o.stat, o.odte, o.eiss, o.pamt, o.trm, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
WHERE o.subsnum = ?
  AND o.stat IN ('01','02','03','04','05','30')
  AND o.eiss > GETDATE()
  AND p.SM = 'M'

-- Site license orders (additional complexity)
-- Includes JOIN to pub.site, pub.SITEORD, pub.SITEUSER
-- Excludes IP-based site licenses

Expired Orders (Last 12 Months):

sql
SELECT o.onum, o.subsnum, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
INNER JOIN pub.sc sc ON sc.prdcde = o.prdcde AND sc.eiss = o.eiss
WHERE o.subsnum = ?
  AND o.stat NOT IN ('01','02','03','04','05','30')
  AND o.eiss BETWEEN DATEADD(YEAR,-1,GETDATE()) AND GETDATE()
  AND p.SM = 'M'

Single-Issue Orders (Last 12 Months):

sql
SELECT o.onum, o.subsnum, ...
FROM pub.ord o
INNER JOIN pub.prd p ON p.prdcde = o.prdcde
WHERE o.subsnum = ?
  AND o.odte > DATEADD(YEAR,-1,GETDATE())
  AND p.SM = 'S'

Notification Format#

When a contact is flagged for review, the notification includes:

Subject: [REVIEW REQUIRED] Multipub Validation Flag: John Doe

Contact: John Doe (john.doe@example.com)
Multipub Subscriber: 82025008
Match Method: email

=== SUBSCRIPTION FINDINGS ===

[ACTIVE SUBSCRIPTIONS] (2 found):
  - Order 12345: Product Name ABC (expires 2026-12-31)
  - Order 12346: Product Name XYZ (expires 2027-06-30) [Site License]

[RECENTLY EXPIRED] (1 found):
  - Order 12340: Product Name DEF (expired 2025-08-15)

[RECENT SINGLE-ISSUE PURCHASES] (3 found):
  - Order 12350: Single Issue Product (purchased 2025-10-01)
  ...

=== ACTION REQUIRED ===
Please review the above subscription details and verify that marking
this contact as inactive is appropriate.

Review Reason: 2 active subscription(s); 1 recently expired; 3 recent single-issue

Code Location: multipub.pyvalidate_for_inactive(), action_engine.py_validate_multipub_before_inactive()


System Update Rules#

SystemINACTIVEREPLACEMENTEMAIL_UPDATE / TITLE_UPDATEACTIVE
Cupola✅ Mark inactive✅ Mark original inactive + Add replacement✅ Update email/title✅ Mark active (or add new)
Hodor✅ Mark inactive (Prospect Import row plus AutoRenewal SQL by email in live mode) — or email list only if HODOR_AUTOMATIC_UPDATES=false✅ Mark original inactive (via _handle_inactive) — or email list✅ Update email/title — or email list✅ Mark active if inactive — or email list
MultipubSKIP (subscription data only)SKIP (via _handle_inactive)SKIP (marketing handles)SKIP (subscription data only)
Salesforce⚠️ Only if NOT in Multipub⚠️ Only if NOT in Multipub (via _handle_inactive)⚠️ Only if NOT in Multipub⚠️ Only if NOT in Multipub

Reasoning: Multipub is a subscription management system — the marketing team is responsible for updating contact information there. Salesforce Contacts linked to Multipub subscribers are managed through the Multipub→Salesforce sync, so direct updates would create conflicts.

Database Methods#

Cupola (IPv4 Team)

Hodor (IPv4 Team)

Multipub (ACE Team)

Salesforce (ACE Team)


Run Outputs (Spreadsheets & Logs)#

Every run writes its artifacts to a fresh processing_reports/run_{timestamp}/ folder. The invariant is simple: every email gets a row on the Processing Report (master); uncertain rows also get a row on Human Review; automated writes produce an audit bundle for Venu.

Per-email ledgers (always written)#

FileRowsPurpose
processing_report_master.csv / .jsonEvery emailThe run's ledger of record. One row per email with its final determination, outcome, systems touched, and errors.
processing_report_ip4.csvIP4-actionable categories onlyFiltered copy for Sai's team (Out of Office, Retired, Deceased, Left Company, Changed Email). Uses the IP4 column subset and order (see ReportGenerator._CSV_COLUMNS_IP4); the master CSV remains the full ledger.
processing_report.logEvery emailHuman-readable per-email log.
category_summary_report.csv / .jsonEvery emailCoarse category view (Undeliverable / Left Company / OOO / Changed Email / Other) for reporting.
classifier_output/classifier_output.csv / .jsonEvery classified emailRaw LLM + QA payloads.
batch_report.html / batch_report.pptxRun summaryDashboard + deck.

Outcome-specific spreadsheets (only when rows exist)#

FilePopulated whenConsumer
output_document_human_review.csv / .jsonUNKNOWN, no-match-actionable, bounce awaiting the Max + Vish rule, or replacement parse fallbackSai action items + Venu audit metadata (notify_sai_action_items / notify_venu_cupola_audit_files)
output_document_inactive_people.csv / .jsonINACTIVE actioned (including cross-verification rows where the person was already inactive)Internal ops traceability; Sai action items when applicable. Marketing receives separate *_NoLongerThere_*.csv via N04 (not this file).
{BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csvSame population as inactive people (deduped emails per business-unit label)Marketing team (notify_marketing_suppression, N04) — SFMC suppression import
output_document_alternate_contacts.csv / .jsonREPLACEMENT or ACTIVE-with-replacementSai Teja (notify_sai_action_items (catalog N05/N06) — output_document_alternate_contacts.csv)
output_document_inactive_new_org.csv / .jsonEMAIL_UPDATE where the person also moved organisationsSai Teja (notify_sai_action_items SECTION 3)
output_document_inactive_no_cupola_match.csv / .jsonINACTIVE-style outcome with no CUPOLA matchSai Teja (notify_sai_action_items SECTION 2)
output_document_domain_matches.csv / .jsonDomain fallback found CUPOLA org context but multiple candidate peopleSai Teja / IP4 manual selection queue
output_document_shared_email_reviews.csv / .jsonExact email maps to multiple CUPOLA people (shared/designation mailbox)Sai Teja / IP4 manual selection queue
output_document_undeliverables.csv / .jsonBounce / UndeliverableRun directory; Client Services notify_multipub_subscriber_followup_batch (N02) when Multipub evidence applies — not emailed on N04
output_document_multipub_audit.csv / .jsonEvery INACTIVE row that hit the Multipub gateRun directory only (not bulk-emailed); Tarun gets 1.2 / upload Yes-path
output_document_email_update_requests.csv / .jsonChanged-Email rowsRun directory only (not emailed to marketing)

Venu audit bundle (always written; attached to Venu's end-of-run email)#

FilePurpose
cupola_audit_log.csv / .jsonEvery CUPOLA status change + contact addition recorded this run. Always written, even when empty, so Venu gets a receipt every run.
cupola_audit_log_rollback_plan.csvSQL to revert each auto-applied CUPOLA status change. Only written when auto-apply actually ran.
hodor_prospect_import.csvWritten when HODOR_AUTOMATIC_UPDATES=false; emailed via notify_hodor_prospect_import (N01 To Venu).

Which outputs each outcome lands on#

OutcomeMasterIP4 copyHuman ReviewOutcome-specific file(s)Venu audit bundle
UNKNOWN✅ (no writes happened)
No match (actionable, non-inactive)
No match (inactive-style)Inactive — no CUPOLA match
ACTIVE(long-leave → Client Services only)
EMAIL_UPDATE / TITLE_UPDATEEmail Update Requests; Inactive at New Org (if moved)
INACTIVEInactive People; Multipub Audit (when gated)
REPLACEMENT☑ parse fallback onlyInactive People + Alternate Contacts (± Multipub Audit)
Bounce✅ until Max + Vish rule landsUndeliverables

Notifications#

NotificationTriggerRecipientData / Attachments
notify_multipub_subscriber_followup_batch()INACTIVE / left-org with determined sender + Multipub evidence (active / 12mo sub / 12mo purchase)Angel + Yogesh (Matt Cc)multipub_subscriber_followup.csv
notify_tarun_undetermined_sender_review()Sender email could not be determined for Multipub routingTaruntarun_undetermined_senders.csv
notify_multipub_subscriber_followup_from_upload()Tarun upload API outcome=yesAngel + Yogesh (Matt Cc; Tarun Cc)Uploaded CSV
notify_hodor_prospect_import()HODOR_AUTOMATIC_UPDATES=false and queued prospect-import rowsHodor (Cc Sai)hodor_prospect_import.csv
notify_marketing_suppression()End of run when *_NoLongerThere_*.csv deliverable(s) existMarketing team (N04){BusinessUnit}_NoLongerThere_{YYYY-MM-DD}.csv (SFMC import columns)
notify_sai_action_items()Replacements, no-Cupola-match, inactive-new-org, actionable HR rowsSaiSection CSVs + output_document_human_review_action_items.csv
notify_venu_cupola_audit_files()End of every runVenu (Cc Sai, Max)cupola_audit_log.csv, rollback plan, HR CSV when present
notify_run_audit_for_ip4()End of every runSaiProcessing log, master + IP4 CSV, category summary, batch HTML, impact.txt; backlog snapshot in body when available

Live email routing: see NOTIFICATIONS_CATALOG.html.

Batching: Most operational emails above are queued during processing and sent once per category at end of run via ActionEngine.flush_batch_notifications() (dry-run: logged only). Cupola audit attachments to Venu are separate. See NOTIFICATIONS_REFERENCE.html for recipient env vars and the full model.


Special Cases & Edge Cases#

ScenarioCurrent BehaviorStatus
Changed Email + Alternate ContactMaps to REPLACEMENT✅ Implemented
Out of Office + Alternate ContactMaps to OUT_OF_OFFICE; Human Review digest only, no system writes; alternates captured by _collect_alternate_contacts where applicable✅ Implemented
Deceased + Replacement ContactMaps to REPLACEMENT✅ Implemented
Left Company + Replacement ContactMaps to REPLACEMENT✅ Implemented
Retired + Replacement ContactMaps to REPLACEMENT✅ Implemented
Replacement email matches senderClears replacement_info, changes to UNKNOWN✅ Implemented
Alternate contact parsing failsFalls back to INACTIVE✅ Implemented

Case: Out of Office with Replacement Contact#

"I'm out of office, contact John at john@company.com"

Case: Retired with Personal Email#

"I'm retiring, keep in touch at personal@gmail.com"

Case: Undeliverable Email#

Bounce message indicating email is invalid.


Configuration#

Environment Variables#

bash
# Required for LLM classification
OPENAI_API_KEY=your_openai_api_key_here

# Optional (defaults shown)
CLASSIFIER_BACKEND=openai       # or "ollama"
OLLAMA_MODEL=mistral-nemo:12b
OLLAMA_URL=http://localhost:11434/api/chat
MAX_CONCURRENCY=10
MOCK_ALL_SERVICES_FOR_DEV=false
WRITE_OPERATIONS_ENABLED=true

LLM Configuration#


Operating Modes#

The application supports three operating modes:

Dev mock-all (MOCK_ALL_SERVICES_FOR_DEV=true)#

Cupola, Multipub, and Salesforce use mock connectors; SFMC uses a mock client. HodorDatabase is the real SQL connector (graceful-unavailable if HODOR_* / HODOR_CONNECTION_STRING is not set — not a MockHodor class). The email repository is mock by default; set dry_run_use_real_email_repository=true to read the real inbox. Step 4c contact lookup is skipped (main.py uses a minimal stub Contact so downstream steps can run). LLM classification is real. Operations are logged to dry_run_output.log where applicable.

Simulation — no writes (WRITE_OPERATIONS_ENABLED=false)#

Live database reads; writes are intercepted via ReadOnlyDatabaseWrapper for Cupola, Multipub, Salesforce, and Hodor when each connector is live and available. SFMC uses a mock client. Salesforce attempts a real connection with mock fallback. Write operations are logged to dry_run_output.log.

Live Mode (default)#

Real database connections and writes (with mock fallbacks where the code substitutes mocks after failed connectivity checks).

Mode Comparison#

ComponentLive ModeSimulation (no writes)Dev mock-all
Email Extraction (Step 1)Real MSSQL repositoryReal MSSQL repositoryMock by default; real if dry_run_use_real_email_repository=true
Cupola / Multipub / SalesforceReal reads and writesReal reads; writes intercepted (logged)Mock connectors
HodorReal SQL when configuredReal reads; writes intercepted when wrappedReal HodorDatabase when configured (not mocked); Step 4c contact lookup skipped — main.py uses a stub Contact
SFMCLive or mock fallbackMock clientMock
LLM ClassificationReal: OpenAI / OllamaReal: OpenAI / OllamaReal: OpenAI / Ollama
NotificationsSent via SMTPLogged only (no emails sent)Logged only (no emails sent)
Action LoggingStandard loggingActionLogger → dry_run_output.logActionLogger → dry_run_output.log
Stage LoggingStageLogger → stage_execution.logStageLogger → stage_execution.logStageLogger → stage_execution.log

Project Structure#

Core Application#

FileDescription
src/auto_responder/main.pyMain entry point — orchestrates all steps
src/auto_responder/config.pyApplication settings (Pydantic BaseSettings, loaded from .env)

Models#

FileDescription
src/auto_responder/models/email.pyAutoResponseEmail — email data model
src/auto_responder/models/contact.pyContact, ContactRecord — unified contact model
src/auto_responder/models/determination.pyDetermination enum, DeterminationResult, ReplacementInfo
src/auto_responder/models/output_document.pyInactivePersonRecord, AlternateContactRecord, InactiveNewOrgRecord, InactiveNoCupolaMatchRecord, UndeliverableRecord, MultipubAuditRecord, HumanReviewRecord — output document models
src/auto_responder/models/subscription.pyMultipubValidationResult, MultipubOrder, MultipubSubscriber

Processors#

FileDescription
src/auto_responder/processors/action_engine.pyAction execution engine — routes determinations to handlers
src/auto_responder/processors/contact_lookup.pyUnified contact lookup across all databases
src/auto_responder/processors/email_processor.pyEmail deduplication and filtering

Services — Determination#

FileDescription
src/auto_responder/services/determination/autoresponsedetermination.pyMain determination service (LLM pipeline orchestrator)
src/auto_responder/services/determination/llm_classifier.pyLLM classification and QA agents (OpenAI / Ollama)
src/auto_responder/services/determination/category_mapper.pyCategory → Determination mapping, alternate_contact parsing
src/auto_responder/services/determination/mock_classifier.pyMock LLM classifier for dry-run mode (keyword matching)

Services — Databases#

FileDescription
src/auto_responder/services/databases/base.pyBase database class (retry logic, health checks, connection enhancement)
src/auto_responder/services/databases/cupola.pyCupola SQL Server connector (IPv4 team)
src/auto_responder/services/databases/hodor.pyHodor SQL Server connector (IPv4 team)
src/auto_responder/services/databases/multipub.pyMultipub SQL Server connector + subscription validation (ACE team)
src/auto_responder/services/databases/salesforce.pySalesforce REST API connector (ACE team)
src/auto_responder/services/databases/mock.pyMock database connectors for dry-run mode
src/auto_responder/services/databases/readonly_wrapper.pyRead-only wrapper — live reads, mocked writes (for read-only mode)

Services — Email & Notifications#

FileDescription
src/auto_responder/services/email_repository.pyMSSQL email extraction from central repository
src/auto_responder/services/mock_email_repository.pyMock email repository for dry-run mode
src/auto_responder/services/email_service.pySMTP email sending service
src/auto_responder/services/notifications/notifier.pyNotification service (Client Services, Sai Teja, Venu, Multipub validation flags)
src/auto_responder/services/sfmc/client.pySFMC REST client — suppression lists (live)
src/auto_responder/services/sfmc/mock.pySFMC mock client (read-only mode, dry-run)

Services — Outlook (Deprecated)#

FileDescription
src/auto_responder/services/outlook/graph_client.pyMicrosoft Graph API client — DEPRECATED (replaced by central SQL server)

Utilities#

FileDescription
src/auto_responder/utils/action_logger.pyComprehensive operation logging for dry-run and read-only modes
src/auto_responder/utils/stage_logger.pyStage-by-stage execution logging (always enabled)
src/auto_responder/utils/logging_config.pyLogging configuration and correlation context
src/auto_responder/utils/output_document_collector.pyCollects records for the three business deliverable lists
src/auto_responder/utils/output_document_generator.pyGenerates CSV/JSON output documents from collected records
src/auto_responder/utils/report_generator.pyPer-email processing report generation (log + JSON + CSV)

Original Reference#

FileDescription
Auto_responder_email/unified_email_classifier.pyOriginal LLM classification program (reference only, integrated into services)

Integration History#

The LLM-based email classification from the original Auto_responder_email/unified_email_classifier.py was fully integrated into the AutoResponderProcess. Below is a summary of the integration phases.

Phase 1: Core Integration ✅#

ComponentFileChanges
Model Enhancementsmodels/determination.pyAdded sender_new_email, is_long_term_leave, retired_personal_email, confidence fields
Configurationconfig.pyAdded OpenAI, Ollama, and concurrency settings
LLM Classifierservices/determination/llm_classifier.pyCreated — extracted from original, pure service implementation
Category Mapperservices/determination/category_mapper.pyCreated — maps LLM categories to Determination enum
Determination Serviceservices/determination/autoresponsedetermination.pyReplaced placeholder with full LLM integration

Phase 2: Action Enhancements ✅#

ComponentFileChanges
Database Connectorsservices/databases/*.pyAdded update_contact_email() to all connectors
Action Engineprocessors/action_engine.pyEnhanced all handlers; added Multipub validation integration
Subscription Modelsmodels/subscription.pyCreatedMultipubValidationResult, MultipubOrder, MultipubSubscriber
Multipub Connectorservices/databases/multipub.pyMajor enhancements for Tarun's validation queries
Notifierservices/notifications/notifier.pyAdded Multipub validation notification
Dependenciesrequirements.txtAdded openai>=1.12.0

Error Handling#

Database Update Failures#

Notification Failures#

LLM Classification Failures#

Contact Lookup Failures#

Common Issues & Solutions#

IssueSolution
Replacement contact matches senderClear replacement_info, change to UNKNOWN
Invalid email formatValidate with regex before updating
Database connection timeoutRetry with exponential backoff
LLM returns malformed JSONReturn UNKNOWN, log for review
Contact not found in any systemLog warning, skip processing

Performance Considerations#

Batch Processing#

Database Connections#

Async/Sync Bridge#

Caching#


Testing#

Test Scenarios#

Test CaseInputExpected DeterminationExpected Actions
Retired PersonEmail with "retirement" keywordINACTIVEMark inactive, check subscriptions, notify if needed
Out of OfficeEmail with "out of office" keywordOUT_OF_OFFICEHuman Review digest only; no system writes
Changed EmailEmail with "new email is X"EMAIL_UPDATEUpdate email/title in Cupola/Hodor/SF (skips per rules)
Replacement ContactEmail with "contact John at john@co.com"REPLACEMENTMark inactive, add replacement, notify
UndeliverableBounce messageINACTIVEMark inactive, check subscriptions
Not RelevantSystem notificationUNKNOWNSkip processing

Testing Checklist#


Implementation Status & Enhancements#

Core Functionality (Implemented)#

Enhancements Needed#

High Priority

Medium Priority

Low Priority

Future Enhancements#

  1. Machine Learning Model — Train custom model on historical classifications to reduce LLM dependency
  2. Workflow Automation — Automatic approval for high-confidence; manual review queue for low-confidence
  3. Advanced Contact Matching — Fuzzy matching, duplicate detection, contact relationship mapping
  4. Reporting Dashboard — Real-time processing status, historical trends, error analysis

Known Limitations & Risks#

  1. API Costs: LLM calls have per-email costs (OpenAI)
  2. Latency: LLM calls add processing time per email
  3. Rate Limits: Must respect OpenAI API rate limits
  4. Malformed Responses: LLM may return unparseable JSON (falls back to UNKNOWN)
  5. Async/Sync Bridge: asyncio.run() may conflict with existing event loops
  6. Title Updates: update_contact_title() exists on connectors and _handle_title_update() applies it when DeterminationResult.updated_title is set; the LLM pipeline does not populate updated_title today, and TITLE_UPDATE is not returned by the category mapper — title-only automation needs future wiring
  7. Multipub Linked Server: Queries use OPENQUERY to multipub_live linked server — must be properly configured
  8. Error Recovery: Transient API failures fall back to UNKNOWN; no retry logic yet

Team Responsibilities#

SystemTeamContact
CupolaIPv4-
HodorIPv4-
MultipubACETarun (validation queries)
SalesforceACE-
LLM Classification-Shubham
Action Engine--
Notifications (Multipub flags)-Max / Client Services (recipient)
Notifications (Replacements)-Sai Teja (recipient)

Usage#

The integration is transparent — the main workflow remains the same:

python
from auto_responder.main import main

# Process 10 most recent emails (default)
main()

# Process a specific number of emails
main(max_results=25)

Operating Modes#

The application supports three modes controlled by environment variables:

ModeEnvironment VariableDescription
Live(default)Real database connections and writes
Simulation (no writes)WRITE_OPERATIONS_ENABLED=falseLive database reads; writes intercepted via ReadOnlyDatabaseWrapper (Cupola, Multipub, Salesforce, Hodor when configured) — operations logged
Dev mock-allMOCK_ALL_SERVICES_FOR_DEV=trueMock Cupola/Multipub/Salesforce/SFMC; real HodorDatabase; optional real email repository; real LLM; contact lookup skipped in main.py
Hodor manualHODOR_AUTOMATIC_UPDATES=falseHodor reads still run; Hodor writes are not executed — intended changes are emailed to Venu (NOTIFICATION_EMAIL_VENU, default Venu). Not the same as simulation (no-write) mode.

Note: Hodor uses a real SQL connector, not MockDatabase. Salesforce gracefully falls back to a mock if connection fails. See connections/README.html and connections/hodor.html.


Document Version: 3.6
Date: April 20, 2026
Status: Complete ✅
Last Updated: May 3, 2026 — Documented that processing_report_ip4.csv uses a fixed 23-column IP4 layout (ReportGenerator._CSV_COLUMNS_IP4) while the master CSV remains the full ledger; Client Services run-report email copy aligned. (Earlier April 20, 2026 update: Run Outputs section, Human Review, Venu audit bundle, notifications table.)