Forensic validation

Service: Tick Data QC (daily files)

We perform audit-grade checks on your tick files: strict UTC parsing, time-ordering and duplicate detection, gap-tolerance and weekend-tick checks, realistic spread caps, and price-jump and volume sanity checks. Results are delivered in a clear, actionable report.

  • Formats: .csv, .csv.gz, .xlsx, .zip
  • UTC parsing
  • Encoding: UTF-8/latin1/CP1250/auto
  • Gaps & weekend checks
  • Spread & jump sanity

How we work (service)

  1. Quick preview (optional) — send 1–2 files; receive a mini QC snapshot (free).
  2. Scope & NDA — we sign an NDA and agree on thresholds (gap, spread cap, jump %, etc.).
  3. Forensic QC — we run full checks across your archives (by symbol/month).
  4. Delivery — you get a qc_report.txt, diagnostics logs, and an action list (delete/move/fix).
  5. Follow-up — optional cleanup or re-org; handoff to Bar Generator for reliable M1…D1 bars.

Key features

Robust format support

CSV, CSV.GZ, XLSX and ZIP (the first CSV/CSV.GZ found inside the archive is read); XLSB is optional.

Strict UTC time parsing

Auto-detects time column, parses tz-aware UTC, validates monotonicity & duplicates.
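As an illustration only, the parsing and ordering step could look roughly like the sketch below. The helper names `parse_utc` and `ordering_issues` are hypothetical, and the choice to treat timestamps without an offset as UTC is an assumption, not necessarily how the service behaves.

```python
from datetime import datetime, timezone

def parse_utc(text):
    """Parse an ISO-8601 string into a tz-aware UTC datetime (hypothetical helper)."""
    # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalise it.
    ts = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if ts.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)

def ordering_issues(stamps):
    """Count backwards steps and repeated timestamps between neighbouring ticks."""
    pairs = list(zip(stamps, stamps[1:]))
    return {
        "unsorted": sum(1 for a, b in pairs if b < a),
        "duplicates": sum(1 for a, b in pairs if b == a),
    }

stamps = [parse_utc(s) for s in (
    "2024-01-03 12:00:00.100",
    "2024-01-03 12:00:00.100",    # same instant as the previous tick
    "2024-01-03T12:00:00.050Z",   # earlier than the previous tick
    "2024-01-03 14:00:01+02:00",  # 12:00:01 UTC after conversion
)]
print(ordering_issues(stamps))
```

Comparing only neighbouring ticks keeps the check single-pass; a real run over chunked files would also need to carry the last timestamp from one chunk into the next.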

Weekend & gap rules

Detects weekend ticks and flags gaps above configurable thresholds.

Spread & jump sanity

Flags unrealistic spreads and large single-tick price jumps; checks volume bounds.

Intelligent encoding

Auto detection (chardet) with safe latin1 fallback; per-file overrides.
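A minimal sketch of this kind of fallback chain is shown below. The function name `decode_bytes` and its `override` parameter are hypothetical; the sketch uses `chardet.detect` when the package is installed and otherwise skips straight to UTF-8, then latin1, which can decode any byte sequence.

```python
def decode_bytes(raw, override=None):
    """Return (text, encoding_used) for raw file bytes (hypothetical helper)."""
    if override:
        # A per-file override wins over any detection.
        return raw.decode(override), override
    try:
        import chardet  # third-party; optional in this sketch
        guess = chardet.detect(raw).get("encoding")
    except ImportError:
        guess = None
    for enc in (guess, "utf-8"):
        if not enc:
            continue
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    # latin1 maps every byte to a character, so this never raises.
    return raw.decode("latin1"), "latin1"

text, used = decode_bytes("Zürich;1.1000;1.1001".encode("cp1250"), override="cp1250")
print(used, text)
```

Because latin1 never fails, a file decoded through the fallback can still contain wrong characters in text columns; numeric price and time columns are normally unaffected.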

Fast reporting

TXT report with path, status, size, row count and error summary; perf logs included.


Checks performed

  • Time column & parsing: auto-detect, tz-aware UTC, NaT detection.
  • Ordering & duplicates: monotonic time, cross-chunk duplicate/ordering checks.
  • Weekend ticks: flags ticks timestamped between Friday 22:00 and Sunday 22:00 UTC.
  • Gap tolerance: configurable max gap (e.g. 2s) with per-file summary.
  • NaN & invalid values: prices ≤ 0, negative volumes, missing fields.
  • Spread realism: (Ask−Bid)/Bid ≤ cap (e.g. 0.5%).
  • Price jumps: flags single-tick price changes above a threshold (e.g. > 1.00%).
  • Date consistency: file name date vs. last timestamp; multi-day file detection.
Conservative defaults out-of-the-box. Thresholds can be tuned per symbol/venue.
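To make the rules above concrete, here is a simplified sketch of the weekend, gap, spread and jump checks using the example thresholds from the list. The names `is_weekend` and `check_ticks`, the `(timestamp, bid, ask)` tuple layout, and measuring jumps on the bid price are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

MAX_GAP = timedelta(seconds=2)   # example gap tolerance
SPREAD_CAP = 0.005               # 0.5% of bid
JUMP_CAP = 0.01                  # 1.00% single-tick move

def is_weekend(ts):
    """True for Friday 22:00 UTC up to (not including) Sunday 22:00 UTC."""
    wd, hour = ts.weekday(), ts.hour  # Monday=0 ... Friday=4, Saturday=5, Sunday=6
    return (wd == 4 and hour >= 22) or wd == 5 or (wd == 6 and hour < 22)

def check_ticks(ticks):
    """ticks: iterable of (utc_datetime, bid, ask). Returns distinct issues in order."""
    errors, prev = [], None

    def flag(msg):
        if msg not in errors:
            errors.append(msg)

    for ts, bid, ask in ticks:
        if bid <= 0 or ask <= 0:
            flag("Price <= 0")
            continue  # skip ratio checks for an invalid tick
        if is_weekend(ts):
            flag("Weekend tick")
        if (ask - bid) / bid > SPREAD_CAP:
            flag("Spread > cap")
        if prev is not None:
            prev_ts, prev_bid = prev
            if ts - prev_ts > MAX_GAP:
                flag("Gap > 2s")
            if abs(bid - prev_bid) / prev_bid > JUMP_CAP:
                flag("Jump > 1%")
        prev = (ts, bid)
    return errors

t0 = datetime(2024, 1, 3, 12, 0, 0, tzinfo=timezone.utc)  # a Wednesday
sample = [
    (t0, 1.1000, 1.1001),
    (t0 + timedelta(seconds=1), 1.1001, 1.1002),
    (t0 + timedelta(seconds=5), 1.1300, 1.1301),  # 4 s gap and a ~2.7% jump
]
print(check_ticks(sample))
```

This sketch would also report the normal weekend market close as a gap; a production check would presumably exclude the weekend window from gap detection.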

What you get

  • qc_report.txt — tab-separated summary: path, status, size_B, rows, errors.
  • Rotating .log files with per-file diagnostics and performance metrics.
  • Action list — recommended deletes/moves and next-step guidance.

Tip: Keep monthly folders per symbol to speed up scanning and lifecycle rules.

Security & NDA and Quote requirements

Security & NDA

  • We sign an NDA before any data transfer.
  • Processing in an isolated environment; only results are shared.
  • No data reselling or third-party sharing.

What we need for a quote

  • Folder structure (root/symbol/year/month…).
  • Approximate file counts & average sizes (per symbol/month).
  • Preferred thresholds: max gap, spread cap, jump %, etc.

QC report (fields)

Field     Description
path      Full file path processed.
status    OK or BAD depending on checks.
size_B    File size in bytes.
rows      Row count (fast path for CSV/ZIP; header-aware).
errors    Semicolon-joined issues (e.g., NaN in prices; Unsorted time; Gap > 2s).
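For teams that want to post-process the report, a small reader might look like the sketch below. It assumes the five tab-separated fields appear in the order listed above and tolerates an optional header row; the function name `read_report` and the sample paths are hypothetical.

```python
import csv
import io

FIELDS = ["path", "status", "size_B", "rows", "errors"]

def read_report(lines):
    """Parse tab-separated report lines into dicts (hypothetical helper)."""
    rows = []
    for rec in csv.reader(lines, delimiter="\t"):
        if not rec or rec == FIELDS:
            continue  # skip blank lines and an optional header row
        row = dict(zip(FIELDS, rec))
        row["size_B"] = int(row["size_B"])
        row["rows"] = int(row["rows"])
        # The errors column is semicolon-joined; an OK file has none.
        row["errors"] = [e.strip() for e in row.get("errors", "").split(";") if e.strip()]
        rows.append(row)
    return rows

sample_report = (
    "path\tstatus\tsize_B\trows\terrors\n"
    "C:\\Data\\Ticks\\EURUSD\\2024\\01\\a.csv\tOK\t1024\t100\t\n"
    "C:\\Data\\Ticks\\EURUSD\\2024\\01\\b.csv\tBAD\t2048\t200\tUnsorted time; Gap > 2s\n"
)
report = read_report(io.StringIO(sample_report))
bad = [r for r in report if r["status"] == "BAD"]
print(len(report), [r["path"] for r in bad])
```

Filtering on `status == "BAD"` gives a quick list of files to review against the action list.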

For technical teams (informational)

Example CLI (informational)
conda activate forexenv
python check_tick_files.py ^
       "C:\Data\Ticks" ^
       --max-gap 2 --parallel auto --report qc_report.txt --encoding auto

Tune --spread-max, --xlsx-chunksize, --csv-chunksize, and post-actions --delete-bad/--move-bad as needed.

Related service

After cleaning, we can generate reliable M1…D1 bars from your tick data.

Bar Generator (tick → M1…D1)

FAQ

Do you share my data?

No. We sign an NDA and work in an isolated environment; only results are shared.

How long does it take?

It depends on scope (number of symbols and months). Typical turnaround is 2–5 business days per batch.

Can I get a sample?

Yes — send 1–2 files and we’ll provide a mini QC report.

Need a forensic sweep of your tick archives?

Send your root folders and preferred thresholds — we’ll run QC and deliver a clear action report.

Request a Quote