Forex Tick Data QC (daily files)
Audit-grade Forex tick data QC for backtesting and model training: strict UTC parsing, time ordering & duplicate detection, gap thresholds and weekend-tick checks, realistic spread caps, price-jump & volume sanity — delivered with a clear, actionable report.
How we work (service)
- Quick preview (optional) — send 1–2 files; receive a mini QC snapshot (free).
- Scope & NDA — we sign an NDA and agree on thresholds (gap, spread cap, jump %, etc.).
- Forensic QC — we run full checks across your archives (by symbol/month).
- Delivery — you get a qc_report.txt, diagnostics logs, and an action list (delete/move/fix).
- Follow-up — optional cleanup or re-org; handoff to Bar Generator for reliable M1…D1 bars.
Key features
Robust format support
CSV, CSV.GZ, XLSX, ZIP (first CSV/CSV.GZ inside), optional XLSB.
Designed for Forex tick archives; validates inputs used for OHLCV bar generation and research.
Strict UTC time parsing
Auto-detects time column, parses tz-aware UTC, validates monotonicity & duplicates.
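For technical teams, the strict-UTC and ordering idea can be illustrated with a minimal sketch. It assumes ISO-8601 timestamps and uses only the Python standard library; the names `parse_utc` and `check_ordering` are hypothetical and are not taken from our tooling. It also treats any repeated timestamp as a duplicate, which is a simplification.

```python
from datetime import datetime, timezone


def parse_utc(text: str) -> datetime:
    """Parse an ISO-8601 timestamp and insist on an explicit UTC offset."""
    # Older Python versions reject a trailing "Z", so normalise it first.
    ts = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError(f"naive timestamp (no UTC offset): {text!r}")
    if ts.utcoffset().total_seconds() != 0:
        raise ValueError(f"non-UTC offset: {text!r}")
    return ts.astimezone(timezone.utc)


def check_ordering(timestamps):
    """Return (unsorted_count, duplicate_count) for timestamps in file order."""
    unsorted = duplicates = 0
    prev = None
    for ts in timestamps:
        if prev is not None:
            if ts < prev:
                unsorted += 1      # time went backwards
            elif ts == prev:
                duplicates += 1    # same timestamp as the previous tick
        prev = ts
    return unsorted, duplicates
```

Rejecting naive timestamps outright, rather than guessing a zone, is what keeps the later ordering and gap checks meaningful.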
Weekend & gap rules
Detects weekend ticks and flags gaps above configurable thresholds.
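As a rough illustration of these two rules, the sketch below flags ticks inside the Friday 22:00 to Sunday 22:00 UTC window and lists gaps above a threshold. The function names are hypothetical, the 2-second default simply mirrors the example threshold used on this page, and the sketch does not special-case the weekend close, so the weekend itself shows up as one long gap.

```python
from datetime import datetime, timedelta, timezone

# Weekend window: Friday 22:00 to Sunday 22:00 UTC,
# expressed as seconds since Monday 00:00 UTC.
_WEEKEND_START = 4 * 86400 + 22 * 3600
_WEEKEND_END = 6 * 86400 + 22 * 3600


def is_weekend_tick(ts: datetime) -> bool:
    """True if a tz-aware timestamp falls inside the weekend window."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be tz-aware")
    ts = ts.astimezone(timezone.utc)
    pos = ts.weekday() * 86400 + ts.hour * 3600 + ts.minute * 60 + ts.second
    return _WEEKEND_START <= pos < _WEEKEND_END


def find_gaps(timestamps, max_gap=timedelta(seconds=2)):
    """Return (index, gap) pairs where the gap to the previous tick exceeds max_gap."""
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta > max_gap:
            gaps.append((i, delta))
    return gaps
```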
Spread & jump sanity
Flags unrealistic spreads and large single-tick price jumps; checks volume bounds.
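The spread and jump rules reduce to two small ratios. The sketch below is illustrative only: the function names are hypothetical, the defaults reuse the example thresholds from this page (0.5% spread cap, 1.00% jump), and measuring jumps on consecutive prices of a single series (for example the bid) is an assumption.

```python
def spread_ok(bid: float, ask: float, cap: float = 0.005) -> bool:
    """True if (ask - bid) / bid is within the cap; non-positive prices fail."""
    if bid <= 0 or ask <= 0:
        return False
    # Note: a crossed quote (ask < bid) gives a negative ratio and passes
    # this particular test; it would need a separate rule.
    return (ask - bid) / bid <= cap


def find_jumps(prices, threshold: float = 0.01):
    """Return indices whose relative move from the previous price exceeds threshold."""
    return [
        i
        for i in range(1, len(prices))
        if abs(prices[i] / prices[i - 1] - 1) > threshold
    ]
```

For example, `find_jumps([1.1000, 1.1001, 1.1200, 1.1199])` flags index 2, where the price moves by roughly 1.8% in a single tick.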
Intelligent encoding
Auto detection (chardet) with safe latin1 fallback; per-file overrides.
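A minimal sketch of the detect-then-fall-back pattern follows. It assumes `chardet` may or may not be installed, and the names `guess_encoding` and `decode_bytes` are hypothetical. The fallback is safe in the sense that latin1 maps every byte to a character, so decoding cannot fail, although the text may be wrong if the file was really in another encoding.

```python
def guess_encoding(raw: bytes):
    """Ask chardet for an encoding name; return None if chardet is unavailable."""
    try:
        import chardet  # third-party detector named in the feature above
    except ImportError:
        return None
    return chardet.detect(raw).get("encoding")


def decode_bytes(raw: bytes, override=None, detect=guess_encoding):
    """Decode raw bytes; return (text, encoding_used).

    A per-file override wins over detection. If the chosen encoding is
    unknown or fails strict decoding, fall back to latin1.
    """
    candidate = override or detect(raw)
    if candidate:
        try:
            return raw.decode(candidate), candidate
        except (UnicodeDecodeError, LookupError):
            pass  # bad guess or unknown codec name: use the fallback
    return raw.decode("latin1"), "latin1"
```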
Fast reporting
TXT report with path, status, size, row count and error summary; perf logs included.
Checks performed
- Time column & parsing: auto-detect, tz-aware UTC, NaT detection.
- Ordering & duplicates: monotonic time, cross-chunk duplicate/ordering checks.
- Weekend ticks: flags ticks timestamped between Friday 22:00 and Sunday 22:00 UTC.
- Gap tolerance: configurable max gap (e.g. 2s) with per-file summary.
- NaN & invalid values: prices ≤ 0, negative volumes, missing fields.
- Spread realism: (Ask−Bid)/Bid ≤ cap (e.g. 0.5%).
- Price jumps: single-tick jump threshold (e.g. > 1.00%).
- Date consistency: file name date vs. last timestamp; multi-day file detection.
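The date-consistency check at the end of this list can be sketched as follows. This is only an illustration: it assumes the file name carries a date such as `2024-01-02` or `20240102`, which may not match your naming scheme, and the function names and issue strings are hypothetical.

```python
import re
from datetime import date, datetime, timezone

# Assumed naming pattern: YYYY-MM-DD, YYYY_MM_DD or YYYYMMDD in the file name.
_DATE_IN_NAME = re.compile(r"(\d{4})[-_]?(\d{2})[-_]?(\d{2})")


def date_from_name(filename: str):
    """Extract a calendar date from a bare file name, or None if absent/invalid."""
    m = _DATE_IN_NAME.search(filename)
    if not m:
        return None
    year, month, day = map(int, m.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None


def check_dates(filename: str, timestamps):
    """Return a list of issues for one daily file (tz-aware timestamps, file order)."""
    issues = []
    days = {ts.astimezone(timezone.utc).date() for ts in timestamps}
    if len(days) > 1:
        issues.append("Multi-day file")
    expected = date_from_name(filename)
    if expected is None:
        issues.append("No date in file name")
    elif timestamps and timestamps[-1].astimezone(timezone.utc).date() != expected:
        issues.append("File name date differs from last timestamp")
    return issues
```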
What you get
- qc_report.txt — tab-separated summary: path, status, size_B, rows, errors.
- Rotating .log files with per-file diagnostics and performance metrics.
- Action list — recommended deletes/moves and next-step guidance.
- Bar-ready outputs — clean inputs that feed our Bar Generator for reliable UTC-aligned M1 minute bars with H1/H4/D1 aggregates.
Tip: Keep monthly folders per symbol to speed up scanning and lifecycle rules.
Security & NDA and Quote requirements
Security & NDA
- We sign an NDA before any data transfer.
- Processing in an isolated environment; only results are shared.
- No data reselling or third-party sharing.
What we need for a quote
- Folder structure (root/symbol/year/month…).
- Approximate file counts & average sizes (per symbol/month).
- Preferred thresholds: max gap, spread cap, jump %, etc.
QC report (fields)
| Field | Description |
|---|---|
| path | Full file path processed. |
| status | OK or BAD depending on checks. |
| size_B | File size in bytes. |
| rows | Row count (fast path for CSV/ZIP; header-aware). |
| errors | Semicolon-joined issues (e.g., NaN in prices; Unsorted time; Gap > 2s). |
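Because the report is plain tab-separated text, it is easy to post-process. The sketch below reads the five fields listed above into dictionaries; the function name `read_report` is hypothetical, and since whether the file starts with a header row is an assumption, the code tolerates both cases.

```python
import csv

FIELDS = ["path", "status", "size_B", "rows", "errors"]


def read_report(lines):
    """Parse qc_report.txt lines (any iterable of strings) into a list of dicts."""
    records = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row == FIELDS:
            continue  # skip blank lines and an optional header row
        row = row + [""] * (len(FIELDS) - len(row))  # errors may be empty for OK files
        rec = dict(zip(FIELDS, row))
        rec["size_B"] = int(rec["size_B"])
        rec["rows"] = int(rec["rows"])
        # The errors field is semicolon-joined; split it into a clean list.
        rec["errors"] = [e.strip() for e in rec["errors"].split(";") if e.strip()]
        records.append(rec)
    return records
```

A typical use is to open the report, call `read_report` on the file object, and filter for records whose `status` is `BAD` to build a review queue.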
For technical teams (informational)
Example CLI (informational)
conda activate forexenv
python check_tick_files.py ^
"C:\Data\Ticks" ^
--max-gap 2 --parallel auto --report qc_report.txt --encoding auto
Tune --spread-max, --xlsx-chunksize, --csv-chunksize, and post-actions --delete-bad/--move-bad as needed.
Related service
After cleaning, we can generate reliable M1…D1 bars from your tick data.
Bar Generator (tick → M1…D1)
FAQ
Do you share my data?
No. We sign an NDA and work in an isolated environment; only results are shared.
How long does it take?
Depends on scope (symbols/months). Typical turnaround is 2–5 business days per batch.
Can I get a sample?
Yes — send 1–2 files and we’ll provide a mini QC report.
Need a forensic sweep of your tick archives?
Send your root folders and preferred thresholds — we’ll run QC and deliver a clear action report.
Request a Quote