Forex Bar Generator (tick → M1…D1)
We generate reliable Forex OHLCV bars from your tick data: strict UTC alignment, deterministic outputs, extended schema, and conservative quality gates — engineered for reproducible research and production backtests.
Key features
M1-first design
Accurate higher-timeframe (H1/H4/D1) Forex OHLCV bars, aggregated from M1.
Extended schema
Spreads, mid-price statistics, VWAP, returns, volatility proxies, gap flags.
Deterministic outputs
Same inputs ⇒ same outputs; stable naming and column order.
Monthly partitioning
Easy globbing and incremental computation on large histories.
CSV (.csv.gz) & Parquet
Choose either or both depending on workflow and tooling.
Scale & speed
Optimized for large inputs and iterative research cycles.
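One way to check the "same inputs ⇒ same outputs" property on your side is to run the same batch twice and compare the results file by file. The sketch below assumes that approach; `content_digest` and `same_outputs` are hypothetical helpers. It hashes `.gz` files after decompression, because the gzip header stores a modification time that can differ between runs even when the content is identical. Other files, including Parquet, are compared byte for byte, which assumes the same writer library and settings on both runs.

```python
import gzip
import hashlib
from pathlib import Path


def content_digest(path: Path) -> str:
    """SHA-256 of a file's content; .gz files are hashed after decompression."""
    h = hashlib.sha256()
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rb") as f:
        # Read in 1 MiB chunks so large monthly files do not need to fit in memory
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def same_outputs(run_a: Path, run_b: Path) -> bool:
    """True if two output folders hold the same relative file names with identical content."""
    a = {p.relative_to(run_a): p for p in run_a.rglob("*") if p.is_file()}
    b = {p.relative_to(run_b): p for p in run_b.rglob("*") if p.is_file()}
    return a.keys() == b.keys() and all(
        content_digest(a[k]) == content_digest(b[k]) for k in a
    )
```

Comparing whole folders also catches a missing or renamed file, which matters as much as a changed value when naming is part of the contract.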
How we work (service)
- Inputs — you provide tick archives (per symbol/month) and target timeframes/formats.
- QC prerequisite — if needed, we run Tick QC first to remove problematic files.
- Bar generation — M1 as the base; H1/H4/D1 aggregation that preserves OHLC extremes.
- Output validation — HL ratio, M1 returns filter, minute alignment checks.
- Delivery — Parquet and/or CSV.GZ, stable column order, optional *_stats.csv summaries.
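The aggregation step can be sketched as follows. This is a minimal illustration, not our production code: it assumes time-sorted M1 bars given as `(timestamp_ms, open, high, low, close, volume)` tuples, and `aggregate_bars` is a hypothetical helper. Each UTC timestamp is floored to the bucket size. The open is the first M1 open, the close is the last M1 close, and high/low are the max/min over every M1 high/low in the bucket, which is what preserves the extremes. D1 buckets here start at UTC midnight, an assumption that may differ from session-based cut-offs.

```python
from typing import Dict, List, Tuple

# (timestamp_ms, open, high, low, close, volume); timestamp is the UTC minute start
Bar = Tuple[int, float, float, float, float, float]

# Bucket sizes in milliseconds; D1 uses UTC-midnight boundaries (assumption)
TF_MS: Dict[str, int] = {"H1": 3_600_000, "H4": 14_400_000, "D1": 86_400_000}


def aggregate_bars(m1: List[Bar], tf: str) -> List[Bar]:
    """Aggregate time-sorted M1 bars into a higher timeframe (hypothetical helper)."""
    size = TF_MS[tf]
    out: List[Bar] = []
    for ts, o, h, l, c, v in m1:
        bucket = ts - ts % size  # floor the UTC timestamp to the bucket start
        if out and out[-1][0] == bucket:
            b_ts, b_o, b_h, b_l, _, b_v = out[-1]
            # Keep the first open, widen the extremes, take the latest close, sum volume
            out[-1] = (b_ts, b_o, max(b_h, h), min(b_l, l), c, b_v + v)
        else:
            out.append((bucket, o, h, l, c, v))
    return out
```

The same first/max/min/last rule applies column by column to the ask and bid sides (`*_first`, `*_max`, `*_min`, `*_last`), while volume columns are summed.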
What you get
- Bar datasets by symbol/month/timeframe (Parquet and/or CSV.GZ): UTC-aligned M1 bars plus H1/H4/D1 aggregates.
- Stable schema for downstream validation and feature pipelines.
- Optional per-file stats and a concise quality summary.
Security & NDA
- We sign an NDA prior to data transfer.
- Processing is done in an isolated environment; we only share results.
- No data reselling or third-party sharing.
What we need for a quote
- Folder structure (root/symbol/year/month…).
- Approximate file counts & average sizes (per symbol/month).
- Target timeframes (M1/H1/H4/D1) and preferred output format(s).
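If your archive follows a root/symbol/year/month layout, a short script can produce the file counts and average sizes we ask for. The sketch below assumes exactly three folder levels under the root, with files at any depth below each month folder. `inventory` is a hypothetical helper, not a tool we ship.

```python
from pathlib import Path
from typing import Dict, Tuple


def inventory(root: Path) -> Dict[Tuple[str, str, str], Tuple[int, float]]:
    """Count files and average size (bytes) per symbol/year/month folder (sketch)."""
    stats: Dict[Tuple[str, str, str], Tuple[int, float]] = {}
    # Assumed layout: root/<symbol>/<year>/<month>/..., files anywhere below the month folder
    for month_dir in sorted(p for p in root.glob("*/*/*") if p.is_dir()):
        sizes = [f.stat().st_size for f in month_dir.rglob("*") if f.is_file()]
        if sizes:  # empty month folders are skipped
            symbol, year, month = month_dir.relative_to(root).parts
            stats[(symbol, year, month)] = (len(sizes), sum(sizes) / len(sizes))
    return stats
```

For example, `inventory(Path("/data/ticks"))` returns a dictionary keyed by `(symbol, year, month)`, which can be pasted into a quote request as-is.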
Column dictionary (excerpt)
| Column | Description |
|---|---|
| Ask_first / Ask_last / Ask_max / Ask_min | OHLC of best ask within the window. |
| Bid_first / Bid_last / Bid_max / Bid_min | OHLC of best bid within the window. |
| AskVolume_sum / BidVolume_sum | Total quoted volumes per side. |
| Spread_mean / Spread_max | Spread statistics inside the bar. |
| Mid_first / Mid_last / Mid_std | Mid-price level and dispersion. |
| Quote_spread_var | Variance of spread (liquidity proxy). |
| Tick_count / Price_change_cnt | Count of ticks and ask-price changes. |
| Vol_imbalance | Ask minus bid volume. |
| VWAP | Volume-weighted average price (ask+bid combined). |
| HL_range / HL_ratio | High-low range and relative amplitude. |
| Return | Relative bar return (Ask_last / Ask_first − 1). |
| AskVol_pct / BidVol_pct | Side-share percentages of volume. |
| Gap_flag | Minute gap / late-first-tick indicator. |
| Timestamp_ms | Epoch milliseconds (UTC) for fast correlation. |
The index is named `Gmt time` (UTC). Higher timeframes (H1/H4/D1) are aggregated from M1 to preserve OHLC extremes.
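To make the column definitions concrete, here is a sketch that computes a subset of them for one M1 window. It assumes ticks arrive as `(timestamp_ms, ask, bid, ask_volume, bid_volume)` tuples. `minute_bar` is a hypothetical helper. Three definitions go beyond the table and are assumptions marked in the code: HL_range is measured on the ask side, HL_ratio is that range divided by the ask low, and VWAP weights each side's price by that side's volume.

```python
import math
from typing import Dict, List, Tuple

# Assumed tick layout: (timestamp_ms, ask, bid, ask_volume, bid_volume), UTC, time-sorted
Tick = Tuple[int, float, float, float, float]


def minute_bar(ticks: List[Tick]) -> Dict[str, float]:
    """Compute a subset of the documented columns for one M1 window (sketch)."""
    ts0 = ticks[0][0]
    asks = [t[1] for t in ticks]
    bids = [t[2] for t in ticks]
    ask_vol = sum(t[3] for t in ticks)
    bid_vol = sum(t[4] for t in ticks)
    total_vol = ask_vol + bid_vol
    spreads = [a - b for a, b in zip(asks, bids)]
    hl_range = max(asks) - min(asks)  # assumption: range measured on the ask side
    return {
        "Timestamp_ms": ts0 - ts0 % 60_000,  # bar labelled by its UTC minute start
        "Ask_first": asks[0], "Ask_last": asks[-1], "Ask_max": max(asks), "Ask_min": min(asks),
        "Bid_first": bids[0], "Bid_last": bids[-1], "Bid_max": max(bids), "Bid_min": min(bids),
        "AskVolume_sum": ask_vol,
        "BidVolume_sum": bid_vol,
        "Spread_mean": sum(spreads) / len(spreads),
        "Spread_max": max(spreads),
        "Mid_first": (asks[0] + bids[0]) / 2,
        "Mid_last": (asks[-1] + bids[-1]) / 2,
        "Tick_count": len(ticks),
        # Ticks whose ask differs from the previous tick's ask
        "Price_change_cnt": sum(1 for prev, cur in zip(asks, asks[1:]) if cur != prev),
        "Vol_imbalance": ask_vol - bid_vol,
        # Assumption: each side's price is weighted by that side's volume
        "VWAP": (sum(t[1] * t[3] + t[2] * t[4] for t in ticks) / total_vol
                 if total_vol else math.nan),
        "Return": asks[-1] / asks[0] - 1,
        "HL_range": hl_range,
        "HL_ratio": hl_range / min(asks),  # assumption: amplitude relative to the ask low
        "AskVol_pct": 100 * ask_vol / total_vol if total_vol else math.nan,
        "BidVol_pct": 100 * bid_vol / total_vol if total_vol else math.nan,
    }
```

Mid_std, Quote_spread_var and Gap_flag are left out to keep the sketch short. The first two are dispersion statistics over the same per-tick lists.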
Quality gates
- Alignment: minute start enforced; misaligned bars counted and reported.
- HL-ratio thresholds: drop obviously broken bars using conservative limits.
- Returns filter (M1): drop extreme minute returns by default (configurable).
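The three gates can be sketched as a single pass over time-sorted M1 bars. The thresholds below are hypothetical placeholders for illustration, not our production limits. Enforcing alignment is modelled here as counting a misaligned bar and snapping it to its minute start, which is one possible policy. The sketch also sets the minute-gap half of Gap_flag; the late-first-tick half is not modelled. `apply_quality_gates` is a hypothetical helper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical limits for illustration; the real thresholds are configurable.
MAX_HL_RATIO = 0.05    # drop bars whose high-low amplitude exceeds 5%
MAX_ABS_RETURN = 0.02  # drop M1 bars whose absolute return exceeds 2%

Bar = Dict[str, float]


def apply_quality_gates(bars: List[Bar]) -> Tuple[List[Bar], Dict[str, int]]:
    """Filter time-sorted M1 bars and report what each gate caught (sketch)."""
    kept: List[Bar] = []
    report = {"misaligned": 0, "hl_dropped": 0, "return_dropped": 0, "gaps": 0}
    prev_ts: Optional[int] = None
    for bar in bars:
        ts = int(bar["Timestamp_ms"])
        if ts % 60_000 != 0:
            # Alignment: count the bar, then snap it to its UTC minute start
            report["misaligned"] += 1
            ts -= ts % 60_000
        if bar["HL_ratio"] > MAX_HL_RATIO:
            report["hl_dropped"] += 1
            continue
        if abs(bar["Return"]) > MAX_ABS_RETURN:
            report["return_dropped"] += 1
            continue
        # Minute gap: at least one whole minute missing since the previous kept bar
        gap = prev_ts is not None and ts - prev_ts > 60_000
        report["gaps"] += int(gap)
        kept.append({**bar, "Timestamp_ms": ts, "Gap_flag": int(gap)})
        prev_ts = ts
    return kept, report
```

Dropped bars leave a hole in the output, so the next kept bar is flagged as a gap. This keeps downstream code from silently treating the series as continuous.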
Outputs
- `<SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz)` with a UTC index named `Gmt time`.
- Optional per-file `*_stats.csv` summary for quick validation.
Tip: Parquet is recommended for heavy iterative research; CSV.GZ is best for maximum compatibility and archiving.
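Because the naming pattern is fixed, output files can be built and parsed mechanically. The sketch below follows the documented `<SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz)` pattern. It additionally assumes symbols consist only of uppercase letters and digits, such as EURUSD, since an underscore inside a symbol would make the name ambiguous. `output_name` and `parse_name` are hypothetical helpers.

```python
import re
from typing import Tuple

# Documented pattern <SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz); symbol charset is an assumption
NAME_RE = re.compile(
    r"^(?P<symbol>[A-Z0-9]+)_(?P<yyyymm>\d{6})_(?P<tf>M1|H1|H4|D1)\.(?P<ext>parq|csv\.gz)$"
)


def output_name(symbol: str, year: int, month: int, tf: str, ext: str = "parq") -> str:
    """Build a file name following the documented convention (hypothetical helper)."""
    if ext not in ("parq", "csv.gz"):
        raise ValueError(f"unsupported extension: {ext}")
    return f"{symbol}_{year:04d}{month:02d}_{tf}.{ext}"


def parse_name(name: str) -> Tuple[str, str, str, str]:
    """Split a file name into (symbol, yyyymm, timeframe, extension)."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a bar file name: {name}")
    return m["symbol"], m["yyyymm"], m["tf"], m["ext"]
```

Zero-padding the month (`{month:02d}`) keeps lexical and chronological order identical, so a plain sort of file names is enough.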
Integration tips
Partitioning
Keep monthly folders per symbol for simple globbing and lifecycle rules.
I/O strategy
Use Parquet for speed in notebooks/backtests; keep CSV.GZ as a portable mirror.
Schema stability
Stable column order eases downstream validations and feature pipelines.
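Monthly partitions make both globbing and incremental runs simple. The sketch below assumes a hypothetical output layout of `root/<SYMBOL>/<YYYY>/<MM>/<SYMBOL>_<YYYYMM>_<TF>.<ext>`. It lists one symbol's files for a timeframe in month order, then keeps only the months a previous run has not processed yet. `month_files` and `pending` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, List


def month_files(root: Path, symbol: str, tf: str, ext: str = "parq") -> List[Path]:
    """List one symbol's monthly files for a timeframe, oldest month first."""
    # Assumed layout: root/<SYMBOL>/<YYYY>/<MM>/<SYMBOL>_<YYYYMM>_<TF>.<ext>
    pattern = f"{symbol}/*/*/{symbol}_*_{tf}.{ext}"
    # Names share the same <SYMBOL>_ prefix, so sorting by name sorts by YYYYMM
    return sorted(root.glob(pattern), key=lambda p: p.name)


def pending(files: Iterable[Path], processed: Iterable[str]) -> List[Path]:
    """Keep only files whose names are not yet in the processed set (incremental runs)."""
    done = set(processed)
    return [p for p in files if p.name not in done]
```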
Related service
Before bar generation, we can run a forensic sweep of your tick archives.
Tick Data QC (daily files)
FAQ
Do you share my data?
No. We sign an NDA and process data in an isolated environment; only results are shared.
How long does it take?
It depends on scope (symbols/months). Typical turnaround is 2–5 business days per batch.
Can I get a sample?
Yes — send 1–2 files and we’ll provide a small sample bar set.
Need reliable bars for backtesting or model training?
Send your symbols/months and preferred formats — we’ll confirm scope and deliverables.
Request a Quote