Validated OHLCV

Service: Bar Generator (tick → M1…D1)

We generate reliable OHLCV bars from your tick data: strict UTC alignment, deterministic outputs, extended schema, and conservative quality gates — engineered for reproducible research and production backtests.

UTC-aligned · Deterministic · CSV & Parquet · Audit-grade QC

Key features

M1-first design

Accurate higher-timeframe OHLC (H1/H4/D1 aggregated from M1).

Extended schema

Spreads, mid stats, VWAP, returns, volatility proxies, gap flags.

Deterministic outputs

Same inputs ⇒ same outputs; stable naming and column order.

Monthly partitioning

Easy globbing and incremental compute on large histories.

CSV (.csv.gz) & Parquet

Choose either or both depending on workflow and tooling.

Scale & speed

Optimized for large inputs and iterative research cycles.

How we work (service)

  1. Inputs — you provide tick archives (per symbol/month) and target timeframes/formats.
  2. QC prerequisite — if needed, we run Tick QC first to remove problematic files.
  3. Bar generation — M1 as the base; H1/H4/D1 aggregation that preserves OHLC extremes.
  4. Output validation — HL ratio, M1 returns filter, minute alignment checks.
  5. Delivery — Parquet and/or CSV.GZ, stable column order, optional *_stats.csv summaries.
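
As an illustration of step 3, here is a minimal sketch of the M1-first idea: ticks are bucketed into minute bars, and the hourly bar is then built from those minute bars rather than from raw ticks, so its high and low are the extremes already recorded at M1. The function names, the `(ts_ms, ask)` tick layout, and the sample prices are assumptions made for this example, not the service's actual code.

```python
MINUTE_MS = 60_000
HOUR_MS = 60 * MINUTE_MS

def ticks_to_m1(ticks):
    """Build M1 ask bars from (ts_ms, ask) ticks, keyed by UTC minute start."""
    bars = {}
    # Stable sort by timestamp only, so ticks sharing a timestamp keep arrival order.
    for ts_ms, ask in sorted(ticks, key=lambda t: t[0]):
        start = ts_ms - ts_ms % MINUTE_MS  # floor to the minute boundary
        bar = bars.get(start)
        if bar is None:
            bars[start] = {"first": ask, "last": ask, "max": ask, "min": ask}
        else:
            bar["last"] = ask
            bar["max"] = max(bar["max"], ask)
            bar["min"] = min(bar["min"], ask)
    return bars

def aggregate(m1_bars, width_ms):
    """Roll M1 bars up into wider bars of width_ms (e.g. HOUR_MS for H1)."""
    out = {}
    for start in sorted(m1_bars):
        src = m1_bars[start]
        key = start - start % width_ms
        agg = out.get(key)
        if agg is None:
            out[key] = dict(src)  # first M1 bar of the bucket sets the open
        else:
            agg["last"] = src["last"]                 # latest M1 close
            agg["max"] = max(agg["max"], src["max"])  # extremes come from M1 extremes
            agg["min"] = min(agg["min"], src["min"])
    return out

# Hypothetical sample: four ticks in the first hour, one in the second.
ticks = [(0, 1.1000), (30_000, 1.1005), (59_999, 1.0998),
         (60_000, 1.1002), (3_600_000, 1.1010)]
m1 = ticks_to_m1(ticks)
h1 = aggregate(m1, HOUR_MS)
```

Because the buckets are computed from epoch milliseconds, H1, H4 and D1 windows all start on UTC boundaries in this sketch.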

What you get

  • Bar datasets by symbol/month/timeframe (Parquet and/or CSV.GZ).
  • Stable schema for downstream validation and feature pipelines.
  • Optional per-file stats and a concise quality summary.

Security & NDA

  • We sign an NDA prior to data transfer.
  • Processing is done in an isolated environment; we only share results.
  • No data reselling or third-party sharing.

What we need for a quote

  • Folder structure (root/symbol/year/month…).
  • Approximate file counts & average sizes (per symbol/month).
  • Target timeframes (M1/H1/H4/D1) and preferred output format(s).

Column dictionary (excerpt)

Column | Description
Ask_first / Ask_last / Ask_max / Ask_min | OHLC of best ask within the window.
Bid_first / Bid_last / Bid_max / Bid_min | OHLC of best bid within the window.
AskVolume_sum / BidVolume_sum | Total quoted volumes per side.
Spread_mean / Spread_max | Spread statistics inside the bar.
Mid_first / Mid_last / Mid_std | Mid-price level and dispersion.
Quote_spread_var | Variance of spread (liquidity proxy).
Tick_count / Price_change_cnt | Count of ticks and ask-price changes.
Vol_imbalance | Ask minus bid volume.
VWAP | Volume-weighted average price (ask+bid combined).
HL_range / HL_ratio | High-low range and relative amplitude.
Return | Relative bar return (Ask_last / Ask_first − 1).
AskVol_pct / BidVol_pct | Side-share percentages of volume.
Gap_flag | Minute gap / late-first-tick indicator.
Timestamp_ms | Epoch milliseconds (UTC) for fast correlation.

The index is named Gmt time and holds UTC timestamps. Higher timeframes (H1/H4/D1) are aggregated from M1 to preserve OHLC extremes.
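
To make a few of these columns concrete, the sketch below derives them from the ticks of a single bar. Only Return and Vol_imbalance follow definitions stated in the dictionary; the VWAP weighting, the choice of the ask side for HL_range, and the HL_ratio denominator are assumptions for illustration, as are the function name and the tick layout.

```python
def bar_columns(ticks):
    """Derive a few bar columns from one window of ticks.

    ticks: non-empty list of (ask, bid, ask_vol, bid_vol), in time order.
    """
    asks = [t[0] for t in ticks]
    bids = [t[1] for t in ticks]
    spreads = [a - b for a, b in zip(asks, bids)]
    ask_vol = sum(t[2] for t in ticks)
    bid_vol = sum(t[3] for t in ticks)
    total_vol = ask_vol + bid_vol
    # Assumption: VWAP weights ask prices by ask volume and bid prices by bid volume.
    notional = sum(a * av + b * bv for a, b, av, bv in ticks)
    hl_range = max(asks) - min(asks)  # assumption: range measured on the ask side
    return {
        "Tick_count": len(ticks),
        # Number of consecutive tick pairs whose ask price differs.
        "Price_change_cnt": sum(1 for prev, cur in zip(asks, asks[1:]) if cur != prev),
        "Spread_mean": sum(spreads) / len(spreads),
        "Spread_max": max(spreads),
        "Mid_first": (asks[0] + bids[0]) / 2,
        "Mid_last": (asks[-1] + bids[-1]) / 2,
        "Vol_imbalance": ask_vol - bid_vol,
        "VWAP": notional / total_vol if total_vol else None,
        "HL_range": hl_range,
        "HL_ratio": hl_range / min(asks),  # assumption: relative to the ask low
        "Return": asks[-1] / asks[0] - 1,
    }

# Hypothetical three-tick bar with a constant 0.5 spread.
cols = bar_columns([(100.0, 99.5, 3, 1), (102.0, 101.5, 1, 1), (101.0, 100.5, 1, 2)])
```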

Quality gates

  • Alignment: minute start enforced; misaligned bars counted and reported.
  • HL-ratio thresholds: drop obviously broken bars using conservative limits.
  • Returns filter (M1): drop extreme minute returns by default (configurable).

Conservative defaults are used out of the box and can be tuned per symbol or venue.
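
The three gates can be pictured as a single pass over M1 bars, as in the sketch below. The threshold values are placeholders chosen for the example, not the service defaults, and excluding misaligned bars (rather than only counting them) is an assumption; the function name and bar layout are likewise hypothetical.

```python
MINUTE_MS = 60_000

def apply_gates(bars, max_hl_ratio=0.05, max_abs_return=0.03):
    """Split bars into kept bars and a per-gate rejection report.

    bars: iterable of dicts with Timestamp_ms, HL_ratio and Return keys.
    The two thresholds are illustrative placeholders, not real defaults.
    """
    kept = []
    report = {"misaligned": 0, "hl_ratio": 0, "return": 0}
    for bar in bars:
        if bar["Timestamp_ms"] % MINUTE_MS != 0:
            report["misaligned"] += 1   # bar does not start on a minute boundary
        elif bar["HL_ratio"] > max_hl_ratio:
            report["hl_ratio"] += 1     # implausibly wide high-low range
        elif abs(bar["Return"]) > max_abs_return:
            report["return"] += 1       # extreme one-minute return
        else:
            kept.append(bar)
    return kept, report

# Hypothetical bars: one clean, and one failing each gate.
bars = [
    {"Timestamp_ms": 0, "HL_ratio": 0.001, "Return": 0.0005},
    {"Timestamp_ms": 60_000, "HL_ratio": 0.20, "Return": 0.001},
    {"Timestamp_ms": 120_000, "HL_ratio": 0.002, "Return": -0.08},
    {"Timestamp_ms": 180_500, "HL_ratio": 0.001, "Return": 0.0},
]
kept, report = apply_gates(bars)
```

Each bar is attributed to the first gate it fails, so the report counts add up to the number of rejected bars.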

Outputs

  • <SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz) with UTC index named Gmt time.
  • Optional per-file *_stats.csv summary for quick validation.
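
For downstream tooling, the naming pattern above can be built and parsed mechanically. The sketch below is one way to do it; the helper names are hypothetical, the symbol character set (letters and digits only) and the timeframe list (M1, H1, H4, D1) are assumptions, and EURUSD is just a sample symbol.

```python
import re

# Mirrors <SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz); symbol charset is an assumption.
NAME_RE = re.compile(
    r"^(?P<symbol>[A-Za-z0-9]+)_(?P<yyyymm>\d{6})_(?P<tf>M1|H1|H4|D1)"
    r"\.(?P<ext>parq|csv\.gz)$"
)

def parse_name(filename):
    """Return the parts of a bar file name, or None if it does not match."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    year, month = int(m["yyyymm"][:4]), int(m["yyyymm"][4:])
    if not 1 <= month <= 12:
        return None  # six digits, but not a valid calendar month
    return {"symbol": m["symbol"], "year": year, "month": month,
            "tf": m["tf"], "ext": m["ext"]}

def build_name(symbol, year, month, tf, ext):
    """Build a file name following the same pattern."""
    return f"{symbol}_{year:04d}{month:02d}_{tf}.{ext}"

parsed = parse_name("EURUSD_202401_M1.parq")
rebuilt = build_name("EURUSD", 2024, 1, "H4", "csv.gz")
```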

Tip: Parquet is recommended for heavy iterative research; CSV.GZ is best for maximum compatibility and archiving.

Integration tips

Partitioning

Keep monthly folders per symbol for simple globbing and lifecycle rules.
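
A short sketch of what that globbing might look like follows. The `root/SYMBOL/YYYYMM/` layout, the helper name, and the sample symbol are assumptions for the example; adjust the pattern to your own folder structure.

```python
from pathlib import Path
import tempfile

def list_bar_files(root, symbol, tf, ext="parq"):
    """List one symbol's bar files for a timeframe, oldest month first.

    Assumes a root/SYMBOL/YYYYMM/SYMBOL_YYYYMM_TF.ext layout (hypothetical).
    """
    pattern = f"{symbol}/*/{symbol}_*_{tf}.{ext}"
    # YYYYMM sorts chronologically as text, so a plain sort is enough.
    return sorted(Path(root).glob(pattern))

# Demo on a throwaway directory with two months of empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for yyyymm in ("202402", "202401"):
        folder = Path(tmp) / "EURUSD" / yyyymm
        folder.mkdir(parents=True)
        (folder / f"EURUSD_{yyyymm}_M1.parq").touch()
        (folder / f"EURUSD_{yyyymm}_H1.parq").touch()
    names = [p.name for p in list_bar_files(tmp, "EURUSD", "M1")]
```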

I/O strategy

Use Parquet for speed in notebooks/backtests; keep CSV.GZ as a portable mirror.

Schema stability

Stable column order eases downstream validations and feature pipelines.
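
One way to take advantage of this downstream is to check each file's header against the column list your pipeline expects before loading it. The sketch below does that for a CSV.GZ payload using only the standard library; the helper names and the three-column expected list are placeholders, since the full delivered column order is not listed here.

```python
import csv
import gzip
import io

def read_header(csv_gz_bytes):
    """Return the header row of a gzip-compressed CSV given as bytes."""
    with gzip.open(io.BytesIO(csv_gz_bytes), "rt", encoding="utf-8", newline="") as fh:
        return next(csv.reader(fh))

def check_columns(header, expected):
    """Compare a header against the expected names and order."""
    return {
        "missing": [name for name in expected if name not in header],
        "same_order": header == expected,
    }

# Hypothetical in-memory file with a shortened column list.
sample = gzip.compress(b"Gmt time,Ask_first,Ask_last\n2024-01-01 00:00:00,1.1,1.2\n")
header = read_header(sample)
ok = check_columns(header, ["Gmt time", "Ask_first", "Ask_last"])
reordered = check_columns(header, ["Gmt time", "Ask_last", "Ask_first"])
```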

Related service

Before bar generation, we can run a forensic sweep of your tick archives.

Tick Data QC (daily files)

FAQ

Do you share my data?

No. We sign an NDA and process data in an isolated environment; only results are shared.

How long does it take?

It depends on scope (number of symbols and months). Typical turnaround is 2–5 business days per batch.

Can I get a sample?

Yes — send 1–2 files and we’ll provide a small sample bar set.

Need reliable bars for backtesting or model training?

Send your symbols/months and preferred formats — we’ll confirm scope and deliverables.

Request a Quote