Validated OHLCV

Service: Bar Generator (tick → M1…D1)

We generate reliable OHLCV bars from your tick data: strict UTC alignment, deterministic outputs, extended schema, and conservative quality gates — engineered for reproducible research and production backtests.

UTC-aligned · Deterministic · CSV & Parquet · Audit-grade QC

Request a Quote

Key features

M1-first design

Higher-timeframe OHLC (H1/H4/D1) is aggregated from M1, so bar highs and lows are preserved.

Extended schema

Spreads, mid stats, VWAP, returns, volatility proxies, gap flags.

Deterministic outputs

Same inputs ⇒ same outputs; stable naming and column order.

Monthly partitioning

Easy globbing and incremental compute on large histories.

CSV (.csv.gz) & Parquet

Choose either or both depending on workflow and tooling.

Scale & speed

Optimized for large inputs and iterative research cycles.

How we work (service)

Inputs — you provide tick archives (per symbol/month) and target timeframes/formats.
QC prerequisite — if needed, we run Tick QC first to remove problematic files.
Bar generation — M1 as the base; H1/H4/D1 aggregation that preserves OHLC extremes.
Output validation — HL ratio, M1 returns filter, minute alignment checks.
Delivery — Parquet and/or CSV.GZ, stable column order, optional *_stats.csv summaries.
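The bar-generation step above can be sketched in plain Python. This is a minimal illustration, not our production code: it handles the ask side only, assumes ticks arrive as `(epoch_ms, ask)` tuples, and aligns H1/H4/D1 buckets to the UTC epoch. The function names `ticks_to_m1` and `aggregate_m1` are hypothetical.

```python
MINUTE_MS = 60_000
TF_MS = {"H1": 3_600_000, "H4": 14_400_000, "D1": 86_400_000}


def ticks_to_m1(ticks):
    """Build M1 ask bars from (epoch_ms, ask) ticks, keyed by UTC minute start."""
    bars = {}
    # sorted() is stable, so ticks sharing a timestamp keep their input order.
    for ts_ms, ask in sorted(ticks, key=lambda t: t[0]):
        start = ts_ms - ts_ms % MINUTE_MS  # floor to the minute boundary
        bar = bars.get(start)
        if bar is None:
            bars[start] = {
                "Timestamp_ms": start,
                "Ask_first": ask, "Ask_last": ask,
                "Ask_max": ask, "Ask_min": ask,
                "Tick_count": 1,
            }
        else:
            bar["Ask_last"] = ask
            bar["Ask_max"] = max(bar["Ask_max"], ask)
            bar["Ask_min"] = min(bar["Ask_min"], ask)
            bar["Tick_count"] += 1
    return [bars[k] for k in sorted(bars)]


def aggregate_m1(m1_bars, tf):
    """Roll M1 bars up to H1/H4/D1: first of firsts, last of lasts,
    max of maxes, min of mins, so OHLC extremes are preserved."""
    width = TF_MS[tf]
    out = {}
    for bar in sorted(m1_bars, key=lambda b: b["Timestamp_ms"]):
        start = bar["Timestamp_ms"] - bar["Timestamp_ms"] % width
        agg = out.get(start)
        if agg is None:
            out[start] = dict(bar, Timestamp_ms=start)  # copy; input stays untouched
        else:
            agg["Ask_last"] = bar["Ask_last"]
            agg["Ask_max"] = max(agg["Ask_max"], bar["Ask_max"])
            agg["Ask_min"] = min(agg["Ask_min"], bar["Ask_min"])
            agg["Tick_count"] += bar["Tick_count"]
    return [out[k] for k in sorted(out)]
```

Because the higher timeframes are built from M1 highs and lows rather than by sampling, the H1 high equals the highest M1 high in that hour. The bid side would follow the same pattern.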

What you get

Bar datasets by symbol/month/timeframe (Parquet and/or CSV.GZ).
Stable schema for downstream validation and feature pipelines.
Optional per-file stats and a concise quality summary.

Security & NDA

We sign an NDA prior to data transfer.
Processing is done in an isolated environment; we only share results.
No data reselling or third-party sharing.

What we need for a quote

Folder structure (root/symbol/year/month…).
Approximate file counts & average sizes (per symbol/month).
Target timeframes (M1/H1/H4/D1) and preferred output format(s).

Column dictionary (excerpt)

Column	Description
`Ask_first / Ask_last / Ask_max / Ask_min`	Open, close, high, and low of the best ask within the bar window.
`Bid_first / Bid_last / Bid_max / Bid_min`	Open, close, high, and low of the best bid within the bar window.
`AskVolume_sum / BidVolume_sum`	Total quoted volumes per side.
`Spread_mean / Spread_max`	Spread statistics inside the bar.
`Mid_first / Mid_last / Mid_std`	Mid-price level and dispersion.
`Quote_spread_var`	Variance of spread (liquidity proxy).
`Tick_count / Price_change_cnt`	Count of ticks and ask-price changes.
`Vol_imbalance`	Ask minus bid volume.
`VWAP`	Volume-weighted average price (ask+bid combined).
`HL_range / HL_ratio`	High-low range and relative amplitude.
`Return`	Relative bar return (`Ask_last / Ask_first − 1`).
`AskVol_pct / BidVol_pct`	Side-share percentages of volume.
`Gap_flag`	Minute gap / late-first-tick indicator.
`Timestamp_ms`	Epoch milliseconds (UTC) for fast correlation.

The index is named `Gmt time` and holds UTC timestamps. Higher timeframes (H1/H4/D1) are aggregated from M1 to preserve OHLC extremes.
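Several dictionary columns are simple functions of the base fields. The sketch below derives a few of them for a single bar. `Return` and `Vol_imbalance` follow the definitions in the table; computing `HL_range` on the ask side and expressing `AskVol_pct` / `BidVol_pct` on a 0-100 scale are assumptions, and `derive_columns` is a hypothetical helper name.

```python
def derive_columns(bar):
    """Return a copy of `bar` with a few derived columns added."""
    out = dict(bar)
    # Defined in the column dictionary: Ask_last / Ask_first - 1.
    out["Return"] = bar["Ask_last"] / bar["Ask_first"] - 1
    # Defined in the column dictionary: ask minus bid volume.
    out["Vol_imbalance"] = bar["AskVolume_sum"] - bar["BidVolume_sum"]
    # Assumption: the high-low range is taken on the ask side.
    out["HL_range"] = bar["Ask_max"] - bar["Ask_min"]
    # Assumption: side shares are percentages (0-100) of total quoted volume.
    total = bar["AskVolume_sum"] + bar["BidVolume_sum"]
    out["AskVol_pct"] = 100 * bar["AskVolume_sum"] / total if total else None
    out["BidVol_pct"] = 100 * bar["BidVolume_sum"] / total if total else None
    return out
```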

Quality gates

Alignment: minute start enforced; misaligned bars counted and reported.
HL-ratio thresholds: drop obviously broken bars using conservative limits.
Returns filter (M1): drop extreme minute returns by default (configurable).

Conservative defaults are used out-of-the-box and can be tuned per symbol or venue.
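To make the three gates concrete, here is a rough sketch of how they could be applied to M1 bars. The thresholds `max_hl_ratio=0.10` and `max_abs_return=0.05` are placeholder values chosen for illustration, not our actual defaults. The formula `HL_ratio = (Ask_max - Ask_min) / Ask_min` is also an assumption, as is the choice to count misaligned bars without dropping them.

```python
MINUTE_MS = 60_000


def apply_quality_gates(bars, max_hl_ratio=0.10, max_abs_return=0.05):
    """Filter M1 bars; return (kept_bars, report)."""
    kept = []
    report = {"misaligned": 0, "hl_ratio_dropped": 0, "return_dropped": 0}
    for bar in bars:
        # Alignment: count bars whose timestamp is not on a minute boundary.
        if bar["Timestamp_ms"] % MINUTE_MS != 0:
            report["misaligned"] += 1
        # Assumption: HL_ratio = (high - low) / low on the ask side.
        hl_ratio = (bar["Ask_max"] - bar["Ask_min"]) / bar["Ask_min"]
        if hl_ratio > max_hl_ratio:
            report["hl_ratio_dropped"] += 1
            continue
        # Return as defined in the column dictionary.
        ret = bar["Ask_last"] / bar["Ask_first"] - 1
        if abs(ret) > max_abs_return:
            report["return_dropped"] += 1
            continue
        kept.append(bar)
    return kept, report
```

The report dictionary mirrors the idea of counting and reporting issues rather than silently discarding bars; per-symbol tuning would amount to passing different thresholds.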

Outputs

Files are named `<SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz)` and carry a UTC index named `Gmt time`.
Optional per-file *_stats.csv summary for quick validation.

Tip: Parquet is recommended for heavy iterative research; CSV.GZ is best for maximum compatibility and archiving.
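Because the naming pattern is stable, delivered files can be indexed with a small parser. The sketch below is one way to do it; `parse_bar_filename` is a hypothetical helper, and the regular expression assumes the timeframe token is uppercase letters followed by digits (as in M1, H1, H4, D1).

```python
import re

# <SYMBOL>_<YYYYMM>_<TF>.(parq|csv.gz)
BAR_FILE_RE = re.compile(
    r"^(?P<symbol>.+)_(?P<year>\d{4})(?P<month>0[1-9]|1[0-2])"
    r"_(?P<tf>[A-Z]+\d+)\.(?P<ext>parq|csv\.gz)$"
)


def parse_bar_filename(name):
    """Split a delivered file name into its parts, or return None if it does not match."""
    m = BAR_FILE_RE.match(name)
    if m is None:
        return None
    return {
        "symbol": m["symbol"],
        "year": int(m["year"]),
        "month": int(m["month"]),
        "tf": m["tf"],
        "ext": m["ext"],
    }
```

For example, a file named `EURUSD_202401_M1.parq` (an illustrative symbol) would parse to symbol `EURUSD`, year 2024, month 1, timeframe `M1`.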

Integration tips

Partitioning

Keep monthly folders per symbol for simple globbing and lifecycle rules.

I/O strategy

Use Parquet for speed in notebooks/backtests; keep CSV.GZ as a portable mirror.

Schema stability

Stable column order eases downstream validations and feature pipelines.
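A stable column order can be checked cheaply on your side before a file enters a pipeline. The following sketch reads only the header row of a CSV.GZ file and compares it with a column list you maintain; `header_matches` is a hypothetical helper, and the expected list is whatever schema you have agreed for the delivery.

```python
import csv
import gzip


def header_matches(path, expected_columns):
    """True if the first row of a .csv.gz file equals the expected columns, in order."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as fh:
        header = next(csv.reader(fh), None)  # None for an empty file
    return header == list(expected_columns)
```

Failing fast on a header mismatch is usually cheaper than discovering a shifted column deep inside a backtest.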

Related service

Before bar generation, we can run a forensic sweep of your tick archives.

Tick Data QC (daily files)

FAQ

Do you share my data?

No. We sign an NDA and process data in an isolated environment; only results are shared.

How long does it take?

It depends on scope (number of symbols and months). Typical turnaround is 2–5 business days per batch.

Can I get a sample?

Yes — send 1–2 files and we’ll provide a small sample bar set.

Need reliable bars for backtesting or model training?

Send your symbols/months and preferred formats — we’ll confirm scope and deliverables.
