TimesFM (Time Series Foundation Model) is Google’s answer to a question practitioners have asked for years: can a single pretrained model forecast any time series accurately, the way large language models handle any text task? As of late 2025, the answer is a qualified yes — with important caveats about what “accurate” means across different domains.
What Is TimesFM?
TimesFM is a decoder-only transformer model trained by Google Research on approximately 100 billion real-world time-points, designed to produce accurate time-series forecasts zero-shot — with no fine-tuning on the target dataset. A user provides historical values, specifies a horizon length, and the model outputs point forecasts and quantile uncertainty estimates.
The model was first published as an arXiv preprint in October 2023, accepted at ICML 2024, and open-sourced under Apache 2.0 in mid-2024.1 Three versions have shipped since:
| Version | Parameters | Max Context | Release |
|---|---|---|---|
| TimesFM 1.0 | 200M | 512 time-points | Open-sourced May 2024 |
| TimesFM 2.0 | 500M | 2,048 time-points | December 31, 2024 |
| TimesFM 2.5 | 200M | 16,384 time-points | September 15, 2025 |
Version 2.5 shows the most notable trajectory: the team cut the parameter count from 500M back to 200M while expanding the context window eightfold, reaching #1 on the GIFT-Eval zero-shot benchmark at release — before Amazon’s Chronos-2 subsequently surpassed it.2
How TimesFM Works
Patch-Based Tokenization
Rather than treating each data point as a token (which would make sequences unworkably long), TimesFM groups contiguous time-points into patches — directly mirroring how Vision Transformers tile images.
- Input patch: 32 time-points → one embedding token (via residual MLP)
- Output patch: 128 time-points generated per decoding step
The asymmetry is a deliberate efficiency choice: the model generates 128 future values in a single autoregressive step, substantially reducing the steps needed for long horizons. An output horizon of 512 time-points requires only 4 decoding steps.
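The patch arithmetic is easy to check. A minimal NumPy sketch (illustrative only, not the TimesFM internals) of grouping a context into input patches and counting decoding steps:

```python
import numpy as np

INPUT_PATCH = 32    # time-points per input token
OUTPUT_PATCH = 128  # time-points generated per decoding step

def tokenize(context: np.ndarray) -> np.ndarray:
    """Group a 1-D series into input patches (dropping any ragged tail for simplicity)."""
    n_patches = len(context) // INPUT_PATCH
    return context[: n_patches * INPUT_PATCH].reshape(n_patches, INPUT_PATCH)

def decoding_steps(horizon: int) -> int:
    """Autoregressive steps needed: each step emits OUTPUT_PATCH future values."""
    return -(-horizon // OUTPUT_PATCH)  # ceiling division

patches = tokenize(np.arange(512, dtype=float))
print(patches.shape)        # (16, 32): 512 historical points -> 16 input tokens
print(decoding_steps(512))  # 4 steps, matching the text
```

The 32-in/128-out asymmetry means a long context costs tokens at the input stage but the horizon is paid off 128 values at a time.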
Decoder-Only Architecture
The GPT-style design choice is central to understanding both TimesFM’s strengths and limitations. Causal (unidirectional) self-attention means each position attends only to prior positions, enabling autoregressive generation. The backbone for TimesFM 2.0 uses 50 transformer layers at 1,280-dimensional model width; version 2.5 achieves comparable performance at a smaller footprint through architectural efficiency gains including QKV matrix fusion.3
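Causal attention is simple to picture: a lower-triangular mask zeroes out attention to future positions before the softmax. A toy NumPy sketch (not the TimesFM implementation):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Lower-triangular boolean mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_softmax(scores: np.ndarray) -> np.ndarray:
    """Set disallowed (future) positions to -inf, then apply a row-wise softmax."""
    masked = np.where(causal_mask(scores.shape[0]), scores, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

weights = masked_softmax(np.zeros((4, 4)))  # uniform scores, 4 positions
print(weights[0])  # [1. 0. 0. 0.] — the first position sees only itself
print(weights[3])  # [0.25 0.25 0.25 0.25] — the last sees the whole prefix
```

This one-directional information flow is what makes step-by-step generation possible, and also why errors can only propagate forward (a limitation discussed later).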
Pretraining Data
The original 100-billion time-point corpus was assembled from:
- Wikipedia Pageviews (2012–2023): Dominant source, covering daily/weekly/monthly aggregations
- Google Trends: 22,000 search interest time series at hourly to weekly granularities
- Public datasets: M4, electricity, and traffic benchmarks
- Synthetic data: 3 million ARMA-generated series (~50% of training mix)
TimesFM 2.0 extended this with the LOTSA archive (cloud infrastructure traces, solar/wind, climate reanalysis). Version 2.5 additionally incorporated the GiftEvalPretrain dataset from Salesforce.4
Masking Strategy
Training uses a masking regime in which both individual patches and a variable-length run of leading patches can be masked for each series in a batch. This teaches the model to handle variable-length historical series gracefully — a practical necessity since real-world datasets rarely provide uniform history lengths.
Outputs
TimesFM produces:
- Point forecasts: Primary output, the model’s median prediction
- Quantile forecasts: 10 uncertainty bands (10th through 90th percentiles)
The quantile heads were explicitly marked experimental in versions 1.0 and 2.0. TimesFM 2.5 introduced a separate 30M-parameter quantile head intended to produce better-calibrated continuous probabilistic outputs up to approximately 1,000 horizon steps.5
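Turning the quantile tensor into prediction intervals is straightforward. A sketch assuming the `(batch, horizon, n_quantiles)` layout described above, with the last axis ordered from lowest to highest quantile (confirm the exact ordering against the release docs before relying on it):

```python
import numpy as np

# Toy stand-in for a model's quantile output: (batch=2, horizon=12, n_quantiles=10).
# Sorting along the last axis guarantees non-crossing bands for this illustration.
rng = np.random.default_rng(0)
quantile_forecast = np.sort(rng.normal(size=(2, 12, 10)), axis=-1)

lower = quantile_forecast[..., 0]    # lowest quantile band
median = quantile_forecast[..., quantile_forecast.shape[-1] // 2]
upper = quantile_forecast[..., -1]   # highest quantile band

interval_width = upper - lower       # per-series, per-step uncertainty
print(lower.shape, upper.shape)      # (2, 12) (2, 12)
```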
Benchmarks and Performance
Zero-Shot Results (ICML 2024)
On the Monash Forecasting Archive — 23+ datasets spanning diverse domains — TimesFM ranked in the top three models evaluated zero-shot, outperforming ARIMA, ETS, DeepAR, and llmtime (GPT-3.5) despite llmtime being “orders of magnitude larger.”6
On ETT (Electricity Transformer Temperature) long-horizon benchmarks, TimesFM’s zero-shot MAE matched supervised PatchTST — a model trained directly on target data. That result is the clearest demonstration of the pretrained-model thesis: comparable accuracy without dataset-specific training.
GIFT-Eval (2025)
The GIFT-Eval benchmark — 97 tasks spanning short, medium, and long horizons across diverse domains — has become the primary competitive arena for time-series foundation models. TimesFM 2.0 reached #1 at release in early 2025, posting 6% better aggregated MASE than the next-best model. TimesFM 2.5 retook the top spot in September 2025, notably while using only 200M parameters versus the 500M of its predecessor.
However, Amazon’s Chronos-2 subsequently surpassed TimesFM 2.5 on GIFT-Eval in late 2025, achieving higher win rates and CRPS scores.7 The leaderboard race reflects a broader pattern: foundation model rankings are shifting rapidly and no single model leads across all dataset types and granularities.
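For readers unfamiliar with the headline metric: MASE scales the forecast's absolute error by the in-sample error of a seasonal-naive forecast, so values below 1.0 beat the naive baseline. A minimal implementation:

```python
import numpy as np

def mase(y_true, y_pred, y_train, season: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample MAE
    of the seasonal-naive forecast on the training series."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
print(mase([13.0, 15.0], [12.5, 14.5], train))  # ≈ 0.3125, better than naive
```

A 6% aggregated-MASE gap, as reported for TimesFM 2.0, therefore means 6% less scaled error averaged across the benchmark's tasks.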
How TimesFM Compares to Alternatives
| Model | Org | Architecture | Params | Multivariate | Zero-Shot |
|---|---|---|---|---|---|
| TimesFM 2.5 | Google | Decoder-only, patch tokens | 200M | Via XReg only | Yes |
| Chronos-2 | Amazon | Encoder-decoder, discretized tokens | Various | Yes | Yes |
| Moirai | Salesforce | Any-variate encoder-decoder | Various | Yes (native) | Yes |
| Lag-LLaMA | Academic | Decoder-only, lag features | 45M | No | Yes |
| TTM | IBM | MLP-Mixer hybrid | 1M–48M | Yes | Yes |
| PatchTST | Academic | Encoder-only, patch tokens | Various | Channel-independent | No |
| N-HiTS | Academic | Hierarchical interpolation | Small | No | No |
Key distinctions:
Chronos tokenizes continuous values into a discrete vocabulary (like text tokenization), making it probabilistic by design. TimesFM uses continuous patch embeddings and added probabilistic outputs later. Performance is dataset-dependent: Chronos Bolt (a distilled variant) matches TimesFM on speed while performing competitively on many benchmarks.
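The tokenization difference is concrete. A toy sketch contrasting Chronos-style value discretization with TimesFM-style patching (illustrative uniform binning under assumed scaling, not either library's actual code):

```python
import numpy as np

def discretize(series: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Chronos-style: scale the series, then map each value to a discrete token id."""
    scaled = series / np.mean(np.abs(series))  # simple mean-abs scaling (assumption)
    bins = np.linspace(scaled.min(), scaled.max(), n_bins + 1)
    return np.clip(np.digitize(scaled, bins) - 1, 0, n_bins - 1)

series = np.sin(np.linspace(0, 6.28, 64))
tokens = discretize(series)        # 64 values -> 64 discrete token ids (a vocabulary)
patches = series.reshape(-1, 32)   # TimesFM-style: 64 values -> 2 continuous patches
print(tokens.shape, patches.shape) # (64,) (2, 32)
```

Discrete tokens make the model's output a categorical distribution (probabilistic for free, at the cost of quantization); continuous patches keep precision but need separate quantile heads for uncertainty.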
Moirai handles any-variate time series natively, modeling cross-series dependencies that TimesFM fundamentally cannot capture in its standard configuration. For demand forecasting where correlated product families matter, Moirai has an architectural advantage.
IBM TTM at 1M–48M parameters reportedly outperforms TimesFM by 19% on certain benchmarks, demonstrating that compact, domain-specialized architectures remain competitive against larger foundation models.
Quick Start
TimesFM is available on GitHub and Hugging Face, Apache 2.0 licensed.
```shell
pip install timesfm
```

```python
import numpy as np
import timesfm

# Load TimesFM 2.5 (200M parameters)
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch", torch_compile=True
)

# Configure forecast behavior
model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        fix_quantile_crossing=True,
    )
)

# Forecast — accepts variable-length input series
point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),          # 100 historical points
        np.sin(np.linspace(0, 20, 67)),  # 67 historical points
    ],
)
# point_forecast.shape → (2, 12)
# quantile_forecast.shape → (2, 12, 10) — 10 quantile levels
```

TimesFM 2.5 accepts series of different lengths in the same batch — a practical improvement over models requiring fixed-length inputs.
Enterprise Integration: BigQuery and AlloyDB
The strongest signal of production adoption is Google’s integration of TimesFM directly into its cloud data services. BigQuery ML now exposes AI.FORECAST, AI.EVALUATE, and AI.DETECT_ANOMALIES functions powered by TimesFM 2.5, with dynamic context windows up to 15,000 time-points.8
```sql
-- Zero-shot time series forecast directly in SQL
SELECT *
FROM AI.FORECAST(
  MODEL `project.dataset.timesfm_model`,
  TABLE `project.dataset.sales_data`,
  STRUCT(30 AS horizon, 0.8 AS confidence_level)
);
```

AlloyDB (Google’s PostgreSQL-compatible database) has added preview AI.FORECAST support, enabling forecasts on operational data without ETL pipelines. This positions TimesFM as infrastructure rather than a research artifact — directly relevant for data engineering teams that want forecasting without exporting data to a separate ML platform.
Uber maintains a public fork of the TimesFM repository on GitHub, indicating active evaluation for internal forecasting use cases at scale.9
Where TimesFM Falls Short
Univariate-only core: TimesFM 1.0 and 2.0 forecast each series independently — no cross-series dependencies. Version 2.5 partially addresses this via XReg (external regressors), which applies a linear ridge regression correction using covariates on top of the model’s base forecast. This helps with known external signals (promotions, weather) but does not model correlated product demand the way Moirai or Chronos-2 do natively.
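The XReg idea can be pictured as a two-stage pipeline (a sketch of the general pattern, not Google's implementation): take the base model's forecast, fit a ridge regression from covariates to the historical residuals, then add that correction to future forecasts where the covariates are known.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Toy setup: the base forecast misses a covariate-driven effect (e.g. promotions).
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(200, 3))             # historical covariates
true_effect = X_hist @ np.array([2.0, -1.0, 0.5])
base_forecast_hist = rng.normal(size=200)      # stand-in for the model's output
actuals = base_forecast_hist + true_effect

# Stage 2: regress historical residuals on covariates...
w = ridge_fit(X_hist, actuals - base_forecast_hist)

# ...then correct future forecasts using known future covariates.
X_future = rng.normal(size=(12, 3))
base_future = rng.normal(size=12)
corrected = base_future + X_future @ w
print(np.round(w, 2))  # recovers weights close to [2, -1, 0.5]
```

Note what this linear stage cannot do: it uses each series' own covariates, so correlated demand across a product family still goes unmodeled.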
Long-horizon error accumulation: The decoder-only causal architecture compounds errors on long horizons. Each autoregressive step’s prediction error feeds into the next. Encoder-decoder architectures that predict full output sequences directly avoid this accumulation.
Calibration uncertainty: A 2025 paper specifically examining TSFM calibration found this to be an open problem across the field.10 TimesFM 2.5’s dedicated quantile head is an improvement, but practitioners should validate coverage on their specific distribution before relying on uncertainty estimates for decision-making.
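Validating coverage is cheap once you hold out actuals: count how often they fall inside the predicted band and compare against the nominal level. A sketch (band construction from the model's quantile output is assumed; adjust to your quantile ordering):

```python
import numpy as np

def empirical_coverage(actuals: np.ndarray, lower, upper) -> float:
    """Fraction of held-out actuals inside the [lower, upper] band.
    For a 10th-90th percentile band, well-calibrated means ~0.8."""
    return float(np.mean((actuals >= lower) & (actuals <= upper)))

# Sanity check on synthetic data: standard-normal actuals against their
# own 10th/90th percentiles should give ~80% coverage.
rng = np.random.default_rng(0)
actuals = rng.normal(size=10_000)
lo, hi = np.quantile(actuals, [0.1, 0.9])
print(round(empirical_coverage(actuals, lo, hi), 2))  # ≈ 0.8
```

A coverage number materially below nominal on your own held-out data is the signal to widen bands or recalibrate before using the intervals for decisions.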
Domain transfer gaps: TimesFM was pretrained primarily on consumer web (Wikipedia, Google Trends), electricity, and weather data. Financial time series with volatility clustering, network traffic with heavy tails, and industrial sensor data with abrupt structural breaks can exhibit statistical characteristics that fall outside the pretraining distribution. Specialized econometric models (GARCH, ECM) still outperform TimesFM on realized volatility forecasting.
Structural breaks: After a regime change — a market dislocation, supply chain shock, or product discontinuation — TimesFM’s pretraining priors can produce forecasts anchored to the wrong regime. The model has no mechanism for detecting and discounting pre-break history.
The Broader Trajectory
TimesFM represents the clearest implementation of the foundation model thesis applied to time series. The results are substantive: a 200M-parameter model trained once on web-scale data achieves zero-shot accuracy competitive with supervised models on standard benchmarks. Integrated into BigQuery and AlloyDB, it lowers the barrier to production forecasting substantially for teams already on Google Cloud.
What it does not represent is the final word on time-series modeling. The GIFT-Eval leaderboard has seen Amazon’s Chronos-2, Salesforce’s Moirai, and IBM’s TTM each demonstrate advantages on specific dataset characteristics. The field has moved from asking whether foundation models can forecast time series to asking which architectural choices (decoder vs. encoder-decoder, patch vs. lag features, univariate vs. multivariate) produce the best performance across the broadest range of domains.
For practitioners, the practical question is simpler: TimesFM 2.5 is free, Apache 2.0 licensed, requires no target-domain training data, and has first-class SQL integration for Google Cloud users. As a default starting point for forecasting tasks, that combination is difficult to dismiss — even when domain-specialized alternatives remain competitive on specific datasets.
Frequently Asked Questions
Q: Does TimesFM require fine-tuning on my data? A: No. TimesFM operates zero-shot — you provide historical series and a horizon length, and the model generates forecasts without any dataset-specific training. Fine-tuning is supported for higher accuracy on specific domains but not required.
Q: How does TimesFM handle multivariate time series? A: TimesFM’s core architecture is univariate: each series is forecast independently. TimesFM 2.5 added XReg covariate support, which applies a linear correction using external regressors, but does not model cross-series dependencies natively. For full multivariate modeling, Moirai or Chronos-2 are architecturally better suited.
Q: What horizon lengths does TimesFM support? A: TimesFM 2.5 supports up to 16,384 time-points of historical context and can forecast up to 1,000 horizon steps using the continuous quantile head. In practice, accuracy degrades on very long horizons due to autoregressive error accumulation inherent to the decoder-only architecture.
Q: Is TimesFM available without Python infrastructure?
A: Yes. Google Cloud’s BigQuery ML exposes AI.FORECAST powered by TimesFM 2.5, enabling zero-shot forecasting directly from SQL queries with no ML infrastructure setup required.
Q: How does TimesFM compare to ARIMA for short series? A: On short or highly irregular series, classical methods like seasonal ARIMA can match TimesFM — the DARTS benchmark showed seasonal ARIMA remaining competitive. For large-scale forecasting where training and tuning thousands of individual ARIMA models is impractical, TimesFM’s single-model, zero-shot approach provides a practical advantage.
Footnotes
1. Das, A., Kong, W., Sen, R., Zhou, Y. “A decoder-only foundation model for time-series forecasting.” ICML 2024. arXiv:2310.10688.
2. MarkTechPost. “Google AI Ships TimesFM-2.5: Smaller, Longer Context Foundation Model That Now Leads GIFT-Eval Zero-Shot Forecasting.” September 16, 2025.
3. Google Research. GitHub: google-research/timesfm, v1.2.x releases. https://github.com/google-research/timesfm
4. LOTSA: Large-scale Open Time Series Archive. Detailed data composition described in the TimesFM 2.0 release notes and the GIFT-Eval paper (arXiv:2410.10393).
5. Google Research. TimesFM 2.5 release documentation. Hugging Face: google/timesfm-2.5-200m-pytorch.
6. ICML 2024 poster: “A decoder-only foundation model for time-series forecasting.” https://icml.cc/virtual/2024/poster/33288
7. GIFT-Eval benchmark: Aksu, T. et al. arXiv:2410.10393. Chronos-2 results: Amazon Research, late 2025.
8. Google Cloud Blog. “TimesFM models in BigQuery and AlloyDB.” cloud.google.com/blog/products/data-analytics/timesfm-models-in-bigquery-and-alloydb
9. Uber timesfm-fork: https://github.com/uber/timesfm-fork
10. Koochali, A. et al. “Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?” arXiv:2510.16060. October 2025.