Google's TimesFM: A Foundation Model for Time Series

TimesFM (Time Series Foundation Model) is Google’s answer to a question practitioners have asked for years: can a single pretrained model forecast any time series accurately, the way large language models handle any text task? As of late 2025, the answer is a qualified yes, with important caveats about what “accurate” means across different domains.

What Is TimesFM?

TimesFM is a decoder-only transformer model trained by Google Research on approximately 100 billion real-world time-points, designed to produce accurate time-series forecasts zero-shot, with no fine-tuning on the target dataset. A user provides historical values, specifies a horizon length, and the model outputs point forecasts and quantile uncertainty estimates.

The model was first published as an arXiv preprint in October 2023, accepted at ICML 2024, and open-sourced under Apache 2.0 in mid-2024 (Das et al., ICML 2024). Three versions have shipped since:

Version	Parameters	Max Context	Release
TimesFM 1.0	200M	512 time-points	Open-sourced May 2024
TimesFM 2.0	500M	2,048 time-points	December 31, 2024
TimesFM 2.5	200M	16,384 time-points	September 15, 2025

Version 2.5 is the most notable step in that trajectory: the team cut the parameter count from 500M to 200M (a 60% reduction) while expanding the context window 8x, from 2,048 to 16,384 time-points. It reached #1 on the GIFT-Eval zero-shot benchmark at release, before Amazon’s Chronos-2 and Salesforce’s Moirai 2.0 subsequently surpassed it (MarkTechPost, September 16, 2025). Since the September 2025 launch, the repository has added a Flax backend for faster inference, XReg covariate support, a LoRA fine-tuning example via HuggingFace Transformers + PEFT (April 2026), and an agent skill definition (March 2026) (Google Research, GitHub).

How TimesFM Works

Patch-Based Tokenization

Rather than treating each data point as a token (which would make sequences unworkably long), TimesFM groups contiguous time-points into patches, directly mirroring how Vision Transformers tile images.

Input patch: 32 time-points → one embedding token (via residual MLP)
Output patch: 128 time-points generated per decoding step (Das et al., ICML 2024)

The asymmetry is a deliberate efficiency choice: the model generates 128 future values in a single autoregressive step, substantially reducing the steps needed for long horizons. An output horizon of 512 time-points requires only 4 decoding steps (Das et al., ICML 2024).
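The patch arithmetic above can be checked with a short sketch. This is not TimesFM code: the helper names are hypothetical, and left-padding with zeros is an assumption made only to keep the example simple.

```python
import math
import numpy as np

INPUT_PATCH_LEN = 32    # time-points per input token (figure from the text above)
OUTPUT_PATCH_LEN = 128  # time-points emitted per decoding step

def to_input_patches(series: np.ndarray, patch_len: int = INPUT_PATCH_LEN) -> np.ndarray:
    """Split a 1-D series into rows of patch_len points.

    Assumption of this sketch: short series are left-padded with zeros so the
    length becomes a multiple of patch_len.
    """
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad), series])
    return padded.reshape(-1, patch_len)

def decoding_steps(horizon: int, output_patch_len: int = OUTPUT_PATCH_LEN) -> int:
    """Number of autoregressive steps needed to cover the horizon."""
    return math.ceil(horizon / output_patch_len)

context = np.arange(512, dtype=float)
patches = to_input_patches(context)
print(patches.shape)        # (16, 32): 512 points become 16 tokens
print(decoding_steps(512))  # 4
print(decoding_steps(130))  # 2: a partial patch still costs a full step
```

A 512-point context therefore costs 16 input tokens instead of 512, and the same 512-point horizon costs 4 decoding steps instead of 512.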

Decoder-Only Architecture

The GPT-style design choice is central to understanding both TimesFM’s strengths and limitations. Causal (unidirectional) self-attention means each position attends only to prior positions, enabling autoregressive generation. The backbone for TimesFM 2.0 uses 50 transformer layers at 1,280-dimensional model width; version 2.5 achieves comparable performance at a smaller footprint through architectural efficiency gains including QKV matrix fusion (Google Research, GitHub).
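Causal self-attention is easiest to see in a single-head toy version. The sketch below is a generic NumPy illustration, not the TimesFM implementation: it omits learned projections, multiple heads, and the fused QKV matrix mentioned above.

```python
import numpy as np

def causal_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Single-head scaled dot-product attention with a causal mask."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True above the diagonal marks "future" positions that must stay hidden.
    future = np.triu(np.ones((n, n), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Numerically stable softmax over each row.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))  # 4 patch tokens, 8-dimensional (toy sizes)
out, weights = causal_attention(tokens, tokens, tokens)

# Changing the last token leaves the outputs for earlier positions untouched.
changed = tokens.copy()
changed[3] += 5.0
out_changed, _ = causal_attention(changed, changed, changed)
print(np.allclose(out[:3], out_changed[:3]))
```

That invariance is what makes autoregressive generation possible: the representation of everything already seen does not depend on what comes next.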

Pretraining Data

The original 100-billion time-point corpus (Google Research blog) was assembled from:

Wikipedia Pageviews (2012-2023): Dominant source, covering daily/weekly/monthly aggregations
Google Trends: 22,000 search interest time series at hourly to weekly granularities (Das et al., ICML 2024)
Public datasets: M4, electricity, and traffic benchmarks
Synthetic data: 3 million ARMA-generated series (sampled as roughly 20% of the training mix, against 80% real data) (Das et al., ICML 2024)

TimesFM 2.0 extended this with the LOTSA archive (cloud infrastructure traces, solar/wind, climate reanalysis). Version 2.5 additionally incorporated the GiftEvalPretrain dataset from Salesforce. Google Cloud’s November 2025 launch blog puts the current corpus at “over 400 billion real-world time-points,” a roughly 4x figure over the original pretraining mix that the post does not break down by source (Google Cloud Blog, November 2025).

Masking Strategy

Training uses a masking regime where both individual patches and leading patches can be masked during a batch. This teaches the model to handle variable-length historical series gracefully, a practical necessity since real-world datasets rarely provide uniform history lengths.
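One way to picture the leading-mask part of this regime is the sketch below. It is an illustration only: the sampling rule, the choice to keep at least one full patch visible, and the function name are all assumptions, not details taken from the TimesFM training code.

```python
import numpy as np

PATCH_LEN = 32  # input patch length from the tokenization section

def leading_mask(context_len: int, rng: np.random.Generator, patch_len: int = PATCH_LEN):
    """Hide a random-length prefix of the context; True marks a hidden point.

    Assumes context_len is a multiple of patch_len.
    """
    # Assumption of this sketch: at least one full patch always stays visible.
    n_hidden = int(rng.integers(0, context_len - patch_len + 1))
    point_mask = np.arange(context_len) < n_hidden
    # A patch counts as fully masked only when every point in it is hidden.
    patch_mask = point_mask.reshape(-1, patch_len).all(axis=1)
    return point_mask, patch_mask

rng = np.random.default_rng(0)
point_mask, patch_mask = leading_mask(context_len=512, rng=rng)
effective_len = int((~point_mask).sum())
print(effective_len, int(patch_mask.sum()))
```

Drawing a fresh prefix length for every training example exposes the model to many effective history lengths, which is the behavior the paragraph above describes.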

Outputs

TimesFM produces:

Point forecasts: Primary output, the model’s median prediction
Quantile forecasts: Nine quantile levels (10th through 90th percentiles), returned alongside the mean as a 10-channel array

The quantile heads were explicitly marked experimental in versions 1.0 and 2.0. TimesFM 2.5 introduced a separate 30M-parameter quantile head intended to produce better-calibrated continuous probabilistic outputs up to approximately 1,000 horizon steps (Google Research, Hugging Face).

Benchmarks and Performance

Zero-Shot Results (ICML 2024)

On the Monash Forecasting Archive (23+ datasets spanning diverse domains), TimesFM ranked in the top three models evaluated zero-shot, outperforming ARIMA, ETS, DeepAR, and llmtime (GPT-3.5) despite llmtime being “orders of magnitude larger” (ICML 2024 poster).

On ETT (Electricity Transformer Temperature) long-horizon benchmarks, TimesFM’s zero-shot MAE matched supervised PatchTST, a model trained directly on target data. That result is the clearest demonstration of the pretrained-model thesis: comparable accuracy without dataset-specific training.

GIFT-Eval (2025)

The GIFT-Eval benchmark, which covers 97 tasks spanning short, medium, and long horizons across diverse domains, has become the primary competitive arena for time-series foundation models. TimesFM 2.0 reached #1 at release in early 2025, posting better aggregated MASE than the next-best model. TimesFM 2.5 retook the top spot in September 2025, notably while using only 200M parameters versus the 500M of its predecessor.
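For readers unfamiliar with the metric, MASE (mean absolute scaled error) divides the forecast’s mean absolute error by the in-sample error of a seasonal-naive forecast. The sketch below shows the standard per-series definition; how GIFT-Eval aggregates scores across its 97 tasks is not covered here, and the sample numbers are made up.

```python
import numpy as np

def mase(y_true, y_pred, y_train, season: int = 1) -> float:
    """Forecast MAE divided by the in-sample MAE of a seasonal-naive forecast."""
    y_true, y_pred, y_train = (np.asarray(a, dtype=float) for a in (y_true, y_pred, y_train))
    # Seasonal-naive error: each training point predicted by the value one season earlier.
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

history = np.array([10.0, 12.0, 11.0, 13.0, 12.0])  # illustrative values
actual = np.array([14.0, 13.0])
forecast = np.array([13.0, 13.0])
score = mase(actual, forecast, history)
print(round(score, 3))  # 0.333
```

A score below 1 means the forecast beats the naive baseline on that series, which is why the metric is comparable across series with very different scales.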

Amazon’s Chronos-2 and Salesforce’s Moirai 2.0 subsequently displaced TimesFM 2.5 on GIFT-Eval. Chronos-2 ranks first among pretrained models overall, with a high win rate in head-to-head comparisons; Moirai 2.0 holds the top MASE score among non-test-data-leaking models (Ansari et al., 2025; Aksu et al., 2025). The leaderboard race reflects a broader pattern: foundation model rankings are shifting rapidly and no single model leads across all dataset types and granularities.

How TimesFM Compares to Alternatives

Model	Org	Architecture	Params	Multivariate	Zero-Shot
TimesFM 2.5	Google	Decoder-only, patch tokens	200M	Via XReg only	Yes
Chronos-2	Amazon	Encoder-only, group attention	120M	Yes	Yes
Moirai 2.0	Salesforce	Any-variate decoder-only	Various	Yes (native)	Yes
Lag-LLaMA	Academic	Decoder-only, lag features	45M	No	Yes
TTM	IBM	MLP-Mixer hybrid	1M-48M	Yes	Yes
PatchTST	Academic	Encoder-only, patch tokens	Various	Channel-independent	No
N-HiTS	Academic	Hierarchical interpolation	Small	No	No

Key distinctions:

Chronos-2 dropped the discrete-tokenization approach of the original Chronos in favor of an encoder-only transformer with a group attention mechanism, a dual-attention design that alternates between temporal self-attention within a series and cross-series attention across a group. This makes multivariate and covariate-informed tasks native to the architecture rather than bolted on. At 120M parameters, it outperforms Chronos-Bolt (the distilled Chronos 1.x variant) by a substantial margin on both GIFT-Eval and fev-bench (Ansari et al., 2025).

Moirai 2.0 handles any-variate time series natively, modeling cross-series dependencies that TimesFM fundamentally cannot capture in its standard configuration. Salesforce posted Moirai 2.0 to arXiv in November 2025, switching from the original masked encoder to a decoder-only design and dropping parameter count roughly 30x relative to Moirai 1.0-Large while posting better benchmark results. For demand forecasting where correlated product families matter, Moirai retains its architectural advantage over TimesFM despite the internal redesign (Aksu et al., 2025).

IBM TTM at 1M-48M parameters outperforms TimesFM on some benchmarks, demonstrating that compact, domain-specialized architectures remain competitive against larger foundation models.

Quick Start

TimesFM is available on GitHub and Hugging Face, Apache 2.0 licensed.

pip install timesfm

import numpy as np
import timesfm

# Load TimesFM 2.5 (200M parameters)
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch",
    torch_compile=True
)

# Configure forecast behavior
model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        force_flip_invariance=True,
        infer_is_positive=True,
        fix_quantile_crossing=True,
    )
)

# Forecast; accepts variable-length input series
point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),          # 100 historical points
        np.sin(np.linspace(0, 20, 67)),   # 67 historical points
    ],
)
# point_forecast.shape    → (2, 12)
# quantile_forecast.shape → (2, 12, 10)  # mean, then the 10th to 90th percentiles

TimesFM 2.5 accepts series of different lengths in the same batch, a practical improvement over models requiring fixed-length inputs.
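To turn the quantile array into a prediction interval, slice the last axis. The helper below is a sketch that assumes the channel layout described in the repository README (channel 0 is the mean, channels 1 to 9 are the 10th to 90th percentiles). The function names are hypothetical, and a stand-in array replaces real model output so the example runs without downloading weights.

```python
import numpy as np

def quantile_channel(level: float) -> int:
    """Map a decile level (0.1 .. 0.9) to its channel; channel 0 is assumed to be the mean."""
    index = int(round(level * 10))
    if not 1 <= index <= 9:
        raise ValueError("level must be one of 0.1, 0.2, ..., 0.9")
    return index

def prediction_interval(quantile_forecast, lower: float = 0.1, upper: float = 0.9):
    """Return (lower, upper) bands, each shaped (n_series, horizon)."""
    q = np.asarray(quantile_forecast)
    return q[..., quantile_channel(lower)], q[..., quantile_channel(upper)]

# Stand-in shaped like quantile_forecast above: channel c simply holds the value c.
fake_quantiles = np.broadcast_to(np.arange(10.0), (2, 12, 10))
low, high = prediction_interval(fake_quantiles)
print(low.shape, low[0, 0], high[0, 0])  # (2, 12) 1.0 9.0
```

The 10th-to-90th band is a nominal 80% interval; verify the layout against the version of the library you install before relying on the indices.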

Enterprise Integration: BigQuery and AlloyDB

The strongest signal of production adoption is Google’s integration of TimesFM directly into its cloud data services. AI.FORECAST and AI.EVALUATE reached general availability in BigQuery ML in November 2025; AI.DETECT_ANOMALIES remains in public preview. All three functions support TimesFM 2.5 with dynamic context windows up to 15,000 time-points (Google Cloud Blog, November 2025).

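-- Zero-shot time series forecast directly in SQL.
-- AI.FORECAST takes the input table plus named arguments; no CREATE MODEL step is needed.
-- `sale_date` and `units_sold` are placeholder column names.
SELECT *
FROM AI.FORECAST(
  TABLE `project.dataset.sales_data`,
  data_col => 'units_sold',
  timestamp_col => 'sale_date',
  model => 'TimesFM 2.5',
  horizon => 30,
  confidence_level => 0.8
);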

AlloyDB (Google’s PostgreSQL-compatible database) has added preview AI.FORECAST support, enabling forecasts on operational data without ETL pipelines. The GA launch also exposes TimesFM through several open-source frameworks: the Agent Development Kit’s built-in tools, an MCP toolbox, a Gemini CLI extension, and BigQuery DataFrames (Google Cloud Blog, November 2025). Taken together, these integrations position TimesFM as infrastructure rather than a research artifact, directly relevant for data engineering teams that want forecasting without exporting data to a separate ML platform.

Uber maintains a public fork of the TimesFM repository on GitHub (Uber timesfm-fork). A fork alone does not confirm production use, but it suggests the model is being evaluated for internal forecasting workloads.

Where TimesFM Falls Short

Univariate-only core: TimesFM 1.0 and 2.0 forecast each series independently, with no cross-series dependencies. Version 2.5 partially addresses this via XReg (external regressors), which applies a linear ridge regression correction using covariates on top of the model’s base forecast. This helps with known external signals (promotions, weather) but does not model correlated product demand the way Moirai 2.0 or Chronos-2 do natively.
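The idea of a linear correction on top of a base forecast can be sketched in a few lines. This is a conceptual illustration, not the XReg API: the function names, the closed-form ridge fit without an intercept, and the synthetic promotion data are all assumptions.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression weights (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

def corrected_forecast(base_fit, y_hist, X_hist, base_forecast, X_future, alpha=1.0):
    """Fit covariates to the base model's historical residuals, then add the correction."""
    residuals = y_hist - base_fit
    weights = ridge_fit(X_hist, residuals, alpha)
    return base_forecast + X_future @ weights

rng = np.random.default_rng(0)
promo_hist = rng.integers(0, 2, size=(200, 1)).astype(float)  # 1 = promotion day
base_fit = np.full(200, 100.0)      # stand-in for the base model's in-sample fit
y_hist = base_fit + 20.0 * promo_hist[:, 0] + rng.normal(0.0, 1.0, 200)

promo_future = np.array([[0.0], [1.0], [1.0]])
base_forecast = np.full(3, 100.0)   # stand-in for the base model's forecast
forecast = corrected_forecast(base_fit, y_hist, promo_hist, base_forecast, promo_future)
print(np.round(forecast, 1))
```

The correction recovers the promotion lift of about 20 units, but each series still gets its own independent adjustment. Nothing here lets one product’s demand inform another’s, which is the gap the paragraph above describes.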

Long-horizon error accumulation: The decoder-only causal architecture compounds errors on long horizons. Each autoregressive step’s prediction error feeds into the next. Encoder-decoder architectures that predict full output sequences directly avoid this accumulation.
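A toy recursion shows why feeding predictions back as inputs compounds error. The example assumes a noiseless process that decays by a fixed factor each step and a model whose estimate of that factor is slightly off; both numbers are invented for illustration. TimesFM feeds back 128-point patches rather than single values, so it takes fewer feedback steps per horizon, but the mechanism is the same.

```python
import numpy as np

def iterated_forecast(last_value: float, coef: float, steps: int) -> np.ndarray:
    """Roll a one-step rule forward, feeding each prediction back in as the next input."""
    preds, current = [], last_value
    for _ in range(steps):
        current = coef * current
        preds.append(current)
    return np.array(preds)

true_coef, est_coef = 0.99, 0.97   # the model is off by 0.02 per step
truth = iterated_forecast(1.0, true_coef, 50)
preds = iterated_forecast(1.0, est_coef, 50)
rel_error = np.abs(preds - truth) / truth

print(round(float(rel_error[0]), 3))   # about 0.02 after one step
print(round(float(rel_error[-1]), 3))  # about 0.64 after fifty steps
```

A 2% one-step error grows to roughly 64% by step 50 because every step inherits the error of the step before it.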

Calibration: A 2025 study of five TSFMs found that foundation models are consistently better calibrated than baseline forecasters and do not exhibit the systematic overconfidence typical of other deep learning domains (Koochali et al., ICLR 2026). TimesFM 2.5’s dedicated quantile head improves on the experimental heads in versions 1.0 and 2.0, but empirical coverage should still be validated on your specific distribution before treating uncertainty intervals as decision-grade risk estimates.
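Validating coverage takes only a few lines once held-out actuals are available. The sketch below uses synthetic standard-normal data and a fixed band as stand-ins for real backtest output; with a real model, `lower` and `upper` would come from the 10th and 90th percentile forecasts.

```python
import numpy as np

def empirical_coverage(actuals, lower, upper) -> float:
    """Fraction of actual values that land inside [lower, upper]."""
    actuals, lower, upper = (np.asarray(a, dtype=float) for a in (actuals, lower, upper))
    return float(np.mean((actuals >= lower) & (actuals <= upper)))

rng = np.random.default_rng(0)
actuals = rng.normal(0.0, 1.0, size=5000)   # stand-in for held-out observations
z80 = 1.2816                                # 90th percentile of a standard normal
lower = np.full(5000, -z80)                 # stand-in for the 10th percentile forecast
upper = np.full(5000, z80)                  # stand-in for the 90th percentile forecast

coverage = empirical_coverage(actuals, lower, upper)
print(round(coverage, 3))  # close to the nominal 0.80
```

If the measured coverage on your own backtest sits well below 0.80, the intervals are too narrow for your data; well above 0.80, they are too wide to be informative.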

The cost of MSE-optimal point forecasts: A June 2026 arXiv paper (accepted at KDD 2026) formalizes a conditional uncertainty gap and proves that whenever this gap is nonzero, no deterministic predictor can simultaneously minimize MSE and match the marginal distribution of realized futures (arXiv:2606.04342). The authors show that as conditional uncertainty increases with forecast horizon, the attainable set expands into a pronounced Pareto front separating MSE-optimal but under-dispersed predictors from methods that trade accuracy for realistic marginal variability. Across nine real-world benchmarks, the authors report that small relaxations in MSE (≤5%) frequently yield disproportionate gains in marginal realism, with median improvements of 17.3% and gains exceeding 30% in some datasets (arXiv:2606.04342). The authors recast strategy selection as navigation of an unavoidable accuracy-realism trade-off, strengthening the case for probabilistic or quantile heads over point estimates alone. TimesFM 2.5’s optional quantile head is a partial response, but the critique applies to any model whose primary output is a median or mean prediction.
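The intuition behind the under-dispersion claim follows from a textbook identity, sketched below. This is the standard law of total variance, not the paper’s own formalization of the conditional uncertainty gap, which may be stated differently.

```latex
\begin{aligned}
f^{*}(x) &= \mathbb{E}[Y \mid X = x]
  && \text{(the MSE-optimal point predictor)} \\
\operatorname{Var}(Y) &= \operatorname{Var}\big(\mathbb{E}[Y \mid X]\big)
  + \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big]
  && \text{(law of total variance)} \\
\operatorname{Var}\big(f^{*}(X)\big) &= \operatorname{Var}(Y)
  - \mathbb{E}\big[\operatorname{Var}(Y \mid X)\big] \;\le\; \operatorname{Var}(Y)
\end{aligned}
```

Whenever the expected conditional variance is positive, the MSE-optimal forecasts vary strictly less than the realized values do, and that shortfall widens as conditional uncertainty grows with the horizon.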

Domain transfer gaps: TimesFM was pretrained primarily on consumer web (Wikipedia, Google Trends), electricity, and weather data. Financial time series with volatility clustering, network traffic with heavy tails, and industrial sensor data with abrupt structural breaks can exhibit statistical characteristics that fall outside the pretraining distribution. Specialized econometric models (GARCH, ECM) still outperform TimesFM on realized volatility forecasting.

Structural breaks: After a regime change (a market dislocation, supply chain shock, or product discontinuation), TimesFM’s pretraining priors can produce forecasts anchored to the wrong regime. The model has no mechanism for detecting and discounting pre-break history.

The Broader Trajectory

TimesFM represents the clearest implementation of the foundation model thesis applied to time series. The results are substantive: a 200M-parameter model trained once on web-scale data achieves zero-shot accuracy competitive with supervised models on standard benchmarks. Integrated into BigQuery and AlloyDB at GA, it lowers the barrier to production forecasting substantially for teams already on Google Cloud.

What it does not represent is the final word on time-series modeling. The GIFT-Eval leaderboard has seen Amazon’s Chronos-2, Salesforce’s Moirai 2.0, and IBM’s TTM each demonstrate advantages on specific dataset characteristics. Notably, both Chronos-2 and Moirai 2.0 handle multivariate inputs natively, a capability TimesFM approximates but does not match architecturally. The field has moved from asking whether foundation models can forecast time series to asking which architectural choices (decoder vs. encoder-only, patch vs. group attention, univariate vs. multivariate) produce the best performance across the broadest range of domains.

For practitioners, the practical question is simpler: TimesFM 2.5 is free, Apache 2.0 licensed, requires no target-domain training data, and has first-class SQL integration for Google Cloud users. As a default starting point for forecasting tasks, that combination is difficult to dismiss, even when domain-specialized alternatives remain competitive on specific datasets.

Frequently Asked Questions

Q: Does TimesFM require fine-tuning on my data? A: No. TimesFM operates zero-shot: you provide historical series and a horizon length, and the model generates forecasts without any dataset-specific training. Fine-tuning is supported for higher accuracy on specific domains but not required.

Q: How does TimesFM handle multivariate time series? A: TimesFM’s core architecture is univariate: each series is forecast independently. TimesFM 2.5 added XReg covariate support, which applies a linear correction using external regressors, but does not model cross-series dependencies natively. For full multivariate modeling, Moirai 2.0 or Chronos-2 are architecturally better suited.

Q: What horizon lengths does TimesFM support? A: TimesFM 2.5 supports up to 16,384 time-points of historical context and, with the continuous quantile head, can forecast up to roughly 1,000 horizon steps (Google Research, GitHub). In practice, accuracy degrades on very long horizons due to autoregressive error accumulation inherent to the decoder-only architecture.

Q: Is TimesFM available without Python infrastructure? A: Yes. Google Cloud’s BigQuery ML exposes AI.FORECAST (now GA) powered by TimesFM 2.5, enabling zero-shot forecasting directly from SQL queries with no ML infrastructure setup required.

Q: How does TimesFM compare to ARIMA for short series? A: On short or highly irregular series, classical methods like seasonal ARIMA can match TimesFM; the DARTS benchmark showed seasonal ARIMA remaining competitive. For large-scale forecasting where training and tuning thousands of individual ARIMA models is impractical, TimesFM’s single-model, zero-shot approach provides a practical advantage.
