TimesFM (Time Series Foundation Model) is Google’s answer to a question practitioners have asked for years: can a single pretrained model forecast any time series accurately, the way large language models handle any text task? As of late 2025, the answer is a qualified yes — with important caveats about what “accurate” means across different domains.
What Is TimesFM?
TimesFM is a decoder-only transformer model trained by Google Research on approximately 100 billion real-world time-points, designed to produce accurate time-series forecasts zero-shot — with no fine-tuning on the target dataset. A user provides historical values, specifies a horizon length, and the model outputs point forecasts and quantile uncertainty estimates.
The model was first published as an arXiv preprint in October 2023, accepted at ICML 2024, and open-sourced under Apache 2.0 in mid-2024.1 Three versions have shipped since:
| Version | Parameters | Max Context | Release |
|---|---|---|---|
| TimesFM 1.0 | 200M | 512 time-points | Open-sourced May 2024 |
| TimesFM 2.0 | 500M | 2,048 time-points | December 31, 2024 |
| TimesFM 2.5 | 200M | 16,384 time-points | September 15, 2025 |
Version 2.5 shows the most notable trajectory: the team cut the parameter count from 500M back to 200M while expanding the context window eightfold, reaching #1 on the GIFT-Eval zero-shot benchmark at release — before Amazon’s Chronos-2 subsequently surpassed it.2
How TimesFM Works
Patch-Based Tokenization
Rather than treating each data point as a token (which would make sequences unworkably long), TimesFM groups contiguous time-points into patches — directly mirroring how Vision Transformers tile images.
- Input patch: 32 time-points → one embedding token (via residual MLP)
- Output patch: 128 time-points generated per decoding step
The asymmetry is a deliberate efficiency choice: the model generates 128 future values in a single autoregressive step, substantially reducing the steps needed for long horizons. An output horizon of 512 time-points requires only 4 decoding steps.
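The patch arithmetic is easy to check. A minimal NumPy sketch (illustrative only, not the TimesFM internals) of grouping a context into input patches and counting decoding steps:

```python
import numpy as np

INPUT_PATCH = 32    # time-points per input token
OUTPUT_PATCH = 128  # time-points generated per decoding step

def tokenize(context: np.ndarray) -> np.ndarray:
    """Group a 1-D series into input patches (dropping any ragged tail for simplicity)."""
    n_patches = len(context) // INPUT_PATCH
    return context[: n_patches * INPUT_PATCH].reshape(n_patches, INPUT_PATCH)

def decoding_steps(horizon: int) -> int:
    """Autoregressive steps needed: each step emits OUTPUT_PATCH future values."""
    return -(-horizon // OUTPUT_PATCH)  # ceiling division

patches = tokenize(np.arange(512, dtype=float))
print(patches.shape)        # (16, 32): 512 historical points -> 16 input tokens
print(decoding_steps(512))  # 4 steps, matching the text
```

The 32-in/128-out asymmetry means a long context costs tokens at the input stage but the horizon is paid off 128 values at a time.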
Decoder-Only Architecture
The GPT-style design choice is central to understanding both TimesFM’s strengths and limitations. Causal (unidirectional) self-attention means each position attends only to prior positions, enabling autoregressive generation. The backbone for TimesFM 2.0 uses 50 transformer layers at 1,280-dimensional model width; version 2.5 achieves comparable performance at a smaller footprint through architectural efficiency gains including QKV matrix fusion.3
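Causal attention is simple to picture: a lower-triangular mask zeroes out attention to future positions before the softmax. A toy NumPy sketch (not the TimesFM implementation):

```python
import numpy as np

def causal_mask(n: int) -> np.ndarray:
    """Lower-triangular boolean mask: position i may attend only to positions <= i."""
    return np.tril(np.ones((n, n), dtype=bool))

def masked_softmax(scores: np.ndarray) -> np.ndarray:
    """Set disallowed (future) positions to -inf, then apply a row-wise softmax."""
    masked = np.where(causal_mask(scores.shape[0]), scores, -np.inf)
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

weights = masked_softmax(np.zeros((4, 4)))  # uniform scores, 4 positions
print(weights[0])  # [1. 0. 0. 0.] — the first position sees only itself
print(weights[3])  # [0.25 0.25 0.25 0.25] — the last sees the whole prefix
```

This one-directional information flow is what makes step-by-step generation possible, and also why errors can only propagate forward (a limitation discussed later).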
Pretraining Data
The original 100-billion time-point corpus was assembled from:
- Wikipedia Pageviews (2012–2023): Dominant source, covering daily/weekly/monthly aggregations
- Google Trends: 22,000 search interest time series at hourly to weekly granularities
- Public datasets: M4, electricity, and traffic benchmarks
- Synthetic data: 3 million ARMA-generated series (~50% of training mix)
TimesFM 2.0 extended this with the LOTSA archive (cloud infrastructure traces, solar/wind, climate reanalysis). Version 2.5 additionally incorporated the GiftEvalPretrain dataset from Salesforce.4
Masking Strategy
Training uses a masking regime in which both individual patches and a variable-length run of leading patches can be masked for each series in a batch. This teaches the model to handle variable-length historical series gracefully — a practical necessity since real-world datasets rarely provide uniform history lengths.
Outputs
TimesFM produces:
- Point forecasts: Primary output, the model’s median prediction
- Quantile forecasts: 10 uncertainty bands (10th through 90th percentiles)
The quantile heads were explicitly marked experimental in versions 1.0 and 2.0. TimesFM 2.5 introduced a separate 30M-parameter quantile head intended to produce better-calibrated continuous probabilistic outputs up to approximately 1,000 horizon steps.5
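Turning the quantile tensor into prediction intervals is straightforward. A sketch assuming the `(batch, horizon, n_quantiles)` layout described above, with the last axis ordered from lowest to highest quantile (confirm the exact ordering against the release docs before relying on it):

```python
import numpy as np

# Toy stand-in for a model's quantile output: (batch=2, horizon=12, n_quantiles=10).
# Sorting along the last axis guarantees non-crossing bands for this illustration.
rng = np.random.default_rng(0)
quantile_forecast = np.sort(rng.normal(size=(2, 12, 10)), axis=-1)

lower = quantile_forecast[..., 0]    # lowest quantile band
median = quantile_forecast[..., quantile_forecast.shape[-1] // 2]
upper = quantile_forecast[..., -1]   # highest quantile band

interval_width = upper - lower       # per-series, per-step uncertainty
print(lower.shape, upper.shape)      # (2, 12) (2, 12)
```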
Benchmarks and Performance
Zero-Shot Results (ICML 2024)
On the Monash Forecasting Archive — 23+ datasets spanning diverse domains — TimesFM ranked in the top three models evaluated zero-shot, outperforming ARIMA, ETS, DeepAR, and llmtime (GPT-3.5) despite llmtime being “orders of magnitude larger.”6
On ETT (Electricity Transformer Temperature) long-horizon benchmarks, TimesFM’s zero-shot MAE matched supervised PatchTST — a model trained directly on target data. That result is the clearest demonstration of the pretrained-model thesis: comparable accuracy without dataset-specific training.
GIFT-Eval (2025)
The GIFT-Eval benchmark — 97 tasks spanning short, medium, and long horizons across diverse domains — has become the primary competitive arena for time-series foundation models. TimesFM 2.0 reached #1 at release in early 2025, posting 6% better aggregated MASE than the next-best model. TimesFM 2.5 retook the top spot in September 2025, notably while using only 200M parameters versus the 500M of its predecessor.
However, Amazon’s Chronos-2 subsequently surpassed TimesFM 2.5 on GIFT-Eval in late 2025, achieving higher win rates and CRPS scores.7 The leaderboard race reflects a broader pattern: foundation model rankings are shifting rapidly and no single model leads across all dataset types and granularities.
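For readers unfamiliar with the headline metric: MASE scales the forecast's absolute error by the in-sample error of a seasonal-naive forecast, so values below 1.0 beat the naive baseline. A minimal implementation:

```python
import numpy as np

def mase(y_true, y_pred, y_train, season: int = 1) -> float:
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample MAE
    of the seasonal-naive forecast on the training series."""
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    naive_mae = np.mean(np.abs(y_train[season:] - y_train[:-season]))
    return float(np.mean(np.abs(y_true - y_pred)) / naive_mae)

train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
print(mase([13.0, 15.0], [12.5, 14.5], train))  # ≈ 0.3125, better than naive
```

A 6% aggregated-MASE gap, as reported for TimesFM 2.0, therefore means 6% less scaled error averaged across the benchmark's tasks.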
How TimesFM Compares to Alternatives
| Model | Org | Architecture | Params | Multivariate | Zero-Shot |
|---|---|---|---|---|---|
| TimesFM 2.5 | Google | Decoder-only, patch tokens | 200M | Via XReg only | Yes |
| Chronos-2 | Amazon | Encoder-decoder, discretized tokens | Various | Yes | Yes |
| Moirai | Salesforce | Any-variate encoder-decoder | Various | Yes (native) | Yes |
| Lag-LLaMA | Academic | Decoder-only, lag features | 45M | No | Yes |
| TTM | IBM | MLP-Mixer hybrid | 1M–48M | Yes | Yes |
| PatchTST | Academic | Encoder-only, patch tokens | Various | Channel-independent | No |
| N-HiTS | Academic | Hierarchical interpolation | Small | No | No |
Key distinctions:
Chronos tokenizes continuous values into a discrete vocabulary (like text tokenization), making it probabilistic by design. TimesFM uses continuous patch embeddings and added probabilistic outputs later. Performance is dataset-dependent: Chronos Bolt (a distilled variant) matches TimesFM on speed while performing competitively on many benchmarks.
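The tokenization difference is concrete. A toy sketch contrasting Chronos-style value discretization with TimesFM-style patching (illustrative uniform binning under assumed scaling, not either library's actual code):

```python
import numpy as np

def discretize(series: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Chronos-style: scale the series, then map each value to a discrete token id."""
    scaled = series / np.mean(np.abs(series))  # simple mean-abs scaling (assumption)
    bins = np.linspace(scaled.min(), scaled.max(), n_bins + 1)
    return np.clip(np.digitize(scaled, bins) - 1, 0, n_bins - 1)

series = np.sin(np.linspace(0, 6.28, 64))
tokens = discretize(series)        # 64 values -> 64 discrete token ids (a vocabulary)
patches = series.reshape(-1, 32)   # TimesFM-style: 64 values -> 2 continuous patches
print(tokens.shape, patches.shape) # (64,) (2, 32)
```

Discrete tokens make the model's output a categorical distribution (probabilistic for free, at the cost of quantization); continuous patches keep precision but need separate quantile heads for uncertainty.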
Moirai handles any-variate time series natively, modeling cross-series dependencies that TimesFM fundamentally cannot capture in its standard configuration. For demand forecasting where correlated product families matter, Moirai has an architectural advantage.
IBM TTM at 1M–48M parameters reportedly outperforms TimesFM by 19% on certain benchmarks, demonstrating that compact, domain-specialized architectures remain competitive against larger foundation models.
Quick Start
TimesFM is available on GitHub and Hugging Face, Apache 2.0 licensed.
```shell
pip install timesfm
```

```python
import numpy as np
import timesfm

# Load TimesFM 2.5 (200M parameters)
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch", torch_compile=True
)

# Configure forecast behavior
model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        fix_quantile_crossing=True,
    )
)

# Forecast — accepts variable-length input series
point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),          # 100 historical points
        np.sin(np.linspace(0, 20, 67)),  # 67 historical points
    ],
)
# point_forecast.shape → (2, 12)
# quantile_forecast.shape → (2, 12, 10) — 10 quantile levels
```

TimesFM 2.5 accepts series of different lengths in the same batch — a practical improvement over models requiring fixed-length inputs.
Enterprise Integration: BigQuery and AlloyDB
The strongest signal of production adoption is Google’s integration of TimesFM directly into its cloud data services. BigQuery ML now exposes AI.FORECAST, AI.EVALUATE, and AI.DETECT_ANOMALIES functions powered by TimesFM 2.5, with dynamic context windows up to 15,000 time-points.8
```sql
-- Zero-shot time series forecast directly in SQL
SELECT *
FROM AI.FORECAST(
  MODEL `project.dataset.timesfm_model`,
  TABLE `project.dataset.sales_data`,
  STRUCT(30 AS horizon, 0.8 AS confidence_level)
);
```

AlloyDB (Google’s PostgreSQL-compatible database) has added preview AI.FORECAST support, enabling forecasts on operational data without ETL pipelines. This positions TimesFM as infrastructure rather than a research artifact — directly relevant for data engineering teams that want forecasting without exporting data to a separate ML platform.
Uber maintains a public fork of the TimesFM repository on GitHub, indicating active evaluation for internal forecasting use cases at scale.9
Where TimesFM Falls Short
Univariate-only core: TimesFM 1.0 and 2.0 forecast each series independently — no cross-series dependencies. Version 2.5 partially addresses this via XReg (external regressors), which applies a linear ridge regression correction using covariates on top of the model’s base forecast. This helps with known external signals (promotions, weather) but does not model correlated product demand the way Moirai or Chronos-2 do natively.
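The XReg idea can be pictured as a two-stage pipeline (a sketch of the general pattern, not Google's implementation): take the base model's forecast, fit a ridge regression from covariates to the historical residuals, then add that correction to future forecasts where the covariates are known.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression: w = (X^T X + alpha*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# Toy setup: the base forecast misses a covariate-driven effect (e.g. promotions).
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(200, 3))             # historical covariates
true_effect = X_hist @ np.array([2.0, -1.0, 0.5])
base_forecast_hist = rng.normal(size=200)      # stand-in for the model's output
actuals = base_forecast_hist + true_effect

# Stage 2: regress historical residuals on covariates...
w = ridge_fit(X_hist, actuals - base_forecast_hist)

# ...then correct future forecasts using known future covariates.
X_future = rng.normal(size=(12, 3))
base_future = rng.normal(size=12)
corrected = base_future + X_future @ w
print(np.round(w, 2))  # recovers weights close to [2, -1, 0.5]
```

Note what this linear stage cannot do: it uses each series' own covariates, so correlated demand across a product family still goes unmodeled.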
Long-horizon error accumulation: The decoder-only causal architecture compounds errors on long horizons. Each autoregressive step’s prediction error feeds into the next. Encoder-decoder architectures that predict full output sequences directly avoid this accumulation.
Calibration uncertainty: A 2025 paper specifically examining TSFM calibration found this to be an open problem across the field.10 TimesFM 2.5’s dedicated quantile head is an improvement, but practitioners should validate coverage on their specific distribution before relying on uncertainty estimates for decision-making.
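Validating coverage is cheap once you hold out actuals: count how often they fall inside the predicted band and compare against the nominal level. A sketch (band construction from the model's quantile output is assumed; adjust to your quantile ordering):

```python
import numpy as np

def empirical_coverage(actuals: np.ndarray, lower, upper) -> float:
    """Fraction of held-out actuals inside the [lower, upper] band.
    For a 10th-90th percentile band, well-calibrated means ~0.8."""
    return float(np.mean((actuals >= lower) & (actuals <= upper)))

# Sanity check on synthetic data: standard-normal actuals against their
# own 10th/90th percentiles should give ~80% coverage.
rng = np.random.default_rng(0)
actuals = rng.normal(size=10_000)
lo, hi = np.quantile(actuals, [0.1, 0.9])
print(round(empirical_coverage(actuals, lo, hi), 2))  # ≈ 0.8
```

A coverage number materially below nominal on your own held-out data is the signal to widen bands or recalibrate before using the intervals for decisions.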
Domain transfer gaps: TimesFM was pretrained primarily on consumer web (Wikipedia, Google Trends), electricity, and weather data. Financial time series with volatility clustering, network traffic with heavy tails, and industrial sensor data with abrupt structural breaks can exhibit statistical characteristics that fall outside the pretraining distribution. Specialized econometric models (GARCH, ECM) still outperform TimesFM on realized volatility forecasting.
Structural breaks: After a regime change — a market dislocation, supply chain shock, or product discontinuation — TimesFM’s pretraining priors can produce forecasts anchored to the wrong regime. The model has no mechanism for detecting and discounting pre-break history.
The Broader Trajectory
TimesFM represents the clearest implementation of the foundation model thesis applied to time series. The results are substantive: a 200M-parameter model trained once on web-scale data achieves zero-shot accuracy competitive with supervised models on standard benchmarks. Integrated into BigQuery and AlloyDB, it lowers the barrier to production forecasting substantially for teams already on Google Cloud.
What it does not represent is the final word on time-series modeling. The GIFT-Eval leaderboard has seen Amazon’s Chronos-2, Salesforce’s Moirai, and IBM’s TTM each demonstrate advantages on specific dataset characteristics. The field has moved from asking whether foundation models can forecast time series to asking which architectural choices (decoder vs. encoder-decoder, patch vs. lag features, univariate vs. multivariate) produce the best performance across the broadest range of domains.
For practitioners, the practical question is simpler: TimesFM 2.5 is free, Apache 2.0 licensed, requires no target-domain training data, and has first-class SQL integration for Google Cloud users. As a default starting point for forecasting tasks, that combination is difficult to dismiss — even when domain-specialized alternatives remain competitive on specific datasets.
Frequently Asked Questions
Q: Does TimesFM require fine-tuning on my data? A: No. TimesFM operates zero-shot — you provide historical series and a horizon length, and the model generates forecasts without any dataset-specific training. Fine-tuning is supported for higher accuracy on specific domains but not required.
Q: How does TimesFM handle multivariate time series? A: TimesFM’s core architecture is univariate: each series is forecast independently. TimesFM 2.5 added XReg covariate support, which applies a linear correction using external regressors, but does not model cross-series dependencies natively. For full multivariate modeling, Moirai or Chronos-2 are architecturally better suited.
Q: What horizon lengths does TimesFM support? A: TimesFM 2.5 supports up to 16,384 time-points of historical context and can forecast up to 1,000 horizon steps using the continuous quantile head. In practice, accuracy degrades on very long horizons due to autoregressive error accumulation inherent to the decoder-only architecture.
Q: Is TimesFM available without Python infrastructure?
A: Yes. Google Cloud’s BigQuery ML exposes AI.FORECAST powered by TimesFM 2.5, enabling zero-shot forecasting directly from SQL queries with no ML infrastructure setup required.
Q: How does TimesFM compare to ARIMA for short series? A: On short or highly irregular series, classical methods like seasonal ARIMA can match TimesFM — the DARTS benchmark showed seasonal ARIMA remaining competitive. For large-scale forecasting where training and tuning thousands of individual ARIMA models is impractical, TimesFM’s single-model, zero-shot approach provides a practical advantage.
Footnotes
1. Das, A., Kong, W., Sen, R., Zhou, Y. “A decoder-only foundation model for time-series forecasting.” ICML 2024. arXiv:2310.10688.
2. MarkTechPost. “Google AI Ships TimesFM-2.5: Smaller, Longer Context Foundation Model That Now Leads GIFT-Eval Zero-Shot Forecasting.” September 16, 2025.
3. Google Research. GitHub: google-research/timesfm, v1.2.x releases. https://github.com/google-research/timesfm
4. LOTSA: Large-scale Open Time Series Archive. Detailed data composition described in the TimesFM 2.0 release notes and the GIFT-Eval paper (arXiv:2410.10393).
5. Google Research. TimesFM 2.5 release documentation. Hugging Face: google/timesfm-2.5-200m-pytorch.
6. ICML 2024 poster: “A decoder-only foundation model for time-series forecasting.” https://icml.cc/virtual/2024/poster/33288
7. GIFT-Eval benchmark: Aksu, T. et al. arXiv:2410.10393. Chronos-2 results: Amazon Research, late 2025.
8. Google Cloud Blog. “TimesFM models in BigQuery and AlloyDB.” cloud.google.com/blog/products/data-analytics/timesfm-models-in-bigquery-and-alloydb
9. Uber timesfm-fork: https://github.com/uber/timesfm-fork
10. Koochali, A. et al. “Beyond Accuracy: Are Time Series Foundation Models Well-Calibrated?” arXiv:2510.16060. October 2025.