TimesFM (Time Series Foundation Model) is Google’s answer to a question practitioners have asked for years: can a single pretrained model forecast any time series accurately, the way large language models handle any text task? As of late 2025, the answer is a qualified yes — with important caveats about what “accurate” means across different domains.
What Is TimesFM?
TimesFM is a decoder-only transformer model trained by Google Research on approximately 100 billion real-world time-points, designed to produce accurate time-series forecasts zero-shot — with no fine-tuning on the target dataset. A user provides historical values, specifies a horizon length, and the model outputs point forecasts and quantile uncertainty estimates.
The model was first published as an arXiv preprint in October 2023, accepted at ICML 2024, and open-sourced under Apache 2.0 in mid-2024.[1] Three versions have shipped since:
| Version | Parameters | Max Context | Release |
|---|---|---|---|
| TimesFM 1.0 | 200M | 512 time-points | Open-sourced May 2024 |
| TimesFM 2.0 | 500M | 2,048 time-points | December 31, 2024 |
| TimesFM 2.5 | 200M | 16,384 time-points | September 15, 2025 |
Version 2.5 marks the most notable shift in that trajectory: the team cut the parameter count by 60% relative to 2.0 (500M to 200M) while expanding the context window 8x, reaching #1 on the GIFT-Eval zero-shot benchmark at release, before Amazon’s Chronos-2 and Salesforce’s Moirai 2.0 subsequently surpassed it.[2]
How TimesFM Works
Patch-Based Tokenization
Rather than treating each data point as a token (which would make sequences unworkably long), TimesFM groups contiguous time-points into patches — directly mirroring how Vision Transformers tile images.
- Input patch: 32 time-points → one embedding token (via residual MLP)
- Output patch: 128 time-points generated per decoding step
The asymmetry is a deliberate efficiency choice: the model generates 128 future values in a single autoregressive step, substantially reducing the steps needed for long horizons. An output horizon of 512 time-points requires only 4 decoding steps.
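The patch arithmetic above can be sketched in a few lines of Python. This is an illustration only, not TimesFM’s implementation: the helper names `patchify` and `decoding_steps` are hypothetical, and left-padding short series with a padding mask is an assumption made for the example.

```python
import math

INPUT_PATCH_LEN = 32    # time-points per input token (from the text above)
OUTPUT_PATCH_LEN = 128  # time-points emitted per decoding step


def patchify(series, patch_len=INPUT_PATCH_LEN, pad_value=0.0):
    """Left-pad a series to a multiple of patch_len and split it into patches.

    Returns (patches, masks); a True entry in masks marks a padded position.
    Left-padding is an assumption made here for illustration.
    """
    pad = (-len(series)) % patch_len
    padded = [pad_value] * pad + list(series)
    mask = [True] * pad + [False] * len(series)
    patches = [padded[i:i + patch_len] for i in range(0, len(padded), patch_len)]
    masks = [mask[i:i + patch_len] for i in range(0, len(mask), patch_len)]
    return patches, masks


def decoding_steps(horizon, output_patch_len=OUTPUT_PATCH_LEN):
    """Number of autoregressive steps needed to cover the horizon."""
    return math.ceil(horizon / output_patch_len)


patches, masks = patchify(range(100))
print(len(patches), "input tokens;", decoding_steps(512), "decoding steps for horizon 512")
```

Because the output patch is four times longer than the input patch, the number of decoding steps grows slowly with the horizon, which is the efficiency argument the paragraph makes.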
Decoder-Only Architecture
The GPT-style design choice is central to understanding both TimesFM’s strengths and limitations. Causal (unidirectional) self-attention means each position attends only to prior positions, enabling autoregressive generation. The backbone for TimesFM 2.0 uses 50 transformer layers at 1,280-dimensional model width; version 2.5 achieves comparable performance at a smaller footprint through architectural efficiency gains including QKV matrix fusion. (Google Research. GitHub: google-research/timesfm, v1.2.x releases)
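To make the causal constraint concrete, the following minimal sketch computes attention weights where position i can only see positions 0 through i. It uses plain Python and a single head with no learned projections; the function name is hypothetical and the code illustrates the masking idea rather than TimesFM’s actual layers.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over scores[i][j], restricted to j <= i.

    scores is a square list of lists of raw attention logits.
    Future positions (j > i) receive exactly zero weight.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]          # only current and prior positions
        peak = max(visible)                   # subtract the max for stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


# With uniform logits, each patch token spreads its attention evenly over its past.
for row in causal_attention_weights([[0.0] * 4 for _ in range(4)]):
    print(row)
```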
Pretraining Data
The original 100-billion time-point corpus was assembled from:
- Wikipedia Pageviews (2012–2023): Dominant source, covering daily/weekly/monthly aggregations
- Google Trends: 22,000 search interest time series at hourly to weekly granularities
- Public datasets: M4, electricity, and traffic benchmarks
- Synthetic data: 3 million ARMA-generated series (~50% of training mix)
TimesFM 2.0 extended this with the LOTSA archive (cloud infrastructure traces, solar/wind, climate reanalysis). Version 2.5 additionally incorporated the GiftEvalPretrain dataset from Salesforce.[4]
Masking Strategy
Training uses a masking regime where both individual patches and leading patches can be masked during a batch. This teaches the model to handle variable-length historical series gracefully — a practical necessity since real-world datasets rarely provide uniform history lengths.
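One way to picture the leading-patch masking is as a random number of hidden points at the start of each training context, so that across batches the model sees many effective history lengths. The sketch below assumes the mask length is drawn uniformly within the first input patch; the function name and that sampling choice are illustrative assumptions, not the published training code.

```python
import random


def random_leading_mask(context_len, patch_len=32, rng=random):
    """Return a boolean mask hiding a random number of leading time-points.

    True means "masked". The number of hidden points r is drawn from
    0..patch_len-1 (an assumption for this sketch), so the visible history
    length varies from one training example to the next.
    """
    r = rng.randrange(patch_len)
    return [i < r for i in range(context_len)]


mask = random_leading_mask(128, rng=random.Random(0))
print("visible history length:", mask.count(False))
```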
Outputs
TimesFM produces:
- Point forecasts: Primary output, the model’s median prediction
- Quantile forecasts: uncertainty bands at the nine deciles (10th through 90th percentiles)
The quantile heads were explicitly marked experimental in versions 1.0 and 2.0. TimesFM 2.5 introduced a separate 30M-parameter quantile head intended to produce better-calibrated continuous probabilistic outputs up to approximately 1,000 horizon steps.[5]
Benchmarks and Performance
Zero-Shot Results (ICML 2024)
On the Monash Forecasting Archive — 23+ datasets spanning diverse domains — TimesFM ranked in the top three models evaluated zero-shot, outperforming ARIMA, ETS, DeepAR, and llmtime (GPT-3.5) despite llmtime being “orders of magnitude larger.” (ICML 2024 poster: “A decoder-only foundation model for time-series forecasting.”)
On ETT (Electricity Transformer Temperature) long-horizon benchmarks, TimesFM’s zero-shot MAE matched supervised PatchTST — a model trained directly on target data. That result is the clearest demonstration of the pretrained-model thesis: comparable accuracy without dataset-specific training.
GIFT-Eval (2025)
The GIFT-Eval benchmark — 97 tasks spanning short, medium, and long horizons across diverse domains — has become the primary competitive arena for time-series foundation models. TimesFM 2.0 reached #1 at release in early 2025, posting 6% better aggregated MASE than the next-best model. TimesFM 2.5 retook the top spot in September 2025, notably while using only 200M parameters versus the 500M of its predecessor.
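For readers unfamiliar with the metric, MASE (mean absolute scaled error) divides the forecast’s mean absolute error by the in-sample error of a seasonal naive forecast, so values below 1 beat that baseline. The following is a minimal sketch of the standard definition; the function name is hypothetical, and how GIFT-Eval aggregates scores across its tasks is not shown.

```python
def mase(actual, forecast, history, season=1):
    """Mean absolute scaled error.

    actual, forecast: values over the forecast horizon.
    history: in-sample values used to compute the seasonal naive scale.
    season: seasonal period (1 gives the plain naive forecast).
    """
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive_errors = [
        abs(history[t] - history[t - season]) for t in range(season, len(history))
    ]
    scale = sum(naive_errors) / len(naive_errors)
    if scale == 0:
        raise ValueError("history is constant at this seasonality; MASE is undefined")
    return mae / scale


print(mase(actual=[6, 7], forecast=[6.5, 6.5], history=[1, 2, 3, 4, 5]))
```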
Amazon’s Chronos-2 and Salesforce’s Moirai 2.0 subsequently displaced TimesFM 2.5 on GIFT-Eval. Chronos-2 ranks first among pretrained models overall, with a win rate exceeding 90% in head-to-head comparisons; Moirai 2.0 holds the top MASE score among non-test-data-leaking models.[7][12] The leaderboard race reflects a broader pattern: foundation model rankings are shifting rapidly and no single model leads across all dataset types and granularities.
How TimesFM Compares to Alternatives
| Model | Org | Architecture | Params | Multivariate | Zero-Shot |
|---|---|---|---|---|---|
| TimesFM 2.5 | Google | Decoder-only, patch tokens | 200M | Via XReg only | Yes |
| Chronos-2 | Amazon | Encoder-only, group attention | 120M | Yes | Yes |
| Moirai 2.0 | Salesforce | Any-variate decoder-only | Various | Yes (native) | Yes |
| Lag-LLaMA | Academic | Decoder-only, lag features | 45M | No | Yes |
| TTM | IBM | MLP-Mixer hybrid | 1M–48M | Yes | Yes |
| PatchTST | Academic | Encoder-only, patch tokens | Various | Channel-independent | No |
| N-HiTS | Academic | Hierarchical interpolation | Small | No | No |
Key distinctions:
Chronos-2 dropped the discrete-tokenization approach of the original Chronos in favor of an encoder-only transformer with a group attention mechanism: a dual-attention design that alternates between temporal self-attention within a series and cross-series attention across a group. This makes multivariate and covariate-informed tasks native to the architecture rather than bolted on. At 120M parameters, it outperforms Chronos-Bolt (the distilled Chronos 1.x variant) by a substantial margin on both GIFT-Eval and fev-bench.[7]
Moirai 2.0 handles any-variate time series natively, modeling cross-series dependencies that TimesFM fundamentally cannot capture in its standard configuration. Salesforce released Moirai 2.0 in August 2025, switching from the original masked encoder to a decoder-only design and dropping parameter count roughly 30x relative to Moirai 1.0-Large while posting better benchmark results. For demand forecasting where correlated product families matter, Moirai retains its architectural advantage over TimesFM despite the internal redesign.[12]
IBM TTM at 1M–48M parameters reportedly outperforms TimesFM by 19% on certain benchmarks, demonstrating that compact, domain-specialized architectures remain competitive against larger foundation models.
Quick Start
TimesFM is available on GitHub and Hugging Face, Apache 2.0 licensed.
```shell
pip install timesfm
```

```python
import numpy as np
import timesfm

# Load TimesFM 2.5 (200M parameters)
model = timesfm.TimesFM_2p5_200M_torch.from_pretrained(
    "google/timesfm-2.5-200m-pytorch",
    torch_compile=True,
)

# Configure forecast behavior
model.compile(
    timesfm.ForecastConfig(
        max_context=1024,
        max_horizon=256,
        normalize_inputs=True,
        use_continuous_quantile_head=True,
        fix_quantile_crossing=True,
    )
)

# Forecast: accepts variable-length input series
point_forecast, quantile_forecast = model.forecast(
    horizon=12,
    inputs=[
        np.linspace(0, 1, 100),          # 100 historical points
        np.sin(np.linspace(0, 20, 67)),  # 67 historical points
    ],
)
# point_forecast.shape    → (2, 12)
# quantile_forecast.shape → (2, 12, 10): mean, then 10th to 90th percentiles
```

TimesFM 2.5 accepts series of different lengths in the same batch, a practical improvement over models requiring fixed-length inputs.
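Once a forecast is returned, the quantile array usually needs unpacking before it is useful. The sketch below assumes the last axis of size 10 holds the mean followed by the 10th through 90th percentiles, which should be confirmed against the installed version of the library; the helper names `split_quantiles` and `central_interval` are hypothetical, and a plain list stands in for one horizon step of `quantile_forecast`.

```python
def split_quantiles(row):
    """Split one horizon step's 10 values into (mean, {level: value}).

    Assumes the layout [mean, q10, q20, ..., q90]; verify this against
    the library version in use before relying on it.
    """
    if len(row) != 10:
        raise ValueError("expected the mean followed by nine deciles")
    mean, deciles = row[0], row[1:]
    levels = [k / 10 for k in range(1, 10)]
    return mean, dict(zip(levels, deciles))


def central_interval(row, lower=0.1, upper=0.9):
    """Return the (lower, upper) quantile values, by default the 80% interval."""
    _, quantiles = split_quantiles(row)
    return quantiles[lower], quantiles[upper]


# Stand-in for quantile_forecast[series_index][step] converted to a list.
step = [5.2, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
low, high = central_interval(step)
print("80% interval:", low, "to", high)
```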
Enterprise Integration: BigQuery and AlloyDB
The strongest signal of production adoption is Google’s integration of TimesFM directly into its cloud data services. AI.FORECAST and AI.EVALUATE reached general availability in BigQuery ML in November 2025; AI.DETECT_ANOMALIES remains in public preview. All three functions support TimesFM 2.5 with dynamic context windows up to 15,000 time-points.[8]

```sql
-- Zero-shot time series forecast directly in SQL
SELECT *
FROM AI.FORECAST(
  MODEL `project.dataset.timesfm_model`,
  TABLE `project.dataset.sales_data`,
  STRUCT(30 AS horizon, 0.8 AS confidence_level)
);
```

AlloyDB (Google’s PostgreSQL-compatible database) has added preview AI.FORECAST support, enabling forecasts on operational data without ETL pipelines. In February 2026, the same capability reached Connected Sheets, allowing spreadsheet users to run AI.FORECAST on BigQuery tables directly from Google Sheets without SQL.[11] Taken together, these integrations position TimesFM as infrastructure rather than a research artifact, which is directly relevant for data engineering teams that want forecasting without exporting data to a separate ML platform.
Uber maintains a public fork of the TimesFM repository on GitHub, indicating active evaluation for internal forecasting use cases at scale. (Uber timesfm-fork)
Where TimesFM Falls Short
Univariate-only core: TimesFM 1.0 and 2.0 forecast each series independently — no cross-series dependencies. Version 2.5 partially addresses this via XReg (external regressors), which applies a linear ridge regression correction using covariates on top of the model’s base forecast. This helps with known external signals (promotions, weather) but does not model correlated product demand the way Moirai 2.0 or Chronos-2 do natively.
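The idea of a linear ridge correction layered on a base forecast can be sketched with a single covariate. This is a simplified illustration under stated assumptions, not the XReg implementation: the function names are hypothetical, the ridge fit uses the closed form for one centered regressor, and the residuals are assumed to come from comparing history with an in-sample base forecast.

```python
def fit_ridge_1d(x, residuals, alpha=1.0):
    """Fit residual ~ slope * x + intercept with an L2 penalty on the slope.

    Closed form for one centered covariate:
        slope = sum(xc * rc) / (sum(xc * xc) + alpha)
    """
    x_mean = sum(x) / len(x)
    r_mean = sum(residuals) / len(residuals)
    xc = [v - x_mean for v in x]
    rc = [v - r_mean for v in residuals]
    slope = sum(a * b for a, b in zip(xc, rc)) / (sum(a * a for a in xc) + alpha)
    intercept = r_mean - slope * x_mean
    return slope, intercept


def corrected_forecast(base_forecast, future_x, slope, intercept):
    """Add the covariate-driven correction to the model's base forecast."""
    return [b + slope * x + intercept for b, x in zip(base_forecast, future_x)]


# Promotion flag (1 = promo day) and the residuals the base model left behind.
promo = [0, 1, 0, 1, 0, 1]
residuals = [0.0, 10.0, 0.0, 10.0, 0.0, 10.0]
slope, intercept = fit_ridge_1d(promo, residuals, alpha=1.0)
print(corrected_forecast([100.0, 100.0], future_x=[1, 0], slope=slope, intercept=intercept))
```

The correction is linear in each covariate and is fit per series, which is why it captures known external signals but not interactions between related series.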
Long-horizon error accumulation: The decoder-only causal architecture compounds errors on long horizons. Each autoregressive step’s prediction error feeds into the next. Encoder-decoder architectures that predict full output sequences directly avoid this accumulation.
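The compounding effect can be demonstrated with a toy rollout loop. In the sketch below, `biased_predictor` is a hypothetical stand-in for a model that overshoots by 1% on every decoding step; it is not TimesFM, and the true series is assumed to be constant at 100 so that the error is easy to read off.

```python
def rollout(context, horizon, predict_patch, output_patch_len=128):
    """Autoregressive decoding: each predicted patch is appended to the context."""
    context = list(context)
    forecast = []
    while len(forecast) < horizon:
        patch = predict_patch(context, output_patch_len)
        forecast.extend(patch)
        context.extend(patch)  # predictions, errors included, become inputs
    return forecast[:horizon]


def biased_predictor(context, patch_len, bias=0.01):
    """Hypothetical stand-in model: repeats the last value, overshooting by 1%."""
    return [context[-1] * (1 + bias)] * patch_len


forecast = rollout([100.0] * 32, horizon=512, predict_patch=biased_predictor)
for step_start in (0, 128, 256, 384):
    print(step_start, round(abs(forecast[step_start] - 100.0), 4))
```

Each decoding step inherits the previous step’s overshoot and adds its own, so the absolute error grows with every patch rather than staying constant.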
Calibration uncertainty: A 2025 paper specifically examining TSFM calibration found this to be an open problem across the field.[10] TimesFM 2.5’s dedicated quantile head is an improvement, but practitioners should validate coverage on their specific distribution before relying on uncertainty estimates for decision-making.
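Validating coverage is straightforward on a holdout set: count how often the actual value falls inside the predicted interval and compare that share with the nominal level. The sketch below is a minimal version of that check; the function name is hypothetical and the numbers are made up for illustration.

```python
def empirical_coverage(actuals, lowers, uppers):
    """Share of actual values that fall inside [lower, upper]."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lowers, uppers))
    return hits / len(actuals)


# Illustrative holdout values against a nominal 80% interval (q10 to q90).
actuals = [1.0, 2.0, 3.0, 4.0, 5.0]
q10 = [0.0, 0.0, 0.0, 0.0, 0.0]
q90 = [2.0, 2.0, 2.0, 2.0, 2.0]
coverage = empirical_coverage(actuals, q10, q90)
print("empirical coverage:", coverage, "nominal: 0.8")
```

Coverage well below the nominal level suggests intervals that are too narrow for the data at hand; coverage well above it suggests intervals that are wider than necessary.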
Domain transfer gaps: TimesFM was pretrained primarily on consumer web (Wikipedia, Google Trends), electricity, and weather data. Financial time series with volatility clustering, network traffic with heavy tails, and industrial sensor data with abrupt structural breaks can exhibit statistical characteristics that fall outside the pretraining distribution. Specialized econometric models (GARCH, ECM) still outperform TimesFM on realized volatility forecasting.
Structural breaks: After a regime change — a market dislocation, supply chain shock, or product discontinuation — TimesFM’s pretraining priors can produce forecasts anchored to the wrong regime. The model has no mechanism for detecting and discounting pre-break history.
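Since the model does not discount pre-break history on its own, one option is to trim the context before forecasting. The sketch below uses a crude mean-shift heuristic to find the most likely break and keep only the data after it. It is a hypothetical preprocessing step, not part of TimesFM; the function names and the threshold are assumptions, and a proper change-point method would be preferable in practice.

```python
def largest_mean_shift(series, min_segment=8):
    """Return (index, gap) for the split maximizing |mean(after) - mean(before)|.

    Returns (None, 0.0) when no split produces a non-zero gap.
    """
    best_idx, best_gap = None, 0.0
    for i in range(min_segment, len(series) - min_segment + 1):
        before, after = series[:i], series[i:]
        gap = abs(sum(after) / len(after) - sum(before) / len(before))
        if gap > best_gap:
            best_idx, best_gap = i, gap
    return best_idx, best_gap


def trim_to_recent_regime(series, threshold, min_segment=8):
    """Drop history before the largest mean shift if it exceeds the threshold."""
    idx, gap = largest_mean_shift(series, min_segment)
    if idx is not None and gap >= threshold:
        return series[idx:]
    return series


history = [10.0] * 20 + [50.0] * 20  # level shift halfway through
context = trim_to_recent_regime(history, threshold=20.0)
print(len(context), "points kept for forecasting")
```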
The Broader Trajectory
TimesFM represents the clearest implementation of the foundation model thesis applied to time series. The results are substantive: a 200M-parameter model trained once on web-scale data achieves zero-shot accuracy competitive with supervised models on standard benchmarks. With AI.FORECAST generally available in BigQuery and in preview in AlloyDB, it lowers the barrier to production forecasting substantially for teams already on Google Cloud.
What it does not represent is the final word on time-series modeling. The GIFT-Eval leaderboard has seen Amazon’s Chronos-2, Salesforce’s Moirai 2.0, and IBM’s TTM each demonstrate advantages on specific dataset characteristics. Notably, both Chronos-2 and Moirai 2.0 handle multivariate inputs natively — a capability TimesFM approximates but does not match architecturally. The field has moved from asking whether foundation models can forecast time series to asking which architectural choices (decoder vs. encoder-only, patch vs. group attention, univariate vs. multivariate) produce the best performance across the broadest range of domains.
For practitioners, the practical question is simpler: TimesFM 2.5 is free, Apache 2.0 licensed, requires no target-domain training data, and has first-class SQL integration for Google Cloud users. As a default starting point for forecasting tasks, that combination is difficult to dismiss — even when domain-specialized alternatives remain competitive on specific datasets.
Frequently Asked Questions
Q: Does TimesFM require fine-tuning on my data?
A: No. TimesFM operates zero-shot: you provide historical series and a horizon length, and the model generates forecasts without any dataset-specific training. Fine-tuning is supported for higher accuracy on specific domains but is not required.
Q: How does TimesFM handle multivariate time series?
A: TimesFM’s core architecture is univariate: each series is forecast independently. TimesFM 2.5 added XReg covariate support, which applies a linear correction using external regressors, but does not model cross-series dependencies natively. For full multivariate modeling, Moirai 2.0 or Chronos-2 are architecturally better suited.
Q: What horizon lengths does TimesFM support?
A: TimesFM 2.5 supports up to 16,384 time-points of historical context and can forecast up to 1,000 horizon steps using the continuous quantile head. In practice, accuracy degrades on very long horizons due to autoregressive error accumulation inherent to the decoder-only architecture.
Q: Is TimesFM available without Python infrastructure?
A: Yes. Google Cloud’s BigQuery ML exposes AI.FORECAST (now GA) powered by TimesFM 2.5, enabling zero-shot forecasting directly from SQL queries with no ML infrastructure setup required.
Q: How does TimesFM compare to ARIMA for short series?
A: On short or highly irregular series, classical methods like seasonal ARIMA can match TimesFM; the DARTS benchmark showed seasonal ARIMA remaining competitive. For large-scale forecasting where training and tuning thousands of individual ARIMA models is impractical, TimesFM’s single-model, zero-shot approach provides a practical advantage.
Sources:
- Google Research. GitHub: google-research/timesfm
- GIFT-Eval Leaderboard — Salesforce/GIFT-Eval on Hugging Face
- Introducing Chronos-2: From univariate to universal forecasting — Amazon Science
- Chronos-2: From Univariate to Universal Forecasting — arXiv.15821
- Moirai 2.0: When Less Is More for Time Series Forecasting — arXiv.11698
- TimesFM models in BigQuery and AlloyDB — Google Cloud Blog
- The TimesFM model — BigQuery Google Cloud Documentation
- TimesFM Release — Hugging Face collection