What Is Manual Backtesting (and What It Is Not)

There are two types of backtesting, and Notion is only useful for one of them:

Manual / Discretionary Backtesting (Notion can do this)

You scroll through historical charts, identify every point where your setup criteria were met, and record what the outcome would have been. Notion stores each observation and calculates statistics across hundreds of trades. This is what discretionary traders — price action, order flow, pattern-based — actually need.

Algorithmic Backtesting (Notion cannot do this)

Code runs against tick-by-tick or bar-by-bar historical price data, executing orders automatically based on programmatic rules. Requires TradingView Pine Script, MetaTrader Strategy Tester, or Python with a price data library. No manual chart analysis involved.

Most retail traders — especially those using price action, order flow, or pattern setups — need the manual approach. The question is not whether you can replicate algo-level testing in Notion. The question is whether you can build a statistically meaningful sample of your setup's historical performance, and the answer is yes.

A 150-trade backtest in Notion will tell you: what win rate to expect, what average R:R you actually captured (not what you planned), whether the setup works differently across sessions or market structures, and whether expectancy is positive enough to trade live. That is exactly what you need before putting real capital at risk.

Building the Backtest Database

Create a new Notion database called Backtest Log. Keep it separate from your live Trade Log — backtest data has different fields and you will want to filter and group it independently.

Here are all the properties to add:

Property Name | Type | Purpose
Trade # | Number | Sequential observation number. Makes it easy to track sample size at a glance.
Date | Date | Historical date of the setup (not the date you logged it).
Instrument | Select | The market/pair. Options: EURUSD, GBPUSD, NQ, ES, BTCUSD, etc.
Timeframe | Select | M1, M5, M15, H1, H4, D1. The timeframe you identified the setup on.
Setup Name | Select | Your named setups: "OB Pullback", "BOS Retest", "VWAP Fade", "Range Breakout", etc.
Market Structure | Select | Trending Up, Trending Down, Ranging, Choppy. Critical for identifying when setups work and when they don't.
Session | Select | Asian, London, New York, London/NY Overlap. Many setups are session-dependent.
Direction | Select | Long or Short. Useful for checking if your setup has directional bias.
Entry Price | Number | Where you would have entered the trade.
Stop Price | Number | Where the stop loss would have been placed.
Target Price | Number | The primary take-profit level.
Exit Price | Number | Where the trade actually would have ended (stop hit, target hit, or manual exit).
Planned R:R | Formula | abs(prop("Target Price") - prop("Entry Price")) / abs(prop("Entry Price") - prop("Stop Price"))
Outcome | Select | Winner, Loser, Breakeven, Partial. Set manually after determining the historical result.
Actual R:R | Number | The actual R multiple captured: +2.0 means a 2R winner, -1.0 means a full stop-out. Enter manually.
Is Win | Formula | prop("Outcome") == "Winner". Powers all win rate rollups.
Gross Profit R | Formula | if(prop("Actual R:R") > 0, prop("Actual R:R"), 0). Used for the profit factor calculation.
Gross Loss R | Formula | if(prop("Actual R:R") < 0, abs(prop("Actual R:R")), 0). Used for the profit factor calculation.
Screenshot | Files & media | Chart screenshot at the time of entry. Essential for qualitative review.
Notes | Text | Why this was or was not a valid setup. What market context was present.
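Before you log 100+ rows, it can help to sanity-check the four formula properties on one observation. The Python sketch below mirrors Planned R:R, Is Win, Gross Profit R and Gross Loss R. `BacktestRow` and its field names are hypothetical stand-ins for the Notion properties. They are not a Notion export format.

```python
from dataclasses import dataclass


@dataclass
class BacktestRow:
    """One Backtest Log observation (hypothetical field names, not a Notion API)."""
    setup_name: str
    entry_price: float
    stop_price: float
    target_price: float
    outcome: str      # "Winner", "Loser", "Breakeven" or "Partial"
    actual_r: float   # R multiple captured, e.g. +2.0 or -1.0

    @property
    def planned_rr(self) -> float:
        # Mirrors: abs(Target - Entry) / abs(Entry - Stop); entry == stop is an invalid row
        return abs(self.target_price - self.entry_price) / abs(self.entry_price - self.stop_price)

    @property
    def is_win(self) -> bool:
        # Mirrors: prop("Outcome") == "Winner"
        return self.outcome == "Winner"

    @property
    def gross_profit_r(self) -> float:
        # Positive R only, otherwise 0
        return self.actual_r if self.actual_r > 0 else 0.0

    @property
    def gross_loss_r(self) -> float:
        # Absolute value of negative R only, otherwise 0
        return abs(self.actual_r) if self.actual_r < 0 else 0.0


# A 2R plan that only captured 1.6R: planned and actual R:R diverge
row = BacktestRow("OB Pullback", entry_price=1.0850, stop_price=1.0830,
                  target_price=1.0890, outcome="Winner", actual_r=1.6)
print(round(row.planned_rr, 2), row.is_win, row.gross_profit_r, row.gross_loss_r)
```

The example row shows why the table keeps Planned R:R and Actual R:R as separate properties. The trade was planned at 2R, but it is recorded as 1.6R.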

Once the properties are set up, create a second database called Rollup Summary with one row per setup name. Link it to the Backtest Log with a Relation property, then add Rollup properties that calculate aggregate statistics per setup. This is where you will see "OB Pullback: 62% win rate, 1.8R average, PF 2.1" vs "Range Breakout: 44% win rate, 1.2R average, PF 0.92, no edge."
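The Rollup Summary works like a group-by over Setup Name. The minimal Python sketch below shows the aggregation its rollups perform. The `summarize` helper and the sample rows are hypothetical. Here "avg R per trade" means the plain average of Actual R:R, which may differ from how you define "average" in your own rollups.

```python
from collections import defaultdict

# Hypothetical observations: (setup name, outcome, actual R multiple)
trades = [
    ("OB Pullback", "Winner", 2.0),
    ("OB Pullback", "Loser", -1.0),
    ("OB Pullback", "Winner", 1.5),
    ("Range Breakout", "Loser", -1.0),
    ("Range Breakout", "Winner", 1.2),
    ("Range Breakout", "Loser", -1.0),
]


def summarize(rows):
    """Illustrative helper: {setup: (total trades, win rate %, avg R per trade, profit factor)}."""
    groups = defaultdict(list)
    for setup, outcome, r in rows:
        groups[setup].append((outcome, r))

    summary = {}
    for setup, obs in groups.items():
        total = len(obs)
        wins = sum(1 for outcome, _ in obs if outcome == "Winner")
        gross_profit = sum(r for _, r in obs if r > 0)
        gross_loss = sum(-r for _, r in obs if r < 0)
        pf = gross_profit / gross_loss if gross_loss else float("inf")
        summary[setup] = (total, 100 * wins / total, sum(r for _, r in obs) / total, pf)
    return summary


for setup, (n, wr, avg_r, pf) in summarize(trades).items():
    print(f"{setup}: n={n}, win rate {wr:.0f}%, {avg_r:+.2f}R/trade, PF {pf:.2f}")
```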

What to Log for Each Trade Observation

The value of a backtest is entirely determined by what you log. These are the non-obvious fields that most traders skip — and regret later:

Market Structure: This is the single most predictive field in a backtest. Most setups have wildly different performance in trending vs ranging markets. If you don't tag this, you will end up with averaged statistics that hide the fact your setup has a 70% win rate in trends and 35% in ranges — and you will trade it in ranges wondering why it stopped working.

Session: The London session and New York session have different volatility, spread, and liquidity profiles. A setup that triggers cleanly on London open may fail completely during the Asian session. Filter your statistics by session and you may discover your edge is much more concentrated than you thought.

Actual R:R (not just Winner/Loser): Most traders log "Win" or "Loss" but not the actual multiple captured. If you always target 3R but average only 1.8R because price never quite reaches your target, your expectancy calculation based on planned R:R will be significantly overstated. Log the actual exit multiple every time.

Screenshots: Attach a chart screenshot to every observation. Six months from now, when you are reviewing why your win rate dropped, screenshots let you see what the market actually looked like — not just what your notes say. They are the most time-consuming part but the most valuable for qualitative review.

Calculating Your Edge Statistics

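Build a Performance Summary page for each setup. The setup's row in the Rollup Summary database works well for this, because every database row opens as a page. Add a linked view of your Backtest Log filtered by that setup name, then add these rollup and formula properties: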

Win Rate

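Create two rollups through the relation. Winning Trades rolls up Is Win and counts the records where it is true. Total Trades counts all related records. Then divide one by the other in a formula property: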

toNumber(prop("Winning Trades")) / toNumber(prop("Total Trades")) * 100

Target: above your breakeven win rate at your average R:R. At 1:2 R:R, you need >33% to be profitable.
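The same two numbers can be computed in Python. The minimal sketch below calculates the win rate from Winning Trades and Total Trades. It also calculates the breakeven win rate for a given reward-to-risk ratio. When every win pays R and every loss costs 1R, that breakeven rate is 1 / (1 + R). The function names are illustrative.

```python
def win_rate_pct(winning_trades: int, total_trades: int) -> float:
    """Same arithmetic as the Notion formula: wins / total * 100."""
    if total_trades == 0:
        raise ValueError("no trades logged yet")
    return winning_trades / total_trades * 100


def breakeven_win_rate_pct(reward_to_risk: float) -> float:
    """Win rate at which expectancy is zero if every win pays
    reward_to_risk R and every loss costs 1R: 1 / (1 + R)."""
    return 100 / (1 + reward_to_risk)


print(f"{win_rate_pct(52, 150):.1f}%")        # 34.7%
print(f"{breakeven_win_rate_pct(2.0):.1f}%")  # 33.3%
```

At 1:2 R:R, 52 wins out of 150 clears breakeven, but only barely.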

Profit Factor

Total Gross Profit R and Total Gross Loss R are Sum rollups of the Gross Profit R and Gross Loss R formulas. Profit factor is the first divided by the second:

toNumber(prop("Total Gross Profit R")) / toNumber(prop("Total Gross Loss R"))

PF below 1.0 = losing strategy. 1.0 to 1.3 = marginal. Above 1.3 = tradeable edge. Above 1.8 = strong edge.
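Below is a minimal Python sketch of the same calculation on a list of Actual R:R values, with the sample data made up. The helper name is illustrative. It also guards the case the Notion formula does not handle, where there are no losing trades yet.

```python
def profit_factor(actual_r_values):
    """Sum of winning R divided by sum of losing R (losses taken as positive numbers)."""
    gross_profit = sum(r for r in actual_r_values if r > 0)
    gross_loss = sum(-r for r in actual_r_values if r < 0)
    if gross_loss == 0:
        # No losers yet: PF is undefined, and the sample is too small to judge
        return float("inf")
    return gross_profit / gross_loss


sample = [2.0, -1.0, 1.5, -1.0, -1.0, 2.2, 0.0, -0.5]
print(round(profit_factor(sample), 2))  # 1.63
```

Returning infinity for a loss-free sample is a deliberate choice in this sketch. In Notion, you can avoid a meaningless result by wrapping the formula in an if() that checks whether Total Gross Loss R is zero.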

Expectancy per Trade (R)

Average amount won or lost per trade in R multiples. Win Rate is a fraction (0.45, not 45), and Avg Loss R is entered as a positive number.

(Win Rate × Avg Win R) - ((1 - Win Rate) × Avg Loss R)

Even at 45% win rate with 2.0R average win and 1.0R average loss: (0.45 × 2.0) - (0.55 × 1.0) = +0.35R per trade. That is a positive edge.
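The worked example above can be reproduced in a few lines of Python. The sketch also cross-checks the formula against a plain average over 20 hypothetical trades. The function name is illustrative.

```python
def expectancy_r(win_rate: float, avg_win_r: float, avg_loss_r: float) -> float:
    """(Win Rate x Avg Win R) - ((1 - Win Rate) x Avg Loss R).
    win_rate is a fraction; avg_loss_r is a positive number."""
    return win_rate * avg_win_r - (1 - win_rate) * avg_loss_r


print(round(expectancy_r(0.45, 2.0, 1.0), 2))  # 0.35

# Cross-check: 20 hypothetical trades, 9 winners at +2R and 11 losers at -1R
actual_r = [2.0] * 9 + [-1.0] * 11
print(round(sum(actual_r) / len(actual_r), 2))  # 0.35
```

The cross-check only holds when every trade is a full winner or a full loser. Once you log Breakeven and Partial outcomes, the formula treats every non-winner as a loss of average size. In that case, averaging the Actual R:R column directly is the simpler and more faithful measure.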

Max Consecutive Losses

Manually scan the backtest log sorted by date to find the longest losing streak. There is no Notion formula for this; it is a manual review.

This tells you how psychologically difficult the strategy will be to trade live. If your backtest contains a 7-loss streak, expect one when trading live too. If you would quit after 3 losses, you need to know that now.
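If you export the Backtest Log (Notion can export a database as CSV), you can automate the streak count. The minimal Python sketch below assumes the outcomes are already sorted by historical Date. It also assumes that any outcome other than "Loser" ends a streak. You may want a different rule for Breakeven trades. The helper name is illustrative.

```python
def max_consecutive_losses(outcomes):
    """Longest run of consecutive "Loser" outcomes, in date order.
    Assumption: Winner, Breakeven and Partial all end the current streak."""
    longest = current = 0
    for outcome in outcomes:
        if outcome == "Loser":
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


history = ["Winner", "Loser", "Loser", "Winner", "Loser", "Loser", "Loser", "Breakeven", "Loser"]
print(max_consecutive_losses(history))  # 3
```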

For even more granular analysis, create filtered views of the backtest log grouped by Market Structure and Session. Add the same rollup calculations to each grouped view. Laid out as a table, the result looks something like this (illustrative numbers):

Filter | Win Rate | Avg R:R | PF | Verdict
All trades | 52% | 1.7R | 1.65 | Trade it
Trending market only | 68% | 2.0R | 2.7 | Strong edge
Ranging market only | 36% | 1.2R | 0.83 | Avoid ranging
London session | 61% | 1.9R | 2.2 | Best session
New York session | 48% | 1.6R | 1.5 | Acceptable
Asian session | 31% | 1.1R | 0.65 | No edge here

Once your data is logged, Notion can produce this table, and it changes how you trade. In this example, you would stop trading the setup in ranging conditions and during the Asian session. Your effective win rate goes from 52% to 65%, and your profit factor improves from 1.65 to 2.4. The backtest data defined your rules more precisely than intuition ever could.
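Outside Notion, the same breakdown is a group-by over one tag at a time. The minimal Python sketch below uses made-up rows. It assumes each observation has been reduced to a dict with the hypothetical keys `structure`, `session` and `r`. It also assumes a trade counts as a win when its actual R is positive.

```python
from collections import defaultdict


def stats(rs):
    """Win rate (%), average winning R and profit factor for a list of actual R values.
    Assumption: r > 0 counts as a win; 0R trades count toward the total only."""
    wins = [r for r in rs if r > 0]
    losses = [-r for r in rs if r < 0]
    win_rate = 100 * len(wins) / len(rs)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    pf = sum(wins) / sum(losses) if losses else float("inf")
    return win_rate, avg_win, pf


def breakdown(rows, key):
    """Group actual R values by one tag (e.g. "structure" or "session")."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["r"])
    return {tag: stats(rs) for tag, rs in groups.items()}


rows = [  # hypothetical observations
    {"structure": "Trending", "session": "London", "r": 2.0},
    {"structure": "Trending", "session": "New York", "r": -1.0},
    {"structure": "Trending", "session": "London", "r": 1.8},
    {"structure": "Ranging", "session": "Asian", "r": -1.0},
    {"structure": "Ranging", "session": "London", "r": 1.2},
    {"structure": "Ranging", "session": "Asian", "r": -1.0},
]

for key in ("structure", "session"):
    for tag, (wr, avg_win, pf) in breakdown(rows, key).items():
        print(f"{key}={tag}: win rate {wr:.0f}%, avg win {avg_win:.1f}R, PF {pf:.2f}")
```

With real data, each group needs enough trades to mean anything. A tag with a handful of observations will show extreme numbers, like the 100% London win rate here.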

The Backtesting Process Step by Step

Step 1: Define the setup rules before you look at any charts

Write down exactly what conditions must be present for a valid trade. Be specific: "Price must have broken a swing high, retraced to the 50% level of the prior leg, and shown a bullish engulfing candle on the 15-minute chart, during the London open or New York open." Vague rules produce inconsistent data.

If you define the rules after looking at charts, you will unconsciously create rules that fit the winners you already see. This is the most common source of backtest bias.

Step 2: Collect at least 100 historical observations

Scroll back through charts on TradingView, MetaTrader, or your broker's platform. Go back at least 12–18 months to capture different market conditions (trending periods, ranging periods, high-volatility events). Log every point where your rules were met — including the ones that would have lost. Cherry-picking only winners defeats the purpose.

Aim for 100 minimum, 200+ for high confidence. For setups that trigger once a day, 100 trades = roughly 5 months of history.
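A win rate measured on n trades is itself uncertain, which is one way to see why 100 is a floor and 200 is better. The rough Python sketch below uses the normal-approximation 95% confidence interval for a proportion. This is a standard statistical approximation, not a method the article prescribes.

```python
import math


def win_rate_margin(win_rate: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error, in percentage points, for a win rate
    measured on n trades (normal approximation to the binomial)."""
    return z * math.sqrt(win_rate * (1 - win_rate) / n) * 100


for n in (30, 100, 200):
    print(n, f"±{win_rate_margin(0.5, n):.1f} pts")
```

At 100 trades, a measured 50% win rate is consistent with anything from roughly 40% to 60%. At 200 trades, the band narrows to about 43% to 57%. At 30 trades it is nearly ±18 points.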

Step 3: Log each observation immediately and completely

In Notion: open a new row in your Backtest Log, fill in all required fields, attach the screenshot. Do not skip fields because they feel unimportant — Market Structure and Session especially. These are the fields that turn a mediocre overall result into a clear, actionable rule.

Log while the chart is in front of you. Logging from memory introduces errors.

Step 4: Review statistics after every 25–30 trades

Check your win rate and profit factor after each block of 25–30 observations. This lets you spot early if the statistics are diverging from expectations, rather than waiting until you have 100 trades and discovering the strategy never had an edge. It also helps you calibrate. If, 30 trades in, your win rate is running at 70% when you expected 50%, you may need to widen your definition of what counts as a valid setup.
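Here is a minimal Python sketch of the block review. It computes cumulative win rate and profit factor after each complete block of 25 trades, using everything logged so far. Choosing cumulative rather than per-block statistics is an assumption of this sketch, as is counting any positive R as a win. The helper name and the repeating sample pattern are hypothetical.

```python
def block_stats(actual_r, block_size=25):
    """Cumulative (trades so far, win rate %, profit factor) after each complete block."""
    results = []
    for end in range(block_size, len(actual_r) + 1, block_size):
        window = actual_r[:end]  # every trade logged up to this checkpoint
        wins = sum(1 for r in window if r > 0)
        gross_profit = sum(r for r in window if r > 0)
        gross_loss = sum(-r for r in window if r < 0)
        pf = gross_profit / gross_loss if gross_loss else float("inf")
        results.append((end, 100 * wins / end, pf))
    return results


# 60 hypothetical trades; the last 10 do not complete a block and are not reported
sample = [2.0, -1.0, -1.0, 1.5] * 15
for n, wr, pf in block_stats(sample):
    print(f"after {n}: win rate {wr:.0f}%, PF {pf:.2f}")
```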

Step 5: Run forward testing before risking real capital

Once your backtest shows positive expectancy (PF > 1.3, win rate above your calculated breakeven), run the strategy on a demo account for 30–50 live trades. Forward testing in real-time conditions catches two things backtesting cannot: slippage and spread in volatile conditions, and the psychological challenge of taking losses in real time.

In Notion, your live Trade Log and Backtest Log stay separate, but you can compare statistics side by side. If live performance significantly underperforms the backtest, the strategy may be over-fitted, or market conditions may have changed.

Reading and Acting on the Results

After 100+ observations, here is how to interpret what the numbers are telling you:

Statistic | What You See | What It Means
Profit Factor < 1.0 | PF = 0.85 | No edge. This strategy loses money over time. Do not trade it live. Refine the rules and retest, or discard it.
Profit Factor 1.0–1.3 | PF = 1.15 | Marginal. Technically profitable, but with a thin margin for error. Transaction costs may eliminate the edge. Needs more testing or rule refinement.
Profit Factor 1.3–1.8 | PF = 1.55 | Tradeable. Positive edge with room for execution variation. Start with the smallest position size and scale up only after live trading confirms the backtest.
Profit Factor > 1.8 | PF = 2.1 | Strong edge. Proceed to forward testing with confidence. If live results match, this is a core strategy worth investing in.
Win rate too high (>80%) | WR = 83% | Suspect data. Likely over-fitting: you may have unconsciously filtered out losses, or set targets too close or stops too wide. Review your entry criteria for confirmation bias.
High variance by condition | PF 2.5 in trend, 0.7 in range | Use the data. Add a market structure filter to your entry rules, and only take the setup when the high-PF condition is present. This is backtesting working exactly as intended.
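The minimal Python sketch below encodes the profit-factor bands and the win-rate sanity check from the table above. The table leaves the exact boundaries open. This sketch assumes 1.3 and 1.8 both fall in the "Tradeable" band and checks the win rate first. The function name and return strings are illustrative.

```python
def verdict(profit_factor: float, win_rate_pct: float) -> str:
    """Map backtest statistics to the interpretation table.
    Assumption: boundary values 1.3 and 1.8 count as "Tradeable"."""
    if win_rate_pct > 80:
        return "Suspect data: review entries for confirmation bias"
    if profit_factor < 1.0:
        return "No edge"
    if profit_factor < 1.3:
        return "Marginal"
    if profit_factor <= 1.8:
        return "Tradeable"
    return "Strong edge"


for pf, wr in [(0.85, 45), (1.15, 50), (1.55, 52), (2.1, 60), (1.6, 83)]:
    print(pf, wr, verdict(pf, wr))
```

Condition-dependent results (the last row of the table) are not a single verdict. Run this check separately for each filtered group.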

When Notion Is Not Enough

Notion Backtesting Has Real Limits

  • No chart replay. Notion cannot replay historical charts, so you scroll them manually, which is slow. Dedicated tools such as TradingView's Bar Replay or Forex Tester let you step through history one bar at a time.
  • No automatic trade identification. You must manually find every valid setup in historical data. If your setup triggers 3 times per day over 12 months, that is roughly 750 to 1,100 rows of manual logging, depending on whether the market trades 5 or 7 days a week. That is possible, but extremely time-consuming.
  • No equity curve visualization. Notion cannot plot your cumulative P&L over 200 backtest trades. Seeing whether equity grew smoothly or had violent drawdown periods requires exporting data to a chart tool.
  • Prone to hindsight bias. Scrolling through past charts, you can unconsciously see future price action before logging the entry. Purpose-built chart replay tools physically block future bars to prevent this.
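If you export the Backtest Log as CSV, a short Python script can fill the equity-curve gap. The minimal sketch below computes cumulative R and maximum drawdown and leaves plotting to whatever chart tool you prefer. It assumes hypothetical column names `Date` and `Actual R:R`, with dates in ISO format (YYYY-MM-DD) so they sort correctly as text. Notion's export may write dates differently, in which case you would need to parse them first. The CSV content is made up.

```python
import csv
import io


def equity_curve(rows):
    """Cumulative R after each trade, in date order (assumes ISO dates)."""
    ordered = sorted(rows, key=lambda row: row["Date"])  # ISO dates sort correctly as strings
    curve, total = [], 0.0
    for row in ordered:
        total += float(row["Actual R:R"])
        curve.append(total)
    return curve


def max_drawdown(curve):
    """Largest peak-to-trough drop in cumulative R, measured from a 0R start."""
    peak, worst = 0.0, 0.0
    for equity in curve:
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


# Stand-in for an exported CSV file (hypothetical values)
exported = io.StringIO(
    "Date,Actual R:R\n"
    "2024-01-03,2.0\n"
    "2024-01-02,-1.0\n"
    "2024-01-05,-1.0\n"
    "2024-01-04,1.5\n"
    "2024-01-08,-1.0\n"
    "2024-01-09,2.2\n"
)
curve = equity_curve(csv.DictReader(exported))
print(curve)
print(max_drawdown(curve))
```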

For serious strategy development, Notion backtest logs are best used as a second layer of documentation rather than a primary testing tool. Use a chart replay tool (TradingView, Forex Tester 4) to actually replay the market and log trades without seeing the future. Then import or transcribe the results into Notion for statistical analysis, grouping, and long-term record keeping.

Once you move from backtesting to live trading, your TSB Pro journal automatically imports trades from your broker — no manual logging. The same statistics you calculated manually in your backtest (win rate, profit factor, avg R:R) are recalculated automatically from live data, so you can compare whether your live performance matches the edge you found in testing.

Notion works for backtesting discretionary trading strategies. It cannot run automated code on price data, but you can manually log historical trade observations from chart replay or historical screenshots. You can then use Notion formulas and rollups to calculate win rate, profit factor, average R:R, and expectancy for each setup. Max consecutive losses and drawdown still require a manual scan or an export to another tool.

This is called manual backtesting — the same approach used by professional discretionary traders. For algorithmic backtesting on tick data, you need a dedicated platform like TradingView Pine Script, MetaTrader Strategy Tester, or Python.