AnalyticaHouse
Kerem Can Altuğ

Apr 29, 2026
7 min read

GA4 BigQuery Export: The Evolution of Traffic Sources and Correct Analysis Strategies

When working with raw data exported from Google Analytics 4 to BigQuery, it is critical to understand the source, medium, and campaign fields that form the basis of digital marketing reports. The export records data at the user, session, event, and item levels. This structure provides great flexibility, but it can create confusion about which field reflects the UTM parameters you expect to see in reports.

Key points to consider when working with the GA4 data model are:

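  • Traffic and campaign sources are recorded separately at the user, session, and event levels; item-level data carries no campaign fields of its own and has to be mapped back to them.
  • Each level answers a different question: user-level fields describe acquisition, session-level fields describe the visit, and event-level fields describe individual touchpoints.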

1. Traditional Approach and User Source (First User)

Analysts with classic Universal Analytics habits tend to focus only on the traffic_source field in the export data. This field holds the user-level (first user) source, medium, and campaign name, and it repeats the same values on every event for that user. It is the standard choice for determining how a user first arrived at the site, but it does not reflect the source of later sessions or subsequent interactions.
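A minimal Python sketch of a first-user breakdown, assuming export rows have already been loaded as dictionaries. The sample rows and the function name users_by_first_source are hypothetical; the traffic_source sub-fields source, medium, and name (the campaign name) follow the export schema.

```python
from collections import defaultdict

# Hypothetical sample rows shaped like export events; only the fields used
# here are included.
events = [
    {"user_pseudo_id": "u1", "event_name": "session_start",
     "traffic_source": {"source": "google", "medium": "cpc", "name": "spring_sale"}},
    {"user_pseudo_id": "u1", "event_name": "purchase",
     "traffic_source": {"source": "google", "medium": "cpc", "name": "spring_sale"}},
    {"user_pseudo_id": "u2", "event_name": "page_view",
     "traffic_source": {"source": "newsletter", "medium": "email", "name": "april"}},
]


def users_by_first_source(rows):
    """Count distinct users per first-user (source, medium) pair."""
    users = defaultdict(set)
    for row in rows:
        ts = row.get("traffic_source") or {}
        users[(ts.get("source"), ts.get("medium"))].add(row["user_pseudo_id"])
    return {key: len(ids) for key, ids in users.items()}


print(users_by_first_source(events))
```

Counting distinct user_pseudo_id values rather than rows matters here, because the first-user values are repeated on every event of the same user.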

2. Event-Based Flexibility: collected_traffic_source

The collected_traffic_source field, added to the schema in June 2023, changed the rules of the game. It stores the traffic source information that was collected with each individual event: UTM parameters and other manual campaign data, as well as ad click IDs such as gclid and dclid.

This field provides solutions to the following needs of analysts:

  • Event-Level Analysis: Clarifies situations where a user interacts with multiple campaigns.
  • Dynamic Source Tracking: Provides the most accurate data for examining UTM changes between sessions.
  • Changing Campaign Impact: Acts as the "golden key" for measuring changing channel impacts in different user sessions.
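The event-level view described above can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the sample rows, SIGNAL_KEYS, and campaign_touches are hypothetical names, while manual_source, manual_medium, manual_campaign_name, gclid, and dclid are sub-fields of collected_traffic_source in the export schema.

```python
# Hypothetical rows shaped like export events; only the fields used here
# are included.
events = [
    {"user_pseudo_id": "u1", "event_timestamp": 1,
     "collected_traffic_source": {"manual_source": "google", "manual_medium": "cpc",
                                  "manual_campaign_name": "spring_sale",
                                  "gclid": "abc123"}},
    # No campaign signal was collected on this event.
    {"user_pseudo_id": "u1", "event_timestamp": 2, "collected_traffic_source": None},
    {"user_pseudo_id": "u1", "event_timestamp": 3,
     "collected_traffic_source": {"manual_source": "newsletter", "manual_medium": "email",
                                  "manual_campaign_name": "april", "gclid": None}},
]

SIGNAL_KEYS = ("manual_source", "manual_medium", "manual_campaign_name", "gclid", "dclid")


def campaign_touches(rows):
    """Return, per user, the time-ordered list of campaign touches.

    Events without any traffic source signal are skipped, and consecutive
    identical touches are collapsed into one.
    """
    touches = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        cts = row.get("collected_traffic_source") or {}
        if not any(cts.get(key) for key in SIGNAL_KEYS):
            continue
        touch = (
            cts.get("manual_source"),
            cts.get("manual_medium"),
            cts.get("manual_campaign_name"),
            bool(cts.get("gclid") or cts.get("dclid")),  # was a click ID present?
        )
        user_touches = touches.setdefault(row["user_pseudo_id"], [])
        if not user_touches or user_touches[-1] != touch:
            user_touches.append(touch)
    return touches


print(campaign_touches(events))
```

The resulting list per user makes it visible when the same person arrived through more than one campaign.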

3. Session-Focused Analysis: session_traffic_source_last_click

The session_traffic_source_last_click field, introduced with the July 2024 update, offers a structure closer to the "Acquisition Reports" in the GA4 interface. This field is specifically designed for the session-based last-click attribution model.

The main advantages provided by this field are:

  • Session Conversion Analysis: Provides a quick answer to the question, "Which channel generated this session conversion?"
  • E-commerce Focus: It is a critical data source for multi-channel advertisers and for e-commerce sites that measure performance by last click.
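A session-level conversion summary along these lines could look like the Python sketch below. It assumes that a session is identified by user_pseudo_id plus the ga_session_id event parameter, and that the manual_campaign record inside session_traffic_source_last_click carries source and medium. The helper names (make_event, get_session_id, session_conversions) and the sample data are hypothetical.

```python
def make_event(user, session_id, name, source, medium):
    """Build a hypothetical row shaped like an export event."""
    return {
        "user_pseudo_id": user,
        "event_name": name,
        "event_params": [{"key": "ga_session_id", "value": {"int_value": session_id}}],
        "session_traffic_source_last_click": {
            "manual_campaign": {"source": source, "medium": medium}
        },
    }


events = [
    make_event("u1", 100, "page_view", "google", "cpc"),
    make_event("u1", 100, "purchase", "google", "cpc"),
    make_event("u2", 200, "page_view", "google", "cpc"),
    make_event("u1", 101, "page_view", "newsletter", "email"),
]


def get_session_id(row):
    """Read ga_session_id from the repeated event_params record."""
    for param in row.get("event_params", []):
        if param["key"] == "ga_session_id":
            return param["value"].get("int_value")
    return None


def session_conversions(rows, conversion_event="purchase"):
    """Count sessions and converting sessions per last-click (source, medium)."""
    sessions = {}
    for row in rows:
        key = (row["user_pseudo_id"], get_session_id(row))
        last_click = row.get("session_traffic_source_last_click") or {}
        manual = last_click.get("manual_campaign") or {}
        session = sessions.setdefault(
            key,
            {"channel": (manual.get("source"), manual.get("medium")), "converted": False},
        )
        if row["event_name"] == conversion_event:
            session["converted"] = True

    summary = {}
    for session in sessions.values():
        stats = summary.setdefault(
            session["channel"], {"sessions": 0, "converting_sessions": 0}
        )
        stats["sessions"] += 1
        stats["converting_sessions"] += int(session["converted"])
    return summary


print(session_conversions(events))
```

Grouping by the (user_pseudo_id, ga_session_id) pair first, and only then by channel, keeps a session with many events from being counted more than once.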

4. Advanced Channel Management and Product-Based Analytics

With the October 2024 update, advanced records such as cross_channel_campaign, sa360_campaign, and dv360_campaign were added to the schema, nested under session_traffic_source_last_click. Now, not only Google Ads data but also Search Ads 360 and Display & Video 360 data can be analyzed as separate struct fields. This breakdown gives marketing teams a more complete basis for platform-level ROI analysis.

However, the following caveats should be considered when analyzing at the item level:

  • Matching Logic: Item-level details (from add-to-cart events, etc.) may not directly match session or campaign data.
  • Mapping Requirement: In cases where multiple products are involved in a single transaction, matching should be done via transaction_id or user_pseudo_id.
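The mapping requirement above can be sketched in Python as a lookup keyed by transaction_id. The two input shapes are assumptions made for illustration: purchases is a hypothetical list with one resolved channel per transaction, and item_lines is a hypothetical flattened version of the items array. The function name item_revenue_by_channel is also hypothetical.

```python
from collections import defaultdict

# Hypothetical input: one row per purchase with its already resolved channel.
purchases = [
    {"transaction_id": "T1", "channel": ("google", "cpc")},
    {"transaction_id": "T2", "channel": ("newsletter", "email")},
]

# Hypothetical input: the items array flattened to one row per product line.
item_lines = [
    {"transaction_id": "T1", "item_id": "sku_1", "quantity": 2, "price": 20.0},
    {"transaction_id": "T1", "item_id": "sku_2", "quantity": 1, "price": 15.0},
    {"transaction_id": "T2", "item_id": "sku_1", "quantity": 1, "price": 20.0},
    {"transaction_id": "T9", "item_id": "sku_3", "quantity": 1, "price": 5.0},
]

UNMATCHED = ("(unmatched)", "(unmatched)")


def item_revenue_by_channel(purchases, item_lines):
    """Sum item revenue per (channel, item_id), joining on transaction_id."""
    channel_by_txn = {p["transaction_id"]: p["channel"] for p in purchases}
    revenue = defaultdict(float)
    for line in item_lines:
        # Lines whose transaction has no known channel are kept, not dropped.
        channel = channel_by_txn.get(line["transaction_id"], UNMATCHED)
        revenue[(channel, line["item_id"])] += line["quantity"] * line["price"]
    return dict(revenue)


print(item_revenue_by_channel(purchases, item_lines))
```

Keeping unmatched lines in their own bucket, rather than silently discarding them, makes gaps in the mapping visible in the report.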

Conclusion

In GA4's BigQuery export structure, there is no longer a single correct field for traffic and campaign reporting. Each field, from the initial user source to session-based last click or manually tagged campaign information, serves a different purpose. In modern marketing analytics, obtaining reliable results depends on selecting the data field that best suits the analysis objective.

Frequently Asked Questions

Which field should I choose as the “main source”?
There is no single “main” source definition; the selection should be made according to the purpose of the analysis. For user acquisition, traffic_source is preferred; for capturing UTM and click-id signals at the time of an event, collected_traffic_source is preferred; and for session-based results closer to GA4 Acquisition reports, session_traffic_source_last_click is preferred.

Why doesn't the Acquisition report in the GA4 interface look exactly the same as in BigQuery?
The GA4 interface applies additional attribution rules and priority logic at reporting time, whereas the BigQuery export provides raw fields. Differences therefore appear, especially when user-, session-, and event-level fields are mixed in the same report.

If the UTM and gclid data appear simultaneously, which should be prioritized?
There is no single, universal priority rule; what matters is that one rule is defined and applied consistently across the dataset. To inspect the signals present at the time of an event, use collected_traffic_source; for reportable session-level results, session_traffic_source_last_click yields more stable results.
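One way to make such a rule explicit is a small resolver function, sketched here in Python. The rule itself is an assumption chosen for illustration and is not GA4's own logic: resolve_source and the prefer_click_id switch are hypothetical, and treating a gclid as google / cpc is a simplification.

```python
def resolve_source(cts, prefer_click_id=True):
    """Resolve one (source, medium) pair from a collected_traffic_source record.

    Illustrative rule (an assumption, not GA4's own logic):
      1. If a gclid is present and click IDs are preferred, or no manual
         values exist, treat the event as google / cpc.
      2. Otherwise use the manual (UTM) source and medium when available.
      3. Otherwise fall back to (direct) / (none).
    """
    cts = cts or {}
    manual = (cts.get("manual_source"), cts.get("manual_medium"))
    has_manual = any(manual)
    has_gclid = bool(cts.get("gclid"))

    if has_gclid and (prefer_click_id or not has_manual):
        return ("google", "cpc")  # assumption: gclid implies Google Ads paid traffic
    if has_manual:
        return manual
    return ("(direct)", "(none)")


# Hypothetical record where UTM values and a gclid arrive together.
both = {"manual_source": "newsletter", "manual_medium": "email", "gclid": "abc123"}
print(resolve_source(both))
print(resolve_source(both, prefer_click_id=False))
```

Whichever priority is chosen, routing every query through one function like this keeps the choice consistent across reports.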

Is it a bug if the same user_pseudo_id shows different sources/mediums in different sessions?
In most scenarios, this is not a bug; users may be exposed to different campaigns on different days. The critical point is that the first-user acquisition source and the session source are two different things and should not be treated as the same.

Where should one start for attribution analysis?
If last-click session conversion performance is to be measured, it is recommended to start with session_traffic_source_last_click. If the goal is to track touchpoints and campaign changes on an event basis, collected_traffic_source provides a more flexible basis.

What is the most reliable approach in e-commerce reporting?
It is more consistent to handle session performance with session_traffic_source_last_click and to handle touchpoint and change analyses separately with collected_traffic_source. When the two approaches are merged into a single "source of truth," discrepancies can appear in channel comparisons.

Why is product-based source analysis difficult at the Items level?
Item rows do not always line up exactly with campaign information: a single transaction may contain multiple products, and the event context may be fragmented across several events. On the purchase side, matching on transaction_id is therefore the more reliable approach; for other events, rules based on session ID and a time window are needed.

What should be checked first if (not set) or (direct) appears frequently in channel and campaign information?
This usually points to problems in measurement and tagging discipline: inconsistent UTM standards, redirect flows that strip parameters, cross-domain setups, and consent effects can all increase these values. Channel performance interpretations remain risky until these areas are improved.
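A simple health check is to measure how large this share is before interpreting channel performance. The Python sketch below assumes the data has already been reduced to one row per session; that input shape, the UNATTRIBUTED set, and the function name unattributed_share are hypothetical choices for illustration.

```python
# Values treated as "no usable source" in this sketch (an assumption; adjust
# the set to match what actually appears in your dataset).
UNATTRIBUTED = {None, "", "(not set)", "(direct)"}

# Hypothetical input: one row per session with its resolved source and medium.
sessions = [
    {"session_key": ("u1", 100), "source": "google", "medium": "cpc"},
    {"session_key": ("u1", 101), "source": "(direct)", "medium": "(none)"},
    {"session_key": ("u2", 200), "source": None, "medium": None},
    {"session_key": ("u3", 300), "source": "newsletter", "medium": "email"},
]


def unattributed_share(session_rows):
    """Return the fraction of sessions whose source is missing or direct."""
    if not session_rows:
        return 0.0
    flagged = sum(1 for s in session_rows if s.get("source") in UNATTRIBUTED)
    return flagged / len(session_rows)


print(unattributed_share(sessions))
```

Tracking this ratio over time, for example per day or per landing domain, helps show whether a tagging fix actually reduced the unattributed share.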
