The Themes tab in News Search provides insight into the main ideas within your query results. Each theme consists of two or more related events and can be explored via a timeline analysis. Themes can be thought of as "storylines" or "subtopics" for the terms of your query.
Each node within a theme timeline represents an individual event. Primer labels these nodes with additional information when it is available. For instance, events that have been updated within the last 24 hours have a ‘New’ icon, and events with a significance score of 4 or greater have a ‘Fire’ icon. Event types (e.g., Mergers & Acquisitions) are also shown in red.
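As a rough illustration, the labeling rules above can be written as a small function. This is a sketch under stated assumptions, not Primer's code: the `EventNode` record, its field names, and `node_labels` are hypothetical. Only the 24-hour window and the significance threshold of 4 come from this article.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class EventNode:
    # Hypothetical event record; field names are illustrative only.
    title: str
    last_updated: datetime
    significance: float
    event_types: list[str] = field(default_factory=list)

NEW_WINDOW = timedelta(hours=24)  # "updated within the last 24 hours"
FIRE_THRESHOLD = 4                # "significance score of 4 or greater"

def node_labels(event: EventNode, now: datetime) -> list[str]:
    """Return the badges a timeline node would show under the stated rules."""
    labels = []
    if now - event.last_updated <= NEW_WINDOW:
        labels.append("New")
    if event.significance >= FIRE_THRESHOLD:
        labels.append("Fire")
    # Event types (e.g. "Mergers & Acquisitions") are shown as-is.
    labels.extend(event.event_types)
    return labels

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
deal = EventNode("Acme agrees to buy Globex", now - timedelta(hours=3), 4.5,
                 ["Mergers & Acquisitions"])
print(node_labels(deal, now))  # ['New', 'Fire', 'Mergers & Acquisitions']
```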
Theme Algorithm Configuration
Primer optimizes the settings for the Themes page when you first load it. However, you can adjust these settings to find the best set of themes/storylines for your query. The two settings, Minimum Event Size and Similarity Score, are described below:
Minimum Event Size
The value on the Minimum Event Size slider is the number of unique documents an event must contain to be included in a theme. For example, to get broader themes for your query, you can reduce the minimum event size to 2; events containing only 2 documents would then be included in your themes. Conversely, increasing the minimum event size gives you smaller, but potentially more significant, themes.
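The filter this slider controls can be sketched as follows, assuming each event is represented by the IDs of the documents it contains. The function name and data shape are hypothetical; the point is that a duplicated document counts once, and only events at or above the threshold survive.

```python
def filter_by_min_event_size(events: dict[str, list[str]], min_size: int) -> dict[str, list[str]]:
    """Keep only events backed by at least `min_size` unique documents.

    `events` maps a (hypothetical) event ID to the IDs of its documents;
    a document listed twice is counted once.
    """
    return {
        event_id: doc_ids
        for event_id, doc_ids in events.items()
        if len(set(doc_ids)) >= min_size
    }

events = {
    "e1": ["d1", "d2"],
    "e2": ["d3", "d3", "d4", "d5"],  # 4 entries, 3 unique documents
    "e3": ["d6"],
}
print(sorted(filter_by_min_event_size(events, 2)))  # ['e1', 'e2']
print(sorted(filter_by_min_event_size(events, 3)))  # ['e2']
```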
Similarity Score
This score, on a scale of 0 to 1, sets how similar nodes (events) must be to one another to be grouped together. A lower score results in more total events and themes, but those events may be less similar and less relevant to one another than in a theme analysis with a higher similarity score.
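One simple way to picture the threshold is to link every pair of events whose similarity meets it and treat each connected group of two or more events as a theme. This is a deliberately simplified stand-in, not Primer's pipeline (which uses Louvain clustering, described in the FAQs below); `group_into_themes` and the example scores are hypothetical. Lowering the threshold lets more events clear the bar and join a theme.

```python
from itertools import combinations

def group_into_themes(event_ids, similarity, threshold):
    """Link events whose pairwise similarity >= threshold; return groups of 2+ events.

    `similarity` is any callable (a, b) -> float on a 0-to-1 scale.
    Connected groups are found with a small union-find structure.
    """
    parent = {e: e for e in event_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(event_ids, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for e in event_ids:
        groups.setdefault(find(e), []).append(e)
    # A theme needs at least two related events.
    return [g for g in groups.values() if len(g) >= 2]

SCORES = {
    frozenset({"a", "b"}): 0.9,
    frozenset({"b", "c"}): 0.6,
    frozenset({"c", "d"}): 0.3,
}

def example_similarity(x, y):
    # Pairs not listed are treated as unrelated.
    return SCORES.get(frozenset({x, y}), 0.0)

events = ["a", "b", "c", "d", "e"]
for threshold in (0.8, 0.5, 0.2):
    print(threshold, group_into_themes(events, example_similarity, threshold))
```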
Viewing Event Details
You can double-click any individual event within a theme to view more detail.
How To Download Themes
To learn how to customize and download Themes, see the help article: How to Create a Custom Briefing
Themes FAQs
How are Themes created?
Themes are created using the Louvain clustering algorithm. The Louvain algorithm is "non-deterministic", which means that you can run the same query over the same time period and potentially get different results each time. Because of this, themes should not be viewed as a "complete source of truth" but rather as a snapshot of important topics within your main query.
In addition to the clustering algorithm, mentions of entities and locations are used to further refine which events are presented as part of a theme. Primer also considers temporal entailment and the pairwise compatibility of events to determine whether they should be part of the same theme, create a new theme branch, or be included in a different theme.
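To see where the non-determinism comes from, here is a minimal sketch of the first (local-moving) phase of Louvain clustering, written from the published algorithm rather than from Primer's code. Each node is visited in a random order and moved to whichever neighbouring community most increases modularity; because the visiting order depends on the random seed, graphs without a clear-cut structure can end up partitioned differently from run to run. The function name, the edge format, and the omission of Louvain's second (aggregation) phase are simplifications made for illustration.

```python
import random
from collections import defaultdict

def louvain_local_moving(edges, seed=None):
    """First (local-moving) phase of Louvain on an undirected weighted graph.

    edges: dict mapping (u, v) node pairs to positive weights.
    Returns a list of frozensets, one per community.
    """
    rng = random.Random(seed)
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    degree = {n: sum(nbrs.values()) for n, nbrs in adj.items()}
    two_m = sum(degree.values())      # 2m: total edge weight counted twice
    community = {n: n for n in adj}   # start with every node alone
    total = dict(degree)              # sum of degrees inside each community

    moved = True
    while moved:
        moved = False
        order = list(adj)
        rng.shuffle(order)            # random visiting order -> non-determinism
        for n in order:
            current = community[n]
            # Edge weight from n to each neighbouring community.
            links = defaultdict(float)
            for nbr, w in adj[n].items():
                if nbr != n:
                    links[community[nbr]] += w
            total[current] -= degree[n]  # take n out of its community
            # Modularity gain (up to a constant factor) of placing n in community c.
            best = current
            best_gain = links[current] - total[current] * degree[n] / two_m
            for c, k_in in links.items():
                gain = k_in - total[c] * degree[n] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            total[best] += degree[n]
            community[n] = best
            if best != current:
                moved = True

    groups = defaultdict(set)
    for n, c in community.items():
        groups[c].add(n)
    return [frozenset(g) for g in groups.values()]

# Two triangles joined by a single bridge edge c-d.
EDGES = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 1.0,
         ("d", "e"): 1.0, ("e", "f"): 1.0, ("d", "f"): 1.0,
         ("c", "d"): 1.0}
for seed in (0, 1, 2):
    print(seed, sorted(sorted(c) for c in louvain_local_moving(EDGES, seed=seed)))
```

In this two-triangle graph the structure is unambiguous, so every seed finds the same two communities; the seed matters when several partitions have similar modularity.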
How are the names for Themes generated?
Theme names are AI-generated from a set of keywords that appear frequently within the events. For each event in each storyline, Primer generates a list of terms, and then backtracks along the storyline to generate the best phrase for the Theme name.
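The keyword-gathering step can be approximated by counting, for each term, how many events in the storyline mention it, and keeping the most widespread terms. This sketch covers only that counting step under simple assumptions (a tiny hypothetical stop-word list and regex tokenization); it does not reproduce Primer's AI-based phrasing or the backtracking along the storyline, and `keyword_label` is a hypothetical name.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on",
             "with", "as", "at", "by", "is", "its"}

def event_terms(text):
    """Lower-cased content words for one event's text."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def keyword_label(event_texts, k=3):
    """Name a storyline from the terms that recur across the most events.

    Each term is counted once per event, so a word repeated inside a
    single event cannot dominate the label.
    """
    counts = Counter()
    for text in event_texts:
        counts.update(set(event_terms(text)))
    # Sort by event frequency (descending), then alphabetically for stable ties.
    top = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    return " ".join(term for term, _ in top)

storyline = [
    "Acme announces merger talks with Globex",
    "Globex shareholders approve Acme merger",
    "Regulators review Acme merger with Globex",
]
print(keyword_label(storyline))  # acme globex merger
```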
What causes a Theme to branch off?
Different branches can be created within a theme when secondary entities and topics are identified within the events that comprise the original Theme.
Why do I see duplicate or highly similar events within the same Theme?
Primer uses several processes, including Louvain clustering and TF-IDF similarity, to generate events. Because of these various methods, a theme can contain multiple similar events. Typically, a "near-similar" event is created when the articles that make up the different events were published over different time frames, or when some articles mention entities that others do not, causing separate events to be created for a similar topic.
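TF-IDF similarity rewards shared words that are rare across the collection, so two reports of the same development that differ in a few distinctive words score high but not perfectly, which is one way near-duplicate events can arise. The sketch below computes TF-IDF vectors and cosine similarity with the standard library; the smoothed IDF formula mirrors scikit-learn's default and is an assumption here, since Primer's exact tokenization and weighting are not described in this article.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = [
    "Acme agrees to buy Globex",
    "Acme agrees to acquire Globex",
    "Storm floods coastal towns",
]
vecs = tfidf_vectors(docs)
print(round(cosine(vecs[0], vecs[1]), 2))  # high, but below 1.0
print(cosine(vecs[0], vecs[2]))            # no shared terms
```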
How does Primer ensure that the surfaced themes are sufficient to provide the user with a good enough understanding of the topic?
A theme occurs when there is more than one event that discusses roughly similar things. The implicit assumption is that a single event that is not similar to any others is not a trend/theme. Events that don't get attached to any existing storyline are therefore not presented in Themes. There are 2 ways to overcome this:
Decreasing the Similarity Score threshold and the Minimum Event Size
Combing through the events table page to compare all events we’ve identified for a specific query against the events present in a theme
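If you have the list of event IDs from the events table and the IDs of the events in each theme (however you collect them), the comparison in the second option is a set difference. A minimal sketch, assuming events are identified by hypothetical string IDs; `unthemed_events` is a hypothetical helper, not part of Primer.

```python
def unthemed_events(all_event_ids, themes):
    """Events from the events table that no theme contains, in table order."""
    themed = set().union(*themes)  # empty set when there are no themes
    return [e for e in all_event_ids if e not in themed]

table = ["e1", "e2", "e3", "e4"]
themes = [["e1", "e2"], ["e2", "e3"]]
print(unthemed_events(table, themes))  # ['e4']
```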
What are the limitations or inherent model biases based on the current analytical model used for theme generation?
There are some limitations in the text representations we use (TF-IDF similarity and sentence embeddings). These limitations affect the following areas:
Clustering: Generating each theme or storyline is an unsupervised learning problem, so the default settings are chosen to work for the most general and broad use case, in this case generic English-language news reports. Because we are aware of this limitation, we carefully tune and adjust the model for each new data type.
Training Data Distribution: We currently use a transformer model, the Universal Sentence Encoder, to generate the embeddings that power the storylines. This model is trained on common open-source datasets such as Wikipedia and news, so its semantic capture is limited to common, standard English.
How do our algorithms decide how many themes to generate? Can an analyst draw any conclusions from the number of themes a query has?
The number of events per theme that we generate depends primarily on 2 factors:
The “similarity score” of events within a particular theme, which is the percentage of language/semantic and entity overlap between events.
The “minimum event size”, which is the minimum number of articles needed in an event for the event to show up in a theme.
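To make the entity-overlap part of the similarity score concrete, here is one common way to express overlap as a percentage: the Jaccard index of two events' entity sets. This is an assumption for illustration; the article does not say how Primer measures overlap or how it combines entity overlap with language/semantic overlap, and `entity_overlap_pct` is a hypothetical name.

```python
def entity_overlap_pct(entities_a, entities_b):
    """Jaccard overlap of two events' entity sets, as a percentage (0-100)."""
    a, b = set(entities_a), set(entities_b)
    if not a and not b:
        return 0.0  # no entities on either side: treat as no overlap
    return 100.0 * len(a & b) / len(a | b)

print(entity_overlap_pct({"Acme", "Globex", "FTC"}, {"Acme", "Globex", "SEC"}))  # 50.0
```

Dividing the result by 100 puts it on the same 0-to-1 scale as the Similarity Score slider.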
The total number of themes presented for a query depends on both the query terms and the date range. For example, a (“united states” AND election) query over 18 months has 8 unique themes, but the same query over the last 7 days returns 12 themes. One conclusion that can be drawn from the number of themes for a query is that multiple storylines were identified that progressed over time. Essentially, you can think of the number of themes as the number of key topics for a query that have progressed over time; keep in mind, however, that the events that comprise a theme are not the "complete picture" of that theme.
If you have any questions about Themes, please reach out to the team at [email protected]



