Ask the News

How to leverage Primer's Semantic Search engine to search on news data

What is Ask the News?

Ask the News is a search engine designed to retrieve highly relevant open-source news reports using natural language queries. It also provides a machine-generated question & answer (Q&A) response with citations to the most relevant snippets. Ask the News lets you both find answers to narrow questions and start broader data discovery.

It can be used across a wide range of topics, such as Political Visits, International Relations, Global Business, Economic Trends, Current Affairs, and News Reporting.

How To Get Started

To interact with Ask the News, navigate to the home page.

You can select from our list of suggested questions, or you can craft a question of your choosing. Please see the FAQ section below for tips and tricks for how to create the best question for Ask the News.

Time Periods

To help refine your results, you can select a time period ranging from the "last 2 hours" to the "last 31 days".

Please note: If you select a short time period, such as the "last 2 hours", results may not be returned, depending on the question asked.

Sources

Ask the News ingests over 50,000 English-language news sources.

Example Questions

Example questions include:

  • What is the purpose of Turkish President Recep Tayyip Erdoğan's visit to Russia?

  • Where has Xi Jinping visited?

  • What is the purpose of the meeting between Abdel Fattah al-Burhan and Abdel Fattah al-Sisi?

  • What technical issue occurred with the United Kingdom's air traffic control system, causing flight delays?

  • What social media platform is ranked as the most downloaded app in the world?

  • How is the global community responding to the humanitarian crisis in Gaza?

Example phrases include:

  • Zelensky's visit to United States

  • United States-Nepal Peace Corps agreement

  • Brics leaders call for comprehensive United Nations reform

  • Antarctic sea ice decline 2023

  • Australian union strikes

  • China's loan prime rate decline

  • South Korea - United States annual military drills

Re-generating Answers

You can also regenerate the answer to your question to get more options for the format you would like to use in your workflows. To regenerate, select the refresh icon in the top right of the answer block. Once you have generated one or more new answers, you can cycle through the responses using the arrows in the bottom right of the answer block.

Viewing Search History

You can also view your recent searches (up to 100), see the cached answer and document list within your browser, and rerun an old query to get the latest answer. You can delete previous queries from your recent search list, and view the time range and date asked for each query.

Please note: Previous questions and their cached results are stored in your browser. This means that if you use a different browser you will not see your past searches, and clearing your browser's session data will also clear the search and results history.

Tips and Tricks

  • Word choice and ordering can produce different results and responses, so it is recommended to experiment with different versions of the same query. Also, consider spelling out all acronyms.

  • As time passes and new reports are ingested by the system, the result list and answer for the same question may change.

  • Multiple searches on the same question may produce slightly different answers. More specific questions yield more relevant answers.

  • If a generative answer is not useful or is not produced for a question, the result list may still contain highly relevant reports, since the answer is produced from only a subset of the result list.

  • Sort the result list by date instead of relevance if you want to see the most current results. Currently, this will not change the generative response.

  • While this feature is designed to use a natural language question as input, it will also retrieve results for input in the form of phrases or sentences.

  • To optimize generative responses, focus on specifics and add context to queries. Instead of using the broad query “Israel and Hamas war”, add context for what you are searching for, e.g., “Where are the latest military engagements in the Israel and Hamas war”.

  • Generative outputs are known to be inconsistent, and the same query can lead to different or zero outputs.

FAQs

How Does Ask the News Compare to Boolean Searches?

Unlike boolean searches, which rely on a set of predetermined rules, Primer's semantic searches can interpret and analyze natural language queries, including synonyms, abbreviations, and context, to provide more accurate and comprehensive results.

How are documents retrieved?

At a high level, a semantic search model transforms a natural language query into a semantic embedding, which is then compared to an index of stored semantic embeddings. Step-by-step:

  1. News reports are first ingested into the application

  2. News reports are then separated into sentences

  3. Each sentence is run through the semantic model and then stored in the index based on its semantic embedding

    1. Note: A semantic embedding is a numeric representation of the text’s meaning

  4. User queries are also given semantic embeddings

  5. Ask the News then identifies sentences whose embeddings are most similar to the embedding of the user’s query

    1. These sentences are highlighted within the application, with added context before and after the sentence from the news report

  6. Finally, the user is presented with a list of the most relevant news reports
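
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Primer's implementation: `toy_embed` is a hypothetical stand-in that only counts words, whereas a real semantic model produces vectors that capture meaning (so synonyms would also match). All function names, the one-sentence context window, and the sample reports are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Step 2: naive splitter that breaks after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_embed(text):
    """Stand-in for the semantic model (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(reports):
    """Steps 1-3: ingest reports, split them into sentences, embed and store each one."""
    index = []
    for report_id, text in reports.items():
        sentences = split_sentences(text)
        for pos, sentence in enumerate(sentences):
            index.append({
                "report": report_id,
                "pos": pos,
                "sentences": sentences,
                "embedding": toy_embed(sentence),
            })
    return index


def search(index, query, top_k=3):
    """Steps 4-6: embed the query, rank sentences by similarity, attach context."""
    query_embedding = toy_embed(query)
    scored = [(cosine(query_embedding, entry["embedding"]), entry) for entry in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    hits = []
    for score, entry in scored[:top_k]:
        sentences, pos = entry["sentences"], entry["pos"]
        # Context: one sentence before and one after the matching sentence.
        context = " ".join(sentences[max(0, pos - 1): pos + 2])
        hits.append({
            "report": entry["report"],
            "match": sentences[pos],
            "context": context,
            "score": score,
        })
    return hits


if __name__ == "__main__":
    sample_reports = {  # made-up text, for illustration only
        "report-a": "The president arrived on Monday. He met officials to discuss trade.",
        "report-b": "Sea ice in Antarctica declined sharply. Scientists called the drop unusual.",
    }
    for hit in search(build_index(sample_reports), "Antarctica sea ice decline", top_k=1):
        print(hit["report"], "|", hit["match"])
```

Ranking individual sentences rather than whole reports is what allows the application to highlight the specific passage that matched, with its surrounding context.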

How Does the Answer Box Get Generated?

Snippets (i.e., sentences) from the most highly relevant reports are sent to a Q&A model to generate a source-based answer using only those reports. The current implementation leverages ChatGPT 4o. Generally, between five and seven report snippets are used to generate the answer. Primer intentionally trained the Q&A model to be conservative, and it may tell you a response is not producible given the results returned.

As with all generative text, please exercise caution and verify that the information is factually correct using the references provided. The response should not be copied directly into formal reporting. Instead, use the answer to gain a quick understanding of the most relevant hits.
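
The snippet selection described above can be illustrated with a short Python sketch. It assumes the hits are already sorted by relevance and shows only how a handful of top snippets might be packaged into a source-constrained prompt. The prompt wording, the function names, the field names (`source`, `text`), and the cap of seven snippets are illustrative assumptions, and no model is actually called.

```python
def select_snippets(ranked_hits, max_snippets=7):
    """Keep only the highest-ranked snippets; the rest of the result list is not used."""
    return ranked_hits[:max_snippets]


def build_prompt(question, snippets):
    """Assemble a prompt that restricts the model to the numbered snippets (hypothetical wording)."""
    lines = [
        "Answer the question using only the numbered snippets below.",
        "Cite snippets by number, e.g. [1].",
        "If the snippets do not contain the answer, say that no answer can be produced.",
        "",
        f"Question: {question}",
        "",
        "Snippets:",
    ]
    for number, snippet in enumerate(snippets, start=1):
        lines.append(f"[{number}] ({snippet['source']}) {snippet['text']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder hits standing in for a relevance-sorted result list.
    ranked_hits = [
        {"source": f"Report {i}", "text": f"Snippet text {i}."} for i in range(1, 11)
    ]
    top_snippets = select_snippets(ranked_hits)
    print(build_prompt("Where has Xi Jinping visited?", top_snippets))
```

Because only the first few relevance-ranked hits are passed along, re-sorting the displayed list by date would leave the prompt, and therefore the answer, unchanged.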

What limitations should I be aware of?

  • The generative answer is produced only on the most highly relevant reports, not the entire result list. As a result, the response may not include the most current reports in your result list.

  • It may not provide a comprehensive result list or generative answer since the data available is limited to a 30-day window of news sources.

  • It does not include social media.

  • While the models used for search and generating answers are powered by large language models, you cannot directly prompt Semantic Search to perform arbitrary tasks (e.g., you cannot ask it to generate a table summarizing all of the locations found in the results list).

  • Only a limited set of source data is ingested, and the application can only retrieve results from that data. For example, if you ask for a pro-countryA slogan, it may give a good answer; if you then ask for an anti-countryA slogan, it could give you a very similar answer.

    • Without knowing the exact details of the query, our assumption is that certain sources could be biased, so that only “pro-country” slogans are ingested. When an “anti-country” slogan is used as a query, the model can still find slogan-type results, but they may not necessarily be “anti-country”.

In what ways is this potentially better than an NGT-style of search of news?

  • Results should have better precision (i.e., fewer results that are not relevant) and recall (i.e., fewer relevant documents missed), since it is not dependent on keyword search using Boolean operators.

  • The relevance used to sort results should be more accurate since it is based on the semantic meaning of your natural language query instead of being based on the frequency of keywords from your query found in the resulting documents.

  • Triaging a result list should be faster since it highlights the specific sections of a report that are relevant to your query instead of presenting the entire document without context.

How should I format my input?

Example questions and phrases have been provided in the “Example Questions” section above. Generally speaking, search results become more relevant as more context is incorporated into the query.

What types of words should I use?

Semantic searches benefit from words that provide context or meaning, e.g. nouns, verbs, adjectives, adverbs, and question words (who, what, when, where, and how).

Do I need proper grammar?

Users do not have to query in complete sentences; however, words need to be spelled correctly to retrieve relevant information.

How long should the input be?

You can type in as little as a few words (a phrase) or as much as one to two sentences. Beyond two sentences, semantic search does not perform well because it is unable to effectively capture all of the information in a single vector.

What’s the right degree to break down a question? For example, is it better to input a list of questions in the field or a single question followed up by another and another (general to narrow)?

Start by asking general questions, then move to more refined questions.

What are the benefits of a specific versus a broad question?

There are no particular benefits to a specific versus a broad question. It will depend on what the user is trying to discover. The ability of the application to discover what is needed is dependent on the data available to it. So, if there is no data that is specific enough to answer the question, it will find other semantically related data and surface that.

Why do I receive inaccurate associations to my queries?

The application will surface information based on the words in the query and their relationships; however, the application may miss the word relationships within more complex queries. For example, “What tanks did Russia use in the war against Ukraine?” will also provide results highlighting tanks used by Ukraine against Russia.

What is the best strategy for asking about more than one event or more than one location? For example, should I submit two separate questions, question 1: Is there a protest in CountryA? And then question 2: Is there a protest in CountryB? Or should I simply ask: Is there a protest in CountryA or CountryB?

Results are better when questions are not bundled, so for the example given, we recommend that you ask two separate questions.

For questions on how to leverage Ask the News or to learn more about Primer's work leveraging LLMs, please reach out to [email protected]
