Commit 36cbd23

committed
clear commit history
0 parents  commit 36cbd23

16 files changed, +761 -0 lines changed

.gitignore

+1
@@ -0,0 +1 @@
+qa/__pycache__/

CONTRIBUTORS.md

+16
@@ -0,0 +1,16 @@
+Thank you for your interest in contributing to this repository. To help maintain
+the quality of the codebase and ensure a quick review of your pull request, you
+should:
+   1. Write clear, clean code and format it in line with the style used in the
+      repository.
+   2. Leave comments, and use docstrings where appropriate.
+   3. Add unit tests for any new functionality you introduce, if a set of test cases
+      is already set up in the repository.
+   4. Use git commit messages to leave an informative trace of what additions and
+      changes were made.
+   5. Write an informative high-level description of the pull request, changes made,
+      and the reason for these changes before submitting the pull request.
+
+If you have not signed our Contributor License Agreement, you will be asked to
+sign one by our automated system when you submit your first pull request to
+a Cohere repository.

LICENSE

+21
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2022 Cohere Inc. and its affiliates.
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

+76
@@ -0,0 +1,76 @@
+```
+################################################################################
+#    ____      _                     ____                  _ _                 #
+#   / ___|___ | |__   ___ _ __ ___  / ___|  __ _ _ __   __| | |__   _____  __  #
+#  | |   / _ \| '_ \ / _ \ '__/ _ \ \___ \ / _` | '_ \ / _` | '_ \ / _ \ \/ /  #
+#  | |__| (_) | | | |  __/ | |  __/  ___) | (_| | | | | (_| | |_) | (_) >  <   #
+#   \____\___/|_| |_|\___|_|  \___| |____/ \__,_|_| |_|\__,_|_.__/ \___/_/\_\  #
+#                                                                              #
+# This project is part of Cohere Sandbox, Cohere's Experimental Open Source    #
+# offering. This project provides a library, tooling, or demo making use of    #
+# the Cohere Platform. You should expect (self-)documented, high quality code  #
+# but be warned that this is EXPERIMENTAL. Therefore, also expect rough edges, #
+# non-backwards compatible changes, or potential changes in functionality as   #
+# the library, tool, or demo evolves. Please consider referencing a specific   #
+# git commit or version if depending upon the project in any mission-critical  #
+# code as part of your own projects.                                           #
+#                                                                              #
+# Please don't hesitate to raise issues or submit pull requests, and thanks    #
+# for checking out this project!                                               #
+#                                                                              #
+################################################################################
+```
+
+**Maintainer:** [nickfrosst](https://github.com/nickfrosst) \
+**Project maintained until at least (YYYY-MM-DD):** 2023-01-01
+
+# Grounded Question Answering
+
+This is a Cohere API / Serp API powered contextualized factual question answering bot!
+
+It responds to questions in Discord or in the CLI by understanding the context, running
+a Google search for what it believes to be the appropriate question, finding relevant
+information on the Google result pages, and then answering the question based on
+what it found.
+
+## Motivation
+
+Language models are very good at creating sensible answers to complex questions. They are not, however, very good at creating truthful answers, because they have no mechanism for determining truth. They are trained on data from the web, and so pick up statistical correlations between words that make them OK at answering simple, static questions (things like "how far away is the moon from the earth", which has a single, unchanging factual answer). More nuanced questions, or questions whose factual answers change over time (things like "who is the prime minister of the UK"), are difficult or impossible for language models to answer.
+
+Google search, on the other hand, is very good at retrieving factual information about these time-sensitive questions. Google makes use of a consensus mechanism for determining truth: search results are heavily affected by human user behaviour. Which links people click, which links they stay on, and which ones they revisit all affect the ordering of the results. In this way, Google determines which links are truthful through user consensus. Google is, however, quite poor at responding to contextual questions and at responding to complex questions in natural language.
+
+This project attempts to join the best of both methods: it uses Cohere language models to contextualize the given question and create a natural language answer, and it uses Google search as a source of truth.
+
+## Installation and Demo Use
+
+To use this library, you will need:
+* A Serp API key, which you can obtain by registering at https://serpapi.com/users/welcome.
+* A Cohere API key: sign up for a free key at https://dashboard.cohere.ai/welcome/register.
+* (Optional) A Discord key, which is the Discord bot token obtained when creating and configuring a Discord bot. See [docs](https://discord.com/developers/docs/topics/oauth2) for more info.
+
+1. Clone the repository.
+2. Install all the dependencies:
+```sh
+pip install -r requirements.txt
+```
+3. Try the demo by running the CLI tool:
+```sh
+python3 cli_demo.py --cohere_api_key <API_KEY> --serp_api_key <API_KEY>
+```
+4. (Optional) Run the Discord bot demo:
+You can create a Discord bot with this functionality by creating a bot account with message read and write permissions at https://discord.com/developers, then running the following command:
+```sh
+python3 discord_bot.py --cohere_api_key <API_KEY> --serp_api_key <API_KEY> --discord_key <DISCORD_KEY>
+```
+
+# Get support
+If you have any questions or comments, please file an issue or reach out to us on [Discord](https://discord.gg/co-mmunity).
+
+# Contributors
+If you would like to contribute to this project, please read `CONTRIBUTORS.md`
+in this repository, and sign the Contributor License Agreement before submitting
+any pull requests. A link to sign the Cohere CLA will be generated the first time
+you make a pull request to a Cohere repository.
+
+# License
+Grounded Question Answering has an MIT license, as found in the LICENSE file.
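
The flow the README describes (contextualize the question, search, keep the relevant paragraphs, answer from them) can be sketched with injected stages. This is only an illustration under stated assumptions: `grounded_answer` and its four callable parameters are hypothetical names that do not appear in the repository; the actual steps live in `qa/bot.py` and `qa/answer.py`.

```python
from typing import Callable, List, Sequence


def grounded_answer(
    history: Sequence[str],
    rewrite_query: Callable[[str], str],
    search: Callable[[str], List[str]],
    rank: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, str], str],
    n_paragraphs: int = 2,
) -> str:
    """Run the four README steps with caller-supplied stages (all hypothetical)."""
    # 1. Turn the recent conversation into a standalone search query.
    query = rewrite_query("\n".join(history[-6:]))
    # 2. Fetch candidate paragraphs for that query.
    paragraphs = search(query)
    if not paragraphs:
        return "Sorry, I could not find any relevant information for that question."
    # 3. Keep the n most relevant paragraphs (rank returns most relevant first).
    context = "\n".join(rank(query, paragraphs)[:n_paragraphs])
    # 4. Answer the query using only the retrieved context.
    return generate(query, context)
```

Passing the stages in as callables keeps the sketch runnable without API keys; in the repository these roles are played by the Cohere client and Serp API calls.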

cli_demo.py

+27
@@ -0,0 +1,27 @@
+# Copyright (c) 2022 Cohere Inc. and its affiliates.
+#
+# Licensed under the MIT License (the "License");
+# you may not use this file except in compliance with the License.
+#
+# You may obtain a copy of the License in the LICENSE file at the top
+# level of this repository.
+
+# This is a CLI demo of the bot. You can run it and ask questions directly in your terminal.
+
+import argparse
+
+from qa.bot import GroundedQaBot
+
+parser = argparse.ArgumentParser(description="A grounded QA bot with cohere and google search")
+parser.add_argument("--cohere_api_key", type=str, help="api key for cohere", required=True)
+parser.add_argument("--serp_api_key", type=str, help="api key for serpAPI", required=True)
+parser.add_argument("--verbosity", type=int, default=0, help="verbosity level")
+args = parser.parse_args()
+
+bot = GroundedQaBot(args.cohere_api_key, args.serp_api_key)
+
+if __name__ == "__main__":
+    while True:
+        question = input("question: ")
+        reply = bot.answer(question, verbosity=args.verbosity, n_paragraphs=2)
+        print("answer: " + reply)
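
The loop in `cli_demo.py` runs until the process is interrupted, so Ctrl-D or Ctrl-C ends the session with a traceback. A minimal sketch of a loop that exits cleanly is below. It assumes only that `bot` exposes `answer(question, verbosity=..., n_paragraphs=...)` as `GroundedQaBot` does in this commit; `run_repl` and its `read`/`write` parameters are hypothetical names introduced for illustration and testing.

```python
from typing import Callable


def run_repl(bot, read: Callable[[str], str] = input,
             write: Callable[[str], None] = print, verbosity: int = 0) -> int:
    """Ask questions until EOF or Ctrl-C; return how many were answered."""
    answered = 0
    while True:
        try:
            question = read("question: ")
        except (EOFError, KeyboardInterrupt):
            write("bye")
            return answered
        if not question.strip():
            continue  # ignore empty input instead of spending API calls on it
        write("answer: " + bot.answer(question, verbosity=verbosity, n_paragraphs=2))
        answered += 1
```

Injecting `read` and `write` is a design choice for testability; called with the defaults, the function behaves like the terminal loop above.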

discord_bot.py

+73
@@ -0,0 +1,73 @@
+# Copyright (c) 2022 Cohere Inc. and its affiliates.
+#
+# Licensed under the MIT License (the "License");
+# you may not use this file except in compliance with the License.
+#
+# You may obtain a copy of the License in the LICENSE file at the top
+# level of this repository.
+
+# This is a demo Discord bot. You can make a Discord bot token by visiting https://discord.com/developers
+
+import argparse
+
+import discord
+from discord import Embed
+from discord.ext import commands
+
+from qa.bot import GroundedQaBot
+
+parser = argparse.ArgumentParser(description="A grounded QA bot with cohere and google search")
+parser.add_argument("--cohere_api_key", type=str, help="api key for cohere", required=True)
+parser.add_argument("--serp_api_key", type=str, help="api key for serpAPI", required=True)
+parser.add_argument("--discord_key", type=str, help="api key for discord bot", required=True)
+parser.add_argument("--verbosity", type=int, default=0, help="verbosity level")
+args = parser.parse_args()
+
+bot = GroundedQaBot(args.cohere_api_key, args.serp_api_key)
+
+
+class MyClient(discord.Client):
+
+    async def on_ready(self):
+        """Initializes bot"""
+        print(f"We have logged in as {self.user}")
+
+        for guild in self.guilds:
+            print(f"{self.user} is connected to the following guild:\n"
+                  f"{guild.name}(id: {guild.id})")
+
+    async def answer(self, message):
+        """Answers a question based on the context of the conversation and information from the web"""
+        history = []
+        async for historic_msg in message.channel.history(limit=6, before=message):
+            if historic_msg.content:
+                name = "user"
+                if historic_msg.author.name == self.user.name:
+                    name = "bot"
+                history = [f"{name}: {historic_msg.clean_content}"] + history
+
+        print(history)
+        bot.set_chat_history(history)
+
+        async with message.channel.typing():
+            reply = bot.answer(message.clean_content, verbosity=2, n_paragraphs=3)
+            response_msg = await message.channel.send(reply, reference=message)
+            await response_msg.edit(suppress=True)
+            return
+
+    async def on_message(self, message):
+        """Handles query messages triggered by direct messages to the bot"""
+        if isinstance(message.channel, discord.channel.DMChannel) and message.author != self.user:
+            await self.answer(message)
+
+    async def on_reaction_add(self, reaction, user):
+        """Handles query messages triggered by emoji from user."""
+        if user != self.user:
+            if str(reaction.emoji) == "❓" and reaction.count == 1:
+                await self.answer(reaction.message)
+
+
+if __name__ == "__main__":
+    intents = discord.Intents.all()
+    client = MyClient(intents=intents)
+    client.run(args.discord_key)
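
The history-building step in `MyClient.answer` prepends each message, which yields a chronological list if the channel iterator returns newest messages first (this is, as far as I know, discord.py's default when `after` is not given, but treat it as an assumption). The same labelling can be sketched as a pure function. `Msg` and `build_history` are hypothetical stand-ins, and the sketch filters on `clean_content` where the original checks `content`.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Msg:
    """Hypothetical stand-in for the two discord.Message fields used here."""
    author_name: str
    clean_content: str


def build_history(messages_newest_first: Iterable[Msg], bot_name: str) -> List[str]:
    """Label each message as "user" or "bot" and return them oldest first."""
    labelled = [
        f"{'bot' if m.author_name == bot_name else 'user'}: {m.clean_content}"
        for m in messages_newest_first
        if m.clean_content  # skip messages with no text, e.g. attachments only
    ]
    labelled.reverse()
    return labelled
```

Keeping this logic free of Discord objects makes it possible to check the ordering and labelling without a live bot token.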

qa/__init__.py

+7
@@ -0,0 +1,7 @@
+# Copyright (c) 2022 Cohere Inc. and its affiliates.
+#
+# Licensed under the MIT License (the "License");
+# you may not use this file except in compliance with the License.
+#
+# You may obtain a copy of the License in the LICENSE file at the top
+# level of this repository.

qa/answer.py

+85
@@ -0,0 +1,85 @@
+# Copyright (c) 2022 Cohere Inc. and its affiliates.
+#
+# Licensed under the MIT License (the "License");
+# you may not use this file except in compliance with the License.
+#
+# You may obtain a copy of the License in the LICENSE file at the top
+# level of this repository.
+
+import numpy as np
+
+from qa.model import get_sample_answer
+from qa.search import embedding_search, get_results_paragraphs_multi_process
+from qa.util import pretty_print
+
+
+def trim_stop_sequences(s, stop_sequences):
+    """Remove stop sequences found at the end of returned generated text."""
+
+    for stop_sequence in stop_sequences:
+        if s.endswith(stop_sequence):
+            return s[:-len(stop_sequence)]
+    return s
+
+
+def answer(question, context, co, model, chat_history=""):
+    """Answer a question given some context."""
+
+    prompt = ("This is an example of question answering based on a text passage:\n "
+              f"Context:-{context}\nQuestion:\n-{question}\nAnswer:\n-")
+    if chat_history:
+        prompt = ("This is an example of factual question answering chat bot. It "
+                  "takes the text context and answers related questions:\n "
+                  f"Context:-{context}\nChat Log\n{chat_history}\nbot:")
+
+    stop_sequences = ["\n"]
+
+    num_generations = 4
+    prompt = "".join(co.tokenize(text=prompt).token_strings[-1900:])
+    prediction = co.generate(model=model,
+                             prompt=prompt,
+                             max_tokens=100,
+                             temperature=0.3,
+                             stop_sequences=stop_sequences,
+                             num_generations=num_generations,
+                             return_likelihoods="GENERATION")
+    generations = [[
+        trim_stop_sequences(prediction.generations[i].text.strip(), stop_sequences),
+        prediction.generations[i].likelihood
+    ] for i in range(num_generations)]
+    generations = list(filter(lambda x: not x[0].isspace(), generations))
+    response = generations[np.argmax([g[1] for g in generations])][0]
+    return response.strip()
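
One detail of the selection step in `answer` is worth illustrating: `"".isspace()` is `False` in Python, so once each generation has been stripped, the `isspace` filter no longer removes empty candidates. The sketch below shows one possible tightening that filters on emptiness and falls back to an empty string when nothing usable remains. `pick_best` is a hypothetical helper that takes plain `(text, likelihood)` pairs rather than Cohere response objects.

```python
from typing import List, Sequence, Tuple


def trim_stop_sequences(text: str, stop_sequences: Sequence[str]) -> str:
    """Drop one trailing stop sequence, mirroring the helper in this file."""
    for stop in stop_sequences:
        if stop and text.endswith(stop):
            return text[:-len(stop)]
    return text


def pick_best(candidates: List[Tuple[str, float]],
              stop_sequences: Sequence[str] = ("\n",)) -> str:
    """Return the non-empty candidate with the highest likelihood, or "" if none."""
    cleaned = [(trim_stop_sequences(text, stop_sequences).strip(), score)
               for text, score in candidates]
    non_empty = [c for c in cleaned if c[0]]
    if not non_empty:
        return ""
    return max(non_empty, key=lambda c: c[1])[0]
```

Returning `""` for the all-empty case is a choice made for this sketch; a caller could instead retry the generation or surface an error.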
+
+
+def answer_with_search(question,
+                       co,
+                       serp_api_token,
+                       chat_history="",
+                       model="xlarge",
+                       embedding_model="small",
+                       url=None,
+                       n_paragraphs=1,
+                       verbosity=0):
+    """Generates completion based on search results."""
+
+    paragraphs, paragraph_sources = get_results_paragraphs_multi_process(question, serp_api_token, url=url)
+    if not paragraphs:
+        return ("", "", "")
+    sample_answer = get_sample_answer(question, co)
+
+    results = embedding_search(paragraphs, paragraph_sources, sample_answer, co, model=embedding_model)
+
+    if verbosity > 1:
+        pprint_results = "\n".join([r[0] for r in results])
+        pretty_print("OKGREEN", f"all search result context: {pprint_results}")
+
+    results = results[-n_paragraphs:]
+    context = "\n".join([r[0] for r in results])
+
+    if verbosity:
+        pretty_print("OKCYAN", "relevant result context: " + context)
+
+    response = answer(question, context, co, chat_history=chat_history, model=model)
+
+    return (response, [r[1] for r in results], [r[0] for r in results])
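
`qa/search.py` is not shown in this part of the commit, so how `embedding_search` orders its results is not confirmed here. The `results[-n_paragraphs:]` slice suggests the list is sorted from least to most relevant. A sketch of that idea, assuming cosine similarity over precomputed embedding vectors, is below. `cosine` and `rank_by_similarity` are hypothetical names, and the vectors would come from an embedding model in practice.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(paragraphs: Sequence[str], sources: Sequence[str],
                       vectors: Sequence[Sequence[float]],
                       query_vector: Sequence[float]) -> List[Tuple[str, str]]:
    """Return (paragraph, source) pairs sorted from least to most similar."""
    scored = sorted(zip(paragraphs, sources, vectors),
                    key=lambda item: cosine(item[2], query_vector))
    return [(p, s) for p, s, _ in scored]
```

With this ordering, `rank_by_similarity(...)[-n:]` keeps the `n` closest paragraphs, matching the slice used in `answer_with_search`.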

qa/bot.py

+58
@@ -0,0 +1,58 @@
+# Copyright (c) 2022 Cohere Inc. and its affiliates.
+#
+# Licensed under the MIT License (the "License");
+# you may not use this file except in compliance with the License.
+#
+# You may obtain a copy of the License in the LICENSE file at the top
+# level of this repository.
+
+from sys import settrace
+
+import cohere
+
+from qa.answer import answer_with_search
+from qa.model import get_contextual_search_query
+from qa.util import pretty_print
+
+
+class GroundedQaBot():
+    """A class yielding Grounded question-answering conversational agents."""
+
+    def __init__(self, cohere_api_key, serp_api_key):
+        self._cohere_api_key = cohere_api_key
+        self._serp_api_key = serp_api_key
+        self._chat_history = []
+        self._co = cohere.Client(self._cohere_api_key)
+
+    @property
+    def chat_history(self):
+        return self._chat_history
+
+    def set_chat_history(self, chat_history):
+        self._chat_history = chat_history
+
+    def answer(self, question, verbosity=0, n_paragraphs=1):
+        """Answer a question, based on recent conversational history."""
+
+        self.chat_history.append("user: " + question)
+
+        history = "\n".join(self.chat_history[-6:])
+        question = get_contextual_search_query(history, self._co, verbosity=verbosity)
+
+        answer_text, source_urls, source_texts = answer_with_search(question,
+                                                                    self._co,
+                                                                    self._serp_api_key,
+                                                                    verbosity=verbosity,
+                                                                    n_paragraphs=n_paragraphs)
+
+        self._chat_history.append("bot: " + answer_text)
+
+        if not source_texts or "".join(source_texts).isspace():
+            reply = "Sorry, I could not find any relevant information for that question."
+        elif answer_text.strip() == question.strip():
+            reply = "I had trouble answering the question, but maybe this link on the right will help."
+        else:
+            sources_str = "\n".join(list(set(source_urls)))
+            reply = f"{answer_text}\nSource:\n{sources_str}"
+
+        return reply
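
The reply selection at the end of `GroundedQaBot.answer` has three cases: no usable context, an answer that merely echoes the question, and a normal answer with sources. A standalone sketch of that branching is below. `build_reply` is a hypothetical helper, and it differs from the committed code in two labelled ways: it treats an empty joined context as "no information" (the original `isspace()` check does not), and it deduplicates URLs with `dict.fromkeys` so the source order is stable, whereas `set()` gives no ordering guarantee.

```python
from typing import Sequence


def build_reply(question: str, answer_text: str, source_urls: Sequence[str],
                source_texts: Sequence[str]) -> str:
    """Pick the user-facing reply from the three cases described above."""
    # Case 1: nothing was retrieved, or only whitespace/empty paragraphs.
    if not source_texts or not "".join(source_texts).strip():
        return "Sorry, I could not find any relevant information for that question."
    # Case 2: the model just repeated the (rewritten) question.
    if answer_text.strip() == question.strip():
        return "I had trouble answering the question, but maybe this link on the right will help."
    # Case 3: a real answer; list each source URL once, in first-seen order.
    sources = "\n".join(dict.fromkeys(source_urls))
    return f"{answer_text}\nSource:\n{sources}"
```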
