Commit 78f9ad3 (parent e106850) by Anton Sauchyk: docs: add readme/tutorial
2 files changed: +341 -7

README.md (337 additions, 0 deletions)
## Introduction to Geyser plugin

There are multiple ways to listen to updates in the Solana blockchain. The default method is to use WebSocket subscriptions, but they have certain limitations.

`blockSubscribe` is considered unstable and may not be supported by all RPC providers. Moreover, this method doesn't support the `processed` commitment level: it waits for complete blocks to be confirmed or finalized before sending data.

`logsSubscribe` is faster than `blockSubscribe` since it streams logs in real time as transactions are processed. Its limitation is that it truncates log messages that exceed size limits (typically around 10KB). If you need complete transaction data, you still have to use `blockSubscribe` or send additional RPC calls.

Luckily, Solana nodes support a plugin mechanism: the Geyser plugin interface allows RPC providers to stream information about accounts, slots, blocks, and transactions to external data stores.

There are Geyser plugins for data storage (e.g., PostgreSQL, Google Bigtable), real-time streaming (gRPC, WebSocket plugins), and [other purposes](https://github.com/rpcpool/solana-geyser-park).
## Yellowstone gRPC plugin implementation

In this tutorial, we focus on the gRPC plugin, which uses the [gRPC](https://grpc.io/docs/what-is-grpc/introduction/) protocol as a channel for transmitting Solana updates. Its most popular implementation comes from the Triton One team as part of Project Yellowstone, hence the name Yellowstone gRPC (or Geyser gRPC).

By default, gRPC uses [protocol buffers](https://developers.google.com/protocol-buffers) as the Interface Definition Language (IDL) for describing both the service interface and the structure of the messages. For a user of the Yellowstone gRPC plugin, this means there are special proto files from which client code is generated and imported into other modules.

Those [proto files](https://github.com/rpcpool/yellowstone-grpc/tree/master/yellowstone-grpc-proto/proto) define the service interface (which RPC methods are available), message structures (request/response formats), data types, and enums. These definitions support code generation for all popular programming languages, including Python, Go, Java, C++, JavaScript/Node.js, Rust, C#, and many others. The examples below show the service interface and message structures defined in the proto files.
```protobuf
// Service interface definition
service Geyser {
  rpc Subscribe(stream SubscribeRequest) returns (stream SubscribeUpdate);
  rpc Ping(PingRequest) returns (PongResponse);
  rpc GetLatestBlockhash(GetLatestBlockhashRequest) returns (GetLatestBlockhashResponse);
}

// Message structures definition
message SubscribeRequest {
  map<string, SubscribeRequestFilterAccounts> accounts = 1;
  map<string, SubscribeRequestFilterSlots> slots = 2;
  CommitmentLevel commitment = 4;
  bool ping = 6;
}

// Data types and enums definition
enum CommitmentLevel {
  PROCESSED = 0;
  CONFIRMED = 1;
  FINALIZED = 2;
}

message SubscribeRequestFilterAccounts {
  repeated string account = 1;
  repeated string owner = 2;
}
```

Client code is generated from the proto files using the protoc compiler (part of [grpc tools](https://grpc.io/)). The compiler generates service stubs (client interfaces), message classes for data structures, and type definitions specific to the target programming language.

```bash
# Generate Python client code
python -m grpc_tools.protoc \
    --python_out=./generated \
    --grpc_python_out=./generated \
    --proto_path=./proto \
    geyser.proto solana-storage.proto
```

The generated client code handles all the low-level complexity: serialization and deserialization between the binary protobuf format and language-specific objects, network communication over HTTP/2, type safety enforcement, and message validation.
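To get a feel for what that serialization does under the hood, here is a hand-rolled sketch of the protobuf wire format for a single field. This is for illustration only; in practice the generated message classes do all of this for you.

```python
def encode_varint(value: int) -> bytes:
    """Encode an unsigned integer as a protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode a varint-typed field: a tag (field number + wire type), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)

# In SubscribeRequest, field 4 is `CommitmentLevel commitment`;
# with CONFIRMED = 1, the whole encoded field is just two bytes.
print(encode_varint_field(4, 1).hex())  # → "2001"
```

This compactness is one reason gRPC streams are faster than JSON-based WebSocket feeds.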

## Practice: listening to new tokens minted on pump.fun

Let's build a real-time token monitor that detects new pump.fun tokens the moment they're created. We'll construct it step by step.

For package management, we will use the fast and modern [uv](https://docs.astral.sh/uv/getting-started/installation/) tool.

### Step 1: Project setup and dependencies

First, install the required packages:

```bash
uv init
uv add grpcio grpcio-tools base58 solders python-dotenv
```
**Why these specific packages?**

- `grpcio`: the core gRPC library for Python.
- `grpcio-tools`: contains the protoc compiler for generating Python code from `.proto` files.
- `base58`: Solana addresses use Base58 encoding, not standard Base64.
- `solders`: a Rust-based Solana library for Python, much faster than `solana-py`.
- `python-dotenv`: manages environment variables safely.
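To see why Base58 matters, here is a minimal, dependency-free sketch of decoding a Solana address. In real code you would use the `base58` package; this just shows that an address is a Base58-encoded 32-byte public key.

```python
# Bitcoin-style Base58 alphabet (no 0, O, I, l to avoid ambiguity)
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58decode(s: str) -> bytes:
    """Decode a Base58 string into bytes (illustrative re-implementation)."""
    num = 0
    for char in s:
        num = num * 58 + ALPHABET.index(char)
    # Each leading '1' encodes a leading zero byte
    pad = len(s) - len(s.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big") if num else b""
    return b"\x00" * pad + body

# pump.fun's program address decodes to exactly 32 bytes: an ed25519 public key
PUMP_PROGRAM = "6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P"
print(len(b58decode(PUMP_PROGRAM)))  # → 32
```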

### Step 2: Generate gRPC client code

Download the official Yellowstone proto files and generate Python client code:

```bash
# Create folders for the downloaded files and the generated code
mkdir proto
mkdir generated

# Download proto files
curl -o proto/geyser.proto https://raw.githubusercontent.com/rpcpool/yellowstone-grpc/master/yellowstone-grpc-proto/proto/geyser.proto

curl -o proto/solana-storage.proto https://raw.githubusercontent.com/rpcpool/yellowstone-grpc/master/yellowstone-grpc-proto/proto/solana-storage.proto

# Generate Python client code with uv
uv run --with grpcio-tools -- python -m grpc_tools.protoc \
    --python_out=./generated \
    --grpc_python_out=./generated \
    --proto_path=./proto \
    geyser.proto solana-storage.proto
```

**What happens here?** The protocol buffer compiler (protoc) reads the `.proto` files (which define the gRPC service interface) and generates Python classes that handle all the networking, serialization, and type safety automatically.

protoc generates the code with absolute imports. Since we placed the generated files in the `generated` folder, we need to fix the imports to avoid errors in the following steps:

1. In `geyser_pb2.py`:
   - **Change from:** `import solana_storage_pb2 as solana__storage__pb2`
   - **Change to:** `from . import solana_storage_pb2 as solana__storage__pb2`

   - **Change from:** `from solana_storage_pb2 import *`
   - **Change to:** `from .solana_storage_pb2 import *`

2. In `geyser_pb2_grpc.py`:
   - **Change from:** `import geyser_pb2 as geyser__pb2`
   - **Change to:** `from . import geyser_pb2 as geyser__pb2`
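If you regenerate the stubs often, the manual edits above can be scripted. A small sketch (`fix_imports` is a hypothetical helper, not part of any library; the filenames match the ones protoc emits):

```python
import re
from pathlib import Path

def fix_imports(path: Path) -> None:
    """Rewrite protoc's absolute imports as relative package imports."""
    text = path.read_text()
    # `import geyser_pb2 as ...` -> `from . import geyser_pb2 as ...`
    text = re.sub(
        r"^import (geyser_pb2|solana_storage_pb2) as",
        r"from . import \1 as",
        text,
        flags=re.MULTILINE,
    )
    # `from solana_storage_pb2 import *` -> `from .solana_storage_pb2 import *`
    text = text.replace(
        "from solana_storage_pb2 import *", "from .solana_storage_pb2 import *"
    )
    path.write_text(text)

for name in ("geyser_pb2.py", "geyser_pb2_grpc.py"):
    target = Path("generated") / name
    if target.exists():  # run from the project root after codegen
        fix_imports(target)
```

An alternative is to add an (empty) `generated/__init__.py` and keep the folder on `sys.path`, but rewriting to relative imports keeps the package self-contained.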

### Step 3: Environment configuration

Create a `.env` file with your Geyser credentials. If you use Chainstack, enable the Yellowstone gRPC Geyser plugin, and the credentials will appear on the overview page of your Solana node.

```env
GEYSER_ENDPOINT=your_provider_endpoint
GEYSER_API_TOKEN=your_api_token
```
### Step 4: Core constants and setup

```python
import asyncio
import os
import struct

import base58
import grpc
from dotenv import load_dotenv
from solders.pubkey import Pubkey

from generated import geyser_pb2, geyser_pb2_grpc

load_dotenv()

GEYSER_ENDPOINT = os.getenv("GEYSER_ENDPOINT")
GEYSER_API_TOKEN = os.getenv("GEYSER_API_TOKEN")
AUTH_TYPE = "x-token"  # or "basic"

PUMP_PROGRAM_ID = Pubkey.from_string("6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P")
PUMP_CREATE_PREFIX = struct.pack("<Q", 8576854823835016728)
```

**Why these specific values?**

- `PUMP_PROGRAM_ID`: pump.fun's program address on Solana mainnet.
- `PUMP_CREATE_PREFIX`: the 8-byte discriminator for pump.fun's "create" instruction. Every instruction type has a unique discriminator derived from a hash of its name.

**How discriminators work:** Solana programs use the first 8 bytes of instruction data to identify which function to call. A more detailed example is available [here](https://github.com/chainstacklabs/pump-fun-bot/tree/main/learning-examples).
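For Anchor-based programs such as pump.fun, the discriminator is the first 8 bytes of `sha256("global:<instruction_name>")`. A quick sketch showing where the magic constant above comes from:

```python
import hashlib
import struct

def anchor_discriminator(name: str) -> bytes:
    """First 8 bytes of sha256("global:<name>"), as Anchor computes it."""
    return hashlib.sha256(f"global:{name}".encode()).digest()[:8]

# The constant from Step 4 is the same 8 bytes packed as a little-endian u64
PUMP_CREATE_PREFIX = struct.pack("<Q", 8576854823835016728)
print(anchor_discriminator("create") == PUMP_CREATE_PREFIX)  # → True
```

Packing the discriminator as a `<Q` integer constant is just a compact way to embed those 8 bytes in the source.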

### Step 5: gRPC connection management

```python
async def create_geyser_connection():
    """Establish a secure connection to the Geyser endpoint."""
    if AUTH_TYPE == "x-token":
        auth = grpc.metadata_call_credentials(
            lambda _, callback: callback((("x-token", GEYSER_API_TOKEN),), None)
        )
    else:  # Basic authentication
        auth = grpc.metadata_call_credentials(
            lambda _, callback: callback(
                (("authorization", f"Basic {GEYSER_API_TOKEN}"),), None
            )
        )

    creds = grpc.composite_channel_credentials(grpc.ssl_channel_credentials(), auth)

    # Keepalive options maintain the connection and help prevent it
    # from being closed during periods of inactivity
    keepalive_options = [
        ("grpc.keepalive_time_ms", 30000),
        ("grpc.keepalive_timeout_ms", 10000),
        ("grpc.keepalive_permit_without_calls", True),
        ("grpc.http2.min_time_between_pings_ms", 10000),
    ]

    channel = grpc.aio.secure_channel(GEYSER_ENDPOINT, creds, options=keepalive_options)
    return geyser_pb2_grpc.GeyserStub(channel)
```

### Step 6: Subscription configuration

```python
def create_subscription_request():
    """Create a subscription request for pump.fun transactions."""
    request = geyser_pb2.SubscribeRequest()
    request.transactions["pump_filter"].account_include.append(str(PUMP_PROGRAM_ID))
    request.transactions["pump_filter"].failed = False
    request.commitment = geyser_pb2.CommitmentLevel.PROCESSED
    return request
```

**Understanding the filter configuration:**

- `account_include`: only stream transactions that interact with the pump.fun program.
- `failed = False`: exclude failed transactions to reduce noise.
- `PROCESSED` commitment: get updates as soon as transactions are processed (the fastest option).
### Step 7: Instruction data decoding

This is where we extract meaningful data from raw blockchain bytes:

```python
def decode_create_instruction(ix_data: bytes, keys, accounts) -> dict:
    """Decode a create instruction from transaction data."""
    offset = 8  # Skip the 8-byte discriminator

    def get_account_key(index):
        """Extract an account public key by index."""
        if index >= len(accounts):
            return "N/A"
        account_index = accounts[index]
        return base58.b58encode(keys[account_index]).decode()

    def read_string():
        """Read a length-prefixed string from instruction data."""
        nonlocal offset
        length = struct.unpack_from("<I", ix_data, offset)[0]  # Read 4-byte length
        offset += 4
        value = ix_data[offset:offset + length].decode()  # Read string data
        offset += length
        return value

    def read_pubkey():
        """Read a 32-byte public key from instruction data."""
        nonlocal offset
        value = base58.b58encode(ix_data[offset:offset + 32]).decode()
        offset += 32
        return value

    # Parse instruction data according to pump.fun's create schema
    name = read_string()
    symbol = read_string()
    uri = read_string()
    creator = read_pubkey()

    return {
        "name": name,
        "symbol": symbol,
        "uri": uri,
        "creator": creator,
        "mint": get_account_key(0),  # New token mint address
        "bonding_curve": get_account_key(2),  # Price discovery mechanism
        "associated_bonding_curve": get_account_key(3),  # Token account for curve
        "user": get_account_key(7),  # Transaction signer
    }
```

**Understanding the data structure:**

- **Length-prefixed strings**: Solana stores strings as `[4-byte little-endian length][string data]`.
- **Account indices**: instructions reference accounts by index to save space.
- **Fixed positions**: pump.fun's create instruction always puts specific accounts at predictable indices.

**Why this parsing approach?** Instruction data is just raw bytes. We need to know the exact data layout (schema) to extract meaningful information. This schema knowledge comes from analyzing [pump.fun's IDL files](https://github.com/pump-fun/pump-public-docs/tree/main/idl).
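To see the byte layout concretely, here is a self-contained roundtrip: we build a synthetic create-instruction payload with the same schema (the token name, symbol, and URI are made-up values) and decode it with the same logic as `read_string`/`read_pubkey` above:

```python
import struct

def encode_string(s: str) -> bytes:
    """Length-prefixed string: 4-byte little-endian length + UTF-8 bytes."""
    raw = s.encode()
    return struct.pack("<I", len(raw)) + raw

# Build a synthetic create-instruction payload (discriminator + fields)
payload = (
    b"\x00" * 8                                       # stand-in 8-byte discriminator
    + encode_string("My Token")                       # name
    + encode_string("MTK")                            # symbol
    + encode_string("https://example.com/meta.json")  # uri
    + bytes(range(32))                                # stand-in 32-byte creator pubkey
)

# Decode it back, mirroring read_string/read_pubkey from Step 7
offset = 8
fields = []
for _ in range(3):
    (length,) = struct.unpack_from("<I", payload, offset)
    offset += 4
    fields.append(payload[offset:offset + length].decode())
    offset += length
creator = payload[offset:offset + 32]

print(fields)  # → ['My Token', 'MTK', 'https://example.com/meta.json']
```

The real decoder only differs in that it Base58-encodes the pubkey bytes and pulls the remaining fields out of the account list.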

### Step 8: Main monitoring loop

```python
async def monitor_pump():
    """Monitor the Solana blockchain for new pump.fun token creations."""
    print(f"Starting pump.fun token monitor using {AUTH_TYPE.upper()} authentication")
    stub = await create_geyser_connection()
    request = create_subscription_request()

    async for update in stub.Subscribe(iter([request])):
        # Only process transaction updates
        if not update.HasField("transaction"):
            continue

        tx = update.transaction.transaction.transaction
        msg = getattr(tx, "message", None)
        if msg is None:
            continue

        # Check each instruction in the transaction
        for ix in msg.instructions:
            # Quick check: is this a pump.fun create instruction?
            if not ix.data.startswith(PUMP_CREATE_PREFIX):
                continue

            # Decode and display token information
            info = decode_create_instruction(ix.data, msg.account_keys, ix.accounts)
            signature = base58.b58encode(bytes(update.transaction.transaction.signature)).decode()
            print_token_info(info, signature)
```

**Why check the discriminator first?** This 8-byte comparison is extremely fast and eliminates the vast majority of irrelevant instructions before any parsing is attempted.

### Step 9: Running the monitor

```python
if __name__ == "__main__":
    asyncio.run(monitor_pump())
```

To run the code, execute the following command in your terminal:

```bash
uv run main.py
```
## Other learning examples

Here are short summaries of each learning example file, from basic to advanced, for each group of filters.

### Account filters

- **`accounts_basic_filter.py`**: demonstrates how to subscribe to updates for a single, specific account. It's the simplest way to monitor changes to a particular address (e.g., reserve changes in a bonding curve account or AMM pool).
- **`accounts_advanced_filter.py`**: shows a more advanced use case: subscribing to accounts based on the program that owns them and a data discriminator. This is useful for tracking all accounts of a certain type created by a specific program (e.g., reserve changes in all active bonding curves or certain AMM pools).

### Transaction filters

- **`transactions_basic_filter.py`**: shows how to subscribe to all successful transactions that interact with a specific account. It's a good starting point for tracking activity related to a particular address.
- **`transactions_advanced_filter.py`**: demonstrates more complex filtering for transactions containing all required programs (e.g., for detecting arbitrage transactions).

### Other subscriptions

- **`slots_subscription.py`**: shows how to subscribe to slot updates, giving you a real-time feed of when new slots are processed by the validator.
- **`blocks_subscription.py`**: demonstrates how to subscribe to entire blocks that contain transactions interacting with a specific account.
- **`blocks_meta_subscription.py`**: shows how to subscribe to just the metadata of blocks, which is a lightweight way to track block production.
- **`entries_subscription.py`**: demonstrates how to subscribe to ledger entries, which provides a low-level stream of the changes being written to the Solana ledger.
- **`transaction_statuses_subscription.py`**: shows how to subscribe to the status of transactions, allowing you to track them from processing to finalization.

### Advanced examples

- **`meteora_dlmm_monitor.py`**: a more complex, real-world example that monitors a Meteora Dynamic Liquidity Market Maker (DLMM). It demonstrates how to decode complex account data and track price changes in a DLMM pool.

learning-examples/transactions_advanced_filter.py (4 additions, 7 deletions)

```diff
@@ -15,9 +15,8 @@
 GEYSER_ENDPOINT = os.getenv("GEYSER_ENDPOINT")
 GEYSER_API_TOKEN = os.getenv("GEYSER_API_TOKEN")

-# This is the Raydium (WSOL-RAY) market address.
-# We will subscribe to updates for this specific account.
-RAYDIUM_MARKET_ADDRESS = "2AXXcN6oN9bBT5owwmTH53C7QHUXvhLeu718Kqt8rvY2"
+WHIRLPOOL_PROGRAM_ID = "whirLbMiicVdio4qvUfM5KAg6Ct8VwpYzGff3uctyCc"
+RAYDIUM_CONCENTRATED_LIQ_PROGRAM_ID = "CAMMCzo5YL8w4VFF8KVHrK22GGUsp5VTaW7grrKgrWqK"


 async def main():
@@ -45,11 +44,9 @@ async def main():
             # Otherwise fields works as logical AND and values in arrays as logical OR.
             vote=False,  # Exclude vote transactions
             failed=False,  # Exclude failed transactions
-            account_include=[
-                RAYDIUM_MARKET_ADDRESS
-            ],  # Use this to ensure that ANY account is included
+            account_include=[],  # Use this to ensure that ANY account is included
             account_exclude=[],  # Use this to exclude specific accounts
-            account_required=[],  # Use this to ensure ALL accounts are included
+            account_required=[WHIRLPOOL_PROGRAM_ID, RAYDIUM_CONCENTRATED_LIQ_PROGRAM_ID],  # Use this to ensure ALL accounts in the array are included
         )
     },
     commitment=geyser_pb2.CommitmentLevel.PROCESSED,
```