This repository was archived by the owner on Apr 14, 2024. It is now read-only.

Commit 4cde3c0

committed
Added a minimal documentation, including a quick usage guide
1 parent 78e21e7 commit 4cde3c0

File tree

5 files changed (+197 −10 lines)

doc/source/api.rst

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-.. _api_reference_toplevel:
+.. _api-label:
 
 #############
 API Reference

doc/source/conf.py

Lines changed: 5 additions & 6 deletions
@@ -45,9 +45,9 @@
 # built documents.
 #
 # The short X.Y version.
-version = '1.0'
+version = '0.5'
 # The full version, including alpha/beta/rc tags.
-release = '1.0.0'
+release = '0.5.0'
 
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -93,10 +93,9 @@
 # Sphinx are currently 'default' and 'sphinxdoc'.
 html_theme = 'default'
 
-# Theme options are theme-specific and customize the look and feel of a theme
-# further. For a list of options available for each theme, see the
-# documentation.
-#html_theme_options = {}
+html_theme_options = {
+    "stickysidebar": "true"
+}
 
 # Add any paths that contain custom themes here, relative to this directory.
 #html_theme_path = []

doc/source/intro.rst

Lines changed: 25 additions & 0 deletions
@@ -2,3 +2,28 @@
 Overview
 ########
 
+The *GitDB* project implements interfaces that allow read and write access to git repositories. At its core lies the *db* package, which contains all database types necessary to read a complete git repository. These are the ``LooseObjectDB``, the ``PackedDB`` and the ``ReferenceDB``, which are combined into the ``GitDB`` to cover every aspect of the git database.
+
+For this to work, GitDB implements pack reading, as well as loose object reading and writing. Data is always encapsulated in streams, which allows huge files to be handled as well as small ones; usually only chunks of the stream are kept in memory for processing, never the whole stream at once.
+
+Interfaces are used to describe the API, making it easy to provide alternate implementations.
+
+================
+Installing GitDB
+================
+It is easiest to install gitdb using the *easy_install* program, which is part of `setuptools`_::
+
+    $ easy_install gitdb
+
+As the command will install gitdb in your respective python distribution, you will most likely need root permissions to authorize the required changes.
+
+If you have downloaded the source archive, the package can be installed by running the ``setup.py`` script::
+
+    $ python setup.py install
+
+===============
+Getting Started
+===============
+It is advised to have a look at the :ref:`Usage Guide <tutorial-label>` for a brief introduction to the different database implementations.
+
+.. _setuptools: http://peak.telecommunity.com/DevCenter/setuptools

doc/source/tutorial.rst

Lines changed: 113 additions & 3 deletions
@@ -1,4 +1,114 @@
+.. _tutorial-label:
 
-########
-Tutorial
-########
+###########
+Usage Guide
+###########
+This text briefly introduces you to the basic design decisions and the accompanying types.
+
+******
+Design
+******
+The *GitDB* project models a standard git object database and implements it in pure python. This means that data, classified as one of four types, can be stored in the database and will from then on be referred to by the generated SHA1 key, which is a 20 byte string within python.
+
+*GitDB* implements *RW* access to loose objects, as well as *RO* access to packed objects. Compound databases allow combining multiple object databases into one.
+
+All data is read and written using streams, which effectively prevents more than a chunk of the data from being kept in memory at once, with few exceptions [#]_.
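The chunk-at-a-time behaviour described above can be sketched with stdlib streams alone. This is illustrative only, not GitDB code; the function name and chunk size are made up:

```python
import hashlib
from io import BytesIO

def copy_in_chunks(source, sink, chunk_size=8192):
    """Process a stream chunk by chunk, so at most chunk_size bytes
    of payload sit in memory at any time."""
    digest = hashlib.sha1()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        sink.write(chunk)
    return digest.digest()

source = BytesIO(b"x" * 100_000)   # stands in for a huge object stream
sink = BytesIO()
binsha = copy_in_chunks(source, sink, chunk_size=4096)
assert sink.getvalue() == b"x" * 100_000
assert len(binsha) == 20           # sha1 digests are always 20 bytes
```

The same loop shape works for any file-like object, which is why a stream interface handles huge and small objects alike.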
+
+*******
+Streams
+*******
+In order to assure the object database can handle objects of any size, a stream interface is used for data retrieval as well as for filling data into the database.
+
+Basic Stream Types
+==================
+There are two fundamentally different types of streams: **IStream**\ s and **OStream**\ s. IStreams are mutable and are used to provide data streams to the database in order to create new objects.
+
+OStreams are immutable and are used to read data from the database. The base of this type, **OInfo**, carries only type and size information of the queried object, but no stream, which makes it slightly faster to retrieve depending on the database.
+
+OStreams are tuples, IStreams are lists. Both OInfo and OStream have the same member ordering, which allows quick conversion from one type to another.
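Why the shared member ordering matters can be sketched with hypothetical stand-ins (these are not GitDB's actual classes, and the field order here is invented for illustration):

```python
from collections import namedtuple

# Immutable info record: a plain tuple type, like OInfo.
OInfo = namedtuple("OInfo", ["binsha", "type", "size"])

class IStream(list):
    """Mutable list-based stream: the database can fill in the sha later."""
    def __init__(self, type, size, stream):
        # binsha is unknown until the object has been stored
        super().__init__([None, type, size, stream])

istream = IStream("blob", 7, None)
istream[0] = b"\x00" * 20        # the database sets the 20 byte sha in place
oinfo = OInfo(*istream[:3])      # shared ordering makes conversion a cheap slice
assert oinfo.type == "blob"
assert oinfo.size == 7
```

A tuple cannot be mutated after creation, which is exactly the property wanted for read results, while the list-based input type must be writable so the generated sha can be attached.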
+
+****************************
+Data Query and Data Addition
+****************************
+Databases support query and/or addition of objects using simple interfaces. They are called **ObjectDBR** for read-only access, and **ObjectDBW** for write access to create new objects.
+
+Both have two sets of methods: one allows interacting with single objects, the other handles a stream of objects simultaneously and asynchronously.
+
+Acquiring information about an object from a database is easy if you have a SHA1 to refer to the object::
+
+    ldb = LooseObjectDB(fixture_path("../../.git/objects"))
+    
+    for sha1 in ldb.sha_iter():
+        oinfo = ldb.info(sha1)
+        ostream = ldb.stream(sha1)
+        assert oinfo[:3] == ostream[:3]
+        
+        assert len(ostream.read()) == ostream.size
+    # END for each sha in database
+
+To store information, you prepare an *IStream* object with the required information. The provided stream will be read and converted into an object, and the respective 20 byte SHA1 identifier is stored in the IStream object::
+
+    data = "my data"
+    istream = IStream("blob", len(data), StringIO(data))
+    
+    # the object does not yet have a sha
+    assert istream.binsha is None
+    ldb.store(istream)
+    # now the sha is set
+    assert len(istream.binsha) == 20
+    assert ldb.has_object(istream.binsha)
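The 20 byte identifier filled in by the store operation follows git's object hashing scheme, which hashes a small type-and-size header together with the content. A sketch using only ``hashlib`` (this shows git's format, independent of GitDB's implementation):

```python
import hashlib

def git_blob_binsha(data: bytes) -> bytes:
    """git hashes "<type> <size>\\0<content>"; the binary digest is 20 bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).digest()

binsha = git_blob_binsha(b"my data")
assert len(binsha) == 20
# the key is deterministic: the same content always yields the same sha
assert binsha == git_blob_binsha(b"my data")
```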
+
+**********************
+Asynchronous Operation
+**********************
+For each read or write method that handles a single object, an *_async* version exists which reads the items to be processed from a channel, and writes the operation's results into an output channel that is read by the caller, or by other async methods to support chaining.
+
+Using asynchronous operations is easy, but chaining multiple operations together to form a complex one requires reading the docs of the *async* package. At the current time, due to the *GIL*, *GitDB* can only achieve true concurrency during zlib compression and decompression of big objects, and only if the respective C modules of *async* were compiled.
+
+Asynchronous operations are scheduled by a *ThreadPool* which resides in the *gitdb.util* module::
+
+    from gitdb.util import pool
+    
+    # set the pool to use two threads
+    pool.set_size(2)
+    
+    # a size of zero switches to synchronous mode of operation
+    pool.set_size(0)
+
+Use async methods with readers which supply the items to be processed. The results are given through readers as well::
+
+    from async import IteratorReader
+    
+    # Create a reader from an iterator
+    reader = IteratorReader(ldb.sha_iter())
+    
+    # get a reader for object streams
+    info_reader = ldb.stream_async(reader)
+    
+    # read one
+    info = info_reader.read(1)[0]
+    
+    # read all the rest until depletion
+    ostreams = info_reader.read()
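The channel-based pattern used here, items flowing in through one channel and results flowing out through another, can be sketched with the stdlib alone. These are hypothetical stand-ins, not the *async* package's API:

```python
import queue
import threading

def iterator_reader(items):
    """Feed items into a channel; a plain Queue stands in for a reader."""
    chan = queue.Queue()
    for item in items:
        chan.put(item)
    chan.put(None)  # sentinel: the channel is depleted
    return chan

def stream_async(in_chan, transform):
    """Read items from in_chan, write transformed results to an output channel."""
    out_chan = queue.Queue()
    def worker():
        while (item := in_chan.get()) is not None:
            out_chan.put(transform(item))
        out_chan.put(None)
    threading.Thread(target=worker, daemon=True).start()
    return out_chan

reader = iterator_reader([b"a", b"b", b"c"])
results = stream_async(reader, lambda sha: sha.upper())
out = []
while (r := results.get()) is not None:
    out.append(r)
assert out == [b"A", b"B", b"C"]
```

Because the output of one stage is itself a channel, a second ``stream_async`` call can consume it directly, which is the chaining the text refers to.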
+
+
+*********
+Databases
+*********
+A database implements different interfaces, one of which will always be the *ObjectDBR* interface to support reading of object information and streams.
+
+The *Loose Object Database* as well as the *Packed Object Database* are *File Databases*, hence they operate on a directory which contains the files they can read.
+
+File databases implementing the *ObjectDBW* interface can also be forced to write their output into a specified stream, using the ``set_ostream`` method. This effectively allows you to redirect their output anywhere you like.
+
+*Compound Databases* do not implement their own access type, but instead combine multiple database implementations into one. Examples of this database type are the *Reference Database*, which reads object locations from a file, and the *GitDB* database, which combines loose, packed and referenced objects into one database interface.
+
+For more information about the individual database types, please see the :ref:`API Reference <api-label>`, and the unittests for the respective types.
+
+
+----
+
+.. [#] When reading streams from packs, all deltas are currently applied and the result written into a memory map before the first byte is returned. Future versions of the delta-apply algorithm might improve on this.
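A compound lookup amounts to trying each sub-database in order until one has the object. A minimal sketch with invented class names, not GitDB's actual implementation:

```python
class MemoryDB:
    """Hypothetical minimal ObjectDBR-style backend backed by a dict."""
    def __init__(self, objects):
        self._objects = objects
    def has_object(self, sha):
        return sha in self._objects
    def stream(self, sha):
        return self._objects[sha]

class CompoundDB:
    """Queries each sub-database in order, the way GitDB layers
    loose, packed and referenced objects behind one interface."""
    def __init__(self, *dbs):
        self._dbs = dbs
    def has_object(self, sha):
        return any(db.has_object(sha) for db in self._dbs)
    def stream(self, sha):
        for db in self._dbs:
            if db.has_object(sha):
                return db.stream(sha)
        raise KeyError(sha)

loose = MemoryDB({b"\x01" * 20: b"loose data"})
packed = MemoryDB({b"\x02" * 20: b"packed data"})
compound = CompoundDB(loose, packed)
assert compound.has_object(b"\x01" * 20)
assert compound.stream(b"\x02" * 20) == b"packed data"
```

The caller never learns which backend answered, which is what lets the compound type present itself as a single ObjectDBR.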

test/test_example.py

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+"""Module with examples from the tutorial section of the docs"""
+from lib import *
+from gitdb import IStream
+from gitdb.db import LooseObjectDB
+from gitdb.util import pool
+
+from cStringIO import StringIO
+
+from async import IteratorReader
+
+
+class TestExamples(TestBase):
+
+    def test_base(self):
+        ldb = LooseObjectDB(fixture_path("../../.git/objects"))
+
+        for sha1 in ldb.sha_iter():
+            oinfo = ldb.info(sha1)
+            ostream = ldb.stream(sha1)
+            assert oinfo[:3] == ostream[:3]
+
+            assert len(ostream.read()) == ostream.size
+            assert ldb.has_object(oinfo.binsha)
+        # END for each sha in database
+
+        data = "my data"
+        istream = IStream("blob", len(data), StringIO(data))
+
+        # the object does not yet have a sha
+        assert istream.binsha is None
+        ldb.store(istream)
+        # now the sha is set
+        assert len(istream.binsha) == 20
+        assert ldb.has_object(istream.binsha)
+
+        # async operation
+        # Create a reader from an iterator
+        reader = IteratorReader(ldb.sha_iter())
+
+        # get a reader for object streams
+        info_reader = ldb.stream_async(reader)
+
+        # read one
+        info = info_reader.read(1)[0]
+
+        # read all the rest until depletion
+        ostreams = info_reader.read()
+
+        # set the pool to use two threads
+        pool.set_size(2)
+
+        # a size of zero switches to synchronous mode of operation
+        pool.set_size(0)
