HBase Stargate (REST API) client wrapper for Python.
Read the official documentation of Stargate (http://wiki.apache.org/hadoop/Hbase/Stargate).
starbase is (at the moment) a client implementation of the Apache HBase REST API (Stargate).
Beware that the REST API is slow (this is not the fault of this library!). If you can operate on HBase directly, it is better to do so.
You need to have Hadoop, HBase, Thrift and Stargate running. If you want to make it easy for yourself, read my instructions on installing Cloudera Manager (free) on Ubuntu 12.04 LTS (http://barseghyanartur.blogspot.nl/2013/08/installing-cloudera-on-ubuntu-1204.html or https://bitbucket.org/barseghyanartur/simple-cloudera-install).
Once you have everything installed and running (by default Stargate runs on 127.0.0.1:8000), you should be able to run the unit tests in src/starbase/client/test.py without problems.
The project is still in development, so not all features of the API are available yet.
Many useful examples with comments can be found in the starbase.client.tests module. Some of the most common operations are shown below.
>>> from starbase import Connection
The connection defaults to 127.0.0.1:8000. If your settings are different, specify them when creating the connection instance.
>>> c = Connection()
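Stargate itself is plain HTTP, so every operation ends up as a request to a URL under the server's base address. The helper below is a hypothetical sketch (stargate_url is not part of starbase) showing what the default address means and how table names, row keys and columns could be placed into request paths. Percent-encoding each path segment is an assumption about how to keep characters such as '/' or spaces from breaking the path.

```python
from urllib.parse import quote


def stargate_url(*segments, host="127.0.0.1", port=8000, secure=False):
    """Build a Stargate REST URL (hypothetical helper, not starbase's API).

    Each path segment (table name, row key, column) is percent-encoded;
    ':' is kept so that native 'column:qualifier' names stay readable.
    """
    scheme = "https" if secure else "http"
    path = "/".join(quote(str(segment), safe=":") for segment in segments)
    return "{0}://{1}:{2}/{3}".format(scheme, host, port, path)


# With no arguments this points at the server root (the default address).
print(stargate_url())  # http://127.0.0.1:8000/
print(stargate_url("table3", "my key 1", host="10.0.0.5", port=8080))
# http://10.0.0.5:8080/table3/my%20key%201
```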
Assuming that there are two tables named table1 and table2, listing the tables gives the following.
>>> c.tables()
['table1', 'table2']
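At the REST level, listing tables is a GET on the server root. With `Accept: application/json`, Stargate answers with a JSON object of the form `{"table": [{"name": ...}, ...]}`. A minimal sketch of turning such a payload into the list shown above; parse_table_list is a hypothetical helper and the sample payload is illustrative, not captured from a real server.

```python
import json


def parse_table_list(payload):
    """Extract table names from a Stargate JSON table listing (sketch)."""
    data = json.loads(payload)
    # Guard against a missing "table" key (assumption: a server with no
    # tables might omit it); in that case return an empty list.
    return [entry["name"] for entry in data.get("table", [])]


sample = '{"table": [{"name": "table1"}, {"name": "table2"}]}'
print(parse_table_list(sample))  # ['table1', 'table2']
```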
Create a table instance (note that no table is created in HBase at this step). You need a table instance to operate on table data.
>>> t = c.table('table3')
Create a table with the columns (column families) column1, column2 and column3. This is the point where the table is actually created.
>>> t.create('column1', 'column2', 'column3')
201
>>> t.columns()
['column1', 'column2', 'column3']
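In Stargate terms, creating a table means sending a table schema (a PUT to `/<table>/schema`) that names the table and lists one entry per column family. A sketch of building such a JSON payload, assuming default settings for every column family; table_schema is a hypothetical helper and starbase's own request may differ in detail.

```python
import json


def table_schema(table, *columns):
    """Build a JSON table schema with one ColumnSchema entry per column family."""
    return json.dumps({
        "name": table,
        # Only the family name is given; all other family settings are
        # left to the server defaults (assumption).
        "ColumnSchema": [{"name": column} for column in columns],
    })


payload = table_schema("table3", "column1", "column2", "column3")
print(payload)
```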
Insert a row.
>>> t.insert(
...     'my-key-1',
...     {
...         'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...         'column2': {'key21': 'value 21', 'key22': 'value 22'},
...         'column3': {'key31': 'value 31', 'key32': 'value 32'}
...     }
... )
200
Note that you may also use the native way of naming the columns and cells (qualifiers), that is, column:qualifier.
>>> t.insert(
...     'my-key-1a',
...     {
...         'column1:key11': 'value 11', 'column1:key12': 'value 12', 'column1:key13': 'value 13',
...         'column2:key21': 'value 21', 'column2:key22': 'value 22',
...         'column3:key31': 'value 31', 'column3:key32': 'value 32'
...     }
... )
200
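The two insert forms above carry the same data: the nested form groups qualifiers under their column family, while the native form uses flat `column:qualifier` keys. A sketch of converting the nested form into the native one; to_native is a hypothetical helper for illustration, not starbase's internal code.

```python
def to_native(row):
    """Flatten {'family': {'qualifier': value}} into {'family:qualifier': value}."""
    return {
        "{0}:{1}".format(family, qualifier): value
        for family, cells in row.items()
        for qualifier, value in cells.items()
    }


row = {
    "column2": {"key21": "value 21", "key22": "value 22"},
    "column3": {"key31": "value 31", "key32": "value 32"},
}
print(to_native(row))
```

Keep in mind that a Python dict keeps only the last value for a repeated key, so each qualifier should appear only once per column family in either form.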
Fetch a row.
>>> t.fetch('my-key-1')
{
'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
'column2': {'key21': 'value 21', 'key22': 'value 22'},
'column3': {'key31': 'value 31', 'key32': 'value 32'}
}
Fetch only the given columns (column families) of a row.
>>> t.fetch('my-key-1', ['column1', 'column2'])
{
'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
'column2': {'key21': 'value 21', 'key22': 'value 22'}
}
Fetch only the given cells (qualifiers) of the given columns.
>>> t.fetch('my-key-1', {'column1': ['key11', 'key13'], 'column3': ['key32']})
{
'column1': {'key11': 'value 11', 'key13': 'value 13'},
'column3': {'key32': 'value 32'}
}
Note that you may also use the native way of naming the columns and cells (qualifiers), that is, column:qualifier.
>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'])
{
'column1': {'key11': 'value 11', 'key13': 'value 13'},
'column3': {'key32': 'value 32'}
}
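The columns argument of fetch accepts a list (whole column families or native `column:qualifier` names) or a dict mapping families to qualifiers. To make the selection semantics concrete, here is a client-side sketch that applies the same kind of filter to an already fetched nested row. select is a hypothetical helper that only mimics the results shown above; it does not describe how starbase builds its requests.

```python
def select(row, columns):
    """Filter a nested row dict like the fetch examples above select data."""
    # Normalise the selection to {family: set of qualifiers, or None for "all"}.
    wanted = {}
    if isinstance(columns, dict):
        for family, qualifiers in columns.items():
            wanted[family] = set(qualifiers)
    else:
        for name in columns:
            family, _, qualifier = name.partition(":")
            if not qualifier:
                wanted[family] = None  # the whole column family
            elif wanted.get(family, set()) is not None:
                wanted.setdefault(family, set()).add(qualifier)

    result = {}
    for family, qualifiers in wanted.items():
        cells = row.get(family, {})
        if qualifiers is None:
            picked = dict(cells)
        else:
            picked = {q: v for q, v in cells.items() if q in qualifiers}
        if picked:  # leave out families with no matching cells
            result[family] = picked
    return result


row = {
    "column1": {"key11": "value 11", "key12": "value 12", "key13": "value 13"},
    "column2": {"key21": "value 21", "key22": "value 22"},
    "column3": {"key31": "value 31", "key32": "value 32"},
}
print(select(row, ["column1:key11", "column1:key13", "column3:key32"]))
```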
If you set the perfect_dict argument to False, you will get the native (flat column:qualifier) data structure.
>>> t.fetch('my-key-1', ['column1:key11', 'column1:key13', 'column3:key32'], perfect_dict=False)
{
'column1:key11': 'value 11', 'column1:key13': 'value 13',
'column3:key32':'value 32'
}
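Going the other way, the native structure returned with perfect_dict=False can be regrouped into the nested form by splitting each key at the first colon. A sketch; to_perfect is a hypothetical helper, not starbase's own conversion code.

```python
def to_perfect(native):
    """Regroup {'family:qualifier': value} into {'family': {'qualifier': value}}.

    Splits on the first ':' only, because HBase column family names cannot
    contain a colon, while qualifiers may.
    """
    row = {}
    for name, value in native.items():
        family, _, qualifier = name.partition(":")
        row.setdefault(family, {})[qualifier] = value
    return row


native = {
    "column1:key11": "value 11", "column1:key13": "value 13",
    "column3:key32": "value 32",
}
print(to_perfect(native))
# {'column1': {'key11': 'value 11', 'key13': 'value 13'}, 'column3': {'key32': 'value 32'}}
```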
Add the given columns (column4, column5).
>>> t.add_columns('column4', 'column5')
200
Update the row with data for the newly added column4.
>>> t.update(
...     'my-key-1',
...     {'column4': {'key41': 'value 41', 'key42': 'value 42'}}
... )
200
Remove a single cell (qualifier) from a row.
>>> t.remove('my-key-1', 'column4', 'key41')
200
Remove an entire column (column family) from a row.
>>> t.remove('my-key-1', 'column4')
200
Remove the entire row.
>>> t.remove('my-key-1')
200
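The three remove calls above correspond to Stargate's DELETE targets: a whole row, one column family of a row, or a single `column:qualifier` cell. A sketch of building the request path for each case; delete_path is a hypothetical helper, no request is sent, and the percent-encoding of path segments is an assumption.

```python
from urllib.parse import quote


def delete_path(table, row, column=None, qualifier=None):
    """Return the Stargate path that a DELETE would target (sketch)."""
    if qualifier is not None and column is None:
        raise ValueError("a qualifier needs a column family")
    parts = [quote(table, safe=""), quote(row, safe="")]
    if column is not None:
        # Narrow the target to a column family or a single cell.
        cell = column if qualifier is None else "{0}:{1}".format(column, qualifier)
        parts.append(quote(cell, safe=":"))
    return "/" + "/".join(parts)


print(delete_path("table3", "my-key-1", "column4", "key41"))  # /table3/my-key-1/column4:key41
print(delete_path("table3", "my-key-1", "column4"))           # /table3/my-key-1/column4
print(delete_path("table3", "my-key-1"))                      # /table3/my-key-1
```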
Drop the given columns (column4, column5).
>>> t.drop_columns('column4', 'column5')
201
Note that if your columns contain data, dropping them does not remove the data immediately: if you drop a column and then create it again, all the data originally stored in it will still be there.
Insert many rows in a batch: the rows are queued with b.insert() and sent with b.commit().
>>> data = {
...     'column1': {'key11': 'value 11', 'key12': 'value 12', 'key13': 'value 13'},
...     'column2': {'key21': 'value 21', 'key22': 'value 22'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.insert('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'PUT', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
Update many rows in a batch in the same way, using b.update().
>>> data = {
...     'column3': {'key31': 'value 31', 'key32': 'value 32'},
... }
>>> b = t.batch()
>>> for i in range(0, 5000):
...     b.update('my-key-%s' % i, data)
...
>>> b.commit(finalize=True)
{'method': 'POST', 'response': [200], 'url': 'table3/bXkta2V5LTA='}
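Stargate's JSON representation of rows (a "CellSet") base64-encodes row keys, column names and values. Incidentally, `bXkta2V5LTA=` in the batch responses above is simply `my-key-0` in base64. A sketch of assembling such a CellSet payload for several rows; cell_set and b64 are hypothetical helpers, and whether starbase builds exactly this payload for a batch is an assumption.

```python
import base64
import json


def b64(text):
    """Base64-encode a UTF-8 string and return it as text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def cell_set(rows):
    """Build a Stargate-style JSON CellSet from {row_key: {family: {qualifier: value}}}."""
    return json.dumps({
        "Row": [
            {
                "key": b64(key),
                "Cell": [
                    {"column": b64("{0}:{1}".format(family, qualifier)), "$": b64(value)}
                    for family, cells in columns.items()
                    for qualifier, value in cells.items()
                ],
            }
            for key, columns in rows.items()
        ]
    })


data = {"column1": {"key11": "value 11"}}
payload = cell_set({"my-key-%s" % i: data for i in range(3)})
print(b64("my-key-0"))  # bXkta2V5LTA=
```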
Table scanning is still in development. At the moment it is only possible to fetch all rows of a given table. The results are returned as a generator.
>>> t.fetch_all_rows()
<generator object results at 0x28e9190>
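A generator produces rows lazily, only as they are consumed. Stargate's scanner interface works in the same spirit: a client creates a scanner, then repeatedly asks it for the next batch of rows until the server signals that the scanner is exhausted (HTTP 204 No Content). A sketch of that pattern; scan and fake_fetch_factory are hypothetical, fetch_batch stands in for the HTTP request and returns an empty list when nothing is left, and this is not starbase's implementation.

```python
from itertools import islice


def scan(fetch_batch):
    """Yield rows one by one, calling fetch_batch() for more only when needed."""
    while True:
        batch = fetch_batch()
        if not batch:  # stands in for Stargate's 204 No Content
            return
        for row in batch:
            yield row


def fake_fetch_factory(rows, batch_size):
    """Hypothetical stand-in for the server: hands out rows in fixed-size batches."""
    remaining = list(rows)

    def fetch_batch():
        batch = remaining[:batch_size]
        del remaining[:batch_size]
        return batch

    return fetch_batch


rows = scan(fake_fetch_factory(["my-key-%s" % i for i in range(10)], batch_size=4))
print(list(islice(rows, 3)))  # ['my-key-0', 'my-key-1', 'my-key-2']
```

Taking only the first three rows here triggers a single batch fetch; the rest are requested only if iteration continues.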
Drop the table.
>>> t.drop()
200
Show the software version.
>>> print(c.version)
{u'JVM': u'Sun Microsystems Inc. 1.6.0_43-20.14-b01',
u'Jersey': u'1.8',
u'OS': u'Linux 3.5.0-30-generic amd64',
u'REST': u'0.0.2',
u'Server': u'jetty/6.1.26'}
Show the cluster version.
>>> print(c.cluster_version)
0.94.7
Show the cluster status.
>>> print(c.cluster_status)
{u'DeadNodes': [],
u'LiveNodes': [{u'Region': [{u'currentCompactedKVs': 0,
...
u'regions': 3,
u'requests': 0}
Show the table schema.
>>> print(table.schema())
{u'ColumnSchema': [{u'BLOCKCACHE': u'true',
u'BLOCKSIZE': u'65536',
...
u'IS_ROOT': u'false',
u'name': u'messages'}
Show the table regions.
>>> print(table.regions())
License
GPL 2.0/LGPL 2.1
Support
For any issues, contact me at the e-mail address given in the Author section.
Author
Artur Barseghyan <artur.barseghyan@gmail.com>