Using the Kombilo engine in your own scripts¶
This document describes how to use the Kombilo database from your own Python scripts, enabling you to do all kinds of data mining on your SGF databases.
Getting started¶
The easiest way to get started is to create some databases with Kombilo and then use the resulting database files kombilo.d* (either in place or after copying them somewhere else).
Then, a pattern search can be done in a few lines:
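# Assumption: the src/ directory of your Kombilo installation is on sys.path,
# and the pattern type constants (CORNER_NE_PATTERN etc.) are available
# through this import.
from kombiloNG import *

# set up the KEngine, load the database files
K = KEngine()
# 'sgfpath': folder containing the SGF files;
# 'name': (folder of the database files, base name of the database files)
K.gamelist.DBlist.append({'sgfpath': '.', 'name': ('.', 'kombilo1'),
                          'data': None, 'disabled': 0})
K.loadDBs()

# let us check whether this worked
print('%d games in database.' % K.gamelist.noOfGames())

# define a search pattern
p = Pattern('''
.......
.......
...X...
....X..
...OX..
...OO..
.......
''', ptype=CORNER_NE_PATTERN, sizeX=7, sizeY=7)

# start pattern search
K.patternSearch(p)

# print some information
print(K.patternSearchDetails())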
For a slightly extended example, see basic_pattern_search.
Instead of appending items to K.gamelist.DBlist manually as above, you can also use the GameList.populateDBlist() method; see, e.g., sgftree.
The scripts in the examples directory¶
basic_pattern_search.py¶
sgftree¶
This script takes an initial position and searches for it in the given database. It then searches for all continuations, then for all continuations within the newly found results, and so on. In this way, a tree of positions is computed. At the end, everything is written to an SGF file, together with some information about the search results at each step. (Note that this functionality is also available from within Kombilo.)
Before starting the script, you need to write a configuration file. The [databases] section specifies the databases to be used, and the [options] section sets the script options.
The only mandatory option is
output # name of the output file
Further options:
initialposition # the initial position; see below for examples, default: empty board
boardsize # board size, default: 19
anchors # the rectangle within which the top left corner of the search pattern
# may move, default: (0, 0, 0, 0)
selection # the region on the board which is used as the search pattern,
# default: ((0, 0), (18, 18))
depth # the highest move number that is considered, default: 10
min_number_of_hits # variations with fewer hits are not considered (black/white
# continuations are considered separately), default: 10
max_number_of_branches # if there are more continuations,
# only those with the most hits are considered, default: 20
gisearch # a query text for a game info search to be carried out
# before the pattern searches, default: no game info search
reset_game_list # should each search start from the initial game list, or from
# the list resulting from the search for the parent node?
# (This determines whether for some node all games featuring
# this position should be seen, or only those where it arose
# by the same sequence of moves as in the SGF file),
# default: False
comment_head # text that should be prepended to every comment,
# default: @@monospace
The default value of comment_head, @@monospace, causes Kombilo to display the comment in a fixed-width font, which is useful for output in tabular form.
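The tree-building loop and the two pruning options can be sketched as follows. This is a hypothetical illustration (Python 3), not the code of the sgftree script. In it, search(position) is a stand-in for a Kombilo pattern search that returns (move, color, hits) triples. Positions are represented as move sequences, so transpositions are ignored. We also assume that max_number_of_branches applies to all continuations of a node together.

```python
def select_branches(continuations, min_hits=10, max_branches=20):
    """Prune the (move, color, hits) continuations of one node.

    Black and White continuations at the same point are separate entries,
    so each one is compared with min_hits on its own.
    Assumption: max_branches limits all kept entries of the node together.
    """
    kept = [c for c in continuations if c[2] >= min_hits]
    kept.sort(key=lambda c: c[2], reverse=True)  # most hits first
    return kept[:max_branches]


def build_tree(search, position=(), depth=10, min_hits=10, max_branches=20):
    """Recursively search continuations up to move number `depth`.

    Returns a nested dict {(color, move): (hits, subtree)}.
    """
    tree = {}
    if len(position) >= depth:
        return tree
    for move, color, hits in select_branches(search(position), min_hits, max_branches):
        child = position + ((color, move),)
        tree[(color, move)] = (hits, build_tree(search, child, depth, min_hits, max_branches))
    return tree


def make_toy_search(games):
    """Hypothetical stand-in for a pattern search over games given as move lists."""
    def search(position):
        n = len(position)
        counts = {}
        for game in games:
            if tuple(game[:n]) == position and len(game) > n:
                color, move = game[n]
                counts[(move, color)] = counts.get((move, color), 0) + 1
        return [(move, color, hits) for (move, color), hits in counts.items()]
    return search
```

For example, with three games starting B pd, W dd, one starting B pd, W dp, and one starting B qd, a search with depth=2 and min_hits=2 keeps only the branch B pd (4 hits) followed by W dd (3 hits).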
In the [searchoptions] section, you can pass search options to Kombilo; the possible choices are fixedColor, nextMove, searchInVariations and moveLimit.
Example config file (starting from the empty board):
[databases]
d0 = /home/ug/go/gogod10W, /home/ug/go/gogod10W, kombilo1
d1 = /home/ug/go/go4go, /home/ug/go/go4go, kombilo1
d2 = /home/ug/go/go4goN, /home/ug/go/go4goN, kombilo1
[options]
output = out1.sgf
# start with empty board:
initialposition = '(;)'
depth = 15
min_number_of_hits = 20
max_number_of_branches = 20
[searchoptions]
fixedColor = 1
Example config file (starting with opposing san ren sei):
[databases]
d0 = /home/ug/go/gogod11W, /home/ug/go/gogod11W, kombilo3
[options]
output = out2.sgf
initialposition = '(;AB[pd][pp][pj]AW[dd][dp][dj])'
depth = 15
min_number_of_hits = 5
max_number_of_branches = 20
[searchoptions]
fixedColor = 1
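Each line in the [databases] section has the same three-part layout (SGF path, database path, database file name) as the dictionary expected by GameList.populateDBlist(). The following Python 3 sketch converts such a section into that dictionary. The function name databases_from_config is our own, and the code is not taken from the sgftree script.

```python
import configparser


def databases_from_config(text):
    """Build a populateDBlist()-style dict from the [databases] section.

    Each value must be "sgfpath, datapath, name"; the result maps each
    key (d0, d1, ...) to a list of these three strings.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys exactly as written
    parser.read_string(text)
    databases = {}
    for key, value in parser.items('databases'):
        entries = [part.strip() for part in value.split(',')]
        if len(entries) != 3:
            raise ValueError('%s: expected "sgfpath, datapath, name"' % key)
        databases[key] = entries
    return databases
```

With a KEngine instance K, the resulting dictionary d would then be passed to K.gamelist.populateDBlist(d), followed by K.loadDBs().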
profiler¶
This script performs a number of pattern searches and writes an HTML file with information about the results and the time used for the searches. This makes it easy to compare Kombilo's performance with different search parameters. By invoking the script with different versions of the underlying libkombilo library, you can also experiment with changes to the search algorithms, or compare new algorithms with the existing ones.
Usage:
Invoke the script as
./profiler.py s1
where s1 is a subdirectory containing the following files.
Mandatory files:
kombilo1.d* # kombilo database files
hgsummary # a text file whose first line should contain information about the
# revision (inside the hg source code repository) of libkombilo
# used in this instance; to get started, just put the date (or
# anything) as the first line of a text file.
jquery.js # the jQuery JavaScript library (http://jquery.com), which is
# used in the HTML file produced by the script. Obtain a current
# version from the jQuery web site.
Optional files:
libkombilo.py, _libkombilo.so # the files providing the libkombilo library
# If you do not put them in the subdirectory,
# they are taken from the src/ directory of
# your Kombilo installation.
Of course, you could easily change the script to read the database from a different path or to use more than one database.
test_pattern_search¶
In this directory there are a couple of scripts which I used to test the pattern search for consistency. You can use them as starting points for your own scripts.
final_position.py¶
This script takes a database and searches for the final position of each game in the database. If the number of results is different from 1, the file names of the games containing this pattern are printed. (Typically these are games which have duplicates in the database, or which are very short.)
After searching for each final position, the script searches for the position at move 50 in each game.
Usage: invoke as
./final_position.py s1
where s1 is a subdirectory which contains data as for the profiler script. Output is written to the console (instead of an HTML file).
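The consistency check can be illustrated on toy data. In this hypothetical Python 3 sketch, a game is a list of (color, point) moves, and a position is the set of stones on the board (captures are ignored for simplicity). A game "contains" a position if that position occurs after some move. Colors are fixed, so color-reversed matches are not counted.

```python
def positions(moves):
    """Return every position of a game, as frozensets of (color, point).

    Captures are ignored, so stones are only ever added.
    """
    stones = set()
    result = [frozenset()]
    for color, point in moves:
        stones.add((color, point))
        result.append(frozenset(stones))
    return result


def final_position_report(games):
    """games: dict filename -> list of moves.

    For each game whose final position occurs in a number of games
    different from 1, list the file names of the games containing it.
    """
    all_positions = {name: set(positions(moves)) for name, moves in games.items()}
    report = {}
    for name, moves in games.items():
        final = positions(moves)[-1]
        hits = sorted(other for other, ps in all_positions.items() if final in ps)
        if len(hits) != 1:
            report[name] = hits
    return report
```

Both typical causes show up in the report. Two identical games report each other, and a very short game is reported because its final position also occurs in longer games.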
various_tests.py¶
This script carries out a number of pattern searches, for various patterns and with various parameters, and checks the results for consistency.
Usage: invoke as
./various_tests.py s1
where s1 is a subdirectory which contains data as for the profiler script, and to which the output HTML page is written.
API¶
The kombiloNG module¶
The kombiloNG module provides much of the Kombilo functionality without the Graphical User Interface. You can use it to do pattern searches etc. in your Python scripts.
-
class kombiloNG.Cursor(*args, **kwargs)
A Cursor which is used to traverse an SGF file. See the documentation of the sgf module for further details.
-
class kombiloNG.GameList
A Kombilo list of games. The list can consist of several Kombilo databases. You do not construct instances of this class yourself. Rather, every KEngine instance K has a unique instance K.gamelist of GameList.
As in Kombilo, the GameList maintains a list of games that are “currently visible” (think of all games matching some pattern). All search methods and many other methods work with this “current list”.
-
addTag(tag, index)
Set tag on the game at position index in the current list.
-
exportTags(filename, which_tags=[])
Export all tags in all non-disabled databases into the file specified by filename.
If which_tags is specified, then it has to be a list of positive integers, and only the tags in the list are exported.
-
getIndex(i)
Returns dbIndex, j such that self.DBlist[dbIndex]['current'][j] corresponds to the i-th entry of the current list of games.
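The index arithmetic behind getIndex() can be sketched as follows. The sketch assumes (our assumption, not stated above) that the current list is the concatenation of the databases' current lists in DBlist order. It works on plain Python lists, not on Kombilo objects.

```python
def get_index(current_lists, i):
    """Map position i in the combined list to (dbIndex, j).

    current_lists[k] plays the role of self.DBlist[k]['current'].
    """
    if i < 0:
        raise IndexError('negative index')
    for db_index, current in enumerate(current_lists):
        if i < len(current):
            return db_index, i
        i -= len(current)
    raise IndexError('index out of range')
```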
-
getProperty(index, prop)
Return a property of the game at position index in the current list of games. Here prop should be one of the following constants:
GL_FILENAME - the filename
GL_PB - the black player
GL_PW - the white player
GL_RESULT - the result
GL_SIGNATURE - the symmetrized Dyer signature
GL_DATE - the date.
-
getSGF(index)
Return the SGF source of the game at position index in the current list of games.
This returns the full SGF if sgfInDB was True when the database was processed; otherwise it returns only the SGF of the root node.
-
getTags(index)
Get all tags of the game at position index in the current list.
-
get_data(i, showTags=True)
Return the entry in line i of the current list of games (as it appears in the Kombilo game list window).
-
importTags(filename)
The file given by filename should be a file to which tags have previously been exported using exportTags().
This method imports all the tags into the current databases. The games are identified by the Dyer signature together with a hash value of their final position. So unless there are duplicates in the database, this should put the tags on those games where they were before exporting. In case of duplicates, all duplicates will receive the corresponding tags.
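The matching rule used by importTags() can be illustrated with plain dictionaries. Games are matched on the Dyer signature plus a hash of the final position, and all duplicates receive the tags. This Python 3 sketch does not use Kombilo's file format or data structures. The game records and their keys signature, final_hash and tags are hypothetical.

```python
def export_tags(games, which_tags=()):
    """Collect tags per (signature, final_hash) key.

    If which_tags is non-empty, only those tags are exported.
    """
    exported = {}
    for game in games:
        tags = set(game['tags'])
        if which_tags:
            tags &= set(which_tags)
        if tags:
            key = (game['signature'], game['final_hash'])
            exported.setdefault(key, set()).update(tags)
    return exported


def import_tags(games, exported):
    """Add exported tags to every game with a matching key (duplicates included)."""
    for game in games:
        game['tags'] |= exported.get((game['signature'], game['final_hash']), set())
```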
-
listOfCurrentSGFFiles()
Return a list of file names for all SGF files of games in the current list of games.
-
noOfGames()
Return the number of games in the current list of games.
-
noOfHits()
Return the number of hits for the last pattern search.
-
noOfSwitched()
Return the number of hits of the last pattern search in which the colors are reversed.
-
populateDBlist(d)
Add the databases specified in the dictionary d to this GameList. d must have the following format:
For each key k, d[k] is a list of three entries. The first entry is the sgfpath, i.e. the root folder where (and below which) the SGF files of this database are stored. The second entry is the path where the Kombilo database files are stored, and the third entry is the name of these database files, without the extension.
The keys are assumed to be strings. If k ends with 'disabled', then the disabled flag will be set for the corresponding database.
After adding the databases in this way, you must call KEngine.loadDBs() to load the database files.
-
printGameInfo(index)
Return a pair whose first entry is a string containing the game info for the game at index. The second entry is a string giving the reference to commentaries in the literature, if available.
-
printSignature(index)
Return the symmetrized Dyer signature of the game at index in the current list of games.
-
reset()
Reset the list so that it includes all the games from self.data.
-
-
class kombiloNG.KEngine
This is the class through which you use the Kombilo search functionality.
After instantiating it, you need to tell the gamelist which databases you want to use, e.g. using GameList.populateDBlist(), and then call loadDBs(). Afterwards you can use patternSearch(), for instance.
See the Kombilo documentation for further information on how to get started.
Further notes.
After a pattern search, the continuations are assembled into the list self.continuations, whose entries are instances of lk.Continuation, storing:
- the total number of hits
- the position in currentSearchPattern
- the number of black continuations
- the number of black wins after Black plays here
- the number of black losses after Black plays here
- the number of times Black plays here after tenuki
- the number of white continuations
- the number of black wins after White plays here
- the number of black losses after White plays here
- the number of times White plays here after tenuki
- the label used on the board at this point.
-
addDB(dbp, datap=('', '#'), recursive=True, filenames='*.sgf', acceptDupl=True, strictDuplCheck=True, tagAsPro=0, processVariations=True, algos=None, messages=None, progBar=None, showwarning=None, index=None, all_in_one_db=True, sgfInDB=True, logDuplicates=True, stop_var=None)
Call this method to add a new database of SGF files.
Parameters:
- dbp: the path where the SGF files are to be found.
- datap: the path where the database files will be stored. Leaving the default value means: store the database at dbp, with base filename 'kombilo'. Instead, you can specify a pair (path, filename). Then path/filenameN.d[ab]? will be the locations of the database files. Every Kombilo database consists of several files; they will have names with ? equal to a, b, ...; N is a natural number chosen to make the file name unique.
- recursive: specifies whether subdirectories should be included recursively
- messages: a 'message text window' which receives status messages
- progBar: a progress bar
- showwarning: a method which displays warnings (like Tkinter showwarning)
- index: where to add this in the DBlist (None means: add at end)
- all_in_one_db: put all games found in this folder and all its subfolders into one db (rather than creating one db per folder)
-
addOneFolder(arguments, dbpath, gl=None)
This should really be named add_one_folder: adds all SGF files in the folder dbpath to gl, or to a newly created GameList.
-
copyCurrentGamesToFolder(dir)
Copy all SGF files belonging to games in the current list to the folder given as dir.
-
dateProfile(intervals=None)
Return the absolute numbers of games in the given date intervals among the games in the current list of games.
The default value for intervals is [(0, 1900), (1900, 1950), (1950, 1975), (1975, 1985), (1985, 1992), (1992, 1997), (1997, 2002), (2002, 2006), (2006, 2009), (2009, 2013)]
-
dateProfileRelative()
Return the ratios of games in the current list versus games in the whole database, for each of the date intervals specified in dateProfile().
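The counting behind dateProfile() and dateProfileRelative() can be sketched with plain year numbers. This Python 3 sketch is not Kombilo's implementation. In particular, it treats each interval as half-open (start <= year < end). That is our assumption, since the documentation does not say whether the boundaries are inclusive.

```python
DEFAULT_INTERVALS = [(0, 1900), (1900, 1950), (1950, 1975), (1975, 1985),
                     (1985, 1992), (1992, 1997), (1997, 2002), (2002, 2006),
                     (2006, 2009), (2009, 2013)]


def date_profile(years, intervals=DEFAULT_INTERVALS):
    """Absolute number of games per interval (intervals assumed half-open)."""
    return [sum(1 for y in years if start <= y < end) for start, end in intervals]


def date_profile_relative(current_years, all_years, intervals=DEFAULT_INTERVALS):
    """Ratio of current games to all games per interval (0.0 if the interval is empty)."""
    current = date_profile(current_years, intervals)
    total = date_profile(all_years, intervals)
    return [c / t if t else 0.0 for c, t in zip(current, total)]
```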
-
gameinfoSearch(query)
Do a game info search on the current list of games.
query provides the query as part of an SQL clause which can be used as an SQL WHERE clause. Examples:
date >= '2000-03-00'
PB = 'Cho Chikun'
PB like 'Cho%'
PW like 'Go Seigen' and not PB like 'Hashimoto%'
After the like operator, you can use the percent sign % as a wildcard to match arbitrary text.
The columns in the database are
PB (player black)
PW (player white)
RE (result)
EV (event)
DT (the date as given in the SGF file)
date (the date in the form YYYY-MM-DD)
filename
sgf (the full SGF source).
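Since query is used as an SQL WHERE clause, the examples above can be tried out against any SQLite table with the same column names. The following Python 3 sketch uses an in-memory table named games with a few invented rows. The table name, the reduced set of columns and the data are made up for illustration and are not Kombilo's actual database schema.

```python
import sqlite3


def make_demo_db():
    """In-memory table with some of the column names listed above (invented rows)."""
    con = sqlite3.connect(':memory:')
    con.execute('CREATE TABLE games (PB TEXT, PW TEXT, date TEXT)')
    con.executemany('INSERT INTO games VALUES (?, ?, ?)', [
        ('Cho Alpha', 'Lee Delta', '1986-01-15'),
        ('Hashimoto Gamma', 'Lee Delta', '1953-05-03'),
        ('Cho Beta', 'Lee Delta', '1999-10-01'),
    ])
    return con


def count_matching(con, query):
    """Count rows matching a WHERE clause, in the spirit of gameinfoSearchNC().

    The clause is pasted into the SQL text, so only use trusted queries.
    """
    return con.execute('SELECT count(*) FROM games WHERE ' + query).fetchone()[0]
```

Comparisons such as date >= '2000-03-00' work on plain strings here. Dates in the form YYYY-MM-DD sort chronologically when compared as text.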
-
gameinfoSearchNC(query)
Returns the number of games matching the given query (see gameinfoSearch() for the format of the query) without changing the list of current games.
-
get_datapath(datap, dbpath)
Make sure that existing files are not overwritten: add a counter to path/file.db so that new Kombilo databases can be written.
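addDB() and get_datapath() describe a naming scheme in which a counter N is appended to the base file name so that no existing database is overwritten. It can be sketched like this. The function next_free_db_number is hypothetical, and starting the counter at 1 is our assumption.

```python
import glob
import os


def next_free_db_number(path, filename='kombilo'):
    """Smallest N >= 1 such that no file path/filenameN.d* exists yet."""
    n = 1
    # glob.escape protects against special characters in path or filename
    while glob.glob(os.path.join(glob.escape(path),
                                 glob.escape('%s%d' % (filename, n)) + '.d*')):
        n += 1
    return n
```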
-
get_pattern_from_node(node, boardsize=19, **kwargs)
Return a full board pattern with the position at node. **kwargs are passed on to Pattern.__init__().
-
loadDBs(progBar=None, showwarning=None)
Load the database files for all databases that were added to the gamelist.
-
parseReferencesFile(datafile, options=None)
Parse a file with references to commentaries in the literature. See the file src/data/references for the file format.
The method builds up self.gamelist.references, a dictionary which for each Dyer signature has a list of all references for the corresponding game.
datafile is expected to be a “file-like object” (like an opened file) with a .read() method.
-
patternSearch(CSP, SO=None, CL='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789', FL={}, progBar=None, sort_criterion=None, update_gamelist=True)
Start a pattern search on the current game list.
- CSP must be an instance of Pattern; it is the pattern that is searched for.
- You can specify search options as SO; this must be an instance of lk.SearchOptions (see below).
- CL, FL, progBar are used with the Kombilo GUI.
- sort_criterion will be used for sorting the continuations:
  - total: by number of occurrences
  - earliest: by earliest occurrence (earliest first)
  - latest: by latest occurrence (latest date first)
  - average: by average date of occurrence (earliest date first)
  - became popular: by weighted average which tries to measure when the move became popular (earliest date first)
  - became unpopular: by weighted average which tries to measure when the move became unpopular (latest date first)
Search options. Create an instance of lk.SearchOptions by
so = lk.SearchOptions()
You can then set particular options on so, e.g.:
so.fixedColor = 1
so.searchInVariations = False
Available options:
- fixedColor, values: 0 = also search for the pattern with colors reversed; 1 = fix colors as given in the pattern; default value is 0
- nextMove, values: 0 = either player moves next, 1 = next move must be black, 2 = next move must be white; default value is 0
- moveLimit, positive integer; the pattern must occur at this move in the game or earlier; default value is 10000
- trustHashFull, boolean, values: true = do not use ALGO_MOVELIST to confirm a hit given by ALGO_HASH_FULL, false = use ALGO_MOVELIST to confirm it; default value is false
- searchInVariations, boolean; default value is true
- algos, an integer which specifies which algorithms should be used; in practice, use one of the following:
  lk.ALGO_FINALPOS | lk.ALGO_MOVELIST
  lk.ALGO_FINALPOS | lk.ALGO_MOVELIST | lk.ALGO_HASH_FULL
  lk.ALGO_FINALPOS | lk.ALGO_MOVELIST | lk.ALGO_HASH_FULL | lk.ALGO_HASH_CORNER
The default is to use all available algorithms.
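The first four sort criteria can be illustrated on plain data. In this Python 3 sketch, each continuation is a hypothetical dictionary holding a label and the years of its occurrences; it is not lk.Continuation. The weighted averages behind "became popular" and "became unpopular" are omitted because their formula is not given here.

```python
from statistics import mean

# criterion -> (sort key, descending?)
SORT_KEYS = {
    'total': (lambda c: len(c['years']), True),      # most occurrences first
    'earliest': (lambda c: min(c['years']), False),  # earliest first
    'latest': (lambda c: max(c['years']), True),     # latest date first
    'average': (lambda c: mean(c['years']), False),  # earliest average first
}


def sort_continuations(continuations, criterion='total'):
    """Return the continuations sorted by one of the criteria above."""
    key, descending = SORT_KEYS[criterion]
    return sorted(continuations, key=key, reverse=descending)
```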
-
patternSearchDetails(exportMode='ascii', showAllCont=False)
Returns a string with information on the most recent pattern search.
-
signatureSearch(sig)
Do a signature search for the Dyer signature sig.
-
tagSearch(tag)
Do a tag search on the current game list.
tag can be an expression like H and (X or not M), where H, X, M are abbreviations for tags (i.e. keys in self.gamelist.customTags). In the simplest example, tag == H, i.e. we just search for all games tagged with H.
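One way to evaluate such a tag expression for a single game is to parse it with Python's ast module and treat each name as "this tag is set". This is a hypothetical Python 3 sketch of the semantics, not how Kombilo implements tagSearch().

```python
import ast


def tag_expression_matches(expr, game_tags):
    """True if a game whose tags are game_tags satisfies expr, e.g. 'H and (X or not M)'."""
    def evaluate(node):
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if isinstance(node, ast.BoolOp):  # 'and' / 'or'
            values = [evaluate(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand)
        if isinstance(node, ast.Name):  # a tag abbreviation
            return node.id in game_tags
        raise ValueError('unsupported syntax in tag expression')
    return evaluate(ast.parse(expr, mode='eval'))
```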
-
-
class kombiloNG.Node(*args, **kwargs)
A Node of an SGF file. Also see the documentation of the sgf module.
-
class kombiloNG.Pattern(p, **kwargs)
A pattern, i.e., a configuration of black and white stones (and empty spots, and possibly wildcards) on a portion of the go board.
To create a pattern, pass the following arguments to Pattern:
- p: The pattern as a string (...XXO..X). Blanks and line breaks will be ignored. Commas (to mark hoshis) will be replaced by periods.
- ptype (optional): one of
  CORNER_NW_PATTERN, CORNER_NE_PATTERN, CORNER_SW_PATTERN, CORNER_SE_PATTERN # fixed in specified corner
  SIDE_N_PATTERN, SIDE_W_PATTERN, SIDE_E_PATTERN, SIDE_S_PATTERN # slides along specified side
  CENTER_PATTERN # movable in center
  FULLBOARD_PATTERN
- sizeX, sizeY: the size (horizontal/vertical) of the pattern (not needed if ptype is FULLBOARD_PATTERN).
- anchors (optional): A tuple (right, left, top, bottom) which describes the rectangle containing all permissible positions for the top left corner of the pattern.
One of ptype and anchors must be present. If ptype is given, then anchors will be ignored.
- contlist (optional): A list of continuations, in SGF format, e.g. ;B[qq];W[de];B[gf]
- topleft (optional): a pair of coordinates, specifying the top left corner of the pattern, needed for translating contlist into coordinates relative to the pattern
- contsinpattern (optional; used only if contlist is not given): X (black) or O (white). If given, the labels 1, 2, 3, ... in the pattern are extracted and handled as continuations, with 1 played by the specified color.
- contLabels (optional): A string of the same size as p, with labels that should be used for labelling continuations.
Warning: Continuations and captures
With the Pattern class it is currently not possible to deal with captures made by one of the moves of the continuation list. While the libkombilo library supports this, I have yet to think of a good interface to access this functionality.
-
getInitialPosAsList(hoshi=False, boundary=False)
Export the current pattern as a list of lists, like [['.', 'X', '.'], ['O', '.', '.']].
If boundary==True, a boundary made of spaces and '-', '|', '+' characters is added. If hoshi==True, hoshi points are marked with ','. (Of course, this is only applicable for fullboard or corner patterns, or patterns with fixed anchor.)
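The normalization rules for the pattern string p can be mimicked in a few lines of Python 3: blanks and line breaks are ignored, and hoshi commas become periods. The same sketch also produces the list-of-lists form returned by getInitialPosAsList(). pattern_rows is a hypothetical helper for checking a pattern string before passing it to Pattern. It does not add the optional boundary or hoshi marks.

```python
def pattern_rows(p, size_x, size_y):
    """Normalize a pattern string and split it into size_y rows of size_x cells."""
    cells = ''.join(p.split()).replace(',', '.')  # drop whitespace, commas -> periods
    if len(cells) != size_x * size_y:
        raise ValueError('expected %d cells, got %d' % (size_x * size_y, len(cells)))
    return [list(cells[r * size_x:(r + 1) * size_x]) for r in range(size_y)]
```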
-
kombiloNG.translateRE(s)
Try to provide an accurate translation of the RE (result) string in an SGF file. See also the notes of Andries Brouwer at http://homepages.cwi.nl/~aeb/go/misc/sgfnotes.html
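For reference, the SGF RE property uses values such as B+R (Black wins by resignation), W+3.5 (White wins by 3.5 points), B+T, B+F, 0 or Draw, Void and ?. A simplified, hypothetical translator for these forms might look like this in Python 3. Its output wording is our own and not necessarily what translateRE() returns.

```python
def translate_result(re_value):
    """Turn an SGF RE value into English text (simplified)."""
    s = re_value.strip()
    if s in ('0', 'Draw'):
        return 'Draw'
    if s == 'Void':
        return 'No result'
    if s in ('', '?'):
        return 'Unknown result'
    if len(s) >= 2 and s[0] in 'BW' and s[1] == '+':
        winner = 'Black' if s[0] == 'B' else 'White'
        how = s[2:]
        reasons = {'': '', 'R': ' by resignation', 'Resign': ' by resignation',
                   'T': ' on time', 'Time': ' on time',
                   'F': ' by forfeit', 'Forfeit': ' by forfeit'}
        if how in reasons:
            return winner + ' wins' + reasons[how]
        try:
            float(how)  # a score such as 3.5
        except ValueError:
            return s  # unrecognized: return the value unchanged
        return '%s wins by %s points' % (winner, how)
    return s  # anything else is returned unchanged
```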
The sgf module¶
The sgf module provides functionality for handling SGF files.
-
class sgf.Cursor(sgf, sloppy=False, encoding='utf8')
The Cursor class takes SGF data (as a string) and provides methods to traverse the game and to retrieve the information for each node stored in the SGF file.
To create a Cursor instance, call Cursor with the following arguments:
- sgf: the SGF data as a string.
- sloppy (optional, default is False): if this is True, then the parser tries to ignore deviations from the SGF format.
- encoding (optional, default is 'utf8'): this option is currently not used. Later, the parser will decode the file with the specified encoding.
-
exportGame(gameNumber=None)
Return a string with the game attached to self in SGF format (with character encoding utf-8!). Depending on gameNumber:
- if None: only the currently “active” game in the collection is written (as specified by self.currentGame)
- if an integer: the game specified by gameNumber is written
- if a tuple of integers: the games in this tuple are written
- if == 'ALL': all games are written
-
getRootNode(n)
Get the root node of the n-th game of this SGF game collection. Typically, SGF files contain only a single game; getRootNode(0) will give you its root node.
-
next(n=0, markCurrent=None)
Go to the n-th child of the current node. The default for n is 0, so if there are no variations, you can traverse the game by repeatedly calling next().
-
noChildren()
Returns the number of children of the current node, i.e. the number of variations starting here.
-
previous()
Go to the previous node.
-
updateRootNode(data, n=0)
Update the root node of the n-th game in this collection.
data is a dictionary which maps SGF properties like PB, PW, ... to their values.
-
class sgf.Node(node)
The Node class represents a single node in a game. This class is a wrapper for the lk.Node class. It provides dictionary-style access to SGF property values.
This class does not inherit from lk.Node. To construct a Node, pass an lk.Node instance to __init__. It is stored as self.n.
You can check whether a Node node has an SGF property and retrieve its value like this:
if 'B' in node: print(node['B'])
Similarly, using node['B'] = ('pp', ) and del node['B'] you can set values and delete properties from node.
-
add_property_value(ID, item)
Add item to the list self[ID].
-
get_move_number()
Returns the move number where the node sits inside the game. This is a list of non-negative integers. The entries with even indices mean “go right by this amount in the game tree”; the entries at odd places mean “go down as many steps as indicated, i.e. pass to the corresponding sibling”.
-
pathToNode()
Returns the 'path' to the specified node in the following format: [0,1,0,2,0] means: from rootNode, go
- to the next move (0-th variation), then
- to the first variation, then
- to the next move (0-th variation), then
- to the second variation, then
- to the next move.
In other words, a cursor c pointing to rootNode can be moved to n by
for i in n.pathToNode(): c.next(i)
-
remove(ID, item)
Remove item from the list self.n[ID].
-
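The relation between pathToNode() and Cursor.next() can be checked on a toy game tree. ToyNode and ToyCursor below are hypothetical stand-ins written in Python 3; they are not the sgf module's classes. The path is simply the child index taken at each step from the root. That is what makes the loop for i in n.pathToNode(): c.next(i) work.

```python
class ToyNode:
    """Stand-in for an SGF node: a property dict plus a list of children."""

    def __init__(self, props=None, parent=None):
        self.props = dict(props or {})
        self.parent = parent
        self.children = []

    def add_child(self, props):
        child = ToyNode(props, self)
        self.children.append(child)
        return child

    def path_to_node(self):
        """Child indices from the root down to this node (same idea as pathToNode())."""
        path, node = [], self
        while node.parent is not None:
            path.append(node.parent.children.index(node))
            node = node.parent
        return path[::-1]


class ToyCursor:
    """Stand-in cursor with next(), previous() and noChildren()."""

    def __init__(self, root):
        self.current = root

    def next(self, n=0):
        self.current = self.current.children[n]
        return self.current

    def previous(self):
        self.current = self.current.parent
        return self.current

    def noChildren(self):
        return len(self.current.children)
```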
Libkombilo constants¶
Pattern types:
CORNER_NW_PATTERN
CORNER_NE_PATTERN
CORNER_SW_PATTERN
CORNER_SE_PATTERN
SIDE_N_PATTERN
SIDE_W_PATTERN
SIDE_E_PATTERN
SIDE_S_PATTERN
CENTER_PATTERN
FULLBOARD_PATTERN
Algorithms:
ALGO_FINALPOS
ALGO_MOVELIST
ALGO_HASH_CORNER
ALGO_HASH_FULL
Libkombilo¶
To study the underlying C++ library in detail, look at the Libkombilo documentation. A good starting point is the cpptest.cpp program in lk/examples.