Using the Kombilo engine in your own scripts

This document describes how to use the Kombilo database from your own Python scripts, enabling you to do all kinds of data mining on your SGF databases.

Getting started

It is easiest to create some databases using Kombilo, and then just use the database files kombilo.d* (or first copy them somewhere).

Then, a pattern search can be done in a few lines:

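# make KEngine, Pattern and the pattern type constants available
# (assumes the kombiloNG module is on your Python path)
from kombiloNG import *

# set up the KEngine, load the database files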
K = KEngine()
K.gamelist.DBlist.append({ 'sgfpath': '.', 'name':('.', 'kombilo1'),
                           'data': None, 'disabled': 0})
K.loadDBs()

# let us check whether this worked
print K.gamelist.noOfGames(), 'games in database.'

# define a search pattern
p = Pattern('''
            .......
            .......
            ...X...
            ....X..
            ...OX..
            ...OO..
            .......
            ''', ptype=CORNER_NE_PATTERN, sizeX=7, sizeY=7)

# start pattern search
K.patternSearch(p)

# print some information
print K.patternSearchDetails()

For a slightly extended example, see basic_pattern_search. Instead of appending items to K.gamelist.DBlist manually as above, you can also use the GameList.populateDBlist() method; see e.g. sgftree.

The scripts in the examples directory

basic_pattern_search.py

sgftree

This script takes an initial position, searches for it in the given database, and then searches for all continuations, then for all continuations in the newly found results etc. In this way, a tree of positions is computed, and in the end everything is written into an SGF file, with some information about the search results at each step. (Note that this functionality is also available from within Kombilo.)

Before starting the script, you need to write a configuration file. Information about the databases to be used goes in the [databases] section, and the options for the script are set in the [options] section.

Mandatory options are

output # name of the output file

Further options:

initialposition # the initial position; see below for examples, default: empty board
boardsize # board size, default: 19,
anchors # the rectangle within which the top left corner of the search pattern
        # may move, default: (0, 0, 0, 0),
selection # the region on the board which is used as the search pattern,
          # default: ((0, 0), (18, 18)),

depth # the highest move number that is considered, default: 10
min_number_of_hits # variations with fewer hits are not considered (black/white
                   # continuations are considered separately), default: 10
max_number_of_branches # if there are more continuations,
                       # only those with the most hits are considered, default: 20

gisearch # a query text for a game info search to be carried out
         # before the pattern searches, default: no game info search
reset_game_list # should each search start from the initial game list, or from
                # the list resulting from the search for the parent node?
                # (This determines whether for some node all games featuring
                # this position should be seen, or only those where it arose
                # by the same sequence of moves as in the SGF file),
                # default: False
comment_head # text that should be prepended to every comment,
             # default: @@monospace

The default value for comment_head is @@monospace, which causes Kombilo to display the comment in a fixed-width font. This is useful for output in tabular form.
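The interplay of depth, min_number_of_hits and max_number_of_branches can be pictured with a small, self-contained sketch. It does not use Kombilo at all: the table TOY_HITS and the function find_continuations are hypothetical stand-ins for a pattern search, black and white continuations are not treated separately, and the real sgftree script may organize its traversal differently.

```python
# Toy stand-in for a pattern search: maps a move sequence to the
# continuations found after it, with a hit count for each (made-up numbers).
TOY_HITS = {
    (): {"pd": 120, "dp": 80, "qq": 3},
    ("pd",): {"dp": 60, "dd": 40},
    ("dp",): {"pd": 50},
    ("pd", "dp"): {"pp": 25, "dd": 8},
}


def find_continuations(position):
    """Hypothetical search: return {move: hits} for the given position."""
    return TOY_HITS.get(position, {})


def build_tree(position=(), depth=10, min_hits=10, max_branches=20):
    """Recursively expand a position into a nested dict of continuations."""
    if len(position) >= depth:
        return {}  # the next move would exceed the highest move number
    hits = find_continuations(position)
    # drop rare continuations, then keep only the most frequent ones
    frequent = [(move, n) for move, n in hits.items() if n >= min_hits]
    frequent.sort(key=lambda item: item[1], reverse=True)
    tree = {}
    for move, n in frequent[:max_branches]:
        tree[move] = {
            "hits": n,
            "children": build_tree(position + (move,), depth,
                                   min_hits, max_branches),
        }
    return tree


tree = build_tree(depth=3, min_hits=10, max_branches=2)
```

In this toy run the continuation qq is dropped because it has fewer than min_hits hits, and no node deeper than move 3 is created.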

In the [searchoptions] section, you can pass search options to Kombilo; possible choices are

fixedColor, nextMove, searchInVariations, moveLimit

Example config file (starting from the empty board):

[databases]
d0 = /home/ug/go/gogod10W, /home/ug/go/gogod10W, kombilo1
d1 = /home/ug/go/go4go, /home/ug/go/go4go, kombilo1
d2 = /home/ug/go/go4goN, /home/ug/go/go4goN, kombilo1
[options]
output = out1.sgf
# start with empty board:
initialposition = '(;)'
depth = 15
min_number_of_hits = 20
max_number_of_branches = 20
[searchoptions]
fixedColor = 1

Example config file (starting with opposing san ren sei):

[databases]
d0 = /home/ug/go/gogod11W, /home/ug/go/gogod11W, kombilo3
[options]
output = out2.sgf
initialposition = '(;AB[pd][pp][pj]AW[dd][dp][dj])'
depth = 15
min_number_of_hits = 5
max_number_of_branches = 20
[searchoptions]
fixedColor = 1
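A configuration file of this shape can be read with Python's standard configparser module. The following is only a sketch of one way to do it, written for Python 3 (the module is called ConfigParser in Python 2); it is not taken from the sgftree script, and the names CONFIG, DEFAULTS, databases and options are chosen here for illustration. The defaults are the ones listed in the option table above.

```python
import configparser

CONFIG = """
[databases]
d0 = /home/ug/go/gogod11W, /home/ug/go/gogod11W, kombilo3
[options]
output = out2.sgf
initialposition = '(;AB[pd][pp][pj]AW[dd][dp][dj])'
depth = 15
min_number_of_hits = 5
max_number_of_branches = 20
[searchoptions]
fixedColor = 1
"""

# Defaults as documented above (kept as strings, like parsed values).
DEFAULTS = {
    "boardsize": "19",
    "depth": "10",
    "min_number_of_hits": "10",
    "max_number_of_branches": "20",
    "reset_game_list": "False",
    "comment_head": "@@monospace",
}

parser = configparser.ConfigParser(interpolation=None)
parser.optionxform = str  # keep the case of keys such as fixedColor
parser.read_string(CONFIG)

# Each database line becomes a list [sgfpath, database path, database name].
databases = {
    key: [part.strip() for part in value.split(",")]
    for key, value in parser["databases"].items()
}

options = dict(DEFAULTS)
options.update(parser["options"])
initial_position = options["initialposition"].strip("'")
depth = int(options["depth"])
search_options = dict(parser["searchoptions"])
```

The resulting databases dictionary has the three-entry-list shape that GameList.populateDBlist() is documented to accept.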

profiler

This script performs a number of pattern searches and writes an HTML file with information about the results and the time used for the searches. This makes it easy to compare Kombilo performance with different search parameters. By invoking the script for different versions of the underlying libkombilo library, you can also experiment with changes to the search algorithms, or compare new algorithms to the existing ones.

Usage:

Invoke the script as

./profiler.py s1

where s1 is a subdirectory containing the following files.

Mandatory files:

kombilo1.d* # kombilo database files  
hgsummary # a text file whose first line should contain information about the
          # revision (inside the hg source code repository) of libkombilo
          # used in this instance; to get started, just put the date (or
          # anything) as the first line of a text file.
jquery.js # The jQuery (http://jquery.com) JavaScript library, which is
          # used in the HTML file produced by the script. Obtain a current
          # version from the jQuery web site.

Optional files:

libkombilo.py, _libkombilo.so # the files providing the libkombilo library
                              # If you do not put them in the subdirectory,
                              # they are taken from the src/ directory of
                              # your Kombilo installation.

Of course, you could easily change the script to read the database from a different path or to use more than one database.
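The kind of measurement such a script performs can be sketched in a few lines. This is not the code of profiler.py: the functions time_searches and to_html are hypothetical, and the two workloads merely stand in for pattern searches so that the sketch runs without Kombilo.

```python
import time


def time_searches(searches):
    """Run each (label, callable) pair once and record the wall-clock time."""
    rows = []
    for label, run in searches:
        start = time.time()
        result = run()
        rows.append((label, result, time.time() - start))
    return rows


def to_html(rows):
    """Render the measurements as a small HTML table."""
    cells = "".join(
        "<tr><td>%s</td><td>%s</td><td>%.3f s</td></tr>" % row for row in rows
    )
    header = "<tr><th>search</th><th>result</th><th>time</th></tr>"
    return "<table>%s%s</table>" % (header, cells)


# Hypothetical workloads standing in for pattern searches.
rows = time_searches([
    ("sum of a range", lambda: sum(range(10 ** 5))),
    ("length of a sorted list", lambda: len(sorted(range(10 ** 4), reverse=True))),
])
html = to_html(rows)
```

With Kombilo available, each callable would wrap a call such as K.patternSearch(p) with a particular set of search options.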

API

The kombiloNG module

The kombiloNG module provides much of the Kombilo functionality without the Graphical User Interface. You can use it to do pattern searches etc. in your Python scripts.

class kombiloNG.Cursor(*args, **kwargs)

A Cursor which is used to traverse an SGF file. See the documentation of the sgf module for further details.

class kombiloNG.GameList

A Kombilo list of games. The list can consist of several Kombilo databases. You do not construct instances of this class yourself. Rather, every KEngine instance K has a unique instance K.gamelist of GameList.

As in Kombilo, the GameList maintains a list of games that are “currently visible” (think of all games matching some pattern). All search methods and many other methods work with this “current list”.

addTag(tag, index)

Set tag on game at position index in the current list.

exportTags(filename, which_tags=[])

Export all tags in all non-disabled databases into the file specified by filename.

If which_tags is specified, then it has to be a list of positive integers, and only the tags in the list are exported.

getIndex(i)

Returns dbIndex, j, such that self.DBlist[dbIndex]['current'][j] corresponds to the i-th entry of the current list of games.
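The mapping can be illustrated with plain dictionaries. The sketch below assumes that the current list is the concatenation of the 'current' lists of all non-disabled databases in DBlist order, and it returns (-1, -1) for an index outside the list; both points are assumptions made for this illustration, and get_index is a stand-in, not the method itself.

```python
def get_index(dblist, i):
    """Map position i in the combined current list to (dbIndex, j)."""
    if i < 0:
        return -1, -1
    for db_index, db in enumerate(dblist):
        if db["disabled"]:
            continue  # disabled databases contribute no games
        if i < len(db["current"]):
            return db_index, i
        i -= len(db["current"])
    return -1, -1  # assumed "not found" value


# Toy DBlist: only the keys used by the sketch are present.
toy_dblist = [
    {"disabled": 0, "current": [4, 7]},
    {"disabled": 1, "current": []},
    {"disabled": 0, "current": [0, 1, 9]},
]

first = get_index(toy_dblist, 0)
third = get_index(toy_dblist, 2)
```

Here the third entry of the current list is the first visible game of the database at position 2 in the toy DBlist.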

getProperty(index, prop)

Return a property of the game at position index in the current list of games. Here prop should be one of the following constants:

  • GL_FILENAME - the filename
  • GL_PB - the black player
  • GL_PW - the white player
  • GL_RESULT - the result
  • GL_SIGNATURE - the symmetrized Dyer signature
  • GL_DATE - the date.

getSGF(index)

Return the SGF source of the game at position index in the current list of games.

This returns the SGF if sgfInDB was True when processing the db; otherwise it returns the root node SGF.

getTags(index)

Get all tags of the game at position index in the current list.

get_data(i, showTags=True)

Return entry in line i of current list of games (as it appears in the Kombilo game list window).

importTags(filename)

The file given by filename should be a file to which tags have previously been exported using exportTags().

This method imports all the tags into the current databases. The games are identified by the Dyer signature together with a hash value of their final position. So unless there are duplicates in the database, this should put the tags on those games where they were before exporting. In case of duplicates, all duplicates will receive the corresponding tags.
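The effect of identifying games by signature and final-position hash, including the behaviour for duplicates, can be pictured with ordinary dictionaries. The record layout, the key names and the function apply_tags below are hypothetical and only serve the illustration; they do not reflect how Kombilo stores its data.

```python
def apply_tags(games, exported):
    """Attach exported tags to every game with a matching identity key."""
    for game in games:
        key = (game["signature"], game["final_hash"])
        game["tags"].update(exported.get(key, ()))


# Exported tags, keyed by (Dyer signature, hash of the final position).
exported = {("sig-A", 111): {3}, ("sig-B", 222): {5, 7}}

# Toy database with one duplicated game (two records share a key).
games = [
    {"signature": "sig-A", "final_hash": 111, "tags": set()},
    {"signature": "sig-A", "final_hash": 111, "tags": set()},
    {"signature": "sig-B", "final_hash": 222, "tags": set()},
    {"signature": "sig-A", "final_hash": 999, "tags": set()},
]
apply_tags(games, exported)
```

Both copies of the duplicated game end up with tag 3, while the last record, which shares only the signature, stays untagged.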

listOfCurrentSGFFiles()

Return a list of file names for all SGF files of games in the current list of games.

noOfGames()

Return the number of games in the current list of games.

noOfHits()

Return the number of hits for the last pattern search.

noOfSwitched()

Return the number of hits where the colors are reversed for the last pattern search.

populateDBlist(d)

Add the databases specified in the dictionary d to this GameList. d must have the following format:

For each key k, d[k] is a list of three entries. The first entry is the sgfpath, i.e. the root folder where (and below which) the SGF files of this database are stored.

The second entry is the path where the Kombilo database files are stored, and the third entry is the name of these database files, without the extension.

The keys are assumed to be strings. If k ends with 'disabled', then the disabled flag will be set for the corresponding database.

After adding the databases in this way, you must call KEngine.loadDBs() to load the database files.
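A dictionary in this format can be assembled as follows. The helper make_db_dict and the key scheme d0, d1, ... are assumptions made for this sketch (any string keys should do according to the description above); the paths are taken from the sgftree example configuration.

```python
def make_db_dict(databases):
    """Build a populateDBlist-style dictionary from 4-tuples
    (sgfpath, database path, database name, disabled)."""
    d = {}
    for n, (sgfpath, dbpath, name, disabled) in enumerate(databases):
        key = "d%d" % n
        if disabled:
            key += "disabled"  # a key ending in 'disabled' sets the flag
        d[key] = [sgfpath, dbpath, name]
    return d


d = make_db_dict([
    ("/home/ug/go/gogod10W", "/home/ug/go/gogod10W", "kombilo1", False),
    ("/home/ug/go/go4go", "/home/ug/go/go4go", "kombilo1", True),
])

# With Kombilo available, one would then call (not executed here):
#   K.gamelist.populateDBlist(d)
#   K.loadDBs()
```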

printGameInfo(index)

Return a pair whose first entry is a string containing the game info for the game at index. The second entry is a string giving the reference to commentaries in the literature, if available.

printSignature(index)

Return the symmetrized Dyer signature of the game at index in the current list of games.

reset()

Reset the list so that it includes all the games from self.data.

class kombiloNG.KEngine

This is the class you use to access the Kombilo search functionality.

After instantiating it, you need to tell the gamelist which databases you want to use, e.g. using GameList.populateDBlist(), and then call loadDBs(). Afterwards you can use patternSearch(), for instance.

See the Kombilo documentation for further information on how to get started.

Further notes.

After a pattern search, the continuations are assembled into the list self.continuations, whose entries are instances of lk.Continuation. Each entry stores:

  • total number of hits
  • position in currentSearchPattern
  • number of black continuations
  • number of black wins after black play here
  • number of black losses after black play here
  • number of black plays here after tenuki
  • number of white continuations
  • number of black wins after white play here
  • number of black losses after white play here
  • number of white plays here after tenuki
  • label used on the board at this point

addDB(dbp, datap=('', '#'), recursive=True, filenames='*.sgf', acceptDupl=True, strictDuplCheck=True, tagAsPro=0, processVariations=True, algos=None, messages=None, progBar=None, showwarning=None, index=None, all_in_one_db=True, sgfInDB=True, logDuplicates=True, stop_var=None)

Call this method to newly add a database of SGF files.

Parameters:

  • dbp: the path where the sgf files are to be found.
  • datap: the path where the database files will be stored. Leaving the default value means: store database at dbp, with base filename ‘kombilo’. Instead, you can specify a pair (path, filename). Then path/filenameN.d[ab]? will be the locations of the database files. Every Kombilo database consists of several files; they will have names with ? equal to a, b, ... N is a natural number chosen to make the file name unique.
  • recursive: specifies whether subdirectories should be included recursively
  • messages: a ‘message text window’ which receives status messages
  • progBar: a progress bar
  • showwarning: a method which displays warnings (like Tkinter showwarning)
  • index: where to add this in the DBlist (None means: add at end)
  • all_in_one_db: Put all games found in this folder and all its subfolders into one db (rather than creating one db per folder)

addOneFolder(arguments, dbpath, gl=None)

This should really be named add_one_folder: Adds all sgf files in the folder dbpath to gl, or to a newly created GameList.

copyCurrentGamesToFolder(dir)

Copy all SGF files belonging to games in the current list to the folder given as dir.

dateProfile(intervals=None)

Return the absolute numbers of games in the given date intervals among the games in the current list of games.

Default value for intervals is

[ (0, 1900), (1900, 1950), (1950, 1975), (1975, 1985), (1985, 1992),
(1992, 1997), (1997, 2002), (2002, 2006), (2006, 2009), (2009, 2013),
]

dateProfileRelative()

Return the ratios of games in the current list versus games in the whole database, for each of the date intervals specified in dateProfile().
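The two computations can be sketched on plain lists of years. The sketch assumes that each interval (start, end) includes start and excludes end, that games are reduced to an integer year, and that an interval without any games in the whole database gets ratio 0.0; none of this is confirmed by the description above, and the function names are chosen for the illustration.

```python
INTERVALS = [
    (0, 1900), (1900, 1950), (1950, 1975), (1975, 1985), (1985, 1992),
    (1992, 1997), (1997, 2002), (2002, 2006), (2006, 2009), (2009, 2013),
]


def date_profile(years, intervals=INTERVALS):
    """Count the games falling into each interval (start <= year < end)."""
    return [sum(1 for y in years if start <= y < end)
            for start, end in intervals]


def date_profile_relative(current_years, all_years, intervals=INTERVALS):
    """Ratio of current games to all games, interval by interval."""
    current = date_profile(current_years, intervals)
    total = date_profile(all_years, intervals)
    return [c / float(t) if t else 0.0 for c, t in zip(current, total)]


# Toy data: years of all games, and of the games in the current list.
all_years = [1850, 1930, 1930, 1960, 2000, 2000, 2000, 2010]
current_years = [1930, 2000, 2000]

absolute = date_profile(all_years)
relative = date_profile_relative(current_years, all_years)
```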

gameinfoSearch(query)

Do a game info search on the current list of games.

  • query provides the query as part of an SQL clause which can be used as an SQL WHERE clause. Examples:

    date >= '2000-03-00'
    PB = 'Cho Chikun'
    PB like 'Cho%'
    PW like 'Go Seigen' and not PB like 'Hashimoto%'
    

    After the like operator, you can use the percent sign % as a wildcard to match arbitrary text.

    The columns in the database are

    PB (player black)
    PW (player white)
    RE (result)
    EV (event)
    DT (the date as given in the sgf file)
    date (the date in the form YYYY-MM-DD)
    filename
    sgf (the full SGF source).

gameinfoSearchNC(query)

Returns the number of games matching the given query (see gameinfoSearch() for the format of the query) without changing the list of current games.
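To get a feeling for such queries without a Kombilo database, the example clauses can be tried against a throw-away SQLite table that uses some of the column names listed above. The table name games, the reduced set of columns and the rows (invented placeholders, not real game records) are assumptions of this sketch; count() mimics the counting behaviour of gameinfoSearchNC() only in spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table games (PB text, PW text, date text)")
conn.executemany(
    "insert into games values (?, ?, ?)",
    [
        # invented placeholder rows
        ("Cho Chikun", "Player A", "1990-01-01"),
        ("Cho U", "Player B", "2005-01-01"),
        ("Hashimoto Utaro", "Go Seigen", "1950-01-01"),
        ("Player C", "Go Seigen", "1940-01-01"),
    ],
)


def count(query):
    """Count the rows matching a WHERE clause given as text."""
    sql = "select count(*) from games where " + query
    return conn.execute(sql).fetchone()[0]


recent = count("date >= '2000-03-00'")
exact = count("PB = 'Cho Chikun'")
prefix = count("PB like 'Cho%'")
combined = count("PW like 'Go Seigen' and not PB like 'Hashimoto%'")
```

Because the dates are stored as YYYY-MM-DD text, the comparison in the first query is a plain string comparison, which is why a bound such as '2000-03-00' works.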

get_datapath(datap, dbpath)

Ensure not to overwrite existing files: Add counter to path/file.db such that new Kombilo databases can be written.
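The idea of adding a counter until the name is free can be sketched as follows. The function next_free_basename is hypothetical, and starting the counter at 1 as well as the test for existing files (any file named filenameN.d*) are assumptions; the actual method may differ in these details.

```python
import glob
import os
import tempfile


def next_free_basename(path, filename):
    """Return filenameN for the smallest N >= 1 such that no file
    path/filenameN.d* exists yet."""
    n = 1
    while glob.glob(os.path.join(path, "%s%d.d*" % (filename, n))):
        n += 1
    return "%s%d" % (filename, n)


# Try it in a scratch folder that already holds two toy "databases".
folder = tempfile.mkdtemp()
for existing in ("kombilo1.da", "kombilo1.db", "kombilo2.da"):
    open(os.path.join(folder, existing), "w").close()

free = next_free_basename(folder, "kombilo")
```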

get_pattern_from_node(node, boardsize=19, **kwargs)

Return a full board pattern with the position at node. **kwargs are passed on to Pattern.__init__().

loadDBs(progBar=None, showwarning=None)

Load the database files for all databases that were added to the gamelist.

parseReferencesFile(datafile, options=None)

Parse a file with references to commentaries in the literature. See the file src/data/references for the file format.

The method builds up self.gamelist.references, a dictionary which for each Dyer signature has a list of all references for the corresponding game.

datafile is expected to be a “file-like object” (like an opened file) with a .read() method.

patternSearch(CSP, SO=None, CL='ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz123456789', FL={}, progBar=None, sort_criterion=None, update_gamelist=True)

Start a pattern search on the current game list.

  • CSP must be an instance of Pattern - it is the pattern that is searched for.
  • You can specify search options as SO - this must be an instance of lk.SearchOptions (see below).
  • CL, FL, progBar are used with the Kombilo GUI.
  • sort_criterion will be used for sorting the continuations:
    • total: by number of occurrences
    • earliest: by earliest occurrence (earliest first)
    • latest: by latest occurrence (latest date first)
    • average: by average date of occurrence (earliest date first)
    • became popular: by weighted average which tries to measure when the move became popular (earliest date first)
    • became unpopular: by weighted average which tries to measure when the move became unpopular (latest date first)

Search options. Create an instance of lk.SearchOptions by

so = lk.SearchOptions()

You can then set particular options on so, e.g.:

so.fixedColor = 1
so.searchInVariations = False

Available options:

  • fixedColor, values: 0 = also search for pattern with colors reversed; 1 = fix colors as given in pattern; default value is 0

  • nextMove, values: 0 = either player moves next, 1 = next move must be black, 2 = next move must be white; default value is 0

  • moveLimit, positive integer; pattern must occur at this move in the game or earlier; default value is 10000

  • trustHashFull, boolean, values: true = do not use ALGO_MOVELIST to confirm a hit given by ALGO_HASH_FULL, false = use ALGO_MOVELIST to confirm it; default value is false

  • searchInVariations, boolean; default value is true

  • algos, an integer which specifies which algorithms should be used; in practice, use one of the following:

    lk.ALGO_FINALPOS | lk.ALGO_MOVELIST
    lk.ALGO_FINALPOS | lk.ALGO_MOVELIST | lk.ALGO_HASH_FULL
    lk.ALGO_FINALPOS | lk.ALGO_MOVELIST | lk.ALGO_HASH_FULL | lk.ALGO_HASH_CORNER
    

    The default is to use all available algorithms.

patternSearchDetails(exportMode='ascii', showAllCont=False)

Returns a string with information on the most recent pattern search.

signatureSearch(sig)

Do a signature search for the Dyer signature sig.

tagSearch(tag)

Do a tag search on the current game list.

tag can be an expression like H and (X or not M), where H, X, M are abbreviations for tags (i.e. keys in self.gamelist.customTags). In the simplest example, tag == H, i.e. we just search for all games tagged with H.
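What such an expression means for a single game can be made concrete with a small evaluator built on Python's ast module. This is only an illustration of the semantics, not the way Kombilo performs tag searches; the function matches is hypothetical, and it assumes that tag abbreviations are valid Python identifiers.

```python
import ast


def matches(expression, tags):
    """Evaluate a boolean tag expression against a set of tag abbreviations."""

    def evaluate(node):
        if isinstance(node, ast.BoolOp):
            results = [evaluate(value) for value in node.values]
            if isinstance(node.op, ast.And):
                return all(results)
            return any(results)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not evaluate(node.operand)
        if isinstance(node, ast.Name):
            return node.id in tags  # a bare name asks: does the game carry it?
        raise ValueError("unsupported expression: %r" % expression)

    return evaluate(ast.parse(expression, mode="eval").body)


only_h = matches("H and (X or not M)", {"H"})
h_and_m = matches("H and (X or not M)", {"H", "M"})
```

A game tagged only with H matches the expression, whereas a game tagged with H and M (but not X) does not.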

class kombiloNG.Node(*args, **kwargs)

A Node of an SGF file. Also see the documentation of the sgf module.

class kombiloNG.Pattern(p, **kwargs)

A pattern, i.e., a configuration of black and white stones (and empty spots, and possibly wildcards) on a portion of the go board.

To create a pattern, pass the following arguments to Pattern:

  • p: The pattern as a string (...XXO..X). Blanks and line breaks will be ignored. Commas (to mark hoshis) will be replaced by periods.

  • ptype (optional): one of

    CORNER_NW_PATTERN, CORNER_NE_PATTERN, CORNER_SW_PATTERN, CORNER_SE_PATTERN
    # fixed in specified corner
    
    SIDE_N_PATTERN, SIDE_W_PATTERN, SIDE_E_PATTERN, SIDE_S_PATTERN
    # slides along specified side
    
    CENTER_PATTERN
    # movable in center
    
    FULLBOARD_PATTERN.
    
  • sizeX, sizeY: the size (horizontal/vertical) of the pattern (not needed, if ptype is FULLBOARD_PATTERN).

  • anchors (optional): A tuple (right, left, top, bottom) which describes the rectangle containing all permissible positions for the top left corner of the pattern.

One of ptype and anchors must be present. If ptype is given, then anchors will be ignored.

  • contlist (optional): A list of continuations, in SGF format, e.g. ;B[qq];W[de];B[gf],
  • topleft (optional): a pair of coordinates, specifying the top left corner of the pattern, needed for translating contlist into coordinates relative to the pattern
  • contsinpattern (optional; used only if contlist is not given): X (black) or O (white). If given, the labels 1, 2, 3, ... in the pattern are extracted and handled as continuations, with 1 played by the specified color.
  • contLabels (optional): A string of same size as p, with labels that should be used for labelling continuations.
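The treatment of the pattern string p described above (blanks and line breaks ignored, commas turned into periods) can be reproduced in plain Python, which is handy for checking that a pattern really has sizeX * sizeY points before passing it to Pattern. The function normalize_pattern is a hypothetical helper written for this sketch and is not part of kombiloNG.

```python
def normalize_pattern(p, size_x, size_y):
    """Strip whitespace, replace hoshi commas and split p into rows."""
    cleaned = "".join(p.split()).replace(",", ".")
    if len(cleaned) != size_x * size_y:
        raise ValueError("expected %d points, got %d"
                         % (size_x * size_y, len(cleaned)))
    return [cleaned[row * size_x:(row + 1) * size_x]
            for row in range(size_y)]


# The 7x7 corner pattern from the introductory example.
rows = normalize_pattern('''
            .......
            .......
            ...X...
            ....X..
            ...OX..
            ...OO..
            .......
            ''', 7, 7)

small = normalize_pattern(".,.\nXO.", 3, 2)
```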

Warning

Continuation and captures

With the Pattern class it is not currently possible to deal with captures made by one of the moves of the continuation list. While the libkombilo library allows this, I have yet to think of a good interface for accessing this functionality.

getInitialPosAsList(hoshi=False, boundary=False)

Export current pattern as list of lists, like [ ['.', 'X', '.'], ['O', '.', '.'] ]

If boundary==True, a boundary of spaces, '-', '|', '+'s is added. If hoshi==True, hoshi points are marked with ','. (Of course, this is only applicable for fullboard or corner patterns, or patterns with fixed anchor.)

kombiloNG.translateRE(s)

Try to provide accurate translation of REsult string in an SGF file. See also the notes of Andries Brouwer at http://homepages.cwi.nl/~aeb/go/misc/sgfnotes.html

The sgf module

The sgf module provides functionality for handling SGF files.

class sgf.Cursor(sgf, sloppy=False, encoding='utf8')

The Cursor class takes SGF data (as a string) and provides methods to traverse the game and to retrieve the information for each node stored in the SGF file.

To create a Cursor instance, call Cursor with the following arguments:

  • sgf: The SGF data as a string.
  • sloppy (optional, default is False): If this is True, then the parser tries to ignore deviations from the SGF format.
  • encoding (optional, default is 'utf-8'): This option is currently not used. Later, the parser will decode the file with the specified encoding.

currentNode()

Get an instance of class Node for the node the cursor currently points to.

exportGame(gameNumber=None)

Return a string with the game attached to self in SGF format (with character encoding utf-8!). Depending on gameNumber:

  • if None: only the currently “active” game in the collection is written (as specified by self.currentGame)
  • if an integer: the game specified by gameNumber is written
  • if a tuple of integers: the games for this tuple are written
  • if 'ALL': all games are written

getRootNode(n)

Get the first node of the n-th game of this SGF game collection. Typically, SGF files contain only a single game; getRootNode(0) will give you its root node.

next(n=0, markCurrent=None)

Go to n-th child of current node. Default for n is 0, so if there are no variations, you can traverse the game by repeatedly calling next().

noChildren()

Returns the number of children of the current node, i.e. the number of variations starting here.

previous()

Go to the previous node.

updateRootNode(data, n=0)

Update the root node of the n-th game in this collection.

data is a dictionary which maps SGF properties like PB, PW, ... to their values.

class sgf.Node(node)

The Node class represents a single node in a game. This class is a wrapper for the lk.Node class. It has dictionary-style access to SGF property values.

This class does not inherit from lk.Node. To construct a Node, pass an lk.Node instance to __init__. It is stored as self.n.

You can check whether a Node node has an SGF property and retrieve its value like this: if 'B' in node: print node['B']. Similarly, using node['B'] = ('pp', ) and del node['B'] you can set values and delete properties from node.

add_property_value(ID, item)

Add item to the list self[ID].

get_move_number()

Returns the move number where the node sits inside the game. This is a list of non-negative integers. The entries with even indices mean “go right by this amount in the game tree”; the entries at odd places mean “go down as many steps as indicated, i.e. pass to the corresponding sibling”.

pathToNode()

Returns the 'path' to the specified node in the following format: [0,1,0,2,0] means: from rootNode, go

  • to next move (0-th variation), then
  • to first variation, then
  • to next move (0-th var.), then
  • to second variation, then
  • to next move.

In other words, a cursor c pointing to rootNode can be moved to n by for i in n.pathToNode(): c.next(i).
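The path format can be tried out on a toy tree that needs neither the sgf module nor an SGF file. The class ToyNode and the function follow are hypothetical stand-ins for sgf.Node and for a cursor calling next(i) repeatedly; they only mirror the indexing convention described above.

```python
class ToyNode(object):
    """Minimal tree node: a label, a parent and a list of children."""

    def __init__(self, label):
        self.label = label
        self.parent = None
        self.children = []

    def add(self, label):
        child = ToyNode(label)
        child.parent = self
        self.children.append(child)
        return child

    def path_to_node(self):
        """Child indices to follow from the root to reach this node."""
        path = []
        node = self
        while node.parent is not None:
            path.append(node.parent.children.index(node))
            node = node.parent
        return path[::-1]


def follow(root, path):
    """Walk from root, taking the i-th child at every step."""
    node = root
    for i in path:
        node = node.children[i]
    return node


# Build a tree in which the node 'target' has the path [0, 1, 0, 2, 0].
root = ToyNode("root")
a = root.add("a")
a.add("b0")
b1 = a.add("b1")
c = b1.add("c")
c.add("d0")
c.add("d1")
d2 = c.add("d2")
target = d2.add("target")

path = target.path_to_node()
```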

remove(ID, item)

Remove item from the list self.n[ID].

Libkombilo constants

Pattern types:

CORNER_NW_PATTERN
CORNER_NE_PATTERN
CORNER_SW_PATTERN
CORNER_SE_PATTERN

SIDE_N_PATTERN
SIDE_W_PATTERN
SIDE_E_PATTERN
SIDE_S_PATTERN

CENTER_PATTERN

FULLBOARD_PATTERN

Algorithms:

ALGO_FINALPOS
ALGO_MOVELIST
ALGO_HASH_CORNER
ALGO_HASH_FULL

Libkombilo

To study the underlying C++ library in detail, look at the Libkombilo documentation. A good starting point is the cpptest.cpp program in lk/examples.