SHOGUN  4.0.0
GETTING STARTED {#developer}
===============

Shogun is split up into libshogun, which contains all the machine learning
algorithms and 'static interfaces' helpers, the static interfaces
python_static, octave_static, matlab_static and r_static, and the modular
interfaces python_modular, octave_modular and r_modular (all found in the
src/interfaces/ subdirectory under the corresponding name). See src/INSTALL
for how to install shogun.

If you want to extend shogun, the best way to start is by using its library.
This is easy to do, as a number of examples in examples/libshogun demonstrate.

The simplest libshogun-based program would be

    #include <shogun/base/init.h>

    using namespace shogun;

    int main(int argc, char** argv)
    {
        init_shogun();
        exit_shogun();
        return 0;
    }

which could be compiled with `g++ minimal.cpp -o minimal -lshogun` (the library
goes after the source file so the linker can resolve its symbols). It
obviously does nothing, apart from initializing and destroying a couple of
global shogun objects internally.

If you want to redirect shogun's output functions SG_DEBUG, SG_INFO, SG_WARN,
SG_ERROR, SG_PRINT etc., pass your own print functions to init_shogun() as
parameters, like this:

    #include <cstdio>
    #include <shogun/base/init.h>

    using namespace shogun;

    void print_message(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    void print_warning(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    void print_error(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    int main(int argc, char** argv)
    {
        init_shogun(&print_message, &print_warning,
                &print_error);
        exit_shogun();
        return 0;
    }
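
Each callback only has to match the signature `void (*)(FILE* target, const char* str)`, so it can do more than forward the string. The sketch below is a hypothetical warning callback. The name print_warning_tagged is illustrative and not part of shogun. It tags every message before writing it to the stream that shogun passes in:

```cpp
#include <cstdio>

// Hypothetical callback for the print_warning slot of init_shogun().
// It has the required signature void (*)(FILE*, const char*) and prefixes
// every warning with a tag before writing it to the target stream.
void print_warning_tagged(FILE* target, const char* str)
{
    fprintf(target, "[warning] %s", str);
}
```

It would be registered in place of print_warning, e.g. `init_shogun(&print_message, &print_warning_tagged, &print_error);`.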

To finally see some action, include the appropriate header files. The
following example creates some features, binary labels and a Gaussian kernel,
and trains an SVM on them:

    #include <shogun/labels/BinaryLabels.h>
    #include <shogun/features/DenseFeatures.h>
    #include <shogun/kernel/GaussianKernel.h>
    #include <shogun/classifier/svm/LibSVM.h>
    #include <shogun/base/init.h>
    #include <shogun/lib/common.h>
    #include <shogun/io/SGIO.h>

    using namespace shogun;

    void print_message(FILE* target, const char* str)
    {
        fprintf(target, "%s", str);
    }

    int main(int argc, char** argv)
    {
        init_shogun(&print_message);

        // create some data
        SGMatrix<float64_t> matrix(2,3);
        for (int32_t i=0; i<6; i++)
            matrix.matrix[i]=i;

        // create three 2-dimensional vectors
        // shogun will now own the matrix created
        CDenseFeatures<float64_t>* features=new CDenseFeatures<float64_t>();
        features->set_feature_matrix(matrix);

        // create three labels
        CBinaryLabels* labels=new CBinaryLabels(3);
        labels->set_label(0, -1);
        labels->set_label(1, +1);
        labels->set_label(2, -1);

        // create gaussian kernel with cache 10MB, width 0.5
        CGaussianKernel* kernel = new CGaussianKernel(10, 0.5);
        kernel->init(features, features);

        // create libsvm with C=10 and train
        CLibSVM* svm = new CLibSVM(10, kernel, labels);
        svm->train();

        // classify on training examples
        for (int32_t i=0; i<3; i++)
            SG_SPRINT("output[%d]=%f\n", i, svm->apply_one(i));

        // free up memory
        SG_UNREF(svm);

        exit_shogun();
        return 0;
    }
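
To see which vectors the example builds and what the kernel computes on them, here is a plain C++ sketch that makes no shogun calls. It assumes that SGMatrix stores its data column-major, so each column is one feature vector. It also assumes that the width τ of CGaussianKernel enters as k(x, y) = exp(-||x - y||² / τ). The helpers element() and gaussian_kernel_matrix() are hypothetical, and the LibSVM training step is not reproduced.

```cpp
#include <cmath>
#include <vector>

// Plain C++ sketch, no shogun calls. Assumption: like SGMatrix, the buffer is
// column-major, so entry (row, col) of a num_rows x num_cols matrix is stored
// at index row + col * num_rows, and each column is one feature vector.
double element(const std::vector<double>& buf, int num_rows, int row, int col)
{
    return buf[row + col * num_rows];
}

// Kernel matrix between all pairs of columns, assuming the Gaussian kernel
// convention k(x, y) = exp(-||x - y||^2 / width).
std::vector<std::vector<double> > gaussian_kernel_matrix(
    const std::vector<double>& buf, int num_rows, int num_cols, double width)
{
    std::vector<std::vector<double> > K(num_cols, std::vector<double>(num_cols, 0.0));
    for (int a = 0; a < num_cols; a++)
    {
        for (int b = 0; b < num_cols; b++)
        {
            // squared Euclidean distance between column a and column b
            double sq_dist = 0.0;
            for (int r = 0; r < num_rows; r++)
            {
                double d = element(buf, num_rows, r, a) - element(buf, num_rows, r, b);
                sq_dist += d * d;
            }
            K[a][b] = std::exp(-sq_dist / width);
        }
    }
    return K;
}
```

Filled with 0..5 as in the example, the three columns are (0,1), (2,3) and (4,5). With width 0.5, neighbouring vectors have squared distance 8, so their kernel value is exp(-16), roughly 1.1e-7.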

Now you probably wonder why this example does not leak memory. First of all,
supplying pointers to arrays allocated with new[] makes shogun objects own
these arrays and clean them up when the object is destroyed. Second, every
shogun object keeps an internal reference counter. Whenever a shogun object
is returned or supplied as an argument to some function, its reference counter
is increased. For example, in the example above

    CLibSVM* svm = new CLibSVM(10, kernel, labels);

increases the reference count of kernel and labels. When an object is
destroyed, it decreases the reference counters of the objects it holds, and
each of those is freed if its counter drops to <= 0.

It is therefore your duty to prevent objects from being destroyed if you keep
a handle to them globally *which you still intend to use later*. You can do
this with SG_REF(obj). In the example above, accessing labels after the call
to SG_UNREF(svm) will cause a segmentation fault, as the labels object was
already destroyed in the SVM destructor. Calling SG_REF(labels) before
SG_UNREF(svm) would keep it alive, and you would then call SG_UNREF(labels)
yourself once you are done with it. To decrement the reference count of an
object, call SG_UNREF(obj), which will also automagically destroy the object
if the counter is <= 0 and set obj=NULL only in this case.
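
The counting rules above can be illustrated with a small, self-contained sketch. It does not use shogun: RefCounted, ref(), unref(), Labels and Svm are hypothetical stand-ins that only mimic the behaviour described here. ref() increments the counter. unref() decrements it, deletes the object at <= 0 and nulls the pointer only then. An owner refs what it is given and unrefs it in its destructor. shogun's real SG_REF/SG_UNREF are macros operating on CSGObject, and their internals may differ.

```cpp
#include <cstddef>

// Hypothetical stand-in for the reference counter kept inside CSGObject.
struct RefCounted
{
    int refcount = 0;
    virtual ~RefCounted() {}
};

// Mimics SG_REF: increase the counter (null-safe).
template <class T>
void ref(T* obj)
{
    if (obj)
        ++obj->refcount;
}

// Mimics SG_UNREF: decrease the counter; if it drops to <= 0, delete the
// object and set the caller's pointer to NULL (only in that case).
template <class T>
void unref(T*& obj)
{
    if (obj && --obj->refcount <= 0)
    {
        delete obj;
        obj = NULL;
    }
}

// Hypothetical labels object that records when it is destroyed.
struct Labels : public RefCounted
{
    bool* destroyed;
    explicit Labels(bool* flag) : destroyed(flag) {}
    ~Labels() { *destroyed = true; }
};

// Hypothetical SVM that, like the CLibSVM constructor described above,
// takes a reference to the labels it receives and releases it on destruction.
struct Svm : public RefCounted
{
    Labels* labels;
    explicit Svm(Labels* lab) : labels(lab) { ref(labels); }
    ~Svm() { unref(labels); }
};
```

Creating an Svm from fresh labels and calling unref() on the Svm frees both, as in the example above. Calling ref() on the labels first leaves them alive with a count of 1 after the Svm is gone, until a final unref() on the labels releases them.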

Generally, all shogun C++ objects are prefixed with C, e.g. CSVM, and are
derived from CSGObject. Since variables in the upper class hierarchy need to
be initialized upon construction of the object, the constructor of the base
class needs to be called in the constructor, e.g. CSVM calls CKernelMachine,
CKernelMachine calls CClassifier, which finally calls CSGObject.

For example, if you implement your own SVM called MySVM, you would do the
following in its constructor:

    class MySVM : public CSVM
    {
    public:
        MySVM() : CSVM()
        {
            // initialize the members specific to MySVM here
        }

        // every CSGObject subclass must provide its name
        virtual const char* get_name() const { return "MySVM"; }
    };
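
C++ always constructs base classes before the body of a derived constructor runs. Naming the base constructor in the initializer list (as in `: CSVM()`) is how you choose which base constructor runs and which arguments it receives. The sketch below uses a hypothetical hierarchy BaseObject -> BaseClassifier -> BaseKernelMachine -> BaseSVM -> DemoSVM, standing in for CSGObject -> CClassifier -> CKernelMachine -> CSVM -> MySVM. None of these are shogun classes. The sketch records the order in which the constructors run:

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for CSGObject -> CClassifier -> CKernelMachine ->
// CSVM -> MySVM. Each constructor first calls its base constructor in the
// initializer list, then appends its own name to the trace.
struct BaseObject
{
    explicit BaseObject(std::vector<std::string>& trace)
    {
        trace.push_back("BaseObject");
    }
    virtual ~BaseObject() {}
};

struct BaseClassifier : public BaseObject
{
    explicit BaseClassifier(std::vector<std::string>& trace) : BaseObject(trace)
    {
        trace.push_back("BaseClassifier");
    }
};

struct BaseKernelMachine : public BaseClassifier
{
    explicit BaseKernelMachine(std::vector<std::string>& trace)
        : BaseClassifier(trace)
    {
        trace.push_back("BaseKernelMachine");
    }
};

struct BaseSVM : public BaseKernelMachine
{
    explicit BaseSVM(std::vector<std::string>& trace) : BaseKernelMachine(trace)
    {
        trace.push_back("BaseSVM");
    }
};

struct DemoSVM : public BaseSVM
{
    explicit DemoSVM(std::vector<std::string>& trace) : BaseSVM(trace)
    {
        trace.push_back("DemoSVM");
    }
};
```

Constructing a DemoSVM leaves the trace as BaseObject, BaseClassifier, BaseKernelMachine, BaseSVM, DemoSVM. None of these classes has a default constructor, so leaving out a base call such as `: BaseSVM(trace)` would not compile. In this way the compiler enforces the rule described above.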

Once you have got your object working, we will happily integrate it into
shogun, provided you follow a number of basic coding conventions, detailed in
the coding style guide (see FORMATTING for formatting instructions, MACROS on
how to use and name macros, TYPES on which types to use, FUNCTIONS on what
functions should look like and NAMING CONVENTIONS for the naming scheme).

*CODING STYLE:*
See [here](Code-style)

*VERSIONING SCHEME:*

The git repo for the project is hosted on GitHub at
https://github.com/shogun-toolbox/shogun. To get started, create your own fork
and clone it ([howto](https://help.github.com/articles/fork-a-repo "GitHub help - Fork a repo")).
Remember to set the upstream remote to the main repo by:

    git remote add upstream git://github.com/shogun-toolbox/shogun.git

It's recommended to create local branches that are linked to branches of
your remote repository. This will make "push" and "pull" work as expected:

    git checkout --track origin/master
    git checkout --track origin/develop

Each time you want to develop a new feature, fix a bug, etc., consider
creating a new branch using:

    git checkout -b new_feature_name

While on the new_feature_name branch, develop your code, commit and do
everything you want.

Once your feature is ready (please prefer larger commits that keep shogun in
a compilable state), rebase your new_feature_name branch on upstream/develop
with:

    git fetch upstream
    git checkout develop
    git rebase upstream/develop
    git checkout new_feature_name
    git rebase develop

Now you can push it to your origin repository (a new branch has no upstream
yet, so name the remote and the branch explicitly):

    git push origin new_feature_name

And finally, send a pull request (PR) to the develop branch of the shogun
repository on GitHub.


- Why rebasing?

    What rebasing does is, in short, "forward-port local commits to the updated
    upstream head". A longer and more detailed illustration with nice figures
    can be found at http://book.git-scm.com/4_rebasing.html. Rebasing (instead
    of merging) keeps the main "commit thread" of the repo a simple linear
    series.

    Rebasing before issuing a pull request also enables us to find and fix any
    potential conflicts early on the developer's side (instead of on the side
    of whoever merges your pull request).

- Multiple pull requests

    You can have multiple pull requests by creating multiple branches. GitHub
    only uses the branch name to identify a pull request, so when you push new
    commits to your remote branch on GitHub, the pull request will be
    updated accordingly.

- Non-fast-forward error

    This error happens when you:

    1. `git checkout -b my-branch`
    2. ... do something ...
    3. ... rebase ...
    4. `git push origin my-branch`
    5. ... do more things ...
    6. ... rebase ...
    7. `git push origin my-branch`

    Git will then complain about a non-fast-forward error and refuse to push
    to the remote my-branch branch. This is because the first push already
    created the my-branch branch in origin. Rebasing afterwards rewrites the
    local history, and since the local history no longer matches the history
    of the remote branch, pushing is not allowed.

    The solution in this situation is to delete your remote branch with

        git push origin :my-branch

    and push again with

        git push origin my-branch

    Note that deleting your remote branch will not delete the pull request
    associated with that branch. As long as you push your branch there again,
    your pull request will be OK.

- Unit testing/Pre-commit hook

    As shogun-toolbox is getting bigger and bigger, code reviews of pull
    requests are getting harder and harder. In order to avoid breaking the
    functionality of the existing code, we highly encourage contributors to
    shogun to use the supplied unit tests, which are based on the Google C++
    Mocking Framework.

    In order to use the unit testing framework, you need to have the Google
    C++ Mocking Framework installed on your machine. The required gmock
    version is 1.7.0 and the gtest version is 1.6.0 (other versions may
    produce errors).

    - [Google Mock](https://code.google.com/p/googlemock/)
    - [Google Test](https://code.google.com/p/googletest/)

    Then run cmake/ccmake with the ENABLE_TESTING switch turned on.

    For example:

        cmake -DENABLE_TESTING=on ..

    Once testing is enabled, if you add new classes to the code, please define
    some basic unit tests for them under ./tests/unit (see some of the
    examples in that directory). As you can see there, the naming convention
    for files that contain unit tests is `<classname>_unittest.cc`.

    Before committing or sending a pull request, please run 'make unit-tests'
    in the root directory in order to check that nothing has been broken by
    your modifications and that the library still behaves as intended.

    One possible way to do this automatically is to add the following code
    snippet to your pre-commit hook (.git/hooks/pre-commit):

        #!/bin/sh

        # run unit testing for basic checks
        # and only allow committing if the unit tests run successfully
        make unit-tests

    This way the unit tests will run automatically before each commit, and if
    they fail, you won't be able to commit until you fix the problem (or
    remove the pre-commit script :P).

    Note that the script must be executable, i.e.

        chmod +x .git/hooks/pre-commit

    You can also test all the examples in shogun/examples to check whether
    your configuration and environment are okay. Please note that some of the
    examples depend on data sets, which must be downloaded beforehand so that
    the tests of those examples can pass. The data can easily be downloaded
    with a git command (please refer to
    [README_data.md](https://github.com/shogun-toolbox/shogun/blob/develop/doc/md/README_data.md)).
    Afterwards, you can test the examples by running:

        make test

To make a release, update the [NEWS](NEWS) file properly, i.e. date and release version (like 3.0.0), adjust the soname if required (cf. [README_soname](README_soname.md)) and, if a new data version is required, add that too. If parameters have changed, increase the parameter version too.
SHOGUN Machine Learning Toolbox - Documentation