
# what is a "big-config"

In this document:

  * <a href="#_when_why_do_we_need_it_">when/why do we need it?</a>
  * <a href="#_how_do_we_use_it_">how do we use it?</a>
      * <a href="#_access_rules_for_groups">access rules for groups</a>
      * <a href="#_access_rules_for_individual_repos_split_config_">access rules for individual repos (split config)</a>
  * <a href="#_other_optimisations">other optimisations</a>
      * <a href="#_disabling_various_defaults">disabling various defaults</a>
      * <a href="#_optimising_the_authkeys_file">optimising the authkeys file</a>
  * <a href="#_what_are_the_downsides_">what are the downsides?</a>
  * <a href="#_storing_usergroup_information_outside_gitolite_like_in_LDAP_">storing usergroup information outside gitolite (like in LDAP)</a>
      * <a href="#_why">why</a>
      * <a href="#_how">how</a>
  * <a href="#_implementation_notes">implementation notes</a>

<a name="_when_why_do_we_need_it_"></a>

### when/why do we need it?

A "big config" is anything that has a few thousand users and a few thousand
repos, resulting in a very large 'compiled' config file.

To understand the problem, consider what happens if you have something like
this in your gitolite conf file:

    @wbr    =   lynx firefox
    @devs   =   alice bob

    repo @wbr
        RW+     next    =   @devs
        RW    master    =   @devs

Without the 'big config' setting, gitolite internally translates this to:

    repo lynx firefox
        RW+     next    =   alice bob
        RW    master    =   alice bob

and then generates the actual config rules once for each user-repo-ref
combination (there are 8 combinations above); the compiled config file looks
somewhat like this:

    %repos = (
      'firefox' => {
        'R' => {
          'alice' => 1,
          'bob' => 1
        },
        'W' => {
          'alice' => 1,
          'bob' => 1
        },
        'alice' => [
          [
            0,
            'refs/heads/next',
            'RW+'
          ],
          [
            4,
            'refs/heads/master',
            'RW'
          ]
        ],
        'bob' => [
          [
            1,
            'refs/heads/next',
            'RW+'
          ],
          [
            5,
            'refs/heads/master',
            'RW'
          ]
        ]
      },
      'lynx' => {
        'R' => {
          'alice' => 1,
          'bob' => 1
        },
        'W' => {
          'alice' => 1,
          'bob' => 1
        },
        'alice' => [
          [
            2,
            'refs/heads/next',
            'RW+'
          ],
          [
            6,
            'refs/heads/master',
            'RW'
          ]
        ],
        'bob' => [
          [
            3,
            'refs/heads/next',
            'RW+'
          ],
          [
            7,
            'refs/heads/master',
            'RW'
          ]
        ]
      }
    );

Phew!

Of course, the output is the same whether you used groups (like `@wbr` and
`@devs` in the example above) or listed the repos directly on the 'repo'
lines.

Anyway, you can imagine what that does when you have 10,000 users and 10,000
repos.  Let's just say it's not pretty :)
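If you want to see that multiplication for yourself, here is a rough shell sketch. It is not gitolite's actual compile code, and `expand_rules` is a made-up name; it just prints one line per user-repo-ref combination, which is conceptually what the compile does without the 'big config' setting:

```sh
#!/bin/sh
# Hypothetical illustration only: print one line for every
# repo/user/ref combination, like the expanded compiled config.
# $1 = repos, $2 = users, $3 = branch names (space-separated lists)
expand_rules() {
    for repo in $1; do
        for user in $2; do
            for ref in $3; do
                echo "$repo $user refs/heads/$ref"
            done
        done
    done
}

# The example from this section: 2 repos x 2 users x 2 refs.
expand_rules "lynx firefox" "alice bob" "next master"
```

Pipe the output through `wc -l` to count the combinations; the count is simply the product of the three list lengths.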

<a name="_how_do_we_use_it_"></a>

### how do we use it?

Just set

    $GL_BIG_CONFIG = 1;

in the `~/.gitolite.rc` file on the server (see next section for more
variables).  When you do that, and push this configuration, one of two things
happens.

<a name="_access_rules_for_groups"></a>

#### access rules for groups

If you used group names in the 'repo' lines (as in `repo @wbr`), then the
compiled config looks like this:

    %repos = (
      '@wbr' => {
        '@devs' => [
          [
            0,
            'refs/heads/next',
            'RW+'
          ],
          [
            1,
            'refs/heads/master',
            'RW'
          ]
        ],
        'R' => {
          '@devs' => 1
        },
        'W' => {
          '@devs' => 1
        }
      }
    );
    %groups = (
      '@devs' => {
        'alice' => 'master',
        'bob' => 'master'
      },
      '@wbr' => {
        'firefox' => 'master',
        'lynx' => 'master'
      }
    );

That's a lot smaller, and allows orders of magnitude more repos and groups to
be supported.
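To get a feel for the difference, here is a back-of-the-envelope sketch. It assumes a single rule paragraph with one repo group and one user group, ignores the 'R' and 'W' hashes, and uses a made-up function name (`compare_sizes`); the numbers are rough entry counts, not file sizes:

```sh
#!/bin/sh
# Hypothetical estimate, not gitolite code.
# $1 = number of repos, $2 = number of users, $3 = number of rules
compare_sizes() {
    repos=$1 users=$2 rules=$3
    # expanded form: every rule repeated for every user of every repo
    echo "expanded: $(( repos * users * rules ))"
    # grouped form: the rules once, plus one membership entry
    # per repo and per user in the groups hash
    echo "grouped:  $(( rules + repos + users ))"
}

compare_sizes 10000 10000 2
```

With 10,000 repos, 10,000 users and two rules, the expanded count is 200,000,000 while the grouped count is 20,002.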

<a name="_access_rules_for_individual_repos_split_config_"></a>

#### access rules for individual repos (split config)

If, on the other hand, you had the repos listed individually (as in
`repo lynx firefox`), then the main config file would now look like this:

    %repos = ();
    %split_conf = (
      'firefox' => 1,
      'lynx' => 1
    );

And each individual repo's configuration would go into its own directory.  For
instance, `~/repositories/lynx.git/gl-conf` would look like this:

    %one_repo = (
      'lynx' => {
        'R' => {
          'alice' => 1,
          'bob' => 1
        },
        'W' => {
          'alice' => 1,
          'bob' => 1
        },
        'alice' => [
          [
            0,
            'refs/heads/next',
            'RW+'
          ],
          [
            4,
            'refs/heads/master',
            'RW'
          ]
        ],
        'bob' => [
          [
            1,
            'refs/heads/next',
            'RW+'
          ],
          [
            5,
            'refs/heads/master',
            'RW'
          ]
        ]
      }
    );

That does not reduce the overall size of the repo config (because you did not
group the repos), but the main repo config is now even smaller!
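If you are curious which repos ended up with a split config, a small sketch like the following lists them. It assumes the layout shown above (a `gl-conf` file inside each `*.git` directory under the repo base); `list_split_repos` is just a name I made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: list repos that have their own gl-conf file.
# $1 = repo base directory (defaults to ~/repositories, as in the
#      example path above)
list_split_repos() {
    base=${1:-$HOME/repositories}
    ( cd "$base" && find . -type f -path '*.git/gl-conf' ) |
        sed -e 's|^\./||' -e 's|\.git/gl-conf$||' |
        sort
}

# Usage (on the server):
#   list_split_repos ~/repositories
```

For the example in this section, the output should match the keys of `%split_conf`.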

<a name="_other_optimisations"></a>

### other optimisations

<a name="_disabling_various_defaults"></a>

#### disabling various defaults

The default RC file contains the following lines (we've already discussed the
first one):

    $GL_BIG_CONFIG = 0;
    $GL_NO_DAEMON_NO_GITWEB = 0;
    $GL_NO_CREATE_REPOS = 0;
    $GL_NO_SETUP_AUTHKEYS = 0;

`GL_NO_DAEMON_NO_GITWEB` is a very useful optimisation that you *must* enable
if you *do* have a large number of repositories, and do *not* use gitolite's
support for gitweb or git-daemon access (see "[easier to specify gitweb
description and gitweb/daemon access][gwd]" for details).  This will save a
lot of time when you push the gitolite-admin repo with changes.  This variable
also controls whether "git config" lines (such as `config hooks.emailprefix =
"[gitolite]"`) will be processed or not.

You should be a lot more careful with `GL_NO_CREATE_REPOS` and
`GL_NO_SETUP_AUTHKEYS`.  These are meant for installations where some backend
system already exists that does all the actual repo creation (including
setting up the proper hooks -- very important for access control) and all the
authentication setup (ssh auth keys), respectively.

Summary: Please **leave those two variables alone** unless your initials are
"JK" ;-)

<a name="_optimising_the_authkeys_file"></a>

#### optimising the authkeys file

Sshd does a linear scan of the `~/.ssh/authorized_keys` file when an incoming
connection shows up.  This means that keys found near the top get served
faster than keys near the bottom.  On my laptop, it takes about 2500 keys
before I notice the delay; on a typical server it could be double that, so
don't worry about all this unless your user-count is in that range.

One way to deal with 5000+ keys is to use customised, database-backed ssh
daemons, but many people are uncomfortable with taking non-standard versions
of such a critical piece of the security infrastructure.  In addition, most
distributions do not make it painless to use them.

So what do you do?

The following trick uses the Pareto principle (a.k.a. the "80-20 rule")
to get an immediate boost in response for the most frequent or prolific
developers.  It can allow you to ignore the problem until the next big
increase in your user counts!

Here's how:

  * create subdirectories of keydir/ called 0, 1, (maybe 2, 3, etc., also),
    and 9.
  * in 0/, put in the pubkeys of the most frequent users
  * in 1/, add the next most important set of users, and so on for 2, 3, etc.
  * finally, put all the rest in 9/

Make sure "9" contains at least 70-90% of the total number of pubkeys,
otherwise this doesn't really help.
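The layout above could be set up with something like the sketch below. It is only an illustration under some assumptions: it uses just two buckets (0 and 9), it assumes one key per user named `user.pub` directly under keydir/, and the file listing your top users (one name per line) is something you supply yourself. The function names are made up. In a real gitolite-admin clone you would probably use `git mv` instead of `mv`.

```sh
#!/bin/sh
# Hypothetical helper: move the top users' pubkeys to keydir/0 and
# everything else to keydir/9.
# $1 = keydir path, $2 = file with the top users, one name per line
bucket_keys() {
    keydir=$1 top_users=$2
    mkdir -p "$keydir/0" "$keydir/9"
    for key in "$keydir"/*.pub; do
        [ -e "$key" ] || continue          # no keys at the top level
        user=$(basename "$key" .pub)
        if grep -qxF -e "$user" "$top_users"; then
            mv "$key" "$keydir/0/"
        else
            mv "$key" "$keydir/9/"
        fi
    done
}

# Hypothetical helper: percentage of all pubkeys that live in keydir/9.
share_in_9() {
    total=$(find "$1" -name '*.pub' | wc -l | tr -d ' ')
    in9=$(find "$1/9" -name '*.pub' | wc -l | tr -d ' ')
    [ "$total" -gt 0 ] || { echo 0; return; }
    echo $(( in9 * 100 / total ))
}

# Usage, from a gitolite-admin clone:
#   bucket_keys keydir top-users.txt
#   share_in_9 keydir      # aim for roughly 70-90
```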

You can easily determine who your top users are by running something like
this (note the clever date command that nearly always gets you last month's
log file!)

    cat .gitolite/logs/gitolite-`date +%Y-%m -d -30days`.log |
        cut -f2 | sort | uniq -c | sort -n -r
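If you want just the names of the top N users (for example, to decide whose keys go in 0/), the same pipeline can be wrapped in a small function. This is a sketch; `top_users` is a made-up name, and it assumes, like the command above, that the user name is the second tab-separated field of the log:

```sh
#!/bin/sh
# Hypothetical helper: print the N most frequent users in a log file.
# $1 = gitolite log file, $2 = how many users to print
top_users() {
    cut -f2 "$1" | sort | uniq -c | sort -n -r |
        head -n "$2" | awk '{ print $2 }'
}

# Usage (GNU date assumed, as in the command above):
#   top_users .gitolite/logs/gitolite-`date +%Y-%m -d -30days`.log 50
```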

<a name="_what_are_the_downsides_"></a>

### what are the downsides?

There are some downsides.  The first one applies in all cases:

  * If you use the delegation feature, you can no longer define or extend
    @groups in a fragment, for security reasons.  It will also not let you use
    any group other than the @fragname itself (specifically, groups which
    contained a subset of the allowed @fragname, which would work normally, do
    not work now).

    (If you didn't understand all that, you're probably not using delegation,
    so feel free to ignore it!)

The following apply if individual ("split") conf files are written, which in
turn only happens if you used repo names instead of group names on the `repo`
lines:

  * the compile (gitolite-admin push) is now slower, because it potentially
    has to write a few thousand small files instead of one large one.  Since
    the compile should be relatively infrequent compared to developer access,
    this is ok -- the main config file is parsed much faster now, so every hit
    to the server will benefit.

  * we can no longer distinguish 'repo not found on disk' from 'you dont have
    access'.  They both now look like 'you dont have access'.

<a name="_storing_usergroup_information_outside_gitolite_like_in_LDAP_"></a>

### storing usergroup information outside gitolite (like in LDAP)

[Please NOTE: this is all about *user* groups, not *repo* groups]

[WARNING: the earlier method of doing this has been discontinued; please see
the commit message for details]

Gitolite now allows usergroup information to be stored outside its own config
file.  We'll see "why" first, then the "how".

<a name="_why"></a>

#### why

Large sites often have LDAP servers that already contain user and group
information, including group membership details.  Such sites may prefer that
gitolite just pick up that info instead of having to redundantly put it in
gitolite's config file.

Consider this example config for one repo:

    repo foo
        RW+ =   @lead_devs
        RW  =   @devs
        R   =   @interns

Normally, you would also need to specify:

    @lead_devs  =   dilbert alice
    @devs       =   wally
    @interns    =   ashok

However, if the corporate LDAP server already tags these people correctly, and
if there is some way of getting that information out **at run time**, that
would be cool.

<a name="_how"></a>

#### how

All you need is a script that, given a username, queries your LDAP or similar
server, and returns a space-separated list of all the groups she is a member
of.  If an invalid user name is sent in, or the user is valid but is not part
of any groups, it should print nothing.

This script will probably be specific to your site.  [**Help wanted**: I don't
know LDAP, so if someone wants to contribute some sample code I'd be happy to
put it in contrib/, with credit of course!]

Then set the `$GL_GET_MEMBERSHIPS_PGM` variable in the rc file to the full
path to this program, set `$GL_BIG_CONFIG` to 1, and that will be that.
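As a starting point, here is a minimal sketch of such a program. It does not talk to LDAP directly; instead it reads group lines in `/etc/group` format on stdin, which is what `getent group` prints and which may include LDAP groups if the server's name service is set up that way. The function name `groups_of` is made up, and printing bare group names (no leading `@`) is my assumption, so adjust it to whatever your site needs:

```sh
#!/bin/sh
# Hypothetical membership lookup: print a space-separated list of the
# groups that list the given user as a member, or nothing at all if
# the user is unknown or is in no groups.
# stdin: lines like "name:passwd:gid:member1,member2"
# $1   : user name
groups_of() {
    awk -F: -v user="$1" '
        {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) {
                    printf "%s%s", sep, $1
                    sep = " "
                }
        }
        END { if (sep != "") print "" }
    '
}

# As a standalone program, the last line would be something like:
#   getent group | groups_of "$1"
```

A site that queries LDAP directly would replace the `getent group` part with its own lookup and keep the same output format.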

[gwd]: http://github.com/sitaramc/gitolite/blob/pu/doc/3-faq-tips-etc.mkd#gwd

<a name="_implementation_notes"></a>

### implementation notes

To understand how big-config works (at least when you're using grouped repos),
we'll first look at how it works without this setting.  Think back to the
example at the top, and assume 'alice' is accessing the 'lynx' repo.  The
various rights are governed by the following hash elements:

    # for the first level checks
    $repos{'lynx'}{'R'}{'alice'} = 1;
    $repos{'lynx'}{'W'}{'alice'} = 1;

    # for the second level checks
    $repos{'lynx'}{'alice'}{'refs/heads/master'} = 'RW';
    $repos{'lynx'}{'alice'}{'refs/heads/next'} = 'RW+';

Those elements are explicitly specified in the compiled hash, as you can see
(you don't need to know much Perl to read a hash; just make some educated
guesses if needed!)

Now look at the compiled hash produced when `GL_BIG_CONFIG` is set.  In place
of both 'firefox' and 'lynx' you have '@wbr', and similarly '@devs' for both
'alice' and 'bob'.  In addition, there is a group hash at the bottom that
lists each group and its members.

When 'alice' tries to access the 'lynx' repo, gitolite collects all the group
names that these names belong to, so '@devs' is added to the list of 'user'
names that 'alice' inherits permissions from, and '@wbr' is added to the list
of 'repo' names that 'lynx' inherits from.  This means that the final access
inherits all permissions pertaining to the following combinations:

    alice, lynx
    alice, @wbr
    @devs, lynx
    @devs, @wbr

(Actually there are 3 more... try and guess what they may be!)

Anyway, all ACL rules for these combinations are clubbed together to make the
composite set of rules that 'alice' accessing 'lynx' is subject to.
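The lookup described above can be sketched in a few lines of shell. This is only an illustration of the idea, not gitolite's code: the group table mirrors the `%groups` hash from the grouped example, the function names are made up, and it produces only the four combinations listed above.

```sh
#!/bin/sh
# Hypothetical group table, one group per line: "group member member ..."
group_table='@devs alice bob
@wbr firefox lynx'

# Print a name followed by every group that lists it as a member.
groups_for() {
    echo "$1"
    echo "$group_table" |
        awk -v name="$1" '{ for (i = 2; i <= NF; i++) if ($i == name) print $1 }'
}

# Print every (user-or-group, repo-or-group) pair whose rules apply.
# $1 = user, $2 = repo
rule_pairs() {
    for u in $(groups_for "$1"); do
        for r in $(groups_for "$2"); do
            echo "$u, $r"
        done
    done
}

rule_pairs alice lynx
```

The rules found under each printed pair are then merged into one list for that access.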