Friday 20 January 2017

Performance benchmarks in radix tree test suite

Currently we have tests that measure the performance of tagged and normal iteration, in terms of the time each iteration takes. Let's see how these work.

static void benchmark_size(unsigned long size, unsigned long step, int order)
{
        RADIX_TREE(tree, GFP_KERNEL);
        long long normal, tagged;
        unsigned long index;

        for (index = 0 ; index < size ; index += step) {
                item_insert_order(&tree, index, order);
                radix_tree_tag_set(&tree, index, 0); 
        }

        tagged = benchmark_iter(&tree, true);
        normal = benchmark_iter(&tree, false);

        printf("Size %ld, step %6ld, order %d tagged %10lld ns, normal %10lld ns\n",
                size, step, order, tagged, normal);

        item_kill_tree(&tree);
        rcu_barrier();
}

We start with an arbitrary size of the radix tree, say 2^10 or 2^20. A step of 128 means that, starting from index 0 up to the maximum index (i.e. size), every 128th index will be used to insert a tagged item. Order denotes the number of indices covered by a particular key: a key of a given order covers 2^order contiguous indices. So if the order is non-zero, the step size passed in will be 2^order times the base step.

So we insert an item at every index that is an integral multiple of 'step', between 0 and size, tag it with TAG 0, and then run the tagged iteration followed by the normal one.

static long long benchmark_iter(struct radix_tree_root *root, bool tagged)
{
        volatile unsigned long sink = 0;
        struct radix_tree_iter iter;
        struct timespec start, finish;
        long long nsec;
        int l, loops = 1;
        void **slot;

#ifdef BENCHMARK
again:
#endif
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (l = 0; l < loops; l++) {
                if (tagged) {
                        radix_tree_for_each_tagged(slot, root, &iter, 0, 0)
                                sink ^= (unsigned long)slot;
                } else {
                        radix_tree_for_each_slot(slot, root, &iter, 0)
                                sink ^= (unsigned long)slot;
                }
        }
        clock_gettime(CLOCK_MONOTONIC, &finish);

        nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
               (finish.tv_nsec - start.tv_nsec);

#ifdef BENCHMARK
        if (loops == 1 && nsec * 5 < NSEC_PER_SEC) {
                loops = NSEC_PER_SEC / nsec / 4 + 1;
                goto again;
        }
#endif

        nsec /= loops;
        return nsec;
}

By default these tests run with a RADIX_TREE_MAP_SHIFT value of 3, but if we want performance comparable to that of the in-kernel radix tree, we can compile with BENCHMARK=1, which sets RADIX_TREE_MAP_SHIFT to 6.

We track the time elapsed for the iteration in nanoseconds, in the variable nsec. If BENCHMARK is defined (i.e. RADIX_TREE_MAP_SHIFT is 6), nsec may come out too small (less than one-fifth of NSEC_PER_SEC), in which case we measure the time elapsed over multiple iterations and take the average. The number of iterations is chosen based on how small the initial nsec is.

Sunday 1 January 2017

Radix Tree Test Suite Regression Test 3

We start by creating two void pointers: ptr0 with value 0x4 and ptr with value 0x8.

void *ptr0 = (void *)4ul;
void *ptr = (void *)8ul;

Next we insert the entry ptr0 (0x4) into the tree and tag it with TAG 0.

radix_tree_insert(&root, 0, ptr0);
radix_tree_tag_set(&root, 0, 0);

Now we begin the tagged iteration. We get the first tagged slot 0x4 at index 0, then insert a new item 0x8 at index 1, so as to trigger radix_tree_deref_retry, as the slot at index 0 is moved by the insertion.

radix_tree_for_each_tagged(slot, &root, &iter, 0, 0) {
        printv(2, "tagged %ld %p\n", iter.index, *slot);
        if (first) {
                radix_tree_insert(&root, 1, ptr);
                radix_tree_tag_set(&root, 1, 0);
                first = false;
        }
        if (radix_tree_deref_retry(*slot)) {
                printv(2, "retry at %ld\n", iter.index);
                slot = radix_tree_iter_retry(&iter);
                continue;
        }
}

radix_tree_iter_retry(&iter) works by setting iter->next_index to iter->index, iter->tags to 0, and slot to NULL. The subsequent call to radix_tree_next_slot() then returns NULL, and the subsequent call to radix_tree_next_chunk() returns the first slot of the chunk associated with iter, which is the slot whose look-up we needed to repeat.

static __always_inline void **
radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags)
{
        if (flags & RADIX_TREE_ITER_TAGGED) {
                iter->tags >>= 1;
                if (unlikely(!iter->tags))
                        return NULL;
...

However, if care is not taken, we can get a segfault by trying to de-reference a NULL slot. This is prevented by the if (unlikely(!iter->tags)) check.

The case is similar for iteration over all non-empty slots via radix_tree_for_each_slot(). We start with the tree containing only one item (0x4) at index 0. After looking it up we insert another item 0x8 at index 1, so as to trigger radix_tree_deref_retry, as the slot at index 0 is moved by the insertion.

radix_tree_for_each_slot(slot, &root, &iter, 0) {
        printv(2, "slot %ld %p\n", iter.index, *slot);
        if (first) {
                radix_tree_insert(&root, 1, ptr);
                first = false;
        }
        if (radix_tree_deref_retry(*slot)) {
                printv(2, "retry at %ld\n", iter.index);
                slot = radix_tree_iter_retry(&iter);
                continue;
        }
}

In this case a segfault is avoided by de-referencing the slot only if radix_tree_chunk_size(iter) returns a positive value.

static __always_inline long
radix_tree_chunk_size(struct radix_tree_iter *iter)
{
        return (iter->next_index - iter->index) >> iter_shift(iter);
}

radix_tree_iter_retry(&iter) works by setting iter->next_index to iter->index, so the chunk size is returned as 0 (for order-0 slots).

The case is similar for iteration over all contiguous slots, up to the first empty slot, via radix_tree_for_each_contig(). We start with the tree containing only one item (0x4) at index 0. After looking it up we insert another item 0x8 at index 1, so as to trigger radix_tree_deref_retry, as the slot at index 0 is moved by the insertion.

radix_tree_for_each_contig(slot, &root, &iter, 0) {
        printv(2, "contig %ld %p\n", iter.index, *slot);
        if (first) {
                radix_tree_insert(&root, 1, ptr);
                first = false;
        }
        if (radix_tree_deref_retry(*slot)) {
                printv(2, "retry at %ld\n", iter.index);
                slot = radix_tree_iter_retry(&iter);
                continue;
        }
}

In this case a segfault is avoided by the following kind of check in radix_tree_next_slot() and radix_tree_next_chunk().

if (flags & RADIX_TREE_ITER_CONTIG) {
        /* forbid switching to the next chunk */
        iter->next_index = 0;
        break;
}
...
return NULL;

The next three tests check the same behaviour of radix_tree_next_slot() in preventing a segfault, the difference being that here the slot is made NULL by radix_tree_iter_resume().