Wednesday 28 December 2016

Iteration tests in Radix tree test suite

According to Wikipedia, unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use.

Iteration test in Radix tree test suite is a unit test for a bug found by the syzkaller tester.

Iteration tests can be run for radix tree with zero order or multi-order items. Essentially these launch five threads to test parallel iteration over tagged or untagged entries, adding, removing and tagging entries, for an arbitrary period of time.

Let's see what each thread is doing in more detail.

/*
 * Iterate over the tagged entries, doing a radix_tree_iter_retry() as we find
 * things that have been removed and randomly resetting our iteration to the
 * next chunk with radix_tree_iter_resume().  Both radix_tree_iter_retry() and
 * radix_tree_iter_resume() cause radix_tree_next_slot() to be called with a
 * NULL 'slot' variable.
 */
static void *tagged_iteration_fn(void *arg)
{
 struct radix_tree_iter iter;
 void **slot;

 rcu_register_thread();

 while (!test_complete) {
  rcu_read_lock();
  radix_tree_for_each_tagged(slot, &tree, &iter, 0, TAG) {
   void *entry = radix_tree_deref_slot(slot);
   if (unlikely(!entry))
    continue;

   if (radix_tree_deref_retry(entry)) {
    slot = radix_tree_iter_retry(&iter);
    continue;
   }

   if (rand_r(&seeds[0]) % 50 == 0) {
    slot = radix_tree_iter_resume(slot, &iter);
    rcu_read_unlock();
    rcu_barrier();
    rcu_read_lock();
   }
  }
  rcu_read_unlock();
 }

 rcu_unregister_thread();

 return NULL;
}

Iterating the tree can be done through radix_tree_iter object. This radix tree iterator works in terms of "chunks" of slots. A chunk is a sub-interval of slots contained within one radix tree leaf node. Thus we need to track the 'index' of the current slot and 'next_index' (one beyond the last index for this chunk) to track the size  of the chunk. tags is a bitmask corresponding to each slot in the chunk for a particular tag. *node denotes the node containing this slot and shift is that node's property that holds the bits remaining for each slot in that node.

struct radix_tree_iter {
        unsigned long   index;
        unsigned long   next_index;
        unsigned long   tags;
        struct radix_tree_node *node;
#ifdef CONFIG_RADIX_TREE_MULTIORDER
        unsigned int    shift;
#endif
};

#define radix_tree_for_each_tagged(slot, root, iter, start, tag)        \
        for (slot = radix_tree_iter_init(iter, start) ;                 \
             slot || (slot = radix_tree_next_chunk(root, iter,          \
                              RADIX_TREE_ITER_TAGGED | tag)) ;          \
             slot = radix_tree_next_slot(slot, iter,                    \
                                RADIX_TREE_ITER_TAGGED | tag))

This traverses the tree starting from the slot with key start, chunk by chunk and slot by slot within a chunk. The RADIX_TREE_ITER_TAGGED mask in the tag  tells the radix_tree_next_chunk() function that we are interested in the lookup of tagged slots and the tag argument is the tag we are looking for.

Once we look-up a slot we need to de-reference it to get the entry/item that is stored in the slot. Before proceeding  we need to check if another thread has moved the item stored in the slot to another location (using radix_tree_deref_retry()), in which case we need to retry the look-up.

radix_tree_iter_retry(&iter) works by updating the iter->next_index to iter->index, iter->tags to 0 and slot to NULL, so that the subsequent call to radix_tree_next_slot() returns NULL and the subsequent call to radix_tree_next_chunk() returns the first slot of the chunk associated with iter, which was the slot for which we needed to repeat the look-up.

Now we can simply continue the iteration in this manner.

void **radix_tree_iter_resume(void **slot, struct radix_tree_iter *iter)
{
        struct radix_tree_node *node;

        slot++;
        iter->index = __radix_tree_iter_add(iter, 1);
        node = rcu_dereference_raw(*slot);
        skip_siblings(&node, slot, iter);
        iter->next_index = iter->index;
        iter->tags = 0; 
        return NULL;
}

Once in every fifty times, we are done with the iteration and we want to resume it at a later point of time. So we update the iter and slot so that when we have the read lock again, we can resume iteration from the desired slot, with that slot indicating the start of our next chunk.

untagged_iteration_fn() does the same thing, with the only difference being that now we are not iterating over entries with a particular tag but all slots which are non-empty, via :

#define radix_tree_for_each_slot(slot, root, iter, start)               \
        for (slot = radix_tree_iter_init(iter, start) ;                 \
             slot || (slot = radix_tree_next_chunk(root, iter, 0)) ;    \
             slot = radix_tree_next_slot(slot, iter, 0))

/* relentlessly fill the tree with tagged entries */
static void *add_entries_fn(void *arg)
{
        rcu_register_thread();

        while (!test_complete) {
                unsigned long pgoff;
                int order;

                for (pgoff = 0; pgoff < MAX_IDX; pgoff++) {
                        pthread_mutex_lock(&tree_lock);
                        for (order = max_order; order >= 0; order--) {
                                if (item_insert_order(&tree, pgoff, order)
                                                == 0) {
                                        item_tag_set(&tree, pgoff, TAG);
                                        break;
                                }
                        }
                        pthread_mutex_unlock(&tree_lock);
                }
        }

        rcu_unregister_thread();

        return NULL;
}

This keeps inserting entries with indices/keys 0-MAX_IDX and tagging them with the given TAG. For multi order case we pass in the order as the maximum possible and keep reducing it until we get a successful insertion. item_insert_order() returns 0 for a successful insertion and non-zero for error cases such as not being able to create a required new node, or not being able to extend the tree in terms of its shift, due to failed memory allocation (ENOMEM), or if an item already exists at that index (EEXIST).

/*
 * Randomly remove entries to help induce radix_tree_iter_retry() calls in the
 * two iteration functions.
 */
static void *remove_entries_fn(void *arg)
{
        rcu_register_thread();

        while (!test_complete) {
                int pgoff;

                pgoff = rand_r(&seeds[2]) % MAX_IDX;

                pthread_mutex_lock(&tree_lock);
                item_delete(&tree, pgoff);
                pthread_mutex_unlock(&tree_lock);
        }

        rcu_unregister_thread();

        return NULL;
}

remove_entries_fn() works by randomly selecting a key to delete and item_delete() works by freeing the item at passed index and updating the tags in the tree, or returns NULL if the item was not present.

static void *tag_entries_fn(void *arg)
{
        rcu_register_thread();

        while (!test_complete) {
                tag_tagged_items(&tree, &tree_lock, 0, MAX_IDX, 10, TAG,
                                        NEW_TAG);
        }
        rcu_unregister_thread();
        return NULL;
}

We want to update items between indices 0 to MAX_IDX which are tagged with TAG, to NEW_TAG in batches of 10.

To conclude, when all these functions are happening in parallel, we want to test if there are races between them.

No comments:

Post a Comment