Multi-thread auto_ptr

So, last time I talked a bit about lock-free data structures. Here’s the second part, as promised.

I’m not going to go into great detail on the topic (unless people seem interested), and there’s plenty of other pages and sites dedicated to it. But I just wanted to share this little class I made. I doubt it’s original, but it is something I came up with on my own. (If that makes sense.)

This is a completely threadsafe and lock-free (though not atomic-free) way to pass an object from one thread to another. If you have many such objects, then a queue is typically the better way to go, but sometimes that’s a little too heavyweight for your needs. Essentially, this behaves like a queue or stack that can only hold a single object at a time — adding a second object will result in throwing away the first, but in a completely memory-safe manner. No memory will be leaked, no matter how many threads are adding or removing elements.

Yes, that’s right, it’s perfectly safe for multiple producers and consumers in a lock-free manner without any SMR techniques whatsoever. Having said that, it’s most suited to a single producer and single consumer.

A typical usage scenario is where you have one “engine” thread that’s doing all the work and one “config” thread that’s receiving configuration from some external source.

Imagine a network router — when it starts up it loads a network configuration table, and then routes packets according to that table. But someone can log into the router via a web GUI or telnet console or similar and issue commands to change the routing table. It’s likely that the thread doing the routing will be separate from the thread processing the GUI/console interface. And yet the latter must somehow convey the changes to the routing table to the former, otherwise those changes will never actually take effect. You obviously can’t update the routing table being used by the router thread in-place — if you try, it could see some strange in-between state and possibly go follow an invalid pointer and crash. You could use a lock to ensure that the routing thread can’t access the table while you’re updating it, but this means that for however long it takes to do that, no packets can get routed through at all, which is also undesirable.

A simple solution to this is to have the GUI thread build a complete copy of the new routing table, and then use an atomic pointer exchange to swap out the “live” table with the new version. But this still presents a problem — when it does this, the routing thread might still be in the middle of accessing the old table. Now, there’s nothing directly wrong with this — the old table is still internally consistent, and next time through it should pick up the new one. The trouble is: who is responsible for cleaning up the memory used by these tables? The GUI thread can’t delete the old table when it swaps it out, because the routing thread might still be using it.
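To make the cleanup problem concrete, here's a minimal sketch of the naive atomic-exchange approach (the `RoutingTable` type and `publish` function are hypothetical stand-ins, not code from the class below):

```cpp
#include <atomic>

struct RoutingTable { /* routing entries would live here */ };

std::atomic<RoutingTable*> g_live{nullptr};

void publish(RoutingTable* newTable)
{
    // Atomically swap in the new table; we get the old one back.
    RoutingTable* old = g_live.exchange(newTable, std::memory_order_acq_rel);
    // Problem: the routing thread may still be reading *old right now,
    // so we can neither delete it here (possible use-after-free) nor
    // simply drop the pointer (memory leak).
    (void)old;
}
```

The swap itself is trivial; it's the ownership of `old` that has no good answer in this form.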

Things get worse if there is more than one thread producing new tables (eg. a web GUI thread and a separate console UI thread). If one thread swaps out the “real” table with one version, and then a second thread swaps that out for another one, then it gets really messy to track which thread needs to be in charge of deleting which table, and when it’s safe to do so. (Of course, it’s also messy to ensure that changes made by one thread don’t get lost when the table made by the other thread gets used — but we won’t worry about that for now. One way to solve this would be to use an actual queue and only send through the changes rather than the complete table, but for now we’ll just assume that it’s ok if only one thread “wins” and the other changes are lost.)

There are other ways to handle the memory tracking, too. One is with some kind of SMR or GC system, or a reference-counting system like shared_ptr. These use internal counters to keep track of which threads are still using the objects, and thus when it’s safe to delete them. But again, sometimes these just seem a bit heavy-handed — SMR or GC often require fairly complex setup, and sometimes will internally use locks or suspend threads in order to find unreferenced objects. Meanwhile shared_ptr requires an atomic operation whenever it goes into or out of scope, and by default uses a spinlock and another atomic operation to transfer one between threads.
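For comparison, the reference-counted alternative can be sketched with C++11's atomic free functions for shared_ptr (using a plain `int` as a stand-in for the table; `publish`/`acquire` are hypothetical names). This is the "heavier" approach described above, not the class from this post:

```cpp
#include <memory>

// Stand-in for the live routing table.
std::shared_ptr<int> g_table(new int(1));

void publish(std::shared_ptr<int> newTable)
{
    // Typically implemented with an internal spinlock, as noted above.
    std::atomic_store(&g_table, newTable);
}

std::shared_ptr<int> acquire()
{
    // Bumps the reference count on every access; the last reader to
    // drop its shared_ptr deletes the old table automatically.
    return std::atomic_load(&g_table);
}
```

Memory management falls out for free here, at the cost of an atomic ref-count operation on every access.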

Sometimes it’s worth paying those costs to get the extra flexibility that they afford. But other times it’s too much, or just unnecessary. And I think that’s enough waffling about it; time for some code:

#include <atomic>
#include <cstddef>
#include <memory>
 
template<typename T>
class multithread_auto_ptr
{
public:
    typedef T value_type;
    typedef value_type* pointer;
    typedef std::auto_ptr<value_type> auto_pointer;
 
    multithread_auto_ptr() : m_item(NULL) {}
    multithread_auto_ptr(auto_pointer p) : m_item(p.release()) {}
    ~multithread_auto_ptr() { dequeue(); }
 
    void swap(multithread_auto_ptr& o)
    {
        enqueue(o.swap(dequeue()));
    }
 
    auto_pointer swap(auto_pointer p) // takes and releases ownership
    {
        return auto_pointer(m_item.exchange(p.release(), std::memory_order_acq_rel));
    }
 
    void enqueue(auto_pointer p)      // takes ownership
    {
        swap(p);
    }
 
    auto_pointer dequeue()            // releases ownership
    {
        return swap(auto_pointer());
    }
 
    /*
     * Copying these objects is not intended behaviour, but it's
     * not fatal provided that the contained object is also copyable.
     * The methods below permit these to be copied; it will replicate
     * the object across both copies (with some potential for races).
     */
    multithread_auto_ptr(const multithread_auto_ptr& o) : m_item(NULL)
    {
        multithread_auto_ptr& mo = const_cast<multithread_auto_ptr&>(o);
        auto_pointer item = mo.dequeue();
        if (item.get())
        {
            enqueue(auto_pointer(new value_type(*item)));
            mo.enqueue(item);
        }
    }
    multithread_auto_ptr& operator=(multithread_auto_ptr o) // intentionally copied
    {
        o.swap(*this);
        return *this;
    }
 
private:
    std::atomic<pointer> m_item;
};
 
template<typename T>
void swap(multithread_auto_ptr<T>& s1, multithread_auto_ptr<T>& s2)
{
    s1.swap(s2);
}

(As always, you’re free to use this code in your own applications or blog posts, as long as you include a link to this page.)

Note that construction and destruction are not themselves inherently threadsafe (which is typical and hopefully obvious), so you should always ensure that this object lives longer than the threads using it, or that you can otherwise guarantee that once you start to destroy it, no thread will try to access it in any way.

Unlike a standard auto_ptr, this pointer is copy-safe (if the contained object is), although that's not the intended usage. At the moment it is copied, it acts as a consumer: it dequeues from the source, copies the resulting object (if any), then enqueues the copy to itself and the original back to the source. Obviously, unless you know what the other threads are doing at the time, you can't guarantee that some other consumer hasn't removed the item before the copy happens, or that some other producer doesn't replace the item (on only one of the two objects) just after it. And of course this requires a memory allocation, which might defeat the point of using a lock-free structure in the first place, since most allocators are lock-based.

Usage is pretty simple. Just declare an instance of this globally (or somewhere else accessible to both threads, with a longer lifetime than both); you can then allocate objects and enqueue them from one thread, and then dequeue them on the other. This uses auto_ptr as its basic parameter and return value, in order to emphasise that this queue takes ownership of the objects when enqueued, and relinquishes ownership when dequeued. (And they’re passed by value to emphasise this even further — the only time I’ll ever pass something like this by value.)

Going back to our router example, the GUI thread would create a new copy of the routing table, enqueue it, and forget about it (if it's held within an auto_ptr in the GUI thread, then enqueueing it will also release the pointer so it doesn't get double-freed). Meanwhile the routing thread is still doing its thing, somewhere in the middle of processing a packet using the old routing table. Once it finishes that, it goes back and attempts to dequeue an updated table — it succeeds, and can then delete its old table and replace it with the new one, at a time when it knows that it's safe to do so. It can then process the next packet without interruption, and then try dequeuing again. Assuming that another table hasn't been queued in the meantime, the new dequeue will fail (by returning an empty pointer), and it can just keep using its current table. Or, to put this example in more of a pseudocode form:

multithread_auto_ptr<RoutingTable> PendingRoutingTable;
 
void RoutingThread()
{
    std::auto_ptr<RoutingTable> activeRoutingTable = CreateRoutingTable();
    while (!ShouldShutDown())
    {
        std::auto_ptr<Packet> packet = ReceivePacket();
        std::auto_ptr<RoutingTable> newTable = PendingRoutingTable.dequeue();
        if (newTable.get())
        {
            activeRoutingTable = newTable;
        }
        RoutePacket(*packet, *activeRoutingTable);
    }
}
 
void GuiThread_UpdateRoutingTable(const RoutingTableChanges& changelist)
{
    std::auto_ptr<RoutingTable> newTable = CreateUpdatedTable(changelist);
    PendingRoutingTable.enqueue(newTable);
}

Note that the auto_ptr takes care of the memory management and the multithread_auto_ptr takes care of the cross-thread semantics, so the code is all very straightforward.

The implementation makes use of the new C++11 std::atomic class. If you’re using a compiler that doesn’t support this yet, you’ll either need to find a compatible library implementation (eg. libcds), or some compatible atomic pointer exchange function, such as InterlockedExchangePointer.

Of course, if you are using a C++11 compiler, you might notice that auto_ptr is deprecated now. It’s still safe to use in this context, and I didn’t particularly feel like rewriting this to use unique_ptr and std::move at the moment. I’ll probably post a followup to this sometime once I’ve had more of a play with move semantics myself.
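In the meantime, here's a rough sketch of what a unique_ptr-and-std::move version might look like (this is my speculation about that followup, not the author's final version; the class name `multithread_unique_ptr` is made up, and copying is simply deleted rather than supported):

```cpp
#include <atomic>
#include <memory>

template<typename T>
class multithread_unique_ptr
{
public:
    multithread_unique_ptr() : m_item(nullptr) {}
    explicit multithread_unique_ptr(std::unique_ptr<T> p) : m_item(p.release()) {}
    ~multithread_unique_ptr() { dequeue(); }

    // Copying dropped entirely; move-only semantics fit the design better.
    multithread_unique_ptr(const multithread_unique_ptr&) = delete;
    multithread_unique_ptr& operator=(const multithread_unique_ptr&) = delete;

    void enqueue(std::unique_ptr<T> p)   // takes ownership; discards any previous item
    {
        swap(std::move(p));
    }

    std::unique_ptr<T> dequeue()         // releases ownership; empty if nothing queued
    {
        return swap(std::unique_ptr<T>());
    }

private:
    std::unique_ptr<T> swap(std::unique_ptr<T> p)
    {
        // Same single atomic exchange as the auto_ptr version.
        return std::unique_ptr<T>(m_item.exchange(p.release(), std::memory_order_acq_rel));
    }

    std::atomic<T*> m_item;
};
```

The ownership-transfer semantics become explicit at the call site, since callers must write `std::move` when handing in a pointer they already hold.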

In final wrapup: this is very efficient, with only one atomic operation per enqueue or dequeue, and it's completely threadsafe for any number of threads on either side. (Although with multiple producers you can't guarantee which one will "win" and get its value through to the consumer, and with multiple consumers you can't guarantee which one will "win" and acquire the new value — especially if you have multiple producers as well.)
