So, if you read my previous posts, you know that, up till now, we have a Jupyter kernel that opens the ZeroMQ sockets it needs on arbitrary ports and does pretty much nothing besides that.
If you are a little versed in socket programming, or even just Internet Protocol (IP) things, you know that this won’t work: we can’t just start listening (bind) to a port on a server and not tell the client which port it should connect to, or vice versa.
So how do we (or Jupyter, in this case) keep everyone in sync, talking over the same communication channels?
Remember that at the end of the last post I mentioned a configuration file being read? That’s the connection file, generated by Jupyter and provided to both the client (frontend) and the server (kernel) when they are started, containing the necessary communication info in JSON format.
The default file provided by Jupyter (here located in /run/user/1000/jupyter/kernel-9617.json) looks like this:
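(The port numbers and the signing key change on every run, so take the values below as a representative example of the file’s structure rather than my exact copy; the kernel name is whatever kernelspec was used to start the kernel.)

```json
{
  "shell_port": 58294,
  "iopub_port": 43829,
  "stdin_port": 44163,
  "control_port": 49174,
  "hb_port": 52355,
  "ip": "127.0.0.1",
  "key": "83751dd8-49a9-4c92-9151-c10d65a39e10",
  "transport": "tcp",
  "signature_scheme": "hmac-sha256",
  "kernel_name": "my_cpp_kernel"
}
```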
You can see it contains the IP host string and the port numbers used by the kernel sockets. The transport protocol (TCP) and other relevant (well, not to me for now, lol) info are also provided. The fact that the kernel name is there too makes me think that the file is generated each time we start the client console with the --kernel parameter, but that’s just a guess.
Since we started talking about JSON, let me say that its usage is widespread in Jupyter. That’s subjectively good, since I like it (it looks nicer than XML, at least).
I didn’t talk about it in detail before, but to make a new kernel available to Jupyter clients, its executable/script should be located at one of the default paths (or one defined in the JUPYTER_PATH env variable), inside its own directory, accompanied by a kernel.json configuration file.
Mine looks like this:
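(The executable path and the names below are placeholders rather than my real ones; what matters is the structure of the file.)

```json
{
  "argv": ["/home/user/projects/cpp_kernel/kernel", "{connection_file}"],
  "display_name": "My C++ Kernel",
  "language": "c++"
}
```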
The first line holds the command-line arguments the client uses to launch our kernel: the first one is the executable path (I guess it should be relative, but I haven’t figured out how to do that so far), and the second, "{connection_file}", is replaced by the path of the shared connection file.
After that, we can check the list of available kernels:
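(Using the jupyter kernelspec list command; the kernel names and install paths below are illustrative, but the output looks roughly like this.)

```
$ jupyter kernelspec list
Available kernels:
  my_cpp_kernel    /home/user/.local/share/jupyter/kernels/my_cpp_kernel
  python3          /usr/lib/python3/dist-packages/ipykernel/resources
```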
Now, guess which format is used for the frontend-backend message passing… Would you believe it is also JSON?
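Every message is basically a handful of dictionaries (header, parent_header, metadata, content) serialized as JSON. Conceptually, the shutdown request we will handle below looks something like this (the field values are illustrative, and on the wire the dictionaries actually travel as separate parts of a multi-part ZeroMQ message):

```json
{
  "header": {
    "msg_id": "c3a92e2b5fd84a19b5a04cc9a07cbcd5",
    "session": "f17807883e5b4fa5b063a2a5a26a24e2",
    "username": "user",
    "date": "2016-01-01T12:00:00.000000",
    "msg_type": "shutdown_request",
    "version": "5.0"
  },
  "parent_header": {},
  "metadata": {},
  "content": { "restart": false }
}
```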
“It must be hard to read/parse/write all that, right?”
Well, I didn’t even think about it, as I immediately started looking for an easier solution (laziness can, paradoxically, be a driving force). If even C has nice libraries for this task (like Klib or Json-C), how on earth would C++ not have one?
Fairly quickly, I ended up finding the JsonCpp library. It is so simple that the author even recommends bundling its source with your application (which is fine, since both my code and his use the MIT license*).
JsonCpp is really nice: it allows you to serialize a JSON object/dictionary to a string or deserialize a JSON string into a dictionary in a single call (actually, an overloaded operator). I know that there is no magic in computing, but abstraction still amazes me sometimes…
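Just to illustrate, here is a tiny standalone sketch (not code from the kernel itself) of what that round trip looks like:

```cpp
#include <iostream>
#include <sstream>
#include <json/json.h>  // JsonCpp; the include path may differ if you bundle the amalgamated source

int main() {
    // Deserialize: JSON text -> Json::Value "dictionary".
    std::istringstream input(R"({"msg_type": "shutdown_request", "restart": false})");
    Json::Value message;
    input >> message;  // the overloaded operator does all the parsing

    std::cout << message["msg_type"].asString() << std::endl;  // prints: shutdown_request

    // Serialize: Json::Value "dictionary" -> JSON text.
    Json::Value reply;
    reply["msg_type"] = "shutdown_reply";
    reply["restart"] = message["restart"];

    std::ostringstream output;
    output << reply;  // the mirror operator writes it back out
    std::cout << output.str() << std::endl;
    return 0;
}
```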
At long last, let’s put it all to use to set up our connections properly and, for now, just handle the kernel shutdown message, sent from the client as a multi-part message of JSON strings:
Relevant code:
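(What follows is a condensed sketch of the idea rather than my full kernel source: the helper names are made up for the example, the HMAC signature check and the shutdown_reply are omitted, and error handling is minimal. Per the Jupyter wire protocol, the frames of a message are: zero or more routing identities, the "<IDS|MSG>" delimiter, the signature, and then the header, parent_header, metadata and content JSON strings.)

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
#include <zmq.h>
#include <json/json.h>

// Receive every frame of one multi-part ZeroMQ message as a std::string.
std::vector<std::string> receive_multipart(void *socket) {
    std::vector<std::string> frames;
    int more = 1;
    while (more) {
        zmq_msg_t frame;
        zmq_msg_init(&frame);
        zmq_msg_recv(&frame, socket, 0);
        frames.emplace_back(static_cast<char *>(zmq_msg_data(&frame)),
                            zmq_msg_size(&frame));
        more = zmq_msg_more(&frame);
        zmq_msg_close(&frame);
    }
    return frames;
}

// Returns true when the received message asks the kernel to shut down for good.
bool is_shutdown_request(void *socket) {
    std::vector<std::string> frames = receive_multipart(socket);

    // Find the "<IDS|MSG>" delimiter separating routing ids from the payload.
    std::size_t delim = 0;
    while (delim < frames.size() && frames[delim] != "<IDS|MSG>")
        ++delim;
    if (delim + 5 >= frames.size())
        return false;  // not a well-formed protocol message

    // After the delimiter: signature, header, parent_header, metadata, content.
    Json::Value header, content;
    std::istringstream header_stream(frames[delim + 2]);
    std::istringstream content_stream(frames[delim + 5]);
    header_stream >> header;
    content_stream >> content;

    std::cout << "Received: " << header["msg_type"].asString() << std::endl;

    return header["msg_type"].asString() == "shutdown_request" &&
           !content["restart"].asBool();
}
```

In the real kernel this runs inside the loop serving the shell/control socket, and a shutdown_reply should still be sent back to the client before the process exits.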
To test it, I just ran the Jupyter QtConsole and closed it, which makes it emit the shutdown message:
Apart from some weird formatting issues that I have to check later (thread synchronization, maybe), you can now see that we are binding to the proper ports, receiving Heartbeat messages and, at the end, reading and parsing the shutdown message correctly.