Monday, May 16, 2016

Custom commands during MySQL Sandbox installation

MySQL Sandbox 3.1.07 adds several options to execute shell or SQL commands during the sandbox installation.


Figure 1: MySQL Sandbox states and where you can run the hooks

There are 3 options to run shell commands, 2 to run SQL queries, and 2 to run SQL files.

## Shell commands
--pre_start_exec=command  : runs 'command' after the installation, before the server starts
--pre_grants_exec=command : runs 'command' after the server starts, before loading the grants.
--post_grants_exec=command : runs 'command' after loading the grants.

## SQL statements
--pre_grants_sql=query : runs 'query' before loading the grants.
--pre_grants_file=filename : runs SQL file 'filename' before loading the grants.
--post_grants_sql=query : runs 'query' after loading the grants.
--post_grants_file=filename : runs SQL file 'filename' after loading the grants.

Moreover, a new option --load_plugin=plugin[:plugin_file_name] loads a given plugin before the grants file is loaded.
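
For instance, several hooks can be combined in a single call. Here is a hypothetical invocation (the version and the statements are just placeholders; DB_PORT is one of the environment variables listed further down):

$ make_sandbox 5.7.12 -- \
  --pre_grants_sql='SET GLOBAL general_log=1' \
  --post_grants_sql='SET GLOBAL super_read_only=ON' \
  --post_grants_exec='echo "sandbox ready on port $DB_PORT"'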

What can you use this for?

The need arose when I needed the general log activated at start-up, but did not want the option to stay in the configuration file. Then I also found that running a SQL command to install a plugin and check its status would be nice to have. When enough cases piled up, I implemented a few changes that allow MySQL Sandbox to accept commands at specific stages of the deployment. Figure 1 shows where these hooks are available.

Some use cases for SQL:

  • activate general log;
  • install plugins (although there is a specialized option for that)
  • enable super_read_only: this will only work after loading the grants. If enabled before, the user creation commands will fail.
  • show the status of database variables, status, and objects.

Some use cases for shell commands:

  • See the composition of the sandbox at various stages;
  • Run diagnostic tools, backups, synchronization tasks;
  • Perform dangerous tasks to test the server responsiveness;
  • General purpose hacks.

When the shell command runs, MySQL Sandbox provides several environment variables that facilitate interaction with the database that has been installed.

SANDBOX_DIR   =  sandbox directory;
BASEDIR       =  base directory for the sandbox binaries
DB_DATADIR    =  data directory
MY_CNF        =  configuration file
DB_PORT       =  database port
DB_USER       =  database user
DB_PASSWORD   =  database password
DB_SOCKET     =  database socket
MYSQL_VERSION =  MySQL version (e.g. 5.7.12)
MYSQL_MAJOR   =  Major part of the version (e.g 5)
MYSQL_MINOR   =  Minor part of the version (e.g 7)
MYSQL_REV     =  Revision part of the version (e.g 12)
EXEC_STAGE    =  Stage of the execution (pre_start_exec, pre_grants_exec, post_grants_exec)


Figure 2: MySQL Sandbox order of execution

You can combine shell and SQL calls for all stages. They will run in the order shown in Figure 2.

Examples

Let's install the X-Plugin, using the shell hooks to check the status of the system from the outside.

$ make_sandbox 5.7.12 -- --load_plugin=mysqlx \
  --pre_grants_exec='echo "##open ports";sudo netstat -atn  |grep LISTEN|grep 33060; echo "##"' \
  --post_grants_exec='echo "open ports";sudo netstat -atn  |grep LISTEN|grep 33060;echo "##"'
[...]
# Starting server
.. sandbox server started
# [pre_grants_exec] system "echo "##open ports";sudo netstat -atn  |grep LISTEN|grep 33060; echo "##""
##open ports
##
--------------
INSTALL PLUGIN mysqlx soname 'mysqlx.so'
--------------

--------------
select plugin_name, plugin_version, plugin_status from information_schema.plugins where plugin_name = 'mysqlx'
--------------

+-------------+----------------+---------------+
| plugin_name | plugin_version | plugin_status |
+-------------+----------------+---------------+
| mysqlx      | 1.0            | ACTIVE        |
+-------------+----------------+---------------+
# Loading grants
# [post_grants_exec] system "echo "open ports";sudo netstat -atn  |grep LISTEN|grep 33060;echo "##""
open ports
tcp4       0      0  *.33060                *.*                    LISTEN
##
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_12

Here, the load_plugin option loads the plugin before the grants. Before this happens, the shell command runs netstat to check the status of port 33060, which is the one that the MySQL XProtocol uses by default. We can see that the port is not available at that stage. The same command running after the grants are loaded shows that the port is open.

Another example. This time we will use a shell script that uses most of the variables listed above:

$ cat t/test_init_exec.sh
#!/bin/bash
echo "----------------------------------------------------------------"
echo "Stage: $EXEC_STAGE"
if [ "$EXEC_STAGE" == "pre_start_exec" ]
then
    echo "PWD <$PWD> "
    echo "VER <$MYSQL_VERSION> "
    echo "DIR <$SANDBOX_DIR> "
    echo "DATADIR <$DB_DATADIR> "
    echo "BASEDIR <$BASEDIR> "
    echo "SOCKET <$DB_SOCKET> "
    echo "MY_CNF <$MY_CNF>"
    echo "USER/PASSWORD/PORT <$DB_USER> <$DB_PASSWORD> <$DB_PORT> "
    echo "Version components <$MYSQL_MAJOR> <$MYSQL_MINOR> <$MYSQL_REV>"
fi
cd $SANDBOX_DIR
# ls
echo '## Data directory'
ls  data
echo "----------------------------------------------------------------"

The script shows the value of the variables for the first stage only and the contents of the data directory:

$ make_sandbox 5.7.12 --  --pre_start_exec=./t/test_init_exec.sh --post_grants_exec=./t/test_init_exec.sh
[...]
# [pre_start_exec] system "./t/test_init_exec.sh"
----------------------------------------------------------------
Stage: pre_start_exec
PWD <$HOME/workdir/git/mysql-sandbox>
VER <5.7.12>
DIR <$HOME/sandboxes/msb_5_7_12>
DATADIR </Users/gmax/sandboxes/msb_5_7_12/data>
BASEDIR <$HOME/opt/mysql/5.7.12>
SOCKET </tmp/mysql_sandbox5712.sock>
MY_CNF <$HOME/sandboxes/msb_5_7_12/my.sandbox.cnf>
USER/PASSWORD/PORT <msandbox> <msandbox> <5712>
Version components <5> <7> <12>
## Data directory
auto.cnf        ib_buffer_pool        ibdata1            mysql            performance_schema    sys
----------------------------------------------------------------
# Starting server
.. sandbox server started
# Loading grants
# [post_grants_exec] system "./t/test_init_exec.sh"
----------------------------------------------------------------
Stage: post_grants_exec
## Data directory
auto.cnf        ib_logfile0        ibdata1            msandbox.err        mysql_sandbox5712.pid    performance_schema    test
ib_buffer_pool        ib_logfile1        ibtmp1            mysql            mysqld_safe.pid        sys
----------------------------------------------------------------
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_12

We can see that the data directory, before the server starts, contains only the files created by mysqld --initialize-insecure, while after the start and the grant load we get the InnoDB log files, the .pid files, and the test database (created by the grants script).

Wednesday, May 04, 2016

Docker for Mac beta and MySQL - First impressions

Using Docker for development is a great way of ensuring that what you develop will be the same as what you deploy in production. This is true for almost everything. If you develop on Linux, the above statement holds. If you develop on a different operating system (OSX or Windows), there are several restrictions.

I showed one of those issues in a recent article (MySQL and Docker on a Mac: networking oddity.) When you want to export a port from a service running in the container, the exported port is not available on your Mac, but in the virtual machine that runs the Docker services. This happens with any application that listens to a port.

The second limitation I found affects only MySQL, and it is related to using volumes. The proper way of achieving data persistence with containers is through volumes, i.e. telling the container to run the data directory in a virtual path that refers to some safe place in the host computer. That can't be done on a Mac, because the host computer is a virtual machine, and even though Docker can access a folder in your Mac, the server installation fails for lack of permissions.

Both the above restrictions are lifted if you use the beta release of Docker for Mac and Windows. It's a private beta: you need to apply and wait to be given an operational token, but once you are in, you notice the differences between the beta and the "old" Docker-Toolbox:

Docker beta

  • The Docker app is a native app, which you install by copying its icon to the /Applications folder;
  • You don't need Virtualbox or VMware Fusion. It comes with its own lightweight VM based on xhyve.
  • There is no need to run docker-machine start xxx and eval $(docker-machine env xxx). The new app is fully integrated with the OS.
  • Ports exported from a container are available in your Mac.
  • You can keep both the Docker Toolbox and the new Docker app in the same host, provided that you don't run them both in the same terminal session.

Back to our claim of lifted limitations: let's try a full installation on a Mac as we would do it on Linux.

$ docker run --name mybox -e MYSQL_ROOT_PASSWORD=secret -d \
    -v ~/docker/mysql/single:/var/lib/mysql \
    -p 5000:3306 mysql/mysql-server
72ca99918076ff0e5702514311cc706ffcc27f98917f211e98ed187dfda3b47b

$ ls  ~/docker/mysql/single/
auto.cnf    client-key.pem  ibdata1     mysql.sock.lock    server-cert.pem
ca-key.pem  ib_buffer_pool  ibtmp1      performance_schema server-key.pem
ca.pem      ib_logfile0     mysql       private_key.pem    sys
client-cert.pem             ib_logfile1 mysql.sock         public_key.pem

We create a MySQL server container with the internal port 3306 exposed to the external port 5000, and the data directory running in the host directory $HOME/docker/mysql/single. It seems that the data directory was created correctly. Now we use a MySQL client on the Mac to connect to the container, using port 5000 on the local network address (Note: there is NO database server running on my Mac. Only in the container).

$ sudo netstat -atn  |grep LISTEN | grep 5000
tcp4       0      0  *.5000                 *.*                    LISTEN

$ ~/opt/mysql/5.7.12/bin/mysql -h 127.0.0.1  -u root -psecret -P 5000
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.12 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> create schema hello_docker;
Query OK, 1 row affected (0.01 sec)

mysql> use hello_docker
Database changed
mysql> create table i_am_here(id int);
Query OK, 0 rows affected (0.03 sec)

mysql> exit
Bye

$ ls  ~/docker/mysql/single/hello_docker/
db.opt        i_am_here.frm    i_am_here.ibd

This is full integration! Using a Mac client we connected to the container, where we created a table, which then appeared inside the data directory in the Mac host!

It's still early to say if this beta is ready for more serious work, but the first impressions are really good!

Monday, May 02, 2016

Taking the MySQL document store for a spin

This is not a comprehensive review, nor a user guide. It's a step-by-step account of my initial impressions while trying the new MySQL XProtocol and the document store capabilities. In fact, I am barely scratching the surface here: more articles will come as time allows.

MySQL 5.7 has been GA for several months, as it was released in October 2015. Among the many features and improvements, I was surprised to see the MySQL team emphasizing the JSON data type. While it is an interesting feature per se, I failed to see the reason why so many articles and conference talks were focused around this single feature. Everything became clear when, with the release of MySQL 5.7.12, the MySQL team announced a new release model.

Overview

In MySQL 5.7.12, we get the usual MySQL server, which shouldn't have new features. However, in an attempt to combine the stability of the server with a more dynamic release cycle, the server ships with a new plugin, unimaginatively named X-Plugin, which supports an alternative communication protocol, named X-Protocol.

In short, the X-Protocol extends and replaces the traditional client/server protocol, by allowing asynchronous communication to the server, using different API calls, which are available, as of today, in Javascript, Python, C#, and Java, with more languages to come.

The reason for this decision is easy to see. Many developers struggle with relational tables and SQL, while they understand structures made of arrays and associative arrays (maps.) This is also one of the reasons for the recent popularity of NoSQL databases, where schemas and tables are replaced by collections of documents or similar schema-less structures. With this new release, MySQL wants to offer the best of two worlds, by allowing developers to use the database with the tools they feel most comfortable with.

To use the new plugin, you need two components:

  • The plugin itself, which ships with the server package, but is not enabled by default;
  • The MySQL shell, a new command line tool that you have to download and install separately, and which allows you to use Javascript or Python with the database.

As a QA professional, I am a bit concerned about this mix of GA and alpha features (the MySQL shell is defined as alpha software, and the shell itself says "development preview" in its help). Theoretically, the two worlds should be separated. If you don't install the plugin, the server should work as usual. But practice and experience tell me that there are dangers waiting for a chance to destroy our data. If you want a single piece of advice to summarize this article, DON'T USE the new MySQL shell with a production server. That said, let's start a quick tour.

Installation

You need to install the shell, which comes in a package that is different from the rest of the MySQL products. The manual shows how to install it on OSX or Linux. The only mention that this product could be dangerous to use is a note reminding the user to enable the MySQL Preview Packages when installing from a Linux repository. The procedure, on any operating system, will install libraries and executables globally. Unlike the server package, it is not possible to install it in a user-defined directory, the way you can install the server with MySQL Sandbox. In this context, the standard Oracle disclaimer may have a meaning that goes beyond a regular CYA.

Next, you need to enable the plugin. You can do it in three ways:

(1)

$ mysqlsh --classic -u youruser -p --dba enableXProtocol
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating a Classic Session to youruser@localhost:3306
Enter password:
No default schema selected.

enableXProtocol: Installing plugin mysqlx...
enableXProtocol: done

(2)

Start the server with --plugin-load=mysqlx=mysqlx.so. This will enable the plugin, although it does not seem to work the way it should.

(3)

Enable the plugin with a SQL command.

mysql> install plugin mysqlx soname 'mysqlx.so';

I prefer method #3 because it is the only one that does not have side effects or misunderstandings. The issue that hit me when I tried method #1 for the first time is that calling mysqlsh --classic uses the client/server protocol on port 3306 (or the port that you defined for the database), while subsequent calls will use the X-Protocol on port 33060.
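
Whichever method you use, a quick way to verify that the plugin is active is to check that the server is listening on both ports, for example:

$ sudo netstat -atn | grep LISTEN | grep 3306

Both 3306 (classic protocol) and 33060 (X-Protocol) should show up as LISTEN.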

Alternatives. Using Docker

If what I said previously has made you cautious and you have decided not to use the shell on your main computer (as you should), there are alternative ways. If you have a data center at your disposal, just fire up a virtual machine and play with it. However, be aware that the MySQL shell does not install on Ubuntu 15.04 and 16.04.

A lightweight method to try out the new shell without endangering your production server is to use a Docker image for MySQL, or a combination of MySQL Sandbox and Docker.

In Docker, the MySQL shell does not ship together with the server. It requires a separate image. A quick guide is available in a recent article. I don't like the current approach: having two images is a waste of space. It would be acceptable if the images were based on a slim Linux distribution, such as Alpine. Since they are based on Oracle Linux, instead, you need to download two beefy images to start testing. With a fast internet connection this should not be a problem, but if you live in a place where 3 Mbps is the norm, or if you are traveling, this could become an annoyance. Once you have pulled the images, you can use them at will, even without an internet connection.

The above-mentioned quick guide suggests using docker run --link to connect the two containers. I recommend a different approach, as the link option is now considered legacy.

$ docker network create mynet
edcc36be21e54cdb91fdc91f2c320efabf62d36ab9d31b0142e901da7e3c84e9
$ docker network ls
NETWORK ID          NAME                DRIVER
a64b55fb7c92        bridge              bridge
0b8a52002dfd        none                null
cc775ec7edab        host                host
edcc36be21e5        mynet               bridge

$ docker run --name mybox  -e MYSQL_ROOT_PASSWORD=secret -d --net mynet mysql/mysql-server:5.7.12 \
    --plugin-load=mysqlx=mysqlx.so
ecbfc322bb17ec0b1511ea7321c2b10f9c7b5091baee4240ab51b7bf77c1e424

$ docker run -it --net mynet mysql/shell -u root -h mybox -p
Creating an X Session to root@mybox:33060
Enter password:
No default schema selected.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

The first command creates a network (called mynet).

The second command creates the server container, which is launched using the network mynet and with the plugin-load option (which seems to work well with the Docker image). When you use a Docker network, the container name is recognized by the network as a host name, and can be called by other members of the network. This is much cleaner than using --link.

The third command runs the MySQL shell, using the same network. This allows us to use the container name (mybox) without any other options.
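
If you want to see that name resolution at work, any throwaway container attached to the same network can reach the server by name (busybox here is just a convenient small image):

$ docker run --rm --net mynet busybox ping -c 1 mybox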

Running the MySQL Javascript shell

My favorite setup for this test is a mix of MySQL Sandbox for the server and Docker for the shell. This way I can use the alpha shell without polluting my Linux host, and use a feature-rich MySQL Sandbox to control the server.

Here is what I do:

$ make_sandbox 5.7.12 -- --no_show -c general_log=1 -c general_log_file=general.log

I start a sandbox with MySQL 5.7.12 (tarball expanded and renamed into /opt/mysql/5.7.12), with the general log enabled. We need this to peek under the hood when we use the document store.
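
To peek at that log while following along, you can tail it from another terminal. Since the file name given above is relative, it ends up in the sandbox data directory:

$ tail -f ~/sandboxes/msb_5_7_12/data/general.log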

Next, we load the sample world_x database from the MySQL documentation page.

$ ~/sandboxes/msb_5_7_12/use  -e 'source world_x.sql'

Finally, we enable the plugin.

$ ~/sandboxes/msb_5_7_12/use  -e "install plugin mysqlx soname 'mysqlx.so'"

Now we can connect the shell:

$ docker run -it --net host mysql/shell -u msandbox -pmsandbox world_x
mysqlx: [Warning] Using a password on the command line interface can be insecure.
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

Welcome to MySQL Shell 1.0.3 Development Preview

Copyright (c) 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type '\help', '\h' or '\?' for help.

Currently in JavaScript mode. Use \sql to switch to SQL mode and execute queries.
mysql-js>

What have we done? We use the network named 'host', which is a standard Docker protocol that lets a container use the host environment. We don't need to specify a port, since the shell assumes 33060 (enabled by the X-Plugin). The username and password are the usual ones for a sandbox. We enter inside a Javascript shell, where we can communicate with the database server using an alternative syntax. Let's see what we have:

  • We have an "X-Session" using port 33060 and working on database world_x;
  • There is a help, same as in the MySQL client;
  • The database world_x is accessible through the variable db.
  • Note: all the commands used below are the same for Python and Javascript. There are differences only when using the language extensively.

With the above elements, we can try getting data from the database.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

mysql-js> db.tables
{
    "City": <Table:City>,
    "Country": <Table:Country>,
    "CountryLanguage": <Table:CountryLanguage>
}

What does it mean? Let's abandon the Javascript shell and look at the traditional client:

mysql [localhost] {msandbox} (world_x) > show tables;
+-------------------+
| Tables_in_world_x |
+-------------------+
| City              |
| Country           |
| CountryInfo       |
| CountryLanguage   |
+-------------------+
4 rows in set (0.00 sec)

Here we see 4 tables, while the Javascript console lists only 3. However, the fourth table has the same name as the "collection." Let's have a look:

mysql [localhost] {msandbox} (world_x) > desc CountryInfo;
+-------+-------------+------+-----+---------+------------------+
| Field | Type        | Null | Key | Default | Extra            |
+-------+-------------+------+-----+---------+------------------+
| doc   | json        | YES  |     | NULL    |                  |
| _id   | varchar(32) | YES  |     | NULL    | STORED GENERATED |
+-------+-------------+------+-----+---------+------------------+
2 rows in set (0.00 sec)

mysql [localhost] {msandbox} (world_x) > show create table CountryInfo\G
*************************** 1. row ***************************
       Table: CountryInfo
Create Table: CREATE TABLE `CountryInfo` (
  `doc` json DEFAULT NULL,
  `_id` varchar(32) GENERATED ALWAYS AS (json_unquote(json_extract(`doc`,'$._id'))) STORED
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

Look what we got! A JSON column, with the document's _id extracted into a generated column. Now we can appreciate why the JSON data type was such an important thing.
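
As an aside, nothing prevents us from applying the same pattern by hand to other parts of the document. Here is a sketch (the column name and size are arbitrary, and the rest of this article does not depend on it) that extracts the continent into an indexed generated column:

$ ~/sandboxes/msb_5_7_12/use world_x <<'EOF'
-- add a virtual column extracted from the JSON document, plus an index on it
ALTER TABLE CountryInfo
  ADD COLUMN continent VARCHAR(52)
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$.geography.Continent'))) VIRTUAL,
  ADD INDEX (continent);
-- the optimizer can now use the index for continent lookups
EXPLAIN SELECT doc FROM CountryInfo WHERE continent = 'Oceania';
EOF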

Back to the Javascript shell, let's get something from the database. (You can get all the commands I am using, and much more, from the manual.)

mysql-js> db.collections.CountryInfo.find("_id='USA'")
[
    {
        "GNP": 8510700,
        "IndepYear": 1776,
        "Name": "United States",
        "_id": "USA",
        "demographics": {
            "LifeExpectancy": 77.0999984741211,
            "Population": 278357000
        },
        "geography": {
            "Continent": "North America",
            "Region": "North America",
            "SurfaceArea": 9363520
        },
        "government": {
            "GovernmentForm": "Federal Republic",
            "HeadOfState": "George W. Bush"
        }
    }
]
1 document in set (0.00 sec)

Apart from the feeling of being back in the good old times when MySQL was still playing with IPO dreams (look at the HeadOfState field in the above data), this record is a straightforward JSON document, where data that should belong to different normalized tables are bundled together in this unified view. So, we are really querying a Table that contains JSON data associated with an _id. We know because the general log lists what happens after our simple query:

SELECT doc FROM `world_x`.`CountryInfo` WHERE (`_id` = 'USA')

Let's try a more complex query. We want all countries in Oceania with a population of more than 150,000 people, and whose Head of State is Elisabeth II. The query is a bit intimidating, albeit eerily familiar:

mysql-js> db.collections.CountryInfo.find("government.HeadOfState='Elisabeth II' AND geography.Continent = 'Oceania' AND demographics.Population > 150000").fields(["Name", "demographics.Population","geography.Continent"])
[
    {
        "Name": "Australia",
        "demographics.Population": 18886000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "New Zealand",
        "demographics.Population": 3862000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Papua New Guinea",
        "demographics.Population": 4807000,
        "geography.Continent": "Oceania"
    },
    {
        "Name": "Solomon Islands",
        "demographics.Population": 444000,
        "geography.Continent": "Oceania"
    }
]
4 documents in set (0.00 sec)

Here is the corresponding SQL query recorded in the general log:

SELECT JSON_OBJECT(
    'Name', JSON_EXTRACT(doc,'$.Name'),'demographics.Population', \
    JSON_EXTRACT(doc,'$.demographics.Population'),'geography.Continent', \
    JSON_EXTRACT(doc,'$.geography.Continent')
) AS doc 
FROM `world_x`.`CountryInfo` \
WHERE (
    ((JSON_EXTRACT(doc,'$.government.HeadOfState') = 'Elisabeth II') \
    AND (JSON_EXTRACT(doc,'$.geography.Continent') = 'Oceania')) \
    AND (JSON_EXTRACT(doc,'$.demographics.Population') > 150000)
    )

I am not sure which one I prefer. The SQL looks strange, with all those JSON functions, while the Javascript command seems more readable (I had never thought I would say what I have just said!)

Enough with reading data. I want to manipulate some. I'll start by creating a new collection.

 mysql-js> db.createCollection('somethingNew')
 <Collection:somethingNew>

And the general log shows what should not be a surprise, as we have seen a similar structure for CountryInfo:

CREATE TABLE `world_x`.`somethingNew` (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE
) CHARSET utf8mb4 ENGINE=InnoDB

Now, to the data manipulation:

mysql-js> mynew=db.getCollection('somethingNew')
<Collection:somethingNew>

The variable mynew can now access the new collection. It's a shortcut to avoid typing db.collections.somethingNew.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> mynew.find()
Empty set (0.00 sec)

As expected, there is nothing inside the new collection. Now we enter a very minimal record.

mysql-js> mynew.add({Name:'Joe'})
Query OK, 1 item affected (0.01 sec)

mysql-js> mynew.find()
[
    {
        "Name": "Joe",
        "_id": "e09ef177c50fe6110100b8aeed734276"
    }
]
1 document in set (0.00 sec)

The collection contains more than what we have inserted. There is an apparently auto-generated _id field. Looking at the general log, we see that the data includes the new field.

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Joe\",\"_id\":\"e09ef177c50fe6110100b8aeed734276\"}')

As you can see, an _id field was added automatically. We could override that behavior by providing our own value:

mysql-js> mynew.add({_id: "a dummy string", Name:"Frank", country: "UK"})

The data inserted now includes the _id field with our manual value. The general log says:

INSERT INTO `world_x`.`somethingNew` (doc) VALUES ('{\"Name\":\"Frank\",\"_id\":\"a dummy string\",\"country\":\"UK\"}')

The value of _id, however, must be unique, or the engine will generate an error:

mysql-js> mynew.add({_id: "a dummy string", Name:"Sam", country: "USA"})
MySQL Error (5116): Document contains a field value that is not unique but required to be

If all this gives you a sense of deja-vu, you're right. This feels and smells a lot like MongoDB, and I am sure it isn't a coincidence.

Synchronizing operations

As our last attempt for the day, we will see what happens when we manipulate data in SQL and then retrieve it in Javascript or Python.

We leave the JS console open, and we do something in SQL

mysql [localhost] {msandbox} (world_x) > drop table somethingNew;
Query OK, 0 rows affected (0.01 sec)

How does it look on the other side?

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}
mysql-js> db.getCollections()
{
    "CountryInfo": <Collection:CountryInfo>,
    "somethingNew": <Collection:somethingNew>
}

Oops! mysqlsh didn't get the memo! It still considers somethingNew to be available.

mysql-js> db.collections.somethingNew.find()
MySQL Error (1146): Table 'world_x.somethingNew' doesn't exist

We need to refresh the connection. Unlike the SQL client, you need to specify the connection parameters.

mysql-js> \connect msandbox:msandbox@localhost:33060/world_x
Closing old connection...
Creating an X Session to msandbox@localhost:33060/world_x
Default schema `world_x` accessible through db.

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>
}

We can see the same happening when we create a new table in SQL. The session in mysqlsh keeps showing the cached contents, and we need to refresh the session to see the changes. Looking at the general log, there are no changes when we issue commands asking for metadata, such as db.collections or db.tables. Instead, when we refresh the session, we see this:

SELECT table_name, COUNT(table_name) c FROM information_schema.columns \
   WHERE ((column_name = 'doc' and data_type = 'json') 
   OR (column_name = '_id' and generation_expression = 'json_unquote(json_extract(`doc`,''$._id''))')) \
   AND table_schema = 'world_x' GROUP BY table_name HAVING c = 2
SHOW FULL TABLES FROM `world_x`

The first query lists all tables that contain a JSON document and a generated _id (these are the collections). The second one lists all tables. Then the shell removes from the table list all the ones that were in the collections list.

Given the way it is done, we can cheat the system easily by creating something that looks like a collection, but has extra fields:

CREATE TABLE strangedoc (doc JSON, \
_id VARCHAR(32) \
    GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(doc, '$._id'))) \
    STORED NOT NULL UNIQUE,
    secret_stash varchar(200),
    more_secret_info mediumtext
) CHARSET utf8mb4 ENGINE=InnoDB;

mysql [localhost] {msandbox} (world_x) > insert into strangedoc (doc,secret_stash,more_secret_info) values \
('{"_id": "abc", "name": "Susan"}', \
'and now for something completely different', \
'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (world_x) > select * from strangedoc\G
*************************** 1. row ***************************
             doc: {"_id": "abc", "name": "Susan"}
             _id: abc
    secret_stash: and now for something completely different
more_secret_info: Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
1 row in set (0.00 sec)

And the Javascript console will be unaware of the extra material:

mysql-js> db.collections
{
    "CountryInfo": <Collection:CountryInfo>,
    "strangedoc": <Collection:strangedoc>
}
mysql-js> db.strangedoc.find()
[
    {
        "_id": "abc",
        "name": "Susan"
    }
]
1 document in set (0.00 sec)

We can add contents to the collection in Javascript, and the database server won't protest (provided that the extra fields are nullable or have a default value). Is it a bug or a feature?

Parting thoughts

As I said at the beginning, this is a very simple exploration. More work is required to test the full potential of the new model. My impressions are mildly positive. On one hand, it's an exciting environment, which promises to become more useful as more programming languages are added and the coordination between shell and server software improves. On the other hand, there are many bugs, and the software is still very green. It will require more iterations from the community and the development team before it can be trusted with important data.

Monday, February 29, 2016

MySQL Sandboxes in Docker

Overview

When I got interested in Docker, I started playing idly with the idea of integrating containers and MySQL Sandbox. My first experiments were not encouraging. Using a container the same way I would use a regular server produced horrible results. I started by creating a Debian or CentOS container, installing MySQL Sandbox, and then importing an expanded tarball into the container. What happens is that tarballs of recent MySQL versions expand to roughly 2 GB of binaries. When you try to put that into a container you get a bloated file system. If you want to expand more than one tarball, you get an enormous unusable blob that is contrary to what containers should be used for. There is, of course, the possibility of using volumes, which would avoid the problem of making the container too big, but would not make it easier for users. The current difficulty of MySQL Sandbox is its non-intuitiveness. I want to use containers to make things easier for users. Getting the tarballs for more than one version and using them transparently is possible, but not simple. This was my starting point.

Shrinking MySQL tarballs

I read two articles recently about reducing the size of a MySQL distribution. I hadn't given much thought to this issue, other than noticing that the binaries are taking more and more storage in my hosts. While this has not been a huge problem in my servers so far, it would be nice to save a few gigabytes on my laptop. So these two posts made me think again.

The procedure is simple. What you need to run a MySQL server in a sandbox is only a handful of files. For example, for MySQL 5.7.11, you need, from the ./bin directory:

3.5M my_print_defaults
4.4M mysql
4.1M mysqladmin
4.8K mysql_config
 25M mysqld
 26K mysqld_safe

This is about 33M. The sizes are small because I run strip on all binaries. Additionally, I also need some files from the ./share directory, from which I shave away all languages except English, leaving me with about 3.5M.

To complete the usability of the sandbox, I need these files, which add 8.6M to the total.

4.4M mysqlbinlog
4.2M mysqldump

This brings the grand total for MySQL 5.7 to ~ 47M. Compared to the original size of the tarball at 2GB, it's an impressive reduction. However, I need to add something if I plan to use plugins in the sandbox. This requires the ./lib directory, with the contents of ./lib/plugin. This will double the size of the repository. But 90M is hardly a problem, compared to dealing with the size of original tarballs.
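
The reduction itself is mostly a copy-and-strip job. A minimal sketch for one version could look like this (the paths and the file selection are illustrative):

#!/bin/bash
ORIG=$HOME/opt/mysql/5.7.11             # full expanded tarball
DEST=$HOME/opt/mysql-reduced/5.7.11     # reduced copy
mkdir -p $DEST/bin $DEST/lib

# copy only the binaries needed to run and use a sandbox
for f in my_print_defaults mysql mysqladmin mysql_config mysqld mysqld_safe mysqlbinlog mysqldump
do
    cp -p $ORIG/bin/$f $DEST/bin/
done
strip $DEST/bin/* 2>/dev/null           # mysqld_safe and mysql_config are scripts: strip skips them

# keep ./share, minus all error message languages except English
cp -r $ORIG/share $DEST/share
for d in $DEST/share/*/
do
    case $(basename $d) in
        english|charsets) ;;            # keep these
        *) rm -rf "$d" ;;
    esac
done

# optional: plugins (roughly doubles the size of the reduced binaries)
cp -r $ORIG/lib/plugin $DEST/lib/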

Of course, I have automated the process, with a file list for every version of MySQL. After the reduction, I got the following (sizes in MB):

Version    original size    reduced size
5.0        371              44
5.1        485              59
5.5        687              49
5.6        1100             61
5.7        2000             91



Building the images

My goal with using Docker in conjunction with MySQL Sandbox is to simplify the user experience, while keeping intact the advanced functionalities. After a few unsatisfactory attempts that allowed me to learn a few tricks and avoid pitfalls, I came up with the plan of producing three images:

  • One with the MySQL Sandbox software and the libraries to support it;
  • One that also includes the MySQL tarballs, still compressed;
  • And one that includes the tarballs expanded.

The main image mysql-sb-base

This image, datacharmer/mysql-sb-base, is the base for the others, but it can be used stand-alone with tarballs that are already in your host computer. The image is relatively light: it is based on Debian, and its size is 167 MB.

For example, suppose you have expanded tarballs in $HOME/opt/mysql:

$ ls $HOME/opt/mysql
5.6.28  5.7.11

$ docker run -ti --name sbox -v $HOME/opt/mysql:/opt/mysql datacharmer/mysql-sb-base bash
msandbox@3d4a8d9ca186:~$ make_sandbox 5.6.28 -- --no_show
The MySQL Sandbox,  version 3.1.05
(C) 2006-2016 Giuseppe Maxia
loading grants
. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_6_28
msandbox@3d4a8d9ca186:~$ ~/sandboxes/msb_5_6_28/use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.6.28 MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} ((none)) >

Or, you may have a tarball in a directory:

$ ls $HOME/downloads
mysql-5.1.72-linux-x86_64-glibc23.tar.gz

$ docker run -ti --name sbox -v $HOME/downloads:/opt/mysql datacharmer/mysql-sb-base bash

msandbox@2b0a53ad8e71:~$ make_sandbox ~/opt/mysql/mysql-5.1.72-linux-x86_64-glibc23.tar.gz -- --no_show
unpacking /home/msandbox/opt/mysql/mysql-5.1.72-linux-x86_64-glibc23.tar.gz
    The MySQL Sandbox,  version 3.1.05
    (C) 2006-2016 Giuseppe Maxia
loading grants
. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_1_72

msandbox@2b0a53ad8e71:~$ ~/sandboxes/msb_5_1_72/use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.1.72 MySQL Community Server (GPL)

Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} ((none)) >

The intermediate image mysql-sb-gz

If you want to use the reduced tarballs that I have prepared, the least expensive image is datacharmer/mysql-sb-gz. It contains all the binaries mentioned above (from 5.0 to 5.7), still compressed. The image weighs 272 MB, and you can use the binaries with the help of a script that ships with the image.

$ docker run -ti --name sbox datacharmer/mysql-sb-gz bash

msandbox@b0672d141e3d:~$ ls
opt  setup.sh  README
msandbox@b0672d141e3d:~$ ./setup.sh
# expanding 5.0.96.tar.gz
# expanding 5.1.72.tar.gz
# expanding 5.5.48.tar.gz
# expanding 5.6.28.tar.gz
# expanding 5.7.11.tar.gz

msandbox@b0672d141e3d:~$ ls ~/opt/mysql/
5.0.96  5.1.72  5.5.48  5.6.28  5.7.11

msandbox@b0672d141e3d:~$ make_sandbox 5.7.11 -- --no_show
    The MySQL Sandbox,  version 3.1.05
    (C) 2006-2016 Giuseppe Maxia
loading grants
. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_7_11
msandbox@b0672d141e3d:~$
msandbox@b0672d141e3d:~$ ~/sandboxes/msb_5_7_11/use
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.11 MySQL Community Server (GPL)

Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql [localhost] {msandbox} ((none)) >

The complete image mysql-sb-full

The image that allows you to use a sandbox without any extra work is datacharmer/mysql-sb-full. This one contains the expanded binaries that are ready for consumption.

$ docker run -ti datacharmer/mysql-sb-full bash
msandbox@ed3cbbc088a7:~$ ls ~/opt/mysql
5.0.96  5.1.72  5.5.48  5.6.28  5.7.11  COPYING
msandbox@ed3cbbc088a7:~$ make_sandbox 5.5.48 -- --no_show
    The MySQL Sandbox,  version 3.1.05
    (C) 2006-2016 Giuseppe Maxia
loading grants
.. sandbox server started
Your sandbox server was installed in $HOME/sandboxes/msb_5_5_48
msandbox@ed3cbbc088a7:~$ make_replication_sandbox 5.6.28
installing and starting master
installing slave 1
installing slave 2
starting slave 1
. sandbox server started
starting slave 2
. sandbox server started
initializing slave 1
initializing slave 2
replication directory installed in $HOME/sandboxes/rsandbox_5_6_28
msandbox@ed3cbbc088a7:~$ ~/sandboxes/use_all 'select version()'
version()
5.5.48
# master
version()
5.6.28-log
# server: 1:
version()
5.6.28-log
# server: 2:
version()
5.6.28-log
msandbox@ed3cbbc088a7:~$

What's next

This is a first attempt, and I am sure it can be improved. Things that come to mind, in no particular order:

  • Making the container run a sandbox transparently in the host computer. This is already feasible, but not out-of-the-box. It will require a wrapper script.
  • Adding binaries for more versions, such as Percona and MariaDB servers, or preview releases (MySQL Labs);
  • Finding a way of adding binaries seamlessly, without making the images unnecessarily big;
  • Getting the binaries from a centralised repository instead of shipping them with the image. This is also feasible, but not immediately reliable as of today. The ideal solution would be for Oracle, Percona, and MariaDB to release and maintain reduced binaries of every shipped version.

The code to create the images is already available on GitHub. Contributions are welcome!

Thursday, February 04, 2016

Lightning talks at Percona Live Data Performance Conference

The main schedule for the Percona Live Data Performance Conference is available. Almost everything has been defined. There are tutorials and plenty of sessions waiting for conference attendees.

One thing that is still undefined is the lightning talks session. The call for participation for these mini sessions of 5 minutes each is still open. If you plan to attend Percona Live, this is your chance to get your 5 minutes of celebrity: you can submit a proposal up to February 7th, 2016. There is a lot that can be said in 5 minutes. If you have an interesting topic to highlight, a pet project to show off, a neat trick to recommend, or a happy or painful experience to share, a lightning talk is the right place to apply.

There are also open slots for Birds Of A Feather (BoF) sessions. These are not lectures, but rather meetings of users who share the same interest. If you want to apply for one of these sessions, don't propose a topic where you address the audience, but propose a theme for a discussion among peers. BoF sessions are often the place where new ideas are born, helped by the free discussion among passionate users. If you have an open source project and want to ask for feedback, or if you want help defining the road map for an already successful project, a BoF is what you need. For this kind of session, too, the deadline is February 7th.

And remember: the conference is not limited to MySQL. Every data-related topic (such as NoSQL, big data, database engines, data storage technologies) could trigger an interesting talk.

Monday, February 01, 2016

A safer MySQL box in Docker

The MySQL team has been maintaining a MySQL image on Docker. They have been listening to requests from the community and reacting quickly. So far, they have fixed two bugs that I reported and implemented a feature that I requested to make the server more secure.

Thanks, folks!

My latest request was about password management in a MySQL container. I have mentioned in previous posts the compatibility problems introduced by MySQL 5.7 security enhancements. Let me recap the main issues here:

MySQL is secure by default.

The recommended method to install MySQL is mysqld --initialize, which will generate a random password that the DBA will then use to access the server and change it. This method would be acceptable if we were still in the 1990s, when installing servers one by one was a common practice. In the world of automated server deployments, this practice would kill productivity in most environments.
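
With this method, the only way to learn the initial password is to fish it out of the error log, with something like the following (the log location depends on your configuration):

$ grep 'temporary password' /path/to/error.log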

MySQL provides a script-friendly alternative, mysqld --initialize-insecure, which will not generate any password, and will leave the server with the traditional password-less root account, ready for another automated process to set a secure password. Despite the intimidating name, this method is not more insecure than the old default behavior, because it is meant for automated processes, where security does not depend on the vendor's decisions but on a careful deployment plan that includes password generation and management at company level.
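
In such a plan, the whole initialization can be scripted along these lines. This is only a sketch: the data directory, the socket, and the final password would come from your own provisioning system, and the last step runs once the server accepts connections:

$ mysqld --initialize-insecure --datadir=/data/mysql
$ mysqld_safe --datadir=/data/mysql &
$ mysql -u root --socket=/tmp/mysql.sock \
    -e "ALTER USER 'root'@'localhost' IDENTIFIED BY 'password_from_your_vault'"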

MySQL on Docker is insecure by default

The above problem came to bite the MySQL team when they released an image for Docker. Here, to deploy a container you need to execute:

docker run -e MYSQL_ROOT_PASSWORD=something -d mysql/mysql-server

Meaning that you need to pass your valuable password on the command line. There is an alternative. Instead of MYSQL_ROOT_PASSWORD, you can pass the variable MYSQL_RANDOM_ROOT_PASSWORD, which will generate a random password and you would then see the password in the logs:

$ docker logs mybox
Initializing database
Database initialized
MySQL init process in progress...
GENERATED ROOT PASSWORD: Ix.oqdovoj!AGSEtYBnOnebag]u

Here the situation is worse than the regular case where we install with an automated tool, because containers are not supposed to be modified post installation. They should work out of the box. The password that we pass on the command line should be the real one, but given its exposure it can only be a temporary one, requiring a change in a further operation. The random password in the logs does not make the process simple for a script.

The latest change by the MySQL team, though, fixes the problem. Now you can pass a file name inside MYSQL_ROOT_PASSWORD and the server installation procedure will get the password from that file. This way, there will be no more exposure, and a reliable password could be used from the beginning.

Here is an example of a script that creates a container with a secure password, which nobody needs to see on screen at any moment:

#!/bin/bash

# Generate a random password (for demo purposes only: there are more secure methods)
RANDOM_PASSWORD=$(echo $RANDOM | sha256sum | cut -c 1-10 )

# Save the random password to a file
echo $RANDOM_PASSWORD > secretpassword

# Create the .my.cnf file
echo '[client]' > home_my_cnf
echo 'user=root' >> home_my_cnf
echo "password=$RANDOM_PASSWORD" >> home_my_cnf

# set -x
docker run --name mybox \
    -v $PWD/secretpassword:/root/secretpassword \
    -v $PWD/home_my_cnf:/root/home_my \
    -e MYSQL_ROOT_PASSWORD=/root/secretpassword -d mysql/mysql-server $@

This script creates a random password, saves it to a file, and makes the file available inside the container (using the -v volume option). Inside the container, the installer reads the password from the file and proceeds like before. The difference is that this password has not been shown on screen or in the logs. Notice that this script has also written the password to a second file, which is in the format required by MySQL for its option files (same as /etc/my.cnf). We can use this file to connect without typing a password:

$ docker exec -ti mybox mysql --defaults-file=~/home_my
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.10 MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

We can't deploy the configuration file as /root/.my.cnf right away, because the server would read it before installing, i.e. before the password has been set, and the installation would fail. Once the installation is concluded, though, we can do it, and access mysql without a password.

$ docker exec -ti mybox sh -c 'cp ~/home_my ~/.my.cnf'
$ docker exec -ti mybox mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.10 MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

Sunday, January 31, 2016

Why log_slave_updates is a bad default

In a recent post the MySQL product managers asked the community for feedback about proposed new defaults. One of the proposals is to make log-slave-updates on by default.

There are other important options in that proposal that deserve some debate, but they all look reasonable to me. This one, instead, which implies funnelling the replication events received by a slave into its own binary log, is questionable.

Let's start with the reason why it is a good idea. The scenario in which it makes sense is when you want a slave to be a master of one or more slaves. This is a common scenario in many cases where you need to support many servers in production. It is not, however, always useful. I would like to show, here, a few cases when log-slave-updates is a problem rather than a solution.
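
Before looking at those cases, here is, for reference, what the hierarchical scenario requires in the intermediate node: a binary log plus the option under discussion. An illustrative my.cnf fragment (the server-id is arbitrary):

[mysqld]
server-id         = 101
log-bin           = mysql-bin
log-slave-updates = ON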

Regular Master/Slave deployments

I would say that this case, rather than the hierarchical topology, is the most common situation. When you have a master/slave topology, you want the slave to be ready to replace the master, and for this you don't need log-slave-updates. The manual also recommends having a slave without log-slave-updates when your goal is to switch roles after a failover.

When using multi-source replication

In multi-source replication, you can have more than one topology. Unlike the ancient circular replication, which was the only way of implementing multi-source replication prior to MySQL 5.7 and has proven to be inefficient, the new methods for multi-source replication don't need log-slave-updates enabled, unless you are using complex and hybrid topologies like a star or a composite topology. Having used and tested such methods with MySQL and Tungsten replication for years, I can say confidently that having a direct node-to-node replication stream is always preferable to using an intermediate node: there is better performance, less risk of data muddling, and more certainty about the data origin.

Whenever you need performance in the slave

If we are not dealing with a hierarchical topology, having log-slave-updates means that the slave pays a performance penalty for no good reason.

Suggesting a better improvement than the intended default

Rather than making log-slave-updates the default, what I would like to see in the next MySQL version is a mechanism for log servers, i.e. an intermediate node that receives binary logs from a master and serves them to the slaves without needing to apply anything. If hierarchical topologies are so important, this should be the way to go, rather than stretching an obsolete technology beyond its capabilities.

Thursday, January 21, 2016

Frictionless MySQL installation

I saw an interesting post about the ability to install MySQL 5.7 without changing existing tools and procedures. The post is a plea to make MySQL installation frictionless.

That post was followed by a conversation on Twitter, where the recent security enhancements are blamed for getting in the way of existing practices and need a rewrite of installation tools.

I know the problem very well, as I have faced the installation change in MySQL Sandbox. So I can sympathize with the ones who have to change deployment tools that rely on mysql_install_db, which was a Perl script up to version 5.6, was replaced with a C++ program in 5.7, and was deprecated in that same version.

It occurred to me that, in order to keep the existing tools working and at the same time have a recommended installation, a DBA could just quickly replace the existing mysql_install_db with the following shell script:


#!/bin/bash
# Drop-in replacement for mysql_install_db: delegates to mysqld --initialize-insecure
exec_dir=$(dirname "$0")

if [ ! -x "$exec_dir/mysqld" ]
then
    echo "$exec_dir/mysqld not found"
    exit 1
fi

"$exec_dir/mysqld" --initialize-insecure --explicit_defaults_for_timestamp "$@"

This is a Unix-only solution. A corresponding script for Windows should be easy to come by.

It is not the optimal way, but it could alleviate the work of a DBA who wants to keep using tools that would be too cumbersome to adapt to the new requirements.

Wednesday, November 25, 2015

Percona Live Data Performance Conference 2016 - CFP and Community voting

The call for participation for Percona Live Data Performance Conference 2016 is still open. Deadline for submission is November 29th.

There are two immediately noticeable novelties in this edition:

  • The name change. Not "MySQL Conference & Expo" but "Data Performance Conference." It makes the conference open to a broader set of topics.
  • The community voting. Proposals can get evaluated by the community before the review committee takes decisions.

I think it's a good choice. Other conferences adopt the same method. The attendees choose what they want to see and hear. In this case, it's a mixed method, where the community voting is used as an indication for the review committee, which, by my understanding, has the final say.

Vote for my proposals!

Below are my proposals. Clicking on the links below will take you to the conference site, where you can say if you want to see these talks or not. You will need to register (on the site, not yet for the conference) in order to cast votes.

  • Here is a talk that is in continuous evolution. It discusses the latest advances in replication, and gives an honest evaluation of the features from a new user standpoint. This talk, if accepted, will be updated with the latest novelties in MariaDB and MySQL, if they come out in time for the conference. You can see in my blog six articles covering related matters.
  • Another tutorial, this one aimed at users of containers who want to get started with this exciting technology. Also for this topic I have written a few articles.
  • This is a short talk that wants to explain the differences between deployment methods. Standalone physical servers, sandboxes, virtual machines, and containers are choices that require some information to get started. This talk, for which I also wrote an article, wants to show the good and bad of each choice.
  • This is a lightning talk, which is not about data performance, but it's a geeky topic and I was asked to submit it. So here it is!

Sunday, November 22, 2015

MySQL and Docker on a Mac: networking oddity

This is a quick post only indirectly related to the series of articles about Docker that I have written recently.

Yesterday I was giving a presentation about Docker in Buenos Aires, and as usual I included a long live demo. Almost all went as expected. There was one step that I tried some time ago, and had always worked well, but when I tried to repeat it on stage, it failed miserably:

  • Step 1: run the container
$ docker run  -d --name mybox -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server
50acff1c7d237b2944f3fbdd5d230e4ecca2bcccd4e4ba06aa778ee20d41631e
  • Step 2: get the IP address
$ docker inspect --format '{{ .NetworkSettings.IPAddress}}'  mybox
172.17.0.2
  • Step 3: connect to the container from a local client.

$ mysql -h 172.17.0.2 -u root -psecret
Enter Password:

And here the connection hangs.

That was odd. This operation had worked fine at home. I have it pasted into my first article of the series. Why did it not work here?

I could not solve the issue on stage. I had to go on with my presentation. The installation of three nodes in replication went well. Everything that I tried at home was working, except connecting from the external client to the server inside a container.

When I came back to the hotel, I realised what was different: I had done all the tests for the first article on Linux only, and I started testing on Mac when using the material for the second article. Thus I hadn't tried this particular example on a Mac until I went on stage. And of course it did not work: I was trying to connect from my Mac to a container inside a virtual machine which uses a different subnet and cannot be reached without indirection.

The only way of connecting a local client to the container is to expose the MySQL port explicitly to the outside (and in this case the "outside" is the virtual machine itself) and then connect to the virtual machine.

This approach works:

$ docker rm -v -f mybox
mybox

$ docker run -p 3306:3306 -d --name mybox -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server
2798f4d1b9abb023453b1141a808014fdf7666221498b42c96e2a5d58fc6ce65

Here port 3306 of the container is attached to port 3306 of the host (= the virtual machine).


$ docker-machine ip default
192.168.99.100

Now we get the address of the virtual machine, so that we can connect to its MySQL port.


$ mysql -h 192.168.99.100 -u root -psecret
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

And the connection now works!

I owe it to the attendees of my talk to explain the oddity, but I believe it could be useful for others out there as well.

Saturday, November 21, 2015

Default users in MySQL 5.7

Among the many new features introduced by MySQL 5.7, we can notice a strong trend towards improving the server's security by default. Two features stand out in this respect:

  • A password-less root is no longer the default for new installations. Unless you say otherwise, the default installers mysqld --initialize and the deprecated mysql_install_db will generate a random password, which the user needs to change (a minimal sketch of this workflow follows this list).
  • The anonymous accounts are no longer created by default. When you start MySQL, you only get the root user (and a new one: read on).
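Here is a minimal sketch of that workflow. The paths and the new password are hypothetical, and the log message wording may vary slightly between releases; the point is that the temporary password ends up in the error log and must be changed at the first connection:

$ mysqld --initialize --datadir=$HOME/testdb/data --log-error=$HOME/testdb/init.err
$ grep 'temporary password' $HOME/testdb/init.err
[...] A temporary password is generated for root@localhost: <random string>
$ mysql -u root -p      # after starting the server, log in with the temporary password
mysql> ALTER USER USER() IDENTIFIED BY 'a-brand-new-password';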

The above features are a great advance not only for security but also for usability. The anonymous users were a continuous source of mismatched connections, with difficult to explain errors, and confusion for beginners and experts alike. That's why MySQL-Sandbox has removed the anonymous accounts since its first release.

There are, however, two more changes in the privileges tables:

  1. We have a new user, mysql.sys, which is not a usable account, as it comes with a deliberately invalid password. Its only role is to let the sys objects have an owner different from root (a quick check is sketched after this list). This novelty caught me by surprise, and I had to adjust the user management in MySQL-Sandbox to prevent removal of this account's attributes from the various mysql tables: user, db, tables_priv. In fact, since we don't have anonymous accounts anymore, MySQL-Sandbox no longer removes rows from db and tables_priv. (The change apparently also caught the team maintaining the MySQL images on Docker by surprise, as this user is not available in Docker MySQL containers: the entrypoint file removes all accounts from the user table.)
  2. The test database is removed by default. This is, in my opinion, a mistake. The reason for the vulnerability of the test database was that it was open to use for the anonymous users. But since we don't have anonymous users anymore, deleting the test database is like obeying a superstitious belief.
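As a quick check of point 1 above (a sketch; the exact plugin and column values vary by release), you can look at the mysql.sys account and at the owner of the sys schema objects:

$ mysql -u root -p -e "SELECT user, host, plugin FROM mysql.user WHERE user='mysql.sys'"
$ mysql -u root -p -e "SELECT DISTINCT definer FROM information_schema.views WHERE table_schema='sys'"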

Anyway, MySQL-Sandbox 3.1.02 comes with a few small bug fixes, among which is the preservation of the mysql.sys user and a few adjustments to the tests to take into account the latest change. The test database is always present in sandboxes, despite the above mentioned irrational removal.

Let me demonstrate the issue. In a brand new installation, we create an anonymous user and the test database:


mysql> create user '';
Query OK, 0 rows affected (0.02 sec)

mysql> select host,user from user;
+------+------+
| host | user |
+------+------+
| %    |      |
| %    | root |
+------+------+
2 rows in set (0.00 sec)

mysql> create schema test;
Query OK, 1 row affected (0.01 sec)

Then we try to access the server:


$ mysql --user=''
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 5
Server version: 5.7.9-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use test
ERROR 1044 (42000): Access denied for user ''@'%' to database 'test'
mysql> use performance_schema
ERROR 1044 (42000): Access denied for user ''@'%' to database 'performance_schema'
mysql> use sys;
ERROR 1044 (42000): Access denied for user ''@'%' to database 'sys'

mysql> show schemas;
+--------------------+
| Database           |
+--------------------+
| information_schema |
+--------------------+
1 row in set (0.00 sec)

mysql> select table_schema,engine,count(*) from information_schema.tables group by table_schema,engine;
+--------------------+--------+----------+
| table_schema       | engine | count(*) |
+--------------------+--------+----------+
| information_schema | InnoDB |       10 |
| information_schema | MEMORY |       51 |
+--------------------+--------+----------+
2 rows in set (0.00 sec)

mysql> show grants for '';
+------------------------------+
| Grants for @%                |
+------------------------------+
| GRANT USAGE ON *.* TO ''@'%' |
+------------------------------+
1 row in set (0.00 sec)

So the anonymous user is unable to do harm, as it can't even see the databases. The only measure that was really needed was cleaning up the mysql.db table, which was what gave anonymous users access to the test database.
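To check whether any such catch-all grants are present, you can query mysql.db directly (a sketch; in pre-5.7 installations you would typically find rows with an empty user for the test databases, while a fresh 5.7 instance has none):

$ mysql -u root -p -e "SELECT host, db, user FROM mysql.db WHERE user=''"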

Monday, November 16, 2015

MySQL-Docker operations. - Part 4: Sandboxes, virtual machines, containers.

Previous episodes:

We're going to explore the choices and the differences between various types of deployments. We will consider four use cases:

  1. [Friendly]: Testing an application on a server where a different version of the same application is already installed (examples: a Python app requiring many libraries, a MySQL server);
  2. [Intrusive]: Testing a potentially intrusive application (anything that changes your general settings in /usr or /etc);
  3. [Conflicting]: Running a service that has lots of conflicting dependencies (an updated database driver compiled with a version of MySQL different from what you have installed);
  4. [Intractable]: Running an intractable service, one of those that require a specific user to run and assume they have full control of the operating system (e.g. Postgresql, Oracle).

For each case, we need to determine the impact on our well being. We assume that the user starts with one reasonably powerful server.

The method used will affect our operations in several ways:

  1. Cost: How much would it cost to implement this method?
  2. Time: How much time will be needed to get things done?
  3. Performance: Can we run things as fast as we need?
  4. Ease of use: Can we get things done without reading a lengthy manual or using an unforgiving and complicated procedure?
  5. Isolation: Can we run our server without affecting other servers?
  6. Storage: Can we add or change storage easily?
  7. Scalability: Can we easily repeat the procedure as many times as needed?
  8. Availability: Can we run any service using this method?
  9. Portability: Can we run this service on several operating systems?
  10. Networking: Can we use this method to run operations that require a network?

Running servers on a regular host

The first possibility to solve our problem is simple. Take an empty server, install the service, run it. Until not long ago, before the advent of cloud computing, this was the only way to run operations: if your server is not enough, buy a bigger one, or buy many small ones and get smarter with them. But inevitably, whether we wanted to install a new service or test a new version of a known application, we needed to find money and physical space to get the job done.

Regular apps

Figure 1 : Applications within a server share the operating system and library resources

In this configuration, everything is by the book. We assume that we will use one physical host to run a main service, using the best configuration we can get to achieve the purpose.

The evaluations in the following table are based on my own experience and may differ from what others feel or need.

Requirement score notes
Cost –10 You need to own a new server
Time 8 You need to install it, but it can be easily automated
Performance 10 Nothing beats bare metal
Ease of use 8 As easy as the installation procedure makes it.
Isolation 10 Not going to affect services in other machines.
Storage –10 Changing storage requires physical manipulation
Scalability –10 Every new server requires a new purchase
Availability 10 We can run anything.
Portability 10 We can install the O.S. that we need, and the services on top of it
Networking 0 We can use, but can't create or simulate networking.
Total +56 / -30

The negative results should be considered separately from the positive ones. What could be a prohibitive condition for an individual could be merely a nuisance for someone in a stronger position. For example, if you already have access to bare metal servers for the next two years, thanks to an advantageous merger, you may not feel that the cost factor affects you much.

Evaluation for the bare metal servers usability:

  • [Friendly]: easily used. No problems here.
  • [Intrusive]: difficult to use. Installing one of those means that you may have trouble installing anything else.
  • [Conflicting]: Extremely difficult. You may end up unable to upgrade a given service unless you also upgrade all its dependencies, and finally upgrade the whole operating system out of desperation.
  • [Intractable]: Extremely difficult. Once you install one of those, you may not be able to use the server for anything else.

Running servers in a sandbox

In this context, by sandbox I mean an application that runs on a server with strict configuration settings that prevent it from misbehaving. One example for this category is MySQL-Sandbox, where one or more MySQL servers are installed in a host, each of them configured in such a way that it does not clash with the others.
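For example (a sketch, with hypothetical version numbers; each sandbox gets its own port, data directory, and configuration under $HOME/sandboxes):

$ make_sandbox 5.6.27
$ make_sandbox 5.7.9
$ ~/sandboxes/msb_5_6_27/use -e 'SELECT @@port, @@version'
$ ~/sandboxes/msb_5_7_9/use -e 'SELECT @@port, @@version'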

Sandboxes

Figure 2 : Sandboxes are regular applications that were carefully configured to behave well without disturbing the neighbors.

While MySQL-Sandbox is designed for testing, deploying several production servers on the same host is a common practice. The main reason is that commodity servers have become more and more powerful, while the software hasn't caught up to use such power to its fullest. In this context, running a single server on such a powerful host would be a waste, while installing two or three servers makes better use of the resources.

This type of deployment is similar to running on a plain bare metal server. You are running your MySQL server very close to the metal, as there is no software layer between the server and the operating system. Applications configured this way are as fast as the hardware allows. However, they are not as well isolated. While a lonely server running inside its dedicated host does not have to worry about clashing, a sandbox shares libraries and other operating system resources with other similar servers, and a clash is easy to provoke. It would be enough to mix up the configuration settings, and one or more of them would either stop working or corrupt data. Or a single sandbox could drain all the resources (e.g. the main memory), leaving all the other contenders out in the cold.

Requirement score notes
Cost 10 No investment required
Time 8 As easy as the installation procedure makes it
Performance 10 Still bare metal, even if there is potential concurrency.
Ease of use 10 As easy as the manual says it is
Isolation -5 Depends on the service configuration. Although it is functionally independent, the services can clash.
Storage 5 Sandboxes can be resized at will (within the limits of existing storage).
Scalability 10 Deployment of new instances is only limited by the host resources.
Availability 5 We can run only applications that are fully configurable.
Portability -5 We can only run applications for the host O.S.
Networking 0 We can use, but can't create or simulate networking.
Total +58 / -10

There are several advantages to using sandboxes instead of a dedicated host, such as being able to deploy multiple servers without buying new hardware or installing virtual machines. There are, however, obvious limitations, like the lack of isolation mentioned above and the fact that only applications compiled for the host operating system can run in this fashion.

Evaluation for the sandboxes usability:

  • [Friendly]: easily used. This is the strong point of sandboxed applications.
  • [Intrusive]: Difficult to use. Sometimes impossible.
  • [Conflicting]: Difficult, but possible to use. It's one of the cases where having a conflicting application run in a parallel environment could be beneficial.
  • [Intractable]: Almost impossible to reduce to a sandboxed environment.

Running servers in virtual machines

VM

Figure 3 : A virtual machine isolates the application and the operating system.

Virtual machines are the heart of current cloud computing strategies. The ability to create servers that behave almost like bare metal ones, without the need to physically buy them and transport them into a data center, has changed the economy of most companies in the past decade.

Requirement score notes
Cost -5 Moderate investment required
Time 0 As easy as the installation procedure makes it. But the O.S. must be installed as well
Performance -10 There is much overhead from the additional layers and the need for a full O.S.
Ease of use 8 Everything that is allowed through the interface.
Isolation 9 It can be as good as a physical host. There is still the risk of one VM negatively affecting others.
Storage 5 V.M.s can be resized at will (within the limits of existing storage).
Scalability 10 Deployment of new instances is only limited by the host resources.
Availability 10 We can run anything.
Portability 10 We can install the O.S. that we need, and the services on top of it
Networking 10 We can use and create networks.
Total +62 / -15

Compared to bare metal, virtual machines can scale at will. In a few minutes you can deploy a new VM of the size needed for your current business, and get rid of it when the need ends. Unlike sandboxes, you can run any operating system and any application. In addition, you can have a network for public and private communication between servers.

There are prices to pay. First of all, it will cost you: depending on usage, virtual machines can be much cheaper than buying and housing your own physical servers, but they won't be free. Sure, you can install a virtual machine on your initial server, the same way you can create a sandbox, but then you run into the second great limitation: performance. Even with the best software available today, the performance of a server running in a VM is greatly inferior to that of a server on bare metal.

You can compensate for performance by splitting the job into many parts and deploying many small virtual machines that work in parallel. When a solution like this is successfully deployed, the performance of the group of virtual machines can surpass that of a single bare metal server. Unfortunately, to achieve this goal, you would incur more costs than buying a single server, and your application will need to be adapted to work in a distributed environment. This solution can work, and it has been deployed successfully in many cases, but it is not one-size-fits-all, and done with poor planning it can backfire.

Evaluation for the virtual machines usability:

  • [Friendly]: easily used. No problems here.
  • [Intrusive]: Easily used with overhead. Just install another virtual machine.
  • [Conflicting]: Easily used with overhead.
  • [Intractable]: Easily used with overhead.

Running servers in containers

Docker

Figure 4 : Docker containers are thin layers of libraries and applications on top of a common kernel.

Containers are a growing trend in the virtualization ecosystem. If, by the previous statement, you believe that containers are virtual machines, you need to reconsider immediately, or risk failing to understand this technology. Containers are not virtual machines, although they have many things in common. Like virtual machines, containers are self-contained entities that are kept separate from the host computer, can be shipped as a package and started, and the service inside them can be used more or less like a server on bare metal.

The differences between virtual machines and containers are few, but very important:

  • A container does not pack a full operating system, but just a thin layer of the needed libraries to run the service in it;
  • The service itself is often a stripped down version of the original application.
  • Most important, the software in the container uses the host kernel directly, without any intermediate layer.
  • For the above reasons, while a virtual machine starts up in minutes, a container starts up in less than a second.

A container is a well packaged application that can be downloaded very quickly, and once downloaded can be instantiated several times with incredible speed.
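As a rough illustration (a sketch, reusing the image from the earlier examples; timings and output format will vary by Docker version), once the image has been pulled you can start several independent instances in a few seconds:

$ docker pull mysql/mysql-server
$ for N in 1 2 3; do docker run -d --name mybox$N -e MYSQL_ROOT_PASSWORD=secret mysql/mysql-server; done
$ docker ps --format '{{.Names}}\t{{.Status}}'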

Another notable difference between containers and virtual machines is that containers are less isolated, because they use the same kernel as the host, rather than a virtualized one. On one hand, this makes containers less secure, on the other hand, they are blazingly fast.
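You can verify the kernel sharing directly (a sketch; it assumes the image lets you override the default command and ships the uname utility, and on a Mac the "host" is really the docker-machine VM):

$ uname -r                                       # kernel release of the Linux host
$ docker run --rm mysql/mysql-server uname -r    # the same release, seen from inside a container
$ docker-machine ssh default uname -r            # on a Mac, compare with the VM's kernel instead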

Docker shared

Figure 5 : Docker containers can share libraries and other image layers

There is another reason for containers' speed and low storage footprint. Docker containers are deployed in layers. Some of those layers are used by a single container; others can be shared between two or more containers. While a virtual machine is an enormous blob that can reach several GB, a container can be a thin modification of an existing image, and thus can be downloaded in seconds and deployed even faster.
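The layered structure is easy to inspect (a sketch; layer counts and sizes depend on the image version):

$ docker history mysql/mysql-server    # one line per layer, with its size and the command that created it
$ docker images                        # total size of each downloaded image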

Requirement score notes
Cost 10 No investment required
Time 10 Fast, fast, fast!
Performance 9 Almost as fast as running on bare metal. Tiny overhead.
Ease of use 3 Requires some learning and new workflows.
Isolation 7 Much better than a sandbox. Less than a V.M., because containers use the same kernel.
Storage 5 Containers can be resized at will (within the limits of existing storage).
Scalability 10 Deployment of new instances is only limited by the host resources.
Availability 3 We can run only applications that have been adapted for containers.
Portability -5 We can only run applications for the host O.S.
Networking 10 We can create and use networks.
Total +67 / -5

What are the strong points of containers? Low cost (or no cost, if all you need is what fits in your current server), good performance, private networking, easy to scale.

The limitations, as of today, are portability (applications can only run in the same OS as the host) and the ease of use. This is a point that is going to change. Using containers requires some changes in the applications (or finding ready made images) and an understanding of the environment, which could be intimidating for people used to the old ways. But once you get past the initial learning phase, everything feels very easy, and eventually the usage will be far easier than the old ways.

Evaluation for the containers usability:

  • [Friendly]: easily used. No problems here.
  • [Intrusive]: easily used, with little or no overhead.
  • [Conflicting]: easily used with little or no overhead.
  • [Intractable]: difficult to use, sometimes impossible if the intractable application or service was built without flexibility in mind.

All solutions comparison

For convenience, I made a table with a comparison of the solutions examined above.

I must stress that these evaluations are my own, very much subjective, and based on my experience. They may differ from other people's, and possibly also from my own in a few months or years. Talking about Docker is like catching eels: it's a moving target, where the technology evolves and improves daily. This fluidity is possibly the most appealing characteristic of Docker and container-related technology: its evolution has been, and continues to be, fast and effective, addressing users' needs at incredible speed.

Requirement     Bare metal   Sandbox     Virtual machine   Container
Cost            –10          10          –5                10
Time            8            8           0                 10
Performance     10           10          –10               8
Ease of use     8            10          8                 3
Isolation       10           –5          9                 7
Storage         –10          5           5                 5
Scalability     –10          10          10                10
Availability    10           5           10                3
Portability     10           –5          10                –5
Networking      0            0           10                10
Total           +56 / –30    +58 / –10   +62 / –15         +66 / –5

I believe we haven't seen the end of this trend yet. What we have seen so far with containers and virtual machines seems to aim at an architecture built on micro services. Containers could take a substantial role in the transition towards that reality.

What can we take away from this analysis?

  • Bare metal servers are not outdated yet. There are still cases where they are irreplaceable. Despite the cost associated with their usage, they are not extinct, but only just.
  • Virtual machines are still in charge of the scalability department in many cases. However, they feel the advance of containers and need to either evolve or merge into a more flexible architecture to deal with increasing demands from users.
  • Containers are the new force in IT. They can play well with both bare metal servers and virtual machines, while waiting for the rise of container-oriented operating systems, which already exist and aim at world domination in the not-too-distant future.

I see a future where the rise of containers and micro services will force software makers to simplify their products and make them more modular and easier to play with. This trend is important in the current cloud architecture and will become vital when containers take over.

In the meantime, I am not giving up MySQL-Sandbox, which is still indispensable in most scenarios, but I am starting to rethink its architecture to fit smarter future uses.

MySQL deployment summary

With all the above considerations, where do we stand with MySQL? My view is that we're still in middle ground. MySQL is still used heavily on bare metal, either as a stand-alone server or as a part of multi server deployments in the same host.

It is also massively employed in the cloud, where it offers many advantages for deployment flexibility and ease of scalability. Yet it still lacks the agility necessary to be a native cloud component. There are several attempts at creating a better cloud player out of MySQL, some successful, some less so.

When it comes to containers, MySQL still has much work to do to become an efficient building block in this rapidly expanding architecture. The MySQL team provides an official package, which is a first step towards becoming a good player. But in the near future there will be demands for more integration and better modularity than what's available today. Looking at the internals of a MySQL deployment in a container shows that the system is struggling to adapt to the new medium. I see the container revolution as an opportunity for established applications like MySQL to improve their usability and increase their ability to play well with other components of the emerging IT infrastructure.

What's next

In the next (and last) episode we will see MySQL, Docker and orchestrating tools playing together to deliver faster and more powerful operations.
