Since its introduction, Redis has been used to some extent to store and analyze time-series data. Redis was originally implemented as a buffer for logging. As its functionality has grown, it has gained five explicit and three implicit structures or types, which together offer many ways to analyze data. This article introduces the most flexible method of using Redis for time-series analysis.
A note on race conditions and transactions
In Redis, each individual command is atomic. A sequence of several commands, however, is not necessarily atomic, and race conditions can make it behave incorrectly. To address this limitation, we will use two techniques that keep races from corrupting data: transactional pipelines and Lua scripts.
When connecting to Redis from Python with the redis-py client, we create a transactional pipeline by calling the .pipeline() method on a Redis connection, either with no arguments or with the boolean True. (With other clients this is commonly called a "transaction" or a "MULTI/EXEC transaction".) The pipeline collects every command passed to it until you call its .execute() method. When .execute() is called, the client sends MULTI, then all of the collected commands, and finally EXEC. Redis runs this group of commands without interruption by any other command, so they execute atomically.
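As an illustration of the transactional pipeline, here is a minimal sketch that assumes the redis-py client and a connection object passed in as conn; the helper name increment_pair and the key names are hypothetical. Commands are only queued until execute(), which returns one reply per queued command.

```python
def increment_pair(conn, key_a, key_b):
    """Increment two counters atomically (hypothetical helper).

    conn is assumed to be a redis-py client, e.g. redis.Redis().
    """
    # transaction=True wraps the queued commands in MULTI ... EXEC
    pipe = conn.pipeline(transaction=True)
    pipe.incr(key_a)  # queued on the client, not sent yet
    pipe.incr(key_b)  # queued on the client, not sent yet
    # Sends MULTI, both INCR commands, then EXEC; returns the replies in order
    return pipe.execute()
```

On a fresh database, increment_pair(redis.Redis(), 'hits:a', 'hits:b') returns [1, 1], and no other client's commands can run between the two increments.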
Another option for executing a series of commands atomically in Redis is server-side Lua scripting. Simply put, a Lua script behaves much like a stored procedure in a relational database, except that it is written in Lua and uses a dedicated Redis API. As with a transaction, a Lua script is generally not interrupted while it runs [1], although an unhandled error can end it early. Syntactically, we load a Lua script by calling the .register_script() method on a Redis connection object. The object this method returns can be called like a function to run the script in Redis, so there is no need to call other methods on the connection or to combine the EVALSHA and SCRIPT LOAD commands yourself to load and execute the script.
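A small sketch of register_script() in use, assuming redis-py and a caller-supplied connection conn. The script and helper names are hypothetical: the script keeps the largest value seen for a key (for example, the latest timestamp reported by a sensor), a read-then-write that Lua makes atomic without a MULTI/EXEC transaction.

```python
# Runs atomically inside Redis: the GET and the conditional SET below
# cannot be interleaved with commands from other clients.
SET_IF_GREATER_LUA = """
local current = tonumber(redis.call('GET', KEYS[1]))
local candidate = tonumber(ARGV[1])
if current == nil or candidate > current then
    redis.call('SET', KEYS[1], ARGV[1])
    return 1
end
return 0
"""


def make_set_if_greater(conn):
    """Return a callable that runs SET_IF_GREATER_LUA (hypothetical helper).

    conn is assumed to be a redis-py client. register_script() returns a
    Script object; calling it runs EVALSHA, loading the script with
    SCRIPT LOAD first if Redis does not have it cached yet.
    """
    script = conn.register_script(SET_IF_GREATER_LUA)

    def set_if_greater(key, value):
        # KEYS and ARGV inside the script come from keys= and args= here
        return script(keys=[key], args=[value])

    return set_if_greater
```

For example, set_max = make_set_if_greater(conn) followed by set_max('sensor:1:latest', 1700000000) returns 1 if the stored value was raised and 0 otherwise.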
Use cases
When it comes to using Redis as a time-series database, the first question to ask is: "What is the purpose of a time-series database?" Time-series databases suit use cases where your data is structured as a series of events, samples of one or more values, or metrics that change over time. Some examples of such applications include (but are not limited to):
Sell prices and trading volumes of stock trades
Order totals and shipping addresses for an online retailer
Player actions in a video game
Data collected by sensors embedded in IoT devices
We will go into more depth below, but basically a time-series database lets you record timestamped data whenever something happens or whenever you take a measurement. Once you have collected information about events, you can analyze them, either in real time as the data is collected or later, with more complex queries.
Advanced analysis with sorted sets and hashes
One of the most flexible ways to store and analyze time-series data in Redis combines two different Redis structures: the sorted set (Sorted Set) and the hash (Hash).
In Redis, the sorted set combines the characteristics of a hash table and a sorted tree (internally Redis uses a skip list, but you can ignore this detail). Briefly, each entry in a sorted set pairs a string "member" with a double-precision "score". The member plays the role of the key in the hash table, and the score plays the role of the sort value in the tree. With this combination, you can look up a score directly by its member, and you can also access members and scores in various ways according to their score order [2].
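To make the two access paths concrete, the following pure-Python model (an illustration only, not Redis code and not how Redis implements sorted sets) keeps a member-to-score dict for direct lookups and derives score-ordered views from it, breaking score ties by member as footnote 2 describes. Python compares strings by code point, which matches Redis's byte-wise ordering for ASCII members.

```python
class SortedSetModel:
    """Toy model of Redis sorted-set semantics, for illustration only."""

    def __init__(self):
        self._scores = {}  # member -> score: the "hash table" side

    def zadd(self, mapping):
        # Adding an existing member just updates its score
        self._scores.update({m: float(s) for m, s in mapping.items()})

    def zscore(self, member):
        # Direct lookup of a score by member (None if absent)
        return self._scores.get(member)

    def _ordered(self):
        # The "sorted tree" side: order by score, ties broken by member
        return sorted(self._scores.items(), key=lambda item: (item[1], item[0]))

    def zrange(self, start, stop):
        items = self._ordered()
        n = len(items)
        # Inclusive stop; negative indexes count back from the highest score
        start = max(start + n if start < 0 else start, 0)
        stop = stop + n if stop < 0 else stop
        if stop < 0:
            return []
        return [m for m, _ in items[start:stop + 1]]

    def zrangebyscore(self, lo, hi):
        # Inclusive score range, in score order
        return [m for m, s in self._ordered() if lo <= s <= hi]
```

For instance, after zadd({'b': 2, 'a': 2, 'c': 1}), zrange(0, -1) yields ['c', 'a', 'b']: 'c' has the lowest score, and the tie between 'a' and 'b' is broken by member.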
Storing events
Today, storing time-series data in one or more sorted sets combined with hashes, in one form or another, is one of the most common uses of Redis. It is a low-level building block for a wide variety of applications, including social networks like Twitter, news sites like Reddit and Hacker News, and even a nearly complete object-relational mapper built on Redis itself.
In our example, we will receive events generated by various user actions on a website. All events share four attributes, plus a varying number of other attributes that depend on the event type. The known attributes are id, timestamp, type, and user. To store each event, we use a Redis hash whose key is derived from the event's id. There are many ways to generate event ids, but here we generate them with a counter in Redis. If you run 64-bit Redis on a 64-bit platform, you can create up to 2^63-1 events; the main limit is the amount of available memory.
When we are ready to record and insert the data, we store it in the hash and insert a member/score pair into a sorted set, with the event id as the member and the event timestamp as the score. The code to record an event is as follows:
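# Assumes redis-py 3.5+ (hset(..., mapping=...) and zadd(key, mapping))
def record_event(conn, event):
    # Allocate a new, unique event id from a counter in Redis
    event_id = conn.incr('event:id')
    event['id'] = event_id
    event_key = 'event:{id}'.format(id=event_id)
    # Queue both writes in one MULTI/EXEC transaction
    pipe = conn.pipeline(True)
    pipe.hset(event_key, mapping=event)                # the event's attributes
    pipe.zadd('events', {event_id: event['timestamp']})  # member=id, score=timestamp
    pipe.execute()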
In this record_event() function, we receive an event, obtain a new id computed by Redis, assign it to the event, and build the key under which the event will be stored. The key is the string "event" followed by the new id, with the two separated by a colon [3]. We then create a pipeline and queue up the writes for all of the event's data, along with storing the event's id and timestamp in the sorted set. Once the transactional pipeline finishes executing, the event is recorded and stored in Redis.
Analyzing events
From here, we can analyze the time series in a variety of ways. We can scan the newest or oldest event ids with ZRANGE [4] and later fetch the events themselves for analysis. Using ZRANGEBYSCORE (or ZREVRANGEBYSCORE for the reverse direction) with the LIMIT argument, we can fetch the 10, or even 100, events immediately before or after a timestamp. We can also count the events in a specific time period with ZCOUNT, or even implement our own analysis as a Lua script. The following example uses a Lua script to count the events of each type within a given time range.
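import json
COUNT_TYPES_LUA = '''
local counts = {}
local ids = redis.call('ZRANGEBYSCORE', KEYS[1], ARGV[1], ARGV[2])
for _, id in ipairs(ids) do
    local etype = redis.call('HGET', 'event:' .. id, 'type')
    if etype then  -- skip ids whose hash is missing (HGET returns false)
        counts[etype] = (counts[etype] or 0) + 1
    end
end
return cjson.encode(counts)
'''
def count_types(conn, start, end):
    # register_script() returns a callable Script; calling it runs EVALSHA,
    # loading the script with SCRIPT LOAD if Redis has not cached it yet
    count_types_lua = conn.register_script(COUNT_TYPES_LUA)
    counts = count_types_lua(keys=['events'], args=[start, end])
    return json.loads(counts)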
The count_types() function defined here passes its arguments to the wrapped Lua script and decodes the JSON-encoded mapping of event types to counts that the script returns. The Lua script first creates a results table (the counts variable), then uses ZRANGEBYSCORE to read the list of event ids in the time range. With the ids in hand, the script reads the type attribute of each event one at a time, incrementing that type's count in the table, and finally returns the JSON-encoded mapping.
Thoughts on performance and data modeling
As the code shows, this method of counting the different event types in a specific time range works, but it requires reading the type attribute of every event in that range. For time ranges containing hundreds or thousands of events, this kind of analysis is reasonably fast. But what if a time range contains tens of thousands or even millions of events? The answer is simple: Redis will block while it computes the result, unable to serve other clients.
One way to avoid the performance problems caused by long-running scripts when analyzing event streams is to anticipate which queries will need to run. Specifically, if you know you will need the total count of each event type over some period of time, you can maintain an additional sorted set per event type, each holding only the id/timestamp pairs for events of that type. When you need the total for each event type, you can then run a function or a series of ZCOUNT calls [5] and return the results. Let's look at a modified record_event() function that also stores each event in a sorted set for its type.
def record_event_by_type(conn, event):
    event_id = conn.incr('event:id')
    event['id'] = event_id
    event_key = 'event:{id}'.format(id=event_id)
    # Per-type index key, e.g. 'events:visit'
    type_key = 'events:{type}'.format(type=event['type'])
    ref = {event_id: event['timestamp']}
    pipe = conn.pipeline(True)
    pipe.hset(event_key, mapping=event)
    pipe.zadd('events', ref)   # global time index
    pipe.zadd(type_key, ref)   # time index for this event type only
    pipe.execute()
The new record_event_by_type() function is identical to the old record_event() function in many respects, but adds a few operations. In the new function, we compute a type_key: the key of the sorted set that indexes events of this type. When adding the event's id and timestamp to the 'events' sorted set, we also add the same id and timestamp to the type_key sorted set, and then perform the inserts just as the old function did.
Now, to count the "visit" events that occurred between two points in time, we only need to call ZCOUNT with the key computed for that event type, along with the start and end timestamps.
def count_type(conn, event_type, start, end):
    type_key = 'events:{type}'.format(type=event_type)
    # ZCOUNT bounds are inclusive by default
    return conn.zcount(type_key, start, end)
If we know in advance every event type that can occur, we can call count_type() once for each type and build the same table that count_types() produced earlier. If we cannot know all the event types in advance, or new types may appear in the future, we can add each type to a set (Set) structure and later use that set to discover all of the event types. Here is our modified event-recording function:
def record_event_types(conn, event):
    event_id = conn.incr('event:id')
    event['id'] = event_id
    event_key = 'event:{id}'.format(id=event_id)
    type_key = 'events:{type}'.format(type=event['type'])
    ref = {event_id: event['timestamp']}
    pipe = conn.pipeline(True)
    pipe.hset(event_key, mapping=event)
    pipe.zadd('events', ref)
    pipe.zadd(type_key, ref)
    # Remember every type seen so far, so counts can be built later
    pipe.sadd('event:types', event['type'])
    pipe.execute()
When a time range contains many events, the new count_types_fast() function runs faster than the old count_types() function, mainly because ZCOUNT is faster than fetching each event's type from its hash.
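The code for count_types_fast() is not listed above. A minimal sketch of what it could look like, assuming redis-py and the 'event:types' set and 'events:<type>' sorted sets maintained by record_event_types(), issues one ZCOUNT per known type (the same call count_type() makes) and pipelines the calls into a single round trip.

```python
def count_types_fast(conn, start, end):
    """Count events of every known type with start <= timestamp <= end.

    Hypothetical sketch: assumes 'event:types' lists every type and each
    'events:<type>' sorted set maps event ids to timestamps.
    """
    # redis-py returns bytes unless decode_responses=True, so decode here
    types = [t.decode() if isinstance(t, bytes) else t
             for t in conn.smembers('event:types')]
    # Batch one ZCOUNT per type; transaction=False means no MULTI/EXEC
    pipe = conn.pipeline(transaction=False)
    for event_type in types:
        pipe.zcount('events:{type}'.format(type=event_type), start, end)
    # Replies come back in the order the commands were queued
    return dict(zip(types, pipe.execute()))
```

Because the pipeline is not transactional, a type first seen between the SMEMBERS call and the counts is simply left out of that result; the returned dict has the same shape as the one count_types() builds with Lua.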
Redis as a data store
Although Redis's built-in commands and analysis tools, as well as Lua scripts, are flexible and perform well, some kinds of time-series analysis benefit from particular computational methods, libraries, or tools. In these cases it is still very worthwhile to keep the data in Redis, because Redis provides very fast access to it.
For example, sampling a single stock's trading volume once per minute over an entire 10 years yields at most about 1.2 million data points, which fit easily in Redis. But to run any complex function over that data inside a Redis Lua script, you would have to port an existing optimized library, or debug one, so that it provides the same functionality inside Redis. If you instead use Redis as the data store, you can fetch the data for a time range and pass it to existing optimized kernels to compute average prices, price volatility, and so on.
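As a sketch of this division of labor, the code below fetches one time range with ZRANGEBYSCORE and computes summary statistics in Python. The storage layout is a hypothetical assumption, not something the article specifies: each sample is a sorted-set member '<timestamp>:<value>' scored by its timestamp. The standard library's statistics module stands in for the optimized numeric libraries mentioned above, and the standard deviation serves only as a simple stand-in for a volatility measure.

```python
import statistics


def fetch_series(conn, key, start, end):
    """Return (timestamps, values) for samples with start <= ts <= end.

    Hypothetical layout: members are '<timestamp>:<value>' strings scored
    by timestamp, so equal values at different times stay distinct members.
    conn is assumed to be a redis-py client.
    """
    rows = conn.zrangebyscore(key, start, end)
    timestamps, values = [], []
    for row in rows:
        text = row.decode() if isinstance(row, bytes) else row
        ts, value = text.split(':', 1)
        timestamps.append(float(ts))
        values.append(float(value))
    return timestamps, values


def summarize(values):
    """Mean and sample standard deviation, computed outside Redis.

    Needs at least two values (statistics.stdev raises otherwise).
    """
    return {'mean': statistics.mean(values), 'stdev': statistics.stdev(values)}
```

For instance, summarize([1.0, 2.0, 3.0, 4.0]) returns a mean of 2.5 and a sample standard deviation of about 1.291. Swapping in NumPy or pandas changes only summarize(); Redis still serves the range query.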
So why not use a relational database instead? The answer is speed. Redis keeps all of its data in RAM, and its data structures are optimized (the sorted set in our example is one of them). The combination of in-memory storage and optimized data structures is not only three orders of magnitude faster than databases that use SSDs as their storage medium, but also one to two orders of magnitude faster than general-purpose in-memory key-value stores or systems that keep serialized data in memory.
Conclusion and next steps
When doing time-series analysis with Redis, as with any kind of analysis, a sensible approach is to record the attributes and values that different events have in common in a shared location, so that you can search for the events sharing those attributes and values. We did this with a sorted set per event type, and also mentioned using sets. Although this article focuses on sorted sets, Redis has more structures, and there are many other options for analysis work in Redis. Besides sorted sets and hashes, structures commonly used for analysis include (but are not limited to): bitmaps, array-indexed byte strings, HyperLogLogs, lists, sets, and the soon-to-be-released geo-indexing sorted set commands [6].
When using Redis, you will periodically need to rethink how to add data structures that serve more specific data access patterns. Almost always, the form in which you choose to store your data determines not only what you can store but also the kinds of queries you can run. This is important to understand because, unlike the traditional relational databases more people are familiar with, the queries and operations available in Redis are limited by the types used to store the data.
Having seen these examples of time-series analysis, you can read more about finding related data by building indexes in Chapter 7 of "Redis in Action", which is available in the eBooks section of RedisLabs.com. Chapter 8 of "Redis in Action" provides a nearly complete implementation of a Twitter-like social network, including followers, lists, timelines, and a streaming server. It is a good starting point for understanding how to use Redis to store time series and events and to answer timeline queries.
[1] If the lua-time-limit configuration option is set and a script runs longer than the configured limit, a read-only script can also be interrupted (with SCRIPT KILL).
[2] When scores are equal, members are ordered lexicographically by the member strings themselves.
[3] In this article we generally use a colon to separate names, namespaces, and data in Redis keys, but you can use any symbol. Other Redis users may choose a period "." or a semicolon ";" as the separator. Choosing a character that does not normally appear in your keys or data is good practice.
[4] ZRANGE and ZREVRANGE fetch sorted-set elements by their position in sorted order: for ZRANGE, index 0 is the member with the lowest score; for ZREVRANGE, index 0 is the member with the highest score.
[5] ZCOUNT counts the members of a sorted set whose scores fall within a range, but in the Redis versions this article targets it does so by walking the range incrementally from one endpoint. For ranges containing many items, this can be expensive. As an alternative, you can use ZRANGEBYSCORE and ZREVRANGEBYSCORE to find the members at the start and end of the range. Calling ZRANK on those two members gives their indexes in the sorted set; subtracting the two indexes and adding 1 gives the same result at a greatly reduced computational cost, even though this approach requires more calls to Redis.
[6] The Z*LEX commands introduced in Redis 2.8.9 provide limited prefix search over sorted sets; similarly, the not-yet-released Redis 3.2 will provide limited location search and indexing through the GEO* commands.
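Footnote 5's rank-based alternative to ZCOUNT can be sketched as follows, assuming redis-py; the function name count_by_rank is hypothetical. It finds the first and last members in the score range, then subtracts their ZRANK positions and adds 1.

```python
def count_by_rank(conn, key, lo, hi):
    """Count members with lo <= score <= hi using ranks (hypothetical helper).

    conn is assumed to be a redis-py client. The calls below are separate
    round trips and not atomic, so concurrent writes can skew the result.
    """
    # Lowest-scored member inside the range (LIMIT 0 1)
    first = conn.zrangebyscore(key, lo, hi, start=0, num=1)
    # Highest-scored member inside the range (LIMIT 0 1)
    last = conn.zrevrangebyscore(key, hi, lo, start=0, num=1)
    if not first or not last:
        return 0  # nothing in range
    # ZRANK gives 0-based positions in ascending score order
    return conn.zrank(key, last[0]) - conn.zrank(key, first[0]) + 1
```

Running the same four steps inside a Lua script would make them atomic, at the cost of occupying Redis for the duration of the script.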