When you create tables and data warehouses often need to make many decisions. Some decisions may seem insignificant at the time, but it will eventually lead to you and your customers have suffered pain in the whole process of using the database.
And we have thousands of people and their dealings with the database, after countless hours of reading and writing after inquiry, we can say we've seen almost every situation. Here's what we help create summed painless mode (Schema) of 10 rules.
1. Use only lowercase letters, numbers, and underscores
Do not in the database, schema, table or column names using the dots, dashes or spaces. Because the point number is used to identify the object, typically used only in database.schema.table.column this case.
In the name of the object also contains a dot will bring trouble. Similarly, use spaces in the object name query will force you to join a bunch of unnecessary quotes:
select "user name" from events
select user_name from events
In addition, once on the table or column names use uppercase letters, the query becomes more difficult to write. Because if the words are all lowercase letters, there is no need to remember special table in the end user is the Users or users.
Not only that, when you finally change the database or copy a table into the data warehouse, in addition to a number of databases, you do not remember which database is case sensitive.
2. Use simple and descriptive column names
If users need to define a reference table foreign key packages table, then named package_id is a good choice. We should avoid this like pkg_fk short and ambiguous column names, because other people's hard to know what that means. Descriptive names can make it easier for others to understand mode, and when the team expanded that to maintain the efficiency is very important.
Do not use ambiguous named may have a variety of data interpretation. If you find yourself in item_type or item_value this naming style when you create a column, then probably you should use them more columns with a specific name, such as photo_count, view_count and transaction_price.
Because doing so, the column stores what data can always be learned from the model, and does not need to be derived by the other values in the row.
select sum (item_value) as photo_count
where item_type = 'Photo Count'
select sum (photo_count) from items
Do not use the table name as a prefix column names. Because users generally defined in the table, such as user_birthday, user_created_at or user_name such a column name not achieve any supporting role.
Finally, we need to avoid, such as column, tag, or user of such reserved keywords as column names. Because once the use of reserved keywords, it means that operators have to use an additional reference in the query, and when someone forgot to do so, the database could be very confusing error message. And if you use the keywords in this place is the column name appears, then the database can not understand the query.
3. Use simple and descriptive table name
If the table name is more than one word, then use the underscore split them. Because package_deliveries than packagedeliveries easier to read.
If possible, always use one word instead of two, as deliveries to more easily read.
select * from packagedeliveries
select * from deliveries
Do not use the name as a prefix pattern table name. If you need some tables included a range, then simply put these tables into a pattern can be. And prefix column names, such as store_items, store_transactions or store_coupons this table are usually no additional prefixes.
We recommend the use of the plural form of the name of the table name (such as packages), and the name of the table to join tables in two words that also use the plural form. The singular form of table names are more likely to accidentally conflict with general keywords in your query and its readability is not high.
4. shaping as the primary key
Whether you are using a variety of UUID (universally unique identifier) column as the primary key, or do you think with the added growth since the integer primary key sequence simply does not make sense (eg for joint table), we recommend that you add a belt self-growth standard integer sequence id columns. This type of the primary key will make specific analysis easier, such as a set of data selected from only the first line.
When importing data and work has led to repeated data, the primary key will be the panacea, because we can easily delete a particular row by primary key:
whereidin (select ...) as duplicated_ids
Avoid multi-column primary key. When trying to write efficient queries, multi-column primary key will result in a query is difficult to understand and difficult to modify. We can use an integer primary key, unique constraint or a plurality of columns, and then a few single or index to replace the multi-column primary key.
5. consistent with the foreign key
Naming primary and foreign keys There are many kinds of styles. We recommend the use of you is the most popular style, that is, for any form foo, foo will be the primary key named id, all of the foreign key named foo_id.
Another style is to use a global unified primary key name. In this style, the table's primary key called foo foo_id, and all of the foreign key is also known foo_id. But no matter what style, using an abbreviated words (such as the users table abbreviated uid), will always cause distress or name conflict, it should avoid the use of abbreviations.
And, no matter what style you choose, you must stick to it. Do not use uid in some places, but use user_id or users_fk elsewhere.
joinusers on users.user_id = packages.uid
joinusers on users.id = packages.user_id
In addition to also pay attention to the foreign key does not match the case of an explicit list. A column called owner_id may be a foreign key users of the table, of course, may not. So if necessary, add a foreign key column named user_id or owner_user_id.
6. The date and time is stored as a variety of datetime type
Do not use Unix timestamp or a string to store the date, but to convert them to a variety of date and time types. Although SQL date calculation function is not the best, but to call these functions to handle timestamps than their own to deal simpler. At query time, we need to relate to each type from the timestamp to datetime conversion date functions SQL query calls
selectdate (from_unixtime (created_at))
Do not place the year, month, day respectively stored in different columns. As this would cause the query to each respective time series is more difficult to write, but also an obstacle to the use of most of the SQL beginner date information when this table.
selectdate (created_year || '-'
|| Created_month || '-'
7. Always use UTC
Use the time zone instead of UTC will cause endless problems. Good tools (including our Periscope) has converted from UTC time in your zone data to all the features you need. In the Periscope, simply add: pst can convert UTC to Pacific Time.
select [created_at: pst], email_address
Time zone database should be set to UTC, and all datetime column should be the type of area after the time of peeling (eg, no time zone timestamp).
If your database is not UTC time zone, or a mix of your database and non-UTC UTC date and time, then the query time series analysis will become more difficult.
8. A single source of truth
There should be only a single data source of truth (Source of Truth). View and summary (Rollup) itself should be marked. In doing so, consumers will know the difference between the data and the data they use native truth.
On the other hand, such as a user_id, user_id_old or left column user_id_v2 are reserved, it will only bring endless trouble. So make sure routine maintenance work will be carried out to remove obsolete tables and fields are no longer in use.
9. priority not JSON-column table
Do not use too many columns of the table. If a table has more than dozens of columns and some of them are based on sequence name (for example, answer1, answer2, answer3), then immediately you will feel bad after.
The correct approach is thus transformed into a table that does not contain duplicate column model, because this model will be very easy to query. For example, to calculate a questionnaire completed subject in a number of query:
(Casewhen answer1 isnotnull
(Casewhen answer2 isnotnull
(Casewhen answer3 isnotnull
) As num_answers
whereid = 123
select count (response)
where survey_id = 123
For query and analysis, the column is withdrawn from the JSON data operations to substantially reduce the performance of the query. Although there are many good reasons to support our use JSON listed in the product, but it is not the case for query and analysis. The column was bold JSON dismantling a more simple data types, query and analysis can make faster and easier.
10. Do not over-standardization
Date, zip code and country do not use query tables with foreign keys stored separately. Excessive standardization will lead to the end of each query must bring some of the same table join operation. This will not only create a lot of duplicate SQL, database and for this we need to do a lot of extra work.
join dates on users.created_date_id = dates.id
Tables are first-class objects in the database, with many of their own data. Any remaining data can be used as another important object additional columns.
Better models waiting for you! As you would expect, and the new members of your team, it will help to follow these rules under a table or query data warehouse easier. If you do not believe there are more rules or suggestions, please contact us. We are looking forward to hear from you!