Monday, November 9, 2009

Capacity Planning

Capacity planning involves two main subject areas, current storage space usage and potential growth. Both current storage and potential growth of a database can be assessed using the same basic tools.
What are those basic tools?
Simple tools for capacity planning can best be explained by starting with the following question. How do we accurately assess the current physical size of an Oracle database? There are various methods. Some methods are highly accurate but accuracy will hurt performance. Other methods are extremely fast but under some circumstances can be so inaccurate so as to be rendered useless. So how do we assess the current physical size of an Oracle database? The various methods are listed below.
Datafile sizes. Get datafile sizes of all data containing datafiles from the operating system. This method could be the most inaccurate of all.
Datafile content sizes. This method requires a join on the DBA_DATA_FILES and DBA_FREE_SPACE performance views. This method accumulates values based on what Oracle calls Extents. Extents are the physical chunks that an Oracle datafile is increased by when more space is required. This method is totally inappropriate for an Oracle database with poor physical storage organization, particularly the storage parameter PCTINCREASE. Most commercial Oracle databases are not properly structured in this respect. This method is more accurate than retrieval of datafile sizes from the operating system but can result in very unreliable results.
DBMS_SPACE. This Oracle provided package is utilized to sum up space used and space free at the block level, by database object name. This method adds space used at the block level. This method is more accurate than both of the datafile methods already listed above. However, if many blocks have are partially filled or block usage parameters are left at Oracle installation default values, inaccuracy could be as much as 50% incorrect, smaller or larger. For a small database this is not an issue but a small database does not really need capacity planning.
Statistics. Internal Oracle optimization statistics can be generated with either a minimal or fairly heavy performance impact; the heavier the performance impact the better the quality of the statistics. Also generation of sizing statistics for indexes requires an extra burden on performance. Use of statistics is probably the all-round most effective method for capacity planning. However, some older versions of Oracle databases and applications will use rule-based rather than cost-based optimization. The simple existence of statistics can cause performance problems for rule-based tuned applications. However, the OPTIMIZER_MODE parameter can resolve these issues on a database level. If multiple applications use the same database and these applications used a mix of rule-based and cost-based applications then rule-based applications could have performance problems.
Exact column data lengths. All column lengths in all rows in all tables in a database can be counted using length functions for string fields plus fixed known lengths for all other basic data types. Object data types could be an issue using this method. This method would be the most accurate for a purely relational database (no object data types at all). This method will also affect performance in the extreme.
Oracle Enterprise Manager Capacity Planner Package. The Capacity Planner package in Oracle Enterprise Manager does a lot of the work for you. It also provides lots of very nice graphical representation, automated warnings and bells and whistles. Be warned though! This package as with Oracle Enterprise Manager in general has had many problems in the past. However, the word on the street is that Oracle Enterprise Manager as released with Oracle9i (9.2) is now in excellent condition. Oracle Enterprise Manager will not be covered in this paper.

Now let’s go through each of the above listed methods for capacity assessment and planning in detail.
Datafile Sizes
On Solaris this would require the use of the df –k command and on NT/2000 either the dir command and a Perl script or just a Perl script using a Perl contained package allowing disk reads from within Perl. The script shown in Figure 1 could be used to validate the existence of an acceptable level of disk space available on a Solaris box. This script will indicate when disk space is reaching critical levels. Some simple adjustments to this script could be made to check current disk space results in the file diskspace.log, with previous executions of the script stored in older log files; thus estimating growth rate over a period of time.

Figure 1: A partial script for extracting available disk space
Datafile Content Sizes The query shown in Figure 2 is a partial script showing a join between the Oracle performance views DBA_DATA_FILES and DBA_FREE_SPACE. Note that this script will not amalgamate datafiles into tablespaces but will show space used for each datafile (temporary datafiles excluded). Effectively this query will assess database size in terms of datafile extents. Whenever an autoextensible datafile runs out of space a new extent is automatically allocated to that datafile. For a non-autoextensible datafile new extents have to be added manually by resizing the datafile using the ALTER DATABASE command. The DBA_DATA_FILES column BYTES shows the total size of a datafile in bytes. The DBA_FREE_SPACE column BYTES shows the BYTE size of all free extents in a datafile within a tablespace.
The consistency of extent sizes is largely dependant on settings for the storage parameter PCTINCREASE, for tablespaces and database objects such as tables and indexes. The important point to note about the query in Figure 2 is that the result of byte values in the DBA_FREE_SPACE view can be very inaccurate if PCTINCREASE is set anywhere in the database at greater than 0%. Why? A static database, which has never grown in size, would not be affected by PCTINCREASE if PCTINCREASE has never been applied by the creation of a new extent. This is probably very unlikely. If many new extents are added to a datafile it is quite possible that a new extent added could be much larger than expected. The other issue with setting PCTINCREASE greater than 0% is that empty extents, as a result of deletions, will not be reused since new extents, created larger than old extents, will not be able to reuse old extents, which are smaller. Coalescence can help alleviate this problem but coalescence only manages to join extents, which are physically next to each other.
i In Oracle9i the PCTINCREASE parameter is largely irrelevant. The default for the CREATE TABLESPACE command is to create a locally managed tablespace. The PCT_INCREASE parameter is not set for locally managed tablespaces. The Oracle configuration parameter COMPATIBLE must be set to at least 9.0.0 for this locally managed tablespace default to take effect.
Since many existing Oracle database are pre-Oracle9i this script is still relevant.


Figure 2: A partial script for datafile extent sizes

DBMS_SPACE
DBMS_SPACE is an Oracle provided package capable of summing all blocks for each database object, namely tables, indexes and clusters. Something like the stored PL/SQL code shown in Figure 3 could be used to execute capacity planning database space usage based on used and unused blocks.

The problem with this method is that the settings of block storage parameters such as PCTUSED could cause an unrealistic picture of the actual size of the data. PCTUSED is defaulted to 40% for all tables and indexes. If a block has rows has deleted from it then the block will not be used until the block gets to below 40% filled. If a database has a lot of delete activity, either in many or a few large tables, this method could give very misleading results.



Figure 3: An PL/SQL script calling DBMS_SPACE.UNUSED_SPACE (untested)

Statistics
Before discussing how we can use statistics to capacity plan, let’s go over how we can gather statistics. There are two methods of gathering Oracle database statistics:
The ANALYZE command. Can be used to collect non-optimizer statistics and will be deprecated in a future version of Oracle.

The DBMS_STATS Oracle provided package. Collects optimizer statistics and can be executed in parallel on a multiple CPU system for better performance.
The ANALYZE Command
The command shown below will create non-optimizer statistics for all rows in a specified table.

ANALYZE table name COMPUTE STATISTICS;

A more efficient but less accurate form of the same command estimates the values for statistics by sampling 1064 rows from the specified table. Note the optional SAMPLE clause allowing specification of percentage or number of rows.

ANALYZE table_name ESTIMATE STATISTICS [ SAMPLE { 1-99% rows ];

Estimating statistics is much better for performance but potentially much less accurate than computing all the statistics. Accuracy of estimating statistics depends largely on the size of the database object.

i The ANALYZE command can be used to generate statistics for tables, indexes and clusters.
The DBMS_STATS Package
Statistics can be generated and have all sorts of other things done with them, in various ways, using the DBMS_STATS package. For the purposes of capacity planning the most important aspect of the DBMS_STATS package is that of gathering statistics. Statistics can be gathered for indexes and tables and even for a whole schema or an entire database. The commands below can be used to gather statistics for a single table and a single index.
EXEC DBMS_STATS.GATHER_TABLE_STATS(‘owner’, ‘table name’);
EXEC DBMS_STATS.GATHER_INDEX_STATS(‘owner’, ‘index name’);
i DBMS_STATS is faster at generating statistics then the ANALYZE command.
Using Statistics for Capacity Planning
Calculating the number of blocks or byte size for a table using statistics is simple. After statistics have been generated query the USER_TABLES Oracle performance view with the query as shown below.

SELECT TABLE_NAME, NUM_ROWS,BLOCKS,BLOCKS*db_block_size AS BYTES
FROM USER_TABLES;

Find the db_block_size (database block size), when logged in as SYS or SYSTEM, by executing the query shown below.

SELECT VALUE FROM V$PARAMETER WHERE NAME=’db_block_size’;

You could then add up the size of all tables in a specified schema, assuming you have generated statisitics for all tables in that schema by executing the following query.

SELECT SUM(BLOCKS*db_block_size) AS BYTES
,SUM(BLOCKS*db_block_size)/1024/1024 AS MB
FROM USER_TABLES;

So now we know how to utilize statistics to find the size of all your data. Other objects that could be analyzed and summed up would be objects such as indexes and clusters. Since clusters are rarely used we will ignore them. Indexes on the other hand are generally more numerous in number than tables. It is often found that the total byte size of all indexes in a schema can be larger than that of all tables. We could estimate the size of indexes based on an assumed table to index ratio but that would be inaccurate.

The INDEX_STATS Oracle performance view is the only view that can be used to calculate an exact size for an index. Generation of data into the INDEX_STATS view requires used of the ANALYZE INDEX index name VALIDATE STRUCTURE; command. This command can cause problems and it will contain statistics for only a single index at a time.

A simple method of assessing index size is to use index statistics generated by the ANALYZE INDEX command or the DBMS_STATS.GATHER_INDEX_STATS procedure; the USER_INDEXES view will contain the resulting statistics. The query shown below will retrieve a good estimate of size for an index.

SELECT INDEX_NAME,LEAF_BLOCKS*db_block_size/BLEVEL AS BYTES
FROM USER_INDEXES;

As with the USER_TABLES view, obtain the total size of all indexes in a schema by executing a query like this one.

SELECT SUM(LEAF_BLOCKS*db_block_size/BLEVEL) AS BYTES
,SUM(LEAF_BLOCKS*db_block_size/BLEVEL)/1024/1024 AS MB
FROM USER_INDEXES;

Dividing the number of leaf blocks in an index by its branch level value is a bit of a guess as far as assessing size. For very large indexes it would be sensible to validate this by executing the ANALYZE INDEX index name VALIDATE STRUCTURE; command for that particular index and then executing the query shown below to get a better perspective on the actual size of that index.

SELECT NAME,BR_BLKS+LF_BLKS,USED_SPACE AS BYTES FROM INDEX_STATS;

Four queries are shown below in succession. Each query progressively produces a summary of current database storage capacity for objects within a schema. All of the four queries list byte, megabyte and gigabyte sizes.

The first query lists sizes for all tables in a schema.

SELECT TABLE_NAME
,BLOCKS*8192 AS BYTES
,ROUND(BLOCKS*8192/1024/1024) AS MB
,ROUND(BLOCKS*8192/1024/1024/1024,1) AS GB
FROM USER_TABLES;

The second query shows sizes for all indexes in a schema for each table.

SELECT TABLE_NAME
,SUM(LEAF_BLOCKS*8192/DECODE(BLEVEL,0,1,BLEVEL)) AS BYTES
,ROUND(SUM(LEAF_BLOCKS*8192/DECODE(BLEVEL,0,1,BLEVEL))/1024/1024) AS MB
,ROUND(SUM(LEAF_BLOCKS*8192/DECODE(BLEVEL,0,1,BLEVEL))
/1024/1024/1024,1) AS GB
FROM USER_INDEXES GROUP BY TABLE_NAME;

The third query shows sizes for all tables in a schema plus sizes of indexes created for each of those tables.

SELECT t.TABLE_NAME
,(t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL)))
*8192 AS BYTES
,ROUND((t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL)))
*8192/1024/1024) AS MB
,ROUND((t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL)))
*8192/1024/1024/1024,1) AS GB
FROM USER_TABLES t, USER_INDEXES i
WHERE t.TABLE_NAME = i.TABLE_NAME
GROUP BY t.TABLE_NAME,t.BLOCKS;

The last query of the four is the same as the query used in the second half of the script sample shown in Figure 4. This query sums total physical size of all tables and indexes in a schema.

SELECT SUM((t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL))))
*8192 AS BYTES
,ROUND(SUM((t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL)))
*8192/1024/1024)) AS MB
,ROUND(SUM((t.BLOCKS+SUM(i.LEAF_BLOCKS/DECODE(i.BLEVEL,0,1,i.BLEVEL)))
*8192/1024/1024/1024),1) AS GB
FROM USER_TABLES t, USER_INDEXES i
WHERE t.TABLE_NAME = i.TABLE_NAME
GROUP BY t.TABLE_NAME,t.BLOCKS;
The script shown in Figure 4 could be used, in part or in whole, to generate statistics and calculate the size of all tables and indexes in a schema.


Figure 4: Using statistics for capacity planning

Exact Column Data Lengths
This method of calculating the physical size of a database involves measuring the lengths of all columns, of all rows in all tables and indexes in the database. This method should be the most accurate but a real killer for performance. How accurate do you want to get? This method does border on the ridiculous. However, when doing a migration from a non-Oracle database such as PostGres, and your database is very small, how else would you be able to predict expected capacities? The simple fact is the smaller your database is, and the larger you expect it to get in the future, the more accurate your capacity planning has to be. This type of scenario is common in startup companies where an initial database is miniscule. These types of environments usually expect unimaginable growth. Sometimes these companies are correct. Admittedly not very often but it does happen. In a situation like this a method such as this would apply.

In describing this approach to capacity planning it is best to keep things simple. Thus we will stick to simple datatypes; thus no binary objects or other such nasty, unpredictable things. If non-simple datatypes have to be included then they have to be included, and factored into the calculations.

The simple data types in an Oracle database and their respective lengths are as follows. Note that some datatypes make things simple for use because their actual physical lengths are fixed. Also some datatypes are automatically converted to more general datatypes when applied to columns in tables. Not very efficient use of space but that is something invented by relational database vendors such as Oracle many years ago.

This method may seem complicated but is actually very simple. The only variable length simple datatypes in Oracle are the VARCHAR2 datatypes. Thus VARCHAR2 datatypes are the only columns requiring application of the LENGTH function to the column to find the length of the data in the column. All other datatypes will have fixed lengths.

If you want you can execute the scripts following in a tool such as SQL*Plus Worksheet and you will see what I mean. Create the table shown below and insert a single row as shown. This table is deliberately created with a very large mix of simple datatypes.

CREATE TABLE ATABLE(
rnum INTEGER,
vc1 VARCHAR2(4000) DEFAULT '0123456789',
vc2 VARCHAR2(4000) DEFAULT '012',
c1 CHAR(2) DEFAULT 'ab',
c2 CHAR(10) DEFAULT 'abc',
n1 NUMBER DEFAULT 100,
n2 NUMBER(10) DEFAULT 101,
n3 NUMBER(12,4) DEFAULT 103.1234,
f1 FLOAT DEFAULT 1.23334,
si1 SMALLINT DEFAULT 0,
si2 SMALLINT DEFAULT 100,
i1 INTEGER DEFAULT 0,
i2 INTEGER DEFAULT 222,
d1 DATE DEFAULT SYSDATE,
t1 TIMESTAMP DEFAULT SYSDATE
);
INSERT INTO ATABLE(rnum) VALUES(1);
COMMIT;

Now let’s examine the column length specifications of the table we have just created by selecting the appropriate columns from the USER_TAB_COLUMNS Oracle performance view. Execute the query shown in Figure 5, I have used SQL*Plus Worksheet; it looks nice in here.

In Figure 5 all the data lengths are shown for each type of datatype contained in the table created by the script shown above. Once again, the only variable datatype of the datatypes in the table is the VARCHAR2 datatype. The VARCHAR2 datatype is therefore the only datatype required to have the length of its value measured using the LENGTH function.

The resulting calculation simply needs to multiply the number of rows in each table with the sum of the lengths of its datatype defined column lengths as shown in Figure 5. Adjustment is required for VARCHAR2 datatypes by applying the length functions to the column values. This is why this method can seriously hurt performance. Index sizes can be assessed simply by multiplying table rows again by the sum of the lengths of the columns for all indexes created on each table. Indexes can be erroneous due to null values not being stored in indexes in some cases. However, most modern databases use object front-end applications. These types of applications tend to avoid use of composite keys for anything but many-to-many join resolution entities; these entities usually contain integers in their key values which are never null.

Figure 5: Querying the USER_TAB_COLUMNS view

No comments:

Post a Comment