SAS Index – Make Subsetting Quick

Author: Dinesh Motkar – Clinical SAS Programmer at Genpro

As part of SAS programming, we often come across situations where we need to remove unwanted data or to locate specific rows from data. Performing this processing using where clause or statement along with index can be more efficient and faster. An Index is a supplementary file that stores the location of values for specified key columns from the data. The index, each unique value of the column is paired with location id for a row that contains a value like an index of a book, which helps us to find specific topics from the book. The same way index gives direct access to specific rows in the data. Index can be created with SQL and DATA STEP.

When we submit a where statement/clause it does not include an index, Data step/ Proc SQL accesses rows in sequential order in which they are sorted. If you are working on student’s data and you want information about a student named Rogers. To identify this row Data step/ Proc SQL will begin from the first row and reads through all the data rows. Reading rows sequentially can be time-consuming. Sometimes using an index on data/ table allows access a subset of rows more efficiently. An index saves unique values for specified key column/variable in ascending order and includes the physical location of the specified values paired with the identifier, which enables you to access a row quickly.

Without Index –   

                                                                 

With Index –

         

 

Syntax –

SQLData Step
proc sql;

 

     create <unique> index index-name

on work.dm (column-1, column-n);

quit;

 

data sas-dataset-name (INDEX=index-specification-1/<unique>, index-specification-n/<unique>);

 

     set work.dm;

run;

 

Indexes can be created by two types

  1. Simple –

A Simple index is based on single column/variable which you specified. It can be either numeric, character. If you are creating simple index, then name of index should be same as indexed column/variable.

  1. Composite –

A Composite index is based on two or more column/variable which you specified. It can be either numeric, character, or combination of both. The name of index must be different than the indexed variables.

Example –

 

SQLData Step
proc sql;

 

create index name on work.dm (name);

quit;

 

data sas-dataset-name (INDEX=(name));

 

set work.dm;

run;

proc sql;

 

create index compos on work.dm (Firstname, Lastname);

quit;

 

data sas-dataset-name (INDEX= (compos= (Firstname, Lastname)));

 

set work.dm;

run;

An index can reduce the required time to locate the specific row(s), especially when working on large data, you can access less than 15% of rows more efficiently. If the size of subset increases, then performance of index will decrease. Avoid creating index on small data file or table because sequential access is quicker than index.

Guidelines for index –

If you are using index and you want effective results, following guidelines should be followed:

  • If you are creating index(s), try to keep minimum numbers
  • Avoid creating indexes on small data file/table
  • Do not create index on variables having very less unique values or binary values
  • Try to avoid creating index for same index with different names.

Conclusion –

Using index can provide improvement in performance in different circumstances, also if you use it properly it will effective in daily programming.

Wish to receive more SAS techniques? Feel free to write to us at info@genproresearch.com

Leave a Reply

Your email address will not be published. Required fields are marked *