Friday, 21 April 2017

Constructing Iterative Do Loops

To build an iterative DO loop, you need to begin with a DO command, then incorporate some action command, and then end with an END command. Here's what a simple iterative DO loop should look like:

DO index-variable = start TO stop BY increment;
       
action statements;

END;

Here are brief explanation about each commands
  • DO, index-variable, start, TO, stop, and END are required in every iterative DO loop.
  • Index-variable, which stores the value of the present iteration of the DO loop, can be any valid SAS variable name. It is common, however, to use a single letter, with i and j being the most used.
  • Start is the value of the index variable which tells SAS to start the loop.
  • Stop is the value of the index variable which tells SAS to stop the loop.
  • Increment is by how much you want SAS to alter the index variable after each iteration. The most commonly used increment is 1. In fact, if you don't specify a BY clause, SAS uses the default increment of 1.

For example let us see this DO statement:

         do name= 1 to 5;

The above statement SAS to generate an index variable called name, start at 1, increment by 1, and end at 5, so that the values of name from iteration to iteration are 1, 2, 3, 4, and 5. 

          do var= 2 to 12 by 2;

The above statement tells SAS to generate an index variable called var, start at 2, increment by 2, and end at 12, so that the values of var from iteration to iteration are 2, 4, 6, 8, 10, and 12.

The following program uses a DO loop to tell SAS to determine what four times three (4 × 3) equals:

Constructing iterative loops in SAS Base


The key to understanding the DATA step here is to recall that multiplication is just repeated addition. That is, four times three (4 × 3) is the same as adding three for four times. That's all that the iterative DO loop in the DATA step is telling SAS to do. After having initialized answer to 0, add 3 to answer, then add 3 to answer again, and add 3 to answer again, and add 3 to answer again. After SAS has added 3 to the answer variable four times, SAS exits the DO loop, and since that's the end of the DATA step, SAS moves onto the next process and gives the result.
Read More »

Wednesday, 12 April 2017

SAS : Arrays

SAS Arrays : Introduction

It provides a simple, appropriate way to process a group of variables in a SAS DATA step.

Syntax
Array array-name {number-of-elements} list-of-variables;
Note: You can use [ ] or { } or ( ) for defining number of elements in the ARRAY statement.

Examples

1.  ARRAY ABC[5] a b c d e; 

In the example above, ABC is an array-name, 5 implies the number of variables in array and "a b c d e" are the fields that make up the array.

2. ARRAY ABC[*] a b c d e; 

In the example above, SAS would automatically calculate the number of variables in array.

3. ARRAY ABC[*] X1-X10;

Where the X1 variable contains the X1 value, X2 contains the X2 value, etc.

4. ARRAY ABC[*]  $ X1-X10;

If the variables are of character type then use $ sign before specifying list of variables.

Sample Data

SAS Array : Example
data temp;
input x1 x2 x3 x4$ x5$;
cards;
1 2 3 AA BB
2 3 4 AB CC
3 4 5 AC DD
4 5 6 AD EE
5 6 7 AE FF
6 7 8 AF GG
;
run;
Example I : Numeric variables having value greater than 3 need to be replaced with missing value

data test;
set temp;
array nvars {3} x1-x3;
do i = 1 to 3;
if nvars{i} > 3 then nvars{i} =.;
end;
run;  
Output : Array Statement
Why i is 4 in the output data set?

The first time the loop processes, the value of count is 1; the second time, 2; and the third time, 3. At the beginning of the fourth iteration, the value of count is 4, which is found to be greater than the stop value of 3 so the loop stops. However, the value of i is now 4 and not 3, the last value before it would be greater than 3 as the stop value.
Note : We can drop variable "i" with drop statement or drop data set option.
Improvised version of the above code 
data test;
set temp;
array nvars (*) _numeric_;
do i = 1 to dim(nvars);
if nvars{i} > 3 then nvars{i} =.;
end;
drop i;
run;
Notes - 
  1. The "_numeric_" is used to specify all the numeric variables.
  2. The DIM function returns the number of elements (variables).

Example II. : Extract first letter of all the character variables
data test;
set temp;
array cvars (*) _character_;
do i = 1 to dim(cvars);
cvars{i} = substr(cvars{i},1,1);
end;
drop i;
run;
 Note - The "_character_" is used to specify all the character variables.

Example III. : Extract first letter and fill in the new character variables
data test;
set temp;
array cvars (*) _character_;
array dvars (*) $ x6 X7;
do i = 1 to dim(cvars);
dvars{i} = substr(cvars{i},1,1) ;
end;
drop i;
run;
Example IV : Assign Initial Values in a SAS Array
data abcd;
set temp;
array nvars (*) _numeric_;
array pvars (*) px1 px2 px3;
array pctinc {3} _temporary_ (1.1 , 1.2 ,1.3); do i = 1 to dim(nvars);
pvars{i} = nvars{i} * pctinc{i};
end;
drop i;
run;  
Notes -
  1. In the above example, we are multiplying variables' values with different numbers.
  2. When the key word _TEMPORARY_ is used in a ARRAY statement, data elements are created but are not stored in the data file.

Example V : Calculate Percentage Growth
data abcd;
set temp;
array nvars(*) _numeric_;
array diff{2} _temporary_;
array percent{2};
do i = 1 to 2;
diff{i} = nvars{i +1} - nvars{i};
percent{i} = diff{i} / nvars{i} ;
end;
drop i;
run;

Using the OF Operator in a SAS Array

Examples :

array gnp (*) x y z;

The following two statements are equivalent.
sumgnp = sum(of gnp(*)) ;
sumgnp = sum(x,y,z); 
*Calculate the mean;
  mean_score = mean(of gnp(*));

* Calculate the minimum;
   min_score = min(of gnp(*));

DO OVER LOOP

The DO OVER loop is one of the most useful DO loops. It can be used with an array when indexing of the array is not needed.
data test;
set temp;
array nvars _numeric_;
do over nvars;
if nvars > 3 then nvars = .;
end;
run;
                     
     SOURCE  
Read More »

Friday, 7 April 2017

SAS : Cartesian Product

1) Cartesian Product

A simplest join in proc sql where the final out is each row of first dataset combined with each row of the second. Its syntax is:

Proc sql;
      create table company as
            select a.empid, b.name, b.salary
            from employee as a , salary as b
            ;
Quit;

Here the final output will be product of all rows from the employee dataset with all rows in the salary dataset.
Practically this looks wrong as everyone’s salary will be everyone others’ salary, perfect Socialism J

But surprisingly this Cartesian product is the basis of all joins. Whenever you specify a join a Cartesian product is done and the output rows are restricted by certain conditions. So knowing this is necessary.

2) Cartesian Product through a datastep

This technique is not very intuitive but is asked in a lot of interviews so I am including it here:

data every_combination;
  /* Set one of your data sets, usually the larger data set */
  set one;
  do i=1 to n;
    /* For every observation in the first data set,    */
    /* read in each observation in the second data set */
    set two point=i nobs=n;
    output;
  end;
 run;

This technique creates a product of all rows.
So above is the hint for answering the question in merging SAS datasets with different id variables names. Let’s see whether you can find the answer.

for more click here or click here
Read More »

Tuesday, 4 April 2017

SAS :Convert all Numeric Variables to Character using proc SQL

The task is to create a successor to a SAS data set, replacing each numeric variable in the original with a character variable. Here is a method which preserves variable order and variable attributes. First create a data set for the demonstration:


proc sql;
create table mixedtype as
 select *
  from sashelp.class;


quit;

Result:


                  Age as

                    of 1
 First             March
 Name      Sex      2008    Height  Weight
 -----------------------------------------
 Alfred    M          14        69  112.50
 Alice     F          13      56.5   84.00
 Barbara   F          13      65.3   98.00
 Carol     F          14      62.8  102.50
 Henry     M          14      63.5  102.50
 James     M          12      57.3   83.00
 Jane      F          12      59.8   84.50
 Janet     F          15      62.5  112.50
 Jeffrey   M          13      62.5   84.00
 John      M          12        59   99.50
 Joyce     F          11      51.3   50.50
 Judy      F          14      64.3   90.00
 Louise    F          12      56.3   77.00
 Mary      F          15      66.5  112.00
 Philip    M          16        72  150.00
 Robert    M          12      64.8  128.00
 Ronald    M          15        67  133.00
 Thomas    M          11      57.5   85.00
 William   M          15      66.5  112.00

Next, this somewhat tricky query which uses DICTIONARY.COLUMNS as a source of metadata and creates the bulk of another query:



proc sql;
select case type
        when 'num'  then catx (   ' '
                                , 'left( put ('
                                , name
                                , ','
                                , case
                                   when format is null then 'best12.'
                                   else format
                                   end
                                , ') ) as'
                                , name
                                , 'label ='
                                , quote( strip(label) )                              
                              )
        when 'char' then name
        else             catx (   ' '
                                , quote('Error on type')
                                , 'as'
                                , name
                              )
        end
 into : selections separated by ' , '
 from dictionary.columns
 where libname='WORK' and memname='MIXEDTYPE';



To adapt for other tables change only the literals in the WHERE clause.

Result:


 Name

 Sex
 left( put ( Age , best12. ) ) as Age label = "Age as of 1 March 2008"
 left( put ( Height , best12. ) ) as Height label = ""
 left( put ( Weight , 6.2 ) ) as Weight label = ""

The INTO clause takes these expressions and concatenates them into a comma-separated string which it stores in the macro variable SELECTIONS, which in turn is used in the statement which actually does the work:



proc sql;
create table allchar as select &selections from mixedtype;


quit;

Result:


 First          Age as of 1

 Name      Sex  March 2008    Height        Weight
 -------------------------------------------------
 Alfred    M    14            69            112.50
 Alice     F    13            56.5          84.00
 Barbara   F    13            65.3          98.00
 Carol     F    14            62.8          102.50
 Henry     M    14            63.5          102.50
 James     M    12            57.3          83.00
 Jane      F    12            59.8          84.50
 Janet     F    15            62.5          112.50
 Jeffrey   M    13            62.5          84.00
 John      M    12            59            99.50
 Joyce     F    11            51.3          50.50
 Judy      F    14            64.3          90.00
 Louise    F    12            56.3          77.00
 Mary      F    15            66.5          112.00
 Philip    M    16            72            150.00
 Robert    M    12            64.8          128.00
 Ronald    M    15            67            133.00
 Thomas    M    11            57.5          85.00
 William   M    15            66.5          112.00

Another query against DICTIONARY.COLUMNS compares the attributes of the original and replacement variables:



 select   name    format=$10.
        , memname format=$10.
        , varnum  format=10.
        , type    format=$8.
        , length  format=6.
        , format  format=$8.
        , label   format=$30.
  from dictionary.columns
  where libname='WORK' and memname in ('ALLCHAR','MIXEDTYPE')
  order by varnum, memname desc;


 quit;

Result:



                                          Column


 Column      Member       Number in  C     olumn        Column     Column

 Name        Name             Table  Type      Length  Format    Column Label
 --------------------------------------------------------------------------------------

 Name        MIXEDTYPE             1  char            8            First Name
 Name        ALLCHAR                 1  char            8            First Name
 Sex          MIXEDTYPE             2  char            1  $1.
 Sex          ALLCHAR                 2  char            1  $1.
 Age          MIXEDTYPE             3  num            8            Age as of 1 March 2008
 Age          ALLCHAR                 3  char           12            Age as of 1 March 2008
 Height       MIXEDTYPE            4  num            8
 Height      ALLCHAR                 4  char           12
 Weight      MIXEDTYPE            5  num            8  6.2
 Weight      ALLCHAR                5  char             6

This demonstrates that variable order and variable attributes are preserved.
Read More »