Saturday, 15 June 2013

Construct SAS dataset based on file containing metadata


I have two text files, neither of which has headers: one contains raw data and the other contains column names and lengths. I want to use these two files to construct a single SAS dataset that contains the data from one file and the column names and lengths from the other.

The file containing the data is a fixed-width text file. That is, each column of data is aligned to a particular column of the text file and padded with spaces to ensure alignment.

datafile.txt:

  John   45 Has two kids
  ...    37 Likes books
  Sally  29 Is an astronaut
  Bill   60 Drinks coffee
The file containing the metadata is tab-delimited with two columns: one with the name of the column in the data file and one with the character length of that column. The names are listed in the order in which they appear in the data file.
My goal is a SAS dataset that looks like this:

  Name   | Age | Comments
  -------+-----+-----------------
  John   | 45  | Has two kids
  ...    | 37  | Likes books
  Sally  | 29  | Is an astronaut
  Bill   | 60  | Drinks coffee

I want every column to be character, with the length specified in the metadata file.

There has to be a better way to do this than my naive approach, which is to construct a length statement and an input statement from the imported metadata, like so:

  /* Import metadata */
  data meta;
      length colname $ 50 collen 8;
      infile 'C:\Metadata.txt' dsd dlm='09'x;
      input colname $ collen;
  run;

  /* Construct LENGTH and INPUT statements */
  data _null_;
      length lenstmt inptstmt $ 1000;
      retain lenstmt inptstmt '' colstart 1;
      set meta end=eof;
      /* append "name $ length" and "@start name $ &" for this column */
      call catx(' ', lenstmt, colname, '$', collen);
      call catx(' ', inptstmt, cats('@', colstart), colname, '$ &');
      colstart + collen;
      /* on the last row, store both statements in macro variables */
      if eof then do;
          call symput('lenstmt', lenstmt);
          call symput('inptstmt', inptstmt);
      end;
  run;

  /* Import data file */
  data datafile;
      length &lenstmt;
      infile 'c:\datafile.txt' dsd dlm='09'x;
      input &inptstmt;
  run;
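To make the generated statements concrete, here is a sketch of what the final step would be equivalent to once the macro variables resolve. The widths 7, 3 and 15 are assumptions chosen for illustration only; the real values come from the metadata file.

```sas
/* Hypothetical expansion, assuming metadata rows Name 7, Age 3, Comments 15 */
data datafile;
    length Name $ 7 Age $ 3 Comments $ 15;
    infile 'c:\datafile.txt' dsd dlm='09'x;
    /* start positions are running sums of the lengths: 1, 1+7=8, 8+3=11 */
    input @1 Name $ & @8 Age $ & @11 Comments $ &;
run;
```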

This gets me what I need, but there has to be a cleaner way. This approach can run into trouble if insufficient space is allocated to the variables that hold the length and input statements, or if the statements are longer than the maximum macro variable length.

Any thoughts?

What you are doing is a fairly standard way of doing this. Yes, you could be a little more careful; I would allocate $32767 for the two statements, for example, just to be cautious.

There are a few ways you can improve it, though, which may remove some of your worries.

First of all, a common solution is to build the statement pieces at the row level (as you do) and then use proc sql to create the macro variable. This has a higher maximum length than the data step method (the data step maximum is $32767 if you do not use multiple variables; SQL's is 64 KiB).

  proc sql;
      select catx(' ', colname, '$', collen) into :lenstmt separated by ' ' from meta;
      * and similarly for inptstmt;
  quit;
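The matching query for the input statement needs each column's start position, which is a running total that PROC SQL does not accumulate easily. One possible sketch is to compute it in a short data step first and then build both macro variables. The dataset name meta_pos and the variable startpos are hypothetical names introduced for this example; meta, colname and collen are as in the question.

```sas
/* Add each column's start position to the metadata */
data meta_pos;
    set meta;
    retain colstart 1;          /* first column starts at position 1 */
    startpos = colstart;        /* start position of this column */
    colstart + collen;          /* advance past this column */
    drop colstart;
run;

/* Build both statements as macro variables */
proc sql noprint;
    select catx(' ', colname, '$', collen)
        into :lenstmt separated by ' '
        from meta_pos
        order by startpos;      /* keep the data file's column order */
    select catx(' ', cats('@', startpos), colname, '$ &')
        into :inptstmt separated by ' '
        from meta_pos
        order by startpos;
quit;

/* Write the results to the log for inspection */
%put lenstmt=&lenstmt;
%put inptstmt=&inptstmt;
```

The final data step from the question can then use &lenstmt and &inptstmt unchanged.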

Second, you can exceed the 64k limit by writing to a file instead of a macro variable. Take your data step and, instead of concatenating and then using call symput, write each line out to a temp file (or two). Then, instead of using the macro variables in the input data step, you can %include those files; yes, you can %include in the middle of a data step.
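A minimal sketch of the temp-file approach follows, assuming the meta dataset from the question. The filerefs lenf and inpf are hypothetical names; one file receives the length statement and the other the input statement.

```sas
filename lenf temp;   /* will hold the LENGTH statement */
filename inpf temp;   /* will hold the INPUT statement  */

data _null_;
    set meta end=eof;
    retain colstart 1;

    /* LENGTH statement: one "name $ length" line per column */
    file lenf;
    if _n_ = 1 then put 'length';
    put '    ' colname '$ ' collen;
    if eof then put ';';

    /* INPUT statement: one "@start name $ &" line per column */
    file inpf;
    if _n_ = 1 then put 'input';
    put '    @' colstart colname '$ &';
    if eof then put ';';

    colstart + collen;
run;

data datafile;
    %include lenf;
    infile 'c:\datafile.txt' dsd dlm='09'x;
    %include inpf;
run;
```

Because the statements never pass through a character variable or a macro variable, neither length limit applies; the generated code is as long as the metadata requires.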

There are other methods, but these two are the most common and should work for most use cases. Other methods include call execute, run_macro, or using the file-opening functions to work with the file directly. In general, they are either more complex or less useful than these two, although they are certainly acceptable solutions and are not uncommon to see in practice.
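As one illustration of the call execute route, the sketch below generates the whole import step from the meta dataset. It is an assumption about how this could be done, not the answer's own code: it emits one length statement and one input statement per column, and relies on the trailing @ to hold the same record between input statements.

```sas
data _null_;
    set meta end=eof;
    retain colstart 1;

    /* open the generated data step once */
    if _n_ = 1 then
        call execute('data datafile; infile ''c:\datafile.txt'' dsd dlm=''09''x;');

    /* per column: declare its length, then read it from its start position */
    call execute(catx(' ', 'length', colname, '$', collen, ';'));
    call execute(catx(' ', 'input', cats('@', colstart), colname, '$ & @;'));

    colstart + collen;

    /* close the generated data step after the last column */
    if eof then call execute('run;');
run;
```

The generated code is queued and runs after this data step finishes, so no macro variable or temp file is needed.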

