Thursday, 15 July 2010

python - Rename Pandas MultiIndex based on another column's name


I have a CSV file that is generated in a format I cannot change. The file has a multi-index: the headers span two rows.

This is what my header looks like:

[Image: header as it appears in the file]

This is what it really stands for, and what I want:

[Image: header as it really is]

I would like to process it properly with pandas in Python 2.7.

I think I have to loop over the first level of the index and, if a value is empty, set it to the same value as the one to its left.

I start by loading the dataframe into pandas:

  import pandas as pd

  df = pd.read_csv(myFile, header=[0, 1], sep=',')
  df

[Image: dataframe loaded in pandas]
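Since the screenshot is not available, here is a small sketch of what the loaded columns look like. The CSV text is a made-up stand-in for the real file (the values are hypothetical), and the snippet assumes Python 3 with a reasonably recent pandas rather than the Python 2.7 setup from the question. Blank cells in the top header row come back as placeholder labels that start with "Unnamed".

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the top header row is only
# filled in above the first column of each group.
csv_text = (
    "Foo,,,Bar,,\n"
    "A,B,C,A,B,C\n"
    "1,2,3,4,5,6\n"
    "7,8,9,10,11,12\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=[0, 1], sep=",")

# Collect the first-level label of every column. The blank cells show up
# as placeholder strings beginning with "Unnamed".
top_level = [col[0] for col in df.columns]
print(top_level)
```

The exact placeholder text may differ between pandas versions, which is why testing only the "Unnamed" prefix is the safer check.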

I have tried the following:

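  l = []
  for i, val in enumerate(df.columns.values):
      # pandas labels blank header cells 'Unnamed: ...'
      if val[0][:7] == 'Unnamed':
          # reuse the first-level label of the column to the left
          l.append([l[i-1][0], val[1]])
      else:
          l.append(val)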

The list "l" is, I think, what I want:

  [('Foo', 'A'), ['Foo', 'B'], ['Foo', 'C'], ('Bar', 'A'), ['Bar', 'B'], ['Bar', 'C']]

I have tried both:

  df.columns = l

which produces a dataframe without a multi-index:

[Image: flat dataframe]

  index = pd.MultiIndex.from_tuples(l)
  df.reindex(columns=index)

This one gives me the correct index, but the values disappear.

[Image: missing values]

I don't think the approach I am trying is very Pythonic, nor does it make sense to use a list this way. I have a strong gut feeling that there is a better way. How can I build the multi-index properly? Any ideas?

Instead of using reindex, set your new index directly as the columns:

  df.columns = pd.MultiIndex.from_tuples(l)

This should produce the desired result.
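Put together, the whole fix might look like the sketch below. The CSV text and the name `fixed` are made up for illustration, and the snippet assumes Python 3 with a recent pandas. It forward-fills the first level in a plain loop and then assigns the result to the columns directly.

```python
import io

import pandas as pd

# Hypothetical sample data with a two-row header.
csv_text = "Foo,,,Bar,,\nA,B,C,A,B,C\n1,2,3,4,5,6\n"
df = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

fixed = []
for top, second in df.columns:
    if top.startswith("Unnamed"):
        # Blank header cell: reuse the first-level label to the left.
        top = fixed[-1][0]
    fixed.append((top, second))

# Assigning to .columns relabels the existing columns in place;
# no data is moved or dropped.
df.columns = pd.MultiIndex.from_tuples(fixed)
print(df)
```

Building tuples throughout, rather than a mix of tuples and lists, keeps the input to `from_tuples` uniform.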

reindex does not just change the index values (although that is what it sounds like it should do, and the documentation is not particularly clear on this). Instead, it goes through your new indices, selects the rows or columns that match them, and puts NaN wherever no old index matches a new one. That is what is happening here: when reindex hits ['Foo', 'B'], which is not present in your original dataframe, it fills that column of the new dataframe with NaN.
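The difference is easy to see on a tiny dataframe. The sketch below uses made-up data and a hand-written placeholder label of the kind pandas generates for blank header cells; the names `wanted`, `selected` and `renamed` are only for illustration.

```python
import pandas as pd

# Two columns; the second one carries a placeholder first-level label.
original = pd.MultiIndex.from_tuples(
    [("Foo", "A"), ("Unnamed: 1_level_0", "B")]
)
df = pd.DataFrame([[1, 2], [3, 4]], columns=original)

wanted = pd.MultiIndex.from_tuples([("Foo", "A"), ("Foo", "B")])

# reindex looks each wanted label up in the existing columns:
# ("Foo", "A") exists and keeps its data, ("Foo", "B") does not exist
# and is therefore filled with NaN.
selected = df.reindex(columns=wanted)

# Assigning to .columns only relabels; the data is untouched.
renamed = df.copy()
renamed.columns = wanted

print(selected)
print(renamed)
```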

If your columns always follow a consistent pattern (for example, one top-level column for every three second-level columns), you can also use pd.MultiIndex.from_product to create the column index:

  iterables = [["foo", "bar"], ["a", "b", "c"]]
  index = pd.MultiIndex.from_product(iterables)
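As a rough sketch of how that index could then be applied, the snippet below builds it and assigns it to a dataframe with six columns. The data row is made up, and the approach assumes the file's columns really do arrive in this order, with the first list varying slowest.

```python
import pandas as pd

iterables = [["foo", "bar"], ["a", "b", "c"]]

# Cartesian product of the two levels, in order:
# (foo, a), (foo, b), (foo, c), (bar, a), (bar, b), (bar, c)
index = pd.MultiIndex.from_product(iterables)

# Hypothetical data with one value per column.
df = pd.DataFrame([[1, 2, 3, 4, 5, 6]])

# The index length must match the number of columns, otherwise
# pandas raises a ValueError on assignment.
df.columns = index
print(df)
```

This avoids parsing the header at all, so it only fits files whose layout is known in advance.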

