Thursday, 15 September 2011

apache spark - SparkSQL and UDT


I am trying to use Spark SQL (v1.3.0) to access a PostgreSQL database. I have the following table in this database:

  create table test (id bigint, values double precision[]);

To access the table, I use

  val sparkConf = new SparkConf().setAppName("TestRead").setMaster("local[2]")
  val sc = new SparkContext(sparkConf)
  val sqlContext = new SQLContext(sc)
  val jdbcDF = sqlContext.load("jdbc", Map(
    "url" -> "jdbc:postgresql://...",
    "dbtable" -> "schema.test",
    "user" -> "...",
    "password" -> "..."))

However, every time I try to access the table containing this array, for example with sqlContext.sql("select * from schema.test"), I get a java.sql.SQLException: Unsupported type 2003.
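As far as I can tell, 2003 is the value of the JDBC constant java.sql.Types.ARRAY, so the exception means the Spark 1.3 JDBC data source simply has no mapping for array columns. A small check of the constant:

  import java.sql.Types

  // 2003 is the JDBC type code for SQL ARRAY columns; the Spark 1.3
  // JDBC relation does not map it to any Spark SQL type, hence the error.
  println(Types.ARRAY)  // prints 2003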

I have found an example in the Spark test code that defines a UDT for a two-dimensional point. However, I do not know how I can use this code.

This can be worked around, at least in pyspark, by casting the array column.

I'm not sure the syntax is correct, but something like this should work:

  val query_table = "(select id, cast(values as text) from schema.test) as casted_table"
  val jdbcDF = sqlContext.load("jdbc", Map(
    "url" -> "jdbc:postgresql://...",
    "dbtable" -> query_table,
    "user" -> "...",
    "password" -> "..."))
  jdbcDF.map(x => (x.id, x.values.toArray))

I'm pretty sure there is no .toArray that turns the string representation back into an array; that last line is just placeholder code. But now it is only a matter of parsing the string correctly.
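As a rough sketch of that parsing step (assuming the casted column comes back in PostgreSQL's text form, e.g. {1.0,2.5,3.75}; the parsePgDoubleArray helper is made up here and stands in for the imaginary .toArray):

  // Hypothetical helper: parse PostgreSQL's text form of a
  // double precision[] column, e.g. "{1.0,2.5,3.75}", into Array[Double].
  def parsePgDoubleArray(s: String): Array[Double] = {
    val inner = s.stripPrefix("{").stripSuffix("}").trim
    if (inner.isEmpty) Array.empty[Double]
    else inner.split(",").map(_.trim.toDouble)
  }

  // Rows from the JDBC source are accessed positionally
  // (column 0 = id, column 1 = the casted text column):
  val parsed = jdbcDF.map(row => (row.getLong(0), parsePgDoubleArray(row.getString(1))))

This does not handle NULLs or multidimensional arrays, but for a flat double precision[] column it should be enough.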

Of course, this is just a workaround, but it works.

