Re: multiple outputs from compiled function

*To*: mathgroup at smc.vnet.net*Subject*: [mg114647] Re: multiple outputs from compiled function*From*: Oliver Ruebenkoenig <ruebenko at wolfram.com>*Date*: Sat, 11 Dec 2010 01:55:50 -0500 (EST)

Hello Eric, I have added some comments and further improvements. On Thu, 2 Dec 2010, Oliver Ruebenkoenig wrote: > > Hi Eric, > > On Tue, 30 Nov 2010, Eric Michielssen wrote: > >> Thanks for the suggestion. > > most welcome. > >> >> I was thinking more about a routine that populates a huge sparse matrix, say >> for finite element analysis. SparseArray does not work within Compile, and >> that's understandable. >> >> So the next best thing would be to have a compiled function generate a long >> list of positions ( pos = {pos1,pos2,pos3...} with posi = {n1,n2} ) and a >> list of values ( val = {val1,val2,val3,...} ), and then construct the >> SparseArray externally to a compiled function. >> >> But how do I get my compiled function to return both pos and val? Upon > > You don not. Let me show you why. > >> reaching Return[{pos,val}], Mathematica reverts to the noncompiled function, >> in essence starting over from scratch. (I can only think of a few not so >> elegant solutions, none of which would apply in situations more complex than >> the above). >> > > First you might find this link interesting: > > http://library.wolfram.com/infocenter/Conferences/7549/ > > (*Some setup:*) > > $HistoryLength = 0; > On["Packing"] > pack = Developer`ToPackedArray; > packedQ = Developer`PackedArrayQ; > (*This is to keep track of memory consumption.*) Mem := {MemoryInUse[]/1024.^2, MaxMemoryUsed[]/1024.^2} > (*OK, lets create an example mesh. For this we use the new TetGenLink:*) > > Needs["TetGenLink`"] > pts = {{0., 0., 0.}, {10., 0., 0.}, {10., 2., 0.}, {0., 2., 0.}, {0., > 0., 1.}, {10., 0., 1.}, {10., 2., 1.}, {0., 2., 1.}}; > facets = {{{1, 2, 3, 4}}, {{5, 6, 7, 8}}, {{1, 2, 6, 5}}, {{2, 3, 7, > 6}}, {{3, 4, 8, 7}}, {{4, 1, 5, 8}}}; > inInst = TetGenCreate[]; > TetGenSetPoints[inInst, pts]; > TetGenSetFacets[inInst, facets]; > (* to get more elements use: "pqa0.000075" *) > outInst = TetGenTetrahedralize[inInst, "pqa0.075"]; (* this generates about 500000 tets and about 90K Dofs. *) outInst = TetGenTetrahedralize[inInst, "pqa0.000075"]; > coords = TetGenGetPoints[outInst]; > surface = TetGenGetFaces[outInst]; > meshElements = TetGenGetElements[outInst]; > dofs = Length[coords] > Length[meshElements] > > (* visualize *) > TetrahedraWireframe[i_] := > Line[ Join @@ (i[[All, #]] & /@ {{1, 2}, {2, 3}, {3, 1}, {1, 4}, {2, > 4}, {3, 4}})] > Graphics3D[GraphicsComplex[coords, TetrahedraWireframe[meshElements]], > Boxed -> False] > > > (* find position of Dirichlet values *) > diriPos1 = > Flatten[ pack[ > Position[ > coords[[All, 1]], _?(Function[xCoord, xCoord <= 0.]), {1}, > Heads -> False ]]]; > diriPos2 = > Flatten[ pack[ > Position[ > coords[[All, 1]], _?(Function[xCoord, xCoord >= 10.]), {1}, > Heads -> False ]]]; > diriPos = Join[ diriPos1, diriPos2]; > diriVals = > Join[ ConstantArray[ 1., {Length[ diriPos1]}], > ConstantArray[ 0., {Length[ diriPos2]}]]; > > (* visualize *) > Show[ { > Graphics3D[GraphicsComplex[coords, Polygon[surface]]], > Graphics3D[ { > {Red, PointSize[0.02], Point[ coords[[diriPos1]]]}, > {Blue, PointSize[0.02], Point[ coords[[diriPos2]]]}}] > }, Boxed -> False] (* Here you could use to not draw the edges. *) Graphics3D[{EdgeForm[], GraphicsComplex[coords, Polygon[surface]] > > > (* symbolically precompute a FEM diffusion kernel and inject in compile *) > tmp = Join[ {{1, 1, 1, 1}}, > Transpose[ Quiet[Array[Part[var, ##] &, {4, 3}]]]]; > me = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}, {0, 0, 1}}; > p = Inverse[ tmp].me; > help = Transpose[ (p.Transpose[p])*Abs[Det[ tmp]]/6 ]; > femk = With[{code = help}, > Compile[{{coords, _Real, 2}, {incidents, _Integer, 1}}, Block[{var}, > var = coords[[incidents]]; > code > ], RuntimeAttributes -> Listable, CompilationTarget -> "C"]]; > > > (* compute the local elements. - note just the elements. we do the > positions later. For fun, you may want to scale the number of elements > later *) > AbsoluteTiming[ localElements = femk[coords, meshElements]; ] > packedQ[localElements] > (* For 1/2M tets this takes 0.3 sec on my laptop. *) > (* this is the function to assemble the local elements into a sparse array > *) > matrixAssembly[ values_, pos_, dim_] := Block[{matrix}, > System`SetSystemOptions[ > "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}]; > matrix = SparseArray[ pos -> Flatten[ values], dim]; > System`SetSystemOptions[ > "SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}]; > Return[ matrix]] > > > (* > "TreatRepeatedEntries" does the following. If during the SparseArray > creation duplicate positions are found such {1,1}->1 and {1,1}->2, the > values are added and you'll find the content to be {1,1}->3 *) > > (* now the positions: > The reason to compute the position independent from the elements is to > save them for later. Consider this, if you want to assemble a mass matrix > the entries will be at exactly the same positions as in the stiffness > matrix, just different values. This is quite efficient. > *) > AbsoluteTiming[ > pos = Flatten[ Map[ Outer[ List, #, #] &, meshElements], 2]; ] In[23]:= Mem Out[23]= {32.1475, 34.0788} In[25]:= AbsoluteTiming[ pos = Flatten[Map[Outer[List, #, #] &, meshElements], 2];] Out[25]= {0.303801, Null} In[26]:= Mem Out[26]= {93.8901, 155.627} In[27]:= ByteCount[pos]/1024.^2 Out[27]= 61.7348 In[28]:= packedQ[pos] Out[28]= True (* so pos needs 60MB but during the computation we needed a copy of it, thus 155MB - we can ovoid this. If you quit the kernel and restart and use the following WVM code no copy of that data is made and less memory is needed. This is not terribly important since the direct linear solver will take much more memory. *) In[24]:= Mem Out[24]= {32.1475, 34.0788} In[25]:= cPos = Compile[{{inci, _Integer, 2}}, Flatten[ Map[ Outer[ List, #, #] &, inci], 2], CompilationTarget -> "WVM"]; (* the reason I choose to compile to WVM is the following. I have tried ->"C" but that did not result in a substantial speedup. But compiling to C has the disadvantage that the C compiler - in my case gcc - takes a "long" time. Compiling to byte code is faster. And since there was no speed up with generated C code WVM is very good here. *) In[26]:= Mem Out[26]= {32.1514, 34.0788} In[27]:= AbsoluteTiming[ pos = cPos[meshElements];] Out[27]= {0.251773, Null} In[28]:= Mem Out[28]= {93.8888, 93.892} (* no additonal memory needed. *) > > > (*assemble the system*) > AbsoluteTiming[ > stiffness = matrixAssembly[ localElements, pos, dofs ] ] takes about 2.6 sec. > > (*now for the Dirichlet conditions, we need something to pass the > stiffness by reference and modify that matrix directly, you can do so*) > > SetAttributes[dirichletBoundary, HoldFirst] > dirichletBoundary[ {load_, matrix_}, fPos_List, fValue_List] := > Block[{}, > load -= matrix[[All, fPos]].fValue; > load[[fPos]] = fValue; > matrix[[All, fPos]] = matrix[[fPos, All]] = 0.; > Transpose[ {fPos, fPos}] -> ConstantArray[ 1., {Length[ fPos]}]; > matrix += SparseArray[ > Transpose[ {fPos, fPos}] -> ConstantArray[ 1., {Length[ fPos]}], > Dimensions[ matrix], 0]; > ] > > load = Table[ 0., {dofs}]; > dirichletBoundary[{load, stiffness}, diriPos, diriVals] > > (* dirichletBoundary returns Null and operates on stiffness and load > directly *) > > (* solve with a direct solver *) > AbsoluteTiming[ > solution = LinearSolve[ stiffness, load]; ] > OK, here is an alternative route. Let's use an iterative solver. If we look at the structure of the stiffness matrix Off["Packing"] MatrixPlot[stiffness] we see that it is not banded. That is natural for FEM applications. In[36] := ByteCount[stiffness]/1024.^2 Out[36] = 15.1168 In[37] := stiffness["Properties"] Out[37] = {AdjacencyLists, Background, NonzeroPositions, NonzeroValues, PatternArray, Properties} In[38] := ByteCount[stiffness["PatternArray"]]/1024.^2 Out[38] = 5.26147 (* we load the GraphUtilities and the MinimumBandwidthOrdering *) In[39] := Needs["GraphUtilities`"] In[40] := {r, c} = MinimumBandwidthOrdering[stiffness["PatternArray"]]; (* now the matrix is banded. *) MatrixPlot[ stiffness[[r, c]]] (* this ordering is then the inverse ordering *) On["Packing"]; ord = Ordering[r]; AbsoluteTiming[ solution = LinearSolve[ stiffness[[r, c]], load[[r]], Method -> {"Krylov", Tolerance -> 10^-10.}][[ord]]; ] (* takes about 8 sec. *) (* and compared to the direct solver less memory *) In[47]:= Norm[load - stiffness.solution] Out[47]= 1.91458*10^-9 In[48]:= Norm[load - stiffness.solution]/Norm[load] Out[48]= 7.78699*10^-11 I hope this interesting. Oliver > (* visualize *) > Graphics3D[{EdgeForm[], > GraphicsComplex[ coords, Polygon[ surface ], > VertexColors -> (ColorData["TemperatureMap"][#] & /@ solution) ]}, > Boxed -> False, Background -> Gray ] > > > > I hope this helps to get you FEM code going. > > Oliver > > > >> Eric >> >> -----Original Message----- >> From: Oliver Ruebenkoenig [mailto:ruebenko at wolfram.com] >> Sent: Tuesday, November 30, 2010 6:24 AM >> To: mathgroup at smc.vnet.net >> Subject: [mg114315] Re: multiple outputs from compiled function >> >> On Tue, 30 Nov 2010, Eric Michielssen wrote: >> >>> I have a compiled function that internally generates 10 large lists (list1 >>> though list10), all of different ranks and sizes. Problem is that a >> compiled >>> function can only return one packed array, so returning them to the main >>> code is an issue. Of course, I could combine/pack all these lists into one >>> list by Flatten[list1,...,list10], return the new structure, and then >> unpack >>> it in the main code. Is there a less cumbersome way of doing this? >>> >>> Eric Michielssen >>> >>> >>> >>> >> Eric, >> >> Here are some thoughts: >> >> It is correct that compile can only work with rectangular tensors. See the >> following: >> >> f = Compile[{{t1, _Real, 2}, {t2, _Real, 2}}, Join[t1, t2]] >> >> f[RandomReal[{0, 1}, {2, 2}], RandomReal[{0, 1}, {2, 1}]] >> >> compared to: >> >> f[RandomReal[{0, 1}, {2, 2}], RandomReal[{0, 1}, {2, 2}]] >> >> But if the result of the base CompiledFunction is a list, and the list >> results come out different lengths, it returns a list of PackedArrays, >> >> cf = Compile[{{x, _Real}, {n, _Integer}}, >> Module[{y = 1.}, Table[y = (y + x/y)/2, {n}]], >> RuntimeAttributes -> Listable]; >> >> res = cf[{2., 3.}, {3, 2}] >> >> {Developer`PackedArrayQ[res], Map[Developer`PackedArrayQ, res]} >> >> resp = cf[{2., 3.}, 3] >> >> Developer`PackedArrayQ[resp] >> >> What do exactly in your case depends on the code. >> >> Hope this helps, >> Oliver >> >> > >