Creating huge matrix:how?

Post by alexbtr » Wed Feb 10, 2010 9:16 pm

Hi,

I have a matrix of doubles of about 10,000 x 10,000 entries (800 MB). I need to create it and then use it in an algorithm that fills it one position at a time.
1) What's the best way of doing this? I did some research, but looking at the APIs and the tutorial didn't help.
2) Assuming that I'll use some kind of swap file, I noticed that CSV files are used, but doesn't a plain-text representation of a double lead to precision loss? How do you get around this?
3) Do basic matrix operations like the row-column product take advantage of multi-core CPUs?

Thanks in advance.
alexbtr
 
Posts: 3
Joined: Wed Feb 10, 2010 12:47 pm

Re: Creating huge matrix:how?

Post by arndt » Wed Feb 10, 2010 11:51 pm

I assume that your matrix is not sparse, you cannot use float instead of double, and that there is really no way to allocate the required memory to your JVM. In this case you should create a binary file on disk:

Matrix m = new DenseFileMatrix2D(new File("myfile.dat"), 10000, 10000);


This will create a file of exactly 800 MB, with 8 bytes per double. If possible, you should read and write the data in order from the first row to the last; otherwise seeking in the file will slow down performance.
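
The layout described above (8 bytes per entry, rows stored one after another) can be sketched in plain Java. This is only an illustration under the assumption of a row-major layout; RowMajorDoubleFile and its methods are made-up names and not UJMP's implementation.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustration only: a row-major file of raw doubles, NOT UJMP's actual code.
class RowMajorDoubleFile {

    // Byte offset of entry (row, col) when every double occupies 8 bytes.
    static long offset(long row, long col, long cols) {
        return (row * cols + col) * Double.BYTES;
    }

    // Writes one value at (row, col), then reads it back from the file.
    static double writeThenRead(Path file, long rows, long cols,
                                long row, long col, double value) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.setLength(rows * cols * Double.BYTES); // pre-size the file
            raf.seek(offset(row, col, cols));
            raf.writeDouble(value);
            raf.seek(offset(row, col, cols));
            return raf.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Offset just past the last entry of a 10000 x 10000 matrix,
        // i.e. the total file size in bytes.
        System.out.println(offset(10_000, 0, 10_000));
        Path tmp = Files.createTempFile("matrix", ".dat");
        System.out.println(writeThenRead(tmp, 100, 100, 3, 7, Math.PI));
        Files.delete(tmp);
    }
}
```

With this layout, neighbouring entries of one row sit next to each other on disk, which is why row-by-row access avoids long seeks.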

CSV files are only supported for reading, which I guess relates to your second question. It is of course possible to convert a double into a String without loss; the same is true of BigDecimal in Java. The problem, if UJMP tried to write to a CSV file, is that all columns would need to have the same width, so that storing a longer String does not overwrite data from the next column. For reading, UJMP builds an index of the file, so the columns can have different widths, as is common in CSV files. I hope this makes it clear.
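
Both points can be checked with a few lines of plain Java. The sketch below is not UJMP code: DoubleTextRoundTrip and fixedWidth are hypothetical names, and the cell width of 25 characters is an assumption chosen to be wide enough for the sample values.

```java
// Illustration only: plain Java, no UJMP classes involved.
class DoubleTextRoundTrip {

    // True if the decimal text of d parses back to exactly the same bits.
    static boolean roundTrips(double d) {
        String text = Double.toString(d);
        double back = Double.parseDouble(text);
        return Double.doubleToLongBits(back) == Double.doubleToLongBits(d);
    }

    // Right-aligns the text in a cell of the given width (hypothetical helper).
    static String fixedWidth(double d, int width) {
        String text = Double.toString(d);
        if (text.length() > width) {
            throw new IllegalArgumentException("value does not fit: " + text);
        }
        return String.format("%" + width + "s", text);
    }

    public static void main(String[] args) {
        double[] samples = {0.1, 1.0 / 3.0, Math.PI, Double.MIN_VALUE, -Double.MAX_VALUE};
        for (double d : samples) {
            System.out.println("[" + fixedWidth(d, 25) + "] lossless: " + roundTrips(d));
        }
    }
}
```

The padding is what would make in-place overwriting of a single cell safe, at the cost of a larger file.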

Yes, the basic matrix operations plus, minus, scale, multiply and transpose can take advantage of multi-core CPUs. They do this for some matrix implementations when the matrix is sufficiently large, so that the overhead for creating threads is small compared to the performance gain.

Note, however, that multithreading will probably not bring a performance gain for your matrix on disk, as the HDD is the bottleneck. To switch it off, you can restrict UJMP to a single thread:

UJMPSettings.setNumberOfThreads(1);
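
The idea of going parallel only when the matrix is large enough can be sketched in plain Java. This is not how UJMP implements it; ThresholdedScale and the cutoff of 100 x 100 entries are hypothetical, chosen only for illustration.

```java
import java.util.stream.IntStream;

// Illustration only: a size-dependent switch between serial and parallel work.
class ThresholdedScale {
    // Hypothetical cutoff; a real library would tune this value.
    static final long PARALLEL_THRESHOLD = 100L * 100L;

    // Multiplies every entry of m by factor, in place.
    static void scale(double[][] m, double factor) {
        int rows = m.length;
        long entries = rows == 0 ? 0 : (long) rows * m[0].length;
        IntStream rowIndices = IntStream.range(0, rows);
        if (entries >= PARALLEL_THRESHOLD) {
            // Each row is handled by exactly one thread, so no locking is needed.
            rowIndices = rowIndices.parallel();
        }
        rowIndices.forEach(r -> {
            for (int c = 0; c < m[r].length; c++) {
                m[r][c] *= factor;
            }
        });
    }

    public static void main(String[] args) {
        double[][] m = new double[500][500];
        for (double[] row : m) java.util.Arrays.fill(row, 1.0);
        scale(m, 2.5);
        System.out.println(m[499][499]);
    }
}
```
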
Holger
arndt
Site Admin
 
Posts: 168
Joined: Mon Feb 02, 2009 7:02 pm
Location: Munich, Germany

Re: Creating huge matrix:how?

Post by alexbtr » Tue Feb 16, 2010 3:04 pm

Thanks for your reply.

Re: Creating huge matrix:how?

Post by alexbtr » Tue Feb 23, 2010 3:50 pm

It's me again!

I experimented with the library to check its performance. Populating a dense, file-based double matrix of 1000*1000 entries took approx. 19 seconds.
This is quite odd, as it gives me a write speed of about 1000*1000*8/19 = 420 kB/s, which is really poor compared to the write speed of actual hard disks (about 70 MB/s). Reading from the same matrix, on the other hand, was quicker: it took less than 1 second to sum all the elements of the matrix.
So I am wondering whether this write speed is normal or whether there is something I have not taken into account.

As a quick note:
- Writing was done sequentially, row by row.
- I don't know if it's related, but I am on Mac OS X Leopard.

Thanks in advance

Re: Creating huge matrix:how?

Post by arndt » Tue Feb 23, 2010 4:05 pm

Write speed can indeed be very low, as the data is written to disk entry by entry and not blockwise. Reading is performed in blocks with additional caching, which is much faster.

For writing, there should be some mechanism which marks whole blocks as "dirty" whenever data is changed, and a background thread to write the changes to disk. Anybody interested in implementing this? ;-)
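
A very rough sketch of what such a mechanism could look like, in plain Java. Nothing here is UJMP code: DirtyBlockDoubleFile, the block size of 8192 doubles and all method names are hypothetical, the cache is never evicted, and flush() is called by hand instead of from a background thread.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the "dirty block" idea; none of this is UJMP code.
class DirtyBlockDoubleFile implements AutoCloseable {
    static final int BLOCK_DOUBLES = 8192; // 64 KiB per block, an arbitrary choice

    private final RandomAccessFile raf;
    private final Map<Long, double[]> cache = new HashMap<>(); // never evicted here
    private final Set<Long> dirty = new HashSet<>();
    private int blockWrites = 0;

    DirtyBlockDoubleFile(File file, long entries) {
        try {
            raf = new RandomAccessFile(file, "rw");
            raf.setLength(entries * Double.BYTES);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Changes only the in-memory block and marks it dirty; no disk access.
    synchronized void set(long index, double value) {
        long block = index / BLOCK_DOUBLES;
        load(block)[(int) (index % BLOCK_DOUBLES)] = value;
        dirty.add(block);
    }

    synchronized double get(long index) {
        return load(index / BLOCK_DOUBLES)[(int) (index % BLOCK_DOUBLES)];
    }

    synchronized int blockWrites() {
        return blockWrites;
    }

    // Number of doubles that block actually covers (the last one may be short).
    private int doublesInBlock(long block) throws IOException {
        long start = block * BLOCK_DOUBLES * Double.BYTES;
        return (int) Math.min(BLOCK_DOUBLES, (raf.length() - start) / Double.BYTES);
    }

    // Returns the cached block, reading it from disk in one go if necessary.
    private double[] load(long block) {
        double[] values = cache.get(block);
        if (values != null) return values;
        values = new double[BLOCK_DOUBLES];
        try {
            int count = doublesInBlock(block);
            byte[] raw = new byte[count * Double.BYTES];
            raf.seek(block * BLOCK_DOUBLES * Double.BYTES);
            raf.readFully(raw);
            ByteBuffer.wrap(raw).asDoubleBuffer().get(values, 0, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        cache.put(block, values);
        return values;
    }

    // Writes every dirty block with a single write call each.
    synchronized void flush() {
        try {
            for (long block : dirty) {
                int count = doublesInBlock(block);
                ByteBuffer buf = ByteBuffer.allocate(count * Double.BYTES);
                buf.asDoubleBuffer().put(cache.get(block), 0, count);
                raf.seek(block * BLOCK_DOUBLES * Double.BYTES);
                raf.write(buf.array());
                blockWrites++;
            }
            dirty.clear();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void close() {
        flush();
        try {
            raf.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("dirty", ".dat");
        f.deleteOnExit();
        try (DirtyBlockDoubleFile m = new DirtyBlockDoubleFile(f, 1_000_000)) {
            for (int i = 0; i < 1_000_000; i++) m.set(i, i);
            m.flush();
            System.out.println("block writes: " + m.blockWrites());
        }
    }
}
```

A background thread, for example a ScheduledExecutorService task, could call flush() periodically; the methods are synchronized with that use in mind. A real implementation would also need to evict clean blocks to keep memory bounded.
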
Holger
