This post was originally posted on Cooper Hewitt Labs on Mar 29, 2012
Last month we released our collection data on Github.com. It was a pretty monumental occasion for the museum and we all worked very hard to make it happen. In an attempt to build a small example of what one might do with all of this data, we decided to build a new visualization of our collection in the form of the “Collection Wall Alpha.”
The idea behind the collection wall was simple enough–create a visual display of the objects in our collection that is fun and interactive. I thought about how we might accomplish this, what it would look like, and how much work it would be to get it done in a short amount of time. I thought about using our own .csv data, I tinkered, and played, and extracted, and extracted, and played some more. I realized quickly that the very data we were about to release required some thought to make it useful in practice. I probably over-thought.
After a short time, we found this lovely JQuery plugin called Isotope. Designed by David DeSandro, Isotope offers “an exquisite Jquery plugin of magical layouts.” And it does! I quickly realized we should just use this plugin to display a never-ending waterfall of collection objects, each with a thumbnail, and linked back to the records in our online collection database. Sounds easy enough, right?
Getting Isotope to work was pretty straight-forward. You simply create each item you want on the page, and add class identifiers to control how things are sorted and displayed. It has many options, and I picked the ones I thought would make the wall work.
Next I needed a way to reference the data, and I needed to produce the right subset of the data–the objects that actually have images! For this I decided to turn to Amazon’s SimpleDB. SimpleDB is pretty much exactly what it sounds like. It’s a super-simple to implement, scalable, non-relational database which requires no setup, configuration, or maintenance. I figured it would be the ideal place to store the data for this little project.
Once I had the data I was after, I used a tool called RazorSQL to upload the records to our SimpleDB domain. I then downloaded the AWS PHP SDK and used a few basic commands to query the data and populate the collection wall with images and data. Initially things were looking good, but I ran into a few problems. First, the data I was querying was over 16K rows tall. Thats allot of data to store in memory. Fortunately, SimpleDB is already designed with this issue in mind. By default, a call to SimpleDB only returns the first 100 rows ( you can override this up to 2500 rows ). The last element in the returned data is a special token key which you can then use to call the next 100 rows.
Using this in a loop one could easily see how to grab all 16K rows, but that sort of defeats the purpose as it still fills up the memory with the full 16K records. My next thought was to use paging, and essentially grab 100 rows at a time, per page. Isotope offers a pretty nifty “Infinite Scroll” configuration. I thought this would be ideal, allowing viewers to scroll through all 16K images. Once I got the infinite scroll feature to work, I realized that it is an issue once you page down 30 or 40 pages. So, I’m going to have to figure out a way to dump out the buffer, or something along those lines in a future release.
After about a month online, I noticed that SimpleDB charges were starting to add up. I haven’t really been able to figure out why. According to the docs, AWS only charges for “compute hours” which in my thinking should be much less than what I am seeing here. I’ll have to do some more digging on this one so we don’t break the bank!
Another issue I noticed was that we were going to be calling lots of thumbnail images directly from our collection servers. This didn’t seem like such a great idea, so I decided to upload them all to an Amazon S3 bucket. To make sure I got the correct images, I created simple php script that went through the 16K referenced images and automatically downloaded the correct resolution. It also auto-renamed each file to correspond with the record ID. Lastly, I set up an Amazon CloudFront CDN for the bucket, in hopes that this would speed up access to the images for users far and wide.
Overall I think this demonstrates just one possible outcome of our releasing of the collection meta-data. I have plans to add more features such as sorting and filtering in the near future, but it’s a start!
Check out the code after the jump ( a little rough, I know ).