CloudPreservation.com includes a very powerful search capability, so that you can gather quite a bit of information about your archived websites and social media.

In this post, we’ll walk you through some tips and tricks for using dates and crawl times to isolate documents that appeared in a specific timeframe.

What you should know about document dates in CloudPreservation.com

CloudPreservation requests information from Twitter, LinkedIn, and Facebook via their respective APIs. Because of the structured and predictable nature of these APIs, CloudPreservation.com is able to store each post’s date in its database, as well as its search index.

Since web pages don’t provide a posting date in a predictable manner, CloudPreservation cannot determine what date a page was posted on. Therefore, CloudPreservation does not have any data in its database or index for the document date of a web page.

However, CloudPreservation does crawl web sites at specified intervals, so you can use these intervals to determine when a page was added, changed or deleted. The accuracy of this method is determined by how frequently your crawls are configured to run.

What this means is that with CloudPreservation you can search by document date for Twitter, LinkedIn or Facebook posts, and you can search web pages by using crawl frequency ranges.

With that, let’s look at some common search scenarios and how you’d execute them using CloudPreservation’s powerful search functionality.

Show me what my social media feed looked like on a certain date

Often you’d like to see what your social media feed looked like on a specific date. In this case, what you’d like to tell CloudPreservation.com is: “Show me all posts on or before this date, exclude offsite links, and order by date in reverse-chronological order.”

Using a combination of the date range search condition and a document type condition, CloudPreservation can deliver you this information. So, if you’d like to see what your social media feed looked like on June 22nd, 2011, you could construct your search like so:

document_date:[1970-01-01 2011-06-22] AND NOT document_type:"Web Page"

Once you have your results, you can order by document date in descending order.
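If you run this kind of search for many dates, it can help to generate the search text rather than type it. The following is a minimal sketch that only assembles the string in the same shape as the example above; the function name and its parameters are hypothetical and are not part of CloudPreservation.

```python
from datetime import date


def feed_as_of_query(as_of: date, earliest: date = date(1970, 1, 1)) -> str:
    """Build the 'feed as of a date' search text (hypothetical helper)."""
    # Mirrors the example above: a document_date range from an early start
    # date up to the date of interest, excluding documents of type "Web Page".
    return (
        f"document_date:[{earliest.isoformat()} {as_of.isoformat()}] "
        'AND NOT document_type:"Web Page"'
    )


print(feed_as_of_query(date(2011, 6, 22)))
```

The resulting text can be pasted into the search box; ordering by document date is still done from the results page.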

Show me all new pages added in the last crawl

Sometimes you just want to see everything that’s new in your feed since the last time it was crawled. To do that, select a crawl from the crawl list below the search text box. Once a crawl is selected, a checkbox appears that allows you to restrict the search to pages that were created in the selected crawl.

Your results then should reflect any new pages, or pages that have changed since the crawl previous to the one selected. You can optionally enter a search term to narrow the results here as well.

See this blog post on the feature for further information.

Show me pages that are in one crawl but not the other

Sometimes you’d like to see the complement of a crawl, to determine what’s been removed between crawls. In this case, we build the search syntax like so:

crawl:"My Web Site - 2011-04-27 - 2011-05-28" AND NOT crawl:"My Web Site - 2011-05-28 - 2011-06-28"

To find out what exactly to put inside the quotes as the crawl name, you can copy the name of the crawls you are interested in from the crawl list below the search text box.
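As a rough sketch, the same search text can be assembled from two crawl names copied out of the crawl list. The helper name below is hypothetical, and it assumes the crawl names themselves contain no double quotes.

```python
def removed_between_query(earlier_crawl: str, later_crawl: str) -> str:
    """Build the 'in one crawl but not the other' search text (hypothetical helper)."""
    # Pages present in the earlier crawl and absent from the later one are
    # the candidates for having been removed between the two crawls.
    return f'crawl:"{earlier_crawl}" AND NOT crawl:"{later_crawl}"'


print(
    removed_between_query(
        "My Web Site - 2011-04-27 - 2011-05-28",
        "My Web Site - 2011-05-28 - 2011-06-28",
    )
)
```

Swapping the two arguments gives the opposite view: pages in the later crawl that were not in the earlier one.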

Note: Minimizing duplicates in your web site crawls enhances this report greatly. You can work with Nextpoint to build a customized SmartCrawl, which can filter out irrelevant changes between documents from crawl to crawl.

Show me the history of a page

One other common task is looking at the history of a page within CloudPreservation. By looking at the history, you can see what changed, and, depending on the feed’s crawl frequency setting, get a timeframe for when the page was added, updated or deleted.

To get the history of a page, you need to perform a search based on the URL of the page.

web_addresses:"http://www.mywebsite.com/terms.html"

This will return all instances of this page that exist in CloudPreservation.com. You can view each of the results and see how the page has changed through time, get an idea of when it arrived on the site, or when it was removed from the site.

Note: Again, minimizing duplicates in your web site crawls enhances this report greatly.
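To make the "timeframe" idea concrete, here is a minimal sketch of how crawl dates bound the moment a page first appeared. The crawl dates and the function are made up for illustration and are not read from CloudPreservation; the point is only that the page must have been added between the last crawl that lacks it and the first crawl that contains it.

```python
from datetime import date
from typing import Optional


def first_seen_window(
    crawl_dates: list[date], seen_in: set[date]
) -> Optional[tuple[Optional[date], date]]:
    """Return (last crawl without the page, first crawl with it).

    The first element is None when the page is already in the earliest
    crawl; the whole result is None when the page never appears.
    """
    previous = None
    for crawl in sorted(crawl_dates):
        if crawl in seen_in:
            return previous, crawl
        previous = crawl
    return None


# Hypothetical data: three monthly crawls, page present in the last two.
crawls = [date(2011, 4, 27), date(2011, 5, 28), date(2011, 6, 28)]
seen = {date(2011, 5, 28), date(2011, 6, 28)}
print(first_seen_window(crawls, seen))
```

With monthly crawls the window is about a month wide; a more frequent crawl setting narrows it accordingly.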

Hopefully you’ll find these tips and tricks helpful when searching your feeds within CloudPreservation.com.

Enjoy!

Saved Search

Today we launch Saved Search for TrialCloud, DiscoveryCloud, and CloudPreservation, bringing you easily repeatable (and sharable) searches.

Execute a search to ensure your syntax is correct before clicking the Save Search icon.



Anyone can save a search for themselves.  Advanced users can publish searches to everyone (“public”).


At any time in the future, select a previously saved search to re-execute the search and view the updated results via the same Save Search icon.


Saved Search provides the convenience of repeatable searches in an easily shared form.  We hope you get a lot of mileage out of them.

Very soon we’ll be releasing a streamlined interface for browsing your documents we’re calling “Grid View.”

Sample grid view

Grid View is going to be great, as it lets you get an overview of your documents in a more compact, easy to scan package, similar to how you can easily scan for information contained in a spreadsheet. Lining your documents’ data up like this makes it much easier to intuitively sort and browse too, so you can find just the data you need more effectively.

Don’t worry if you’ve grown attached to the older interface, either. When Grid View launches, you’ll find a link to “Classic View” prominently displayed at the top of any listing of documents in the application. If you do decide to use Classic View, we’ll remember and keep giving you your documents in the older style, no configuration necessary.

Look for Grid View to be released to all Trial Cloud and Discovery Cloud customers in the very near future.

The initial launch of S3 Folders brought a more convenient process of importing Document data to Trial Cloud and Discovery Cloud. Its role has already begun to evolve as it expands to Depositions & Transcripts.

Depositions and Transcripts may now be sourced from a single file or zip.

Videos and syncfiles may be added via the Case Folder, providing a more convenient way to deal with these potentially very large files.

We’re looking forward to seeing what doors Case Folders opens next for data transfer!

Importing File Option

Previously in TrialCloud and DiscoveryCloud, there were two distinct mechanisms to import documents: first as a single file or document, and second as a batch containing multiple files and an optional load file. With our last release and the inclusion of S3 Folders, these methods have been combined to share a common interface called “Import Files” while maintaining and building upon the original functionality. Here’s how it works.

After selecting the file(s) to import on the Import Files page, there is an option to process as a container file or as a native file.

Importing Container Files

Choose “Container Files” when uploading a single file or folder that is being used only as a means of organizing and uploading the files it contains.

Container Import Results

Information from load file will be applied

Only the contents of the file or folder will be processed, indexed, and included in the case or review, not the container file itself. If a container file or folder is found to contain a Nextpoint load file, that load file will be applied as part of the import.

Importing Native Files

Native File Import

Choose “Native Files” if you are uploading multiple files or folders, or if uploading a single zip file or single folder that is itself evidence. All files, including load files, found as part of processing the selected items will be processed as native files and the information in the load files will not be applied. So be careful to use the Container Files option if you are looking to utilize a load file.

Native Results

Native files will be processed as if they are Evidence
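The choice between the two options can be summarized as a small decision rule. The sketch below is only an illustration of the guidance above; the function and its parameters are hypothetical and do not correspond to anything in the application.

```python
def choose_import_option(selected_items: int, is_itself_evidence: bool) -> str:
    """Suggest an import option from the guidance above (hypothetical helper)."""
    # Multiple files or folders are imported as Native Files.
    if selected_items > 1:
        return "Native Files"
    # A single zip file or folder that is itself evidence is also Native Files.
    if is_itself_evidence:
        return "Native Files"
    # A single file or folder used only to organize and upload its contents
    # is a container; this is also the option under which a load file is applied.
    return "Container Files"


print(choose_import_option(1, False))
```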

These improvements, along with the addition of S3 Folders, have streamlined the Import process and provided more flexibility in importing files to your Nextpoint TrialCloud or DiscoveryCloud repositories.  As always, we welcome your comments and feedback.

Today Cloudpreservation.com is happy to announce archival functionality for the social photography site Flickr.com. Flickr account holders are now able to automatically back up their Flickr photos and videos with Cloudpreservation.

The U.S. Food and Drug Administration's Flickr Profile

Cloudpreservation offers two different options for archiving accounts: authenticated and public feeds. Authenticated feeds archive all of a user’s photos, videos, profile information, contacts, comments, favorites and photosets. When archiving a public feed, Cloudpreservation has access to only the profile information, contacts, favorites, photos and videos that are publicly available. All of the public user’s photosets will be archived, but private photos within the photosets will not. Public Flickr feeds do not include a user’s comments.

Example Archived Flickr Photoset

The U.S. Food and Drug Administration Archived Flickr Photoset: Recalled Products

When archiving Flickr photos, Cloudpreservation stores the highest resolution version of the file available as well as the metadata associated with it.  Exif data, tags, timestamp and licensing information are all archived and are easily searchable.

Example Flickr Photo with Data

Cloudpreservation also stores social data from the Flickr website like comments and favorites.  This allows the documentation of social interaction with the added context of an image or video.

Example Archived Flickr Comments

Currently, over 5 billion photos are stored at Flickr. It’s used by many companies and government agencies to store and promote their digital media.  We’re glad to be able to provide Flickr users with a way to archive their accounts and fulfill legal and compliance obligations.

Announcing S3 Folders

S3 Folders is an alternative to browser-based data import, allowing you to utilize a variety of client-based uploading tools to transmit data to Discovery Cloud and Trial Cloud. File size limitations are effectively removed, allowing you to upload large files (e.g. PST mailboxes) without the hassles associated with splitting them up.

Following upload to Amazon S3, files may be selected for import via the batch creation screen’s file-picker:

Load file formats traditionally supported in browser uploads continue to be supported via both browser upload and Case Folder selection.  Additionally, Case Folders supports the selection of loose files or directories containing loose files, making the upload process that much simpler.  Uploaded directory structures containing load files enjoy the extra benefit of easy correction and drop-in replacement of load files when issues are realized and remedied.

It’s an exciting development that we hope you’ll get a lot of mileage out of.
