PaperDoll Parsing

Clothing recognition is an extremely challenging problem due to wide variation in clothing item appearance, layering, and style. In this paper, we tackle the clothing parsing problem using a retrieval-based approach. For a query image, we find similar styles from a large database of tagged fashion images and use these examples to effectively parse the query. Our approach combines: trained global models of clothing items, local clothing models learned on the fly from retrieved examples, and mask transfer (paper doll item transfer) from retrieved examples to the query. Experimental evaluation shows that our approach significantly outperforms the state of the art in parsing accuracy.

Demo

You can try our online demo at clothingparsing.com.

We now have a better parser based on a deep architecture.

Download

Download the latest release on GitHub.

filename                  size       description
fashionista-v0.2.1.tgz    156 MB     Fashionista benchmark dataset v0.2 with parsing results of both CRF [Yamaguchi 2012] and Paper Doll. (April 2014)
paperdoll-v1.0.tgz        3.4 MB     Initial release. Source code only. (Sep 2013)
models-v1.0.tar.00
  .. models-v1.0.tar.14   5 GB each  Pre-trained models, split into 15 parts. Caution: large download. (Oct 2013)
data-v1.0.tar             228 MB     All training data necessary to learn a model. Not needed to use a pre-trained model. (Sep 2013)

Use the cat command to join the model parts and extract:

cat models-v1.0.tar.* | tar xf -

MD5 checksums of the model parts:

for i in 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14; do md5sum -b models-v1.0.tar.$i; done
3f14f5d90e4c3c3ce014311dce0df1bf *models-v1.0.tar.00
46bb5d046dc6f9a6e6cb3c9832ab4c6d *models-v1.0.tar.01
85f089dd4a589e02fe5da1fb16b7dbae *models-v1.0.tar.02
b0f0d18bd9ec13fbc6c63e0a1fd6356d *models-v1.0.tar.03
1b7838c2d4c8287f900992f3e7969f9c *models-v1.0.tar.04
5e7f9c7a87e3cc753b4508daa65c247a *models-v1.0.tar.05
e7ae269f42e1b7bdf30f9cac3b7ea62a *models-v1.0.tar.06
96c92e94ae179fd805f731da65636604 *models-v1.0.tar.07
b3c5f7a89a78a7dc60ee57641b6297e9 *models-v1.0.tar.08
0371ddec6c5ce04cf185f30cfd8e92ce *models-v1.0.tar.09
e9b7a90856b58d7d47f5f28902ccc561 *models-v1.0.tar.10
6ced6bf6292c3893cc4ba429ac4617b8 *models-v1.0.tar.11
57d4b0617d984c767b4617da2e44158f *models-v1.0.tar.12
1ee83b90fd49b0fe4310c89ceaf69a17 *models-v1.0.tar.13
7db0e3291730e53ffed526144c2c8e10 *models-v1.0.tar.14
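
If you prefer to script the fetch, the split model archives can be downloaded in a loop. Here is a minimal Matlab sketch using urlwrite; the per-part URLs are an assumption based on the base URL used in the README below:

% Hypothetical download loop; assumes each part is served under the same base URL.
base = 'http://vision.cs.stonybrook.edu/~kyamagu/paperdoll/';
for i = 0:14
    part = sprintf('models-v1.0.tar.%02d', i);
    urlwrite([base, part], part);  % fetch one ~5 GB part into the current directory
end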

README

PaperDoll clothing parser

Unconstrained clothing parser for a full-body picture.

Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items
Kota Yamaguchi, M. Hadi Kiapour, Tamara L. Berg
ICCV 2013

This package contains only the source code. Download the additional data files to use the parser or to run an experiment.

To parse a new image using the pre-trained models, download only the model file (caution: ~70 GB).

$ cd paperdoll-v1.0/
$ wget http://vision.cs.stonybrook.edu/~kyamagu/paperdoll/models-v1.0.tar
$ tar xvf models-v1.0.tar
$ rm models-v1.0.tar

To run an experiment from scratch, download the dataset.

$ cd paperdoll-v1.0/
$ wget http://vision.cs.stonybrook.edu/~kyamagu/paperdoll/data-v1.0.tar
$ tar xvf data-v1.0.tar
$ rm data-v1.0.tar

Contents

data/        Directory to place data.
lib/         Library directory.
log/         Log directory.
tasks/       Experimental scripts.
tmp/         Temporary data directory.
README.md    This file.
LICENSE.txt  License notice.
make.m       Build script.
startup.m    Runtime initialization script.

Build

The software is designed and tested on Ubuntu 12.04.

The clothing parser has the following prerequisites:

 * Matlab
 * OpenCV
 * Berkeley DB
 * Boost C++ libraries

Also, to run all the experiments in the paper, you need a computing grid with Sun Grid Engine (SGE) or a compatible distributed environment. In Ubuntu, search for how to use the gridengine package.

To install these requirements in Ubuntu,

$ apt-get install build-essential libcv-dev libcvaux-dev libdb-dev \
                  libboost-all-dev

In OS X with Macports,

$ port install opencv db53 boost

After installing prerequisites, the attached make.m script will compile all the necessary binaries within Matlab.

>> make

In OS X, it may be necessary to pass additional include and library flags.

>> make('-I/opt/local/include/db53', '-L/opt/local/lib/db53')
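
After the build (with startup run from the project root), a quick sanity check is to confirm that the compiled helpers are visible on the Matlab path; imdecode is one of the helpers used later in this README:

>> exist('imdecode', 'file')   % nonzero means found (3 for a MEX binary, 2 for an M-file)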

Runtime error

Depending on the Matlab installation, it may be necessary to resolve conflicting library dependencies. Use the LD_PRELOAD environment variable to prevent conflicts at runtime. For example, in Ubuntu,

$ LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6:/lib/x86_64-linux-gnu/libgcc_s.so.1:/lib/x86_64-linux-gnu/libz.so.1 matlab -singleCompThread

To find a conflicting library, run the ldd tool both within Matlab and outside of Matlab, then compare the output. Append any suspicious library to the LD_PRELOAD variable.

>> !ldd lib/mexopencv/+cv/imread.mex*
$ ldd lib/mexopencv/+cv/imread.mex*

In OS X, the variable is named DYLD_INSERT_LIBRARIES instead. The ldd equivalent is otool -L.

$ DYLD_INSERT_LIBRARIES=/opt/local/lib/libtiff.5.dylib matlab

Usage

Launch Matlab from the project root directory (i.e., paperdoll-v1.0/). This will automatically call startup.m to initialize the necessary environment.

Run a pre-trained parser for a new image

>> load data/paperdoll_pipeline.mat config;
>> input_image = imread('/path/to/new_image.jpg');
>> input_sample = struct('image', imencode(input_image, 'jpg'));
>> result = feature_calculator.apply(config, input_sample)

The result is a struct; the fields used later in this section include image (the JPEG-encoded input), final_labeling (a PNG-encoded map of label indices), and refined_labels (a cell array of label names).

To get a per-pixel labeling, use imdecode. For example, the following accesses the label of the pixel at (100, 100).

>> labeling = imdecode(result.final_labeling, 'png');
>> label = result.refined_labels{labeling(100, 100)};
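
As a quick sanity check, the per-pixel labeling can be tabulated into pixel counts per label; a sketch using only the fields shown above:

>> labeling = imdecode(result.final_labeling, 'png');
>> counts = histc(double(labeling(:)), 1:numel(result.refined_labels));
>> disp([result.refined_labels(:), num2cell(counts(:))]);  % label name, pixel count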

To visualize the parsing result,

>> show_parsing(result.image, result.final_labeling, result.refined_labels);
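
To parse several images in a batch, the single-image call can simply be looped. A minimal sketch (the photos/ directory is hypothetical; config is loaded once as above):

>> files = dir('photos/*.jpg');
>> results = cell(1, numel(files));
>> for k = 1:numel(files)
       im = imread(fullfile('photos', files(k).name));
       sample = struct('image', imencode(im, 'jpg'));
       results{k} = feature_calculator.apply(config, sample);
   end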

TIPS

In the pre-trained model, the pose estimator is set up to process images of roughly 600x400 pixels. Change this by setting the image scaling parameter in the configuration. Also, lower the threshold value if the pipeline throws an error during pose estimation.

config{1}.scale = 200;        % Set the maximum image size in the pose estimator.
                              % It is best to specify no larger than 200 pixels.
config{1}.model.thresh = -2;  % Change the threshold value if pose estimation fails.
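
Putting these overrides together with the usage example above, a full session might look like this (same API as before):

load data/paperdoll_pipeline.mat config;
config{1}.scale = 200;        % Cap the input image size for the pose estimator.
config{1}.model.thresh = -2;  % Relax the detection threshold.
result = feature_calculator.apply(config, input_sample);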

Run an experiment from scratch

Due to copyright concerns, we provide only image URLs in the PaperDoll dataset, along with a script to download the images. Note that some images might no longer be accessible at the provided URLs, since users may have deleted them. Depending on the network connection, downloading the images takes a day or more.

$ echo task100_download_paperdoll_photos | matlab -nodisplay

After downloading the training images, use tasks/paperdoll_main.sh to run an experiment from scratch. The script is designed to run on an SGE cluster with Ubuntu 12.04 and all the required libraries installed.

$ nohup ./tasks/paperdoll_main.sh < /dev/null > log/paperdoll_main.log 2>&1 &

Again, depending on the configuration, this can take a few days. Note that because of randomness in some of the algorithms and variation in data availability, we do not guarantee that this reproduces the exact numbers reported in the paper. However, the resulting model should give similar figures.

SGE cluster with Debian/Ubuntu

To build an SGE grid in Debian/Ubuntu, install the following packages.

Master

apt-get install gridengine-* default-jre

Clients

apt-get install gridengine-exec gridengine-client default-jre

See the Grid Engine documentation for configuration details. The qmon tool can be used to set up the environment. Sometimes it is necessary to change how the hostname is resolved in /etc/hosts.

Data format

data/fashionista_v0.2.mat

This file contains the Fashionista dataset from [Yamaguchi et al., CVPR 2012], with ground-truth annotations and the corresponding parsing results for unconstrained parsing. The file contains three variables; the examples below use truths, the ground-truth annotations.

The sample struct has the following fields.

The pose annotation contains 14 points in image coordinates (x, y), listed in the following order (a lookup example follows the list).

{...
    'right_ankle',...
    'right_knee',...
    'right_hip',...
    'left_hip',...
    'left_knee',...
    'left_ankle',...
    'right_hand',...
    'right_elbow',...
    'right_shoulder',...
    'left_shoulder',...
    'left_elbow',...
    'left_hand',...
    'neck',...
    'head'...
}
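
For example, a named joint can be looked up by its index in this list. A sketch, assuming the pose is stored as a 14-by-2 [x, y] matrix; the field name pose below is hypothetical, so adjust it to the actual struct layout:

% Look up a joint by name using the ordering above.
joint_names = {'right_ankle', 'right_knee', 'right_hip', 'left_hip', ...
               'left_knee', 'left_ankle', 'right_hand', 'right_elbow', ...
               'right_shoulder', 'left_shoulder', 'left_elbow', ...
               'left_hand', 'neck', 'head'};
pose = truths(i).pose;                           % hypothetical field name
head_xy = pose(strcmp(joint_names, 'head'), :);  % [x, y] of the head point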

The clothing segmentation struct consists of the following fields.

To access the per-pixel annotation of sample i,

segmentation = imdecode(truths(i).annotation.superpixel_map, 'png');
clothing_annotation = truths(i).annotation.superpixel_labels(segmentation);

To get a label at pixel (100, 100),

label = truths(i).annotation.labels{clothing_annotation(100, 100)}
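
For a quick visual check of the ground-truth annotation (plain Matlab, no extra toolboxes):

imagesc(clothing_annotation); axis image off;
title(sprintf('sample %d: %d labels', i, numel(truths(i).annotation.labels)));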

data/paperdoll_dataset.mat

The file contains two variables, samples and labels, which are used together below.

Each sample has several fields; the tagging field used below holds indices into labels.

To access the tags of sample i:

tags = labels(samples(i).tagging);
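
Assuming labels is a cell array of tag strings (consistent with the indexing above), the tags can be printed directly:

fprintf('%s\n', tags{:});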

data/INRIA_data.mat

The file contains negative samples for training a pose estimator, stored in a single variable.

License

The PaperDoll code is distributed under the BSD license. However, some of the dependent libraries in lib/ may be covered by other licenses. Check each directory for details.